Miguel Angel Bautista

I'm a research scientist at Apple MLR in San Francisco, where I lead a small team focused on generative modeling research. At Apple I've worked on generative modeling methods for images, video, 3D, graphs and scientific problems. I did my PhD at the University of Barcelona, where I was advised by Sergio Escalera. I spent a large part of my PhD at CMU working with Fernando De la Torre on matrix factorization methods. I did my postdoctoral training with Björn Ommer, working on unsupervised deep learning and generative models.

Email  /  CV  /  Bio  /  Scholar  /  Twitter  /  Bluesky


Research Interests

I'm interested in scalable and efficient generative modeling approaches that make as few assumptions about data as possible. My long-term goal is to unify training recipes and architectures across different data domains (image, text, 3D, graphs, video, etc.). My research style is to simplify overly complex pipelines and design decisions, and to identify the key design choices that actually make things work.

Prospective Applicants

I'm always on the lookout for strong full-time RS and PhD interns to join my team and work on generative models. If you think you would be a good fit, please reach out via email and describe your biggest accomplishment.

Publications

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Gu, Jiatao … Bautista, Miguel Angel … Zhai, Shuangfei
arXiv 2025
arXiv

Scales latent normalizing flows so they happily pump out crisp, megapixel images without breaking a sweat.

INRFlow: Flow Matching for INRs in Ambient Space
Wang, Yuyang … Bautista, Miguel Angel
ICML 2025
arXiv

Teaches implicit neural representations to flow-match in ambient space, making continuous signals play nice.

Normalizing Flows Are Capable Generative Models
Zhai, Shuangfei … Bautista, Miguel Angel … Susskind, Josh
ICML 2025
arXiv

Dispels the myth that flows lag behind by proving they can hit diffusion-level quality.

World-consistent Video Diffusion with Explicit 3-D Modeling
Zhang, Qihang … Bautista, Miguel Angel … Gu, Jiatao
CVPR 2025
arXiv

Knits diffusion with a global 3-D scene so every video frame keeps its bearings in the same universe.

Scalable Pre-training of Large Autoregressive Image Models
El-Nouby, Alaaeldin … Bautista, Miguel Angel … Joulin, Armand
ICML 2024
arXiv

Shows how to pre-train skyscraper-sized AR image models at you-won’t-believe-it scale and keep them stable.

Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
Wang, Yuyang … Bautista, Miguel Ángel
ICML 2024
arXiv

Simplifies conformer sampling so chemists can spin up billions of candidate molecules before lunch.

3-D Shape Tokenization
Chang, Jen-Hao Rick … Bautista, Miguel Angel … Tuzel, Oncel
arXiv 2024
arXiv

Turns complex shapes into bite-sized tokens so transformers can snack on 3-D data.

CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control and Altering of T2I Models
Stracke, Nick … Bautista, Miguel Angel … Ommer, Björn
ECCV 2024
arXiv

Drops a tiny LoRA-style adapter that lets you steer or remix any text-to-image model on the fly.

Manifold Diffusion Fields
Elhag, Ahmed A … Bautista, Miguel Angel
ICLR 2024
arXiv

Generalises diffusion to live directly on manifolds, so everything stays on-surface where it belongs.

Diffusion Probabilistic Fields
Zhuang, Peiye … Bautista, Miguel Ángel
ICLR 2023
arXiv

Extends diffusion to continuous fields, letting you sample functions instead of pixels.

Value-Function Estimation Using Conditional Diffusion Models for Control
Mazoure, Bogdan … Bautista, Miguel Angel … Susskind, Josh
arXiv 2023
arXiv

Uses diffusion models to guess value functions that keep robots from face-planting.

Is Generalised Dynamic Novel View Synthesis from Monocular Videos Possible Today?
Zhao, Xiaoming … Bautista, MA … Schwing, Alexander G
ICLR 2023
arXiv

Kicks the tyres on dynamic NeRFs and asks if one-camera view synthesis is truly ready for prime time.

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
Gu, Jiatao … Bautista, Miguel Angel … Susskind, Josh
ICLR 2023
arXiv

Stacks diffusion stages so each cleans up the mess left by the previous one.

Adaptivity & Modularity for Efficient Generalisation over Task Complexity
Abnar, Samira … Bautista, Miguel Angel … Susskind, Josh
arXiv 2023
arXiv

Builds adaptive modules that reshuffle themselves to tackle tasks from trivial to torturous.

Gaudi: A Neural Architect for Immersive 3-D Scene Generation
Bautista, Miguel Angel … Toshev, Alexander … others
NeurIPS 2022
arXiv

Lets a neural maestro draft entire 3-D worlds in one forward pass.

Fast and Explicit Neural View Synthesis
Guo, Pengsheng … Bautista, Miguel Angel … Shan, Qi
WACV 2022
arXiv

Trades a bit of fancy math for a big speed boost in NeRF-style renderers.

FvOR: Robust Joint Shape & Pose Optimisation for Few-view Object Reconstruction
Yang, Zhenpei … Bautista, Miguel Angel … Huang, Qixing
CVPR 2022
arXiv

Squeezes the most geometry out of a handful of photos by optimising pose and shape together.

Unconstrained Scene Generation with Locally Conditioned Radiance Fields
DeVries, Terrance … Bautista, Miguel Angel … Susskind, Joshua M
ICCV 2021
arXiv

Generates scenes everywhere at once by letting every point consult its local pals.

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Roberts, Mike … Bautista, Miguel Angel … Susskind, Joshua M
ICCV 2021
arXiv

Drops a massive synthetic indoor dataset so models stop over-fitting IKEA.

Equivariant Neural Rendering
Dupont, Emilien … Bautista, Miguel Angel … Shan, Qi
ICML 2020
arXiv

Bakes symmetry into neural scene representations so rotating the scene doesn’t scramble its predictions.

On the Generalisation of Learning-based 3-D Reconstruction
Bautista, Miguel Angel … Susskind, Joshua M
WACV 2020
arXiv

Nails down why some recon models wobble on unseen objects—and how to toughen them up.

Set Distribution Networks: A Generative Model for Sets of Images
Zhai, Shuangfei … Bautista, Miguel Angel … Susskind, Josh M
arXiv 2020
arXiv

Learns to sample whole image sets, not just singles, so galleries come pre-curated.

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
Huang, Chen … Bautista, Miguel Angel … Susskind, Josh
ICML 2019
arXiv

Aligns your training loss with evaluation metrics so your model stops chasing the wrong goal.

Deep Unsupervised Learning of Visual Similarities
Sanakoyeu, Artsiom … Bautista, Miguel A … Ommer, Björn
Pattern Recognition 2018
arXiv

Figures out what looks alike without ever seeing a label.

Beyond One-Hot Encoding: Lower-Dimensional Target Embedding
Rodríguez, Pau … Bautista, Miguel A … Escalera, Sergio
CVIU 2018
arXiv

Compresses targets so classifiers stop wasting neurons on zeros.

Deep Unsupervised Similarity Learning Using Partially Ordered Sets
Bautista, Miguel A … Ommer, Björn
CVPR 2017
arXiv

Learns visual hierarchies by arranging images into gentle ranking chains.

Unsupervised Video Understanding by Reconciliation of Posture Similarities
Milbich, Timo … Bautista, Miguel Angel … Ommer, Björn
CVPR 2017
arXiv

Clusters video frames by matching body poses, turning chaos into choreography.

CliqueCNN: Deep Unsupervised Exemplar Learning
Bautista, Miguel A … Ommer, Björn
NeurIPS 2016
arXiv

Lets every image teach itself by forming little friendship cliques.

Error-Correcting Factorization
Bautista, Miguel Angel … Escalera, Sergio
IEEE TPAMI 2015
arXiv

Breaks matrices into factors that double as error-correcting codes.

A Gesture Recognition System for Detecting Behavioral Patterns of ADHD
Bautista Martín, Miguel Ángel … Escalera, Sergio
IEEE Trans. Cybernetics 2014
arXiv

Uses body gestures to spot ADHD behaviour without sticking sensors on kids.

Probability-based Dynamic Time Warping & Bag-of-Visual-and-Depth-Words for Gesture Recognition in RGB-D
Hernández-Vela, Antonio … Bautista, Miguel Angel … Angulo, Cecilio
Pattern Recognition 2014

Mixes time warping with 3-D bag-of-words to keep gesture detection in sync.

HuPBA 8k+: Dataset & ECOC-GraphCut-based Segmentation of Human Limbs
Sanchez, Daniel … Bautista, Miguel Ángel … Escalera, Sergio
Neurocomputing 2014

Serves up 8k limb annotations and a graph-cut method to slice them cleanly.

On the Design of an ECOC-Compliant Genetic Algorithm
Bautista Martín, Miguel Ángel … Pujol, Oriol
Pattern Recognition 2014

Evolves error-correcting codes that keep classifiers honest.

Probability-based Dynamic Time Warping for Gesture Recognition on RGB-D Data
Bautista, Miguel Angel … Escalera, Sergio
ICPR 2014

Speeds up DTW by sprinkling probability, making gesture matching less twitchy.

BoVDW: Bag-of-Visual-and-Depth-Words for Gesture Recognition
Hernández-Vela, Antonio … Bautista, Miguel Angel … Escalera, Sergio
CVPR 2013 Workshop Faces and Gestures

Combines 2-D and depth words so gestures pop out clearly.

Minimal Design of Error-Correcting Output Codes
Bautista, Miguel Ángel … Pujol, Oriol
Pattern Recognition Letters 2012

Shows you don’t need bloated codes to get robust multiclass classifiers.