Miguel Angel Bautista

I'm a research scientist at Apple MLR in San Francisco, where I lead a small team focused on generative modeling research. At Apple I've worked on generative modeling methodologies for images, video, 3D, graphs and scientific problems. I did my PhD at the University of Barcelona, where I was advised by Sergio Escalera. I spent a large part of my PhD at CMU working with Fernando De la Torre on matrix factorization methods. I did my postdoctoral training with Björn Ommer, working on unsupervised deep learning and generative models.

Email / CV / Bio / Scholar / Twitter / Bluesky

Research Interests

I'm interested in scalable and efficient generative modeling approaches that make as few assumptions about the data as possible. My long-term goal is to unify training recipes and architectures across different data domains (image, text, 3D, graphs, video, etc.). My research style is to simplify overly complex pipelines and design decisions, and to find the key design choices that actually make things work.

Prospective Applicants

I'm always on the lookout for strong full-time RS and PhD interns to join my team and work on generative models. If you think you would be a good fit, please reach out via email and describe your biggest accomplishment.

Publications

	STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Gu, Jiatao … Bautista, Miguel Angel … Zhai, Shuangfei arXiv 2025 arXiv Scales latent normalizing flows so they happily pump out crisp, megapixel images without breaking a sweat.
	INRFlow: Flow Matching for INRs in Ambient Space Wang, Yuyang … Bautista, Miguel Angel ICML 2025 arXiv Teaches implicit neural representations to flow-match in ambient space, making continuous signals play nice.
	Normalizing Flows Are Capable Generative Models Zhai, Shuangfei … Bautista, Miguel Angel … Susskind, Josh ICML 2025 arXiv Dispels the myth that flows lag behind by proving they can hit diffusion-level quality.
	World-consistent Video Diffusion with Explicit 3-D Modeling Zhang, Qihang … Bautista, Miguel Angel … Gu, Jiatao CVPR 2025 arXiv Knits diffusion with a global 3-D scene so every video frame keeps its bearings in the same universe.
	Scalable Pre-training of Large Autoregressive Image Models El-Nouby, Alaaeldin … Bautista, Miguel Angel … Joulin, Armand ICML 2024 arXiv Shows how to pre-train skyscraper-sized AR image models at you-won’t-believe-it scale and keep them stable.
	Swallowing the Bitter Pill: Simplified Scalable Conformer Generation Wang, Yuyang … Bautista, Miguel Ángel ICML 2024 arXiv Simplifies conformer sampling so chemists can spin up billions of candidate molecules before lunch.
	3-D Shape Tokenization Chang, Jen-Hao Rick … Bautista, Miguel Angel … Tuzel, Oncel arXiv 2024 arXiv Turns complex shapes into bite-sized tokens so transformers can snack on 3-D data.
	CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control and Altering of T2I Models Stracke, Nick … Bautista, Miguel Angel … Ommer, Björn ECCV 2024 arXiv Drops a tiny LoRA-style adapter that lets you steer or remix any text-to-image model on the fly.
	Manifold Diffusion Fields Elhag, Ahmed A … Bautista, Miguel Angel ICLR 2024 arXiv Generalises diffusion to live directly on manifolds, so everything stays on-surface where it belongs.
	Diffusion Probabilistic Fields Zhuang, Peiye … Bautista, Miguel Ángel ICLR 2023 arXiv Extends diffusion to continuous fields, letting you sample functions instead of pixels.
	Value-Function Estimation Using Conditional Diffusion Models for Control Mazoure, Bogdan … Bautista, Miguel Angel … Susskind, Josh arXiv 2023 arXiv Uses diffusion models to guess value functions that keep robots from face-planting.
	Is Generalised Dynamic Novel View Synthesis from Monocular Videos Possible Today? Zhao, Xiaoming … Bautista, MA … Schwing, Alexander G ICLR 2023 arXiv Kicks the tyres on dynamic NeRFs and asks if one-camera view synthesis is truly ready for prime time.
	f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation Gu, Jiatao … Bautista, Miguel Angel … Susskind, Josh ICLR 2023 arXiv Stacks diffusion stages so each cleans up the mess left by the previous one.
	Adaptivity & Modularity for Efficient Generalisation over Task Complexity Abnar, Samira … Bautista, Miguel Angel … Susskind, Josh arXiv 2023 arXiv Builds adaptive modules that reshuffle themselves to tackle tasks from trivial to torturous.
	Gaudi: A Neural Architect for Immersive 3-D Scene Generation Bautista, Miguel Angel … Toshev, Alexander … others NeurIPS 2023 arXiv Lets a neural maestro draft entire 3-D worlds in one forward pass.
	Fast and Explicit Neural View Synthesis Guo, Pengsheng … Bautista, Miguel Angel … Shan, Qi WACV 2022 arXiv Trades a bit of fancy math for a big speed boost in NeRF-style renderers.
	FvOR: Robust Joint Shape & Pose Optimisation for Few-view Object Reconstruction Yang, Zhenpei … Bautista, Miguel Angel … Huang, Qixing CVPR 2022 arXiv Squeezes the most geometry out of a handful of photos by optimising pose and shape together.
	Unconstrained Scene Generation with Locally Conditioned Radiance Fields DeVries, Terrance … Bautista, Miguel Angel … Susskind, Joshua M ICCV 2021 arXiv Generates scenes everywhere at once by letting every point consult its local pals.
	Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding Roberts, Mike … Bautista, Miguel Angel … Susskind, Joshua M ICCV 2021 arXiv Drops a massive synthetic indoor dataset so models stop over-fitting IKEA.
	Equivariant Neural Rendering Dupont, Emilien … Bautista, Miguel Angel … Shan, Qi ICML 2020 arXiv Bakes symmetry into NeRF so rotating the scene doesn’t scramble its predictions.
	On the Generalisation of Learning-based 3-D Reconstruction Bautista, Miguel Angel … Susskind, Joshua M WACV 2020 arXiv Nails down why some recon models wobble on unseen objects—and how to toughen them up.
	Set Distribution Networks: A Generative Model for Sets of Images Zhai, Shuangfei … Bautista, Miguel Angel … Susskind, Josh M arXiv 2020 arXiv Learns to sample whole image sets, not just singles, so galleries come pre-curated.
	Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment Huang, Chen … Bautista, Miguel Angel … Susskind, Josh ICML 2019 arXiv Aligns your training loss with evaluation metrics so your model stops chasing the wrong goal.
	Deep Unsupervised Learning of Visual Similarities Sanakoyeu, Artsiom … Bautista, Miguel A … Ommer, Björn Pattern Recognition 2018 arXiv Figures out what looks alike without ever seeing a label.
	Beyond One-Hot Encoding: Lower-Dimensional Target Embedding Rodríguez, Pau … Bautista, Miguel A … Escalera, Sergio CVIU 2018 arXiv Compresses targets so classifiers stop wasting neurons on zeros.
	Deep Unsupervised Similarity Learning Using Partially Ordered Sets Bautista, Miguel A … Ommer, Björn CVPR 2017 arXiv Learns visual hierarchies by arranging images into gentle ranking chains.
	Unsupervised Video Understanding by Reconciliation of Posture Similarities Milbich, Timo … Bautista, Miguel Angel … Ommer, Björn CVPR 2017 arXiv Clusters video frames by matching body poses, turning chaos into choreography.
	CliqueCNN: Deep Unsupervised Exemplar Learning Bautista, Miguel A … Ommer, Björn NeurIPS 2016 arXiv Lets every image teach itself by forming little friendship cliques.
	Error-Correcting Factorization Bautista, Miguel Angel … Escalera, Sergio IEEE TPAMI 2015 arXiv Breaks matrices into factors that double as error-correcting codes.
	A Gesture Recognition System for Detecting Behavioral Patterns of ADHD Bautista Martín, Miguel Ángel … Escalera, Sergio IEEE Trans. Cybernetics 2014 arXiv Uses body gestures to spot ADHD behaviour without sticking sensors on kids.
	Probability-based Dynamic Time Warping & Bag-of-Visual-and-Depth-Words for Gesture Recognition in RGB-D Hernández-Vela, Antonio … Bautista, Miguel Angel … Angulo, Cecilio Pattern Recognition 2014 Mixes time warping with 3-D bag-of-words to keep gesture detection in sync.
	HuPBA 8k+: Dataset & ECOC-GraphCut-based Segmentation of Human Limbs Sanchez, Daniel … Bautista, Miguel Ángel … Escalera, Sergio Neurocomputing 2014 Serves up 8k limb annotations and a graph-cut method to slice them cleanly.
	On the Design of an ECOC-Compliant Genetic Algorithm Bautista Martín, Miguel Ángel … Pujol, Oriol Pattern Recognition 2014 Evolves error-correcting codes that keep classifiers honest.
	Probability-based Dynamic Time Warping for Gesture Recognition on RGB-D Data Bautista, Miguel Angel … Escalera, Sergio ICPR 2014 Speeds up DTW by sprinkling probability, making gesture matching less twitchy.
	BoVDW: Bag-of-Visual-and-Depth-Words for Gesture Recognition Hernández-Vela, Antonio … Bautista, Miguel Angel … Escalera, Sergio CVPR 2013 Workshop Faces and Gestures Combines 2-D and depth words so gestures pop out clearly.
	Minimal Design of Error-Correcting Output Codes Bautista, Miguel Ángel … Pujol, Oriol Pattern Recognition Letters 2012 Shows you don’t need bloated codes to get robust multiclass classifiers.