Released FLUX.2-dev Synthetic 2M — a large-scale text-to-image dataset on Hugging Face
Published:
We released FLUX.2-dev Synthetic 2M, a large-scale synthetic text-to-image dataset, on Hugging Face, with a citable DOI: 10.57967/hf/8311.
The dataset contains roughly 2.28 million image–caption pairs generated with the FLUX.2-dev diffusion model. It is intended to support research on text-to-image generation, synthetic data, and generative models.
At a glance:
- ~2,282,665 synthetic image–caption pairs
- Images generated with FLUX.2-dev at 512×512 resolution (50 inference steps, guidance scale 3.5)
- Captions sourced unmodified from the Text2Image-2M dataset
- Packaged as WebDataset: 571 shards (~4,000 samples each); each sample is a .png image, a .txt caption, and a .json metadata file recording the seed and generation parameters
- Generated through a distributed multi-GPU pipeline on the Kempner AI cluster
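For readers who want to inspect a shard without extra dependencies, here is a minimal sketch of reading one WebDataset shard using only the Python standard library. It relies on the general WebDataset convention that a shard is a plain tar archive whose files are grouped into samples by the part of the file name before the first dot, with the files of a sample stored next to each other. The function name `iter_samples` and the output keys (`caption`, `metadata`, `png`) are illustrative choices, not part of the dataset or any library, and the field names inside the .json metadata are not specified here, so the code passes them through unchanged.

```python
import json
import tarfile
from typing import Dict, Iterator


def iter_samples(shard_path: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per sample from a single WebDataset shard (a tar file).

    Assumes the files belonging to one sample are adjacent in the archive,
    which is the usual WebDataset layout.
    """
    current_key = None
    sample: Dict[str, object] = {}
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # The sample key is the path up to the first dot of the base name.
            dirname, _, basename = member.name.rpartition("/")
            stem, _, ext = basename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            data = tar.extractfile(member).read()
            if ext == "txt":
                sample["caption"] = data.decode("utf-8").strip()
            elif ext == "json":
                # Seed and generation parameters; field names are kept as stored.
                sample["metadata"] = json.loads(data)
            elif ext == "png":
                # Raw PNG bytes; decode with an image library if pixels are needed.
                sample["png"] = data
    if sample:
        yield sample
```

Calling `next(iter_samples("some-shard.tar"))` on a downloaded shard (the file name here is a placeholder) would return the first sample as a dictionary. For training at scale, a streaming loader such as the `webdataset` package is likely a better fit than this sketch.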
[Figure: a few samples from the dataset]
The dataset is released for research use and is subject to the upstream FLUX.2-dev and Text2Image-2M licenses.
Authors: Naeem Khoshnevis, Gabriel Guo, Eric Vanden-Eijnden, Nicholas Boffi, and Michael S. Albergo (Kempner Institute, Harvard University).
