Released FLUX.2-dev Synthetic 2M — a large-scale text-to-image dataset on Hugging Face

Published:

We released FLUX.2-dev Synthetic 2M, a large-scale synthetic text-to-image dataset, on Hugging Face, with a citable DOI: 10.57967/hf/8311.

The dataset contains roughly 2.28 million image–caption pairs generated with the FLUX.2-dev diffusion model. It is intended to support research on text-to-image generation, synthetic data, and generative models.

At a glance:

  • 2,282,665 synthetic image–caption pairs
  • Images generated with FLUX.2-dev at 512×512 resolution (50 inference steps, guidance scale 3.5)
  • Captions sourced unmodified from the Text2Image-2M dataset
  • Packaged as WebDataset: 571 shards (~4,000 samples each). Each sample consists of a .png image, a .txt caption, and a .json metadata file recording the seed and generation parameters
  • Generated through a distributed multi-GPU pipeline on the Kempner AI cluster
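For readers who want to inspect a shard without extra dependencies, the layout above can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' loading code. It assumes the usual WebDataset convention that a shard is a tar archive in which files sharing a basename (for example a hypothetical 000000.png, 000000.txt, and 000000.json) are stored next to each other and form one sample. The names iter_samples and expected_shard_count are illustrative, and because the JSON field names are not listed here, the metadata is simply parsed into a dict.

```python
import io
import json
import math
import posixpath
import tarfile
from collections.abc import Iterator


def expected_shard_count(total_samples: int, samples_per_shard: int) -> int:
    """Shards needed if every shard except the last is completely full."""
    return math.ceil(total_samples / samples_per_shard)


def iter_samples(fileobj) -> Iterator[dict]:
    """Yield one dict per sample from an open WebDataset shard (a tar file).

    Assumption: members that share a basename are adjacent in the archive,
    which is the standard WebDataset convention.
    """
    current_key = None
    sample: dict = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Key = path up to the first dot of the file name; the rest is the extension.
            dirname, base = posixpath.split(member.name)
            stem, _, ext = base.partition(".")
            key = posixpath.join(dirname, stem)
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            data = tar.extractfile(member).read()
            if ext == "txt":
                sample[ext] = data.decode("utf-8")   # caption text
            elif ext == "json":
                sample[ext] = json.loads(data)       # seed and generation parameters
            else:
                sample[ext] = data                   # raw bytes, e.g. the PNG image
    if sample:
        yield sample
```

Calling iter_samples on a shard opened in binary mode yields dicts with "png", "txt", and "json" entries, and the PNG bytes can be handed to any image library. As a consistency check on the numbers above, expected_shard_count(2_282_665, 4_000) gives 571, which matches the listed shard count if shards are filled to exactly 4,000 samples (an assumption, since the size is given only approximately).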

A few sample captions:

  • “A woman in a hat stands in front of a colorful umbrella.”
  • “A plate of Chinese food with a variety of dishes including dumplings, noodles, and vegetables…”
  • “an open air sculpture with trees in the background”
  • “An animated figure, likely a young girl with green eyes and black hair, enjoys a meal of noodles at an outdoor wooden table…”

The dataset is released for research use and is subject to the upstream FLUX.2-dev and Text2Image-2M licenses.

Authors: Naeem Khoshnevis, Gabriel Guo, Eric Vanden-Eijnden, Nicholas Boffi, and Michael S. Albergo (Kempner Institute, Harvard University).