Tuesday, January 9, 2024

Fondant AI Releases Fondant-25M Dataset of Image-Text Pairs with a Creative Commons License - Mohammad Arshad, MarkTechPost

A central challenge with generative AI models such as Stable Diffusion and DALL-E is that they are trained on hundreds of millions of images from the public Internet, including copyrighted work. This creates legal risks and uncertainty for users of these images and is unfair to copyright holders who may not want their proprietary work reproduced without consent. To tackle this, researchers have developed a data-processing pipeline to build a dataset of 500 million Creative Commons images for training latent diffusion image-generation models. A data-processing pipeline is a sequence of steps that collects, processes, and moves data from a source to a destination, where it can be stored and analyzed for various purposes.
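To make the pipeline idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not Fondant's actual API. The record type `ImageTextPair`, the step functions `keep_creative_commons` and `clean_captions`, the runner `run_pipeline`, and the license identifiers are all illustrative assumptions. Each step consumes a stream of image-text records and yields a filtered or cleaned stream, and the runner chains the steps in order.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, List

@dataclass(frozen=True)
class ImageTextPair:
    """Hypothetical record: an image URL, its caption, and a license tag."""
    url: str
    caption: str
    license: str

# A pipeline step takes a stream of records and yields a (possibly smaller) stream.
Step = Callable[[Iterable[ImageTextPair]], Iterator[ImageTextPair]]

# Assumed allow-list of license identifiers; real license metadata is messier.
ALLOWED_LICENSES = {"cc0", "cc-by", "cc-by-sa"}

def keep_creative_commons(records: Iterable[ImageTextPair]) -> Iterator[ImageTextPair]:
    """Drop any record whose license tag is not on the allow-list."""
    for r in records:
        if r.license.lower() in ALLOWED_LICENSES:
            yield r

def clean_captions(records: Iterable[ImageTextPair]) -> Iterator[ImageTextPair]:
    """Collapse whitespace in captions and drop records with empty captions."""
    for r in records:
        caption = " ".join(r.caption.split())
        if caption:
            yield replace(r, caption=caption)

def run_pipeline(source: Iterable[ImageTextPair], steps: List[Step]) -> List[ImageTextPair]:
    """Chain the steps lazily, then materialize the final stream."""
    stream: Iterable[ImageTextPair] = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)

if __name__ == "__main__":
    raw = [
        ImageTextPair("https://example.com/a.jpg", "  A red   bicycle ", "CC-BY"),
        ImageTextPair("https://example.com/b.jpg", "Stock photo", "all-rights-reserved"),
        ImageTextPair("https://example.com/c.jpg", "   ", "cc0"),
    ]
    for record in run_pipeline(raw, [keep_creative_commons, clean_captions]):
        print(record)
```

Using generators keeps each step independent and lets records stream through without loading everything into memory, which matters at the scale of hundreds of millions of images; a production pipeline would also download, deduplicate, and caption images, which this sketch omits.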

https://www.marktechpost.com/2023/10/15/fondant-ai-releases-fondant-25m-dataset-of-image-text-pairs-with-a-creative-commons-license/