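ZINC250K is a dataset of drug-like molecules sampled from the ZINC database and widely used for molecular design and generation. It contains roughly 250,000 molecules, each stored as a SMILES string together with computed properties such as logP, QED (quantitative estimate of drug-likeness), and synthetic accessibility score (SAS).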
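The sketch below shows one way to read ZINC250K-style rows into typed records with only the Python standard library. It assumes the column names `smiles`, `logP`, `qed`, and `SAS` used by the commonly distributed CSV; check the header of your own copy. The `ZincRecord` and `load_zinc250k` names are hypothetical, chosen for illustration.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


# Assumed column layout: smiles, logP, qed, SAS (verify against your file's header).
@dataclass
class ZincRecord:
    smiles: str   # molecular structure as a SMILES string
    logp: float   # octanol-water partition coefficient (logP)
    qed: float    # quantitative estimate of drug-likeness
    sas: float    # synthetic accessibility score


def load_zinc250k(lines: Iterable[str]) -> List[ZincRecord]:
    """Parse ZINC250K-style CSV lines (header first) into typed records."""
    records = []
    for row in csv.DictReader(lines):
        records.append(
            ZincRecord(
                # Strip surrounding whitespace defensively before using the SMILES.
                smiles=row["smiles"].strip(),
                logp=float(row["logP"]),
                qed=float(row["qed"]),
                sas=float(row["SAS"]),
            )
        )
    return records
```

Because an open text file is an iterable of lines, a downloaded copy can be loaded with `with open(path, newline="") as f: records = load_zinc250k(f)`, where `path` is wherever you stored the file.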
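ChEMBL is a manually curated database of bioactive molecules with drug-like properties, maintained by EMBL-EBI. Recent releases contain more than two million distinct compounds and roughly 20 million bioactivity measurements, which can be used for a range of machine learning tasks, including molecular design.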