metadata
license: mit
tags:
  - chemistry
  - drug-discovery
  - materials-science
  - generative-ai
  - diffusion-models

MolecularDiffusion: Pre-trained Models and Datasets

Welcome to the repository for pre-trained models and datasets accompanying the MolecularDiffusion framework.

MolecularDiffusion is a unified Generative AI framework designed to streamline the entire lifecycle of 3D molecular diffusion models, from efficient training to seamless deployment in data-driven computational chemistry pipelines.

Find more details in our paper: arXiv

Models

This repository hosts several pre-trained 3D molecular diffusion models described in our paper.

  • Pre-trained General Model: A diffusion model trained on our compiled dataset of 3D molecules (the Compiled 3D Molecules dataset listed under Training Datasets).
  • GEOM-Trained Models: Diffusion models trained on the GEOM dataset, covering the training variants described in the paper.
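
As a rough illustration of how the hosted files might be fetched, the sketch below wraps `snapshot_download` from the `huggingface_hub` package. The repository id and the `*.ckpt` file pattern are placeholders, not names confirmed by this README, and the helper functions `download_kwargs` and `fetch` are hypothetical. Check the repository's file listing for the actual layout before relying on any pattern.

```python
# Hypothetical placeholder: replace with the real "<owner>/<name>" id of this repository.
REPO_ID = "<owner>/MolecularDiffusion"


def download_kwargs(repo_id, patterns=None, local_dir=None):
    """Build keyword arguments for huggingface_hub.snapshot_download.

    patterns:  optional glob patterns limiting which files are fetched
               (for example only checkpoints); None fetches everything.
    local_dir: optional target directory; None uses the default HF cache.
    """
    kwargs = {"repo_id": repo_id}
    if patterns:
        kwargs["allow_patterns"] = list(patterns)
    if local_dir is not None:
        kwargs["local_dir"] = str(local_dir)
    return kwargs


def fetch(repo_id, patterns=None, local_dir=None):
    """Download matching files and return the local snapshot path."""
    # Imported here so the helper above stays usable without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, patterns, local_dir))
```

With the real repository id filled in, a call such as `fetch(REPO_ID, patterns=["*.ckpt"], local_dir="models")` would download only the files matching the pattern. The checkpoint extension here is a guess and may differ from what the repository actually uses.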

Datasets

We provide the datasets used for training our models, as well as novel datasets generated by our models.

Training Datasets

  • QM9: Small organic molecules

  • FORMED: Synthesizable molecules from the Cambridge Structural Database (CSD)

  • Compiled 3D Molecules: Our custom-compiled dataset used for pre-training, combining GEOM, QMugs, COMPAS1, COMPAS3, FORMED, and OSCAR.

  • IFLP Dataset: Intramolecular frustrated Lewis pairs (IFLPs) derived from the CoRE MOF 2019 database

Generated Datasets

These datasets were generated using the MolecularDiffusion models:

  • Asymmetric Cp Dataset: A generated dataset focusing on asymmetric cyclopentadienyl ligands.

  • Target IFLP Dataset: Generated IFLPs with desired geometrical features for the catalytic hydrogenation of CO$_2$.

  • Singlet Fission Candidates: A curated dataset of generated candidate molecules for singlet fission applications.