Open Catalyst Challenge

Winners' announcement and presentations:
December 07, 2021 @ NeurIPS 2021

Overview

The Open Catalyst Challenge invites participants to help address the pressing challenges the world faces due to energy scarcity and climate change. A critical problem in this area is the discovery of new catalysts that enable efficient, carbon-neutral means of energy storage and generation. A common approach to discovering high-performance catalysts is molecular simulation, in which simpler surrogate descriptors are computed that correlate with experimental measurements of catalyst activity and selectivity. The task for this year’s challenge is to design new machine learning models that predict the outcome of the catalyst simulations used to understand activity.

Specifically, each simulation models the interaction of a catalyst surface with adsorbates that are commonly seen in electrochemical reactions. The simulations are local relaxations of the atomic positions that identify local energy minima. ML models are trained to predict the energy of the adsorbate-catalyst system at the local minimum, known as the “relaxed state”, starting from a provided initial state. Accurate predictions of these interactions make it possible to estimate the catalyst's impact on the overall rate of a chemical reaction. This is a key factor in filtering potential electrocatalyst materials and addressing the world's energy needs.

Dates

NeurIPS session

The Open Catalyst Challenge session at NeurIPS 2021 is scheduled for Tuesday, December 07, 2021, starting at 18:05 GMT (10:05 AM PST). Note that NeurIPS registration is required to attend the session.

The session starts with a broadcast of the challenge overview and results at 18:05 GMT, followed by a breakout session on Zoom. The Zoom link for the breakout session will be available on the NeurIPS schedule website here.

Schedule

Time (GMT)      Session                                              Speakers
18:05 - 18:25   Challenge overview, results and analysis             Abhishek Das, Muhammed Shuaibi, Aini Palizhati
18:25 - 18:30   Buffer for attendees to join Zoom; room link here    -
18:30 - 18:50   Runner-up talk + Q&A                                 To be announced
18:50 - 19:10   Winner talk + Q&A                                    To be announced
19:10 - 19:30   Discussion                                           Attendees, participants, organizers

All talk recordings will be made available after NeurIPS.

Challenge results

To be announced at NeurIPS 2021

IS2RE Task

The challenge will consist of one primary task -- Initial Structure to Relaxed Energy (IS2RE) [1]. Here the input consists of the atomic positions for an initial structure, and the goal is to predict the energy of the structure’s relaxed state.

Relaxed energies are a critical indicator of the reaction rate achieved with a given catalyst. To find the preferred binding site, an adsorbate is placed at multiple locations above a catalyst's surface and each structure is relaxed. The site with the lowest relaxed energy is then taken as the binding site between the adsorbate and the catalyst. This lowest-energy binding site is the one most likely to be realized under experimental conditions, and its relaxed energy is highly correlated with the reaction rate or selectivity of the chemical reaction. If successful, ML models for this task could be used to screen millions or even billions of potential catalyst materials for the chemical reactions involved in renewable energy storage and solar fuel generation.
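
Once a relaxed energy is available for every placement, selecting the binding site reduces to taking a minimum per system. The sketch below shows that selection step only. The `(system_id, placement_id, relaxed_energy)` tuple layout and the function name `lowest_energy_sites` are hypothetical conventions, and the energies are made-up numbers rather than OC20 data.

```python
def lowest_energy_sites(predictions):
    """Pick, for each adsorbate-catalyst system, the placement with the lowest
    relaxed energy.

    `predictions` is an iterable of (system_id, placement_id, relaxed_energy)
    tuples; this layout is a hypothetical convention for the sketch.
    Returns {system_id: (placement_id, relaxed_energy)}.
    """
    best = {}
    for system_id, placement_id, energy in predictions:
        # Keep the placement only if it beats the best one seen so far.
        if system_id not in best or energy < best[system_id][1]:
            best[system_id] = (placement_id, energy)
    return best


# Made-up energies (eV), for illustration only.
predictions = [
    ("system-A", "site-0", -0.52),
    ("system-A", "site-1", -0.61),
    ("system-A", "site-2", -0.58),
    ("system-B", "site-0", 0.13),
    ("system-B", "site-1", 0.27),
]
print(lowest_energy_sites(predictions))
# {'system-A': ('site-1', -0.61), 'system-B': ('site-0', 0.13)}
```

Whether the relaxed energies come from DFT, an ML relaxation, or a direct IS2RE model, this selection step is the same.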

Traditionally, relaxed energies are found by performing structure relaxations. A relaxation is an iterative local optimization in which Density Functional Theory (DFT) computes the atomic forces, that is, the negative gradients of the energy with respect to the atomic positions. These forces are then used to update the atom positions until convergence. This computationally expensive process typically requires hundreds of DFT calculations to converge (hours or days of compute per relaxation) and forms the basis of most computational catalysis efforts.
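
The loop structure of a relaxation is simple: compute energy and forces, stop if the largest force is small enough, otherwise move the atoms along the forces and repeat. The minimal sketch below makes two stand-in assumptions:

- A toy Lennard-Jones pair potential takes the place of the expensive DFT call.
- Plain steepest descent takes the place of the more sophisticated local minimizers used in practice, such as BFGS, FIRE, or conjugate gradient.

The names `lj_energy_and_forces`, `relax`, `step_size` and `fmax` are hypothetical.

```python
import numpy as np


def lj_energy_and_forces(positions, epsilon=1.0, sigma=1.0):
    """Toy Lennard-Jones energy and per-atom forces (stand-in for a DFT call)."""
    energy = 0.0
    forces = np.zeros_like(positions)
    n_atoms = len(positions)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            r_vec = positions[i] - positions[j]
            r = np.linalg.norm(r_vec)
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6**2 - sr6)
            # Force on atom i is -dE/dr along the unit vector from j to i;
            # atom j feels the equal and opposite force.
            f_scalar = 24.0 * epsilon * (2.0 * sr6**2 - sr6) / r
            forces[i] += f_scalar * r_vec / r
            forces[j] -= f_scalar * r_vec / r
    return energy, forces


def relax(positions, step_size=0.01, fmax=1e-5, max_steps=10_000):
    """Steepest-descent relaxation: move atoms along the forces until the
    largest per-atom force norm drops below `fmax`."""
    positions = np.array(positions, dtype=float)
    for n_steps in range(max_steps):
        energy, forces = lj_energy_and_forces(positions)
        if np.linalg.norm(forces, axis=1).max() < fmax:
            return positions, energy, n_steps
        positions = positions + step_size * forces
    # Not converged within max_steps: report the energy of the last positions.
    energy, _ = lj_energy_and_forces(positions)
    return positions, energy, max_steps


# A dimer started at separation 1.5 relaxes toward the Lennard-Jones minimum
# at separation 2**(1/6) * sigma, where the energy is -epsilon.
final, relaxed_energy, steps = relax([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
print(np.linalg.norm(final[1] - final[0]), relaxed_energy, steps)
```

In a real relaxation, each call to the energy-and-forces function is a full DFT calculation. The hundreds of iterations such a loop needs are why a single relaxation costs hours of compute.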

One approach to the IS2RE task is to use ML to approximate DFT relaxations. Such a model iteratively estimates atomic forces and updates atomic positions until a relaxed state is reached, then predicts the energy of that state. Evaluating these models on the IS2RE task will help determine whether this approach is accurate and fast enough for practical applications. These models have the additional benefit of predicting the relaxed structure, which can accelerate future DFT calculations. Alternatively, it may be possible to predict the relaxed energy directly, without estimating intermediate relaxation states, since many of the changes during a relaxation (e.g., those due to particular initial-guess strategies) are systematic. Such direct IS2RE approaches may lead to even greater gains in computational efficiency. We therefore place no restrictions on the ML approaches that may be used in this challenge. We do encourage submissions that are significantly more computationally efficient than DFT. A standard DFT relaxation takes 8-10 hours, and we are looking for ML approaches that bring this down to under 10 seconds per relaxation or under 1 second per direct prediction. That is an improvement of at least 1000x!
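
The "at least 1000x" figure follows directly from the numbers quoted above. The short calculation below checks it using only those figures: 8-10 hours per DFT relaxation, 10 seconds per ML relaxation, and 1 second per direct prediction. The helper name `speedup_range` is hypothetical.

```python
SECONDS_PER_HOUR = 3600

# Figures quoted above: a DFT relaxation takes 8-10 hours; the targets are
# under 10 s per ML relaxation and under 1 s per direct prediction.
dft_seconds = (8 * SECONDS_PER_HOUR, 10 * SECONDS_PER_HOUR)
targets = {"ML relaxation": 10.0, "direct prediction": 1.0}


def speedup_range(target_seconds):
    """Return the (min, max) speedup over DFT for a given ML wall-clock time."""
    return tuple(d / target_seconds for d in dft_seconds)


for name, seconds in targets.items():
    low, high = speedup_range(seconds)
    print(f"{name}: {low:,.0f}x to {high:,.0f}x faster than DFT")
# ML relaxation: 2,880x to 3,600x faster than DFT
# direct prediction: 28,800x to 36,000x faster than DFT
```

Even the slowest case (an 8-hour DFT relaxation against a 10-second ML relaxation) gives roughly 2,900x, comfortably above the 1000x target.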

To ensure consistent and fair evaluation, we use a public evaluation server hosted on EvalAI.

Dataset

The challenge will be conducted on the Open Catalyst Dataset (OC20). OC20 training and validation data are available here. A new test-challenge split has been released here specifically for this challenge, to prevent overfitting to the test data through repeated submissions.

OC20 contains approximately 1.2M DFT relaxations. Because of its scale, generating the dataset required over 70M hours of compute. Computation was performed on servers that Facebook has committed to be 100% supported by renewable energy since 2020. Each relaxation contains a series of structures as the atoms move from an initial structure to a relaxed structure. The relaxed structure is obtained with a standard local minimizer built into the computational chemistry code. Structures contain the atoms of both the adsorbate and the catalyst. The initial structures are determined heuristically, and the relaxed structures correspond to states in which the atoms are at a local energy minimum. Since each step in a relaxation may be used for training or evaluation, the total number of simulation points is over 264M! The largest training split has ~134M simulation points. For each structure, the DFT-computed system energy, per-atom forces and per-atom positions are available as annotations.
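
One way to picture the data is as a list of annotated frames per relaxation. An IS2RE sample uses only the first frame's structure and the last frame's energy, while every frame can serve as a structure-to-energy-and-forces (S2EF) point. The sketch below is a hypothetical in-memory layout. It is not the on-disk format in which OC20 is distributed, and the class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Frame:
    """One DFT-annotated structure from a relaxation (hypothetical layout)."""
    positions: np.ndarray  # (n_atoms, 3) per-atom positions
    forces: np.ndarray     # (n_atoms, 3) per-atom forces
    energy: float          # system energy


@dataclass
class Relaxation:
    """A trajectory from an initial structure to its relaxed structure."""
    atomic_numbers: np.ndarray  # (n_atoms,) adsorbate + catalyst atoms
    frames: List[Frame]         # frames[0] is initial, frames[-1] is relaxed

    def is2re_sample(self) -> Tuple[np.ndarray, np.ndarray, float]:
        """IS2RE: initial structure as input, relaxed energy as target."""
        return self.atomic_numbers, self.frames[0].positions, self.frames[-1].energy

    def s2ef_samples(self) -> List[Tuple[np.ndarray, np.ndarray, float, np.ndarray]]:
        """S2EF-style points: every frame contributes (structure, energy, forces)."""
        return [(self.atomic_numbers, f.positions, f.energy, f.forces)
                for f in self.frames]
```

Because a relaxation yields one IS2RE sample but many S2EF points, ~1.2M relaxations expand into over 264M simulation points.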

The OC20 validation and test splits have several subsplits that evaluate a model's performance on interpolative and extrapolative tasks. Interpolation is evaluated on samples from the same distribution as the training dataset (In Domain). Extrapolation is evaluated along two dimensions: new adsorbates and new catalyst compositions. Subsplits cover all combinations of these extrapolations: Out-of-Domain Adsorbate (OOD Adsorbate), OOD Catalyst, and OOD Both (both unseen adsorbates and unseen catalyst compositions).
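
The four subsplits follow from two yes/no questions: is the adsorbate unseen in training, and is the catalyst composition unseen? The sketch below encodes that definition. It is only an illustration of the categories, not the procedure used to construct OC20. The function name is hypothetical, and the adsorbate and composition strings are placeholder labels.

```python
def assign_subsplit(adsorbate, catalyst, train_adsorbates, train_catalysts):
    """Map a sample to an OC20-style evaluation subsplit (illustrative logic)."""
    new_adsorbate = adsorbate not in train_adsorbates
    new_catalyst = catalyst not in train_catalysts
    if new_adsorbate and new_catalyst:
        return "OOD Both"
    if new_adsorbate:
        return "OOD Adsorbate"
    if new_catalyst:
        return "OOD Catalyst"
    return "In Domain"


train_adsorbates = {"*OH", "*CO"}
train_catalysts = {"Pt3Ni", "CuAg"}
print(assign_subsplit("*NH2", "Pt3Ni", train_adsorbates, train_catalysts))  # OOD Adsorbate
```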

Summary of all evaluation splits


Split            Size    Max submissions   Metrics           Results              Leaderboard
val              ~100k   -                 Energy MAE, EwT   On EvalAI            -
test             ~100k   10                Energy MAE, EwT   On EvalAI            On opencatalystproject.org, all year round
test-challenge   120k    10                Energy MAE, EwT   TBA at NeurIPS '21   TBA at NeurIPS '21

Evaluation

All submissions to the Open Catalyst Challenge will be made to the EvalAI server and evaluated on the following metrics:

  • Energy MAE: mean absolute error between the predicted relaxed energy and the DFT-computed ground-truth relaxed energy.
  • Energy within Threshold (EwT): the percentage of predicted relaxed energies within 0.02 eV of the DFT-computed ground-truth relaxed energy.
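
Both metrics can be computed from two arrays of relaxed energies in eV. The sketch below is not the EvalAI scoring code. It assumes a strict "less than 0.02 eV" comparison for EwT and reports EwT as a percentage; the official evaluation may differ in such details. The function names are hypothetical.

```python
import numpy as np


def energy_mae(pred, target):
    """Mean absolute error between predicted and DFT relaxed energies (eV)."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))


def energy_within_threshold(pred, target, threshold=0.02):
    """Percentage of predictions whose absolute error is below `threshold` eV.

    Assumes a strict '<' comparison at the threshold.
    """
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(100.0 * np.mean(np.abs(pred - target) < threshold))


pred = [1.0, 2.05, -0.5]
target = [1.01, 2.0, -0.5]
print(energy_mae(pred, target))               # approximately 0.02
print(energy_within_threshold(pred, target))  # approximately 66.67
```

The two metrics reward different things. MAE averages every error, so a few large misses can dominate it. EwT only counts how many predictions land within chemical accuracy and ignores how far off the rest are.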

Challenge winners will be decided based on the Energy MAE metric.

We acknowledge that resource availability may be a bottleneck for some participants, given the large size of the OC20 trajectory data (~134M training points). Thus, as detailed on the discussion forum, we will recognize two winners for the challenge based on:

  1. The best overall performance with no constraints on data used
  2. The best performance using ONLY the IS2RE dataset (size 460,328)

When making submissions to EvalAI, participants will be prompted to specify whether they used only the IS2RE dataset. Participants submitting to track (2) may not use any other datasets or pretrained S2EF models, and pretraining in any form that uses S2EF data is not allowed. Data augmentation is permitted as long as it comes ONLY from the IS2RE dataset. Participants submitting to track (1) are free to use any dataset. Participants may enter both tracks if they wish. We will invite the winners of each track to give an oral presentation at NeurIPS 2021. If a single team wins both tracks, we will additionally invite the second-place team of track (2) to present. Using DFT is prohibited in both tracks.

Submission Guidelines

To participate in the Open Catalyst Challenge, create a team on EvalAI and upload submissions to the "Predicting relaxed state energy from initial structure (IS2RE) -- Test-challenge" phase.

Submissions must be an `.npz` NumPy binary file with the following contents:

{
  "challenge_ids": array(['0', '1', ...]),
  "challenge_energy": array([-3.63920, -1.08237, 12.92103, ...])
}

where `challenge_ids` and `challenge_energy` are both arrays of size 120000, with the following dtypes and shapes:

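>>> import numpy as np
>>> data = np.load("submission.npz")  # hypothetical path to your submission file
>>> data["challenge_ids"].dtype, data["challenge_ids"].shape
(dtype('<U6'), (120000,))
>>> data["challenge_energy"].dtype, data["challenge_energy"].shape
(dtype('float64'), (120000,))
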
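A submission file in the format above can be written and sanity-checked with plain NumPy. The sketch below is a minimal example under the following assumptions:

- `ids` are the test-challenge sample identifiers and `energies` are your model's predictions in the same order.
- `np.savez` (uncompressed) is an acceptable way to produce the `.npz`.
- `write_submission` and `check_submission` are hypothetical helper names, not the official helper code.

```python
import numpy as np

CHALLENGE_SIZE = 120000  # number of systems in the test-challenge split


def write_submission(ids, energies, path, expected_size=CHALLENGE_SIZE):
    """Save predictions as an .npz with the `challenge_ids` and
    `challenge_energy` keys described above."""
    ids = np.array([str(i) for i in ids])  # unicode strings, e.g. dtype '<U6'
    energies = np.asarray(energies, dtype=np.float64)
    if ids.shape != (expected_size,) or energies.shape != (expected_size,):
        raise ValueError(f"expected {expected_size} ids and energies, got "
                         f"shapes {ids.shape} and {energies.shape}")
    if len(np.unique(ids)) != expected_size:
        raise ValueError("challenge_ids contains duplicates")
    if not np.isfinite(energies).all():
        raise ValueError("challenge_energy contains NaN or inf")
    np.savez(path, challenge_ids=ids, challenge_energy=energies)


def check_submission(path, expected_size=CHALLENGE_SIZE):
    """Reload the file and confirm keys, shapes and dtypes."""
    with np.load(path) as data:
        ids, energies = data["challenge_ids"], data["challenge_energy"]
        return (ids.dtype.kind == "U" and energies.dtype == np.float64
                and ids.shape == (expected_size,)
                and energies.shape == (expected_size,))
```

Running `check_submission` on the provided dummy file is a quick way to confirm the checks match what the server accepts before uploading real predictions.
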
A dummy submission file is available here. Please use this for verification / debugging purposes. We also provide helper code and docs for training models and preparing EvalAI submission files for the IS2RE task here.

Please reach out to us on the discussion forum or via email [1, 2] if you have any questions or concerns regarding the challenge.

The OC20 dataset paper has more details on how the OC20 dataset was created, the various tasks and evaluation metrics, and the performance of baseline ML algorithms. The paper is accompanied by our constantly evolving OCP codebase, which provides implementations of several state-of-the-art graph neural network algorithms.
Please consider citing the following if you use OC20 in your work:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. Submission Period begins September 20, 2021 at 12:00:00 am UTC and ends October 6, 2021 at 11:59:59 pm UTC. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes Cuba, Crimea, North Korea, Iran, Syria, Venezuela and any other jurisdiction or area designated by the United States Treasury's Office of Foreign Assets Control. Void outside the Territory and where prohibited by law. Participation subject to Official Rules. See Official Rules for entry requirements, judging criteria and full details. Winners are invited to attend & present at the virtual Open Catalyst Challenge session at NeurIPS on December 13 or 14, 2021. Winners are responsible for all costs to attend workshop/conference, including conference registration fee. Sponsor: Facebook, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.