Open Catalyst Challenge

Deadline: October 06, 2021, 23:59:59 GMT

Overview

The Open Catalyst Challenge invites participants to help address the pressing challenges the world faces due to energy scarcity and climate change. A critical problem in this area is the discovery of new catalysts that drive efficient, carbon-neutral energy storage and generation. A common approach to discovering high-performance catalysts is to use molecular simulations, which generate simpler surrogate descriptors that correlate with experimental measurements of catalyst activity and selectivity. The task for this year’s challenge is to design new machine learning models that predict the outcome of the catalyst simulations used to understand activity.

Specifically, each simulation models the interaction of a catalyst surface with adsorbates that are commonly seen in electrochemical reactions. The simulations correspond to local relaxations of the atomic positions to identify local minima. ML models are trained to predict the energy of the adsorbate-catalyst system at the local minimum, known as the “relaxed state”, starting from a provided initial state. By predicting these interactions accurately, the catalyst's impact on the overall rate of a chemical reaction may be estimated, which is a key factor in filtering potential electrocatalysis materials and addressing the world's energy needs.

Task Guidelines

The challenge will consist of one primary task -- Initial Structure to Relaxed Energy (IS2RE) [1]. Here the input consists of the atomic positions for an initial structure, and the goal is to predict the energy of the structure’s relaxed state.

Relaxed energies are a critical indicator in determining the reaction rate resulting from the use of a catalyst. By placing an adsorbate in multiple locations above a catalyst's surface and relaxing the structure, the binding site between the adsorbate and catalyst with the lowest relaxed energy can be determined. This lowest energy binding site is likely to be the one realized in practice under experimental conditions. The relaxed energy of the lowest energy binding site is also highly correlated with the reaction rates or selectivity of the chemical reaction. If successful, these techniques could be used to screen millions or even billions of potential catalyst materials for the chemical reactions involved in renewable energy storage and solar fuel generation.
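The site-selection step described above can be sketched in a few lines of Python, assuming the relaxed energy of each adsorbate placement is already known. The site labels and energy values are made up for illustration and are not taken from OC20.

```python
def lowest_energy_site(relaxed_energies):
    """Return (site, energy) for the placement with the lowest relaxed energy.

    `relaxed_energies` maps a placement label to its relaxed energy in eV.
    """
    if not relaxed_energies:
        raise ValueError("need at least one placement")
    site = min(relaxed_energies, key=relaxed_energies.get)
    return site, relaxed_energies[site]


# Hypothetical relaxed energies (eV) for three placements of one adsorbate.
example = {"top": -0.41, "bridge": -0.63, "hollow": -0.58}
best_site, best_energy = lowest_energy_site(example)
```

In a screening setting, the same selection would be repeated for every adsorbate-catalyst pair, so the cost of obtaining each relaxed energy dominates the total runtime.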

Traditionally, relaxed energies are found by first performing structure relaxations: an iterative local optimization process estimates the gradients (atomic forces) using Density Functional Theory (DFT) and uses them to update the atom positions until convergence. This process is very computationally expensive, typically requiring hundreds of DFT calculations to converge (hours or days of compute per relaxation), and it forms the basis of most computational catalysis efforts.
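The iterative loop described above can be illustrated with a toy example. This is only a sketch under strong assumptions: `toy_energy_and_forces` is a hypothetical harmonic potential standing in for a DFT calculation, and plain steepest descent is used for simplicity, which is not necessarily the optimizer used to generate OC20. The step size and force threshold are likewise illustrative.

```python
def toy_energy_and_forces(positions, rest=(0.0, 0.0, 1.5), k=2.0):
    """Stand-in for one DFT call: a harmonic well pulling each atom toward `rest`."""
    energy = 0.0
    forces = []
    for p in positions:
        d = [pi - ri for pi, ri in zip(p, rest)]
        energy += 0.5 * k * sum(x * x for x in d)
        forces.append([-k * x for x in d])  # force is the negative energy gradient
    return energy, forces


def relax(positions, calc=toy_energy_and_forces, step=0.1, fmax=0.01, max_steps=500):
    """Move atoms along the forces until the largest per-atom force norm is below fmax."""
    positions = [list(p) for p in positions]
    for n_calls in range(1, max_steps + 1):
        energy, forces = calc(positions)  # the expensive call in a real relaxation
        largest = max(sum(c * c for c in f) ** 0.5 for f in forces)
        if largest < fmax:
            return positions, energy, n_calls
        positions = [[x + step * c for x, c in zip(p, f)]
                     for p, f in zip(positions, forces)]
    raise RuntimeError("relaxation did not converge")


initial = [(0.3, -0.2, 2.0), (0.0, 0.1, 1.0)]
relaxed, relaxed_energy, n_calls = relax(initial)
```

Each pass through the loop costs one energy-and-force evaluation, which is why replacing the DFT call with a fast ML estimate, or skipping the loop entirely, is attractive.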

One approach to the IS2RE task is to use ML to approximate DFT relaxations, i.e., iteratively estimate atomic forces and update atomic positions until a relaxed state is reached, and then predict the energy of that state. Evaluating models built for approximating DFT relaxations on the IS2RE task will help determine whether this approach is sufficiently accurate and fast for practical applications. These models have the additional benefit of predicting the relaxed structure and accelerating future DFT calculations. Alternatively, it may be possible to predict the relaxed energy directly, without estimating intermediate relaxation states, as many of the changes during a relaxation (say, due to particular initial guess strategies) are systematic. These direct IS2RE approaches may lead to even greater improvements in computational efficiency. As such, we place no restrictions on the ML approaches used to solve this task and participate in this challenge. We encourage submissions that are significantly more computationally efficient than DFT. For example, a standard relaxation using DFT takes 8-10 hours, while we are looking for ML approaches that can bring this down to < 10 seconds per relaxation or < 1 second per direct prediction, at least a 1000x improvement!
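As a quick check of the speedup figures quoted above, the following sketch converts the DFT and ML timings to seconds and takes their ratio. It uses the lower end of the 8-10 hour DFT range, so the results are conservative. The function name is hypothetical.

```python
def speedup(dft_hours, ml_seconds):
    """Ratio of DFT wall time to ML wall time for one relaxation."""
    return dft_hours * 3600.0 / ml_seconds


relaxation_speedup = speedup(8, 10)  # ML relaxation at the 10-second budget
direct_speedup = speedup(8, 1)       # direct prediction at the 1-second budget
```

Under these assumptions the ratios are 2880 and 28800, both above the 1000x mark.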

To ensure consistent and fair evaluation, we use a public evaluation server hosted on EvalAI.

Dataset

This challenge will be conducted on the Open Catalyst Dataset (OC20). OC20 training and validation data are already publicly available. A new test-challenge split will be released specifically for this challenge close to the submission deadline. This is to ensure there is no overfitting on the test data through repeated submissions.

OC20 contains approximately 1.2M DFT relaxations. Due to its significant scale, the dataset required over 70M hours of compute to generate. Computation was performed on servers that Facebook has committed to be 100% supported by renewable energy since 2020. Each relaxation contains a series of structures as the atoms move from an initial structure to a relaxed structure, obtained through a standard local minimizer built into the computational chemistry code. Structures contain the atoms corresponding to the adsorbate and catalyst. The initial structures are heuristically determined, and the relaxed structures correspond to a state in which the atoms are at a local energy minimum. Since each step in the relaxation may be used for training / evaluation, the total number of simulation points is over 264M! The largest training split has ~134M simulation points. For each structure, DFT-computed system energies, per-atom forces and per-atom positions are available as annotations.

The OC20 validation and test splits have several subsplits to help evaluate a model's performance on interpolative and extrapolative tasks. A model's interpolative ability is evaluated on samples from the same distribution as the training dataset (In Domain). Extrapolation is evaluated on two dimensions -- new adsorbates and new catalyst compositions. Subsplits are created by considering all combinations of potential extrapolations -- Out-of-Domain Adsorbate (OOD Adsorbate), OOD Catalyst, and OOD Both (both unseen adsorbate and unseen catalyst compositions).
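The four subsplits above can be thought of as the outcome of two membership tests. The sketch below is only illustrative: the function, the adsorbate and catalyst labels, and the simple set lookup are assumptions, and the actual OC20 split construction is described in the dataset paper.

```python
def subsplit(adsorbate, catalyst, train_adsorbates, train_catalysts):
    """Name the subsplit a sample would fall into, given what was seen in training."""
    new_adsorbate = adsorbate not in train_adsorbates
    new_catalyst = catalyst not in train_catalysts
    if new_adsorbate and new_catalyst:
        return "OOD Both"
    if new_adsorbate:
        return "OOD Adsorbate"
    if new_catalyst:
        return "OOD Catalyst"
    return "In Domain"


# Hypothetical training-set contents, for illustration only.
seen_adsorbates = {"*OH", "*CO"}
seen_catalysts = {"Pt3Ni", "Cu"}

labels = [
    subsplit("*OH", "Cu", seen_adsorbates, seen_catalysts),
    subsplit("*NH3", "Cu", seen_adsorbates, seen_catalysts),
    subsplit("*CO", "AgPd", seen_adsorbates, seen_catalysts),
    subsplit("*NH3", "AgPd", seen_adsorbates, seen_catalysts),
]
```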

All the IS2RE evaluation splits are summarized below:

Split           Size    Max submissions  Metrics          Results         Leaderboard
val             ~100k   -                Energy MAE, EwT  On EvalAI       -
test            ~100k   10               Energy MAE, EwT  On EvalAI       On opencatalystproject.org all year round
test-challenge  ~100k   10               Energy MAE, EwT  TBA at NeurIPS  TBA at NeurIPS

Evaluation Metrics

All submissions to the Open Catalyst Challenge will be evaluated on the following metrics:

  • Energy MAE: mean absolute error between the predicted relaxed energy and the DFT-computed ground-truth relaxed energy.
  • Energy within Threshold (EwT): the percentage of predicted relaxed energies within 0.02 eV of the DFT-computed ground-truth relaxed energy.

Challenge winners will be decided based on the Energy MAE metric.
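For reference, the two metrics can be computed as in the following sketch. It is not the official evaluation code: the function names and sample values are hypothetical, and treating "within 0.02 eV" as a strict inequality is an assumption about the boundary case.

```python
def energy_mae(predicted, target):
    """Mean absolute error (eV) between predicted and ground-truth relaxed energies."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def energy_within_threshold(predicted, target, threshold=0.02):
    """Percentage of predictions whose absolute error is below `threshold` eV."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: an error exactly equal to the threshold does not count as a hit.
    hits = sum(1 for p, t in zip(predicted, target) if abs(p - t) < threshold)
    return 100.0 * hits / len(predicted)


# Made-up energies (eV) for four systems.
predicted = [-0.50, 1.20, 0.31, -2.00]
target = [-0.51, 1.00, 0.30, -1.95]

mae = energy_mae(predicted, target)
ewt = energy_within_threshold(predicted, target)
```

In this example two of the four predictions fall within the threshold, while the MAE is dominated by the single 0.2 eV miss, which shows how the two metrics capture different aspects of model quality.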

Please refer to the OC20 dataset paper for more details, and consider citing the following if you use it in your work:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}
        

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. Submission Period begins September 20, 2021 at 12:00:00 am UTC and ends October 6, 2021 at 11:59:59 pm UTC. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes Cuba, Crimea, North Korea, Iran, Syria, Venezuela and any other jurisdiction or area designated by the United States Treasury's Office of Foreign Assets Control. Void outside the Territory and where prohibited by law. Participation subject to Official Rules. See Official Rules for entry requirements, judging criteria and full details. Winners are invited to attend & present at the virtual Open Catalyst Challenge session at NeurIPS on December 13 or 14, 2021. Winners are responsible for all costs to attend workshop/conference, including conference registration fee. Sponsor: Facebook, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.