Open Catalyst Challenge

Submission Deadline: Oct 7, 2022
Winners' announcement: NeurIPS 2022                                                                                                                          

Overview

The Open Catalyst Challenge 2022 invites participants to help in addressing the pressing challenges faced by the world due to energy scarcity and climate change. In this area, a critical problem is the discovery of new catalysts for driving efficient and carbon neutral means for energy storage and generation. A common approach in discovering high performance catalysts is using molecular simulations, where simpler surrogate descriptors are generated to correlate with experimental measurements of catalyst activity and selectivity. The task for this year's challenge is to design new machine learning models to predict the outcome of catalyst simulations used to understand activity.

Specifically, each simulation models the interaction of a catalyst surface with adsorbates that are commonly seen in electrochemical reactions. The simulations correspond to local relaxations of the atomic positions to identify local minima. ML models are trained to predict the energies of the adsorbate-catalyst system at the local minima, known as the “relaxed state”, starting from a provided initial state. Accurate predictions of these interactions make it possible to estimate the catalyst's impact on the overall rate of a chemical reaction, a key factor in filtering potential electrocatalysis materials and addressing the world's energy needs.

This is the 2nd edition of the Open Catalyst Challenge. The 1st edition was held last year, and the results were announced at NeurIPS 2021. More details about last year's challenge can be found here.

This year's challenge focuses on the same task -- Initial Structure to Relaxed Energy (IS2RE) -- as last year (details here). The primary differences are: 1) instead of two tracks, we will have a single track where using the IS2RE data and/or the Structure-to-Energy-Forces (S2EF) 2M training data is allowed (details here). 2) A new test-challenge split will be released in September specifically for this year's challenge.

Dates

NeurIPS session

The Open Catalyst Challenge session at NeurIPS 2022 was held on Thursday, December 08, 2022.

The entire session was recorded and is viewable here.

Schedule


15:00 - 15:05 CST Buffer for attendees to join Zoom [room link]
15:05 - 15:20 CST Challenge overview, results and analysis [video]
Abhishek Das (Open Catalyst Project team)
15:20 - 15:50 CST Invited talk + Q&A [video]
15:50 - 16:10 CST Runner-up talk + Q&A [video]
Yi-Lun Liao, Tess Smidt
(Atomic Architects, MIT)
16:10 - 16:30 CST Winner talk + Q&A [video]
Jiaqi Han, Tian Bian, Geyan Ye, Kaili Ma, Yuduo Zhi, Kangfei Zhao, Tingyang Xu, Wenbing Huang, Yu Rong
(Tencent AI Lab, Tsinghua University, Renmin University of China, The Chinese University of Hong Kong)
16:30 - 17:00 CST Invited talk + Q&A [video]
17:00 - 17:30 CST Discussion [video] Attendees, participants, organizers

Challenge results

The Open Catalyst Challenge received 25 submissions in total from 6 teams. All submissions were evaluated on the test-challenge-2022 dataset split consisting of the following 4 subsplits:

  • test-like: similar to OC20 test and used to pick winners
  • rotated: used to evaluate rotational invariance
  • anomalous: structures with desorptions and dissociations
  • dense: dense sampling of adsorbate placements for evaluating recall of lowest energy site
Team TTRC won the challenge with an energy MAE of 0.396 eV on test-like.
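
As an illustration of what the rotated subsplit probes, the following is a minimal sketch of a rotational-invariance check. The `toy_energy` function is a hypothetical stand-in for a trained model (it depends only on interatomic distances), the coordinates are made up, and periodic boundary conditions are ignored; it is not the organizers' evaluation code.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random proper 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the orthogonal factor
    if np.linalg.det(q) < 0:      # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def toy_energy(positions):
    """Hypothetical stand-in model: depends only on interatomic distances."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(positions), k=1)   # count each atom pair once
    return float(np.sum(np.exp(-dists[iu])))


rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))    # made-up atomic coordinates
rotation = random_rotation(rng)
rotated = positions @ rotation.T       # rotate every atom about the origin

# A rotationally invariant model predicts the same energy for both inputs.
invariance_gap = abs(toy_energy(positions) - toy_energy(rotated))
print(f"energy difference after rotation: {invariance_gap:.2e}")
```

A model whose predictions change under such a rotation would show a nonzero gap here, and on the challenge data it would score differently on the rotated and test-like subsplits.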

Energy MAE (eV)

Rank | Team | Test-like | Rotated | Anomalous | Dense
1 | TTRC (previously "Tencent AI Lab") | 0.3960 | 0.3964 | 0.8904 | 0.3630
    Jiaqi Han1, Tian Bian2, Geyan Ye3, Kaili Ma2, Yuduo Zhi3, Kangfei Zhao3, Tingyang Xu3, Wenbing Huang4, Yu Rong3
    1=Tsinghua University, 2=The Chinese University of Hong Kong, 3=Tencent AI Lab, 4=Renmin University of China
2 | Atomic Architects MIT | 0.4266 | 0.4235 | 0.9228 | 0.3838
    Yi-Lun Liao, Tess Smidt
    Massachusetts Institute of Technology
3 | XJTUNRTeam | 0.5255 | 0.5176 | 1.1577 | 0.4718
3 | AutoGraph | 0.5263 | 0.5176 | 1.0789 | 0.4933
    Xu Wang, Huan Zhao
    4Paradigm
5 | Shanghai Jiao Tong University | 0.6529 | 0.6603 | 1.2163 | 0.5875
    Wei Yang1, Frank Ji1, Yulian He2, Cheng Hua3, Guanjie Zheng3, Zhanyu Liu3, Feixiang Tian2, Tianhua Li3, Junlin He3
    1=Yalotein Biotech, 2=University of Michigan - Shanghai Jiao Tong University Joint Institute, 3=Shanghai Jiao Tong University
6 | personal test | 1.0709 | 1.0671 | 1.5259 | 0.6058
    Bangjian Zhou1, Ji Wei Yoon1, Zhuoyi Lin1, J Senthilnath1, Chaitanya K. Joshi2
    1=Agency for Science, Technology and Research, Singapore, 2=University of Cambridge

IS2RE Task

The challenge will consist of one primary task -- Initial Structure to Relaxed Energy (IS2RE) [1]. Here the input consists of the atomic positions for an initial structure, and the goal is to predict the energy of the structure's relaxed state.

Relaxed energies are a critical indicator in determining the reaction rate resulting from the use of a catalyst. By placing an adsorbate in multiple locations above a catalyst's surface and relaxing the structure, the binding site between the adsorbate and catalyst with the lowest relaxed energy can be determined. This lowest energy binding site is likely to be the one realized in practice under experimental conditions. The relaxed energy of the lowest energy binding site is also highly correlated with the reaction rates or selectivity of the chemical reaction. If successful, these techniques could be used to screen millions or even billions of potential catalyst materials for the chemical reactions involved in renewable energy storage and solar fuel generation.

Traditionally, relaxed energies are found by first performing structure relaxations through an iterative local optimization process that estimates the gradients (atomic forces) using Density Functional Theory (DFT), which are in turn used to update atom positions until convergence. This very computationally expensive process typically requires hundreds of DFT calculations to converge (hours or days of compute per relaxation) and forms the basis of most computational catalysis efforts.

One approach to the IS2RE task is relaxation-based, i.e. using ML to approximate DFT relaxations. These models iteratively estimate atomic forces and update atomic positions until a relaxed state is reached and finally predict the energy of that state. Evaluation of the IS2RE task on models built for approximating DFT relaxations will help determine whether this approach is sufficiently accurate and fast for practical applications. These models have the additional benefit of predicting the relaxed structure and accelerating future DFT calculations.
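
The relaxation-based idea can be sketched as a simple loop. This is a minimal illustration only: `toy_energy_and_forces` is a hypothetical harmonic potential standing in for a trained ML model, the update is plain steepest descent, and the step size and force threshold `fmax` are arbitrary values chosen for the example rather than challenge settings.

```python
import numpy as np


def toy_energy_and_forces(positions, targets, k=1.0):
    """Hypothetical stand-in for an ML model: harmonic wells centred on `targets`."""
    disp = positions - targets
    energy = 0.5 * k * float(np.sum(disp ** 2))
    forces = -k * disp            # force is the negative gradient of the energy
    return energy, forces


def relax(positions, targets, step=0.1, fmax=0.05, max_steps=200):
    """Move atoms along the predicted forces until the largest force is below fmax."""
    pos = positions.copy()
    energy, forces = toy_energy_and_forces(pos, targets)
    n_steps = 0
    while np.max(np.linalg.norm(forces, axis=1)) >= fmax and n_steps < max_steps:
        pos = pos + step * forces                      # update atomic positions
        energy, forces = toy_energy_and_forces(pos, targets)
        n_steps += 1
    return pos, energy, n_steps


rng = np.random.default_rng(0)
targets = rng.normal(size=(4, 3))                      # made-up minimum-energy positions
initial = targets + 0.5 * rng.normal(size=(4, 3))      # perturbed initial structure

initial_energy, _ = toy_energy_and_forces(initial, targets)
relaxed, relaxed_energy, n_steps = relax(initial, targets)
print(f"energy {initial_energy:.3f} -> {relaxed_energy:.5f} in {n_steps} steps")
```

The energy returned at the end of the loop plays the role of the IS2RE prediction, and `relaxed` is the predicted relaxed structure.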

Alternatively, it may be possible to develop direct approaches that predict the relaxed energy directly, without estimating intermediate relaxation states, as many of the changes during a relaxation (say due to particular initial guess strategies) are systematic. These direct IS2RE approaches may lead to even greater improvements in computational efficiency.

As such, we place no restrictions on the ML approaches that participants may use to solve this task. We encourage submissions that are significantly more computationally efficient than DFT. For example, a standard relaxation using DFT takes 8-10 hours, while the desired ML approaches would bring this down to < 10 seconds per relaxation or < 1 second per direct prediction, at least a 1000x improvement!
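
The speedup quoted above follows from a short calculation. The sketch below uses only the figures from the text (8-10 hours per DFT relaxation, 10 seconds per ML relaxation, 1 second per direct prediction); the function name is made up for the example.

```python
DFT_HOURS = (8, 10)   # DFT relaxation time range quoted in the text


def speedup_range(ml_seconds):
    """Return the (low, high) speedup of an ML prediction over a DFT relaxation."""
    low, high = (hours * 3600 / ml_seconds for hours in DFT_HOURS)
    return low, high


for name, seconds in (("relaxation", 10), ("direct", 1)):
    low, high = speedup_range(seconds)
    print(f"{name}: {low:,.0f}x to {high:,.0f}x faster than DFT")
# relaxation: 2,880x to 3,600x faster than DFT
# direct: 28,800x to 36,000x faster than DFT
```

Both targets therefore clear the 1000x mark with room to spare.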

To ensure consistent and fair evaluation, we use a public evaluation server hosted on EvalAI.

Dataset

The challenge will be conducted on the Open Catalyst Dataset (OC20). OC20 training and validation data are available here. A new test-challenge-2022 split has been released here specifically for this challenge. This is to ensure there is no overfitting on the test data through repeated submissions.

OC20 contains ~1.2M DFT relaxations. Due to its significant scale, the dataset required over 200M hours of compute to generate. Computation was performed on servers that Facebook has committed to supporting with 100% renewable energy since 2020. Each relaxation contains a series of structures as the atoms move from an initial structure to a relaxed structure obtained through a standard local minimizer built into the computational chemistry code. Structures contain the atoms corresponding to the adsorbate and catalyst. The initial structures are heuristically determined and the relaxed structures correspond to a state in which the atoms are at a local energy minimum. Since each step in the relaxation may be used for training / evaluation, the total number of simulation points is over 264M! The largest training split has ~134M simulation points. However, for this challenge, only the IS2RE dataset and/or the S2EF 2M dataset may be used for training. For each structure, DFT computed system energies, per-atom forces and per-atom positions are available as annotations.

The OC20 validation and test splits have several subsplits to help evaluate a model's performance on interpolative and extrapolative tasks. A model's interpolative ability is evaluated on samples from the same distribution as the training dataset (In Domain). Extrapolation is evaluated on two dimensions -- new adsorbates and new catalyst compositions. Subsplits are created by considering all combinations of potential extrapolations -- Out-of-Domain Adsorbate (OOD Adsorbate), OOD Catalyst, and OOD Both (both unseen adsorbate and unseen catalyst compositions).

Summary of all evaluation splits


Split | Size | Max submissions | Metrics | Results | Leaderboard
val | ~100k | - | Energy MAE, EwT | On EvalAI | -
test | ~100k | 10 | Energy MAE, EwT | On EvalAI | On opencatalystproject.org all year round
test-challenge-2022 | ~100k | 10 | Energy MAE, EwT | Will be announced at NeurIPS '22 | Will be announced at NeurIPS '22

Evaluation

All submissions to the Open Catalyst Challenge will be made to the EvalAI server and evaluated on the following metrics:

  • Energy MAE: mean absolute error between the predicted relaxed energy and the DFT-computed ground-truth relaxed energy.
  • Energy within Threshold (EwT): the percentage of predicted relaxed energies within 0.02 eV of the DFT-computed ground-truth relaxed energy.

Challenge winners will be decided based on the Energy MAE metric.
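
The two metrics can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the EvalAI server's code: the function names are made up, the sample values are invented, and treating "within 0.02 eV" as a strict inequality is an assumption.

```python
import numpy as np


def energy_mae(pred, target):
    """Mean absolute error between predicted and reference relaxed energies (eV)."""
    return float(np.mean(np.abs(pred - target)))


def energy_within_threshold(pred, target, threshold=0.02):
    """Percentage of predictions whose absolute error is below `threshold` eV."""
    # Assumption: the comparison is strict; the text only says "within 0.02 eV".
    return float(100.0 * np.mean(np.abs(pred - target) < threshold))


# Made-up predictions and reference energies, in eV.
pred = np.array([-1.00, 0.50, 2.00, 3.00])
target = np.array([-1.01, 0.55, 2.00, 2.50])

mae = energy_mae(pred, target)               # errors: 0.01, 0.05, 0.00, 0.50
ewt = energy_within_threshold(pred, target)  # two of four errors are below 0.02
print(f"Energy MAE = {mae:.3f} eV, EwT = {ewt:.1f}%")
```

The example shows why the two metrics can disagree: a single large error dominates the MAE, while EwT only counts how many predictions land very close to the reference.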

The challenge will have a single track, wherein participants are allowed to train on the IS2RE dataset (size 460k) and/or the S2EF 2M dataset. We are expanding the scope from last year to include and encourage training on intermediate trajectory data from S2EF (in addition to IS2RE direct data) because we have seen it consistently improve performance. We acknowledge that resource availability may become a bottleneck for some participants and hence will not be accepting entries that train on S2EF splits larger than 2M.

The table below summarizes training compute costs and accuracies of various direct and relaxation-based IS2RE baseline approaches. Training a relaxation-based GemNet-OC S2EF-2M model is about twice as expensive as training a direct Graphormer model, but improves energy MAE by ~20%! Training on S2EF-All improves accuracies further but gets prohibitively expensive, and hence training on S2EF-All is not allowed for challenge entrants. Our non-challenge test evaluation server and leaderboards are open all year round for these larger trained models. The north star of this challenge is to marry the accuracy of relaxation-based models with the efficiency of direct models.

Model | IS2RE approach | Training dataset | Training GPU hours | Test Energy MAE (eV)
GemNet-dT (paper, code) | Direct | IS2RE | 75 | 0.634
PaiNN (paper, code) | Direct | IS2RE | 518 | 0.573
Graphormer, Open Catalyst Challenge 2021 winner (paper, code) | Direct | IS2RE | 568 | 0.538
GemNet-dT (paper, code) | Relaxation | S2EF-2M | 863 | 0.438
GemNet-OC (paper, code) | Relaxation | S2EF-2M | 1183 | 0.407
PaiNN (paper, code) | Relaxation | S2EF-All | 1600 | 0.471
GemNet-dT (paper, code) | Relaxation | S2EF-All | 11820 | 0.400
GemNet-OC (paper, code) | Relaxation | S2EF-All | 8067 | 0.355
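
The cost and accuracy comparison between Graphormer and GemNet-OC S2EF-2M can be checked directly from the baseline figures. The snippet below is a small worked calculation using only the GPU hours and test energy MAE values listed for those two models; the variable names are made up.

```python
# Baseline figures for the two models being compared.
graphormer = {"gpu_hours": 568, "mae": 0.538}      # direct, trained on IS2RE
gemnet_oc_2m = {"gpu_hours": 1183, "mae": 0.407}   # relaxation, trained on S2EF-2M

cost_ratio = gemnet_oc_2m["gpu_hours"] / graphormer["gpu_hours"]
mae_reduction = 1 - gemnet_oc_2m["mae"] / graphormer["mae"]

print(f"training cost ratio: {cost_ratio:.2f}x")       # about 2.08x
print(f"relative MAE reduction: {mae_reduction:.1%}")  # about 24.3%
```

The roughly 2x cost and the MAE improvement of a little over 20% match the figures quoted in the text.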

Participants will be prompted while making submissions to EvalAI to confirm that they didn't use any training data outside of the IS2RE and the S2EF-2M datasets. Data augmentation and pretraining are permitted as long as they use only the IS2RE or S2EF-2M datasets. Using DFT is not allowed.

We will be inviting the winner and runner-up teams, and optionally teams with interesting entries (e.g. best direct approach) for oral presentations at NeurIPS 2022.

Submission Guidelines

To participate in the Open Catalyst Challenge, create a team on EvalAI and upload submissions to the "Predicting relaxed state energy from initial structure (IS2RE) -- Test-challenge-2022" phase:

Submissions must be an `.npz` numpy binary file in the following format:

          {
            "challenge_ids": array(['0', '1', ...]),
            "challenge_energy": array([-3.63920, -1.08237, 12.92103, ...,])
          }
                      
where both `challenge_ids` and `challenge_energy` are arrays of size 100010.

          >>> data["challenge_ids"].dtype, data["challenge_ids"].shape
          (dtype('<U6'), (100010,))
          >>> data["challenge_energy"].dtype, data["challenge_energy"].shape
          (dtype('float64'), (100010,))
                      
A dummy submission file is available here. Please use this for verification / debugging purposes. We also provide helper code and docs for training models and preparing EvalAI submission files for the IS2RE task here.
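
A file in this format could be written along the following lines. This is a minimal sketch, not the official helper code: the ids `'0'` to `'100009'` and the all-zero energies are placeholders, the file name is made up, and a real entry would use the ids shipped with the test-challenge-2022 split together with the model's predicted relaxed energies.

```python
import os
import tempfile

import numpy as np

N_SYSTEMS = 100010  # array size stated in the guidelines

# Placeholder ids and energies (assumptions for illustration only).
challenge_ids = np.array([str(i) for i in range(N_SYSTEMS)])
challenge_energy = np.zeros(N_SYSTEMS, dtype=np.float64)

# Write both arrays into a single .npz file under the required key names.
path = os.path.join(tempfile.mkdtemp(), "is2re_submission.npz")
np.savez(path, challenge_ids=challenge_ids, challenge_energy=challenge_energy)

# Reload and inspect, mirroring the dtype / shape check shown above.
data = np.load(path)
print(data["challenge_ids"].dtype, data["challenge_ids"].shape)
print(data["challenge_energy"].dtype, data["challenge_energy"].shape)
```

Building the id array from Python strings keeps it a fixed-width unicode array, which is the kind of dtype shown in the check above.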

Please reach out to us on the discussion forum or via email [1, 2] if you have any questions or concerns regarding the challenge.

The OC20 dataset paper has more details on how the OC20 dataset was created, the various tasks and evaluation metrics, and performance of baseline ML algorithms. The paper is accompanied by our constantly-evolving OCP codebase that provides implementations of several state-of-the-art graph neural network algorithms.
Please consider citing the following if you use it in your work:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}
                  

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. Submission Period begins September 21, 2022 at 12:00:00 am UTC and ends October 7, 2022 at 11:59:59 pm UTC. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any area, country, state, territory, or province where United States or local laws do not prohibit participating or receiving a prize in the Challenge and excludes any country or jurisdiction that is the target of U.S., EU, United Nations, or UK comprehensive trade sanctions (e.g., Crimea, Donetsk, and Luhansk regions of Ukraine, Cuba, North Korea, Iran, and Syria, as such list may be amended). Void outside the Territory and where prohibited by law. Participation subject to Official Rules. See Official Rules for entry requirements, judging criteria and full details. Winners are invited to attend & present at the Open Catalyst Challenge workshop at NeurIPS in December 2022. Winners are responsible for all costs to attend workshop/conference, including conference registration fee. Sponsors: Meta Platforms, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA and Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213.

Overview

The Open Catalyst Challenge invites participants to help in addressing the pressing challenges faced by the world due to energy scarcity and climate change. In this area, a critical problem is the discovery of new catalysts for driving efficient and carbon neutral means for energy storage and generation. A common approach in discovering high performance catalysts is using molecular simulations, where simpler surrogate descriptors are generated to correlate with experimental measurements of catalyst activity and selectivity. The task for this year’s challenge is to design new machine learning models to predict the outcome of catalyst simulations used to understand activity.

Specifically, each simulation models the interaction of a catalyst surface with adsorbates that are commonly seen in electrochemical reactions. The simulations correspond to local relaxations of the atomic positions to identify local minima. ML models are trained to predict the energies of the adsorbate-catalyst system at the local minima, known as the “relaxed state”, starting from a provided initial state. Accurate predictions of these interactions make it possible to estimate the catalyst's impact on the overall rate of a chemical reaction, a key factor in filtering potential electrocatalysis materials and addressing the world's energy needs.

Dates

NeurIPS session

The Open Catalyst Challenge session at NeurIPS 2021 is scheduled for Tuesday, December 07, 2021, 18:05 GMT (or 10:05 AM PST) onwards. Note that it is mandatory to register for NeurIPS to attend the session.

The session starts with a broadcast of the challenge overview and results at 18:05 GMT on the NeurIPS website here, followed by a breakout session on Zoom.

Schedule


18:05 - 18:25 GMT Challenge overview, results and analysis [video] Abhishek Das, Muhammed Shuaibi, Aini Palizhati
18:25 - 18:30 GMT Buffer for attendees to join Zoom [room link] -
18:30 - 18:50 GMT Runner-up talk + Q&A [video] Innopolis AI
18:50 - 19:10 GMT Winner talk + Q&A [video] Microsoft Research Asia (previously "MachineLearning")
19:10 - 19:30 GMT Discussion [video] Attendees, participants, organizers

Challenge results

The Open Catalyst Challenge received 30 submissions in total from 7 teams. All submissions were to the IS2RE-only track, and were evaluated on the test-challenge dataset split consisting of the following 4 subsplits:

  • test-like: similar to OC20 test and used to pick winners
  • rotated: used to evaluate rotational invariance
  • anomalous: structures with desorptions and dissociations
  • dense: dense sampling of adsorbate placements for evaluating recall of lowest energy site
Microsoft Research Asia won the challenge with an energy MAE of 0.5474 eV on test-like.

Energy MAE (eV)

Rank | Team | Test-like | Rotated | Anomalous | Dense
1 | Microsoft Research Asia (previously "MachineLearning") | 0.5474 | 0.5467 | 1.0312 | 0.6353
    Guolin Ke1, Chengxuan Ying2, Shuxin Zheng1, Di He1, Jiacheng You3, Yihan He4
    1=Microsoft Research Asia, 2=Dalian University of Technology, 3=Tsinghua University, 4=Carnegie Mellon University
2 | Innopolis AI | 0.6180 | 0.6170 | 1.1859 | 0.6839
    Rostislav Grigoriev, Ruslan Lukin, Adel Yarullin, Max Faleev
    Innopolis University, Russia
3 | Up and Atom | 0.6694 | 0.6707 | 1.1402 | 0.7398
    Adam Maximilian Wilson, Sam Walton Norwood, Peter Bjørn Jørgensen
    Technical University of Denmark
3 | DIVE @ TAMU | 0.6710 | 0.6712 | 1.1810 | 0.7398
    Limei Wang, Yuchao Lin, Xiner Li, Jingtun Zhang, Yi Liu, Shurui Gui, Keqiang Yan, Shuiwang Ji
    Texas A&M University
5 | RedSeaSeed | 0.6830 | 0.6811 | 1.1876 | 0.7435
    Hao Yu
    King Abdullah University of Science and Technology
6 | air | 0.6973 | 0.6999 | 1.3089 | 0.7594
    Alexey Korovin, Roman Eremin, Innokentiy Humonen, Artem Vasilyev, Vladimir Lazarev, Semeon Budennyy
    Artificial Intelligence Research Institute, Moscow
7 | EnergyNet | 0.7351 | 0.8842 | 1.3399 | 0.8033
    Mayank Baranwal1, Nawaf Alampara2, Ravi Bhadauria3
    1=Tata Consultancy Services Research and Innovation, Mumbai, Indian Institute of Technology, Bombay, 2=QpiVolta Technologies Pvt. Ltd., Bengaluru, India, 3=Etsy, New York, USA

IS2RE Task

The challenge will consist of one primary task -- Initial Structure to Relaxed Energy (IS2RE) [1]. Here the input consists of the atomic positions for an initial structure, and the goal is to predict the energy of the structure’s relaxed state.

Relaxed energies are a critical indicator in determining the reaction rate resulting from the use of a catalyst. By placing an adsorbate in multiple locations above a catalyst's surface and relaxing the structure, the binding site between the adsorbate and catalyst with the lowest relaxed energy can be determined. This lowest energy binding site is likely to be the one realized in practice under experimental conditions. The relaxed energy of the lowest energy binding site is also highly correlated with the reaction rates or selectivity of the chemical reaction. If successful, these techniques could be used to screen millions or even billions of potential catalyst materials for the chemical reactions involved in renewable energy storage and solar fuel generation.

Traditionally, relaxed energies are found by first performing structure relaxations through an iterative local optimization process that estimates the gradients (atomic forces) using Density Functional Theory (DFT), which are in turn used to update atom positions until convergence. This very computationally expensive process typically requires hundreds of DFT calculations to converge (hours or days of compute per relaxation) and forms the basis of most computational catalysis efforts.

One approach to the IS2RE task is using ML to approximate DFT relaxations, i.e. iteratively estimating atomic forces and updating atomic positions until a relaxed state is reached, and finally predicting the energy of that state. Evaluation of the IS2RE task on models built for approximating DFT relaxations will help determine whether this approach is sufficiently accurate and fast for practical applications. These models have the additional benefit of predicting the relaxed structure and accelerating future DFT calculations.

Alternatively, it may be possible to predict the relaxed energy directly, without estimating intermediate relaxation states, as many of the changes during a relaxation (say due to particular initial guess strategies) are systematic. These direct IS2RE approaches may lead to even greater improvements in computational efficiency.

As such, we place no restrictions on the ML approaches that participants may use to solve this task. We encourage submissions that are significantly more computationally efficient than DFT. For example, a standard relaxation using DFT takes 8-10 hours, while the desired ML approaches would bring this down to < 10 seconds per relaxation or < 1 second per direct prediction, at least a 1000x improvement!

To ensure consistent and fair evaluation, we use a public evaluation server hosted on EvalAI.

Dataset

The challenge will be conducted on the Open Catalyst Dataset (OC20). OC20 training and validation data are available here. A new test-challenge split has been released here specifically for this challenge. This is to ensure there is no overfitting on the test data through repeated submissions.

OC20 contains ~1.2M DFT relaxations. Due to its significant scale, the dataset required over 70M hours of compute to generate. Computation was performed on servers that Facebook has committed to supporting with 100% renewable energy since 2020. Each relaxation contains a series of structures as the atoms move from an initial structure to a relaxed structure obtained through a standard local minimizer built into the computational chemistry code. Structures contain the atoms corresponding to the adsorbate and catalyst. The initial structures are heuristically determined and the relaxed structures correspond to a state in which the atoms are at a local energy minimum. Since each step in the relaxation may be used for training / evaluation, the total number of simulation points is over 264M! The largest training split has ~134M simulation points. For each structure, DFT computed system energies, per-atom forces and per-atom positions are available as annotations.

The OC20 validation and test splits have several subsplits to help evaluate a model's performance on interpolative and extrapolative tasks. A model's interpolative ability is evaluated on samples from the same distribution as the training dataset (In Domain). Extrapolation is evaluated on two dimensions -- new adsorbates and new catalyst compositions. Subsplits are created by considering all combinations of potential extrapolations -- Out-of-Domain Adsorbate (OOD Adsorbate), OOD Catalyst, and OOD Both (both unseen adsorbate and unseen catalyst compositions).

Summary of all evaluation splits


Split | Size | Max submissions | Metrics | Results | Leaderboard
val | ~100k | - | Energy MAE, EwT | On EvalAI | -
test | ~100k | 10 | Energy MAE, EwT | On EvalAI | On opencatalystproject.org all year round
test-challenge | 120k | 10 | Energy MAE, EwT | Announced at NeurIPS '21 | Announced at NeurIPS '21

Evaluation

All submissions to the Open Catalyst Challenge will be made to the EvalAI server and evaluated on the following metrics:

  • Energy MAE: mean absolute error between the predicted relaxed energy and the DFT-computed ground-truth relaxed energy.
  • Energy within Threshold (EwT): the percentage of predicted relaxed energies within 0.02 eV of the DFT-computed ground-truth relaxed energy.

Challenge winners will be decided based on the Energy MAE metric.

We acknowledge that resource availability may become a bottleneck for some participants given the large size of the OC20 trajectory data (~134M training points). Thus, as detailed on the discussion forum, we will be recognizing 2 winners for the challenge based on:

  1. The best overall performance with no constraints on data used
  2. The best performance using ONLY the IS2RE dataset (size 460,328)
Participants will be prompted while making submissions to EvalAI to specify whether they used only the IS2RE dataset or not. Participants submitting to track (2) are prohibited from using any other datasets and/or pretrained S2EF models. Data augmentation is permitted as long as it comes ONLY from the IS2RE dataset. Pretraining in any form that uses S2EF data will not be allowed for track (2). Participants submitting to track (1) are free to use any dataset. Participants are free to participate in both tracks if they wish. We will be inviting the winners of each track for an oral presentation at NeurIPS 2021. If a single team wins both tracks, we will additionally invite the second place team of track 2 to present. Using DFT is prohibited for both tracks.

Submission Guidelines

To participate in the Open Catalyst Challenge, create a team on EvalAI and upload submissions to the "Predicting relaxed state energy from initial structure (IS2RE) -- Test-challenge" phase:

Submissions must be an `.npz` numpy binary file in the following format:

          {
            "challenge_ids": array(['0', '1', ...]),
            "challenge_energy": array([-3.63920, -1.08237, 12.92103, ...,])
          }
                      
where both `challenge_ids` and `challenge_energy` are arrays of size 120000.

          >>> data["challenge_ids"].dtype, data["challenge_ids"].shape
          (dtype('<U6'), (120000,))
          >>> data["challenge_energy"].dtype, data["challenge_energy"].shape
          (dtype('float64'), (120000,))
                      
A dummy submission file is available here. Please use this for verification / debugging purposes. We also provide helper code and docs for training models and preparing EvalAI submission files for the IS2RE task here.
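
Before uploading, a submission can be sanity-checked locally against the format described above. The following is a hedged sketch: `check_submission` is a hypothetical helper, not part of the OCP codebase, and the checks for duplicate ids and non-finite energies are extra assumptions rather than documented server rules.

```python
import numpy as np


def check_submission(data, expected_size):
    """Return a list of problems found in a loaded submission mapping."""
    problems = []
    # Expected dtype kinds: "U" for unicode ids, "f" for floating-point energies.
    for key, kind in (("challenge_ids", "U"), ("challenge_energy", "f")):
        if key not in data:
            problems.append(f"missing key: {key}")
            continue
        arr = np.asarray(data[key])
        if arr.dtype.kind != kind:
            problems.append(f"{key}: unexpected dtype {arr.dtype}")
        if arr.shape != (expected_size,):
            problems.append(f"{key}: unexpected shape {arr.shape}")
    if not problems:
        # Extra checks that are assumptions, not documented requirements.
        if len(np.unique(data["challenge_ids"])) != expected_size:
            problems.append("challenge_ids: duplicate ids")
        if not np.all(np.isfinite(data["challenge_energy"])):
            problems.append("challenge_energy: non-finite values")
    return problems


# Placeholder data: a well-formed submission and a truncated one.
good = {
    "challenge_ids": np.array([str(i) for i in range(120000)]),
    "challenge_energy": np.zeros(120000),
}
bad = {"challenge_ids": good["challenge_ids"][:10], "challenge_energy": np.zeros(10)}

print(check_submission(good, 120000))  # no problems reported
print(check_submission(bad, 120000))   # one shape problem per array
```

In practice `data` would be the mapping obtained by loading the `.npz` file and copying its two arrays into a dict.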

Please reach out to us on the discussion forum or via email [1, 2] if you have any questions or concerns regarding the challenge.

The OC20 dataset paper has more details on how the OC20 dataset was created, the various tasks and evaluation metrics, and performance of baseline ML algorithms. The paper is accompanied by our constantly-evolving OCP codebase that provides implementations of several state-of-the-art graph neural network algorithms.
Consider citing the following if you use it in your work:

@article{ocp_dataset,
    author = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
    title = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
    journal = {ACS Catalysis},
    year = {2021},
    doi = {10.1021/acscatal.0c04525},
}
                  

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. Submission Period begins September 20, 2021 at 12:00:00 am UTC and ends October 6, 2021 at 11:59:59 pm UTC. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes Cuba, Crimea, North Korea, Iran, Syria, Venezuela and any other jurisdiction or area designated by the United States Treasury's Office of Foreign Assets Control. Void outside the Territory and where prohibited by law. Participation subject to Official Rules. See Official Rules for entry requirements, judging criteria and full details. Winners are invited to attend & present at the virtual Open Catalyst Challenge session at NeurIPS on December 13 or 14, 2021. Winners are responsible for all costs to attend workshop/conference, including conference registration fee. Sponsor: Facebook, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.