Data-driven materials design

Our society relies on innovative materials to boost its progress. This link has existed since the dawn of civilization: some of the earliest periods of human history are named after the dominant material in use (Stone Age, Iron Age …). As science pushes the boundaries of knowledge, in materials research this shows up as an urge to go smaller. Today we live in a ‘Nanomaterial Age’, with science reaching toward designing materials one atom at a time, as in single-atom catalysts. Since the last decades of the 20th century, this has been strongly supported by the development of computational methods that implement quantum mechanics (to some approximation).

Recent years, and in particular the rise of deep learning, promise yet another revolution in materials informatics.

“Data‐driven science is heralded as a new paradigm in materials science. In this field, data is the new resource, and knowledge is extracted from materials datasets that are too big or complex for traditional human reasoning—typically with the intent to discover new or improved materials or materials phenomena.”

L. Himanen et al., Advanced Science 6 (2019) 1900808

Valuable information on structure-property relations, and patterns useful for knowledge-driven materials design, may lie dormant in the large amounts of data (both experimental and theory-based) produced by individual research projects. The first step in extracting that information with deep learning is a good dataset. In this story, let’s go to the foundation of such an effort and list selected major open-source databases for materials science. The focus is on open-source databases of inorganic crystalline materials that are easy to use.

1. Materials Project
A large database of over 124 000 inorganic compounds, 35 000 molecules, and over 500 000 porous materials. It contains a variety of data computed with state-of-the-art electronic structure methods and is accessible through a REST API. In addition, various tools are available, such as the Materials Explorer, the Battery Explorer, and a predictor of interface reactions between solids.
2. AFLOW
AFLOW provides a framework for high-throughput computational materials design by automating calculations with ab initio electronic structure codes such as Quantum ESPRESSO. It has a globally available database with over 3 million material compounds and 150 times as many calculated properties, e.g. Bader charges and elastic properties. The Aflowlib REST API enables users to search and download results from the Aflowlib.org consortium. It also offers a few machine-learning-based predictors, e.g. for electronic and thermomechanical properties, or for the vibrational free energies and entropies of a crystal.
3. Crystallography Open Database
An open-access collection of experimental crystal structure data for organic, inorganic, and metal-organic compounds and minerals. With over 400 000 entries, it also features a REST API for easy querying by formula, constituent elements, or lattice parameters.
4. Open Quantum Materials Database
A database containing DFT-calculated thermodynamic and structural properties of over 600 000 materials. A REST API is also available.
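To give a feel for what querying such a REST API looks like, here is a minimal sketch in Python that only builds the request URL and leaves the actual network call in a separate helper. The base URL, the `fields`, `filter`, and `limit` parameters, and the field names `name`, `entry_id`, `delta_e`, and `stability` are assumptions based on the public OQMD API documentation; check them against the current docs before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed OQMD REST endpoint; verify against the current OQMD documentation.
OQMD_BASE_URL = "http://oqmd.org/oqmdapi/formationenergy"


def build_oqmd_url(filter_expr, fields=("name", "entry_id", "delta_e"), limit=10):
    """Build a query URL; the parameter names are assumed, not guaranteed."""
    params = {
        "fields": ",".join(fields),  # which properties to return
        "filter": filter_expr,       # assumed filter syntax, e.g. "stability<0.05"
        "limit": limit,              # maximum number of entries to return
    }
    return f"{OQMD_BASE_URL}?{urlencode(params)}"


def fetch_oqmd(url, timeout=30):
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical query: compounds containing Al and O that are close to stable.
url = build_oqmd_url("element_set=Al,O AND stability<0.05")
print(url)
```

Calling `fetch_oqmd(url)` would then send the request. Keeping URL construction separate from the network call makes it easy to inspect the query before sending it.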
5. Catalysis Hub
A web platform for sharing data and software for computational catalysis research, which includes the Surface Reactions database. It contains over 100 000 reaction energies and barriers from density functional theory (DFT) calculations on surface systems. Information on the optimized structures and calculation details is included as well, along with many features such as the search for specific reaction energies, transition states, exploration of activity maps, machine learning models… The database can be queried via a GraphQL API, which can also be accessed directly by using curl.
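As an illustration of the GraphQL access mentioned above, the sketch below prepares a query as a plain HTTP POST request using only the Python standard library. The endpoint URL, the `reactions` query with its `first`, `reactants`, and `products` arguments, and the node fields `Equation`, `reactionEnergy`, and `activationEnergy` are assumptions based on the public Catalysis Hub documentation and may differ from the current schema. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumed GraphQL endpoint; verify against the Catalysis Hub documentation.
CATHUB_GRAPHQL_URL = "https://api.catalysis-hub.org/graphql"


def build_reactions_query(reactants, products, first=5):
    """Compose a GraphQL query string; the field names are assumed."""
    return (
        "{ reactions(first: %d, reactants: %s, products: %s) { "
        "edges { node { Equation reactionEnergy activationEnergy } } } }"
        % (first, json.dumps(reactants), json.dumps(products))
    )


def build_request(query):
    """Wrap the query in a JSON POST request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        CATHUB_GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, timeout=30):
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)


# Hypothetical example: reactions that take CO to COH.
query = build_reactions_query("CO", "COH")
request = build_request(query)
print(query)
```

The same JSON body, `{"query": "..."}`, is what one would pass to curl with the `-d` option, so the printed query can be reused on the command line.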
6. NOMAD repository
A place to publish your DFT data, following the FAIR principles. It contains the input and output files from more than 100 million high-quality calculations. For registered users, advanced search and data analytics tools are available.