
Algorithm for representing three-way interaction for neural networks with crystal graphs

Within this framework, the first step is to represent the crystal structure as a crystal graph, in which nodes encode atomic information and edges encode the bonding interactions between atoms. A graph convolutional neural network is then constructed on top of this graph representation, and predictions of the target properties are obtained through training. In this work, data calculated with density functional theory (DFT) serve as the training set, allowing the graph neural network to learn an optimal representation of the material structure and make accurate predictions of the target properties.
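As a rough illustration of this three-stage workflow (graph representation, graph convolutional network, property prediction), a minimal runnable sketch is given below. All names, dimensions, and the simple mixing layer are our own illustrative placeholders in PyTorch, not the authors' implementation.

```python
# Minimal sketch of the workflow: crystal graph -> graph network -> target property.
# Names and dimensions are illustrative placeholders, not the authors' code.
import torch
import torch.nn as nn

class CrystalGraphModel(nn.Module):
    """Toy crystal-graph model: embed node features, mix in neighbour/edge
    information with one convolution-like layer, pool, and predict a scalar."""
    def __init__(self, atom_dim=92, edge_dim=80, hidden=64):
        super().__init__()
        self.embed = nn.Linear(atom_dim, hidden)
        self.mix = nn.Linear(2 * hidden + edge_dim, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, atom_feats, edge_feats, nbr_idx):
        h = self.embed(atom_feats)                        # node embeddings
        nbr = h[nbr_idx]                                  # neighbour embeddings
        ctr = h.unsqueeze(1).expand_as(nbr)               # central atom, repeated per neighbour
        h = h + torch.tanh(self.mix(torch.cat([ctr, nbr, edge_feats], -1))).sum(1)
        return self.out(h.mean(0))                        # crystal-level property

# Toy forward pass: 4 atoms, 3 neighbours each.
model = CrystalGraphModel()
atom_feats = torch.rand(4, 92)
edge_feats = torch.rand(4, 3, 80)
nbr_idx = torch.randint(0, 4, (4, 3))
print(model(atom_feats, edge_feats, nbr_idx))             # predicted target property
```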

Dataset

The dataset for this work comes from the well-known open-access materials database, the Materials Project (MP)31. All data are checked against three constraints: (i) GGA-PBE is used as the exchange-correlation functional, (ii) the kinetic-energy cutoff is set to 520 eV, and (iii) the accuracy mode is set to "Accurate" or "High". These restrictions make the data as reliable and comparable as possible. To ensure the generalizability of the model, the selected dataset includes inorganic crystals ranging from simple metals to complex compounds. The dataset contains a total of 23,744 records, covering 87 different elements, 7 crystal systems, and 230 space groups. In addition to the usual binary, ternary, and quaternary compounds, it includes complex compounds containing more than 5 elements, with the number of atoms per unit cell ranging from 1 to 200. The specific distribution of the data is shown in Fig. 3. The training, validation, and test sets were divided in a ratio of 8:1:1: about 19,000 records in the training set and about 2,400 in the test set. The test set covers the same range of chemical compositions as the training set (from unary to seven-element compounds) and additionally contains eight-element compounds, which do not appear in the training set. The test set also contains uncommon elements such as the noble gas helium. A large proportion of the compounds in the dataset are binary, ternary, and quaternary, along with more complex compounds. The scatter distributions of the data are shown in Fig. 3a and b, and the histogram in Fig. 3c. Quantile-quantile plots, which help assess whether a dataset is plausibly drawn from a theoretical distribution such as the normal or exponential, are shown in the supplementary figures.
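An 8:1:1 split of this kind can be produced as in the sketch below; the use of scikit-learn's `train_test_split`, the function name, and the random seed are our own illustration rather than the authors' script.

```python
# Sketch of an 8:1:1 train/validation/test split (illustrative only).
from sklearn.model_selection import train_test_split

def split_811(records, seed=42):
    """Split ~23,744 records into roughly 80% train, 10% validation, 10% test."""
    train, rest = train_test_split(records, test_size=0.2, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    return train, val, test
```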

Fig. 3

Structural distribution of the dataset: (a) training set, (b) test set, (c) distribution of the number of different element types.

Setting and optimizing parameters

To build the material graph neural network, we first use one-hot encoding to initialize the input features of the crystal structure and define the input data for the graph neural network model. The feature vector sizes are as follows: 19 bits for the element's group number, 7 bits for its period number, 10 bits for the covalent radius, 10 bits for electronegativity, 10 bits for the first ionization energy, 10 bits for electron affinity, 4 bits for the valence orbital type (s, p, d, f), 12 bits for atomic volume, and 10 bits for the number of valence electrons. In total, 92 bits encode the atomic input features. The edge feature vector is constructed differently, since it is based on encoding the interatomic distance. We set the maximum radius of the atomic local environment to 8 Å and the maximum number of nearest-neighbor atoms to 12, and the interatomic distance is encoded with radial basis functions (RBF), giving an 80-dimensional input edge feature vector. For continuous-valued atomic properties, the range of each property is divided evenly into 10 categories, and the feature is one-hot encoded according to the category into which the property value falls: the position corresponding to that category is set to 1 and the remaining positions to 0. Discrete values are encoded directly according to predefined categories. Table 1 presents the dimensions after encoding the atomic attributes.

Table 1. Coding of atomic attributes and their dimensions.
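The sketch below illustrates the two encoding schemes just described: a 10-bin one-hot encoding for continuous atomic properties and a Gaussian radial-basis expansion of interatomic distances up to 8 Å into 80 dimensions. The bin edges, the RBF width, and the example values are our own illustrative choices, not necessarily those used by the authors.

```python
import numpy as np

def one_hot_bin(value, vmin, vmax, n_bins=10):
    """One-hot encode a continuous property by splitting [vmin, vmax] into
    n_bins equal ranges; the bin containing `value` is set to 1."""
    idx = int(np.clip((value - vmin) / (vmax - vmin) * n_bins, 0, n_bins - 1))
    vec = np.zeros(n_bins)
    vec[idx] = 1.0
    return vec

def rbf_expand(distance, r_max=8.0, n_centers=80, gamma=None):
    """Expand an interatomic distance (< r_max = 8 Å) into an 80-dimensional
    radial-basis-function vector, i.e. the edge feature described above."""
    centers = np.linspace(0.0, r_max, n_centers)
    if gamma is None:
        gamma = 1.0 / (centers[1] - centers[0]) ** 2   # width tied to centre spacing (assumption)
    return np.exp(-gamma * (distance - centers) ** 2)

# Example: encode electronegativity into 10 bits and a 2.35 Å bond into 80 bits.
en_bits = one_hot_bin(1.90, vmin=0.7, vmax=4.0)        # silicon's Pauling electronegativity
edge_bits = rbf_expand(2.35)                            # a typical Si-Si bond length
```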

When training neural network models, the choice of batch size plays a significant role in the results: it directly affects training speed, memory usage, model convergence, and the stability of parameter updates. Choosing an appropriate batch size is therefore critical. To achieve better training results, we tested several batch sizes; the results are shown in Fig. 4, and more detailed information is given in the supplementary tables.

Fig. 4

Effect of batch size on model training time and accuracy. The line graph shows how training time and test results change with batch size, visualizing the impact of different batch sizes on learning.

From Fig. 4, it is evident that, on the same test set, the mean absolute error (MAE) gradually decreases as the number of epochs increases. Batch size has a significant impact on both training time and accuracy. For example, at 300 epochs the MAE is lowest at a batch size of 64, while a batch size of 128 gives a shorter training time. At 400 and 500 epochs, the best trade-off between training time and accuracy is achieved with a batch size of 128. Therefore, a batch size of 128 appears to be the most appropriate choice.
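A batch-size sweep of the kind reported in Fig. 4 can be organized as sketched below; `train_and_evaluate` is a hypothetical placeholder for the training routine, and the candidate batch sizes are illustrative.

```python
import time

def sweep_batch_sizes(train_and_evaluate, batch_sizes=(32, 64, 128, 256), epochs=300):
    """Record training time and test MAE for each candidate batch size (sketch).
    `train_and_evaluate(batch_size, epochs) -> test_mae` is a placeholder."""
    results = {}
    for bs in batch_sizes:
        start = time.time()
        mae = train_and_evaluate(batch_size=bs, epochs=epochs)
        results[bs] = {"mae_eV_per_atom": mae, "train_time_s": time.time() - start}
    return results
```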

Additional training parameters include the use of the AdamW optimizer32, an improvement over Adam with L2 regularization. The learning rate is set to 0.001, and the Tanh activation function is used, which gives the model better learning and representation capabilities. In terms of the neural network size settings, the maximum radius of the local environment of an atom is set to 8 Å, and the maximum number of nearest-neighbor atoms interacting with the central atom is limited to 12. The input atom embedding vectors are 92-dimensional, as described previously, and the input edge embedding vectors span 80 dimensions, reflecting a combined representation of interatomic distances and interactions. To account for the variability of the materials dataset and minimize the impact of outliers on the results, the MAE is used as the cost function, and the coefficient of determination \(R^{2}\) is used to evaluate the goodness of fit of the model.
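This training configuration (AdamW, learning rate 0.001, Tanh activations, MAE as the cost function) can be set up roughly as follows in PyTorch; the stand-in model and function names are our own placeholders.

```python
import torch
import torch.nn as nn

# Training setup sketch: AdamW optimizer, learning rate 0.001, MAE (L1) cost.
model = nn.Sequential(nn.Linear(92, 64), nn.Tanh(), nn.Linear(64, 1))  # stand-in model with Tanh
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
loss_fn = nn.L1Loss()   # mean absolute error, used here as the cost function

def training_step(atom_feats, target):
    """One optimization step on a single (features, target) pair."""
    pred = model(atom_feats)
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```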

In the model, graph convolutional layers capture interactions between the central atom and its immediate neighbors, allowing information to propagate through the convolution of atom feature vectors and edge feature vectors and resulting in a more accurate characterization of the local environment of the central atom. In this experiment, interactions between a central atom and its 12 nearest neighbors are characterized and propagated through multiple convolution layers throughout the graph. The number of convolution layers directly affects the extent of this propagation, allowing the central atom to exchange information with atoms at larger distances. Notably, in the crystal structure, closer atoms interact more strongly with the central atom, while more distant atoms interact more weakly. To determine the optimal number of convolutional layers, a series of tests was run under the following conditions: 300 epochs, a test set of 3000 structures, and the hyperparameters fixed as in the previous settings. The results are presented in Table 2.

Table 2 shows that with only two convolutional layers the training time is shorter but the model performance is noticeably worse. Conversely, as the number of convolutional layers increases beyond 3, the training time grows while the gain in model performance diminishes. Thus, the most favorable results are achieved with three convolutional layers.

Table 2. Effect of the number of convolutional layers on the prediction results of the formation energy model.
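Returning to the graph convolution step described above, the sketch below shows one plausible CGCNN-style convolution in which each central atom is updated from messages built from its (up to 12) nearest neighbors and the connecting edge features. The gated-update form, dimensions, and names are illustrative assumptions, not the authors' exact layer.

```python
import torch
import torch.nn as nn

class CrystalConvLayer(nn.Module):
    """Sketch of a crystal-graph convolution: each central atom aggregates
    gated messages built from (central atom, neighbour atom, edge) features."""
    def __init__(self, atom_dim=64, edge_dim=80):
        super().__init__()
        self.gate = nn.Linear(2 * atom_dim + edge_dim, atom_dim)    # which messages to keep
        self.core = nn.Linear(2 * atom_dim + edge_dim, atom_dim)    # message content

    def forward(self, atom_feats, edge_feats, nbr_idx):
        # atom_feats: (n_atoms, atom_dim); edge_feats: (n_atoms, n_nbr, edge_dim)
        # nbr_idx:    (n_atoms, n_nbr) indices of up to 12 nearest neighbours
        nbr = atom_feats[nbr_idx]                                    # neighbour features
        ctr = atom_feats.unsqueeze(1).expand_as(nbr)                 # central atom, repeated
        z = torch.cat([ctr, nbr, edge_feats], dim=-1)
        msg = torch.sigmoid(self.gate(z)) * torch.tanh(self.core(z)) # gated messages
        return atom_feats + msg.sum(dim=1)                           # residual update
```

Stacking three such layers, as favored by Table 2, lets information from atoms several coordination shells away reach the central atom while keeping training time moderate.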

Formation energy prediction

In this experiment, the previously optimized set of parameters was used to train the model and predict the formation energies of inorganic compounds. The results, in terms of MAE and \(R^{2}\), are presented in Fig. 5. As shown in Fig. 5, the trained model achieved a mean absolute error (MAE) of 0.021 eV/atom with a coefficient of determination (\(R^{2}\)) of 0.9981 on the training set. On the test set, the model showed an MAE of 0.050 eV/atom with an \(R^{2}\) of 0.9916.
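The reported metrics can be computed from the predicted and DFT reference values with the standard definitions of MAE and \(R^{2}\), as in the small helper sketched below (our own illustration).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return mean absolute error (eV/atom) and coefficient of determination R^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return mae, 1.0 - ss_res / ss_tot
```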

Fig. 5

Model performance in predicting the formation energy of crystalline compounds: (a) training set, (b) test set.