
AI for Science: Going Beyond GenAI and LLMs

Written by
March 20, 2024

Artificial Intelligence (AI) has achieved significant success across various domains. This is mainly driven by progress in efficient and scalable machine learning algorithms, increased computational power, and the vast availability of data. For instance, AI has streamlined software development through automated code generation and bug detection, transformed content creation with tools capable of generating written, visual, and audio content, and revolutionized information search by delivering highly relevant search results.

The intersection of AI and the materials and chemical formulations industry is ripe with potential, promising revolutionary advancements in the way we discover, develop, and deploy new materials and chemical compounds. However, the unique challenges inherent in these fields require a nuanced approach, blending the broad capabilities of newer general-purpose AI with the focused precision of domain-specific models.

Materials and Chemical Formulations Industry Challenges

The materials and chemical formulations industries are essential in developing new products that fuel innovation across sectors such as energy, household products, packaging, and electronics. However, they face several challenges, including the need to accelerate discovery processes, optimize material properties for specific applications, and reduce development costs and timelines. There is also a new and urgent focus on improving sustainability and minimizing toxicity in these products. Traditional experimental approaches to these challenges are almost always time-consuming, costly, and resource-intensive. AI and ML technologies have the potential to address these challenges and revolutionize the industry by predicting material behaviors, optimizing chemical formulations, and simulating experiments, saving years of research and development costs.

Success and Limitations of Large Generative Models

Large Generative Models, like GPT and its successors, have demonstrated remarkable capabilities in understanding and generating human-like text, offering potential applications in content creation, language translation, and even code generation. In essence, the main function of large generative models is to capture distributions and statistics of the vast datasets they are trained on. They can then use this to generate new data. For example, language models learn to predict the next or masked tokens in a wide variety of text. In learning to perform this task on a large variety of text, the models build an internal representation of the world. This allows them to then be used in various downstream applications through prompt engineering and in-context learning.
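
To make the statistics-capturing idea concrete, here is a deliberately tiny sketch, assuming nothing beyond the Python standard library: a bigram "language model" that simply counts which token follows which in a toy corpus and predicts the most frequent successor. Real LLMs learn vastly richer representations, but the objective, modeling the distribution of next tokens, is the same in spirit.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model trains on a vast variety of text.
corpus = "the cell charges the cell discharges the cell charges".split()

# Learn the statistics: count which token follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token under the learned counts."""
    counts = following[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # "cell": the only token ever seen after "the"
print(predict_next("cell"))  # "charges": seen twice vs. "discharges" once
```

Sampling from these learned distributions, rather than always taking the top prediction, is what lets such models generate new data instead of parroting the training text.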

It is tempting to generalize this capability and apply it to more complex domains such as materials science and chemistry. We find that these models are good at summarization and fairly capable at data retrieval, but they struggle with precise quantitative tasks. See this report released by Microsoft Research AI4Science and Microsoft Azure Quantum for more details. The main challenges these models currently face are:

  • Data Complexity and Specificity: Chemical and material science data is highly complex, often involving intricate structures, properties, and behaviors that general models are not trained to understand deeply. For instance, predicting material properties such as tensile strength or thermal conductivity from molecular structure involves intricate relationships that general AI models are not specifically trained to understand.
  • Domain-Specific Knowledge: While large models are trained on a wide array of data, they typically lack the depth of training in specialized areas of chemistry and materials science. This limits their effectiveness in generating insights that require deep domain knowledge.
  • Lack of Explainability: These models often provide unclear or inconsistent reasoning for their outputs, making it difficult for researchers to understand the rationale behind certain suggestions or hypotheses.

These limitations are consequences of the generality of these models, and it is unlikely that scaling them up alone will resolve them.

Domain-Specific Models: Bridging the Gap with Science-Based AI

One approach to solving the above challenges involves training or fine-tuning models on extensive chemistry-specific datasets. Yet the intricate nature of chemical interactions and the varying requirements of different applications make training a single general model a challenge. For example, accurately modeling the performance of liquid electrolytes in batteries requires a different focus than predicting the durability of coatings. To overcome this, it is necessary to develop models specialized for specific domains within materials and chemistry. Further, by integrating scientific knowledge and principles into these models, they can become more grounded and reliable. This enables:

  • Targeted Training: By focusing on narrower datasets, these models can develop a deeper understanding of specific chemical interactions and material properties. For example, an AI model trained exclusively on data from organic photovoltaic materials can more accurately predict which new compounds might improve solar cell efficiency.
  • Incorporation of Scientific Principles: Integrating fundamental chemical and physical laws into the AI models can enhance their predictive accuracy and relevance to real-world applications. For example, incorporating equations governing Li-ion transport can help a model better predict the effectiveness of an electrolyte.
  • Collaboration Between AI and Domain Experts: The development of these models requires close collaboration between AI experts and domain specialists, ensuring that the models are both scientifically accurate and practically useful.
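
As an illustration of the second point, here is a minimal, framework-free sketch of how a physical constraint can be folded into a training objective. The toy constraint (ionic conductivity should not fall as temperature rises) and all function names are illustrative assumptions, not a description of any production system.

```python
def data_loss(pred, target):
    """Ordinary squared error against measured data."""
    return (pred - target) ** 2

def physics_penalty(pred_cold, pred_hot):
    """Penalize predictions that violate a known physical trend:
    here, conductivity predicted at a lower temperature should not
    exceed that at a higher temperature (toy constraint)."""
    violation = max(0.0, pred_cold - pred_hot)
    return violation ** 2

def total_loss(pred, target, pred_cold, pred_hot, weight=10.0):
    """Science-based AI in one line: fit the data AND respect the physics."""
    return data_loss(pred, target) + weight * physics_penalty(pred_cold, pred_hot)

# A physically consistent prediction incurs only the small data term...
print(total_loss(1.1, 1.0, 0.5, 0.9))  # ~0.01
# ...while a physics-violating one is penalized even when it fits the data.
print(total_loss(1.0, 1.0, 0.9, 0.5))  # ~1.6
```

Minimizing a loss of this shape steers the model toward predictions that are both accurate on the data and consistent with the governing physics, which is the essence of the science-based approach described above.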

These models are then not just technically proficient but also deeply aligned with the specific needs of the materials and chemical formulations industry.

Future Directions: Best of Both Worlds

Rapid advances in machine learning open up even more exciting possibilities in materials and chemistry when large general models are integrated with smaller, domain-specific models. This combines the wide-reaching and creative capabilities of large models with the detailed focus of smaller specialized models.

Many approaches to this integration have been proposed and successfully implemented in specific domains. For example, the Mixture of Experts paradigm shows how various specialized models can be orchestrated together. Another recent approach, called CALM, combines a specialized model with a more general model through a trainable cross-attention block. Success in solving complex geometry problems has also been seen with AlphaGeometry, which combines a large generative model with a deductive reasoning engine.

These are only initial successes. Challenges such as creating frameworks that effectively merge the strengths of both model types, finding compatible data, and preserving the interpretability of combined model outputs will need to be addressed for each specific domain. Despite these challenges, the opportunities to accelerate the pace of innovation and development are vast, including developing more sustainable materials and energy-efficient production processes.

At NobleAI, we expand on this approach by generalizing the method to select, adapt, and use any type of model appropriate to a problem. We find this approach achieves the best results across a wide range of chemical and materials applications. These advances will also drive innovation in next-generation technologies across sectors, from energy storage solutions to advanced materials for construction and packaging, and will redefine how the materials and chemical formulations industry approaches product development.
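
The Mixture of Experts paradigm mentioned above can be sketched in a few lines: a gate scores each specialized "expert" for a given input, and their outputs are blended with softmax weights. The two toy experts and the fixed gate scores here are illustrative assumptions only, not any production system.

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def electrolyte_expert(x):
    return 2.0 * x          # toy specialized model no. 1

def coatings_expert(x):
    return x + 5.0          # toy specialized model no. 2

def mixture(x, gate_scores):
    """Blend expert outputs using gate-assigned softmax weights."""
    weights = softmax(gate_scores)
    outputs = [electrolyte_expert(x), coatings_expert(x)]
    return sum(w * o for w, o in zip(weights, outputs))

# A gate strongly favoring the first expert gives a result close to 2*x.
print(round(mixture(3.0, [4.0, 0.0]), 2))  # ~6.04, near electrolyte_expert(3.0)
```

In a real system the gate is itself a learned model that routes each query to the most relevant specialists, which is what lets a large orchestrating model exploit the precision of many small domain-specific ones.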
