jan-janssen / langsim Goto Github PK

Application of Large Language Models (LLM) for computational materials science

Home Page: https://mybinder.org/v2/gh/jan-janssen/LangSim/HEAD?labpath=notebooks/demonstration.ipynb

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 43.37% Python 56.63%

langsim's Introduction

Run Calculations with a Large Language Model

The computational chemistry and computational materials science community have both developed a great number of simulation tools. Still these tools typically require either rather cryptic input files or at least a fundamental programming experience in a language like Python to control them. Furthermore, many questions are only answered in the documentation, like:

Which physical units does the code use?
Which inputs match to which variables in the equations in the paper?
...

We address this challenge by developing a Large Language Model (LLM) extension which provides LLM agents to couple the LLM to scientific simulation codes and calculate physical properties from a natural language interface.

Installation

Via pip

While our package is not yet available on the Python Package Index, you can install it directly using:

pip install git+https://github.com/jan-janssen/LangSim.git

The pip package includes optional dependencies for the mace model and the jupyter integration.

Via conda

As the conda package is not yet available on Anaconda.org still you can clone the repository and install the dependencies directly from conda using the environment.yml file.

Prerequisites:

git
conda

git clone https://github.com/jan-janssen/LangSim
cd LangSim
conda env create -f environment.yml --name LangSim

Via Docker Container

We build a docker container on every commit to the main branch. You can pull the container from the Docker Hub using:

docker run -p 8866:8866 ltalirz/langsim

Using the package

The package currently provides two interfaces, one for python / jupyter users to query the large language model directly from a python environment and a second web based interface.

Contributors

langsim's People

Contributors

Stargazers

Watchers

Forkers

ltalirz rajeshkochi444 ritesh001 giuseppefisicaro roysh utkarshp1161 fraricci leo3701 gengyuedong

langsim's Issues

consider switching basic ipython magic to `jupyter-ai`

Motivation

Enabling experimentalists to run simulations is a great long-term goal, but requires substantial work in vetting workflows, documenting them, and then actual testing with a broad audience of experimentalists in order to weed out edge cases and earn trust.

A much lower-hanging fruit is to make computational scientists more productive (and dogfooding always makes a product better). For this use case, hardcoded agents are typically too restrictive - instead, we want langsim to be a Copilot that helps us write Python code for running simulations.

This is possible via the ipython magics (basic implementation for an immediately executable "code response" here), but once you get into the details (e.g. streaming responses rather than having to wait for a lengthy code completion) it starts to get tricky.

jupyter-ai follows the same route, but adds extra features, including streaming responses, as well as other nifty UI integrations (e.g. copilot in side bar).

Downsides

adds significant extra dependency
at the time of the hackathon, jupyter-ai was not supported on python 3.12

Steps

As far as I understand, in order to connect our setup to jupyter-ai, we need to create a "Jupyter AI module", for which they offer a cookiecutter.

Jupyter AI already uses langchain, so that should help with the integration, but when I briefly looked into this during the hackathon (see code) was at the level of langchain_core.language_models.llms.LLM rather than at the level of agents/agent executors that we use in langsim.

I was not able to quickly determine whether this poses a problem; perhaps @chiang-yuan can give some pointers on whether establishing this link is straightforward or whether coupling agents to jupyter-ai is difficult with the current implementation.

Different LLM models for different tasks

Reading stuff around, I think a better approach to deal with a query from the user
and the different tool is to use dedicated LLM agents, possibly with different models. In particular, one agent at higher level that digest the initial user query and decide what to do. And another (or multiple) agent which are dedicated to generate json input for the different tools and gather the json output.

I think this is what LLaMP is also doing (@chiang-yuan could comment on that).
Also, it seems a general approach suggested by Groq when presenting their new
model, specifically tuned for tool calling: https://wow.groq.com/introducing-llama-3-groq-tool-use-models/.

Implement memory

At the moment memory is only available when using the ipython magic %%chat.

I think some type of memory should be implemented at the level of the agent executor, by using e.g. ConversationBufferWindowMemory.

I can give it a try.

name for the project

Our agent should have a name - let's collect one name per comment and use thumbs up to vote

error in JS console when opening voila app

OpenAPI Keys

waiting for approval

Implement ReAct

Maybe start from here

Install LangSim locally

There are currently different ways to install LangSim locally:

git clone https://github.com/jan-janssen/LangSim
cd LangSim
conda env create -f environment.yml --name LangSim

Currently, the code requires pymace which is not included in the environment.yml, it can be installed using:

conda install -c conda-forge pymace

Afterwards you can either install the LangSim package locally:

pip install .

Alternatively, if you do not want to install the package you can either set the python path:

export PYTHONPATH="/path/to/LangSimRepository":$PYTHONPATH

Or you execute the tests directly in the repository folder, in that case you have to copy the demonstration.ipynb notebook from the notebooks folder:

mv notebooks/demonstration.ipynb .

Demo

I converted out youtube video to a GIF at least the demo part:
https://www.youtube.com/watch?v=7JFncD9WaIY

Similar concepts in literature

List of papers with similar concepts:

(general) Leaderboard of LLMs: https://gorilla.cs.berkeley.edu/leaderboard.html

add mechanism for tool discovery

Motivation

For this project to grow successfully, I think we want the following to be true:

Adding a new tool should be as easy as possible
Adding a new tool should not "bloat" the software (download size, startup time, ...)

Thoughts

For 1. for the moment we could simply have one file per tool and autodiscover all @tools that are defined in this folder.
If we feel necessary, we could later even discover tools provided by other packages via Python entry points, but centralizing development in the beginning has advantages (much easier to refactor globally) and can go a long way.

For 2. we need

some solution to keep dependencies in check. different solutions can work, e.g.
- use python package extras to allow users to select the dependencies they want (adds complications though)
- do not install tool-specific dependencies automatically but rather have the tool prompt the user to install them when it's called
- ...
Tools must import their specific dependencies only when they are invoked, not at the tool discovery stage.
This is already relevant now, since it affects load time of the ipython magic

Challenges of Large Language Models

Open Source
Unfortunately most llama based and other free models fail to work with the tools defined by langchain. It works for single functions but already the current complexity of langsim they struggle.

ChatGPT

ChatGPT 3.5 turbo can execute the calculation of one nobel metal but fails to execute a loop over all nobel metals. It seems like the abstract structure of a loop which is implicitly defined is not clear to ChatGPT 3.
ChatGPT 4 works fine with one state available in branch working_with_chatgpt4 but fails with the current main branch with an JSONDecodeError.
ChatGPT 4o works fine with the latest changes - in particular the state in branch working_with_chatgpt4o. The interesting part is when it comes to the implicit loop ChatGPT 4 executes the steps (generate the crystal structure, equilibrate it and calculate the bulk modulus) for one element out of the nobel metals and then moves to the next, in contrast Chat GPT 4o first executes the first step of generating the crystal structure for all elements, then equilibrates all resulting structures and finally calculates the bulk modulus for all equilibrated structures.

The behaviour seems to be somewhat reproducible so I wanted to quickly summarise it here.

Tools we can potentially use

add open-source license

I think I speak for all members of the team when I say that we want this tool to be used as widely as possible.

If we want to be serious about the license, we need to do the following

decide on viral (GPL, LGPL, ...) vs permissive (MIT, BSD, Apache, ...) license
check license of dependencies (direct dependencies should suffice, assuming their licenses are compatible with their dependencies)
decide on entity name for authors
get ok from all authors (e.g. here on thread)
add license to repo

Benchmark with ChatGPT

Benchmark Query:

Calculate the bulk modulus for aluminium with ASE using the EMT simulation code.

ChatGPT 3.5 (public):

from ase import Atoms
from ase.calculators.emt import EMT
from ase.optimize import BFGS
from ase.io import write
import numpy as np

# Step 1: Create aluminum structure
aluminum = Atoms('Al', [(0, 0, 0)], cell=(5, 5, 5))

# Step 2: Relax the structure
relaxation = BFGS(aluminum)
relaxation.run(fmax=0.01)

# Step 3: Apply hydrostatic pressure
pressures = np.linspace(0, 10, 11)  # Example pressures in GPa
energies = []

for pressure in pressures:
    volume_scaling_factor = (1 - 3 * pressure * 1e-9) ** (1 / 3)
    strained_structure = aluminum.copy()
    strained_structure.set_cell(aluminum.cell * volume_scaling_factor, scale_atoms=True)
    
    # Step 4: Calculate energy
    strained_structure.set_calculator(EMT())
    energy = strained_structure.get_potential_energy()
    energies.append(energy)

# Step 5: Fit equation of state
# (Not included here, but you can use scipy.optimize.curve_fit)

# Step 6: Determine bulk modulus
# (From the fitted equation of state)

ChatGPT 4.0 (paid):

from ase.build import bulk
from ase.calculators.emt import EMT
from ase.eos import calculate_eos

# Create a bulk aluminium structure (face-centered cubic)
al = bulk('Al', 'fcc', a=4.05)  # You can adjust 'a' to experimental lattice constant

# Set up the EMT calculator
al.calc = EMT()

# Calculate the equation of state
eos = calculate_eos(al)
v0, e0, B = eos.fit()

# Print the bulk modulus (B is in eV/Angstrom^3 which can be converted to GPa)
print(f"Bulk Modulus: {B / 1.60219e-19 * 1e21 / 1e9} GPa")

The answer for ChatGPT 4.0 is nearly correct, its just the unit conversion:

from ase.units import kJ
print(B / kJ * 1.0e24, 'GPa')

This is even given in the DocString of the corresponding function:

Signature: eos.fit(warn=True)
Docstring:
Calculate volume, energy, and bulk modulus.

Returns the optimal volume, the minimum energy, and the bulk
modulus.  Notice that the ASE units for the bulk modulus is
eV/Angstrom^3 - to get the value in GPa, do this::

  v0, e0, B = eos.fit()
  print(B / kJ * 1.0e24, 'GPa')

So ChatGPT is nearly right but still wrong and unless you have expert knowledge the answer is useless. I guess also for our final presentation, comparing to existing models is going to be important to highlight what we achieved.

Additional experimental references

Currently we have interfaces for:

mendeleev - with the get_element_property_mendeleev() function which is replaced by the get_chemical_information_from_mendeleev() function in #58
wikipedia - with the get_experimental_elastic_property_wikipedia() function
periodictable.com - with the get_chemical_information_from_wolframalpha() function as introduced in #60

Additional sources would be:

materialproject - as implemented by @chiang-yuan in https://github.com/chiang-yuan/llamp
CALPHAD - this requires the collection of so called tdb files, which partially commercial. Still a few open ones are available for example from opencalphad, pycalphad and nist .

Any suggestions for other experimental databases are very welcome.

test suite

experience shows that the docstrings describing the tools are rather brittle, even for a transition between gpt4 and gpt4o #48

we should implement a rudimentary test suite that can check, for a few exemplary use requests, whether (a) the right tools are used, and (b) whether the result is correct

see e.g. https://docs.smith.langchain.com/old/cookbook/testing-examples for inspiration

tool that interfaces with scheduler for long-running tasks

Motivation

The current implementation of tools works for fast toy calculations, but scientifically relevant calculations in chemistry and materials science often make tradeoffs between compute cost and accuracy that results in calculations that run several hours or days, even on powerful hardware.

In the current implementation, the notebook will be blocked for the time of the calculation the calculation will be killed once the ipython kernel is stopped.

We would therefore like langsim to be able to submit computationally intensive tasks to remote scheduling systems, check the status of these calculations, and retrieve the result once they have completed.

Thoughts

I think this is a tough one to make user friendly, particularly if you think about the original target audience: an experimentalist wanting to run calculations.
Do we ask them to install slurm on their local work station (they may be running Windows)? Do they need to apply for computational time on a HPC resource (and then figure out how to run the simulation code they need there)? I think with such asks we already lose a large fraction of the target audience.

The only feasible way I see for letting someone without HPC expertise run on HPC is either

A) Their computational colleagues configure the setup for them
B) They get some cloud account and let langsim connect to a dedicated cloud service for running DFT calculations with a well-defined API [1]

That said, adding the basic functionality for interacting with schedulers is certainly feasible, if the user can provide all necessary information (credentials for connecting, scheduler type, partitions you have access to, where codes are located, etc.).

There is some light at the end of the tunnel, as also academic HPC centers are moving from giving users SSH access to REST APIs (example), but this process is still underway and to my knowledge no clear standard has emerged.

Also, none of the APIs I've seen so far offer a mechanism for discovering the simulation codes that are installed and how to module load them... perhaps we could draft a specification for how we would like such an API to look like and then approach HPC centers with this idea.

[1] Or, if that is not available, some HPC cluster template with pre-installed software in standard locations (e.g. there are interesting efforts like the CernVMFS build cache from Compute Canada or also the spack build caches), but that already adds a lot of complexity.

Brainstorming

Transfer convergence checks from one materials property to another, e.g. the dependence on the energy cutoff and kpoint mesh.
Collaboration with experiment
Teaching - give students easy access to simulation.
Collect expert knowledge for specific calculation e.g. VASP input files.
Write a custodian for quantum espresso - maybe analyse a heterogeneous database like Nomad.

Debugging in LangChain

To see all the messages langchain generates when communicating with the different LLM models, you can add the following lines: https://python.langchain.com/v0.1/docs/guides/development/debugging/

from langchain.globals import set_debug
set_debug(True)

Mitigating unreliable docstring though Graphs?

Problem Description

As detailed in Challenges of Large Language Models #48, there is challenge in reproducing results with complex workflows.

Explore

Representing tools (functions) as nodes in a graph, with edges indicating the next logical steps in the workflow for the LLM.

Example:
Query: Calculate the bulk modulus of Aluminum (Al).
Probable Response: The LLM identifies four potential workflows to compute the bulk modulus. By selecting the shortest workflow, it calculates and provides the bulk modulus of Aluminum.

jan-janssen / langsim Goto Github PK

langsim's Introduction

Run Calculations with a Large Language Model

Installation

Via pip

Via conda

Via Docker Container

Using the package

Contributors

langsim's People

Contributors

Stargazers

Watchers

Forkers

langsim's Issues

Motivation

Downsides

Steps

Motivation

Thoughts

Motivation

Thoughts

Problem Description

Explore

References

Recommend Projects

Recommend Topics

Recommend Org