
Comments (15)

CompRhys commented on June 1, 2024

All you need to do for the benchmark is predict E_f; we pre-calculate everything else. This is intentional, to reduce the surface area where people might make mistakes.

All the wbm data prep code is here: https://github.com/janosh/matbench-discovery/blob/main/data/wbm/fetch_process_wbm_dataset.py

That said, if you hit explicit errors, feel free to post the trace here (in a separate issue for clarity) or, better yet, in pymatgen.

from matbench-discovery.

CompRhys commented on June 1, 2024

In practice, I expect the most likely way people will use UIPs for discovery is via a prototype-based approach, at least until we have enough data in the field to train "foundation" models that are generally trusted for GA/RSS/SA methods. As such, the choice of UIP vs. Wrenformer is whether you look only at the prototype to make a prediction and then relax with DFT, or whether you perform a cheap UIP relaxation and then make a prediction before a final DFT relaxation if the candidate is predicted below the hull. As Janosh alluded to, this is about 100x slower for a UIP than for Wrenformer, but still fast, with the time per material measured in seconds, not minutes. If UIPs are the best according to our benchmark (the current situation), then even as the author of Wren, on which Janosh's Wrenformer variant is based, I would suggest using a UIP for this type of campaign. The point of translating the evaluation framework into a concrete benchmark is to answer that question of what people should use.
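The two workflows contrasted above can be sketched as a simple gate before the final DFT step. The predictor functions below are stubs with toy outputs, not real matbench-discovery or model APIs; a real campaign would call a trained Wrenformer model or a UIP (e.g. CHGNet) relaxation here.

```python
# Sketch of the two screening workflows discussed above.
# Both predictors are toy stubs, not real APIs.

def wrenformer_predict(prototype):
    """Coordinate-free route: predict E_above_hull from composition + prototype only."""
    return -0.05  # toy value, eV/atom

def uip_relax_and_predict(structure):
    """UIP route: cheap ML relaxation first (~100x slower, still seconds per
    material), then predict E_above_hull for the relaxed structure."""
    return -0.02  # toy value, eV/atom

def worth_dft_relaxation(candidate, use_uip=True):
    """Send a candidate on to the final DFT relaxation only if it is
    predicted to lie below the known convex hull."""
    e_hull = uip_relax_and_predict(candidate) if use_uip else wrenformer_predict(candidate)
    return e_hull < 0.0

print(worth_dft_relaxation("toy-candidate"))  # True
```

Either way, DFT only runs on the (small) predicted-stable subset; the routes differ in how much structural information the predictor gets to see.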


hongshuh commented on June 1, 2024

Thanks for your reply! It will be a great benchmark for materials discovery.


janosh commented on June 1, 2024

The thing is that training data already exists openly and is growing. So people don't mind the original cost to generate it. They just care about the best model you can train right now.


janosh commented on June 1, 2024

@hongshuh Like Rhys said, just send us your formation energy preds. However, in case you're still interested, here's the specific line using the PatchedPhaseDiagram:

e_above_hull = ppd_mp.get_e_above_hull(cse, allow_negative=True)


CompRhys commented on June 1, 2024

The intended protocol is to train on the corrected formation energy (or, if you prefer, you can train on the raw VASP energy, though then you will need to carefully apply the corrections and terminal-species reference energies yourself). Our analysis code can then calculate the predicted energy above the convex hull (EACH) from your predicted formation energy, from which the metrics of interest (DAF/F1/Precision/Recall) are determined. You're free to use any of the MP training data to train any model you like that can classify materials as lying below the known convex hull. This is part of the design of the evaluation framework: we want researchers to be able to compare and contrast the performance of different problem formulations, e.g. a coordinate-free approach such as Wren/Wrenformer against UIP approaches such as CHGNet/M3GNet.
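A minimal sketch of the idea behind that conversion: since the reference convex hull (the MP hull) is held fixed, any error in the predicted formation energy carries over one-to-one into the predicted hull distance. The toy numbers below are illustrative, not from the benchmark.

```python
# Predicted hull distance from a predicted formation energy, given a
# fixed reference convex hull. All energies in eV/atom.

def predicted_e_above_hull(e_form_pred, e_form_true, e_above_hull_true):
    """Shift the known hull distance by the formation-energy error."""
    return e_above_hull_true + (e_form_pred - e_form_true)

# A material 30 meV/atom below the hull, whose formation energy we
# over-predict by 50 meV/atom, gets (wrongly) classified as unstable:
pred = predicted_e_above_hull(e_form_pred=-1.95, e_form_true=-2.00,
                              e_above_hull_true=-0.03)
print(f"{pred:.3f} eV/atom")  # 0.020 eV/atom -> above the hull
```

This is also why the formation-energy error and the hull-distance error end up being the same quantity in this setup.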

The distributional difference is intended as it matches prospective applications. By definition, the training set cannot contain any material below the known convex hull of the training set. The aim of a discovery workflow is to identify structures that lie below the convex hull of known materials. Hence, our test set needs to contain materials below the training set convex hull in order to actually test prospective discovery.

Matbench is a great effort but it doesn't test the same extrapolatory insights as our proposed evaluation framework. Mixing the WBM and MP data sets and following a Matbench-esque CV setup would not provide any more insight into utility for materials discovery than might be obtained from the Matbench formation energy task that already exists. The reason we proposed the Matbench Discovery evaluation framework is that we believe it provides unique insights for a real application that are missing from more conventional IID testing setups.


hongshuh commented on June 1, 2024

I see, that makes sense. Is there any way to deal with the class imbalance? Predicting every material as "unstable" reaches an accuracy of 83%; only CHGNet is slightly above that.


hongshuh commented on June 1, 2024

Could you remind me which code is used to calculate the predicted energy above the convex hull from the prediction of formation energy?


janosh commented on June 1, 2024

@hongshuh For the most part, we don't need to compute actual hull distances. Have a look at https://matbench-discovery.materialsproject.org/si#formation-energy-mae--hull-distance-mae for an explanation.


CompRhys commented on June 1, 2024

There's no class imbalance if you frame the problem as a regression for the formation energies.

The DAF of your proposed baseline is 0. This is why we were careful to select metrics that actually test what we're interested in - machine learning models that accelerate discovery.
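A minimal sketch (not the matbench-discovery code) of the metrics in question, with stability defined as predicted E_above_hull below 0. DAF is precision divided by the prevalence of stable materials, i.e. how much better than random selection a model's picks are; the always-unstable baseline makes no positive predictions, so its DAF is 0.

```python
# Toy implementation of DAF / F1 / precision / recall for a
# stability-classification campaign. Not the benchmark's actual code.

def discovery_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    true_pos = sum(t < threshold and p < threshold
                   for t, p in zip(e_hull_true, e_hull_pred))
    pred_pos = sum(p < threshold for p in e_hull_pred)
    actual_pos = sum(t < threshold for t in e_hull_true)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    prevalence = actual_pos / len(e_hull_true)
    daf = precision / prevalence if prevalence else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "DAF": daf, "F1": f1}

# The "predict everything unstable" baseline: 0 positives, so DAF = F1 = 0,
# even though its plain accuracy on this toy set is high.
true_vals = [-0.05, 0.10, 0.20, -0.02, 0.30, 0.15]
print(discovery_metrics(true_vals, [1.0] * 6))
```

A perfect classifier on the same toy set would score DAF = 1/prevalence (here 3.0), which is why DAF rewards exactly the behavior a discovery campaign wants.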

The hull energies are calculated using the PatchedPhaseDiagram class we contributed to pymatgen.


janosh commented on June 1, 2024

@hongshuh Thanks for your interest and feedback. We certainly hope so. If you'd like to get involved, let us know. Since this is not a conventional paper, we can add whatever analysis we like whenever. All the model predictions are in this repo. If you'd like to dig in and suggest new metrics or visualizations that reveal useful insights into what ML can and can't do in the context of materials discovery, you're very welcome!


hongshuh commented on June 1, 2024

And one of the interesting points mentioned by @CompRhys: I would like to know how to compare structure-based or UIP models with coordinate-free ones. Even though CHGNet is the current SOTA model, it requires considerably more information than a coordinate-free model like Wrenformer. I am not an expert in DFT, so I am wondering about the cost of generating the relaxed structure data. If that cost is high, then Wrenformer would be preferred even though its DAF is lower. It would be beneficial to have a metric that represents this efficiency.


janosh commented on June 1, 2024

You mean efficiency to generate the training data? That'll be very hard to quantify, I'm afraid.

I think such analysis would be somewhat relevant but I expect most people would still select UIPs if the trade-off is 1000x faster than DFT and 4x the hit rate, or 100,000x faster and 2x the hit rate for a coordinate-free model like Wrenformer.
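A back-of-envelope version of that trade-off, using only the illustrative figures from this thread (speed-ups relative to DFT, hit rates as DAF multipliers, ~16.7% prevalence): once screening is cheap relative to DFT, the DFT relaxations of the selected candidates dominate the budget, so the higher-DAF model wins regardless of how much faster the screener itself is. The cost model and `select_frac` below are arbitrary assumptions for illustration.

```python
# Toy cost model: each candidate costs one screening call, and the top
# select_frac fraction then costs one DFT relaxation each. Speed-ups and
# DAF values are the rough numbers from the discussion, not measurements.

dft_cost = 1.0       # arbitrary unit: one DFT relaxation
prevalence = 1 / 6   # ~16.7% of candidates stable (83% unstable)

def hits_per_compute(n_candidates, speedup, daf, select_frac=0.2):
    """Stable materials found per unit compute for one screening campaign."""
    screen_cost = n_candidates * dft_cost / speedup
    n_selected = n_candidates * select_frac
    dft_total = n_selected * dft_cost
    hits = n_selected * daf * prevalence  # precision = DAF x prevalence
    return hits / (screen_cost + dft_total)

uip = hits_per_compute(1_000_000, speedup=1_000, daf=4)
wren = hits_per_compute(1_000_000, speedup=100_000, daf=2)
print(round(uip / wren, 2))  # ~1.99: UIP finds ~2x more per unit compute
```

Under these assumptions the 100x screening-speed advantage of the coordinate-free model barely matters, because screening is a rounding error next to the DFT follow-up either way.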


hongshuh commented on June 1, 2024

I mean for discovering potential materials: we can easily generate millions of formulas and space groups (those are just strings) in minutes. How about the UIPs?


hongshuh commented on June 1, 2024

I am currently trying to calculate the energy above the hull using my predicted formation energies, and I am facing challenges using the PatchedPhaseDiagram class. Are there any instructions or example code? Would it be possible to add this as a function?

