Comments (10)
I was able to get the fine-tuning tutorial working with the changes from these two PRs: Open-Catalyst-Project/tutorial#4 and #630. You can try these branches to see if they solve the problem.
from ocp.
I think you are correct that this was an oversight when converting to the new trainer/configs. The new location for the dataset format makes more sense but is not backwards compatible. You should be able to get around this error by adding `"format": "ase_db"` to the dataset config.
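For reference, a dataset block with the format key added might look like the following (a minimal sketch; the `src` path and `a2g_args` values are placeholders for your own setup):

```yaml
dataset:
  train:
    format: ase_db      # the format now lives under each split in the new configs
    src: train.db       # placeholder path to your ASE database
    a2g_args:
      r_energy: true
      r_forces: true
```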
That did the trick regarding that part - thank you!
However, I now receive the following error in the output after the model loads:
2024-02-26 09:22:56 (INFO): Loading dataset: ase_db
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Loading model: gemnet_oc
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
warnings.warn(
2024-02-26 09:22:59 (INFO): Loaded GemNetOC with 38864438 parameters.
2024-02-26 09:22:59 (WARNING): Model gradient logging to tensorboard not yet supported.
2024-02-26 09:22:59 (WARNING): Using `weight_decay` from `optim` instead of `optim.optimizer_params`.Please update your config to use `optim.optimizer_params.weight_decay`.`optim.weight_decay` will soon be deprecated.
2024-02-26 09:23:00 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
warnings.warn(
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
warnings.warn(
Traceback (most recent call last):
File "C:\Users\gls5443\Desktop\ocp-main\main.py", line 92, in <module>
Runner()(config)
File "C:\Users\gls5443\Desktop\ocp-main\main.py", line 36, in __call__
self.task.run()
File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\tasks\task.py", line 51, in run
self.trainer.train(
File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\trainers\ocp_trainer.py", line 158, in train
loss = self._compute_loss(out, batch)
File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\trainers\ocp_trainer.py", line 317, in _compute_loss
target = batch[target_name]
File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\batch.py", line 175, in __getitem__
return super().__getitem__(idx)
File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\data.py", line 498, in __getitem__
return self._store[key]
File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\storage.py", line 111, in __getitem__
return self._mapping[key]
KeyError: 'energy'
Are there additional tags I need to supply to the config for it to parse the databases?
Thanks for flagging this. The new trainer has renamed the targets from `y` and `force` to `energy` and `forces`, respectively. The ASE datasets were not updated to reflect this. Until the datasets are updated, you should be able to get around this by using the following in the dataset config:
key_mapping:
  y: energy
  force: forces
Referencing these lines from the new example config
https://github.com/Open-Catalyst-Project/ocp/blob/394e9bad7780a05d3371f52550c1f92c47a61ce3/configs/ocp_example.yml#L20
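Conceptually, `key_mapping` just renames the keys the dataset emits to the names the trainer looks up. A plain-Python sketch of that renaming (a dict stands in for the data object; this is an illustration, not the actual dataset code):

```python
# A plain dict stands in for the sample the ASE dataset produces.
# The old target names are what the dataset still emits...
sample = {"y": -1.23, "force": [[0.1, 0.2, 0.3]]}

# ...while the new trainer looks the targets up under the renamed keys.
key_mapping = {"y": "energy", "force": "forces"}

# What key_mapping does conceptually: rename mapped keys, pass others through.
remapped = {key_mapping.get(k, k): v for k, v in sample.items()}

print(remapped["energy"])   # -1.23
print("y" in remapped)      # False
```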
Unfortunately it still throws that error. Just for reference, here is the currently used config.yml:
amp: true
checkpoint: ./gnoc_oc22_oc20_all_s2ef.pt
dataset:
  test:
    a2g_args:
      r_energy: false
      r_forces: false
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: test.db
  train:
    a2g_args:
      r_energy: true
      r_forces: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: train.db
  val:
    a2g_args:
      r_energy: true
      r_forces: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: val.db
eval_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gpus: 1
loss_fns:
- energy:
    coefficient: 1
    fn: mae
- forces:
    coefficient: 1
    fn: l2mae
model:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  enforce_max_neighbors_strictly: false
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  name: gemnet_oc
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
noddp: false
optim:
  batch_size: 10
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 10
  eval_every: 1
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  lr_initial: 0.0005
  max_epochs: 10
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
task:
  dataset: ase_db
trainer: forces
The `key_mapping` functionality has not hit main yet. It currently lives in this PR - #622.
@lbluque is there an update on what's blocking this?
In the meantime, you can check out that branch if you would like to use it before we land it on main.
Nothing is holding it up. This should be ready to merge, unless @mshuaibii or @emsunshine have any further suggestions.
Hello,
I now have a two-part problem: one part I was able to fix, though the fix may need to be committed in a future branch; the other I am unable to solve.
First, I think that line 1018 of ocpmodels/common/utils.py may need to change from
loss_fns=config.get("loss_functions", {}),
to
loss_fns=config.get("loss_fns", {}),
as that is the only way for the configs to be read properly without throwing NotImplementedError.
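The effect of that one-line mismatch can be seen with a plain dict: `config.get` with a key that is not present silently falls back to the default, so the trainer ends up with no loss functions at all (a minimal illustration, not the actual utils.py code):

```python
# A trimmed-down stand-in for the parsed YAML config, which uses the
# key "loss_fns" (as in the config posted above).
config = {"loss_fns": [{"energy": {"fn": "mae", "coefficient": 1}}]}

# Looking up the wrong key silently returns the empty default...
assert config.get("loss_functions", {}) == {}

# ...whereas the key actually present in the config returns the losses.
loss_fns = config.get("loss_fns", {})
print(len(loss_fns))  # 1
```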
While the new branch - #622 - which was recommended for using the ASE DBs does enable the first inference step, it quickly runs into the second error:
2024-02-26 14:25:48 (INFO): Loading dataset: ase_db
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Loading model: gemnet_oc
2024-02-26 14:25:51 (INFO): Loaded GemNetOC with 38864438 parameters.
2024-02-26 14:25:51 (WARNING): Model gradient logging to tensorboard not yet supported.
2024-02-26 14:25:51 (WARNING): Using `weight_decay` from `optim` instead of `optim.optimizer_params`.Please update your config to use `optim.optimizer_params.weight_decay`.`optim.weight_decay` will soon be deprecated.
2024-02-26 14:25:52 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
2024-02-26 14:25:58 (INFO): Evaluating on val.
device 0: 0%| | 0/3 [00:00<?, ?it/s]
device 0: 33%|███▎ | 1/3 [00:03<00:06, 3.38s/it]
device 0: 67%|██████▋ | 2/3 [00:03<00:01, 1.48s/it]
device 0: 100%|██████████| 3/3 [00:03<00:00, 1.15it/s]
device 0: 100%|██████████| 3/3 [00:04<00:00, 1.37s/it]
2024-02-26 14:26:02 (INFO): energy_mae: 2.9834, forcesx_mae: 0.0080, forcesy_mae: 0.0130, forcesz_mae: 0.0073, forces_mae: 0.0094, forces_cosine_similarity: 0.1755, forces_magnitude_error: 0.0144, energy_forces_within_threshold: 0.0000, loss: 3.0039, epoch: 0.0417
2024-02-26 14:26:02 (INFO): Predicting on test.
device 0: 0%| | 0/3 [00:00<?, ?it/s]
device 0: 33%|███▎ | 1/3 [00:03<00:06, 3.41s/it]
device 0: 67%|██████▋ | 2/3 [00:03<00:01, 1.46s/it]
device 0: 100%|██████████| 3/3 [00:03<00:00, 1.18it/s]
device 0: 100%|██████████| 3/3 [00:04<00:00, 1.35s/it]
Traceback (most recent call last):
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\main.py", line 92, in <module>
Runner()(config)
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\main.py", line 36, in __call__
self.task.run()
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\tasks\task.py", line 51, in run
self.trainer.train(
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\ocp_trainer.py", line 215, in train
self.update_best(
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\base_trainer.py", line 706, in update_best
self.predict(
File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\ocp_trainer.py", line 528, in predict
predictions[key] = np.array(predictions[key])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (29,) + inhomogeneous part.
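For what it's worth, that ValueError is NumPy's usual complaint (on recent NumPy versions) when `np.array` is handed per-batch arrays whose first dimensions differ, e.g. per-atom forces for systems with different atom counts. A minimal reproduction, with `np.concatenate` as the usual way to flatten such variable-length outputs:

```python
import numpy as np

# Per-atom predictions for three systems with 5, 7, and 4 atoms:
# the first dimensions differ, so the list cannot be stacked rectangularly.
predictions = [np.zeros((5, 3)), np.zeros((7, 3)), np.zeros((4, 3))]

try:
    np.array(predictions)            # raises ValueError on NumPy >= 1.24
except ValueError as err:
    print(err)

# Concatenating along the atom axis handles the ragged shapes instead.
flat = np.concatenate(predictions, axis=0)
print(flat.shape)  # (16, 3)
```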
This issue has been marked as stale because it has been open for 30 days with no activity.
This should be fixed in #622. Closing.