Comments (10)

emsunshine commented on July 30, 2024

I was able to get the fine-tuning tutorial working with the changes from these two PRs: Open-Catalyst-Project/tutorial#4 and #630. You can try these branches to see if they solve the problem.

from ocp.

emsunshine commented on July 30, 2024

I think you are correct that this was an oversight when converting to the new trainer/configs. The new location for dataset format makes more sense but is not backwards compatible. You should be able to get around this error by adding "format":"ase_db" to the dataset config.
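As a rough illustration of that workaround, the sketch below adds the `format` key to every split of a config dictionary. The helper name `add_dataset_format` is hypothetical and not part of ocp, and the `dataset` layout with `train`/`val` splits is an assumption based on the config posted later in this thread.

```python
import copy


def add_dataset_format(config, fmt="ase_db"):
    """Return a copy of config in which every dataset split has a 'format' key."""
    patched = copy.deepcopy(config)
    for split in patched.get("dataset", {}).values():
        # Only fill in the key when the split does not already set it.
        if isinstance(split, dict):
            split.setdefault("format", fmt)
    return patched


# Hypothetical minimal config with two splits and no 'format' key.
config = {"dataset": {"train": {"src": "train.db"}, "val": {"src": "val.db"}}}
patched = add_dataset_format(config)
print(patched["dataset"]["train"])
```

The original dictionary is left untouched, so the patched copy can be compared against it before being written back to a config file.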

gunnarpsu commented on July 30, 2024

That did the trick regarding that part - thank you!
However, I now receive the following error in the output after the model loads:

2024-02-26 09:22:56 (INFO): Loading dataset: ase_db
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 09:22:56 (INFO): Loading model: gemnet_oc
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
  warnings.warn(
2024-02-26 09:22:59 (INFO): Loaded GemNetOC with 38864438 parameters.
2024-02-26 09:22:59 (WARNING): Model gradient logging to tensorboard not yet supported.
2024-02-26 09:22:59 (WARNING): Using `weight_decay` from `optim` instead of `optim.optimizer_params`.Please update your config to use `optim.optimizer_params.weight_decay`.`optim.weight_decay` will soon be deprecated.
2024-02-26 09:23:00 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
  warnings.warn(
C:\Users\gls5443\Desktop\ocp-main\ocpmodels\datasets\ase_datasets.py:108: UserWarning: Supplied sid is not numeric (or missing). Using dataset indices instead.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\gls5443\Desktop\ocp-main\main.py", line 92, in <module>
    Runner()(config)
  File "C:\Users\gls5443\Desktop\ocp-main\main.py", line 36, in __call__
    self.task.run()
  File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\tasks\task.py", line 51, in run
    self.trainer.train(
  File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\trainers\ocp_trainer.py", line 158, in train
    loss = self._compute_loss(out, batch)
  File "C:\Users\gls5443\Desktop\ocp-main\ocpmodels\trainers\ocp_trainer.py", line 317, in _compute_loss
    target = batch[target_name]
  File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\batch.py", line 175, in __getitem__
    return super().__getitem__(idx)
  File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\data.py", line 498, in __getitem__
    return self._store[key]
  File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch_geometric\data\storage.py", line 111, in __getitem__
    return self._mapping[key]
KeyError: 'energy'

Are there additional tags I need to supply to the config for it to parse the databases?

emsunshine commented on July 30, 2024

Thanks for flagging this. The new trainer has renamed the targets from y and force to energy and forces, respectively, but the ASE datasets were not updated to reflect this. Until the datasets are updated, you should be able to get around this by using the following in the dataset config:

key_mapping:
    y: energy
    force: forces

For reference, see these lines in the new example config:
https://github.com/Open-Catalyst-Project/ocp/blob/394e9bad7780a05d3371f52550c1f92c47a61ce3/configs/ocp_example.yml#L20
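To make the idea behind `key_mapping` concrete, here is a minimal sketch of renaming target attributes on a data sample so that a lookup such as `batch["energy"]` can succeed. This is not the ocp implementation; `apply_key_mapping` is a hypothetical helper, and a `SimpleNamespace` stands in for the real data object.

```python
from types import SimpleNamespace


def apply_key_mapping(sample, key_mapping):
    """Rename attributes on sample in place, e.g. y -> energy."""
    for old_key, new_key in key_mapping.items():
        # Skip keys that are missing or already have the requested name.
        if old_key != new_key and hasattr(sample, old_key):
            setattr(sample, new_key, getattr(sample, old_key))
            delattr(sample, old_key)
    return sample


# Stand-in sample carrying the old target names.
sample = SimpleNamespace(y=-1.23, force=[[0.0, 0.0, 0.1]])
apply_key_mapping(sample, {"y": "energy", "force": "forces"})
print(sample.energy, sample.forces)
```

After the mapping, the old names are gone and only `energy` and `forces` remain, which matches the names the new trainer looks up.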

gunnarpsu commented on July 30, 2024

Unfortunately it still throws that error. Just for reference, here is the currently used config.yml:

amp: true
checkpoint: ./gnoc_oc22_oc20_all_s2ef.pt
dataset:
  test:
    a2g_args:
      r_energy: false
      r_forces: false
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: test.db
  train:
    a2g_args:
      r_energy: true
      r_forces: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: train.db
  val:
    a2g_args:
      r_energy: true
      r_forces: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: val.db
eval_metrics:
  metrics:
    energy:
    - mae
    forces:
    - forcesx_mae
    - forcesy_mae
    - forcesz_mae
    - mae
    - cosine_similarity
    - magnitude_error
    misc:
    - energy_forces_within_threshold
  primary_metric: forces_mae
gpus: 1
loss_fns:
- energy:
    coefficient: 1
    fn: mae
- forces:
    coefficient: 1
    fn: l2mae
model:
  activation: silu
  atom_edge_interaction: true
  atom_interaction: true
  cbf:
    name: spherical_harmonics
  cutoff: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  cutoff_qint: 12.0
  direct_forces: true
  edge_atom_interaction: true
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_atom: 256
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_rbf: 16
  emb_size_sbf: 32
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  enforce_max_neighbors_strictly: false
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  forces_coupled: false
  max_neighbors: 30
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  max_neighbors_qint: 8
  name: gemnet_oc
  num_after_skip: 2
  num_atom: 3
  num_atom_emb_layers: 2
  num_before_skip: 2
  num_blocks: 4
  num_concat: 1
  num_global_out_layers: 2
  num_output_afteratom: 3
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  qint_tags:
  - 1
  - 2
  quad_interaction: true
  rbf:
    name: gaussian
  regress_forces: true
  sbf:
    name: legendre_outer
noddp: false
optim:
  batch_size: 10
  clip_grad_norm: 10
  ema_decay: 0.999
  energy_coefficient: 1
  eval_batch_size: 10
  eval_every: 1
  factor: 0.8
  force_coefficient: 1
  load_balancing: atoms
  loss_energy: mae
  lr_initial: 0.0005
  max_epochs: 10
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
  weight_decay: 0
outputs:
  energy:
    level: system
  forces:
    eval_on_free_atoms: true
    level: atom
    train_on_free_atoms: true
task:
  dataset: ase_db
trainer: forces

mshuaibii commented on July 30, 2024

The 'key_mapping' functionality has not landed on main yet. It currently lives in this PR: #622.

@lbluque is there an update on what's blocking this?

In the meantime, you can check out that branch if you would like to use it before we land it on main.

lbluque commented on July 30, 2024

Nothing is holding it up. This should be ready to merge, unless @mshuaibii or @emsunshine have any further suggestions.

gunnarpsu commented on July 30, 2024

Hello,
I now have a two-part problem. I fixed the first part, though the fix may need to be committed to a future branch, and I am unable to solve the second.

First, I think that line 1018 of ocpmodels/common/utils.py may need to change from
loss_fns=config.get("loss_functions", {}),
to
loss_fns=config.get("loss_fns", {}),
as it is the only way for the configs to be read properly without throwing NotImplementedError.
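The sketch below illustrates why a mismatched key name goes unnoticed at lookup time: `dict.get` with a default returns the default without raising, so the loss settings would only be found missing later. The helper `read_loss_fns` is hypothetical and not the code in ocpmodels/common/utils.py, and which of the two key names ocp intends to support is not settled in this thread.

```python
def read_loss_fns(config):
    """Look up loss settings under 'loss_fns', falling back to 'loss_functions'."""
    for key in ("loss_fns", "loss_functions"):
        if key in config:
            return config[key]
    return {}


# Config fragment using the 'loss_fns' key, as in the config posted above.
config = {"loss_fns": [{"energy": {"fn": "mae", "coefficient": 1}}]}

# The mismatched key falls back to the empty default instead of raising.
print(config.get("loss_functions", {}))
# A lookup that tolerates both spellings finds the settings.
print(read_loss_fns(config))
```

Accepting both spellings is only one possible design; renaming the key in one place, as proposed above, is the smaller change.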

Second, the branch from #622, which was recommended for using ASE databases, does get through the first evaluation step, but it then fails during prediction with the following error:

2024-02-26 14:25:48 (INFO): Loading dataset: ase_db
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Batch balancing is disabled for single GPU training.
2024-02-26 14:25:49 (INFO): Loading model: gemnet_oc
2024-02-26 14:25:51 (INFO): Loaded GemNetOC with 38864438 parameters.
2024-02-26 14:25:51 (WARNING): Model gradient logging to tensorboard not yet supported.
2024-02-26 14:25:51 (WARNING): Using `weight_decay` from `optim` instead of `optim.optimizer_params`.Please update your config to use `optim.optimizer_params.weight_decay`.`optim.weight_decay` will soon be deprecated.
2024-02-26 14:25:52 (INFO): Loading checkpoint from: gnoc_oc22_oc20_all_s2ef.pt
2024-02-26 14:25:58 (INFO): Evaluating on val.

device 0:   0%|          | 0/3 [00:00<?, ?it/s]
device 0:  33%|███▎      | 1/3 [00:03<00:06,  3.38s/it]
device 0:  67%|██████▋   | 2/3 [00:03<00:01,  1.48s/it]
device 0: 100%|██████████| 3/3 [00:03<00:00,  1.15it/s]
device 0: 100%|██████████| 3/3 [00:04<00:00,  1.37s/it]
2024-02-26 14:26:02 (INFO): energy_mae: 2.9834, forcesx_mae: 0.0080, forcesy_mae: 0.0130, forcesz_mae: 0.0073, forces_mae: 0.0094, forces_cosine_similarity: 0.1755, forces_magnitude_error: 0.0144, energy_forces_within_threshold: 0.0000, loss: 3.0039, epoch: 0.0417
2024-02-26 14:26:02 (INFO): Predicting on test.

device 0:   0%|          | 0/3 [00:00<?, ?it/s]
device 0:  33%|███▎      | 1/3 [00:03<00:06,  3.41s/it]
device 0:  67%|██████▋   | 2/3 [00:03<00:01,  1.46s/it]
device 0: 100%|██████████| 3/3 [00:03<00:00,  1.18it/s]
device 0: 100%|██████████| 3/3 [00:04<00:00,  1.35s/it]
Traceback (most recent call last):
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\main.py", line 92, in <module>
    Runner()(config)
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\main.py", line 36, in __call__
    self.task.run()
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\tasks\task.py", line 51, in run
    self.trainer.train(
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\ocp_trainer.py", line 215, in train
    self.update_best(
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\base_trainer.py", line 706, in update_best
    self.predict(
  File "c:\Users\gls5443\AppData\Local\miniconda3\envs\ocp_new1\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\gls5443\Desktop\ocp-ase_data_updates\ocpmodels\trainers\ocp_trainer.py", line 528, in predict
    predictions[key] = np.array(predictions[key])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (29,) + inhomogeneous part.
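This message is what NumPy raises when `np.array` is given a list of arrays whose shapes differ. One plausible cause here, which is an assumption and not confirmed in this thread, is per-system prediction arrays with different atom counts. The sketch below shows one generic way to store such ragged data without building a rectangular array: concatenate along the first axis and keep the per-system sizes. The helper `pack_ragged` is hypothetical and is not how ocp or #622 handles it.

```python
import numpy as np


def pack_ragged(chunks):
    """Concatenate per-system arrays and record how many rows each contributed."""
    sizes = np.array([len(chunk) for chunk in chunks])
    flat = np.concatenate(chunks, axis=0)
    return flat, sizes


# Two made-up systems with 2 and 3 atoms; each row is one atom's force vector.
chunks = [np.zeros((2, 3)), np.ones((3, 3))]
flat, sizes = pack_ragged(chunks)

# The cumulative sizes give the split points needed to recover each system.
per_system = np.split(flat, np.cumsum(sizes)[:-1])
print(flat.shape, sizes.tolist(), [p.shape for p in per_system])
```

Keeping the sizes alongside the flat array makes the round trip lossless, so per-system results can be recovered whenever they are needed.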

github-actions commented on July 30, 2024

This issue has been marked as stale because it has been open for 30 days with no activity.

lbluque commented on July 30, 2024

This should be fixed in #622. Closing.
