Comments (6)
Dear Savio,
This behavior is surprising.
In my opinion, the first thing to do is to check the integrity of the checkpoint.
If you haven't already, can you confirm the result of the following command:
$ h5dump -a /patch-000000/species ../ClusterSim_2/checkpoints/dump-00000-0000000000.h5
According to your error, it should return:
... {
ATTRIBUTE "species" {
DATATYPE H5T_STD_U32LE
DATASPACE SCALAR
DATA {
(0): 0
}
}
}
If that is the case, don't you have an error file from the third simulation?
Regards.
Julien
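
As a rough complement to the h5dump check above, here is a minimal sketch of a pre-check that needs only the Python standard library. It tests whether each checkpoint file is non-empty and begins with the 8-byte HDF5 format signature. The function names and the `dump-*.h5` glob pattern are assumptions made for illustration, and passing this test does not prove a file is intact: a checkpoint truncated partway through a write can still have a valid signature, so h5dump or h5stat remains the more thorough check.

```python
from pathlib import Path

# The 8-byte signature that opens an HDF5 file with no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 format signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE


def check_checkpoints(directory, pattern="dump-*.h5"):
    """Map each matching file name to True (plausible) or False (suspect).

    `pattern` is an assumed naming scheme based on the file names in this thread.
    """
    results = {}
    for p in sorted(Path(directory).glob(pattern)):
        # An empty file or a wrong signature means the file is certainly unusable.
        results[p.name] = p.stat().st_size > 0 and has_hdf5_signature(p)
    return results
```

For example, `check_checkpoints("../ClusterSim_2/checkpoints")` would return a dictionary keyed by file name; any `False` entry is worth inspecting before attempting a restart.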
from smilei.
It seems the checkpoint file might be corrupted. I've attached the log file from the final run to show the error it reports.
log5.txt
If ClusterSim2 does need to be rerun, can you recommend how I can avoid this error?
Thanks
Savio
It looks like the h5dump command is not properly installed, so we still don't know whether the file is corrupted or why.
Since the simulation wrote several checkpoints, a wise thing to try is to keep more than one checkpoint on disk. You can achieve this with keep_n_dumps: https://smileipic.github.io/Smilei/namelist.html#keep_n_dumps
Set it to 2, and even if the latest checkpoint is corrupted you will still have the previous one.
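
For illustration, a namelist fragment using keep_n_dumps might look like the sketch below. It is a configuration fragment in Smilei's Python namelist syntax, so it only runs when read by Smilei, not as a standalone script. The dump_step value is a hypothetical placeholder; choose the checkpoint frequency that suits your run.

```python
# Fragment of a Smilei namelist (not a standalone Python script).
Checkpoints(
    dump_step = 10000,   # hypothetical value: write a checkpoint every 10000 iterations
    keep_n_dumps = 2,    # keep the two most recent checkpoints on disk
)
```

Keeping two checkpoints roughly doubles the disk space taken by the checkpoint directory, so make sure the file system has room for both.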
Ah, I forgot to load some of the MPI modules before. Here are the results from h5dump and h5stat:
[svr11@cx2-login checkpoints]$ h5dump -a /patch-000000/species dump-00000-0000000000.h5
HDF5 "dump-00000-0000000000.h5" {
ATTRIBUTE "species" {
DATATYPE H5T_STD_U32LE
DATASPACE SCALAR
DATA {
(0): 3
}
}
}
[svr11@cx2-login checkpoints]$ h5stat dump-00000-00000000*.h5
Filename: dump-00000-0000000000.h5
File information
# of unique groups: 48501
# of unique datasets: 336547
# of unique named datatypes: 0
# of unique links: 0
# of unique other: 0
Max. # of links to object: 1
Max. # of objects in group: 12126
File space information for file metadata (in bytes):
Superblock: 96
Superblock extension: 0
User block: 0
Object headers: (total/unused)
Groups: 9013816/0
Datasets(exclude compact data): 91540784/43684480
Datatypes: 0/0
Groups:
B-tree/List: 51970648
Heap: 9471152
Attributes:
B-tree/List: 0
Heap: 0
Chunked datasets:
Index: 0
Datasets:
Heap: 0
Shared Messages:
Header: 0
B-tree/List: 0
Heap: 0
Free-space managers:
Header: 0
Amount of free space: 0
Small groups (with 0 to 9 links):
# of groups with 0 link(s): 11106
# of groups with 9 link(s): 25269
Total # of small groups: 36375
Group bins:
# of groups with 0 link: 11106
# of groups with 1 - 9 links: 25269
# of groups with 10 - 99 links: 12125
# of groups with 10000 - 99999 links: 1
Total # of groups: 48501
Dataset dimension information:
Max. rank of datasets: 1
Dataset ranks:
# of dataset with rank 1: 336547
1-D Dataset information:
Max. dimension size of 1-D datasets: 57353
Small 1-D datasets (with dimension sizes 0 to 9):
# of datasets with dimension sizes 4: 7
# of datasets with dimension sizes 7: 14
# of datasets with dimension sizes 9: 21
Total # of small datasets: 42
1-D Dataset dimension bins:
# of datasets with dimension size 1 - 9: 42
# of datasets with dimension size 10 - 99: 52716
# of datasets with dimension size 100 - 999: 146748
# of datasets with dimension size 1000 - 9999: 131273
# of datasets with dimension size 10000 - 99999: 5768
Total # of datasets: 336547
Dataset storage information:
Total raw data size: 2459043134
Total external raw data size: 0
Dataset layout information:
Dataset layout counts[COMPACT]: 0
Dataset layout counts[CONTIG]: 336547
Dataset layout counts[CHUNKED]: 0
Dataset layout counts[VIRTUAL]: 0
Number of external files : 0
Dataset filters information:
Number of datasets with:
NO filter: 336547
GZIP filter: 0
SHUFFLE filter: 0
FLETCHER32 filter: 0
SZIP filter: 0
NBIT filter: 0
SCALEOFFSET filter: 0
USER-DEFINED filter: 0
Dataset datatype information:
# of unique datatypes used by datasets: 3
Dataset datatype #0:
Count (total/named) = (260739/0)
Size (desc./elmt) = (22/8)
Dataset datatype #1:
Count (total/named) = (25269/0)
Size (desc./elmt) = (14/2)
Dataset datatype #2:
Count (total/named) = (50539/0)
Size (desc./elmt) = (14/4)
Total dataset datatype count: 336547
Small # of attributes (objects with 1 to 10 attributes):
# of objects with 1 attributes: 12125
# of objects with 2 attributes: 36375
Total # of objects with small # of attributes: 48500
Attribute bins:
# of objects with 1 - 9 attributes: 48500
# of objects with 10 - 99 attributes: 1
Total # of objects with attributes: 48501
Max. # of attributes to objects: 12
Free-space persist: FALSE
Free-space section threshold: 1 bytes
Small size free-space sections (< 10 bytes):
Total # of small size sections: 0
Free-space section bins:
Total # of sections: 0
File space management strategy: H5F_FSPACE_STRATEGY_FSM_AGGR
File space page size: 4096 bytes
Summary of file space information:
File metadata: 161996496 bytes
Raw data: 2459043134 bytes
Amount/Percent of tracked free space: 0 bytes/0.0%
Unaccounted space: 1747760 bytes
Total space: 2622787390 bytes
I'll try with two dump files. Maybe one will work.
One note about this issue. If you terminate your job too soon after a checkpoint starts, the writing of data to the checkpoint files may be interrupted, leaving corrupt files. To avoid this, give the simulation at least 5 minutes to complete the checkpoint. In some cases, 5 minutes is not sufficient.
The issue was in fact that I had run out of space and the checkpoint file couldn't complete its save! I think it is running fine now.
Thanks!