Comments (15)
Here is a minimally reproducible example with the structure (Crystal.cif) attached, highlighting that the argument check_occu
cannot turn off the occupancy check. In this example structure, Site1 -- Site3 and Site4--6 are problematic positions in my structure. Atoms C4 and C5 mimic the normal atom sites in my structure.
To reproduce the bug:
from pymatgen.io.cif import CifParser
parser = CifParser("Crystal.cif", site_tolerance = 1e-4, occupancy_tolerance = 1.0)
b = parser.parse_structures(check_occu = False)
Error that I encountered:
Some occupancies ([2.0, 2.0, 2.0, 1.0, 1.0]) sum to > 1! If they are within the occupancy_tolerance, they will be rescaled. The current occupancy_tolerance is set to: 1.0
No structure parsed for section 1 in CIF.
...
in CifParser.parse_structures(self, primitive, symmetrized, check_occu, on_error)
1219 warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
1221 if len(structures) == 0:
-> 1222 raise ValueError("Invalid CIF file with no structures!")
1223 return structures
ValueError: Invalid CIF file with no structures!
Expected behavior: occupancy would not be checked and a structure with all atoms should be given. I think code review could be more helpful to see the issue though.
Crystal.txt
from pymatgen.
Hi thanks for following up. I have updated the behavior and docstring in #3819.
Meanwhile I have a question though, as per the docstring, "unphysical occupancy" refers to occupancy != 1
, but the implementation just handles cases where occupancy > 1
, is there any chance for the occupancy to be smaller than zero and still need to be included (I'm not aware of such cases)?
I would look into the site label soon, thanks for reporting.
from pymatgen.
I'm not a maintainer, but if you could provide the cif file and code snippet to reproduce this issue, that would be very helpful :)
from pymatgen.
Hi @KunhuanLiu, I just had a look at this issue, and the inconsistency between comment and implementation you mentioned is indeed an issue:
Lines 985 to 989 in 2e1c301
But in our case, that's not the cause of the failure.
Cause of code failure
The issue is the bad default value of occupancy_tolerance
for CifParser
(and lack of proper error message):
Line 290 in 2e1c301
And the condition for scaling occupancy:
Lines 1085 to 1086 in 2e1c301
Which means the default occupancy_tolerance
value would prevent invalid occupancy (in our case 2.0
) being scaled, the thus the code breaks as Structure
does not allow such invalid occupancy:
Line 1105 in 2e1c301
I would talk to @janosh on changing the bad default value of occupancy_tolerance
soon, and I appreciate your report.
To fix your code
To fix your code, you need to set the occupancy_tolerance
to a value greater than the max occupancy (2.0
in your case), for example:
from pymatgen.io.cif import CifParser
parser = CifParser("Crystal_min.cif", occupancy_tolerance=2.0)
structures = parser.parse_structures(check_occu=False)
Hope it helps :)
from pymatgen.
Thanks @DanielYang59 , for investigating the case. Totally agree with your thoughts; I came up with a similar manual fix (I offered a test case where manually setting the occupancy tolerance
to 2.0 can address the issue. In my structure, I had to use occupancy tolerance = 1000
.) Of course, it will be difficult to manually check every structure (and set tolerance) for high throughput screening...
In addition to attending to the behavior of occupancy tolerance,
I think the argument check_occu
according to the API reference would be very useful if it can work properly: "site occupancy will not be checked, allowing unphysical occupancy != 1..."
Again thank you for the attention and time!
from pymatgen.
In addition to attending to the behavior of
occupancy tolerance,
I think the argumentcheck_occu
according to the API reference would be very useful if it can work properly: "site occupancy will not be checked, allowing unphysical occupancy != 1..."
Maybe the behavior should be changed such that if not check_occu
, any occupancy would be allowed with a warning message. What do you think? Feel free to comment on #3819.
Again thank you for the attention and time!
No worries at all.
from pymatgen.
Sorry about the delayed response -- I noticed another thing about the current check_occu
behavior. I am sorry I cannot offer more help reviewing the algorithm right now.
Maybe the behavior should be changed such that if not check_occu, any occupancy would be allowed with a warning message. What do you think? Feel free to comment on #3819.
I think that's what I thought the argument should do based on the API documentation description. When not check_occu
, all atom sites, regardless of occup_tolerance
, will be returned.
I want to note the following behavior when I use check_occu = False
though:
b = parser.parse_structures(check_occu = False)[0]
# from pymatgen.analysis.structure_analyzer import get_neighbors
for site in b:
neighbors = b.get_neighbors(site, 0.1) # 0.5 Å as an example threshold for overlap
if neighbors:
print(f"Overlap suspected near {site.label} at {site.frac_coords}:\n{neighbors}")
Returned atoms do not have labels:
Overlap suspected near C at [0.1191 0.5596 0.5713]:
[PeriodicNeighbor: C (-5.305, 16.0, 13.02) [0.1191, 0.5595, 0.5713]]
Overlap suspected near C at [0.4404 0.5595 0.5713]:
[PeriodicNeighbor: C (5.306, 16.0, 13.02) [0.4405, 0.5596, 0.5713]]
Overlap suspected near C at [0.4405 0.8809 0.5713]:
[PeriodicNeighbor: C (-0.001651, 25.19, 13.02) [0.4404, 0.8809, 0.5713]]
Overlap suspected near C at [0.4404 0.8809 0.5713]:
[PeriodicNeighbor: C (0.001651, 25.19, 13.02) [0.4405, 0.8809, 0.5713]]
Overlap suspected near C at [0.4405 0.5596 0.5713]:
[PeriodicNeighbor: C (5.305, 16.0, 13.02) [0.4404, 0.5595, 0.5713]]
Overlap suspected near C at [0.1191 0.5595 0.5713]:
[PeriodicNeighbor: C (-5.306, 16.0, 13.02) [0.1191, 0.5596, 0.5713]]
from pymatgen.
The label issue is fixed by 05c11d4. Thanks.
from pymatgen.
Hi @KunhuanLiu : the CIF file you provided does not appear to be valid, as the three sites at the end, Site6
, Site7
, and Site8
, are duplicates of Site3
, Site2
, and Site1
, respectively. Removing these latter three sites successfully produces a structure with pymatgen
Pymatgen does not currently include functionality to fix bad CIFs. This isn't an issue with the occupancy tolerance function
from pymatgen.
Hello @esoteric-ephemera , this ticket inquiries whether the implementation of the check_occu
argument is consistent with the algorithm described in both the source code comment and the API documentation.
Removing these latter three sites successfully produces a structure with pymatgen
Example structure serves as a minimal example to highlight the impact of the issue.
Pymatgen does not currently include functionality to fix bad CIFs.
Thank you for sharing. Can you clarify what you mean? I am not requesting such functionality here. Can you comment on how or why check_occu = False
should not return a structure when the perceived occupancy exceeds the user-set occu_tolerance
? This can be important as @DanielYang59 has implemented several changes in #3819 and your review can be helpful. Just to make sure I am not proposing something based on misunderstanding of the intended behavior.
I also don't think that "bad cifs" (a term that should be defined more clearly) are the sole reasons for such errors. A structure with large unit cell and dense atoms can have atoms such as H that are "close" to each other under the 1e-4 default site tolerance (I cannot publish the structure here). If check_occu = False
is not intended to keep the overlapping or nonphysical atoms, then the documentation, instead of the code, needs revision. Of course, it's then the user's responsibility to catch error or adjust tolerance parameters if they decide to continue their analysis on those structures.
This isn't an issue with the occupancy tolerance function
I never thought it was in this case. Setting a higher occupancy tolerance is an unorthodox trick to let pymatgen return my structure. The question concerns the proper behavior with the check_occu
argument, and I will appreciate your comments on this
from pymatgen.
@KunhuanLiu: the example is not a reasonable CIF because it contains multiply-occupied sites. The site occupancy represents fractionally what elements occupy a given site, e.g., a disordered structure will have at least two sites with occupancies < 1, and an ordered structure will have all site occupancies == 1. A site occupancy > 1 is not physical
A "bad CIF" in this context means a CIF that is either out of spec per the IUCr or is unphysical. The ICUr specifically states that the occupancy represents:
The fraction of the atom type present at this site.
The sum of the occupancies of all the atom types at this site
may not significantly exceed 1.0 unless it is a dummy site.
Right now, pymatgen does not support this kind of "dummy site" because it's not compatible with other functionality in pymatgen. pymatgen does support disordered structures, where at least two elements occupy the same site with fractional (< 1) occupancy.
check_occ
is not intended to ensure non-overlapping sites. It is intended to ensure that the fractional composition on a given site does not exceed 1
from pymatgen.
Hi @esoteric-ephemera thanks a lot for your comment.
Combine of sites
the CIF file you provided does not appear to be valid, as the three sites at the end, Site6, Site7, and Site8, are duplicates of Site3, Site2, and Site1, respectively.
Pymatgen does not currently include functionality to fix bad CIFs.
It appears CifParser
is able to handle such cifs:
Lines 306 to 329 in bb68c78
And:
Lines 328 to 329 in bb68c78
The cif @KunhuanLiu provided:
Site1 C 0.11910 0.55960 0.57130 1.0
Site2 C 0.44040 0.55950 0.57130 1.000 <<
Site3 C 0.44050 0.88090 0.57130 1.000
C4 C 0.07117 0.21573 0.93523 1.000
C5 C 0.14457 0.21573 0.93523 1.000
Site6 C 0.44040 0.88090 0.57130 1.000
Site7 C 0.44050 0.55960 0.57130 1.000 << would be combined into Site2
Site8 C 0.11910 0.55950 0.57130 1.000
Site2
and Site7
(and two other pairs similarly) are not duplicate, but within the site_tolerance
, and were therefore combined into a single site (with occupancy summed).
Handle of "unphysical" (occupancy > 1) sites
Right now, pymatgen does not support this kind of "dummy site" because it's not compatible with other functionality in pymatgen.
In parse_structures
, there is:
Lines 1263 to 1266 in bb68c78
Which allows user to parse cif with occupancy > 1, once check_occu
is turned off.
from pymatgen.
@esoteric-ephemera Thanks so much for clarifying, and I apologize for my oversight on the fact that in a realistic crystal structure, no atom sites would have summed occupancy > 1. I understand and support the design that raises the error ValueError: Invalid CIF file with no structures!
when there are sites with occupancy > occupancy_tolerance and not rescaled to 1.
I am still unsure what the algorithm should do when check_occu
is set to FALSE
, but according to @esoteric-ephemera it should still not return a bad CIF with site occupancy > 1 that is caused by merging atom sites.
I'm very thankful to the attention and help from @DanielYang59, and I want to make sure we are not changing the code implementation unnecessarily. It looks like I had some misunderstanding caused by warning messages and documentation.
To make sure we are aligned on the intended behavior, here's my (corrected) understanding:
site_tolerance
: decides whether sites are merged and the resulted site has occupancy as sum of all the site occupancyoccupancy_tolerance
: used to scale down sites with occupancy <= occupancy_tolerancecheck_occu
: setting toTrue
(default value), check if any site has occupancy > 1; if yes, raiseValueError
. Setting toFalse
, ignores_atom_site_occupancy
. However, it is intended to still raiseValueError
when any site has occupancy >1. (Needs confirmation)- Observed in my testing:
check_occu = False
removes atom site labels.
@DanielYang59 thanks for your attention. Based on our discussion I think the original version has a very reasonable design and raises ValueError as intended. It sounds reasonable to me to refer the structures with occupancy > 1 as bad cifs
. Changes around tolerance
are not necessary. I think there can be some improvement on the documentation.
Lines 1011 to 1018 in 2e1c301
Here,
({sum_occu}) sum to > 1!
is confusing and can lead to a very very long warning message when there are lots of atoms. This happened in my case and made it difficult to correctly understand the warning message.I suggest the following:
unphysical_occu = [occu for occu in sum_occu if occu > 1]
... f"Some occupancies sum to > 1! If they are within "
"the occupancy_tolerance, they will be rescaled. "
f"The current occupancy_tolerance is set to: {self._occupancy_tolerance}. Sites with occupancy sum > 1: {unphysical_occu}"
@esoteric-ephemera Can please you confirm that the behavior described above on check_occu
is correct? I also wonder if this is a good place to close the ticket, or to open new issues with sharpened focus. Thanks for all the attention!
from pymatgen.
Related Issues (20)
- Overlayed subplots from `BSPlotterProjected.get_projected_plots_dots()` and `get_projected_plots_dots_patom_pmorb()` HOT 1
- `atom_site_label` in CIF file are not unique HOT 10
- CrystalNN gives incorrect result for simple aromatic ring HOT 13
- ChemEnv unable to identify the environments of supercell HOT 7
- Atom labels in CIF file are silently rewritten by CifWriter HOT 6
- Collect possible issues might come with Python 3.12
- Incompatibility Issue with ALGO Parameter in Incar and Vasprun Classes
- Periodic boundary condition is not considered in the interpolator of VolumetricData HOT 1
- [Dev] `datetime.datetime.utcnow()` deprecated and replacement breaks `strptime`
- Read in POSCAR with `Structure.from_file` seems slow? HOT 3
- `get_points_in_sphere()` has inconsistent return types HOT 1
- Monthly issue metrics report
- StructureMatcher might be wrong when used to check if two structures are equal. HOT 3
- Error in CP2K output parser structure parsing
- Bug in `core.composition` comparison
- Check inputs provided to MPRester functions
- `.as_dict()` method on `VaspInput` causes an `AttributeError`
- A minor document error in `POTCAR Setup`.
- Incorrect letter cases for memory in `core.units`
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pymatgen.