project-gemmi / gemmi Goto Github PK

View Code? Open in Web Editor NEW

195.0 195.0 38.0 8.56 MB

macromolecular crystallography library and utilities

Home Page: https://project-gemmi.github.io/

License: Mozilla Public License 2.0

C++ 90.33% Python 7.75% Shell 0.63% CMake 1.04% Fortran 0.05% JavaScript 0.10% Makefile 0.10%

ccp4 cif crystallography mmcif molecular-structures mtz pdb-files protein-structure structural-biology

gemmi's People

Contributors

Stargazers

Watchers

gemmi's Issues

DBREF and SEQADV to go with SEQRES records

It is great that the SEQRES records were added. Is there any change you could add the DBREF and SEQADV records as well, since these relate strongly to the SEQRES record?
_struct_ref_seq
_struct_ref_seq_dif
In particular, I'd like the conversion in the to_pdb.hpp file

Memory error on superimposition

Hi @wojdyr , I tried the superposition algorithm from python, but it seems to crash often.

My configuration

MacOS Big Sur / Red Hat 7.4
gemmi: 0.4.4 from conda-forge
Python 3.7.8

Data

minimal code to reproduce

 import gemmi

 x = gemmi.read_structure('2vta_updated.cif')[0]['A'].get_polymer()
 y = gemmi.read_structure('6q49_updated.cif')[0]['A'].get_polymer()

 r = gemmi.calculate_superposition(x, y, gemmi.PolymerType.PeptideL, gemmi.SupSelect.CaP) # this line fails

What I could get out if that is:

on Mac: [1] 25560 segmentation fault
on LInux: I got MemoryError: std::bad_alloc

there is additional line of code that crashes (again segmentation fault) I could find in the documentation and that is: x.check_polymer_type(), that is why I use gemmi.PolymerType.PeptideL in the code.

Interestingly, when I use archive files (https://www.ebi.ac.uk/pdbe/entry-files/download/2vta.cif and https://www.ebi.ac.uk/pdbe/entry-files/download/6q49.cif) I dont get the memmory error, but nothing is computed, only thing I can see is message More than 50 iterations needed! So my natural question is how can I increase this limit and is there anything I can do to provide more info so that you guys can track down the root of the problem?

Thank you for looking into this!

[Feature Request] Python Acess to UnmergedHklMover

Right now there is no obvious way to map an unmerged miller index to the reciprocal space asymmetric unit from Python. At the C++ level, the closest thing seems to be the UnmergedHklMover in mtz.hpp. It would be really great if we could have access to this object through Pybind11. @JBGreisman and I have reimplemented this functionality with numpy, but we would like to be able to compare it to Gemmi's implementation for testing purposes.

More generally, might it make sense to refactor UnmergedHklMover into symm.hpp and make its constructor take a SpaceGroup instead of an Mtz. This would require a bit more work, but I think it makes sense to have a less specialized class for this task.

On an unrelated note, thanks for your work on this project. It has drastically reduced the overhead involved in processing crystallographic data in my research.

Setting spacegroup of gemmi FloatGrid

gemmi.FloatGrid seems to have a method for setting its corresponding unit cell (FloatGrid.set_unit_cell()), but there is not a corresponding method for setting the grid's spacegroup. This seems useful for being able to get the correct ASU mask from the FloatGrid.asu() method.

This is the current behavior:

grid = gemmi.FloatGrid(100, 100, 100)
grid.spacegroup = gemmi.SpaceGroup(19)
print(grid.spacegroup)

returns:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-142-5698f3495619> in <module>
----> 1 print(grid.spacegroup)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 20: invalid start byte

Is there a way to set the spacegroup for a FloatGrid object that was not instantiated directly from a call to gemmi.transform_f_phi_grid_to_map()?

Thanks,
Jack

Keep Refmac metadata in mmcif in Structure

Opening refmac mmcif by gemmi.read_structure("refmac.mmcif") and saving it by make_mmcif_document().write_file("foo.mmcif") results in loss of some records. Can gemmi keep them? If it is too much trouble I can work on it. Currently, following records are lost.

_atom_site.auth_atom_id
_atom_site.auth_comp_id
_atom_site.calc_flag
_atom_site.group_pdb
_atom_site.pdbx_pdb_atom_name
_atom_site.pdbx_pdb_residue_name
_atom_site.pdbx_pdb_residue_no
_atom_type.n_electrons
_atom_type.scat_cromer_mann_a1
_atom_type.scat_cromer_mann_a2
_atom_type.scat_cromer_mann_a3
_atom_type.scat_cromer_mann_a4
_atom_type.scat_cromer_mann_b1
_atom_type.scat_cromer_mann_b2
_atom_type.scat_cromer_mann_b3
_atom_type.scat_cromer_mann_b4
_atom_type.scat_cromer_mann_c
_atom_type.scat_z
_ccp4_form_factor.scat_method
_ccp4_refine.details
_ccp4_refine_ls.jelly_body
_ccp4_refine_ls.jelly_body_dist
_ccp4_refine_ls.jelly_body_sigma
_entity_poly_seq.hetero
_refine.aniso_b[1][1]
_refine.aniso_b[1][2]
_refine.aniso_b[1][3]
_refine.aniso_b[2][2]
_refine.aniso_b[2][3]
_refine.aniso_b[3][3]
_refine.b_iso_mean
_refine.correlation_coeff_fo_to_fc
_refine.correlation_coeff_fo_to_fc_free
_refine.details
_refine.ls_number_reflns_r_free
_refine.ls_number_reflns_r_work
_refine.ls_percent_reflns_r_free
_refine.ls_r_factor_all
_refine.ls_r_factor_r_free
_refine.ls_r_factor_r_work
_refine.ls_wr_factor_r_free
_refine.ls_wr_factor_r_work
_refine.overall_su_b
_refine.overall_su_ml
_refine.pdbx_average_fsc_free
_refine.pdbx_average_fsc_overall
_refine.pdbx_average_fsc_work
_refine.pdbx_overall_esu_r
_refine.pdbx_overall_esu_r_free
_refine.pdbx_solvent_ion_probe_radii
_refine.pdbx_solvent_shrinkage_radii
_refine.pdbx_solvent_vdw_probe_radii
_refine.solvent_model_details
_refine_ls_restr.dev_ideal
_refine_ls_restr.dev_ideal_target
_refine_ls_restr.number
_refine_ls_restr.pdbx_refine_id
_refine_ls_restr.type
_refine_ls_shell.d_res_high
_refine_ls_shell.d_res_low
_refine_ls_shell.number_reflns_all
_refine_ls_shell.number_reflns_r_free
_refine_ls_shell.number_reflns_r_work
_refine_ls_shell.pdbx_fsc_free
_refine_ls_shell.pdbx_fsc_work
_refine_ls_shell.pdbx_refine_id
_refine_ls_shell.pdbx_total_number_of_bins_used
_refine_ls_shell.percent_reflns_obs
_refine_ls_shell.r_factor_all
_refine_ls_shell.r_factor_r_free
_refine_ls_shell.r_factor_r_work
_refine_ls_shell.wr_factor_r_work
_software.contact_author
_software.contact_author_email
_software.description

set_mmcif_category overload for table

Thank you very much for the time taken in developing the parser and especially the python bindings.

I have a feature request for set_mmcif_category to take a table. Looping and modifying a list of dicts is easier to work with sometimes. The current functionality is also very useful (dict of lists).

Access beyond string size in small.hpp

I am seeing a memory issue when using gemmi::make_small_structure_from_block. When running under valgrind, I get:

$ cat a.cpp
#include <gemmi/cif.hpp>
#include <gemmi/smcif.hpp>

int main (void) {
    auto block = gemmi::cif::read_file("1544173.cif").sole_block();
    gemmi::SmallStructure SiC = gemmi::make_small_structure_from_block(block);
}
$ g++ a.cpp -I $PWD/gemmi-0.4.1/include 
$ valgrind ./a.out                     
==2764260== Memcheck, a memory error detector
==2764260== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2764260== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==2764260== Command: ./a.out
==2764260== 
==2764260== Conditional jump or move depends on uninitialised value(s)
==2764260==    at 0x11B874: void gemmi::split_element_and_charge<gemmi::SmallStructure::Site>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, gemmi::SmallStructure::Site*) (in /home/coudert/a.out)
==2764260==    by 0x11781B: gemmi::make_small_structure_from_block(gemmi::cif::Block const&) (in /home/coudert/a.out)
==2764260==    by 0x10BBAF: main (in /home/coudert/a.out)
==2764260== 
==2764260== 
==2764260== HEAP SUMMARY:
==2764260==     in use at exit: 0 bytes in 0 blocks
==2764260==   total heap usage: 849 allocs, 849 frees, 367,416 bytes allocated
==2764260== 
==2764260== All heap blocks were freed -- no leaks are possible
==2764260== 
==2764260== For counts of detected and suppressed errors, rerun with: -v
==2764260== Use --track-origins=yes to see where uninitialised values come from
==2764260== ERROR SUMMARY: 50 errors from 1 contexts (suppressed: 0 from 0)

With another valgrind version, a slightly different message but I think it is the same issue:

==17170== Memcheck, a memory error detector
==17170== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==17170== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==17170== Command: ./cif
==17170== 
==17170== Invalid read of size 1
==17170==    at 0x6B545C: void gemmi::split_element_and_charge<gemmi::SmallStructure::Site>(std::string const&, gemmi::SmallStructure::Site*) (small.hpp:77)
==17170==    by 0x6B1CB4: gemmi::make_small_structure_from_block(gemmi::cif::Block const&) (smcif.hpp:73)
==17170==    by 0x6A4AD5: chemfiles::CIFFormat::init_() (CIF.cpp:67)
==17170==    by 0x5E6EB1: chemfiles::CIFFormat::CIFFormat(std::string, chemfiles::File::Mode, chemfiles::File::Compression) (CIF.hpp:36)

Loading ccp4 map from memory

gemmi/include/gemmi/ccp4.hpp

Line 172 in f804883

void read_ccp4_header(Stream f, const std::string& path) {

When loading ccp4 map from MemoryStream, the position of the stream is not advanced, since the Stream is passed by value. This breaks the loader since the header bytes are read again as grid data later.
Passing the Stream f by reference fixes this issue.

PDB SEQRES conversion from MMCIF bug

Getting the "PDB" header (make_pdb_headers) from 6Y2E doesn't work because the length of the chain ID is longer than a single character. I suspect the below is the problem:

gemmi/include/gemmi/to_pdb.hpp

Line 476 in d59ef39

gf_snprintf(buf, 82, "SEQRES%4d%2s%5zu %62s\n",

The lines of the header are not as expected. The SEQRES is just one big "line".
line: SEQRES 1AAA 306 SER GLY PHE ARG LYS MET ALA PHE PRO SER GLY LYS VAL SEQRES 2AAA 306 GLU GLY CYS MET VAL
...

I see a similar problem with HELIX/SHEET.
I think the real problem here is that the multi-character chain IDs (AAA, BBB), cause incompatibility with the PDB spec and thus these sections.
At minimum perhaps detect the use of multi-character chain IDs and perhaps just don't write these sections.

segfault on badly formed mmCIF files

We've recently gotten some segfaults reading certain mmCIF files. I've attached an example here.
6j40.cif.txt

We see this in the trace:
terminate called after throwing an instance of 'tao::pegtl::parse_error'
what(): string:107819:0(9901474): Wrong number of values in the loop

It seems to be due to badly formed MMCIF files, because in some cases when we download a revised version of those same MMCIF files the problem goes away.

Is there a way we can safeguard the parser to avoid this?

pdb_read: Hg element assigned to C-gamma2 hydrogens of valine

pdb fragment:

ATOM      1  N   VAL     1     -16.669 -10.959  28.434  1.00  0.00
ATOM      2  H1  VAL     1     -17.186 -11.857  28.339  1.00  0.00
ATOM      3  H2  VAL     1     -17.171 -10.340  29.102  1.00  0.00
ATOM      4  H3  VAL     1     -16.611 -10.493  27.506  1.00  0.00
ATOM      5  CA  VAL     1     -15.289 -11.229  28.944  1.00  0.00
ATOM      6  HA  VAL     1     -15.385 -11.932  29.771  1.00  0.00
ATOM      7  CB  VAL     1     -14.375 -11.856  27.866  1.00  0.00
ATOM      8  HB  VAL     1     -14.217 -11.118  27.080  1.00  0.00
ATOM      9  CG1 VAL     1     -13.033 -12.232  28.471  1.00  0.00
ATOM     10 HG11 VAL     1     -12.398 -12.673  27.702  1.00  0.00
ATOM     11 HG12 VAL     1     -12.552 -11.340  28.872  1.00  0.00
ATOM     12 HG13 VAL     1     -13.185 -12.954  29.274  1.00  0.00
ATOM     13  CG2 VAL     1     -15.028 -13.077  27.254  1.00  0.00
ATOM     14 HG21 VAL     1     -14.368 -13.502  26.498  1.00  0.00
ATOM     15 HG22 VAL     1     -15.215 -13.818  28.031  1.00  0.00
ATOM     16 HG23 VAL     1     -15.973 -12.791  26.791  1.00  0.00
ATOM     17  C   VAL     1     -14.653  -9.919  29.386  1.00  0.00
ATOM     18  O   VAL     1     -14.750  -8.913  28.681  1.00  0.00
ATOM     19  N   VAL     2     -14.017  -9.937  30.554  1.00  0.00
ATOM     20  H   VAL     2     -13.943 -10.768  31.141  1.00  0.00
ATOM     21  CA  VAL     2     -13.360  -8.760  31.109  1.00  0.00
ATOM     22  HA  VAL     2     -13.357  -7.980  30.347  1.00  0.00
ATOM     23  CB  VAL     2     -14.104  -8.228  32.362  1.00  0.00
ATOM     24  HB  VAL     2     -14.122  -9.019  33.112  1.00  0.00
ATOM     25  CG1 VAL     2     -13.372  -7.021  32.952  1.00  0.00
ATOM     26 HG11 VAL     2     -13.910  -6.663  33.830  1.00  0.00
ATOM     27 HG12 VAL     2     -12.362  -7.313  33.239  1.00  0.00
ATOM     28 HG13 VAL     2     -13.322  -6.226  32.208  1.00  0.00
ATOM     29  CG2 VAL     2     -15.533  -7.860  32.012  1.00  0.00
ATOM     30 HG21 VAL     2     -16.041  -7.489  32.902  1.00  0.00
ATOM     31 HG22 VAL     2     -15.532  -7.086  31.245  1.00  0.00
ATOM     32 HG23 VAL     2     -16.054  -8.741  31.637  1.00  0.00
ATOM     33  C   VAL     2     -11.949  -9.145  31.526  1.00  0.00
ATOM     34  O   VAL     2     -11.743 -10.200  32.132  1.00  0.00
ATOM     35  N   VAL     3     -10.980  -8.318  31.150  1.00  0.00
ATOM     36  H   VAL     3     -11.139  -7.480  30.591  1.00  0.00
ATOM     37  CA  VAL     3      -9.584  -8.537  31.507  1.00  0.00
ATOM     38  HA  VAL     3      -9.459  -9.513  31.977  1.00  0.00
ATOM     39  CB  VAL     3      -8.664  -8.525  30.259  1.00  0.00
ATOM     40  HB  VAL     3      -8.780  -7.564  29.758  1.00  0.00
ATOM     41  CG1 VAL     3      -7.208  -8.688  30.664  1.00  0.00
ATOM     42 HG11 VAL     3      -6.579  -8.677  29.774  1.00  0.00
ATOM     43 HG12 VAL     3      -6.919  -7.868  31.322  1.00  0.00
ATOM     44 HG13 VAL     3      -7.080  -9.636  31.187  1.00  0.00
ATOM     45  CG2 VAL     3      -9.062  -9.632  29.304  1.00  0.00
ATOM     46 HG21 VAL     3      -8.408  -9.613  28.432  1.00  0.00
ATOM     47 HG22 VAL     3      -8.970 -10.596  29.805  1.00  0.00
ATOM     48 HG23 VAL     3     -10.094  -9.485  28.987  1.00  0.00
ATOM     49  C   VAL     3      -9.203  -7.386  32.446  1.00  0.00
ATOM     50  O   VAL     3      -9.320  -6.210  32.076  1.00  0.00

This fragment was generated by amber and seems to be complaint with PDB 3.0 nomenculature

Question: How to make a deep copy of gemmi.Structure in python

Hi,

Is there a way to make a copy of a gemmi.Structure object in python? I want to write a function that takes a structure as input, makes a copy (so the original is not affected), makes some modifications (e.g. adding residues) and returns the modified structure.

I've tried copy.copy and copy.deepcopy but both give me the error:
TypeError: can't pickle gemmi.Structure objects

Any help would be appreciated.

Cheers,
Paul

question about SpaceGroup GroupOp order

Hi,

first of all thank you for this great lib.

I have a question about the order of operations (triplets) e.g. for P 1 21/n 1 gemmi gives:
x,y,z (identity) -x,-y,-z -x+1/2,y+1/2,-z+1/2 x+1/2,-y+1/2,z+1/2

which is fine but not in the order given by the tables, where the order would be
(e.g. http://img.chem.ucl.ac.uk/sgp/large/014hy1.htm)
x,y,z -x+1/2,y+1/2,-z+1/2 -x,-y,-z x+1/2,-y+1/2,z+1/2

so how can I get the order corresponding to the tables?
(this may be an obvious/stupid question, but I am not a crystallographer)

Regards

[Feature Request] getstate and setstate__ for grid objects to enable multiprocessing

Being able to move grid objects between python processes would be great for enabling multiprocessing using normal python libraries.

Would also enable pickleing, which may be useful to some.

read_pdb unites TER\n-separted chains

I faced a problem reading .pdb files generated by tleap (from amber md package).

Reduced version of the .pdb file (4 protein chains):

CRYST1  116.952   70.342  107.488  90.00  90.00  90.00 P 1           1
ATOM      1  N   VAL     1     -16.669 -10.959  28.434  1.00  0.00
ATOM      2  H1  VAL     1     -17.186 -11.857  28.339  1.00  0.00
TER
ATOM   2081  N   VAL   139     -23.085 -10.959 -28.434  1.00  0.00
ATOM   2082  H1  VAL   139     -22.568 -11.857 -28.339  1.00  0.00
TER
ATOM   4161  N   VAL   277      23.085  12.972  28.434  1.00  0.00
ATOM   4162  H1  VAL   277      22.569  12.074  28.339  1.00  0.00
TER
ATOM   6241  N   VAL   415      16.670  12.972 -28.434  1.00  0.00
ATOM   6242  H1  VAL   415      17.186  12.074 -28.339  1.00  0.00
TER
END

Sample script:

import gemmi
st = gemmi.read_pdb("test.pdb")
for model in st:
    for chain in model:
        print(chain)

Expected:

<gemmi.Chain  with 1 res>
<gemmi.Chain  with 1 res>
<gemmi.Chain  with 1 res>
<gemmi.Chain  with 1 res>

Actual:

<gemmi.Chain  with 4 res>

PDB parsing is not an easy task. Most of troubles come from plenty of slightly varying (often undocumented) .pdb flavors. Every library choose subset of this zoo to support. Support of amber .pdb files may be outside of gemmi area of interest.

reciprocal space Grid

When a Grid representing a reciprocal space is generated by get_f_phi_on_grid, it applies a periodic boundary condition like a real space grid. I found this confusing. Is this intended?

Another related issue is that symmetrize_* applies real space, space group operations, not reciprocal space, point group operations (and relevant phase relationships).

Segmentation fault in a ContactSearch setup

Hi guys, thanks for this tool. I've encountered segmentation faults for a couple of entries with the latest version (as of writing this post) of gemmi from the master branch and I wonder if anyone could have a look what is wrong. I'm using it from python.

Setup

OS: MacOS: 10.15.4
python: 3.7.7

minimal code to reproduce the issue

import gemmi
path = '1to4-assembly-1_atom_site.cif.gz '
st = gemmi.read_structure(path)
st.setup_entities()
st.remove_ligands_and_waters()

cs = gemmi.ContactSearch(distance)
cs gemmi.ContactSearch.Ignore.SameChain
sub_cells = gemmi.SubCells(st[0], st.cell, 6)
sub_cells.populate() # here it blows up

It crashes for the assembly id 1 for all of the following structures:

1to4, 2gpc, 4xvv, 2hk3, 3n77, 3fbt, 3chi, 3esf, 2r3v, 4yiw, 2x4i, 5hm4, 3kbf, 4f2n, 3mux, 4whj
3zrd, 3glv, 3mnd, 4u04, 2goj, 1e59, 1tz3, 3clk, 2pmr, 3toe, 1y7y, 2xhf, 2wcu, 3p9x

You can download the assembly file from here: http://www.ebi.ac.uk/pdbe/static/entry/download/1to4-assembly-1_atom_site.cif.gz

gemmi validate: stopping early?

Does the validation only ever report the first instance of an issue for a given item? E.g. I know that a mmCIF file has several new _pdbx_diffrn_data_section_contents.content_type values, but only ever get one problem report back.

Is that intential or a buglet?

pypi wheel builds?

Could we have wheel releases on pypi?
I am currently setting up CI builds for a new project which will be using gemmi, and it looks like building gemmi takes about 4 minutes, which adds up to some significant time with a build matrix of 4 python environments on 3 platforms.

Matthews coeff., solvent content and number of copies in the ASU

Hi,
Thank you for this great package, I find it really useful. I was wondering if there is an easy way to find the matthews coefficient and the solvent content for a given mtz file (assuming we know the sequence). Basically I'm interested in estimating the number of copies in the ASU.
Thanks

cif.read() raises exception by a certain file

Hi,
I am trying to parse a cif file 1ue4. My code looks like this:

doc = cif.read("1ue4.cif")

By other cif files it works normally, but by this file it raises an exception: Wrong number of values in the loop

(In the previous version it worked.)

[Feature Request] FFT resolution parameter for mtz.transform_f_phi_to_map

Image processing real space density for many datasets is much easier when they are sampled to the same grid, but currently there is no easy way to do this that I can find: transform_f_phi_to_map often errors when given an exact size that is too small, and there isn't an easy way to find out what grid a map would bee fft'd to anyhow.

Also, because sometimes their preferred size is large, so being able to fft to the lowest common resolution would be helpful anyway when dealing with many datasets!

PanDDA in particular relies heavily on truncating datasets down to the lowest common resolution for its event finding, so this would be useful there!

Small molecule: no _atom_site_occupancy leads to uninitialized occupancy

File: smcif.hpp
Function: make_atomic_structure_from_block

In the case, when there is no occupancy pair defined in the cif file, it leads to the occ member of the AtomicStructure::Site being undefined / uninitialized. Maybe, for the sake of the high-level access, it would be better to set it to NAN by default? For now, I am checking if the file has _atom_site_occupancy.

Example:
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
H 1.0 1.0 1.0

ConnectionType.None is not accessible from python

ConnectionType.None cannot be used from python since None is a keyword.

Example:

import gemmi

print(gemmi.ConnectionType.Covale)
print(gemmi.ConnectionType.Disulf)
print(gemmi.ConnectionType.Hydrog)
print(gemmi.ConnectionType.MetalC)
print(gemmi.ConnectionType(4))
# print(gemmi.ConnectionType.None)

Syntax error (uncomment last line):

    print(gemmi.ConnectionType.None)
                                  ^
SyntaxError: invalid syntax

Current work-around: ConnectionType.None can be constructed as ConnectionType(4)

[Feature Request] RMSD for comparing small molecules

Having a function to calculate the RMSD between small molecules whose atoms are not necessarily listed in the same order seems like it would be potentially very useful, although it would need some graph matching and symmetry shenanigans!

Consistency of call signatures in gemmi.Op and vectorized apply_to_hkl() and phase_shift()

Hi --

I have been writing a package for representing and manipulating reflection tables in pandas dataframes, and have been using gemmi as a backend for some of the features related to IO of MTZ files and space group symmetry operations.

Currently, gemmi.Op.apply_to_hkl() takes HKL indices as a list/tuple; however, gemmi.Op.phase_shift() takes HKL indices as 3 subsequent int arguments. Should these two call signatures be made consistent by changing phase_shift() to take HKL indices as a list/tuple?

i.e. changing this:
phase_shift(self: gemmi.Op, h: int, k: int, l: int) -> float

to:
phase_shift(self: gemmi.Op, hkl: List[int[3]]) -> float

On a related note, it seems like it would be useful to vectorize these operations to work on lists/np.ndarrays of HKL indices. While it is easy to write a for-loop to handle it, it seems like a common enough use case that it might warrant overloading the functions to also handle call signatures like:

apply_to_hkl(self: gemmi.Op, hkl: List[ List[ int[3] ] ]) -> List[ List[ int[3] ] ]
and
phase_shift(self: gemmi.Op, hkl: List[ List[ int[3] ] ]) -> List[ float ]

These are really more of feature requests than "issues," but I look forward to your thoughts--

Thanks,
Jack

Reindex MTZ using python

Hi @wojdyr, thanks for the package, I find it really useful. I was wondering if there is a way to change the spacegroup of a gemmi.Mtz instance in python (including the reindexing of the reflections). For example, if I had a MTZ in P321 and I want to change it to P 31 2 1 I know it would possible to do mtz.spacegroup = gemmi.SpaceGroup('P 31 2 1') but I have the impression this is just changing some attribute in gemmi.Mtz and not reindexing the reflections. Thanks!

How can I cite gemmi?

How can I cite gemmi in a scientific publication?

Thanks

Grid.interpolate_value crashes Python

Hi, I noticed that the following program causes segmentation fault in Python 3.6.4.

import gemmi

fn_mtz = "4MAA_map_coeffs.mtz"

mtz = gemmi.read_mtz_file(fn_mtz)
grid = mtz.get_f_phi_on_grid('2Fo-Fc', 'PH2Fo-Fc', mtz.get_size_for_hkl(sample_rate=3.0))

real_space = gemmi.transform_f_phi_grid_to_map(grid)
coord = gemmi.Position(0, 7.57895, 56.0842)
print(real_space.interpolate_value(coord))

The problematic MTZ file was created as:

phenix.fetch_pdb --mtz 4MAA
phenix.maps reflection_data.file_name=4MAA.mtz 4MAA.pdb map_coefficients.map_type=2Fo-Fc

It is available here: https://drive.google.com/open?id=1p87TDu4n9N15EVljGGu8U1YbunefEOmj

GroupOps.epsilon_factor_without_centering([0, 0, 0]) is twice Sgtbx result for centric space groups

I am unsure if this is a bug in Gemmi or a bug in Sgtbx. Either way it is probably academic.

The following script can be used to confirm that Gemmi sometimes calculates an epsilon factor without centering which is twice the value given by Sgtbx. Oddly, this only applies to reflection 0,0,0 and affects all centric space groups.

#!/usr/bin/env cctbx.python
from cctbx import sgtbx
import gemmi

h = [0, 0, 0]


for s in list(sgtbx.space_group_symbol_iterator()):
    xhm  = s.universal_hermann_mauguin()
    sg   = sgtbx.space_group(s)
    sgtbx_epsilon = sg.epsilon(h)
    gemmi_go = gemmi.SpaceGroup(xhm).operations()
    gemmi_epsilon = gemmi_go.epsilon_factor_without_centering(h)
    if gemmi_epsilon != sgtbx_epsilon:
        centric = gemmi_go.is_centric()
        print(f"{xhm:>12}\t{centric}\t{gemmi_epsilon}\t{sgtbx_epsilon}")

Publish v0.3.7 on PyPI

Hi Marcin,

Thanks for the recent updates -- they have made our testing much easier. Would it be possible to push the most recent version to PyPI? I've seen it is on conda, but I was hoping to make sure that pip can reach it as well.

I also know that you just pushed a new release very recently, but please let me know if you are planning on pushing a v0.3.8 soon with the updates to gemmi.SpaceGroup.

Thanks again,
Jack

[Question] Read/Write model chain from mmCIF to mmCIF or PDB format.

Hi @wojdyr ,
First of all, thanks for this awesome tool !
Since two days ago I was reading the documentation and doing some testing in RCSB mmCIF files.
And when a was trying to get a particular model chain a few doubts came to me.

At the moment I did this to get all models:

doc = gemmi.cif.read_file('2mfe.cif')
block = doc.sole_block()

1 WAY

models = block.find('_atom_site.', ['group_PDB', 'id', 'type_symbol', 'label_atom_id', 'label_alt_id', 'label_comp_id', 'label_asym_id', 'label_entity_id', 'label_seq_id', 'pdbx_PDB_ins_code', 'Cartn_x', 'Cartn_y', 'Cartn_z', 'occupancy', 'B_iso_or_equiv', 'pdbx_formal_charge', 'auth_seq_id', 'auth_comp_id', 'auth_asym_id', 'auth_atom_id', 'pdbx_PDB_model_num'])

models
Out: <gemmi.cif.Table 65280 x 21>

2 WAY

models = block.find_mmcif_category('_atom_site.')

models
Out: <gemmi.cif.Table 65280 x 21>

3 WAY

models = block.find_loop('_atom_site.group_PDB').get_loop()

models
<gemmi.cif.Loop 65280 x 21>

Now I got a problem...
How I could save in mmCIF or PDB (style PDBx) format those models from chain "D" in separate files?

I'm stack with this poorly solution:

for row in models:
   if row[6] is 'D':
       print('%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s ' % tuple(row))

On the other hand, I was trying Biopython (v1.76) PDB class, which they get above result doing this:

from Bio.PDB import MMCIFParser() as p
from Bio.PDB import MMCIFIO() as io

file_path = "2mfe.cif"
pdb_code = file_path.split('.')[0]
chain = "D"
file = open(file_path, "r")
structure_file = p.get_structure(pdb_code, file)
for model in structure_file:
    filename = (pdb_code + "_" + str(model.id) + "_" + chain + ".cif")
    io.set_structure(model[chain])
    io.save(filename)

Biopython's results only have this loop data (missing '_atom_site.pdbx_formal_charge ', '_atom_site.auth_comp_id' and '_atom_site.auth_atom_id'):

data_pdb
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM ....
...
HETATM ....
#

Confirm the equivalence of different surfaces belong to different spacegroups which are related via the change-of-basis transformation.

Hi,

Based on the descriptions in this paper, see top left column on pp.3 for more info, I learned the following fact:

The (010) surface in C2/c is equivalent to the (001) surface in I2/b. And these two space groups are related through a change-of-basis transformation. An I2/b cell is with tetragonal lattice while a C2/c cell is related to the conventional monoclinic cell.

But The above description is very abstract and difficult to understand, so I want to know whether I can have a better understanding of the problem based on the help of gemmi?

Regards,
HY

Cannot read NH2 chem comp

The CCDs NH2, NH, GTE, DML and TML fail to parse.

gemmi seems to fall over on the _pdbx_chem_comp_identifier.identifier which has an unquoted string like $l^{2}-azane.

gemmi returns either "Wrong number of values in the loop" if the value is in a loop or "expected value" if in a pair.

I suspect it has to do with the unquoted string but since I am on the python side, all I get is a RunTimeError at gemmi.cif.read(file_path) method and nothing more.

<parent>.add_<child> functions ivalidate references

Molecular <parent>.add_<child> functions apparently invalidate early stored references. While invalidation is common in C++ realm it might be quite surprising behavior to python users. Probably it deserves to be documented.

MRE

import gemmi

pdb_string = """\
ATOM      1  N   LYS A  12       8.016  44.382 113.565  1.00 33.28           N
"""

st = gemmi.read_pdb_string(pdb_string)

model = st[0]
chain = model[0]
residue = chain[0]
atom = residue[0]

st.add_model(model)
# print(model) # uncomment to SIGSEGV

model.add_chain(chain)
# print(chain)  # uncomment to SIGSEGV

residue.add_atom(atom)
# print(atom)  # uncomment to SIGSEGV

Issue with find_shortest_path

find_shortest_path in chemcomp.hpp fails to find shortest path between ("C4", "O6", {"C5", "C6"}) and ("C5", "C2", {"C6", "O6"}) for acedrg-generated KDA, KDO, KDR, KME and KOT. Fix not urgent - can use workaround for now.

Inconsistent reciprocalspace ASU between gemmi and cctbx

Summary

As @kmdalton mentioned in #28, we have been implementing a few tests to make sure that our vectorized implementations of certain spacegroup-related operations in reciprocalspaceship are correct. In testing some code that maps Miller indices to the reciprocalspace ASU, we noticed some differences between our "reference implementations" in cctbx and gemmi.

We iterate over spacegroups using sgtbx.space_group_symbol_iterator(), which has 530 spacegroup settings, and use the extended Hermann-Mauguin for instantiating the gemmi.SpaceGroup. We are finding inconsistencies for 47/530 settings. This list of 47 spacegroups can be generated using the example below:

Example

reproduce_sg_inconsistency.py

from cctbx import sgtbx
from cctbx import miller
import gemmi
import numpy as np

def build_emptyMTZ(sg):
    """Build empty gemmi.MTZ"""
    m = gemmi.Mtz()
    m.spacegroup = sg
    m.add_dataset('crystal')
    m.add_column('H', 'H')
    m.add_column('K', 'H')
    m.add_column('L', 'H')
    m.add_column('M/ISYM', 'Y')
    return m
    
def gemmi_map2asu(H, symbol):
    """Map HKLs to reciprocal space ASU using gemmi"""
    xhm = symbol.universal_hermann_mauguin()
    sg = gemmi.SpaceGroup(xhm)
    m = build_emptyMTZ(sg)
    data = np.append(H, np.zeros((len(H), 1)), 1)
    m.set_data(data)
    m.switch_to_asu_hkl()
    h = m.column_with_label('H').array.astype(int)
    k = m.column_with_label('K').array.astype(int)
    l = m.column_with_label('L').array.astype(int)
    return np.vstack([h, k, l]).T
    
def cctbx_map2asu(H, symbol):
    """Map HKLs to reciprocal space ASU using cctbx"""
    sg  = sgtbx.space_group(symbol)
    asu = sgtbx.reciprocal_space_asu(sg.type())
    H_asu = np.vstack([miller.asym_index(sg, asu, i.tolist()).h() for i in H])
    return H_asu
    
hmin,hmax = -5, 5
H = np.mgrid[hmin:hmax+1:1,hmin:hmax+1:1,hmin:hmax+1:1].reshape((3, -1)).T

for symbol in sgtbx.space_group_symbol_iterator():
    xhm  = symbol.universal_hermann_mauguin()
    H_cctbx = cctbx_map2asu(H, symbol)
    H_gemmi = gemmi_map2asu(H, symbol)
    if not np.array_equal(H_cctbx, H_gemmi):
        print(xhm)

Output

Collapsed Output

I 1 2 1
I 1 1 2
I 2 1 1
P 1 n 1
P 1 1 n
P n 1 1
I 1 m 1
I 1 1 m
I m 1 1
A 1 n 1
I 1 a 1
I 1 c 1
B 1 1 n
I 1 1 b
A 1 1 n
I 1 1 a
C n 1 1
I c 1 1
B n 1 1
I b 1 1
I 1 2/m 1
I 1 1 2/m
I 2/m 1 1
P 1 2/n 1
P 1 1 2/n
P 2/n 1 1
P 1 21/n 1
P 1 1 21/n
P 21/n 1 1
A 1 2/n 1
I 1 2/a 1
I 1 2/c 1
B 1 1 2/n
I 1 1 2/b
A 1 1 2/n
I 1 1 2/a
C 2/n 1 1
I 2/c 1 1
B 2/n 1 1
I 2/b 1 1
R 3 :R
R -3 :R
R 3 2 :R
R 3 m :R
R 3 c :R
R -3 m :R
R -3 c :R

Single HKL example:

symbol = sgtbx.space_group_symbols("I 1 2 1") 
hkl = np.array([[-5, -5, 1]])
print(cctbx_map2asu(hkl, symbol)) # [[-5  5  1]]
print(gemmi_map2asu(hkl, symbol)) # [[ 5  5 -1]]

Issue

Is this expected behavior, or should gemmi and sgtbx both be using the same conventions for the ASU?

PDB read: peptide residues are marked as EntityType.Unknown

A consequence of this issue is removal of TER records in .pdb round trip which cannot be overwritten by any flags.

MRE:

import gemmi

st = gemmi.read_pdb_string("""\
ATOM      1  N   THR A   1     137.600 117.719 224.191  1.00107.30           N  
ATOM      2  CA  THR A   1     136.249 117.607 224.728  1.00121.14           C  
ATOM      3  C   THR A   1     136.208 116.647 225.914  1.00114.05           C  
ATOM      4  O   THR A   1     135.371 116.784 226.806  1.00103.45           O  
ATOM      5  CB  THR A   1     135.254 117.130 223.655  1.00127.19           C  
ATOM      6  H1  THR A   1     137.645 118.519 223.528  1.00107.30           H  
ATOM      7  H2  THR A   1     138.271 117.877 224.970  1.00107.30           H  
ATOM      8  H3  THR A   1     137.849 116.841 223.692  1.00107.30           H  
ATOM      9  HA  THR A   1     135.957 118.603 225.061  1.00121.14           H  
ATOM     10  N   ALA A   2     137.110 115.672 225.916  1.00117.01           N  
ATOM     11  CA  ALA A   2     137.250 114.762 227.047  1.00123.58           C  
ATOM     12  C   ALA A   2     138.018 115.456 228.166  1.00131.44           C  
ATOM     13  O   ALA A   2     137.725 115.279 229.359  1.00121.63           O  
ATOM     14  CB  ALA A   2     137.957 113.491 226.619  1.00112.08           C  
ATOM     15  H   ALA A   2     137.755 115.491 225.147  1.00117.01           H  
ATOM     16  HA  ALA A   2     136.260 114.489 227.414  1.00123.58           H  
ATOM     17  HB1 ALA A   2     138.054 112.823 227.475  1.00112.08           H  
ATOM     18  HB2 ALA A   2     137.379 112.999 225.837  1.00112.08           H  
ATOM     19  HB3 ALA A   2     138.948 113.737 226.237  1.00112.08           H  
TER
""")

print(st[0][0][0].entity_type)
print(st[0][0][1].entity_type)

output:

EntityType.Unknown
EntityType.Unknown

Issue reading/writing mmcif file

Hi Marcin,

I am having an issue with Gemmi to read/write mmcif files and I was wondering if you could tell me what I am doing wrong.

I have included a script that does the following:

Read cif file I obtained from the PDB into a gemmi structure
Write the structure to a new cif file
Read the new cif file back into gemmi in the same way.

import gemmi
    
def test(in_filename, out_filename):
    
    # Read the structure
    structure = gemmi.read_structure(in_filename)

    # Write the structure
    structure.make_mmcif_document().write_file(out_filename)

if __name__ == '__main__':
    test("4v5d.cif", "out1.cif")
    test("out1.cif", "out2.cif")

When I do this, I get the following error:

$ python test.py 
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    test("out1.cif", "out2.cif")
  File "test.py", line 6, in test
    structure = gemmi.read_structure(in_filename)
RuntimeError: out1.cif:149:0(2233): Wrong number of values in the loop

I have attached the script and the initial cif file here. file_and_script.zip

Thanks in advance for your help!

Python Bindings for add_row to handle ? and .

The set_mmcif_category has this code section

for (size_t col = 0; col != w; ++col) {
             size_t idx = col;
             for (auto handle : values[col]) {
               std::string& val = loop.values[idx];
               PyObject* ptr = handle.ptr();
               if (handle.is_none()) {
                 val = "?";
               } else if (ptr == Py_False) {
                 val = ".";
               } else if (ptr == Py_True) {
                 throw py::value_error("unexpected value True");
               } else if (raw || PyFloat_Check(ptr) || PyLong_Check(ptr)) {
                 val = py::str(handle);
               } else {
                 val = quote(py::str(handle));
               }
               idx += w;
             }
           }

For add_row, there is no such binding forcing the List to be a List[str]. Could we allow for automatic conversion if raw=False (raw can be defaulted to True to avoid breaking existing usage).

mmcif empty loop

Gemmi stops parsing mmcif file when encountered no values in loop. Below is the example from Refmac's output. This may be a violation of cif format and Refmac will be fixed, but can gemmi support reading this?

...
loop_
   _chem_comp_tor.comp_id
   _chem_comp_tor.id
   _chem_comp_tor.atom_id_1
   _chem_comp_tor.atom_id_2
   _chem_comp_tor.atom_id_3
   _chem_comp_tor.atom_id_4
   _chem_comp_tor.value_angle
   _chem_comp_tor.value_angle_esd
   _chem_comp_tor.period
var_1     FES      FE1  S2   FE2  S1          0.000      20.000   1
var_2     FES      FE2  S2   FE1  S1          0.000      20.000   1
var_3     FES      FE2  S1   FE1  S2          0.000      20.000   1
#
loop_
   _chem_comp_chir.comp_id
   _chem_comp_chir.id
   _chem_comp_chir.atom_id
   _chem_comp_chir.volume_flag
   _chem_comp_chir.volume_three
loop_
   _chem_comp_chir_atom.comp_id
   _chem_comp_chir_atom.chir_id
   _chem_comp_chir_atom.atom_id
#
data_comp_MA6
....

Space group 141 error

I'm quite new to gemmi and crystal summetry, so I apologize in advance is the following issue is just a non-sense but, when I retrieve retrieved sym_ops operations using gemmi for space group 141:

ops = gemmi.find_spacegroup_by_number(141).operations()
sym_ops = ops.sym_ops

Retrieved operations are:
x,y,z
-y,x+1/2,z+1/4
-x+1/2,-y+1/2,z+1/2
y+1/2,-x,z+3/4
x,-y+1/2,-z+1/4
-y,-x,-z
-x+1/2,y,-z+3/4
y+1/2,x+1/2,-z+1/2
-x,-y+1/2,-z+1/4
y,-x,-z
x+1/2,y,-z+3/4
-y+1/2,x+1/2,-z+1/2
-x,y,z
y,x+1/2,z+1/4
x+1/2,-y+1/2,z+1/2
-y+1/2,-x,z+3/4

However, those operations do not match with those listed for instance in the bilbao crystallographic database (https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-wp-list):

(x,y,z) (-x+1/2,-y,z+1/2) (-y+1/4,x+3/4,z+1/4) (y+1/4,-x+1/4,z+3/4)
(-x+1/2,y,-z+1/2) (x,-y,-z) (y+1/4,x+3/4,-z+1/4) (-y+1/4,-x+1/4,-z+3/4)
(-x,-y,-z) (x+1/2,y,-z+1/2) (y+3/4,-x+1/4,-z+3/4) (-y+3/4,x+3/4,-z+1/4)
(x+1/2,-y,z+1/2) (-x,y,z) (-y+3/4,-x+1/4,z+3/4) (y+3/4,x+3/4,z+1/4)

I was wondering if this is an error, as my calculation runs smoothly with bilbao's but not gemmi operations, or I am missing something important.

Thanks very much in advance.

Hexagonal crystals (R 3 -type spacegroups)

I ran into some confusion regarding the R 3 family of spacegroups when writing tests related to spacegroups in reciprocalspaceship. I understand that spacegroups such as R 3 can be represented in two different ways -- rhombohedral / hexagonal. However, when they are specified explicitly, it seems that GroupOps.find_centering() does not return a value for hexagonal representations:

import gemmi

r3 = gemmi.SpaceGroup("R 3")
r3r = gemmi.SpaceGroup("R 3:R")
r3h = gemmi.SpaceGroup("R 3:H")

print(r3.operations().find_centering())  # prints '\x00'
print(r3r.operations().find_centering()) # prints 'P'
print(r3h.operations().find_centering()) # prints'\x00'

I am only showing this for R 3:H, but it is also true for the 6 other related spacegroups. It seems to me that there should be a returned centering in these cases, although I can understand not including a centering in the slightly ambiguous case of R 3.

This also may be a misunderstanding of these spacegroups on my part, but the following symmetry operations are found for R 3:H: [<gemmi.Op("x,y,z")>, <gemmi.Op("-y,x-y,z")>, <gemmi.Op("-x+y,-x,z")>]. However, those symops match the rhombohedral setting here: http://img.chem.ucl.ac.uk/sgp/large/146bz1.htm, and vice versa for R 3:R. I am uncertain whether these two are mixed-up in gemmi, or whether I am mixing them up in looking at the space group diagrams.

Thanks for your help!
Jack

python FloatGrid <-> C++ Grid<float> Conversion

Curretly, python side functions like transform_f_phi_to_map return gemmi.FloatGrid, a python class rather than the underlying C++ Grid.

This seems problematic as there doesn't seem to be a easy way to pass grids generated on the python side back to C++ for more intensive operations like resampling.

I think this could be fixed relatively easily by making the underlying C++ object accessible as a property?

List of categories in block

Is it possible to get a list of categories in a block without looping? This is not so much a must have as it is a convenience function.

Use case: was trying to add a unit test for a function that removes categories and wanted to see categories before and after and confirm all the categories in the list have been removed. I had to use a list comprehension but thought it should be useful enough to have a convenience function

REVDAT entry converted to PDB format

in the write_remarks function in to_pdb.hpp , could you add the REVDAT entry. I've pasted some code that should work below.

std::string token;
std::istringstream revNum(st.get_info("_pdbx_audit_revision_history.ordinal"));
std::vector<std::string> revNums;
while(std::getline(revNum, token, ';'))
{
  revNums.push_back(token);
}

std::istringstream revDate(st.get_info("_pdbx_audit_revision_history.revision_date"));
std::vector<std::string> revDates;
while(std::getline(revDate, token, ';'))
{
  token.erase(std::remove_if(token.begin(), token.end(), ::isspace), token.end());
  revDates.push_back(token);
}

for(int i = (int) revNums.size() -1; i >= 0; --i)
{
  WRITEU("REVDAT %3s   %-9s %-51s",
         revNums[i].c_str(), revDates[i].c_str(), st.get_info("_entry.id").c_str());
}

and in mmcif.hpp

  std::string old_revnum_tag = "_database_PDB_rev.num";
  std::string new_revnum_tag = "_pdbx_audit_revision_history.ordinal";
  add_info(old_revnum_tag);
  add_info(new_revnum_tag);
  if (st.info.count(old_revnum_tag) == 1 && st.info.count(new_revnum_tag) == 0)
    st.info[new_revnum_tag] = st.info[old_revnum_tag];

  std::string old_revdate_tag = "_database_PDB_rev.date";
  std::string new_revdate_tag = "_pdbx_audit_revision_history.revision_date";
  add_info(old_revdate_tag);
  add_info(new_revdate_tag);
  if (st.info.count(old_revdate_tag) == 1 && st.info.count(new_revdate_tag) == 0)
    st.info[new_revdate_tag] = st.info[old_revdate_tag];

MTZ resolution with modified data

For the gemmi.Mtz class, would it be possible to have min_1_d2 and max_1_d2 (and hence resolution_low and resolution_high) calculated using the current data? Currently, if I create a new MTZ object they return nan. It is possible to get the values by other means, but it isn't obvious why some methods don't work as expected. I'm guessing this also applies if you read an MTZ and modify the data.

Here is some code to reproduce:

import gemmi
import numpy


def create_mtz():
    """Create a new MTZ object as described in the documentation"""
    mtz = gemmi.Mtz()
    mtz.spacegroup = gemmi.find_spacegroup_by_name("P 21 21 2")
    mtz.cell.set(77.7, 149.5, 62.4, 90, 90, 90)
    mtz.add_dataset("HKL_base")
    for label in ["H", "K", "L"]:
        mtz.add_column(label, "H")
    mtz.add_dataset("synthetic")
    mtz.add_column("F", "F")
    mtz.add_column("SIGF", "Q")
    mtz.add_column("FREE", "I", dataset_id=0, pos=3)
    data = numpy.array(
        [[4, 13, 8, 1, 453.9, 19.12], [4, 13, 9, 0, 102.0, 27.31]], numpy.float
    )
    mtz.set_data(data)
    return mtz


def test():
    """Print different MTZ properties"""
    mtz = create_mtz()
    print(mtz.nreflections)  # 2
    print(mtz.make_d_array())  # [6.1258473 5.678323 ]
    print(mtz.min_1_d2)  # nan
    print(mtz.max_1_d2)  # nan
    print(mtz.resolution_low())  # nan
    print(mtz.resolution_high())  # nan


if __name__ == "__main__":
    test()

gemmi validate: use of quotes

When checking against a controlled vocabulary, the list of allowed values is given as a plain, comma-separated list. This makes it very hard to read and doesn't take into account that the allowed values themselves might have a comma (if they are character items).

It would be nice to get (shortened)

some.cif:16 in data_1: _pdbx_diffrn_data_section_contents.content_type: ''X-Ray structure factor amplitudes(+)'' is not one of:
"X-ray structure factor amplitudes"
"X-ray structure factor intensities, unmerged"

(each allowed value on new line) instead of the current one-liner:

some.cif:16 in data_1: _pdbx_diffrn_data_section_contents.content_type: ''X-Ray structure factor amplitudes(+)'' is not one of: X-ray structure factor amplitudes, X-ray structure factor intensities, unmerged.

Question about as_json

Hello, I would like to ask a question about the as_json method. I believe the default behaviour is for it to lowercase the names (line 24 in to_json.hpp) but I was wondering if there is any way in python to be able to set this behaviour to false? I see there is the mmjson value but this also changes other settings? Thanks in advance for any tips advice you can provide.

support for multithreading in python?

Hi @wojdyr,

I wonder if there is any plan on considering supporting multithreading in python. Presently, when I try to use Python's multiprocessing module it crashes with message similar to:

TypeError: can't pickle gemmi.ResidueSpan objects

I triad approach described here to add custom pickling to the class shown here: , but only managed to make another step before it crashed with

TypeError: gemmi.ResidueSpan: No constructor defined!

So I'm not sure if this is something that needs to be supportd by the library, or if I'm doing something wrong. Any help would be greatly appreciated!

edit: I've just noticed #13, but have not seen any comments there...