
cif-ontology's Introduction

CIF Ontology


This repository provides an ontologisation of the CIF Dictionary Definition Language (DDLm) and the CIF core dictionary by IUCr. The development version of these dictionaries can be found in the COMCIFS/cif_core GitHub repository.

The CIF Ontology does not depend on any upper ontology. However, the EMMC crystallography task group provides an EMMO-based Crystallography Domain Ontology, which builds on both the CIF Ontology and EMMO.

Obtaining CIF-ontology

A table with available releases can be found in the documentation.

Manually generating the CIF core ontology

It is also possible to clone this repository and generate the CIF ontology.

First clone this repository with

git clone https://github.com/emmo-repo/CIF-ontology.git

and then run the dic2owl tool following the instructions in the dic2owl/README.md file.
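As a sketch, the same conversion can be scripted from Python. It assumes only the command-line form that appears in the issue reports on this page (`dic2owl <dictionary> -o <output.ttl>`); the helper names `dic2owl_command` and `generate` are hypothetical and not part of dic2owl.

```python
import shutil
import subprocess
from pathlib import Path


def dic2owl_command(dicfile: Path, ttlfile: Path) -> list:
    """Build the CLI call in the form used in the issue reports."""
    return ["dic2owl", str(dicfile), "-o", str(ttlfile)]


def generate(dicfile: Path, ttlfile: Path) -> None:
    """Run dic2owl if it is on PATH; raise if it is missing or fails."""
    if shutil.which("dic2owl") is None:
        raise RuntimeError("dic2owl not found; install it as described in dic2owl/README.md")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(dic2owl_command(dicfile, ttlfile), check=True)
```

For example, `generate(Path("cif_core.dic"), Path("cif_core.ttl"))` writes the Turtle file next to the dictionary.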

Attributions and credits

Contributors

  • Jesper Friis, SINTEF
  • James Hester
  • Casper Welzel Andersen, EPFL
  • Saulius Grazulis
  • Rickard Armiento
  • Emanuele Ghedini
  • Francesca Lønstad Bleken, SINTEF
  • Joana Morgado, Fraunhofer IWM
  • Stuart Chalk

Contributing projects

  • Demystify ontologies - Internal project at SINTEF
  • MarketPlace; Grant Agreement No: 760173
  • OntoTrans; Grant Agreement No: 862136
  • BIG-MAP; Grant Agreement No: 957189

License

The CIF ontology is released under the Creative Commons Attribution 4.0 International license (CC BY 4.0). See also the LICENSE file.

cif-ontology's People

Contributors

ahashibon, casperwa, dependabot[bot], emanueleghedini, francescalb, jesper-friis, sauliusg, vaitkus

cif-ontology's Issues

Running dic2owl on the latest ddlm version of cif core I get an error

Below is the output I get when trying to process the latest cif_core DDLm dictionary. I assume this is a dic2owl error, but it might be due to an update to the core dictionary. Anyway, let me know if there is a way I can ignore or get around this issue, as I realize that both dic2owl and the DDLm dictionaries are in development...

((scidata_cif2) ) n00002621@chalk3 scidata_cif2 % dic2owl static/cifdics/cif_core.dic -o static/onts/cif_core.ttl
CORE_DIC3.2.0 is a DDLm dictionary
Import mode All applied to following frames
['_geom_torsion.site_symmetry_1', '_diffrn_orient_refln.angle_phi', '_diffrn_standard_refln.index_h', '_atom_site_aniso.u_13_su', '_diffrn_orient_refln.angle_chi_su', '_atom_type_scat.hi_ang_fox_c3', '_diffrn_orient_refln.angle_psi', '_display_colour.red', '_atom_site_aniso.u_11', '_atom_sites_fract_transform.mat_21_su', '_atom_site_aniso.b_13', '_atom_sites_cartn_transform.mat_13_su', '_cell.angle_gamma_su', '_refln.f_squared_calc_su', '_atom_sites_fract_transform.mat_32_su', '_atom_type_scat.cromer_mann_a3', '_model_site.display_colour', '_atom_type.radius_bond', '_atom_site_aniso.u_33', '_geom_torsion.site_symmetry_4', '_atom_site.fract_z', '_diffrn_reflns_transf_matrix.21', '_diffrn_scale_group.i_net_su', '_diffrn_refln.intensity_total_su', '_atom_sites_cartn_transform.vec_1', '_atom_site_aniso.u_12_su', '_atom_sites_fract_transform.mat_21', '_cell.length_c', '_chemical.enantioexcess_bulk_su', '_cell.length_b_su', '_exptl_crystal_face.diffr_kappa', '_chemical.enantioexcess_crystal_su', '_geom_bond.site_symmetry_2', '_diffrn_reflns.limit_h_min', '_refine_ls.restrained_s_all_su', '_exptl_crystal.size_max_su', '_geom_angle.atom_site_label_3', '_refln.phase_meas_su', '_atom_site_aniso.b_13_su', '_atom_site_aniso.b_12', '_geom_hbond.atom_site_label_d', '_geom_bond.site_symmetry_1', '_atom_sites_cartn_transform.mat_11', '_atom_sites_cartn_transform.mat_13', '_atom_sites_fract_transform.vec_2', '_atom_site.fract_x_su', '_atom_sites_fract_transform.vec_1_su', '_diffrn_refln.intensity_peak_su', '_display_colour.hue', '_atom_site_aniso.u_12', '_exptl_crystal_face.diffr_psi_su', '_geom_angle.site_symmetry_3', '_refln.a_calc_su', '_atom_sites_cartn_transform.mat_32_su', '_diffrn_reflns.limit_k_max', '_atom_type_scat.dispersion_real_cu', '_geom_angle.atom_site_label_1', '_exptl.transmission_factor_min_su', '_cell.angle_gamma', '_refine_ls.restrained_s_gt_su', '_atom_sites_cartn_transform.vec_3_su', '_atom_type_scat.hi_ang_fox_c0', '_diffrn_orient_refln.angle_omega', 
'_refln.b_calc_su', '_diffrn_orient_refln.index_h', '_diffrn_orient_refln.angle_kappa', '_cell.angle_alpha_su', '_diffrn_reflns_transf_matrix.33', '_exptl_crystal.size_mid_su', '_atom_site.label_component_4', '_exptl_crystal_face.index_k', '_diffrn_reflns.limit_l_min', '_atom_type.atomic_number', '_atom_sites_cartn_transform.mat_11_su', '_diffrn_orient_refln.angle_omega_su', '_diffrn_standard_refln.index_k', '_atom_site_aniso.u_22_su', '_atom_site.label', '_atom_sites_cartn_transform.vec_2_su', '_geom_bond.atom_site_label_2', '_cell.angle_beta_su', '_atom_sites_fract_transform.mat_32', '_atom_sites_fract_transform.mat_23', '_atom_sites_fract_transform.vec_2_su', '_diffrn_standards.decay_percent_su', '_atom_site_aniso.u_23', '_reflns.limit_h_min', '_atom_sites_fract_transform.mat_31', '_atom_site_aniso.b_23_su', '_atom_type_scat.cromer_mann_a4', '_diffrn_refln.angle_psi', '_atom_sites_cartn_transform.mat_31_su', '_diffrn_reflns.limit_l_max', '_atom_site.cartn_y', '_geom_hbond.atom_site_label_h', '_geom_hbond.site_symmetry_d', '_diffrn_reflns_transf_matrix.13', '_atom_sites_fract_transform.vec_3_su', '_diffrn_orient_refln.angle_phi_su', '_cell_measurement_refln.index_l', '_atom_site_aniso.b_22', '_diffrn_refln.angle_chi', '_exptl_crystal_face.diffr_phi_su', '_geom_angle.site_symmetry_1', '_refln.a_meas_su', '_atom_sites_fract_transform.mat_11', '_atom_site.fract_y_su', '_cell_measurement_refln.index_h', '_geom_contact.site_symmetry_2', '_atom_type.electron_count', '_refln.intensity_calc_su', '_atom_type_scat.cromer_mann_c', '_diffrn_orient_matrix.ub_21', '_atom_sites_fract_transform.mat_11_su', '_atom_type_scat.cromer_mann_b1', '_atom_site_aniso.u_33_su', '_atom_sites_fract_transform.mat_13', '_exptl_crystal.size_min_su', '_atom_analytical_mass_loss.temperature_su', '_atom_sites_cartn_transform.vec_3', '_geom_hbond.site_symmetry_h', '_exptl.transmission_factor_max_su', '_atom_sites_fract_transform.mat_22_su', '_atom_site.fract_y', '_reflns.limit_k_min', 
'_exptl_crystal.size_length_su', '_atom_analytical_mass_loss.percent_su', '_reflns.limit_h_max', '_diffrn_refln.index_k', '_diffrn_refln.intensity_bg_1_su', '_atom_sites_cartn_transform.mat_22', '_atom_type.atomic_mass', '_atom_sites_fract_transform.mat_33_su', '_atom_site.label_component_1', '_atom_type_scat.dispersion_imag_mo', '_atom_site_aniso.b_11_su', '_atom_sites_cartn_transform.vec_2', '_display_colour.blue', '_cell.angle_beta', '_diffrn_reflns_transf_matrix.11', '_atom_site.fract_z_su', '_diffrn_orient_refln.index_k', '_geom_torsion.atom_site_label_1', '_diffrn_refln.counts_net_su', '_diffrn_reflns_transf_matrix.12', '_atom_type_scat.cromer_mann_b3', '_refln.f_calc_su', '_atom_site.label_component_6', '_diffrn_refln.intensity_net_su', '_geom_hbond.site_symmetry_a', '_diffrn_reflns_transf_matrix.23', '_diffrn_orient_matrix.ub_22', '_atom_type_scat.cromer_mann_b4', '_diffrn_refln.angle_omega', '_diffrn_source.target', '_atom_sites_cartn_transform.vec_1_su', '_atom_site.cartn_x_su', '_database_related.database_id', '_exptl_crystal_appearance.hue', '_diffrn_refln.index_l', '_geom_torsion.atom_site_label_3', '_exptl_crystal_face.perp_dist_su', '_chemical.melting_point_su', '_geom_torsion.atom_site_label_2', '_atom_sites_fract_transform.mat_22', '_atom_site_aniso.b_33_su', '_atom_type_scat.hi_ang_fox_c1', '_atom_site_aniso.b_23', '_cell.length_c_su', '_cell.length_a_su', '_refln.index_l', '_cell.length_a', '_diffrn_reflns.limit_k_min', '_refine_ls.f_calc_precision_su', '_atom_type.element_symbol', '_diffrn_orient_refln.angle_kappa_su', '_exptl_crystal_face.diffr_kappa_su', '_exptl_crystal_face.index_h', '_diffrn_orient_matrix.ub_33', '_geom_contact.atom_site_label_1', '_atom_sites_cartn_transform.mat_33_su', '_diffrn_orient_matrix.ub_31', '_atom_sites_cartn_transform.mat_12', '_atom_site_aniso.u_13', '_diffrn_refln.angle_phi', '_atom_type_scat.cromer_mann_b2', '_atom_sites_fract_transform.mat_33', '_atom_sites_fract_transform.mat_31_su', 
'_space_group.name_schoenflies', '_diffrn_orient_matrix.ub_13', '_exptl_crystal_face.diffr_chi_su', '_reflns.limit_k_max', '_atom_sites_fract_transform.mat_23_su', '_atom_site.cartn_z_su', '_atom_sites_fract_transform.vec_1', '_exptl_crystal_face.diffr_chi', '_atom_sites_fract_transform.mat_12', '_refln.index_k', '_atom_site_aniso.u_11_su', '_reflns_scale.meas_f_squared_su', '_atom_site_aniso.b_33', '_exptl_crystal.density_diffrn_su', '_atom_site.cartn_z', '_atom_sites_cartn_transform.mat_12_su', '_display_colour.green', '_geom_torsion.site_symmetry_3', '_atom_type_scat.cromer_mann_a1', '_atom_sites_fract_transform.mat_13_su', '_reflns.limit_l_max', '_atom_sites_cartn_transform.mat_23_su', '_atom_sites_fract_transform.mat_12_su', '_chemical_formula.weight_meas_su', '_atom_sites_cartn_transform.mat_22_su', '_atom_type_scat.hi_ang_fox_c2', '_diffrn_orient_refln.angle_psi_su', '_atom_sites_cartn_transform.mat_31', '_atom_site.cartn_y_su', '_atom_sites_cartn_transform.mat_23', '_diffrn_refln.angle_kappa', '_diffrn_orient_matrix.ub_32', '_refln.b_meas_su', '_diffrn_refln.intensity_bg_2_su', '_cell_measurement_refln.theta_su', '_diffrn_reflns_transf_matrix.31', '_exptl_crystal_face.diffr_psi', '_exptl_crystal.size_rad_su', '_geom_torsion.atom_site_label_4', '_geom_angle.site_symmetry_2', '_geom_bond.valence_su', '_geom_bond.atom_site_label_1', '_geom_hbond.atom_site_label_a', '_diffrn_orient_matrix.ub_23', '_atom_site.label_component_2', '_atom_analytical.chemical_species_mass_percent_su', '_atom_sites_fract_transform.vec_3', '_diffrn_orient_matrix.ub_12', '_atom_type_scat.dispersion_imag_cu', '_atom_site.wyckoff_symbol', '_cell.length_b', '_exptl_crystal_face.diffr_phi', '_atom_site_aniso.b_11', '_atom_sites_cartn_transform.mat_21_su', '_atom_site.cartn_x', '_atom_site.label_component_3', '_diffrn_orient_refln.angle_theta', '_diffrn_orient_refln.angle_theta_su', '_diffrn_refln.angle_theta', '_diffrn_orient_refln.index_l', '_diffrn_reflns_transf_matrix.22', 
'_atom_sites_cartn_transform.mat_33', '_space_group.name_h-m_ref', '_atom_type_scat.cromer_mann_a2', '_geom_torsion.site_symmetry_2', '_atom_analytical.analyte_mass_percent_su', '_diffrn_standard_refln.index_l', '_atom_sites_cartn_transform.mat_21', '_atom_type_scat.dispersion_real_mo', '_reflns.limit_l_min', '_diffrn_orient_matrix.ub_11', '_atom_site_aniso.b_22_su', '_atom_type.display_colour', '_space_group_wyckoff.letter', '_atom_site.fract_x', '_refln.f_complex_su', '_reflns_scale.meas_intensity_su', '_diffrn_refln.index_h', '_reflns_scale.meas_f_su', '_refln.index_h', '_refln.phase_calc_su', '_atom_site.label_component_5', '_atom_site.label_component_0', '_diffrn_reflns_transf_matrix.32', '_diffrn_reflns.limit_h_max', '_atom_sites_cartn_transform.mat_32', '_atom_site_aniso.u_22', '_atom_site_aniso.u_23_su', '_atom_type.analytical_mass_percent_su', '_atom_site_aniso.b_12_su', '_atom_analytical.analyte', '_diffrn_orient_refln.angle_chi', '_refine_ls.goodness_of_fit_ref_su', '_geom_contact.site_symmetry_1', '_exptl_crystal_face.index_l', '_model_site.symop', '_diffrn_standards.scale_su_average_su', '_geom_contact.atom_site_label_2', '_cell.angle_alpha', '_cell_measurement_refln.index_k']
TEMPL_ATTR1.4.10 is a DDLm dictionary
Import mode All applied to following frames
[]
Loopable cats:[]
Expansion list:{}
Expansion cat/obj values: {}
Keys for categories{}
Added https://www.iucr.org/__data/iucr/cif/dictionaries/templ_attr.cif to cached dictionaries
COM_VAL1.4.8 is a DDLm dictionary
Import mode All applied to following frames
[]
Loopable cats:[]
Expansion list:{}
Expansion cat/obj values: {}
Keys for categories{}
Added https://www.iucr.org/__data/iucr/cif/dictionaries/templ_enum.cif to cached dictionaries
Loopable cats:['atom_site_aniso', 'diffrn_radiation_wavelength', 'exptl_crystal_face', 'atom_analytical', 'atom_analytical_source', 'citation_author', 'publ_manuscript_incl_extra', 'geom_torsion', 'geom_hbond', 'geom_angle', 'diffrn_refln', 'citation_editor', 'reflns_shell', 'diffrn_orient_refln', 'diffrn_standard_refln', 'publ_contact_author', 'geom_bond', 'space_group_wyckoff', 'diffrn_reflns_class', 'valence_param', 'diffrn_scale_group', 'publ_body', 'diffrn_attenuator', 'reflns_scale', 'atom_type_scat', 'chemical_conn_bond', 'atom_analytical_mass_loss', 'journal_index', 'space_group_symop', 'publ_author', 'database_related', 'audit_author_role', 'geom_contact', 'refine_ls_class', 'valence_ref', 'atom_site', 'audit_contact_author', 'audit_author', 'display_colour', 'atom_type', 'space_group_generator', 'chemical_conn_atom', 'cell_measurement_refln', 'audit_link', 'audit_support', 'audit_conform', 'reflns_class', 'citation', 'refln', 'model_site']
Expansion list:{'atom_analytical': ['atom_analytical_source', 'atom_analytical_mass_loss'], 'atom_site': ['atom_site_aniso'], 'atom_type': ['atom_type_scat']}
Expansion cat/obj values: {('atom_analytical', 'technique'): ['_atom_analytical_source.technique'], ('atom_analytical', 'equipment_make'): ['_atom_analytical_source.equipment_make'], ('atom_analytical', 'special_details'): ['_atom_analytical_mass_loss.special_details'], ('atom_analytical', 'analysis_temperature_su'): ['_atom_analytical_mass_loss.temperature_su'], ('atom_analytical', 'percent_su'): ['_atom_analytical_mass_loss.percent_su'], ('atom_analytical', 'meas_id'): ['_atom_analytical_mass_loss.meas_id'], ('atom_analytical', 'temperature'): ['_atom_analytical_mass_loss.temperature'], ('atom_analytical', 'percent'): ['_atom_analytical_mass_loss.percent'], ('atom_site', 'u_13_su'): ['_atom_site_aniso.U_13_su'], ('atom_site', 'label'): ['_atom_site_aniso.label'], ('atom_site', 'matrix_u'): ['_atom_site_aniso.matrix_U'], ('atom_site', 'u_11'): ['_atom_site_aniso.U_11'], ('atom_site', 'b_13'): ['_atom_site_aniso.B_13'], ('atom_site', 'type_symbol'): ['_atom_site_aniso.type_symbol'], ('atom_site', 'u_33'): ['_atom_site_aniso.U_33'], ('atom_site', 'u_12_su'): ['_atom_site_aniso.U_12_su'], ('atom_site', 'b_13_su'): ['_atom_site_aniso.B_13_su'], ('atom_site', 'b_12'): ['_atom_site_aniso.B_12'], ('atom_site', 'u_12'): ['_atom_site_aniso.U_12'], ('atom_site', 'ratio'): ['_atom_site_aniso.ratio'], ('atom_site', 'u_22_su'): ['_atom_site_aniso.U_22_su'], ('atom_site', 'u_23'): ['_atom_site_aniso.U_23'], ('atom_site', 'b_23_su'): ['_atom_site_aniso.B_23_su'], ('atom_site', 'matrix_b_su'): ['_atom_site_aniso.matrix_B_su'], ('atom_site', 'b_22'): ['_atom_site_aniso.B_22'], ('atom_site', 'u_33_su'): ['_atom_site_aniso.U_33_su'], ('atom_site', 'b_11_su'): ['_atom_site_aniso.B_11_su'], ('atom_site', 'b_33_su'): ['_atom_site_aniso.B_33_su'], ('atom_site', 'b_23'): ['_atom_site_aniso.B_23'], ('atom_site', 'u_13'): ['_atom_site_aniso.U_13'], ('atom_site', 'u_11_su'): ['_atom_site_aniso.U_11_su'], ('atom_site', 'b_33'): ['_atom_site_aniso.B_33'], ('atom_site', 'matrix_b'): 
['_atom_site_aniso.matrix_B'], ('atom_site', 'b_11'): ['_atom_site_aniso.B_11'], ('atom_site', 'b_22_su'): ['_atom_site_aniso.B_22_su'], ('atom_site', 'matrix_u_su'): ['_atom_site_aniso.matrix_U_su'], ('atom_site', 'u_22'): ['_atom_site_aniso.U_22'], ('atom_site', 'u_23_su'): ['_atom_site_aniso.U_23_su'], ('atom_site', 'b_12_su'): ['_atom_site_aniso.B_12_su'], ('atom_type', 'hi_ang_fox_c3'): ['_atom_type_scat.hi_ang_Fox_c3'], ('atom_type', 'cromer_mann_a3'): ['_atom_type_scat.Cromer_Mann_a3'], ('atom_type', 'dispersion_imag'): ['_atom_type_scat.dispersion_imag'], ('atom_type', 'dispersion_real_cu'): ['_atom_type_scat.dispersion_real_Cu'], ('atom_type', 'symbol'): ['_atom_type_scat.symbol'], ('atom_type', 'hi_ang_fox_c0'): ['_atom_type_scat.hi_ang_Fox_c0'], ('atom_type', 'source'): ['_atom_type_scat.source'], ('atom_type', 'cromer_mann_a4'): ['_atom_type_scat.Cromer_Mann_a4'], ('atom_type', 'cromer_mann_c'): ['_atom_type_scat.Cromer_Mann_c'], ('atom_type', 'cromer_mann_b1'): ['_atom_type_scat.Cromer_Mann_b1'], ('atom_type', 'dispersion_imag_mo'): ['_atom_type_scat.dispersion_imag_Mo'], ('atom_type', 'cromer_mann_b3'): ['_atom_type_scat.Cromer_Mann_b3'], ('atom_type', 'cromer_mann_b4'): ['_atom_type_scat.Cromer_Mann_b4'], ('atom_type', 'versus_stol_list'): ['_atom_type_scat.versus_stol_list'], ('atom_type', 'hi_ang_fox_c1'): ['_atom_type_scat.hi_ang_Fox_c1'], ('atom_type', 'hi_ang_fox_coeffs'): ['_atom_type_scat.hi_ang_Fox_coeffs'], ('atom_type', 'cromer_mann_b2'): ['_atom_type_scat.Cromer_Mann_b2'], ('atom_type', 'dispersion_source'): ['_atom_type_scat.dispersion_source'], ('atom_type', 'length_neutron'): ['_atom_type_scat.length_neutron'], ('atom_type', 'cromer_mann_a1'): ['_atom_type_scat.Cromer_Mann_a1'], ('atom_type', 'dispersion_real'): ['_atom_type_scat.dispersion_real'], ('atom_type', 'hi_ang_fox_c2'): ['_atom_type_scat.hi_ang_Fox_c2'], ('atom_type', 'dispersion_imag_cu'): ['_atom_type_scat.dispersion_imag_Cu'], ('atom_type', 'dispersion'): 
['_atom_type_scat.dispersion'], ('atom_type', 'cromer_mann_a2'): ['_atom_type_scat.Cromer_Mann_a2'], ('atom_type', 'cromer_mann_coeffs'): ['_atom_type_scat.Cromer_Mann_coeffs'], ('atom_type', 'dispersion_real_mo'): ['_atom_type_scat.dispersion_real_Mo']}
Keys for categories{'atom_site_aniso': [['_atom_site_aniso.label']], 'diffrn_radiation_wavelength': [['_diffrn_radiation_wavelength.id']], 'exptl_crystal_face': [['_exptl_crystal_face.index_h', '_exptl_crystal_face.index_k', '_exptl_crystal_face.index_l']], 'atom_analytical': [['_atom_analytical.id'], ['_atom_analytical_source.id'], ['_atom_analytical_mass_loss.id']], 'atom_analytical_source': [['_atom_analytical_source.id']], 'citation_author': [['_citation_author.citation_id', '_citation_author.ordinal']], 'publ_manuscript_incl_extra': [['_publ_manuscript_incl_extra.item']], 'geom_torsion': [['_geom_torsion.atom_site_label_1', '_geom_torsion.atom_site_label_2', '_geom_torsion.atom_site_label_3', '_geom_torsion.atom_site_label_4', '_geom_torsion.site_symmetry_1', '_geom_torsion.site_symmetry_2', '_geom_torsion.site_symmetry_3', '_geom_torsion.site_symmetry_4']], 'geom_hbond': [['_geom_hbond.atom_site_label_D', '_geom_hbond.atom_site_label_H', '_geom_hbond.atom_site_label_A', '_geom_hbond.site_symmetry_D', '_geom_hbond.site_symmetry_H', '_geom_hbond.site_symmetry_A']], 'geom_angle': [['_geom_angle.atom_site_label_1', '_geom_angle.atom_site_label_2', '_geom_angle.atom_site_label_3', '_geom_angle.site_symmetry_1', '_geom_angle.site_symmetry_2', '_geom_angle.site_symmetry_3']], 'diffrn_refln': [['_diffrn_refln.hkl']], 'citation_editor': [['_citation_editor.citation_id', '_citation_editor.ordinal']], 'reflns_shell': [['_reflns_shell.d_res_low', '_reflns_shell.d_res_high']], 'diffrn_orient_refln': [['_diffrn_orient_refln.index_h', '_diffrn_orient_refln.index_k', '_diffrn_orient_refln.index_l']], 'diffrn_standard_refln': [['_diffrn_standard_refln.index_h', '_diffrn_standard_refln.index_k', '_diffrn_standard_refln.index_l']], 'publ_contact_author': [['_publ_contact_author.id']], 'geom_bond': [['_geom_bond.atom_site_label_1', '_geom_bond.atom_site_label_2', '_geom_bond.site_symmetry_1', '_geom_bond.site_symmetry_2']], 'space_group_wyckoff': [['_space_group_Wyckoff.id']], 
'diffrn_reflns_class': [['_diffrn_reflns_class.code']], 'valence_param': [['_valence_param.id']], 'diffrn_scale_group': [['_diffrn_scale_group.code']], 'publ_body': [['_publ_body.label']], 'diffrn_attenuator': [['_diffrn_attenuator.code']], 'reflns_scale': [['_reflns_scale.group_code']], 'atom_type_scat': [['_atom_type_scat.symbol']], 'chemical_conn_bond': [['_chemical_conn_bond.atom_1', '_chemical_conn_bond.atom_2']], 'atom_analytical_mass_loss': [['_atom_analytical_mass_loss.id']], 'journal_index': [['_journal_index.id']], 'space_group_symop': [['_space_group_symop.id']], 'publ_author': [['_publ_author.id']], 'database_related': [['_database_related.id']], 'audit_author_role': [['_audit_author_role.id', '_audit_author_role.role']], 'geom_contact': [['_geom_contact.atom_site_label_1', '_geom_contact.atom_site_label_2', '_geom_contact.site_symmetry_1', '_geom_contact.site_symmetry_2']], 'refine_ls_class': [['_refine_ls_class.code']], 'valence_ref': [['_valence_ref.id']], 'atom_site': [['_atom_site.label'], ['_atom_site_aniso.label']], 'audit_contact_author': [['_audit_contact_author.id']], 'audit_author': [['_audit_author.id']], 'display_colour': [['_display_colour.hue']], 'atom_type': [['_atom_type.symbol'], ['_atom_type_scat.symbol']], 'space_group_generator': [['_space_group_generator.key']], 'chemical_conn_atom': [['_chemical_conn_atom.number']], 'cell_measurement_refln': [['_cell_measurement_refln.index_h', '_cell_measurement_refln.index_k', '_cell_measurement_refln.index_l']], 'audit_link': [['_audit_link.block_code']], 'audit_support': [['_audit_support.id']], 'audit_conform': [['_audit_conform.dict_name']], 'reflns_class': [['_reflns_class.code']], 'citation': [['_citation.id']], 'refln': [['_refln.index_h', '_refln.index_k', '_refln.index_l']], 'model_site': [['_model_site.label', '_model_site.symop']]}
Traceback (most recent call last):
  File "/Users/n00002621/.local/share/virtualenvs/scidata_cif2-FqeurwlC/bin/dic2owl", line 8, in <module>
    sys.exit(main())
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/cli.py", line 87, in main
    dic2owl_run(dicfile=args.dicfile, ttlfile=args.ttlfile)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 272, in main
    onto = gen.generate()
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 117, in generate
    self._add_item(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 128, in _add_item
    self._add_data_value(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 184, in _add_data_value
    self._add_item(parent)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 126, in _add_item
    self._add_category(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 161, in _add_category
    self._add_category(parent_item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 161, in _add_category
    self._add_category(parent_item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 155, in _add_category
    self._add_top(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 141, in _add_top
    self._add_annotations(top, item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 216, in _add_annotations
    raise MissingAnnotationError(annotation_name)
dic2owl.dic2owl.MissingAnnotationError: _dictionary.doi
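A minimal pre-flight sketch can list which `_dictionary.*` items a dictionary header lacks before the conversion is run. Assumptions: the `REQUIRED` set is hypothetical (only `_dictionary.doi` is confirmed by the traceback above), and the regular expression only handles simple single-line `_dictionary.name value` entries, not the full CIF grammar.

```python
import re

# Hypothetical set of annotations dic2owl might require; only
# _dictionary.doi is confirmed by the traceback above.
REQUIRED = {"_dictionary.title", "_dictionary.version", "_dictionary.doi"}

# Matches simple header lines such as "_dictionary.doi  10.1234/abc".
# Semicolon-delimited multi-line values are not handled.
_ITEM = re.compile(r"^\s*(_dictionary\.[A-Za-z_]+)\s+\S", re.MULTILINE)


def missing_dictionary_items(text: str, required=REQUIRED) -> set:
    """Return the required _dictionary.* items absent from ``text``."""
    # CIF data names are case-insensitive, so compare in lower case.
    found = {name.lower() for name in _ITEM.findall(text)}
    return {name for name in required if name.lower() not in found}
```

Calling it on the contents of `cif_core.dic` before running dic2owl would show up a missing DOI without a full traceback.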

Update repository name to be CIF-specific

Rename the repository, or move the content of the new ontology to a different, CIF-specific repository.

The EMMO-Crystallography domain ontology should then import this ontology and use it as a Language.

Publish current CIF ontology

Publish the current edition of the CIF ontology using the newly developed schema for CIF ontologization and the CIF to Turtle parser software tool developed based on the schema.

Repository hierarchy/layout

I suggest rearranging the repository into two namespaces: one for the Python script (turned into a pip-installable package with a CLI, e.g. named dic2owl or similar) and one for the collected ontology.
Each namespace will be versioned separately, and the ontology release workflow will only be triggered when the ontology namespace bumps its version.
The repository as a whole can be versioned using calendar versioning (CalVer), since standard semantic versioning (SemVer) doesn't really make sense for it.
The namespaces will still be versioned by SemVer.

Furthermore, the ontology namespace package will be sub-divided in terms of versioning for each of the separate ttl-files, i.e., each of the ontology parts: top, core, and the future ontologies based on CIF dictionaries.

An example of the proposed layout:

.  # Repository root - automatic versioning using CalVer.
+-- README.md
+-- dic2owl  # Python generation package/CLI. Versioned separately using SemVer.
|   +-- setup.py
|   +-- dic2owl
|   |   +-- __init__.py
|   |   +-- generate_cif.py
|   |   +-- ...
+-- ontology  # CIF ontology (total). Automatic versioning based on the content's versions.
|   +-- cif.ttl  # CIF Core dictionary. Versioned separately using SemVer.
|   +-- cif_top.ttl  # Top CIF ontology. Versioned separately using SemVer.
|   +-- ...

The result is that the only manual versioning left is the versioning that makes sense: for the Python package/CLI when it changes, and for each ontology file when it changes. This should also naturally encourage PRs that change only a single ontology rather than several, which I believe makes reviewing and handling them easier.
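The proposed scheme can be sketched in a few functions: a CalVer tag for the repository as a whole, and a check that triggers the ontology release workflow only when the SemVer version of some ttl file has increased. The `YYYY.MM.DD` tag format and the idea of collecting per-file versions into a dictionary are assumptions for illustration, not an existing workflow.

```python
from datetime import date


def parse_semver(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def calver_tag(day: date, build: int = 0) -> str:
    """Repository-level CalVer tag such as '2021.03.05', or '2021.03.05.2'
    when several releases happen on the same day."""
    tag = day.strftime("%Y.%m.%d")
    return f"{tag}.{build}" if build else tag


def bumped_ontologies(old: dict, new: dict) -> list:
    """Return the ttl files whose SemVer version increased.

    ``old`` and ``new`` map file names (e.g. 'cif_top.ttl') to versions;
    files that did not exist before also count as bumped.
    """
    bumped = []
    for name, version in sorted(new.items()):
        previous = old.get(name)
        # Tuple comparison orders versions numerically, so 1.10.0 > 1.9.3.
        if previous is None or parse_semver(version) > parse_semver(previous):
            bumped.append(name)
    return bumped


def should_release(old: dict, new: dict) -> bool:
    """Trigger the ontology release workflow only if some file was bumped."""
    return bool(bumped_ontologies(old, new))
```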

dic2owl error on building cif_core.dic

When using dic2owl to create cif_core.ttl, I get the following error:

Traceback (most recent call last):
  File "/Users/n00002621/Library/Python/3.11/bin/dic2owl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/cli.py", line 95, in main
    dic2owl_run(
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 274, in main
    onto = gen.generate()
           ^^^^^^^^^^^^^^
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 118, in generate
    self._add_item(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 129, in _add_item
    self._add_data_value(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 185, in _add_data_value
    self._add_item(parent)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 127, in _add_item
    self._add_category(item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 162, in _add_category
    self._add_category(parent_item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 162, in _add_category
    self._add_category(parent_item)
  File "/Users/n00002621/PycharmProjects/CIF-ontology/dic2owl/dic2owl/dic2owl.py", line 165, in _add_category
    cls = types.new_class(name, (self.onto[parent_name],))
                                 ~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ontopy/ontology.py", line 249, in __getitem__
    item = self.get_by_label(name)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ontopy/ontology.py", line 422, in get_by_label
    raise NoSuchLabelError(f"No label annotations matches '{label}'")
ontopy.utils.NoSuchLabelError: No label annotations matches 'CIF_CORE'

Any ideas how to fix?
I am using the most recent version of cif_core.dic (3.3.0, 5/2/24) and the latest version of dic2owl (commits up to Nov 9, 2023).

Stuart

Handle non-ASCII Unicode symbols in CIF dictionary files

The CIF2 format permits files to contain non-ASCII Unicode symbols. This is a fairly novel development, as earlier versions of the CIF format and the related STAR format were restricted to the ASCII character set.

The CIF_CORE DDLm dictionary has recently been updated to use proper Unicode characters for Greek symbols instead of LaTeX-like markup (e.g. α instead of \a). However, this now seems to trip up the STAR parser used by dic2owl:

> dic2owl cif_core.dic

Fail value check, match only 0-208 in string '\n    The reciprocal space matrix for converting the U(ij) matrix of\n    atomic displacement parameters to a dimensionless beta(IJ) matrix.\n    The ADP factor in a structure factor expression:\n\n    t = exp - 2π^2^ ( U11 h h a* a* + ...... 2 U23 k l b* c* )\n    t = exp - 0.25  ( B11 h h a* a* + ...... 2 B23 k l b* c* )\n      = exp -       ( β11 h h + ............ 2 β23 k l )\n\n    The conversion of the U or B matrices to the β matrix\n\n        β =   C U C   =    C B C /8π^2^\n\n    where C is conversion matrix defined here.'
Traceback (most recent call last):
...
CifFile.StarFile.StarError: 
Star Format error: Data item "'\n    The reciprocal space matrix for converting the U(ij) matrix of\n    atomic displacement parameters to a dimensionless beta(IJ) matrix.\n    The ADP factor in a structure factor expression:\n\n    t = exp - 2π^2^ ( U11 h h a* a* + ...... 2 U23 k l b* c* )\n    t = exp - 0.25  ( B11 h h a* a* + ...... 2 B23 k l b* c* )\n      = exp -       ( β11 h h + ............ 2 β23 k l )\n\n    The conversion of the U or B matrices to the β matrix\n\n        β =   C U C   =    C B C /8π^2^\n\n    where C is conversion matrix defined here.'"... contains forbidden characters
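Until the parser accepts CIF2 Unicode text, a possible stopgap is to locate the offending characters and, where a CIF 1 markup equivalent exists, translate them back. This is an assumption-laden workaround sketch, not part of dic2owl: `GREEK_TO_CIF1` covers only α, β, γ and π (written \a, \b, \g and \p in CIF 1 markup), and any other non-ASCII character is only reported.

```python
# Minimal subset of the CIF 1 Greek-letter markup; extend as needed.
GREEK_TO_CIF1 = {"α": r"\a", "β": r"\b", "γ": r"\g", "π": r"\p"}


def find_non_ascii(text: str) -> list:
    """Return (line, column, character) for every non-ASCII character.

    Line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((lineno, col, char))
    return hits


def downgrade_to_ascii(text: str) -> str:
    """Replace known Greek letters with CIF 1 markup; leave the rest untouched."""
    return "".join(GREEK_TO_CIF1.get(char, char) for char in text)
```

Running `find_non_ascii` over `cif_core.dic` after `downgrade_to_ascii` would list whatever characters still need manual attention.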


Repo layout and ontology names

We have a set of redirection rules for http://emmc.info/ described in https://github.com/emmo-repo/EMMO/blob/master/doc/EMMO_governance.md#releases-and-versioning

Even though these redirections need updating (see emmo-repo/EMMO#151), we should aim to use them in the base IRI and owl:versionIRI of the ontologies. That means we have to follow the general rules for repo layout and ontology naming agreed on by EMMO, including the name of master...

I suggest we raise our needs in the discussion of emmo-repo/EMMO#151.

Extend `dic2owl` README

Extend the dic2owl README with:

  • Installation instructions
  • Conversion schema documentation/explanation (#18)
  • CLI usage instructions

Move to 'flit' build system

Use the flit build system instead of setuptools, since it is simpler and needs fewer configuration files: most of the dic2owl root files can be removed and their information moved into the existing pyproject.toml file.

Make unit tests self-contained

Since the dic2owl tool downloads various dictionaries it needs for ontology generation, the tests are not self-contained.

To avoid relying on external online resources for testing, either the needed files should be added to the tests/dic2owl/static folder and provided when Python tries to download the various files - or the COMCIFS/cif_core repository should be added as a submodule, since this is where the files are from.

With the latter solution we would potentially always test against the latest files (as long as the submodule is updated). Furthermore, the files would not become an actual part of the repository unless the submodule is initialized, which would only be needed for testing.
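For the first option, a test could intercept the network call and serve a local copy instead. This sketch assumes a hypothetical `fetch_dictionary()` helper built on `urllib.request.urlopen`; the actual download path inside dic2owl or PyCifRW may differ, in which case the `mock.patch()` target must be adjusted. The URL is the templ_attr.cif location from the log in the first issue.

```python
import io
import tempfile
import unittest
import urllib.request
from pathlib import Path
from unittest import mock

TEMPL_ATTR_URL = "https://www.iucr.org/__data/iucr/cif/dictionaries/templ_attr.cif"


def fetch_dictionary(url: str) -> str:
    """Hypothetical helper: download a dictionary as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def offline_urlopen(static_dir: Path):
    """Build a urlopen replacement serving files from ``static_dir``.

    The file name is taken from the last segment of the URL path.
    """
    def _urlopen(url, *args, **kwargs):
        name = url.rsplit("/", 1)[-1]
        return io.BytesIO((static_dir / name).read_bytes())
    return _urlopen


class TestOfflineDictionaries(unittest.TestCase):
    def test_dictionary_served_locally(self):
        with tempfile.TemporaryDirectory() as tmp:
            static_dir = Path(tmp)  # stands in for tests/dic2owl/static
            (static_dir / "templ_attr.cif").write_bytes(b"data_TEMPL_ATTR\n")
            with mock.patch("urllib.request.urlopen", offline_urlopen(static_dir)):
                text = fetch_dictionary(TEMPL_ATTR_URL)
        self.assertEqual(text, "data_TEMPL_ATTR\n")
```

Patching `urllib.request.urlopen` works here because `fetch_dictionary` looks the function up on the module at call time.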

Ontologizing CIF data

In a discussion between myself, @jesper-friis and @emanueleghedini, we tried to tackle the hurdles for getting development on this ontology started in a practical way. In other words, we tried setting up a basic taxonomy and parthood graph for CIF data.
By CIF data we mean the semantics of the actual data (the values) not the semantics of the associated keywords. However, since the values are represented by their keywords/data names, these have been used in the mock up.

The resulting graph is shown below.
[Figure: CIF EMMO]

The graph has essentially two important parts. One pertains to the hierarchy of the data, the other relates to the semantics of the data types.
For the hierarchy, we see that CIF_DATA has a part loop_. This is not to be taken as the syntactical loop_, but rather the concept of the CIF data expressed as loops.
A loop_ has the part ROW, which is our attempt to define the collection of data making up a single row within a loop. Note that we do not care here whether the CIF file syntactically defines a ROW using key+value lines or as part of a syntactic loop_. Semantically, ROW covers both as the same concept.
Now we come to how one may practically extend the ontology. Here we have added the concept _space_group_symop_[], which isA ROW and hasParts _space_group_symop_id, _space_group_symop_operation_xyz, and _space_group_symop_sg_id. There is a restriction on how many times _space_group_symop_[] can have each of these parts (max 1).
Now, all of these are SPACE_GROUP_SYMOP, i.e., they are of the CIF category SPACE_GROUP_SYMOP.
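As a rough sketch of what such a row concept could look like in Turtle, the function below emits an `rdfs:subClassOf cif:ROW` axiom plus one OWL qualified max-cardinality restriction per part. The `cif:` prefix and the `cif:hasPart` property are placeholders (the actual graph may use EMMO's parthood relation), the `owl:`, `rdfs:` and `xsd:` prefixes are assumed to be declared elsewhere, and the `[]` suffix is dropped because brackets are not allowed in Turtle prefixed names.

```python
def row_class_turtle(row_class, parts, max_card=1):
    """Return Turtle declaring row_class as a ROW with at most max_card of each part.

    Assumptions: cif:, owl:, rdfs: and xsd: prefixes are declared elsewhere;
    cif:hasPart is a placeholder for the parthood property actually used.
    """
    parents = ["cif:ROW"]  # the "isA ROW" relation
    for part in parts:
        # One qualified-cardinality restriction per part: hasPart max 1 <part>.
        parents.append(
            "[ a owl:Restriction ;\n"
            "      owl:onProperty cif:hasPart ;\n"
            f"      owl:onClass cif:{part} ;\n"
            f'      owl:maxQualifiedCardinality "{max_card}"^^xsd:nonNegativeInteger ]'
        )
    return (
        f"cif:{row_class} a owl:Class ;\n    rdfs:subClassOf "
        + " ,\n        ".join(parents)
        + " .\n"
    )


print(row_class_turtle("_space_group_symop_", [
    "_space_group_symop_id",
    "_space_group_symop_operation_xyz",
    "_space_group_symop_sg_id",
]))
```

Note that a max restriction only says "at most one"; stating that the parts must be present would need an additional min or `owl:someValuesFrom` restriction.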

For the data types, you can see that we have given each of the CIF data names a type according to the type definitions of the CIF dictionary. In CIF v1 (the only version we are currently concerned with), there are only three types: null, char, and numb (REF).
Each of the three data types has been defined and also related to the general xsd types via the types defined in EMMO.
This relates the data types of all CIF data to those of EMMO.

To extend this, one would simply add another CIF data name of type null and category category_overview (such as _space_group_symop_[]) as a subclass of ROW, cif:null, and its associated category, and then add all the data keys/names it contains as parts of it, each subclassing both the category (again) and the related type.

Finally, this can be automated by going through the actual .dic file, which defines all the relevant metadata for each data key/name (link to coreCIF .dic-file).
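A minimal sketch of the per-item part of that automation is shown below. It assumes the relevant metadata has already been extracted from the .dic file into (data name, category, type) triples; the parsing step itself is not shown, and the function names and the `cif:` prefix are hypothetical. Each data name becomes a class subclassing both its category and its CIF v1 type.

```python
CIF1_TYPES = {"null", "char", "numb"}  # the three CIF v1 types


def item_class_turtle(name, category, cif_type):
    """Return Turtle making a data name a subclass of its category and type."""
    if cif_type not in CIF1_TYPES:
        raise ValueError(f"unknown CIF v1 type {cif_type!r} for {name}")
    return (
        f"cif:{name} a owl:Class ;\n"
        f"    rdfs:subClassOf cif:{category.upper()} , cif:{cif_type} .\n"
    )


def items_turtle(items):
    """items: iterable of (data name, category, type) triples from a .dic file."""
    return "\n".join(item_class_turtle(*item) for item in items)


print(items_turtle([
    ("_space_group_symop_operation_xyz", "space_group_symop", "char"),
]))
```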


This is not meant to be the definitive way of ontologizing CIF data, but rather our currently suggested way of doing it. This issue is meant for discussing its validity, so feel free to make suggestions or ask questions.

As an added bonus, I have created a branch (cif-data) in this repository where the graph above is implemented as a Turtle file. If you check out this branch and open the Turtle file cif-data.ttl in Protégé, you should see the suggested implementation, which could act as a template for adding more CIF data keys/names (the added concepts are marked in bold).

Add support for the new `Word` DDLm content type

A new Word content type (see the _type.contents attribute [1]) was recently introduced to the DDLm dictionary [2]. Because of this, dic2owl fails when processing the latest version of cif_core.dic:

  dic2owl cif_core.dic

Error:

<...>
  File "<virtualenv-dir>/cif-emmo/lib/python3.8/site-packages/ontopy/ontology.py", line 243, in get_by_label
    raise NoSuchLabelError('No label annotations matches %s' % label)
ontopy.ontology.NoSuchLabelError: No label annotations matches Word

Replacing the Word content type with Text in the downloaded dic and cif files temporarily resolves the issue, but it would be nice to have things working out of the box.

[1] https://github.com/COMCIFS/cif_core/blob/55c1600e728dd5b5cf274985ef9a40e2483b8993/ddl.dic#L1659
[2] https://github.com/COMCIFS/cif_core/blob/55c1600e728dd5b5cf274985ef9a40e2483b8993/ddl.dic#L2585
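The temporary workaround described above could be scripted roughly as follows. This is only a sketch of the text substitution, assuming the content type appears unquoted as the value of a `_type.contents` line; it does not handle quoted values, and it is not a fix for dic2owl itself.

```python
import re

# Matches a _type.contents line whose value is exactly Word (not e.g. "Wordy").
_WORD_CONTENTS = re.compile(r"^(\s*_type\.contents\s+)Word\b", re.MULTILINE)


def downgrade_word_contents(dic_text):
    """Temporary workaround: treat the new Word content type as Text."""
    return _WORD_CONTENTS.sub(r"\1Text", dic_text)
```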

Consider using SINTEF/ci-cd

SINTEF/ci-cd is a repo one can use for callable CI/CD workflows in GitHub Actions. It removes the need to keep the current workflows up to date ourselves; instead, we would rely on the external repository being updated.

Update DDL ontology

The DDL ontology was originally written when the ddl.dic file was at v4.0.1. The ddl.dic file has since been updated to v4.2.0.
Indeed, the latest version introduced a new content type (Word), which has been implemented in #76; however, it may not have been added correctly (semantically speaking).

There are further enhancements that could be made to upgrade the DDL ontology: several concepts are missing annotations present in the dic file, while other concepts include all annotations.
The version number could also be updated and recorded in the ontology file.
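One way to keep the ontology version in sync is to read `_dictionary.version` from ddl.dic and emit it as `owl:versionInfo`. The sketch below assumes the version is on a single `_dictionary.version` line; the ontology IRI passed in is a placeholder.

```python
import re

# The dictionary-level version attribute, e.g. "_dictionary.version  4.2.0".
_VERSION = re.compile(r"^\s*_dictionary\.version\s+(\S+)", re.MULTILINE)


def dictionary_version(dic_text):
    """Return the value of _dictionary.version, without surrounding quotes."""
    match = _VERSION.search(dic_text)
    if match is None:
        raise ValueError("no _dictionary.version found")
    return match.group(1).strip("'\"")


def version_info_turtle(ontology_iri, version):
    """Return a Turtle triple annotating the ontology with its version."""
    return f'<{ontology_iri}> owl:versionInfo "{version}" .\n'
```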

Further improvements can be thought up; feel free to comment on this issue to list them.

Readme in Master Branch out of date or files missing

Dear @CasperWA, it seems the Readme.md file refers to a cif.ttl file which is not in the repository.
Under the ontology folder, there are the XML catalog, cif-core.ttl and cif-ddl.ttl; however, the Readme file does not describe any of them. Moreover, while the Readme mentions that loading cif.ttl should, using the modules, load the proper EMMO version, the catalog file does not mention this.

I wonder whether master is the right branch to test. If so, we need to update the Readme and potentially fix the catalog so that it loads the proper EMMO modules.

Thanks and best wishes,
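A quick way to spot catalog entries pointing at files that are not in the repository is to parse the XML catalog and check each local `uri` target. This is a sketch assuming the Protégé-style OASIS XML catalog format (`<uri name="..." uri="..."/>` elements, possibly inside `<group>`); the function name is hypothetical, and remote targets are simply skipped.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def missing_catalog_targets(catalog_path):
    """Return (name, uri) pairs whose local target file does not exist."""
    catalog_path = Path(catalog_path)
    root = ET.parse(catalog_path).getroot()
    missing = []
    # iter() also finds <uri> entries nested inside <group> elements.
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        target = entry.get("uri", "")
        if "://" in target:  # remote IRI, not a file in the repository
            continue
        if not (catalog_path.parent / target).exists():
            missing.append((entry.get("name"), target))
    return missing
```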


Upgrade to min. Python 3.8

Python 3.7 is nearing its end-of-support.
There are also NumPy vulnerabilities that are only fixed in versions supporting minimum Python 3.8.

pip install error due to Owlready2/EMMO-Python discrepancy

The latest pip dependency resolver complains when installing dic2owl, because EMMO-Python depends on Owlready2==0.29 while dic2owl now depends on Owlready2~=0.30.

Since EMMO-Python depends on Owlready2 and we don't use it directly in dic2owl, should we remove the Owlready2 dependency from this package and simply get it through EMMO-Python?

Fix security issue with `urllib.urlopen`

According to the bandit CI run, we should audit the input to urllib.urlopen (or, more specifically, urllib.request.urlretrieve).

Output from bandit run:

>> Issue: [B310:blacklist] Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.
   Severity: Medium   Confidence: High
   Location: dic2owl/dic2owl/generate_cif.py:197
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b310-urllib-urlopen
196	            print("downloading", dic)
197	            urllib.request.urlretrieve(baseurl + dic, dic)
198	
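A minimal sketch of the audit bandit asks for: check the URL scheme against an allow-list before downloading, so `file:` or custom schemes are rejected. The allowed set and the wrapper name are assumptions; the `# nosec` comment tells bandit the call has been reviewed.

```python
import urllib.parse
import urllib.request

ALLOWED_SCHEMES = {"http", "https"}  # assumption: dictionaries are fetched over HTTP(S)


def checked_urlretrieve(url, filename):
    """Download url to filename, refusing any scheme not in ALLOWED_SCHEMES."""
    scheme = urllib.parse.urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"refusing to download {url!r}: scheme {scheme!r} not allowed")
    return urllib.request.urlretrieve(url, filename)  # nosec
```

In generate_cif.py, the call on line 197 would then become `checked_urlretrieve(baseurl + dic, dic)`.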

Clarify licensing of CIF dictionaries

(Capturing this here to make sure it gets dealt with)

CIF dictionaries have no formal licence, but instead are covered by an IUCr policy meant to reassure users that the IUCr will not come after them. Note that this policy was published in 2000, a year before the Creative Commons organisation was founded. Is this group comfortable with this policy?

`dic2owl` debugging "features"

I think we should either keep these for convenient debugging or remove everything from line 262 onwards.

Originally posted by @jesper-friis in #52 (comment)

This comment and its answers reflect the possible need for simple debugging features in dic2owl and how these may be achieved.

To me there are two things here: unit testing and debugging.
I want to add some unit tests that check that the generation executes as expected. At the same time, it might be nice to return some central variable values during development.

Develop a `pre-commit` hook to format ontology files

Right. Due to the weird way different Protégé versions store files?
Could we maybe have a pre-commit hook that always re-reads and re-writes any changed ontology files in a specific way? That way it should not matter which Protégé version was used, since the change will happen upon git commit.

Originally posted by @CasperWA in #22 (comment)

[FIXED] Python script doesn't work with Windows (due to PyCifRW)

When testing this in my Windows 10 environment (which is easier for me since I also use Protégé there), the generate_cif.py script didn't work.

The main culprit was PyCifRW, which doesn't seem to handle Windows paths properly when locating templ_attr.cif and similar files.
There doesn't seem to be anything we can do on our end, but we could contribute a fix to PyCifRW, although I expect Windows support is unlikely to be pursued or prioritized.
