Comments (7)
Hi Caleb,
Given the old and new versions, I could write a manual diff to see what has changed. I'm going to start with this and let you know what I find.
from chemicals.
From a preliminary parsing, there are more synonyms than in the old database:
Old
julia> CC.load_db!(:inorganic_old2)
[ Info: :inorganic_old2 arrow file not generated, processing...
syms_i = 6326 # number of synonyms
syms_unique = 6325 # unique elements (one element is repeated; I have yet to find it)
(Arrow.Table with 153 rows, 9 columns, and schema:
.....
New
julia> CC.load_db!(:inorganic_new)
[ Info: :inorganic_new database file not found, downloading from https://github.com/CalebBell/chemicals/files/6912649/Inorganic.db.csv
[ Info: :inorganic_new database file downloaded.
[ Info: :inorganic_new arrow file not generated, processing...
syms_i = 9461
syms_unique = 9438
(Arrow.Table with 164 rows, 9 columns, and schema:
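One way to track down the repeated entries behind the gap between `syms_i` and `syms_unique` is to count occurrences. This is only a minimal Python sketch: `find_repeated` is a hypothetical helper, and the sample list is made up for illustration, not taken from either database.

```python
from collections import Counter


def find_repeated(synonyms):
    """Return the synonyms that occur more than once, with their counts."""
    counts = Counter(synonyms)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up sample data, not taken from either database.
sample = ["water", "oxidane", "silane", "water"]
repeated = find_repeated(sample)

# Total count, unique count, and the entries responsible for the gap.
print(len(sample), len(set(sample)), repeated)
```

Running the same helper over the full synonym list would name the entries that make the unique count smaller than the total.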
Comparing the differences by InChI:
InChI contained in the old database, not present in the new database
"InChI=1S/CH2.Co/h1H2;/q-1;+1"
"InChI=1S/Cr.2H2Si/h;2*1H2"
"InChI=1S/H4Si/h1H4"
"InChI=1S/Na.H3O4P/c;1-5(2,3)4/h;(H3,1,2,3,4…
"InChI=1S/F6Si.2H3N/c1-7(2,3,4,5)6;;/h;2*1H3…
"InChI=1S/Bi.2ClH.2H/h;2*1H;;/q+2;;;;/p-2"
"InChI=1S/Al.Na.2O.2H/q-1;+1;;;;"
"InChI=1S/BrHO3.Cs/c2-1(3)4;/h(H,2,3,4);/q;+…
"InChI=1S/2Na.H3O4P/c;;1-5(2,3)4/h;;(H3,1,2,…
"InChI=1S/2BH2.Ti/h2*1H2;"
"InChI=1S/F6Si.2Na/c1-7(2,3,4,5)6;;/q-2;2*+1"
""
"InChI=1S/2Na.3H2O4S/c;;3*1-5(2,3)4/h;;3*(H2…
InChI contained in the new database, not present in the old database
"InChI=1S/Cl2S2/c1-3-4-2"
"InChI=1S/O.Pr"
"InChI=1S/Bi.2ClH/h;2*1H/q+2;;/p-2"
"InChI=1S/Cr.2Si"
"InChI=1S/C32H16N8.Cu/c1-2-10-18-17(9-1)25-33-26(18)38-28-21-13-5-6-14-22(21)30(35-28)40-32-24-…
"InChI=1S/Al.Na.2O/q-1;+1;;"
"InChI=1S/Al.La.O"
"InChI=1S/2B.Ti"
"InChI=1S/Na.H3O4P/c;1-5(2,3)4/h;(H3,1,2,3,4)"
"InChI=1S/C.Co/q-1;+1"
"InChI=1S/3O.2Yb/q3*-2;2*+3"
"InChI=1S/2HI.Sm/h2*1H;/q;;+2/p-2"
"InChI=1S/3ClH.Ru/h3*1H;/q;;;+3/p-3"
"InChI=1S/H2O/h1H2"
"InChI=1S/2B.Zr"
"InChI=1S/10CO.2Re/c10*1-2;;"
"InChI=1S/H3NO.H2O4S/c1-2;1-5(2,3)4/h2H,1H2;(H2,1,2,3,4)"
"InChI=1S/Li.H"
"InChI=1S/Na.H2O4S/c;1-5(2,3)4/h;(H2,1,2,3,4)"
"InChI=1S/C.2W/q+1;;-1"
"InChI=1S/6Al.2O2Si.9O/c;;;;;;2*1-3-2;;;;;;;;;"
"InChI=1S/B.Li.O"
"InChI=1S/Cd.2FH/h;2*1H/q+2;;/p-2"
Doing the same thing with the formulas:
julia> setdiff(set_new,set_old)
Set{String} with 21 elements:
"Cl3Ru"
"O3Yb2"
"H2O" # water is in the new inorganics database
"AlLaO"
"I2Sm"
"B2Zr"
"H3NaO4P"
"HLi"
"Al6O13Si2"
"Cl2S2"
"As2H12O3"
"CW2"
"C32H16CuN8"
"OPr"
"ClH2Tl"
"H5NO5S"
"C10O10Re2"
"BLiO"
"H2NaO4S"
"BrH2Tl"
"CdF2"
julia> setdiff(set_old,set_new)
Set{String} with 11 elements:
"HNa2O4P"
"ClTl"
"H4Si"
"H4Na2O12S3"
"As2O3"
"BrCsO3"
"BrTl"
"H2NaO4P"
"F6H8N2Si"
"F6Na2Si"
"D2Se"
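For reference, the two-way `setdiff` comparison on formulas can be sketched in Python as well. `diff_keys` is a hypothetical helper; the sample lists reuse a few formulas from the output above plus one placeholder entry (`"ClNa"`) that is assumed to be present in both versions.

```python
def diff_keys(old, new):
    """Return (removed, added): keys only in `old`, and keys only in `new`."""
    old_set, new_set = set(old), set(new)
    removed = sorted(old_set - new_set)
    added = sorted(new_set - old_set)
    return removed, added


# Small illustrative subsets; "ClNa" is a placeholder shared entry.
old_formulas = ["ClNa", "ClTl", "H4Si"]
new_formulas = ["ClNa", "ClH2Tl", "H2O"]

removed, added = diff_keys(old_formulas, new_formulas)
print("only in old:", removed)
print("only in new:", added)
```

The same helper works for any hashable key, so it could be pointed at InChI strings or formulas alike.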
Hi Andrés,
Like all unmaintained software, bits and pieces of the chemical-metadata repository have rotted away. I cannot get the inchi module in rdkit to work for me, and I am having issues building rdkit.
Thanks for letting me know about the issue. I'm afraid we may have to manually patch the file for now.
Sincerely,
Caleb
Hi Andrés,
I found a version of rdkit which works on Linux, and it's on PyPI! One step closer to being able to update the database again. I think I actually need to port chemical-metadata to Python 3 as well.
Sincerely,
Caleb
What do you think of adding ';' as an additional separator? The main problem would be checking whether other names actually have ';' as part of their name.
maybe adding:
line = line.replace(';','\t')
before this line
chemicals/chemicals/identifiers.py
Line 370 in c5b1014
could solve the problem temporarily?
Also, I noticed (from a quick look, nothing exhaustive) that the synonyms separated by ';' are always at the end of the list.
Edit: the split on ';' must always be done after parsing the InChI, because InChI strings themselves contain ';'.
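A rough sketch of that ordering, under loudly stated assumptions: each record is one tab-separated line, the first `n_fixed` fields (including the InChI) are positional, and everything after them is a synonym. `split_record`, `n_fixed`, and the sample line are hypothetical and do not reflect the actual layout parsed in identifiers.py.

```python
def split_record(line, n_fixed):
    """Split a tab-separated record, breaking on ';' only in the synonym fields.

    The first `n_fixed` fields (assumed to include the InChI) are left intact.
    """
    fields = line.rstrip("\n").split("\t")
    fixed = fields[:n_fixed]
    synonyms = []
    for field in fields[n_fixed:]:
        for part in field.split(";"):
            part = part.strip()
            if part:  # skip empty pieces from trailing or doubled ';'
                synonyms.append(part)
    return fixed, synonyms


# Hypothetical record: an id, an InChI from the thread, placeholder synonyms.
inchi = "InChI=1S/Al.Na.2O/q-1;+1;;"
line = "\t".join(["1", inchi, "synonym one;synonym two"]) + "\n"

# Replacing ';' across the whole line first cuts the InChI into pieces.
naive = line.replace(";", "\t").rstrip("\n").split("\t")
print(naive[1])

# Splitting only after the fixed fields keeps the InChI whole.
fixed, synonyms = split_record(line, n_fixed=2)
print(fixed[1], synonyms)
```

This does not address the other concern raised above: a name that legitimately contains ';' would still be split in two.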
Hi Andrés,
I have fixed a lot of the chemical-metadata repository and generated a new inorganic file without this particular issue. I attached it.
The hard part is that the online data has changed so much that I can't even use a diff program to see what changed. Because of that, it's hard to replace the current file with the new one. Do you want to look at it?
Sincerely,
Caleb