calebbell / chemicals
chemicals: Chemical database of Chemical Engineering Design Library (ChEDL)
License: MIT License
Topics of discussion:
Steam density calculations have changed in some cases by more than 7% between versions 1.1.1 and 1.1.4. Interestingly, version 1.1.1 matches the steam densities reported in CoolProp. Is this behavior to be expected?
Attached is an Excel table showing density calculations at 5 different pressures and 3 different temperatures for the two versions of chemicals and for CoolProp.
I am very impressed with this package (and thermo and fluids). Thank you, Caleb, for all of your hard work.
The IAPWS functions return an erroneous density value when calculating with the saturation temperature.
import numpy as np
from chemicals import iapws95_Psat, iapws95_Tsat, iapws95_rho
pressure = 558497.3519367648
tsat = iapws95_Tsat(pressure)
iapws95_rho(tsat, pressure) # density wrong
iapws95_rho(np.floor(tsat), np.floor(pressure)) # rounding down leads to proper density
I guess this error is invoked due to some boundary issues with IAPWS. If it is too much of an effort to make it work at the saturation temperature exactly, an error message with a hint would be welcome.
Thanks for the great package anyway!
In the COSTALD_compressed method, and perhaps others, there is an argument Vs for saturation volume. It was my impression that the suffix s on an argument name meant plural, as in the properties of each component in the mixture. I am trying to write a wrapper for these functions, and have been relying on this implied API (any arg ending with s triggers a look-up of properties from multiple phases). I think this Vs should be Vsat. Also, I hope that I am correct about you reserving the s suffix for mixture properties because this is quite handy. If this is just a loose convention, might I suggest or request that you consider making it a hard-coded fact?
Describe the bug
The C value for Toluene is way off; perhaps it was not correctly adjusted to Pa/K units from mmHg/C?
Minimal Reproducible Example
>>> from chemicals import Antoine
>>> from chemicals import CAS_from_any
>>> from chemicals.vapor_pressure import Psat_data_AntoinePoling
>>> Psat_data_AntoinePoling.loc[CAS_from_any('Toluene')]
'Toluene': Chemical toluene
A 9.05043
B 1327.62
C -255.525
Tmin 286.44
Tmax 409.61
Additional context
Benzene on the other hand gives:
'Benzene': Chemical benzene
A 8.98523
B 1184.24
C -55.578
Tmin 279.64
Tmax 377.06
The C value is significantly different. Also, xylene and cumene give C values in the 50's, so 255 for toluene is an outlier.
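One way to sanity-check a suspicious coefficient is to evaluate the correlation at a temperature where the answer is known. The sketch below assumes the base-10 Antoine form with T in K and Psat in Pa, and uses the normal boiling point of toluene (about 383.78 K), where the vapor pressure should be close to one atmosphere. The value C = -55.525 is purely an assumption made for this check (the reported value with one digit dropped); it is not taken from the library or from a reference.

```python
def antoine_psat(T, A, B, C):
    """Antoine equation, base 10: Psat = 10**(A - B/(T + C)); T in K, Psat in Pa."""
    return 10.0 ** (A - B / (T + C))

T_BOIL_TOLUENE = 383.78  # K, approximate normal boiling point of toluene

# Coefficients exactly as reported in the table above.
p_reported = antoine_psat(T_BOIL_TOLUENE, 9.05043, 1327.62, -255.525)

# ASSUMPTION: hypothetical C used only to test the "outlier" suspicion.
p_assumed = antoine_psat(T_BOIL_TOLUENE, 9.05043, 1327.62, -55.525)

print(f"reported C: {p_reported:.3g} Pa")
print(f"assumed C:  {p_assumed:.3g} Pa")
```

With the reported C the result is far below one pascal, while the assumed C lands near atmospheric pressure, which is consistent with the suspicion that the tabulated value is wrong.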
What goes in this module?
Here is my local variant of these files. I understand it's a pretty important module for biosteam, so I want to make sure this file does what you want. I am quite happy if you want to take the lead on this file! You are doing great work. I am sorry I cannot always get back to you quickly.
I am opening an issue to track the load speed of chemicals. I had already forgotten from last weekend how I was measuring load speed, so documenting it seems like a good idea.
I put the following code in a file called load_one_library.py
import cProfile
import os
import numpy as np
from scipy import special
from scipy import interpolate
from scipy import optimize
import pandas as pd
import sys
import json
import io
import datetime
from time import time
import fluids.constants
import fluids.numerics
import fluids
import ht
original_modules = set(sys.modules.keys())
pr = cProfile.Profile()
t0 = time()
pr.enable()
import chemicals
pr.disable()
print('Elapsed time: %f seconds' %(time() - t0))
pr.dump_stats('load_one_library.out')
after_modules = set(sys.modules.keys())
print('Loaded libraries')
print(after_modules.difference(original_modules))
Then I run that script with
python3 -OO load_one_library.py
You have to run it a second time after the first time to ensure all the python bytecode is up to date.
Then I look at where the time is spent with
python3 -m snakeviz load_one_library.out
Then I find the elements.py file, currently the longest to load.
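For a quick text-only look at the same information without snakeviz, a minimal sketch using only the standard library is shown below. The helper name `profile_import` is made up for this example, and `colorsys` is just a stand-in module so the snippet runs anywhere; substitute the package being measured.

```python
import cProfile
import importlib
import io
import pstats

def profile_import(module_name, top=10):
    """Import module_name under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    importlib.import_module(module_name)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest import paths appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()

report = profile_import("colorsys")  # stand-in module for the example
print(report)
```

Note that a module already present in `sys.modules` is not re-imported, so the measurement is only meaningful in a fresh interpreter.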
Let's leave this issue open indefinitely for now and I'll update it with timings periodically - maybe get some development docs going and move this there at some point.
One side note - the -OO flag optimizes the compiled byte code so docstrings, asserts, and a few other things are not loaded. This is the meaningful number I am targeting. I refuse to be interested in increasing load speed by having less documentation.
This is typically used when building an actual application out of libraries, or on a server when processes are starting up and shutting down often. Because of this, it is important to remember that assert statements should not be used for control flow; they should be development-only checks.
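To illustrate the point about assert statements, here is a small sketch contrasting a development-only check with explicit validation. Both function names are hypothetical and exist only for this example.

```python
def check_temperature_assert(T):
    # Development-only check: this line is removed when Python runs with -O or -OO.
    assert T > 0.0, "temperature must be positive"
    return T

def check_temperature_explicit(T):
    # Explicit validation: still enforced under -O and -OO.
    if not T > 0.0:
        raise ValueError("temperature must be positive")
    return T

try:
    check_temperature_explicit(-5.0)
except ValueError as exc:
    message = str(exc)
print(message)
```

Under `python3 -OO`, the first function would silently return -5.0, which is why anything the program logic depends on belongs in the second form.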
The rest of the script above outputs something like this:
Elapsed time: 0.005867 seconds
Loaded libraries
{'chemicals.solubility', 'chemicals.acentric', 'chemicals.dippr', 'chemicals.elements', 'chemicals.miscdata', 'chemicals', 'chemicals.dipole', 'chemicals.temperature', 'chemicals.critical', 'chemicals.utils', 'chemicals.refractivity', 'chemicals.exceptions', 'chemicals.vapor_pressure', 'chemicals.data_reader', 'chemicals.environment', 'chemicals.virial', 'chemicals.triple', 'chemicals.lennard_jones', 'chemicals.phase_change'}
The reactions and heat_capacity modules have finally been introduced. They come with several new data lookup functions, optimizations, and a few changes.
We now have lookup methods for energies at each phase: Hfs, Hfl, Hfg, S0s, S0l, S0g. The lookup methods for solids search only the CRC data. The column names for the data tables have been changed to fit this convention (e.g. Sf(g) -> S0g, Hf(g) -> Hfg, Sfc -> S0s). These changes are also reflected in the commented sections of elements.py that use CRC_standard_data.
A couple of tests for the heat_capacity module are not passing, but they should be! The results for the chemical property functions are very close. All equations and coefficients are the same, so the problem may come from fluids.assert_close. The tests compare against values from the literature, and the small differences lead to these assertion errors. If you'd like, I can increase the tolerance of these tests to make sure the assertions pass.
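As a rough illustration of how the choice of relative tolerance decides whether such a comparison passes, here is a sketch using the standard library's `math.isclose` as a stand-in (it does not reproduce the signature of fluids.assert_close). Both numbers are hypothetical, chosen only to differ in the fifth significant digit.

```python
import math

literature_value = 75.31    # hypothetical value rounded as in a printed table
computed_value = 75.3142    # hypothetical value from the correlation

# A very tight tolerance fails on rounding in the literature value alone.
strict = math.isclose(computed_value, literature_value, rel_tol=1e-9)
# A tolerance matched to the number of printed digits passes.
loose = math.isclose(computed_value, literature_value, rel_tol=1e-4)
print(strict, loose)
```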
Hi CalebBell
The data for dichlorosilane and trichlorosilane from pubchem are outdated. Hydrogen is missing in the formula.
Sincerely,
TStrhm
On this page:
https://chemicals.readthedocs.io/chemicals.elements.html#module-chemicals.elements
The attribute "ionization" is listed twice with different descriptions:
>>> from chemicals import *
>>> h2co3_from_name = CAS_from_any('Carbonic Acid')
>>> h2co3_from_notation = CAS_from_any('h2co3')
>>> h2co3_from_name == h2co3_from_notation # True
>>> get_name(h2co3_from_notation) # is there something I can use to do something like this and get "Carbonic Acid"?
Also, may I know the difference between CAS_from_any('h2co3') and CAS_from_any('H2CO3')? They seem to return different results: '463-79-6' and '107-32-4'.
Describe the bug
chemicals.heat_capacity.PPDS2 does not render their latex equation correctly:
Minimal Reproducible Example
https://chemicals.readthedocs.io/chemicals.heat_capacity.html#chemicals.heat_capacity.PPDS2
Describe the bug
I got the numba_vectorized func working for my single component calcs, but the functions based on mixing rules are now giving me grief.
Minimal Reproducible Example
numba_vectorized.Lindsay_Bromley([333, 333], [[0.2, 0.2], [0.8, 0.8]], [[1, 1], [1, 1]], [[1, 1], [1, 1]], [351, 370], [18, 33])
Additional context
TypeError: return type must be specified for object mode
Hi Yoel,
A long time ago I put a lot of effort into the docstring format I was using, and made a reader and shim to make the functions I was writing in all my projects work with pint. I just added it to chemicals also, but you can see more details here: https://fluids.readthedocs.io/fluids.units.html
You can try it out on master as follows; the return type will be a pint value.
from chemicals.units import Tc
Tc('64-17-5')
I use this framework pretty often because I like doing calculations with pint.
I feel keeping pint as an optional dependency is a good choice, and keeping it out of the library internals avoids slowing things down and allows it to be switched out for another library in the future.
However, the return signature of a function is expected to be what it says in the documentation. This is just one more complication to changing things.
Hi Yoel,
Per our discussion, I have created a tentative wrapper around the functions in the fluids library and the chemicals library. Numba is never going to work perfectly for all cases, and optimizations to make numba perfect sometimes make CPython and especially PyPy slower. However I am very interested in it for what uses it can have. All I am looking to target with numba at this point is computation functions, not lookup functions. It seems anything with a dictionary or a string is probably going to get slower as well, and is often not possible to compile.
I believe you have more experience with it than I do, so I especially welcome your thoughts.
Here are some examples of the current wrapper. There are a few more tests.
import numpy as np
import chemicals.numba
import chemicals.numba_vectorized
chemicals.numba.Antoine(100.0, 8.7687, 395.744, -6.469) # speed
chemicals.numba_vectorized.Psat_IAPWS(np.linspace(100.0, 200.0, 3)) # vectorized computation speed
I find that a function has to be pretty slow to benefit from a single-point numba version of a function; but for functions with vector arguments, at 10 elements it is already showing significantly improved performance, compared to numpy also.
I think for some applications numba really wants a custom function as well, not to wrap the existing one. The optimizations of e.g.
def zs_to_ws(zs, MWs):
    cmps = range(len(zs))
    ws = [zs[i]*MWs[i] for i in cmps]
    Mavg = 0.0 # Cannot use sum and list comprehension with numba; otherwise Mavg = 1.0/sum(ws)
    for v in ws:
        Mavg += v
    Mavg = 1.0/Mavg
    for i in cmps:
        ws[i] *= Mavg
    return ws
make the numba function return a list, but what we want there is a numpy array. It is not as elegant but I have no problem writing a few duplicate functions.
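import numba
import numpy as np

@numba.njit
def zs_to_ws(zs, MWs):
    # zs and MWs are numpy arrays here, so the result is an array as well
    ws = zs*MWs
    Mavg = 1.0/np.sum(ws)
    ws *= Mavg
    return ws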
It seems some functions need to be re-written in a more verbose way to work with numba also, for example https://github.com/CalebBell/fluids/blob/master/fluids/friction.py#L2917
I consider this all pretty preliminary.
Describe the bug
I was wondering why the CI started failing, and it turns out Pandas 1.2.0 updated some defaults for their CSV parser. Well, one of those was to use a higher-precision floating point converter. Chemicals reveals at least one bug in the new parser.
Minimal Reproducible Example
chemicals.viscosity.mu_data_VDI_PPDS_8['D']
In Pandas 1.1.2 when reading "0.00000000000001953" we get:
1.953E-14
In Pandas 1.2.1 we get:
1.95E-14
Additional context
This also breaks results for people using the data sources from this library.
Workaround
It is possible to restore the old behavior with float_precision='legacy'. The two data files affected by this bug now have this option set in master. Ideally, Pandas will fix their bug. I didn't find any issue reported for this in a cursory search.
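To quantify the discrepancy described above without involving pandas at all, the string can be parsed with Python's built-in float, which performs a correctly rounded conversion. This is only a sketch of the size of the error; it makes no claim about what the pandas parser does internally.

```python
text = "0.00000000000001953"
parsed = float(text)      # Python's correctly rounded conversion
truncated = 1.95e-14      # value reported above for the newer parser

relative_error = abs(parsed - truncated) / parsed
print(parsed == 1.953e-14)
print(f"relative error: {relative_error:.2%}")
```

A relative error of this size is far larger than floating-point round-off, which fits the report that downstream results changed.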
Hi Caleb,
I'm working on adding additional methods to chemicals.environment.GWP for the IPCC 5th edition (2018). I should be done within a week, then you can review and make sure it's up to the library's standards. I was wondering if there is a preferred method you like to use to facilitate getting all the data in csv format (e.g. a software, manually)?
Thanks!
Is it possible to add DIPPR 801 2019 to the list of available databases? I have access and can produce the corresponding TSVs.
Hi Caleb,
I was wondering if there have been any corrections to any of the tsv files in chemicals (as compared to thermo). I'm getting significantly different numbers for the following:
>>> # Expecting 495689880.0
>>> sum([abs(Hfg(i, method='TRC')) for i in TRC_gas_data.index[pd.notnull(TRC_gas_data['Hfg'])]])
494559380.0
>>> # Expecting 300592764.0
>>> sum([abs(Hfg(i, method='ATCT_G')) for i in Hfg_ATcT_data.index])
299452426.0
Thanks!
Describe the bug
When operating in no-JIT mode, we get the following error:
TypeError: return type must be specified for object mode
Minimal Reproducible Example
This happened during our automated tests, please see here. Here's the log:
----------------------------- Captured stderr call -----------------------------
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py:2063: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "Ambrose_Walton" failed type inference due to: Untyped global name 'trunc_exp': Cannot determine Numba type of <class 'function'>
File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2135:
def Ambrose_Walton(T, Tc, Pc, omega):
<source elided>
f2 = (-0.64771*tau + 2.41539*tau15 - 4.26979*tau25 + 3.25259*tau5)
return Pc*trunc_exp((f0 + omega*(f1 + f2*omega))/Tr)
^
def Ambrose_Walton(T, Tc, Pc, omega):
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/numba/core/object_mode_passes.py:151: NumbaWarning: Function "Ambrose_Walton" was compiled in object mode without forceobj=True.
File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2063:
def Ambrose_Walton(T, Tc, Pc, omega):
^
warnings.warn(errors.NumbaWarning(warn_msg,
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/numba/core/object_mode_passes.py:161: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected. This is deprecated behaviour that will be removed in Numba 0.59.0.
For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2063:
def Ambrose_Walton(T, Tc, Pc, omega):
^
warnings.warn(errors.NumbaDeprecationWarning(msg,
---------- coverage: platform linux, python 3.9.17-final-0 -----------
Coverage XML written to file coverage.xml
=========================== short test summary info ============================
FAILED tests/unit/models/phase/DensityTest.py::DensityTest::test_chemicals_wrapper_for_pure_liq_molar_volume - TypeError: return type must be specified for object mode
FAILED tests/unit/models/phase/VaporPressureTest.py::VaporPressureTest::test_generic_chemicals_for_pure_liquid - TypeError: return type must be specified for object mode
============= 2 failed, 775 passed, 10 skipped in 86.56s (0:01:26) =============
Error: Process completed with exit code 1.
@CalebBell I was wondering if there is any reason we'd like to keep the lists of methods (e.g. Tc_methods and Pc_methods) as well as the method strings (e.g. IUPAC = 'IUPAC'). I think the docstrings and the ability to get available methods already serve this purpose. Also, naive users might get the false impression that they can be altered to remove methods. Are there any dependencies on these? Please let me know if you're fine with dropping these. I'd be happy to remove them from the code and the documentation as well.
What is the chemical's pubchem ID?
157846
What is its formula?
"isosmiles": "CCCCCCCCCCCCCCCCCCCCCC[N+](C)(C)C.COS(=O)(=O)[O-]",
This is the Json from pubchem:
[
{
"cid": "157846",
"cmpdname": "Behentrimonium methosulfate",
"cmpdsynonym": ["Behentrimonium methosulfate","81646-13-1","Docosyltrimethylammonium methyl sulphate","docosyl(trimethyl)azanium;methyl sulfate","5SHP745C61","Behenyl trimethyl ammonium methosulfate","UNII-5SHP745C61","EINECS 279-791-1","EC 279-791-1","SCHEMBL126381","DTXSID00231231","BEHENTRIMONIUM METHOSULPHATE","behenyltrimethylammonium methyl sulfate","BEHENTRIMONIUM METHOSULFATE [INCI]","N,N,N-Trimethyl-1-docosanaminium methosulfate","BEHENYL TRIMETHYL AMMONIUM METHOSULPHATE","1-Docosanaminium, N,N,N-trimethyl-, methosulfate","Q27262812"],
"mw": "479.800",
"mf": "C26H57NO4S",
"polararea": "74.800",
"complexity": "344.000",
"heavycnt": "32",
"hbonddonor": "0",
"hbondacc": "4",
"rotbonds": "21",
"inchi": "InChI=1S/C25H54N.CH4O4S/c1-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26(2,3)4;1-5-6(2,3)4/h5-25H2,1-4H3;1H3,(H,2,3,4)/q+1;/p-1",
"isosmiles": "CCCCCCCCCCCCCCCCCCCCCC[N+](C)(C)C.COS(=O)(=O)[O-]",
"canonicalsmiles": "CCCCCCCCCCCCCCCCCCCCCC[N+](C)(C)C.COS(=O)(=O)[O-]",
"inchikey": "QIVLQXGSQSFTIF-UHFFFAOYSA-M",
"iupacname": "docosyl(trimethyl)azanium;methyl sulfate",
"exactmass": "479.401",
"monoisotopicmass": "479.401",
"charge": "0",
"covalentunitcnt": "2",
"isotopeatomcnt": "0",
"totalatomstereocnt": "0",
"definedatomstereocnt": "0",
"undefinedatomstereocnt": "0",
"totalbondstereocnt": "0",
"definedbondstereocnt": "0",
"undefinedbondstereocnt": "0",
"pclidcnt": "0",
"gpidcnt": "3818",
"gpfamilycnt": "1390",
"neighbortype": "2D",
"annothits": ["Classification","Patents","Safety and Hazards","Use and Manufacturing"],
"annothitcnt": "4",
"cidcdate": "2005-08-08",
"sidsrcname": ["A2B Chem","AA BLOCKS","ABI Chem","Alfa Chemistry","BenchChem","ChemIDplus","Chemieliva Pharmaceutical Co., Ltd","ChemSpider","ChemTik","Cooke Chemical Co., Ltd","CymitQuimica","DiscoveryGate","Egon Willighagen, Department of Bioinformatics - BiGCaT, Maastricht University","EPA DSSTox","FDA Global Substance Registration System (GSRS)","Google Patents","Hairui Chemical","J&H Chemical Co.,ltd","labseeker","NextBio","NORMAN Suspect List Exchange","PATENTSCOPE (WIPO)","RR Scientific","Smolecule","SureChEMBL","THE BioTek","Thomson Pharma","ToxPlanet","Wikidata","Yick-Vic Chemicals & Pharmaceuticals (HK) Ltd."],
"depcatg": ["Chemical Vendors","Curation Efforts","Governmental Organizations","Journal Publishers","Legacy Depositors","Research and Development","Subscription Services"]
}
]
Is there a procedure to do this manually and create a local list of extra chemicals not present in the library?
It would be nice to have a method to import from the PubChem JSON format, but I am not sure how complex that would be.
Happy to help if you can give me some guidance.
thanks
Marco
Is your feature request related to a problem? Please describe.
Most T and P dependent chemical property functions are on a molar basis (e.g. volume, Hvap, and most heat capacity functions). However, the Lastovka_Shaw, Dadgostar_Shaw, and Lastovka_solid heat capacity functions are on a per kg basis. I'd like to allow these functions to return values in J/mol-K if given an optional parameter MW (the molecular weight). This would allow backwards compatibility as well as more consistency in units of measure.
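The proposed behaviour can be sketched as follows. The function `heat_capacity` is a hypothetical stand-in, not one of the functions named above: it takes the mass-basis value directly so that only the unit handling is shown, and it assumes MW is given in g/mol.

```python
def heat_capacity(Cp_mass, MW=None):
    """Return Cp in J/kg/K, or in J/mol/K when MW (g/mol) is given.

    Cp_mass stands in for the value a mass-basis correlation would return.
    """
    if MW is None:
        return Cp_mass                 # unchanged behaviour: J/kg/K
    return Cp_mass * MW / 1000.0       # J/kg/K * kg/mol = J/mol/K

print(heat_capacity(4184.0))              # mass basis
print(heat_capacity(4184.0, MW=18.015))   # molar basis
```

Because MW defaults to None, existing callers keep getting mass-basis values, which is the backwards compatibility mentioned above.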
Do you acknowledge the project is developed by volunteers adding features primarily for their own purposes?
Yes!
Additional context
@CalebBell, thumbs up if you agree and I'll get to adding this feature!
Is your feature request related to a problem? Please describe.
The GWP lookups do not have updated data for the IPCC report from 2012. It is important to update the data to include the new values.
Do you acknowledge the project is developed by volunteers adding features primarily for their own purposes?
Yes, and I have a purpose for it.
Additional context
Table can easily be copied from this official PDF:
https://www.sealevel.info/AR5_Table_8.A.1_Lifetimes_Radiative_Efficiencies_and_Metric_values_pp_731-738.pdf
Will likely simply update the existing file which is loaded to include the new values. Likely should update the default method to use the new values.
What is the search string
caustic soda liquid;aquafina;distilled water;hydrogen oxide (h2o);ultrexii ultrapure;
Which chemical in the database do you believe should be found?
It's water, but the separators here are wrong.
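If the semicolon is meant to separate individual synonyms (an assumption based on the search string above, not on how the library stores its data), splitting the string gives the separate names that would each need to be indexed:

```python
raw = "caustic soda liquid;aquafina;distilled water;hydrogen oxide (h2o);ultrexii ultrapure;"

# Split on ';' and drop the empty piece left by the trailing separator.
synonyms = [part.strip() for part in raw.split(";") if part.strip()]
print(synonyms)
```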
Describe the bug
The vectorized functionality does not seem to be universally applied.
Minimal Reproducible Example
import chemicals as chem
import numpy as np
from chemicals import vectorized
props = chem.heat_capacity.TRC_gas_data.loc['7732-18-5'] # for water
coeffs = {}
for val in props.index:
    if val.startswith('a'):
        coeffs[val] = props[val]
Cp = chem.heat_capacity.TRCCp(T=np.array([333, 444]), **coeffs)
Additional context
The error message is:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is occurring because of the line:
if T <= a7:
For this to work in a vectorized way, this needs to be if np.all(T < a7).
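A scalar branch like the one quoted above can also be applied element by element, so that each temperature takes its own branch. The sketch below uses plain Python lists and a made-up two-branch function (it is not the TRCCp correlation) purely to show the pattern.

```python
def piecewise_scalar(T, a7):
    # Hypothetical two-branch function, not the TRCCp correlation.
    if T <= a7:
        return 0.0
    return T - a7

def piecewise_elementwise(Ts, a7):
    """Apply the scalar function to each temperature independently."""
    return [piecewise_scalar(T, a7) for T in Ts]

values = piecewise_elementwise([333.0, 444.0], 400.0)
print(values)
```

A single test over the whole array picks one branch for every element, so it only gives the same answer as the elementwise form when all temperatures fall on the same side of a7.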
Some functions to retrieve constant properties such as critical points have some redundancies. For example:
The algorithm-improvements branch is taking care of this.
Seems like the fluids numba module jit compiles to_num, which makes us get this error:
>>> from chemicals.numba import heat_capacity
>>> heat_capacity.zabransky_dicts
Traceback (most recent call last):
File "<ipython-input-9-aba4759ac2e4>", line 1, in <module>
heat_capacity.zabransky_dicts
File "C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\heat_capacity.py", line 547, in __getattr__
_load_Cp_data()
File "C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\heat_capacity.py", line 478, in _load_Cp_data
values = to_num(line.strip('\n').split('\t'))
File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\dispatcher.py", line 415, in _compile_for_args
error_rewrite(e, 'typing')
File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\dispatcher.py", line 358, in error_rewrite
reraise(type(e), e, None)
File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\utils.py", line 80, in reraise
raise value.with_traceback(tb)
TypingError: No implementation of function Function(<class 'float'>) found for signature:
float(unicode_type)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'float': File: Unknown: Line <N/A>.
With argument(s): '(unicode_type)':
Rejected as the implementation raised a specific error:
TypeError: float() only support for numbers
raised from C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\typing\builtins.py:912
During: resolving callee type: Function(<class 'float'>)
During: typing of call at C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\utils.py (96)
Could you take care of making sure to_num is not jit compiled? Thanks!
I have added a preliminary port of my rachford_rice module.
The code includes quite a few additions to the previously public version of activity.py. All this module contains is stuff for solving the Rachford-Rice equation. Also included is the three and N phase variant :)
To summarize quickly the changes:
I haven't fixed the doctests yet. The unit testing for this is pretty good, with lots of issues I've found being tested. This is a really tough numeric thing to get right, so lots of things are try/excepted.
I gave a stab at getting numba up and running for this module but it's going to be a huge pain as the code is highly numerically optimized, and some things like sorting arrays I haven't figured out how to do in numba yet.
What is the chemical's pubchem ID? CAS: 7727-37-9
What is its formula? N2
The lennard_jones.Stockmayer and molecular_diameter values are missing for nitrogen. I am currently adding them by hand, but it'd be helpful if you could include them. I am using these values: (s=3.788, e_k=71.4), though I forget where I found them.
About half of the data files are hidden with a leading underscore and the other half are public. And although there is a suggested convention for data within a module, there is no convention between the other modules. I propose the following convention:
{property}_data_{method} for DataFrame with one set of property data
{property}_values_{method} for arrays with one set of property data
{property}_dict_{method} for dictionary with one set of property data
{method}_data for DataFrame with all sorts of property data
{method}_values for arrays with all sorts of property data
{method}_dict for dictionary with all sorts of data
{property}_sources for dictionary of data frames. Users could add their own data frame here too (I plan to in BioSTEAM).
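A small sketch of how a `{property}_sources` dictionary could be used, including a user-added table. Plain dicts stand in for the DataFrames mentioned above to keep the example dependency-free; `lookup`, the "USER" method name, and the "my-compound" entry are all hypothetical.

```python
# {property}_dict_{method}: one table for one property and one method.
Tc_dict_IUPAC = {"7732-18-5": 647.096}   # water, critical temperature in K

# {property}_sources: method name -> table for that property.
Tc_sources = {"IUPAC": Tc_dict_IUPAC}

def lookup(sources, key, method=None):
    """Return (method, value) from the first source containing key."""
    methods = [method] if method is not None else list(sources)
    for name in methods:
        table = sources[name]
        if key in table:
            return name, table[key]
    raise KeyError(key)

# A user registers an extra table; the value here is made up.
Tc_sources["USER"] = {"my-compound": 500.0}

print(lookup(Tc_sources, "7732-18-5"))
print(lookup(Tc_sources, "my-compound"))
```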
I honestly don't mind whatever format we use (hidden or not). But as long as there is a convention we can follow, that would be dope.
Thanks for working on this and asking for my comments. Here are my comments at first glance:
The search_chemical function searches through the periodic table first... lines 361-379 could be deleted and the performance would be better.
The main_db, user_dbs, can_autoload, and loaded_main_db property/attributes can probably be simplified. I would suggest having two lists, a loaded_files and an unloaded_files. We could pop unloaded_files, load them, and append them to loaded_files and keep autoloading until we find the chemical or run out of unloaded files. This would also make it more clear which files have been loaded (in contrast to a can_autoload property), and would allow staging files to autoload (which is not possible right now in chemicals, but is possible in thermosteam).
Here is one additional small change:
# Original
def search_pubchem(self, pubchem, autoload=True):
    if type(pubchem) != int:
        pubchem = int(pubchem)
    return self._search_autoload(pubchem, self.pubchem_index, autoload=autoload)

# This is slightly faster and more concise
def search_pubchem(self, pubchem, autoload=True):
    return self._search_autoload(int(pubchem), self.pubchem_index, autoload=autoload)
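The two-list autoloading idea described above can be sketched roughly as follows. The class name, the file names, and the in-memory "files" are hypothetical; a real implementation would parse data files where this sketch updates a dictionary.

```python
class LazyChemicalIndex:
    """Sketch of the two-list autoload idea; all names are hypothetical."""

    def __init__(self, files):
        # `files` maps a file name to the records it would contain once read.
        self._files = files
        self.unloaded_files = list(files)
        self.loaded_files = []
        self.index = {}

    def _load_next(self):
        name = self.unloaded_files.pop(0)
        self.index.update(self._files[name])   # stands in for parsing the file
        self.loaded_files.append(name)

    def search(self, key, autoload=True):
        # Keep loading until the key is found or nothing is left to load.
        while key not in self.index and autoload and self.unloaded_files:
            self._load_next()
        return self.index.get(key)

db = LazyChemicalIndex({
    "main.tsv": {962: "water"},
    "extra.tsv": {702: "ethanol"},
})
print(db.search(962), db.loaded_files)
print(db.search(702), db.loaded_files)
```

The first search only touches the first file; the second one triggers loading of the next file, and `loaded_files` always shows what has been read so far.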
I hope these comments are helpful and I didn't misinterpret any code,
Thanks!
Describe the bug
Most of the time, besides very basic things like water and hydrochloric acid, the program will find the chemical, but not give any data on it at all.
Minimal Reproducible Example
E.g. Search copper sulfate.
What was moved over, was moved over; and that is pretty good.
What is the search string
>>> chemicals.identifiers.search_chemical("CO")
<ChemicalMetadata, name=methanol, formula=CH4O, smiles=CO, MW=32.0419>
Which chemical in the database do you believe should be found?
>>> chemicals.identifiers.search_chemical("carbon monoxide")
<ChemicalMetadata, name=carbon monoxide, formula=CO, smiles=[C-]#[O+], MW=28.0101>
Perhaps a toggle to prefer searching by formulas over smiles first should be added?
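A toggle of this kind could look roughly like the sketch below. The two dictionaries are tiny hypothetical indexes whose entries mirror the results shown above; `search` and its `prefer` parameter are illustrative and are not the signature of search_chemical.

```python
# Tiny hypothetical indexes; the entries mirror the results shown above.
BY_SMILES = {"CO": "methanol", "[C-]#[O+]": "carbon monoxide"}
BY_FORMULA = {"CH4O": "methanol", "CO": "carbon monoxide"}

def search(identifier, prefer="smiles"):
    """Look identifier up in both indexes, trying the preferred one first."""
    if prefer not in ("smiles", "formula"):
        raise ValueError("prefer must be 'smiles' or 'formula'")
    order = [BY_SMILES, BY_FORMULA] if prefer == "smiles" else [BY_FORMULA, BY_SMILES]
    for index in order:
        if identifier in index:
            return index[identifier]
    raise KeyError(identifier)

print(search("CO"))                    # SMILES interpretation first
print(search("CO", prefer="formula"))  # formula interpretation first
```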
The Hcombustion function right now does 2 things: find the stoichiometry, and calculate the higher heating value (HHV). Here are the enhancements I believe we could make:
combustion_stoichiometry(formula, MW=None) and HHV_stoichiometric(stoichiometry, Hf). The formula can be either a dictionary of elemental coefficients, or a string which would be parsed.
HHV_modified_Dulong(formula, MW)
LHV_from_HHV(HHV, N_H2O)
In addition, we could make a lightweight CombustionData and CombustionStoichiometry object to store all the data generated to get HHV and leave LHV as a property (or not, the overhead is almost unnoticeable). For example:
from dataclasses import dataclass

def as_atoms(formula):
    # Accept either a formula string or a dictionary of elemental coefficients
    if isinstance(formula, str):
        atoms = atoms_from_formula(formula)
    elif isinstance(formula, dict):
        atoms = formula
    else:
        raise ValueError("atoms must be either a string or dictionary, "
                         f"not a '{type(formula).__name__}' object")
    return atoms

@dataclass(frozen=True)
class CombustionStoichiometry:
    O2: float
    CO2: float
    Br2: float
    I2: float
    HCl: float
    HF: float
    SO2: float
    N2: float
    P4O10: float
    H2O: float
    Ash: float

    @classmethod
    def from_formula(cls, formula, MW=None):
        atoms = as_atoms(formula)
        ... # We would leave out the `combustion_stoichiometry` function I mentioned earlier
        # in favor of this class method

@dataclass(frozen=True)
class CombustionData:
    stoichiometry: CombustionStoichiometry
    HHV: float
    Hf: float
    MW: float

    @property
    def LHV(self):
        return LHV_from_HHV(self.HHV, self.stoichiometry.H2O)

    @classmethod
    def from_formula(cls, formula, Hf=None, MW=None, method=None):
        atoms = as_atoms(formula)
        if not MW:
            MW = molecular_weight_from_atoms(atoms)
        stoichiometry = CombustionStoichiometry.from_formula(atoms)
        if not method:
            method = 'Dulong' if Hf is None else 'Stoichiometry'
        if method == 'Dulong':
            HHV = HHV_modified_Dulong(stoichiometry, MW)
            if Hf: raise ValueError("cannot specify Hf if method is 'Dulong'")
            Hf = HHV - HHV_stoichiometric(stoichiometry, 0)
        elif method == 'Stoichiometry':
            if Hf is None: raise ValueError("must specify Hf if method is 'Stoichiometry'")
            HHV = HHV_stoichiometric(stoichiometry, Hf)
        else:
            raise ValueError("method must be either 'Stoichiometry' or 'Dulong', "
                             f"not {repr(method)}")
        return cls(stoichiometry, HHV, Hf, MW)
Let me know what you think. I personally like how the data is frozen and there is no confusion with working with these classes, which are simply meant to hold data. Not sure if you've worked with dataclass, but there are always the asdict and astuple functions from the dataclasses library that make these easy to work with (although asdict and astuple are a little slow).
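For readers who have not used frozen dataclasses, here is a short standalone sketch of asdict, astuple, and the immutability mentioned above. The `Stoichiometry` class is a cut-down, hypothetical version of the proposed one, and the sign convention (negative for a consumed species) is an assumption made for the example.

```python
from dataclasses import FrozenInstanceError, asdict, astuple, dataclass

@dataclass(frozen=True)
class Stoichiometry:
    """Cut-down, hypothetical version of the class proposed above."""
    O2: float
    CO2: float
    H2O: float

# Complete combustion of methane: CH4 + 2 O2 -> CO2 + 2 H2O
# ASSUMPTION: consumed species are given negative coefficients.
methane = Stoichiometry(O2=-2.0, CO2=1.0, H2O=2.0)

print(asdict(methane))
print(astuple(methane))

try:
    methane.O2 = 0.0
except FrozenInstanceError:
    print("frozen: fields cannot be reassigned")
```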
Hi!
I'm guessing you probably already know this, but this package is missing from the conda-forge channel ... (Is that what the #TODO: Conda install note meant? I guess I didn't interpret it that way.)
If that's the case, maybe in the interim strikethrough the conda section? (In Github Markdown it's two tildes ~ on either side.)
Thanks for the clarification!