
chemicals's People

Contributors

arnoudbuzing, calebbell, dhkblaszyk, yoelcortes


chemicals's Issues

Developer's Guide

Topics of discussion:

  1. A note on the scope of the "Chemicals" project.
  2. Lazy loading (keeping load speed fast).
  3. Add assertion tests for new data sets, functions, and exceptions.
  4. Add documentation using NumpyDoc style. Also add examples that run with doctests.
  5. Use pytest and pytest --doctest-modules?
  6. How to make GitHub contributions.
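As a concrete illustration of topic 4, here is a hypothetical function documented in NumpyDoc style with a doctest-ready example (the function itself is made up for illustration, not part of chemicals):

```python
def K_value(P, Psat):
    r"""Calculate the ideal equilibrium K-value of a component, defined as
    the ratio of its saturation pressure to the system pressure.

    Parameters
    ----------
    P : float
        System pressure, [Pa]
    Psat : float
        Saturation (vapor) pressure of the component, [Pa]

    Returns
    -------
    K : float
        Equilibrium K-value, [-]

    Examples
    --------
    >>> K_value(100000.0, 50000.0)
    0.5
    """
    return Psat/P
```

Running pytest --doctest-modules would then execute the Examples block as a test, which is exactly how topics 4 and 5 fit together.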

Steam density calculations change significantly with new release

Steam density calculations have changed in some cases by more than 7% between versions 1.1.1 and 1.1.4. Interestingly, version 1.1.1 matches the steam densities reported by CoolProp. Is this behavior to be expected?

Attached is an Excel table showing density calculations at 5 different pressures and 3 different temperatures for the two versions of chemicals and CoolProp.

I am very impressed with this package (and thermo and fluids). Thank you, Caleb, for all of your hard work.

steamDensityCalculationsChemical.xlsx

Incorrect calculation of density when using saturation temperature as input

The IAPWS functions return an erroneous density value when calculating with the saturation temperature.

import numpy as np
from chemicals import iapws95_Tsat, iapws95_rho

pressure = 558497.3519367648
tsat = iapws95_Tsat(pressure)
iapws95_rho(tsat, pressure)  # density wrong
iapws95_rho(np.floor(tsat), np.floor(pressure))  # rounding down gives the proper density

I guess this error is caused by some boundary issue in IAPWS. If making it work exactly at the saturation temperature is too much effort, an error message with a hint would be welcome.

Thanks for the great package anyway!

Vs in COSTALD_compressed should be Vsat?

In the COSTALD_compressed method, and perhaps others, there is an argument Vs for saturation volume. My impression was that the suffix s on an argument name meant plural, as in the properties of each component in a mixture. I am writing a wrapper for these functions and have been relying on this implied API (any argument ending in s triggers a look-up of properties from multiple phases). I think this Vs should be Vsat. I also hope I am correct that you reserve the s suffix for mixture properties, because it is quite handy. If this is just a loose convention, might I suggest or request that you consider making it a hard rule?
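For what it's worth, a sketch of the kind of wrapper logic described above, using inspect to find s-suffixed (plural) argument names — the helper and the example function here are hypothetical, not part of chemicals:

```python
import inspect

def plural_args(func):
    # Parameter names ending in 's' are assumed (per the loose convention
    # discussed above) to hold per-component values for a mixture.
    return [name for name in inspect.signature(func).parameters
            if name.endswith('s') and len(name) > 1]

def mixture_mw(zs, MWs):  # hypothetical mixture function for illustration
    return sum(z*MW for z, MW in zip(zs, MWs))

plural_args(mixture_mw)  # ['zs', 'MWs']
```

An argument named Vsat would correctly be skipped by such a wrapper, while Vs is indistinguishable from a per-component list — which is exactly the ambiguity raised above.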

Problem with Antoine's coefficients for Toluene

Describe the bug
The C value for Toluene is way off, perhaps it was not correctly adjusted to Pa/K units from mmHg/C?

Minimal Reproducible Example

>>> from chemicals import Antoine
>>> from chemicals import CAS_from_any
>>> from chemicals.vapor_pressure import Psat_data_AntoinePoling
>>> Psat_data_AntoinePoling.loc[CAS_from_any('Toluene')]

 'Toluene': Chemical    toluene 
 A            9.05043
 B            1327.62
 C           -255.525
 Tmin          286.44
 Tmax          409.61

Additional context
Benzene on the other hand gives:

'Benzene': Chemical    benzene 
 A            8.98523
 B            1184.24
 C            -55.578
 Tmin          279.64
 Tmax          377.06

The C value is significantly different. Also, xylene and cumene give C values in the 50's, so 255 for toluene is an outlier.
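For reference, converting base-10 Antoine coefficients from (mmHg, °C) to (Pa, K) shifts A by log10(101325/760) and C by -273.15, with B unchanged. A quick check using a commonly cited literature Antoine set for toluene in mmHg/°C units (treat those three input numbers as an assumption, not data from this library):

```python
from math import log10

def antoine_mmHg_C_to_Pa_K(A, B, C):
    # P = 10**(A - B/(C + T)): changing T from degC to K shifts C by -273.15;
    # changing mmHg to Pa adds log10(101325/760) to A; B is unchanged.
    return A + log10(101325.0/760.0), B, C - 273.15

# Commonly cited toluene coefficients in (mmHg, degC) -- an assumption here:
A, B, C = antoine_mmHg_C_to_Pa_K(6.95464, 1344.8, 219.48)
# C comes out near -53.7, in line with benzene's -55.578 above, which
# supports the suspicion that -255.525 is a transcription error.
```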

reaction.py

What goes in this module?

Here is my local variant of these files. I understand it's a pretty important module for biosteam, so I want to make sure this file does what you want. I am quite happy for you to take the lead on this file! You are doing great work. I am sorry I cannot always get back to you quickly.

reaction.txt
test_reaction.txt

Keep load speed fast

I am opening an issue to track the load speed of chemicals. I had already forgotten from last weekend how I was measuring load speed, so documenting it seems like a good idea.

I put the following code in a file called load_one_library.py

import cProfile
import os
import numpy as np
from scipy import special
from scipy import interpolate
from scipy import optimize
import pandas as pd
import sys
import json
import io
import datetime
from time import time
import fluids.constants
import fluids.numerics
import fluids
import ht
original_modules = set(sys.modules.keys())

pr = cProfile.Profile()
t0 = time()
pr.enable()
import chemicals
pr.disable()
print('Elapsed time: %f seconds' %(time() - t0))
pr.dump_stats('load_one_library.out')
after_modules = set(sys.modules.keys())
print('Loaded libraries')
print(after_modules.difference(original_modules))

Then I run that script with

python3 -OO load_one_library.py

You have to run the script a second time to ensure all the Python bytecode is up to date.

Then I look at where the time is spent with

python3 -m snakeviz load_one_library.out

Then I find the elements.py file, currently the longest to load.


Let's leave this issue open indefinitely for now and I'll update it with timings periodically - maybe get some development docs going and move this there at some point.

One side note - the -OO flag optimizes the compiled byte code so docstrings, asserts, and a few other things are not loaded. This is the meaningful number I am targeting. I refuse to be interested in increasing load speed by having less documentation.

This is typically used when building an actual application out of libraries, or on a server when processes are starting up and shutting down often. Because of this, it is important to remember that assert statements should not be used for control flow; they should be development-only checks.
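To make the assert point concrete, a minimal illustration (the function is hypothetical) of why asserts must stay development-only checks:

```python
def specific_volume(rho):
    # This assert disappears entirely under `python -OO`; it can only be a
    # development-time sanity check, never real input validation:
    assert isinstance(rho, float), "development-only type check"
    # Validation that must always run needs an explicit raise instead:
    if rho <= 0.0:
        raise ValueError("density must be positive")
    return 1.0/rho
```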

The rest of the script above outputs something like this:

Elapsed time: 0.005867 seconds
Loaded libraries
{'chemicals.solubility', 'chemicals.acentric', 'chemicals.dippr', 'chemicals.elements', 'chemicals.miscdata', 'chemicals', 'chemicals.dipole', 'chemicals.temperature', 'chemicals.critical', 'chemicals.utils', 'chemicals.refractivity', 'chemicals.exceptions', 'chemicals.vapor_pressure', 'chemicals.data_reader', 'chemicals.environment', 'chemicals.virial', 'chemicals.triple', 'chemicals.lennard_jones', 'chemicals.phase_change'}

Updates in reactions and heat_capacity modules

The reactions and heat_capacity modules have finally been introduced. They come with several new data lookup functions, optimizations, and a few changes.

We now have lookup methods for energies at each phase: Hfs, Hfl, Hfg, S0s, S0l, S0g. The lookup methods for solids search only the CRC data. The column names for the data tables have been changed to fit this convention (e.g. Sf(g) -> S0g, Hf(g) -> Hfg, Sfc -> S0s). These changes are also reflected in the commented sections of elements.py that use CRC_standard_data.

A couple of tests for the heat_capacity module are not passing, but they should be! The results of the chemical property functions are very close. All equations and coefficients are the same, so the problem may come from fluids.assert_close. The tests compare against values from the literature, and the small differences lead to these assertion errors. If you'd like, I can increase the tolerance of these tests to make sure the assertions pass.

Get chemical name from chemical notation

>>> from chemicals import *
>>> h2co3_from_name = CAS_from_any('Carbonic Acid')
>>> h2co3_from_notation = CAS_from_any('h2co3')
>>> h2co3_from_name == h2co3_from_notation # True
>>> get_name(h2co3_from_notation) # is there something I can use to do something like this and get "Carbonic Acid"?

Also, may I know the difference between CAS_from_any('h2co3') and CAS_from_any('H2CO3')? They seem to return different results: '463-79-6', and '107-32-4'.

numba_vectorized reports TypeError

Describe the bug
I got the numba_vectorized func working for my single component calcs, but now the functions based on mixing rules are now giving me grief.

Minimal Reproducible Example

numba_vectorized.Lindsay_Bromley([333, 333], [[0.2, 0.2], [0.8, 0.8]], [[1, 1], [1, 1]], [[1, 1], [1, 1]], [351, 370], [18, 33])

Additional context
TypeError: return type must be specified for object mode

Maintaining compatibility with magic pint parser

Hi Yoel,

A long time ago I put a lot of effort into the docstring format I was using, and made a reader and shim to make the functions I was writing in all my projects work with pint. I just added it to chemicals also, but you can see more details here: https://fluids.readthedocs.io/fluids.units.html

You can try it out on master as follows; the return type will be a pint value.

from chemicals.units import Tc
Tc('64-17-5')

I use this framework pretty often because I like doing calculations with pint.

I feel keeping pint as an optional dependency is a good one, and keeping it out of the library internals avoids slowing things down and allows it to be switched out for another library in the future.

However, the return signature of a function is expected to be what it says in the documentation. This is just one more complication to changing things.

Numba interface

Hi Yoel,

Per our discussion, I have created a tentative wrapper around the functions in the fluids library and the chemicals library. Numba is never going to work perfectly for all cases, and optimizations that make numba perfect sometimes make CPython and especially PyPy slower. However, I am very interested in it for the uses it can have. All I am looking to target with numba at this point is computation functions, not lookup functions. Anything involving a dictionary or a string is probably going to get slower as well, and often cannot be compiled at all.

I believe you have more experience with it than I do, so I especially welcome your thoughts.

Here are some examples of the current wrapper. There are a few more tests.

import numpy as np
import chemicals.numba
import chemicals.numba_vectorized

chemicals.numba.Antoine(100.0, 8.7687, 395.744, -6.469)  # single-point speed

chemicals.numba_vectorized.Psat_IAPWS(np.linspace(100.0, 200.0, 3))  # vectorized computation speed

I find that a function has to be pretty slow to benefit from a single-point numba version; but for functions with vector arguments, at 10 elements numba already shows significantly improved performance, even compared to numpy.

I think for some applications numba really wants a custom function, not a wrapper around the existing one. For example, the optimizations in


def zs_to_ws(zs, MWs):
    cmps = range(len(zs))
    ws = [zs[i]*MWs[i] for i in cmps]
    Mavg = 0.0    # Cannot use sum and list comprehension with numba; otherwise Mavg = 1.0/sum(ws)
    for v in ws:
        Mavg += v
    Mavg = 1.0/Mavg
    for i in cmps:
        ws[i] *= Mavg
    return ws

make the numba function return a list, when what we want there is a numpy array. It is not as elegant, but I have no problem writing a few duplicate functions.

@numba.njit
def zs_to_ws(zs, MWs):
    ws = zs*MWs
    Mavg = 1.0/np.sum(ws)
    ws *= Mavg
    return ws 

It seems some functions also need to be re-written in a more verbose way to work with numba, for example https://github.com/CalebBell/fluids/blob/master/fluids/friction.py#L2917
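An illustrative (made-up) example of the kind of rewrite meant here: numba's nopython mode cannot use a plain Python dict stored at module level, so a convenient lookup becomes explicit branches. The function names and values below are invented for illustration:

```python
# Convenient in CPython, but not typeable by numba's nopython mode:
_roughness = {'steel': 4.5e-5, 'concrete': 3.0e-4}  # m, illustrative values

def roughness_dict(material):
    return _roughness[material]

# Verbose rewrite that nopython mode can compile (string comparisons still
# limit performance, per the note above about strings and dicts):
def roughness_branches(material):
    if material == 'steel':
        return 4.5e-5
    elif material == 'concrete':
        return 3.0e-4
    raise ValueError("unknown material")
```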

I consider this all pretty preliminary.

Pandas' new high-precision csv float parser is sometimes quite low precision

Describe the bug
I was wondering why the CI started failing, and it turns out Pandas 1.2.0 updated some defaults for their CSV parser. Well, one of those was to use a higher-precision floating point converter. Chemicals reveals at least one bug in the new parser.

Minimal Reproducible Example

chemicals.viscosity.mu_data_VDI_PPDS_8['D']

In Pandas 1.1.2 when reading "0.00000000000001953" we get:
1.953E-14

In Pandas 1.2.1 we get:
1.95E-14

Additional context
This also breaks results for people using data sources from this library.

Workaround
It is possible to restore the old behavior with float_precision='legacy'. The two data files affected by this bug now have this default set in master. Ideally, Pandas will fix their bug; I didn't find any issue reported for this in a cursory search.
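The workaround looks like this as a sketch, reading an inline string rather than one of the library's TSV files (the column layout here is invented for illustration):

```python
import io
import pandas as pd

tsv = "name\tD\nrow\t0.00000000000001953\n"
# float_precision='legacy' selects pandas' older float converter, which
# parses this value to the correct nearest double (1.953e-14):
df = pd.read_csv(io.StringIO(tsv), sep='\t', float_precision='legacy')
```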

GWP IPCC 5th edition (2018)

Hi Caleb,

I'm working on adding additional methods to chemicals.environment.GWP for the IPCC 5th edition (2018). I should be done within a week; then you can review and make sure it's up to the library's standards. I was wondering if there is a preferred method you like to use to facilitate getting all the data into csv format (e.g. a piece of software, or manually)?

Thanks!

DIPPR 801 2019?

Is it possible to add DIPPR 801 (2019) to the list of available databases? I have access and can produce the corresponding TSVs.

Changed tsv files?

Hi Caleb,

I was wondering if there have been any corrections to any of the tsv files in chemicals (as compared to thermo). I'm getting significantly different numbers for the following:

>>> # Expecting 495689880.0
>>> sum([abs(Hfg(i, method='TRC')) for i in TRC_gas_data.index[pd.notnull(TRC_gas_data['Hfg'])]])
494559380.0 
>>> # Expecting 300592764.0
>>> sum([abs(Hfg(i, method='ATCT_G')) for i in Hfg_ATcT_data.index])
299452426.0

Thanks!

`TypeError` when operating in no-JIT mode

Describe the bug
When operating in no-JIT mode, we get the following error:

TypeError: return type must be specified for object mode

Minimal Reproducible Example
This happened during our automated tests, please see here. Here's the log:

----------------------------- Captured stderr call -----------------------------
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py:2063: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "Ambrose_Walton" failed type inference due to: Untyped global name 'trunc_exp': Cannot determine Numba type of <class 'function'>

File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2135:
def Ambrose_Walton(T, Tc, Pc, omega):
    <source elided>
    f2 = (-0.64771*tau + 2.41539*tau15 - 4.26979*tau25 + 3.25259*tau5)
    return Pc*trunc_exp((f0 + omega*(f1 + f2*omega))/Tr)
    ^

  def Ambrose_Walton(T, Tc, Pc, omega):
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/numba/core/object_mode_passes.py:151: NumbaWarning: Function "Ambrose_Walton" was compiled in object mode without forceobj=True.

File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2063:

def Ambrose_Walton(T, Tc, Pc, omega):
^

  warnings.warn(errors.NumbaWarning(warn_msg,
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/numba/core/object_mode_passes.py:161: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected. This is deprecated behaviour that will be removed in Numba 0.59.0.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/chemicals/vapor_pressure.py", line 2063:

def Ambrose_Walton(T, Tc, Pc, omega):
^

  warnings.warn(errors.NumbaDeprecationWarning(msg,

---------- coverage: platform linux, python 3.9.17-final-0 -----------
Coverage XML written to file coverage.xml

=========================== short test summary info ============================
FAILED tests/unit/models/phase/DensityTest.py::DensityTest::test_chemicals_wrapper_for_pure_liq_molar_volume - TypeError: return type must be specified for object mode
FAILED tests/unit/models/phase/VaporPressureTest.py::VaporPressureTest::test_generic_chemicals_for_pure_liquid - TypeError: return type must be specified for object mode
============= 2 failed, 775 passed, 10 skipped in 86.56s (0:01:26) =============
Error: Process completed with exit code 1.

Method lists and names

@CalebBell I was wondering if there is any reason we'd like to keep the lists of methods (e.g. Tc_methods and Pc_methods) as well as the method strings (e.g. IUPAC = 'IUPAC'). I think the docstrings and the ability to query available methods already serve this purpose. Also, naive users might get the false impression that these lists can be altered to remove methods. Are there any dependencies on these? Please let me know if you're fine with dropping them; I'd be happy to remove them from the code and the documentation as well.

Adding chemical ID: 157846

What is the chemical's pubchem ID?
157846

What is its formula?
"isosmiles": "CCCCCCCCCCCCCCCCCCCCCCN+(C)C.COS(=O)(=O)[O-]",

This is the Json from pubchem:

[
{
"cid": "157846",
"cmpdname": "Behentrimonium methosulfate",
"cmpdsynonym": ["Behentrimonium methosulfate","81646-13-1","Docosyltrimethylammonium methyl sulphate","docosyl(trimethyl)azanium;methyl sulfate","5SHP745C61","Behenyl trimethyl ammonium methosulfate","UNII-5SHP745C61","EINECS 279-791-1","EC 279-791-1","SCHEMBL126381","DTXSID00231231","BEHENTRIMONIUM METHOSULPHATE","behenyltrimethylammonium methyl sulfate","BEHENTRIMONIUM METHOSULFATE [INCI]","N,N,N-Trimethyl-1-docosanaminium methosulfate","BEHENYL TRIMETHYL AMMONIUM METHOSULPHATE","1-Docosanaminium, N,N,N-trimethyl-, methosulfate","Q27262812"],
"mw": "479.800",
"mf": "C26H57NO4S",
"polararea": "74.800",
"complexity": "344.000",
"heavycnt": "32",
"hbonddonor": "0",
"hbondacc": "4",
"rotbonds": "21",
"inchi": "InChI=1S/C25H54N.CH4O4S/c1-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26(2,3)4;1-5-6(2,3)4/h5-25H2,1-4H3;1H3,(H,2,3,4)/q+1;/p-1",
"isosmiles": "CCCCCCCCCCCCCCCCCCCCCCN+(C)C.COS(=O)(=O)[O-]",
"canonicalsmiles": "CCCCCCCCCCCCCCCCCCCCCCN+(C)C.COS(=O)(=O)[O-]",
"inchikey": "QIVLQXGSQSFTIF-UHFFFAOYSA-M",
"iupacname": "docosyl(trimethyl)azanium;methyl sulfate",
"exactmass": "479.401",
"monoisotopicmass": "479.401",
"charge": "0",
"covalentunitcnt": "2",
"isotopeatomcnt": "0",
"totalatomstereocnt": "0",
"definedatomstereocnt": "0",
"undefinedatomstereocnt": "0",
"totalbondstereocnt": "0",
"definedbondstereocnt": "0",
"undefinedbondstereocnt": "0",
"pclidcnt": "0",
"gpidcnt": "3818",
"gpfamilycnt": "1390",
"neighbortype": "2D",
"annothits": ["Classification","Patents","Safety and Hazards","Use and Manufacturing"],
"annothitcnt": "4",
"cidcdate": "2005-08-08",
"sidsrcname": ["A2B Chem","AA BLOCKS","ABI Chem","Alfa Chemistry","BenchChem","ChemIDplus","Chemieliva Pharmaceutical Co., Ltd","ChemSpider","ChemTik","Cooke Chemical Co., Ltd","CymitQuimica","DiscoveryGate","Egon Willighagen, Department of Bioinformatics - BiGCaT, Maastricht University","EPA DSSTox","FDA Global Substance Registration System (GSRS)","Google Patents","Hairui Chemical","J&H Chemical Co.,ltd","labseeker","NextBio","NORMAN Suspect List Exchange","PATENTSCOPE (WIPO)","RR Scientific","Smolecule","SureChEMBL","THE BioTek","Thomson Pharma","ToxPlanet","Wikidata","Yick-Vic Chemicals & Pharmaceuticals (HK) Ltd."],
"depcatg": ["Chemical Vendors","Curation Efforts","Governmental Organizations","Journal Publishers","Legacy Depositors","Research and Development","Subscription Services"]
}
]

Is there a procedure to do this manually and create a local list of extra chemicals not present in the library? It would be nice to have a method to import from the PubChem JSON format, but I am not sure how complex that would be. Happy to help if you can give me some guidance.

Thanks,
Marco
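A sketch of what a PubChem-JSON importer could start from, pulling the identifier fields a local user database would need. The field names are taken from the record above; the function itself is hypothetical, not a chemicals API:

```python
def metadata_from_pubchem(record):
    """Pull the fields a local user database would need from one PubChem
    compound record (a dict parsed from JSON like the listing above)."""
    return {
        'pubchemid': int(record['cid']),
        'name': record['cmpdname'],
        'formula': record['mf'],
        'MW': float(record['mw']),
        'smiles': record['canonicalsmiles'],
        'InChI_key': record['inchikey'],
        'synonyms': record.get('cmpdsynonym', []),
    }
```

One would call json.loads on the PubChem response and map metadata_from_pubchem over the resulting list; matching the result to the library's internal ChemicalMetadata layout is the harder part and is left open here.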

Heat capacity functions by mol

Is your feature request related to a problem? Please describe.

Most T- and P-dependent chemical property functions are on a molar basis (e.g. volume, Hvap, and most heat capacity functions). However, the Lastovka_Shaw, Dadgostar_Shaw, and Lastovka_solid heat capacity functions are on a per-kg basis. I'd like to allow these functions to return values in J/mol-K when given an optional parameter MW (the molecular weight). This would preserve backwards compatibility while giving more consistency in units of measure.
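The proposed pattern could look like this (a toy correlation, not Lastovka_Shaw itself; the numbers are invented):

```python
def Cp_toy(T, MW=None):
    """Hypothetical mass-based heat capacity correlation.
    Returns J/(kg*K) by default; J/(mol*K) if MW [g/mol] is given."""
    Cp_mass = 1500.0 + 0.5*T  # made-up correlation, J/(kg*K)
    if MW is not None:
        # g/mol -> kg/mol: J/(kg*K) * kg/mol = J/(mol*K)
        return Cp_mass*MW*1e-3
    return Cp_mass

Cp_toy(300.0)             # 1650.0 J/(kg*K), unchanged behavior
Cp_toy(300.0, MW=18.015)  # ~29.72 J/(mol*K)
```

Leaving MW=None as the default is what keeps existing callers working unchanged.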

Do you acknowledge the project is developed by volunteers adding features primarily for their own purposes?
Yes!

Additional context
@CalebBell, thumbs up if you agree and I'll get to adding this feature!

Update GWPs for IPCC Report version 5

Is your feature request related to a problem? Please describe.
The GWP lookups do not have updated data for the IPCC report from 2012. It is important to update the data to include the new values.

Do you acknowledge the project is developed by volunteers adding features primarily for their own purposes?
Yes, and I have a purpose for it.

Additional context
Table can easily be copied from this official PDF:
https://www.sealevel.info/AR5_Table_8.A.1_Lifetimes_Radiative_Efficiencies_and_Metric_values_pp_731-738.pdf
This will likely be a simple update of the existing data file which is loaded, to include the new values. The default method should likely be updated to use the new values as well.

incorrect separators on water synonyms

What is the search string
caustic soda liquid;aquafina;distilled water;hydrogen oxide (h2o);ultrexii ultrapure;
Which chemical in the database do you believe should be found?
It's water, but the separators here are wrong.

The vectorized submodule is not fully supported throughout the package

Describe the bug
The vectorized functionality does not seem to be universally applied.

Minimal Reproducible Example

import chemicals as chem
import numpy as np
from chemicals import vectorized
props = chem.heat_capacity.TRC_gas_data.loc['7732-18-5']  # for water
coeffs = {}
for val in props.index:
    if val.startswith('a'):
        coeffs[val] = props[val]
Cp = chem.heat_capacity.TRCCp(T=np.array([333, 444]), **coeffs)

Additional context
The error message is:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is occurring because of the line:

if T <= a7:

For this to work in a vectorized way, this needs to be if np.all(T <= a7).
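Note that np.all(T <= a7) only makes sense when every element falls on the same side of a7; a fully vectorized version would evaluate both branches and select elementwise with np.where. A sketch with stand-in branch expressions (the function and expressions are hypothetical, not TRCCp's actual branches):

```python
import numpy as np

def piecewise_vectorized(T, a7):
    low = 0.1*T    # stand-in for the T <= a7 branch expression
    high = 0.2*T   # stand-in for the T > a7 branch expression
    # np.where picks per element, so arrays spanning a7 are handled too:
    return np.where(T <= a7, low, high)

piecewise_vectorized(np.array([300.0, 500.0]), 400.0)  # array([ 30., 100.])
```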

Simplifying constant property search algorithms

Some functions to retrieve constant properties such as critical points have some redundancies. For example:

  • The Tc method has "SURF" in IgnoreMethods. The SURF method is not actually useful, because every databank file has both Tc and Pc; removing it from Tc_methods and Pc_methods altogether would be a safe improvement.
  • All constant property retrieval algorithms follow the same overall logic. It would be possible to create a single high-level function to perform this logic.
  • Searching through all available methods and only picking the first is not efficient. It would be possible to return the first value encountered instead.

The algorithm-improvements branch is taking care of this.
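A sketch of what that high-level function could look like, using plain dicts in place of the library's DataFrames (all names here are hypothetical):

```python
def retrieve_constant(CASRN, sources):
    """sources is an ordered mapping {method_name: {CASRN: value}}; return
    the first value encountered, per the last bullet above, rather than
    collecting every method and then picking one."""
    for method, table in sources.items():
        value = table.get(CASRN)
        if value is not None:
            return value, method
    return None, None

sources = {'IUPAC': {'64-17-5': 514.0},
           'PSRK': {'64-17-5': 516.2, '7732-18-5': 647.14}}
retrieve_constant('7732-18-5', sources)  # (647.14, 'PSRK')
```

Because dicts preserve insertion order, the method priority is just the order the sources were registered in.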

lazy loading from chemicals.numba not working

It seems the fluids numba module jit-compiles to_num, which gives us this error:

>>> from chemicals.numba import heat_capacity
>>> heat_capacity.zabransky_dicts
Traceback (most recent call last):

  File "<ipython-input-9-aba4759ac2e4>", line 1, in <module>
    heat_capacity.zabransky_dicts

  File "C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\heat_capacity.py", line 547, in __getattr__
    _load_Cp_data()

  File "C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\heat_capacity.py", line 478, in _load_Cp_data
    values = to_num(line.strip('\n').split('\t'))

  File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\dispatcher.py", line 415, in _compile_for_args
    error_rewrite(e, 'typing')

  File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\dispatcher.py", line 358, in error_rewrite
    reraise(type(e), e, None)

  File "C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\utils.py", line 80, in reraise
    raise value.with_traceback(tb)

TypingError: No implementation of function Function(<class 'float'>) found for signature:
 
float(unicode_type)
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'float': File: Unknown: Line <N/A>.
    With argument(s): '(unicode_type)':
   Rejected as the implementation raised a specific error:
     TypeError: float() only support for numbers
  raised from C:\Users\yrc2\AppData\Roaming\Python\Python37\site-packages\numba\core\typing\builtins.py:912

During: resolving callee type: Function(<class 'float'>)
During: typing of call at C:\Users\yrc2\OneDrive\Code\chemicals\chemicals\utils.py (96)

Could you take care of making sure to_num is not jit compiled? Thanks!

rachford_rice

I have added a preliminary port of my rachford_rice module.

The code includes quite a few additions to the previously public version of activity.py. All this module contains is code for solving the Rachford-Rice equation. Also included are the three-phase and N-phase variants :)

To summarize quickly the changes:

  • Added Rachford_Rice_polynomial, which turns the 2-phase Rachford-Rice equation into a polynomial. It can then be solved analytically using something like np.roots, and it is extremely quick to solve because no repeated divisions are needed. Unfortunately, this method is a lemon - it takes exponentially more time to convert the equation to polynomial form as N increases. Up to N = 5 the solution is analytical (quartic) and looks plausible. I have read of deflation schemes that should allow N = 6 and N = 7. I don't think there is likely to be a practical use for this method.
  • Added Rachford_Rice_solution2 for two-phase RR solutions.
  • Added Rachford_Rice_solutionN for N-phase RR solutions. I have tested this up to 5 phases. I haven't found anything that doesn't converge with either solver, so long as you start in the feasible range.
  • I think Rachford_Rice_solution_LN2 is the only new function among the 1D solvers. It is the fastest, transforming the problem from a bounded search to an unbounded one. I am frustrated trying to get numba to work with it. I generally got at least 50% better performance with it.
  • Do not use a cubic analytical solution for N = 3; it really sucks. To my knowledge, any cubic analytical solution needs to be checked numerically, because roughly 1e-5 % of the time it is not accurate. I have a unit test showing it failing if you want to take a look at that case.

I haven't fixed the doctests yet. The unit testing for this is pretty good, with lots of issues I've found being tested. This is a really tough numeric thing to get right, so lots of things are try/excepted.
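For orientation, the two-phase objective being solved is sum_i z_i(K_i - 1)/(1 + (V/F)(K_i - 1)) = 0. A minimal bisection sketch, nothing like the optimized library solvers, assuming max(K) > 1 > min(K):

```python
def rr_objective(VF, zs, Ks):
    # Rachford-Rice objective: zero at the physical vapor fraction V/F.
    return sum(z*(K - 1.0)/(1.0 + VF*(K - 1.0)) for z, K in zip(zs, Ks))

def rr_solve(zs, Ks, tol=1e-12):
    # Bisect inside the asymptote-free window (1/(1-Kmax), 1/(1-Kmin)),
    # where the objective falls monotonically from +inf to -inf.
    # Assumes max(Ks) > 1 > min(Ks) so both phases can exist.
    lo = 1.0/(1.0 - max(Ks)) + 1e-10
    hi = 1.0/(1.0 - min(Ks)) - 1e-10
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        if rr_objective(mid, zs, Ks) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

rr_solve([0.5, 0.3, 0.2], [1.685, 0.742, 0.532])  # ~0.6907
```

Bisection is robust but slow; the transformed bounded-to-unbounded search mentioned for Rachford_Rice_solution_LN2 is one way the real solvers go faster.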

I took a stab at getting numba up and running for this module, but it's going to be a huge pain: the code is heavily numerically optimized, and there are some things, like sorting arrays, that I haven't figured out how to do in numba yet.

Nitrogen Lennard-Jones values missing

What is the chemical's pubchem ID? CAS: 7727-37-9

What is its formula? N2

The lennard_jones.Stockmayer and molecular_diameter values are missing for nitrogen. I am currently adding them by hand, but it'd be helpful if you could include them. I am using these values: (s=3.788, e_k=71.4), though I forget where I found them.

Hidden vs public data files and naming convention

About half of the data files are hidden with a leading underscore and the other half are public. And although there is a suggested convention for data within a module, there is no convention across modules. I propose the following convention:

  • {property}_data_{method} for DataFrame with one set of property data:
    • critical_data_IUPAC
    • critical_data_Matthews
    • mu_data_VDI_PPDS_7
    • kappa_data_Perrys2_315
    • Hf_data_API_TDB
    • Cp_data_Poling
  • {property}_values_{method} for arrays with one set of property data:
    • mu_values_VDI_PPDS_7
  • {property}_dict_{method} for dictionary with one set of property data:
    • Cp_dict_Laliberte
    • rho_dict_Laliberte
  • {method}_data for DataFrame with all sorts of property data:
    • VDI_tabular_data
    • CRC_inorganic_data
  • {method}_values for arrays with all sorts of property data:
    • VDI_tabular_array
  • {method}_dict for dictionary with all sorts of data:
    • VDI_saturation_dict
  • {property}_sources for dictionary of data frames. Users could add their own data frame here too (I plan to in BioSTEAM).
    • Tc_sources
    • Tb_sources

I honestly don't mind whatever format we use (hidden or not). But as long as there is a convention we can follow, that would be dope.

identifiers.py

Thanks for working on this and asking for my comments. Here are my comments at first glance:

  • The optional autoload and cache arguments seem good to me. Just for good measure, we might want to clear the "chemical_search_cache" if the length is really long (although probably not a problem in long run). I would think it would be better to have the "chemical_search" function wrap a "_chemical_search" function that does not save the cache. This would decrease all the extra layers of if-statements for every return scenario.
  • Loading new files will replace items in "names_index". I've had the problem that Anion and Cation common names overlap with the ones from the main_db (which are the ones you would expect to get). As of right now, "autoload_next" will replace common names from the main_db with the ones in the user_db. In thermosteam.properties.chemicals, I autoload the main_db then the rest and make sure items are not replaced in "names_index". I understand that the main_db is pretty big, so maybe there is a better way to solve this problem such that main_db file is autoloaded last.
  • The next in the "autoload_next" method name gives the false impression that files are loaded one by one (when they are actually all loaded at the same time).
  • The elements are loaded into the ChemicalMetadataDB instance, but the search_chemical function searches through the periodic table first... lines 361-379 could be deleted and the performance would be better.
  • The main_db, user_dbs, can_autoload, and loaded_main_db properties/attributes could probably be simplified. I would suggest having two lists, loaded_files and unloaded_files. We could pop a file from unloaded_files, load it, append it to loaded_files, and keep autoloading until we find the chemical or run out of unloaded files. This would also make it clearer which files have been loaded (in contrast to a can_autoload property), and would allow staging files to autoload (which is not possible right now in chemicals, but is possible in thermosteam).
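The two-list bookkeeping from the last bullet might look like this toy sketch (not the real ChemicalMetadataDB; names and the loader callback are hypothetical):

```python
class TwoListDB:
    def __init__(self, files, loader):
        self.loaded_files = []
        self.unloaded_files = list(files)
        self.loader = loader  # callable(filename) that populates the indexes

    def search_autoload(self, search):
        """Keep popping and loading files until `search()` finds something
        or there is nothing left to load."""
        result = search()
        while result is None and self.unloaded_files:
            f = self.unloaded_files.pop(0)
            self.loader(f)
            self.loaded_files.append(f)
            result = search()
        return result
```

Staging extra files to autoload is then just appending to unloaded_files, and "which files are loaded" is readable directly from loaded_files.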

Here is one additional small change:

# Original
def search_pubchem(self, pubchem, autoload=True):
    if type(pubchem) != int:
        pubchem = int(pubchem)
    return self._search_autoload(pubchem, self.pubchem_index, autoload=autoload)

# This is slightly faster and more concise
def search_pubchem(self, pubchem, autoload=True):
    return self._search_autoload(int(pubchem), self.pubchem_index, autoload=autoload)

I hope these comments are helpful and I didn't misinterpret any code,
Thanks!

Many chemicals input are found but no data given

Describe the bug
Most of the time, besides very basic things like water and hydrochloric acid, the program will find the chemical, but not give any data on it at all.

Minimal Reproducible Example
E.g. Search copper sulfate.

Searching for CO returns methanol.

What is the search string

>>> chemicals.identifiers.search_chemical("CO")
<ChemicalMetadata, name=methanol, formula=CH4O, smiles=CO, MW=32.0419>

Which chemical in the database do you believe should be found?

>>> chemicals.identifiers.search_chemical("carbon monoxide")
<ChemicalMetadata, name=carbon monoxide, formula=CO, smiles=[C-]#[O+], MW=28.0101>

Perhaps a toggle to prefer searching by formulas over smiles first should be added?
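One way such a toggle could work, as a rough sketch (the ordered_search helper and the index names here are hypothetical, not the existing API):

```python
def ordered_search(key, *indexes):
    """Return the first hit for key, trying the given indexes in priority order."""
    for index in indexes:
        if key in index:
            return index[key]
    return None

# Toy indexes standing in for the real database indexes:
formula_index = {'CO': 'carbon monoxide'}
smiles_index = {'CO': 'methanol'}

ordered_search('CO', formula_index, smiles_index)  # 'carbon monoxide'
ordered_search('CO', smiles_index, formula_index)  # 'methanol'
```

The toggle then just reorders the indexes passed in, so formula matches beat SMILES matches when the user asks for that behavior.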

Combustion

The Hcombustion function right now does 2 things: find the stoichiometry, and calculate the higher heating value (HHV). Here are the enhancements I believe we could make:

  • Refactor this function in two: combustion_stoichiometry(formula, MW=None) and HHV_stoichiometric(stoichiometry, Hf). The formula can be either a dictionary of elemental coefficients, or a string which would be parsed.
  • In thermosteam; the atoms which do not have combustion products are disregarded and "Ash" is added to the products to maintain the mass balance. I'd like to keep this and make a note of this in the documentation.
  • Add methods to calculate heat of combustion when the heat of formation is not given (e.g. HHV_modified_Dulong(formula, MW))
  • Add a function to calculate LHV from HHV (e.g. LHV_from_HHV(HHV, N_H2O))

In addition, we could make a lightweight CombustionData and CombustionStoichiometry object to store all the data generated to get HHV, and leave LHV as a property (or not; the overhead is almost unnoticeable). For example:

from dataclasses import dataclass

def as_atoms(formula):
    if isinstance(formula, str):
        atoms = atoms_from_formula(formula)
    elif isinstance(formula, dict):
        atoms = formula
    else:
        raise ValueError("formula must be either a string or dictionary, "
                        f"not a '{type(formula).__name__}' object")
    return atoms

@dataclass(frozen=True)
class CombustionStoichiometry:
    O2: float
    CO2: float
    Br2: float
    I2: float
    HCl: float
    HF: float
    SO2: float
    N2: float
    P4O10: float
    H2O: float
    Ash: float

    @classmethod
    def from_formula(cls, formula, MW=None):
        atoms = as_atoms(formula)
        ... # We would leave out the `combustion_stoichiometry` function I mentioned earlier
            # in favor of this class method

@dataclass(frozen=True)
class CombustionData:
    stoichiometry: CombustionStoichiometry
    HHV: float
    Hf: float
    MW: float

    @property
    def LHV(self):
        return LHV_from_HHV(self.HHV, self.stoichiometry.H2O)
    
    @classmethod
    def from_formula(cls, formula, Hf=None, MW=None, method=None):
        atoms = as_atoms(formula)
        if not MW:
            MW = molecular_weight_from_atoms(atoms)
        stoichiometry = CombustionStoichiometry.from_formula(atoms)
        if not method:
            method = 'Dulong' if Hf is None else 'Stoichiometry'
        if method == 'Dulong':
            if Hf is not None: raise ValueError("cannot specify Hf if method is 'Dulong'")
            HHV = HHV_modified_Dulong(stoichiometry, MW)
            Hf = HHV - HHV_stoichiometric(stoichiometry, 0)
        elif method == 'Stoichiometry':
            if Hf is None: raise ValueError("must specify Hf if method is 'Stoichiometry'")
            HHV = HHV_stoichiometric(stoichiometry, Hf)
        else:
            raise ValueError("method must be either 'Stoichiometry' or 'Dulong', "
                            f"not {repr(method)}")
        return cls(stoichiometry, HHV, Hf, MW)

Let me know what you think. I personally like how the data is frozen and there is no confusion when working with these classes, which are simply meant to hold data. If you haven't worked with dataclass before, the dataclasses library also provides asdict and astuple functions that make these easy to work with (although asdict and astuple are a little slow).
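As a quick illustration, asdict and astuple on a frozen dataclass (using a simplified stand-in for the proposed CombustionData):

```python
from dataclasses import dataclass, asdict, astuple

@dataclass(frozen=True)
class CombustionResult:  # simplified stand-in for the proposed CombustionData
    HHV: float
    Hf: float
    MW: float

result = CombustionResult(HHV=-890590.0, Hf=-74534.0, MW=16.04246)
asdict(result)   # {'HHV': -890590.0, 'Hf': -74534.0, 'MW': 16.04246}
astuple(result)  # (-890590.0, -74534.0, 16.04246)
```

Attempting result.HHV = 0.0 raises dataclasses.FrozenInstanceError, which is what makes the frozen=True variant safe to pass around as pure data.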

conda-forge Package Missing

Hi!

I'm guessing you probably already know this, but this package is missing from the conda-forge channel... (Is that what the #TODO: Conda install note meant? I guess I didn't interpret it that way.)

If that's the case, maybe strike through the conda section in the interim? (In GitHub Markdown it's two tildes on either side.)

Thanks for the clarification!
