kaaiian / cbfv
Tool to quickly create a composition-based feature vector
CompositionError Traceback (most recent call last)
in <cell line: 1>()
1 for f in['jarvis','magpie','mat2vec','oliynyk','onehot','random_200']:
----> 2 X_train_unscaled,y_train,formulae_train,skipped_train = generate_features(df, elem_prop=f, drop_duplicates=False, extend_features=False, sum_feat=True)
3 #it has to be tested again with bg data which I have created by deleting the duplicates
4
5 SEED=42
4 frames
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in generate_features(df, elem_prop, drop_duplicates, extend_features, sum_feat, mini)
281 if 'x' in formula:
282 continue
--> 283 l1, l2 = _element_composition_L(formula)
284 formula_mat.append(l1)
285 count_mat.append(l2)
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in _element_composition_L(formula)
97
98 def _element_composition_L(formula):
---> 99 comp_frac = _element_composition(formula)
100 atoms = list(comp_frac.keys())
101 counts = list(comp_frac.values())
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in _element_composition(formula)
86
87 def _element_composition(formula):
---> 88 elmap = parse_formula(formula)
89 elamt = {}
90 natoms = 0
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in parse_formula(formula)
62 expanded_formula = formula.replace(m.group(), expanded_sym)
63 return parse_formula(expanded_formula)
---> 64 sym_dict = get_sym_dict(formula, 1)
65 return sym_dict
66
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in get_sym_dict(f, factor)
26 f = f.replace(m.group(), "", 1)
27 if f.strip():
---> 28 raise CompositionError(f'{f} is an invalid formula!')
29 return sym_dict
30
CompositionError: ( is an invalid formula!
PyPI doesn't have a description, and how to import the package isn't obvious from the README. Taylor and I are both having trouble with it. In its own conda environment:
(cbfv) C:\Users\sterg>pip install cbfv
Collecting cbfv
Downloading cbfv-1.0.0-py3-none-any.whl (5.0 kB)
Collecting numpy
Using cached numpy-1.21.2-cp38-cp38-win_amd64.whl (14.0 MB)
Collecting pytest
Downloading pytest-6.2.5-py3-none-any.whl (280 kB)
|████████████████████████████████| 280 kB 819 kB/s
Collecting pandas
Using cached pandas-1.3.3-cp38-cp38-win_amd64.whl (10.2 MB)
Collecting tqdm
Downloading tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
|████████████████████████████████| 76 kB 5.5 MB/s
Collecting python-dateutil>=2.7.3
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2017.3
Downloading pytz-2021.3-py2.py3-none-any.whl (503 kB)
|████████████████████████████████| 503 kB 2.2 MB/s
Collecting iniconfig
Downloading iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting atomicwrites>=1.0
Downloading atomicwrites-1.4.0-py2.py3-none-any.whl (6.8 kB)
Collecting py>=1.8.2
Downloading py-1.10.0-py2.py3-none-any.whl (97 kB)
|████████████████████████████████| 97 kB 2.2 MB/s
Collecting pluggy<2.0,>=0.12
Downloading pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting colorama
Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting packaging
Downloading packaging-21.0-py3-none-any.whl (40 kB)
|████████████████████████████████| 40 kB ...
Collecting attrs>=19.2.0
Downloading attrs-21.2.0-py2.py3-none-any.whl (53 kB)
|████████████████████████████████| 53 kB 1.0 MB/s
Collecting toml
Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting six>=1.5
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting pyparsing>=2.0.2
Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Installing collected packages: six, pyparsing, toml, pytz, python-dateutil, py, pluggy, packaging, numpy, iniconfig, colorama, attrs, atomicwrites, tqdm, pytest, pandas, cbfv
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
skrebate 0.6 requires scikit-learn, which is not installed.
skrebate 0.6 requires scipy, which is not installed.
automatminer 1.0.3.20200727 requires matminer==0.6.2, which is not installed.
automatminer 1.0.3.20200727 requires pymatgen==2020.01.28, which is not installed.
automatminer 1.0.3.20200727 requires scikit-learn==0.22.2, which is not installed.
automatminer 1.0.3.20200727 requires tpot==0.11.0, which is not installed.
auto-xrd 0.0.1 requires pymatgen, which is not installed.
auto-xrd 0.0.1 requires scipy, which is not installed.
tensorflow 2.5.0rc1 requires absl-py~=0.10, which is not installed.
tensorflow 2.5.0rc1 requires astunparse~=1.6.3, which is not installed.
tensorflow 2.5.0rc1 requires flatbuffers~=1.12.0, which is not installed.
tensorflow 2.5.0rc1 requires gast==0.4.0, which is not installed.
tensorflow 2.5.0rc1 requires google-pasta~=0.2, which is not installed.
tensorflow 2.5.0rc1 requires grpcio~=1.34.0, which is not installed.
tensorflow 2.5.0rc1 requires h5py~=3.1.0, which is not installed.
tensorflow 2.5.0rc1 requires keras-nightly~=2.5.0.dev, which is not installed.
tensorflow 2.5.0rc1 requires keras-preprocessing~=1.1.2, which is not installed.
tensorflow 2.5.0rc1 requires typing-extensions~=3.7.4, which is not installed.
dtw-python 1.1.10 requires scipy>=1.1, which is not installed.
tensorboard 2.4.1 requires absl-py>=0.4, which is not installed.
tensorboard 2.4.1 requires google-auth<2,>=1.6.3, which is not installed.
tensorboard 2.4.1 requires google-auth-oauthlib<0.5,>=0.4.1, which is not installed.
tensorboard 2.4.1 requires grpcio>=1.24.3, which is not installed.
tensorboard 2.4.1 requires markdown>=2.6.8, which is not installed.
tensorboard 2.4.1 requires requests<3,>=2.21.0, which is not installed.
tensorboard 2.4.1 requires tensorboard-plugin-wit>=1.6.0, which is not installed.
tensorboard 2.4.1 requires werkzeug>=0.11.15, which is not installed.
tensorflow 2.5.0rc1 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.
tensorflow 2.5.0rc1 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
Successfully installed atomicwrites-1.4.0 attrs-21.2.0 cbfv-1.0.0 colorama-0.4.4 iniconfig-1.1.1 numpy-1.21.2 packaging-21.0 pandas-1.3.3 pluggy-1.0.0 py-1.10.0 pyparsing-2.4.7 pytest-6.2.5 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0 toml-0.10.2 tqdm-4.62.3
(cbfv) C:\Users\sterg>python3
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cbfv
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cbfv'
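Two details in the log above may explain the failed import. pip downloaded cp38 wheels (Python 3.8), while `python3` started Python 3.9.7, so the package may have been installed into a different interpreter. Also, other tracebacks on this page show the package living under `CBFV/composition.py` and being imported with `from CBFV import composition`, and Python module names are case-sensitive. A minimal sketch for checking both, where `find_installed_names` is a hypothetical helper and not part of the package:

```python
import importlib.util
import sys


def find_installed_names(candidates):
    """Return {name: bool}: whether each top-level module name is importable
    from the interpreter that is running this code."""
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


# Show which interpreter is actually running, then probe both spellings.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])
for name, found in find_installed_names(["cbfv", "CBFV"]).items():
    print(f"{name!r} importable: {found}")
```

If neither spelling is found, running `python -m pip install cbfv` with the same interpreter that reports the missing module is a common way to keep pip and Python in sync.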
df.loc[df.index[1], "formula"] = "Og"  # make sure at least one gets skipped (avoids chained assignment, which may not modify df)
X, y, formulae, skipped = generate_features(df)
assert X.shape[0] == len(formulae)
Some formulas in my datasets are occasionally not recognized, and I get the error:
raise CompositionError(f'{f} is an invalid formula!')
CompositionError: ,65 is an invalid formula!
This happens in the get_sym_dict() function. Is there a way to automatically drop unrecognized symbols?
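One possible workaround is to filter the formula column before calling generate_features. The sketch below is a heuristic pre-check, not the library's own parser: `looks_parsable` and `split_formulae` are hypothetical helpers, and the allowed character set is an assumption. The leftover `,65` in the message suggests a decimal comma such as `0,65`, and a stray `(` points to an unbalanced parenthesis, so those are the two cases it screens for.

```python
import re

# Assumption: a plain formula contains only letters, digits, dots,
# parentheses, and whitespace. Anything else (e.g. a comma) is rejected.
_ALLOWED = re.compile(r"^[A-Za-z0-9.()\s]+$")


def looks_parsable(formula):
    """Heuristic pre-check: allowed characters only and balanced parentheses."""
    if not isinstance(formula, str) or not formula.strip():
        return False
    if not _ALLOWED.match(formula):
        return False
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis before any opening one
                return False
    return depth == 0


def split_formulae(formulae):
    """Split an iterable of formulae into (kept, dropped) lists."""
    kept, dropped = [], []
    for f in formulae:
        (kept if looks_parsable(f) else dropped).append(f)
    return kept, dropped


kept, dropped = split_formulae(["Al2O3", "Fe0,65O", "Ca(OH", "Mg(OH)2"])
print(kept)     # ['Al2O3', 'Mg(OH)2']
print(dropped)  # ['Fe0,65O', 'Ca(OH']
```

Keeping the dropped list around makes it easy to inspect and repair the rejected entries (for example, replacing a decimal comma with a dot) instead of silently losing them.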
Exception has occurred: ValueError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'Me' is not in list
File "C:\Users\sterg\miniconda3\envs\vickers\Lib\site-packages\composition_based_feature_vector\composition.py", line 133, in _assign_features
row = elem_index[elem_symbols.index(elem)]
File "C:\Users\sterg\miniconda3\envs\vickers\Lib\site-packages\composition_based_feature_vector\composition.py", line 295, in generate_features
feats, targets, formulae, skipped = _assign_features(matrices,
File "C:\Users\sterg\Documents\GitHub\sparks-baird\VickersHardnessPrediction\vickers_hardness\utils\mpds.py", line 12, in <module>
X, y, formulae, skipped = generate_features(df)
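`Me` is not an element symbol (in some databases it likely stands for a generic metal or a methyl group), so the lookup in the element list fails. A minimal sketch for spotting such tokens before featurizing is shown below. `unknown_symbols` is a hypothetical helper, and `KNOWN` is a deliberately tiny illustrative set; in practice it would be filled from the element property table being used.

```python
import re

# Hypothetical allow-list for illustration only; replace it with the
# element symbols present in the element property table in use.
KNOWN = {"H", "C", "N", "O", "Al", "Fe", "Si", "Ti"}


def unknown_symbols(formula, known=KNOWN):
    """Return the sorted symbol-like tokens in `formula` that are not in `known`."""
    # A token is one uppercase letter followed by any lowercase letters.
    tokens = re.findall(r"[A-Z][a-z]*", formula)
    return sorted(set(tokens) - set(known))


print(unknown_symbols("MeTiO3"))  # ['Me']
print(unknown_symbols("Fe2O3"))   # []
```

Rows for which the function returns a non-empty list can then be dropped or corrected before the DataFrame is passed on.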
Processing Input Data: 100%|██████████| 1794/1794 [00:00<00:00, 7378.49it/s]
Featurizing Compositions...
Assigning Features...: 100%|██████████| 1778/1778 [00:00<00:00, 3426.03it/s]
NOTE: Your data contains formula with exotic elements. These were skipped.
Creating Pandas Objects...
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-45-22826a03d387> in <module>()
1 from CBFV import composition
----> 2 X, y, formulae, skipped = composition.generate_features(df, extend_features="R")
4 frames
/usr/local/lib/python3.7/dist-packages/CBFV/composition.py in generate_features(df, elem_prop, drop_duplicates, extend_features, sum_feat, mini)
307 extended = pd.DataFrame(extra_features, columns=features)
308 extended = extended.set_index('formula', drop=True)
--> 309 X = pd.concat([X, extended], axis=1)
310
311 # reset dataframe indices
/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
305 )
306
--> 307 return op.get_result()
308
309
/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in get_result(self)
526 obj_labels = obj.axes[1 - ax]
527 if not new_labels.equals(obj_labels):
--> 528 indexers[ax] = obj_labels.get_indexer(new_labels)
529
530 mgrs_indexers.append((obj._mgr, indexers))
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
3440
3441 if not self._index_as_unique:
-> 3442 raise InvalidIndexError(self._requires_unique_msg)
3443
3444 if not self._should_compare(target) and not is_interval_dtype(self.dtype):
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
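The pandas message says the concat step needs a uniquely valued index. In the traceback, `extended` is indexed by `formula` right before `pd.concat`, so repeated formulae in the input DataFrame are a plausible trigger. That is an assumption, not a confirmed diagnosis. A quick standard-library check for repeats is sketched below, where `duplicated_formulae` is a hypothetical helper:

```python
from collections import Counter


def duplicated_formulae(formulae):
    """Return {formula: count} for every formula that appears more than once."""
    counts = Counter(formulae)
    return {f: n for f, n in counts.items() if n > 1}


dupes = duplicated_formulae(["Al2O3", "SiO2", "Al2O3", "TiO2", "SiO2", "Al2O3"])
print(dupes)  # {'Al2O3': 3, 'SiO2': 2}
```

With a DataFrame, the same check could be run on `df["formula"].tolist()`. If duplicates show up, deduplicating the input first, or trying the `drop_duplicates` argument that appears in the `generate_features` signature above, might be worth testing.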
Which paper accompanies this tool? Maybe cite it in the README?