kaaiian / cbfv Goto Github PK

Tool to quickly create a composition-based feature vector

cbfv's Issues

CompositionError: ( is an invalid formula!

Processing Input Data: 92%|█████████▏| 1263/1377 [00:00<00:00, 20629.89it/s]

CompositionError Traceback (most recent call last)
in <cell line: 1>()
1 for f in['jarvis','magpie','mat2vec','oliynyk','onehot','random_200']:
----> 2 X_train_unscaled,y_train,formulae_train,skipped_train = generate_features(df, elem_prop=f, drop_duplicates=False, extend_features=False, sum_feat=True)
3 #it has to be tested again with bg data which I have created by deleting the duplicates
4
5 SEED=42

4 frames
/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in generate_features(df, elem_prop, drop_duplicates, extend_features, sum_feat, mini)
281 if 'x' in formula:
282 continue
--> 283 l1, l2 = _element_composition_L(formula)
284 formula_mat.append(l1)
285 count_mat.append(l2)

/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in _element_composition_L(formula)
97
98 def _element_composition_L(formula):
---> 99 comp_frac = _element_composition(formula)
100 atoms = list(comp_frac.keys())
101 counts = list(comp_frac.values())

/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in _element_composition(formula)
86
87 def _element_composition(formula):
---> 88 elmap = parse_formula(formula)
89 elamt = {}
90 natoms = 0

/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in parse_formula(formula)
62 expanded_formula = formula.replace(m.group(), expanded_sym)
63 return parse_formula(expanded_formula)
---> 64 sym_dict = get_sym_dict(formula, 1)
65 return sym_dict
66

/usr/local/lib/python3.10/dist-packages/CBFV/composition.py in get_sym_dict(f, factor)
26 f = f.replace(m.group(), "", 1)
27 if f.strip():
---> 28 raise CompositionError(f'{f} is an invalid formula!')
29 return sym_dict
30

CompositionError: ( is an invalid formula!
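A leftover "(" in the error message suggests that at least one formula has an opening parenthesis with no matching close, although the offending formula itself is not shown in the report. A minimal sketch for finding such rows before calling generate_features, assuming the formulas are available as plain strings (the helper name `has_balanced_parens` and the sample formulas are hypothetical, not part of CBFV):

```python
def has_balanced_parens(formula):
    """Return True if every '(' in `formula` is closed by a later ')'."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


# Made-up sample formulas; the third is truncated on purpose.
formulae = ["Al2O3", "Ca(OH)2", "Ba(Ti0.5Zr0.5", "Fe2(SO4)3"]
suspect = [f for f in formulae if not has_balanced_parens(f)]
print(suspect)  # ['Ba(Ti0.5Zr0.5']
```

Rows flagged this way can be inspected or dropped from the DataFrame before featurization.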

Usage instructions for pip-installed cbfv (possible bug)

The PyPI page doesn't have a description, and the usage isn't obvious from the README. Taylor and I are both having trouble with it. In its own conda environment:

(cbfv) C:\Users\sterg>pip install cbfv
Collecting cbfv
  Downloading cbfv-1.0.0-py3-none-any.whl (5.0 kB)
Collecting numpy
  Using cached numpy-1.21.2-cp38-cp38-win_amd64.whl (14.0 MB)
Collecting pytest
  Downloading pytest-6.2.5-py3-none-any.whl (280 kB)
     |████████████████████████████████| 280 kB 819 kB/s
Collecting pandas
  Using cached pandas-1.3.3-cp38-cp38-win_amd64.whl (10.2 MB)
Collecting tqdm
  Downloading tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
     |████████████████████████████████| 76 kB 5.5 MB/s
Collecting python-dateutil>=2.7.3
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2017.3
  Downloading pytz-2021.3-py2.py3-none-any.whl (503 kB)
     |████████████████████████████████| 503 kB 2.2 MB/s
Collecting iniconfig
  Downloading iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting atomicwrites>=1.0
  Downloading atomicwrites-1.4.0-py2.py3-none-any.whl (6.8 kB)
Collecting py>=1.8.2
  Downloading py-1.10.0-py2.py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 2.2 MB/s
Collecting pluggy<2.0,>=0.12
  Downloading pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting colorama
  Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting packaging
  Downloading packaging-21.0-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB ...
Collecting attrs>=19.2.0
  Downloading attrs-21.2.0-py2.py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 1.0 MB/s
Collecting toml
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting pyparsing>=2.0.2
  Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Installing collected packages: six, pyparsing, toml, pytz, python-dateutil, py, pluggy, packaging, numpy, iniconfig, colorama, attrs, atomicwrites, tqdm, pytest, pandas, cbfv
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
skrebate 0.6 requires scikit-learn, which is not installed.
skrebate 0.6 requires scipy, which is not installed.
automatminer 1.0.3.20200727 requires matminer==0.6.2, which is not installed.
automatminer 1.0.3.20200727 requires pymatgen==2020.01.28, which is not installed.
automatminer 1.0.3.20200727 requires scikit-learn==0.22.2, which is not installed.
automatminer 1.0.3.20200727 requires tpot==0.11.0, which is not installed.
auto-xrd 0.0.1 requires pymatgen, which is not installed.
auto-xrd 0.0.1 requires scipy, which is not installed.
tensorflow 2.5.0rc1 requires absl-py~=0.10, which is not installed.
tensorflow 2.5.0rc1 requires astunparse~=1.6.3, which is not installed.
tensorflow 2.5.0rc1 requires flatbuffers~=1.12.0, which is not installed.
tensorflow 2.5.0rc1 requires gast==0.4.0, which is not installed.
tensorflow 2.5.0rc1 requires google-pasta~=0.2, which is not installed.
tensorflow 2.5.0rc1 requires grpcio~=1.34.0, which is not installed.
tensorflow 2.5.0rc1 requires h5py~=3.1.0, which is not installed.
tensorflow 2.5.0rc1 requires keras-nightly~=2.5.0.dev, which is not installed.
tensorflow 2.5.0rc1 requires keras-preprocessing~=1.1.2, which is not installed.
tensorflow 2.5.0rc1 requires typing-extensions~=3.7.4, which is not installed.
dtw-python 1.1.10 requires scipy>=1.1, which is not installed.
tensorboard 2.4.1 requires absl-py>=0.4, which is not installed.
tensorboard 2.4.1 requires google-auth<2,>=1.6.3, which is not installed.
tensorboard 2.4.1 requires google-auth-oauthlib<0.5,>=0.4.1, which is not installed.
tensorboard 2.4.1 requires grpcio>=1.24.3, which is not installed.
tensorboard 2.4.1 requires markdown>=2.6.8, which is not installed.
tensorboard 2.4.1 requires requests<3,>=2.21.0, which is not installed.
tensorboard 2.4.1 requires tensorboard-plugin-wit>=1.6.0, which is not installed.
tensorboard 2.4.1 requires werkzeug>=0.11.15, which is not installed.
tensorflow 2.5.0rc1 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.
tensorflow 2.5.0rc1 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
Successfully installed atomicwrites-1.4.0 attrs-21.2.0 cbfv-1.0.0 colorama-0.4.4 iniconfig-1.1.1 numpy-1.21.2 packaging-21.0 pandas-1.3.3 pluggy-1.0.0 py-1.10.0 pyparsing-2.4.7 pytest-6.2.5 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0 toml-0.10.2 tqdm-4.62.3

(cbfv) C:\Users\sterg>python3
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cbfv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cbfv'
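Two things in the log above may be worth checking. Another traceback on this page imports the package as `from CBFV import composition`, so the importable name may be upper-case `CBFV` rather than `cbfv`. Also, pip fetched cp38 wheels while the `python3` prompt reports 3.9.7, so `pip` and `python3` may belong to different interpreters. A small sketch that reports which interpreter is running and which spelling it can find (the helper name `first_importable` is hypothetical):

```python
import importlib.util
import sys


def first_importable(candidates):
    """Return the first top-level module name this interpreter can locate, else None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


print("interpreter:", sys.executable)
print("importable as:", first_importable(["cbfv", "CBFV"]))
```

If neither spelling is found, installing with `python -m pip install cbfv` from the same `python` that will run the code ties the install to that interpreter.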

`len(formulae)` and `X.shape[0]` don't match when certain formulas are skipped

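from CBFV.composition import generate_features

# `df` is the input DataFrame with a "formula" column
df.loc[df.index[1], "formula"] = "Og"  # make sure at least one gets skipped
X, y, formulae, skipped = generate_features(df)
assert X.shape[0] == len(formulae)  # reported to fail when a formula is skipped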
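Until the lengths agree, one possible workaround is to filter the formula list yourself. The sketch below assumes that `skipped` holds the formula strings that were dropped and that `formulae` still contains them; neither assumption is confirmed by the report, and the helper name `drop_skipped` is hypothetical:

```python
def drop_skipped(formulae, skipped):
    """Return `formulae` without any entry that also appears in `skipped`."""
    skipped_set = set(skipped)
    return [f for f in formulae if f not in skipped_set]


# Stand-in values for the third and fourth return values of generate_features.
formulae = ["Al2O3", "Og", "NaCl"]
skipped = ["Og"]
kept = drop_skipped(formulae, skipped)
print(kept)  # ['Al2O3', 'NaCl']
```

After filtering, `len(kept)` can be compared against `X.shape[0]` to see whether the mismatch is fully explained by the skipped entries.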

Can anyone explain what sum_feat=True does?

Invalid formulas during generate_features()

Some formulas in my datasets are occasionally not recognized, and I get the error:

raise CompositionError(f'{f} is an invalid formula!')
CompositionError: ,65 is an invalid formula!

This happens in the get_sym_dict() function. Is there a way to automatically drop unrecognized symbols?
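The leftover ",65" looks like a decimal comma (for example an amount written as 0,65), though the original formula is not shown. A minimal sketch of a pre-filter that separates formulas containing unexpected characters before they reach generate_features; the allowed character set and the helper name `split_valid` are assumptions for illustration, not CBFV behaviour:

```python
import re

# Assumed character set: letters, digits, decimal points and parentheses.
ALLOWED = re.compile(r"[A-Za-z0-9.()]+")


def split_valid(formulae):
    """Split formulas into (valid, rejected) based on the allowed characters."""
    valid, rejected = [], []
    for f in formulae:
        s = f.strip()  # tolerate surrounding whitespace
        if ALLOWED.fullmatch(s):
            valid.append(s)
        else:
            rejected.append(f)
    return valid, rejected


valid, rejected = split_valid(["Fe2O3", "Ni0,65Cu0,35", "LiFePO4 ", "Mg(OH)2"])
print(valid)     # ['Fe2O3', 'LiFePO4', 'Mg(OH)2']
print(rejected)  # ['Ni0,65Cu0,35']
```

Keeping the rejected list makes it possible to repair those entries by hand rather than silently losing them.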

`Me` element missing, not accounted for in "exotic" elements checking

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'Me' is not in list
  File "C:\Users\sterg\miniconda3\envs\vickers\Lib\site-packages\composition_based_feature_vector\composition.py", line 133, in _assign_features
    row = elem_index[elem_symbols.index(elem)]
  File "C:\Users\sterg\miniconda3\envs\vickers\Lib\site-packages\composition_based_feature_vector\composition.py", line 295, in generate_features
    feats, targets, formulae, skipped = _assign_features(matrices,
  File "C:\Users\sterg\Documents\GitHub\sparks-baird\VickersHardnessPrediction\vickers_hardness\utils\mpds.py", line 12, in <module>
    X, y, formulae, skipped = generate_features(df)
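"Me" is not a chemical element symbol, so one way to catch such rows up front is to compare every symbol-like token against the periodic table. A minimal sketch, assuming formulas are plain strings (the helper name `unknown_symbols` and the sample formula are hypothetical); passing this check does not guarantee that a given element property table covers the element:

```python
import re

# The 118 element symbols, in order of atomic number.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es "
    "Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)


def unknown_symbols(formula):
    """Return the sorted symbol-like tokens in `formula` that are not elements."""
    tokens = re.findall(r"[A-Z][a-z]*", formula)
    return sorted({t for t in tokens if t not in ELEMENTS})


print(unknown_symbols("Me2SiCl2"))  # ['Me']
print(unknown_symbols("Fe2O3"))     # []
```

Formulas with a non-empty result can be dropped or corrected before featurization.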

Add skipatom featurizer

lantunes/skipatom#6

How to use cbfv if the formula is already normalised to 1

`generate_features(..., extend_features=...)` InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Processing Input Data: 100%|██████████| 1794/1794 [00:00<00:00, 7378.49it/s]
	Featurizing Compositions...
Assigning Features...: 100%|██████████| 1778/1778 [00:00<00:00, 3426.03it/s]
NOTE: Your data contains formula with exotic elements. These were skipped.
	Creating Pandas Objects...

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-45-22826a03d387> in <module>()
      1 from CBFV import composition
----> 2 X, y, formulae, skipped = composition.generate_features(df, extend_features="R")

4 frames
/usr/local/lib/python3.7/dist-packages/CBFV/composition.py in generate_features(df, elem_prop, drop_duplicates, extend_features, sum_feat, mini)
    307         extended = pd.DataFrame(extra_features, columns=features)
    308         extended = extended.set_index('formula', drop=True)
--> 309         X = pd.concat([X, extended], axis=1)
    310 
    311     # reset dataframe indices

/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    305     )
    306 
--> 307     return op.get_result()
    308 
    309 

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in get_result(self)
    526                     obj_labels = obj.axes[1 - ax]
    527                     if not new_labels.equals(obj_labels):
--> 528                         indexers[ax] = obj_labels.get_indexer(new_labels)
    529 
    530                 mgrs_indexers.append((obj._mgr, indexers))

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3440 
   3441         if not self._index_as_unique:
-> 3442             raise InvalidIndexError(self._requires_unique_msg)
   3443 
   3444         if not self._should_compare(target) and not is_interval_dtype(self.dtype):

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
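The failing line concatenates `X` with a frame whose index was set to the `formula` column, and pandas raises this error when it has to align on an index that contains repeated labels. Repeated formula strings in the input are therefore a likely trigger, although the report does not confirm it. A minimal sketch for listing them, assuming the formulas are available as a list of strings (the helper name `repeated_formulae` and the sample data are hypothetical):

```python
from collections import Counter


def repeated_formulae(formulae):
    """Return a dict of formula -> count for every formula seen more than once."""
    counts = Counter(formulae)
    return {f: n for f, n in counts.items() if n > 1}


# Stand-in for list(df["formula"]).
print(repeated_formulae(["TiO2", "Al2O3", "TiO2", "NaCl"]))  # {'TiO2': 2}
```

If repeats turn up, de-duplicating or aggregating those rows before calling generate_features with extend_features may avoid the error.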

Accompanying paper

What is the paper that accompanies this? Maybe include it in the README?
