
Comments (6)

janosh commented on June 12, 2024

My bad for not handling pickle files separately in load_train_test(). Maybe we should rename the function to load(), since it's clearly morphing into more than just a training and test set loader. @pbenner Curious to hear your opinion!
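
For illustration only, here is a minimal sketch of how a more general load() might pick a reader by file suffix so that pickle files get their own code path. The names READERS, _read_json, _read_pickle and the suffix mapping are assumptions made for this example; they are not the actual matbench_discovery.data API.

```python
from __future__ import annotations

import json
import pickle
from pathlib import Path
from typing import Any, Callable


def _read_json(path: Path) -> Any:
    """Read a plain-text JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _read_pickle(path: Path) -> Any:
    """Read a pickle file. Only unpickle data from sources you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical suffix-to-reader table; extend it as new file types appear.
READERS: dict[str, Callable[[Path], Any]] = {
    ".json": _read_json,
    ".pkl": _read_pickle,
}


def load(path: str | Path) -> Any:
    """Dispatch to the reader registered for the file's suffix."""
    path = Path(path)
    try:
        reader = READERS[path.suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {path.suffix!r}") from None
    return reader(path)
```

Keeping the mapping in one table means a new file type only needs one new entry rather than another branch inside the loader.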

from matbench-discovery.

pbenner commented on June 12, 2024

Yes, sounds good! Also, fetch_process_wbm_dataset.py could be fully integrated and called when first running load().
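
One way to read this suggestion is a build-on-first-use cache: load() returns the cached file if it exists and otherwise runs the processing step once and stores the result. The sketch below is a generic version of that pattern; load_or_build and its build argument are hypothetical names, and JSON is used as the cache format only to keep the example self-contained.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_file: Path, build: Callable[[], Any]) -> Any:
    """Return cached data if present, else build it once and cache it.

    `build` stands in for an expensive step such as a dataset
    fetch-and-process script (an assumption for this sketch).
    """
    if cache_file.is_file():
        with open(cache_file, encoding="utf-8") as fh:
            return json.load(fh)

    data = build()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data
```

With this shape, the first call pays the processing cost and every later call only reads the cached file.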


pbenner commented on June 12, 2024

This is the error I get using the new branch:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 322, in <module>
    assert sum(no_id_mask := df_summary.index.isna()) == 6, f"{sum(no_id_mask)=}"
AssertionError: sum(no_id_mask)=0


janosh commented on June 12, 2024

Are you using pandas v1.x.x? I just updated the code from pandas v1 to v2 compatibility. I'll add a lower-bound pin for pandas in pyproject.toml to avoid this in the future.
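
Besides pinning in pyproject.toml, a script could fail early with a clear message when the installed version is too old, rather than hitting an unrelated assertion later. A rough sketch, assuming a minimum of "2.0.0" (the thread only says "v2"); parse_version and check_min_version are hypothetical helpers, not part of pandas or matbench-discovery:

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.5.3' or '2.0.0rc1' into a tuple of leading integers."""
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_min_version(installed: str, minimum: str, name: str = "pandas") -> None:
    """Raise a readable error if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"{name} {installed} is too old, need >= {minimum}"
        )


# Possible usage at the top of a script (minimum version is an assumption):
# import pandas as pd
# check_min_version(pd.__version__, "2.0.0")
```

This simple parser ignores pre-release suffixes such as "rc1", so it treats "2.0.0rc1" the same as "2.0.0".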


pbenner commented on June 12, 2024

Indeed, I had pandas 1.5; I'm now checking with pandas 2.0. Meanwhile, I think 2023-02-07-mp-elemental-reference-entries.json.gz was modified:

python data/wbm/fetch_process_wbm_dataset.py
Loading 'wbm_summary' from cached file at '/home/pbenner/.cache/matbench-discovery/1.0.0/wbm/2022-10-19-wbm-summary.csv'
Warning: '/home/pbenner/.cache/matbench-discovery/1.0.0/mp/2023-02-07-mp-elemental-reference-entries.json.gz' associated with key='mp_elemental_ref_entries' does not exist. Would you like to download it now using matbench_discovery.data.load_train_test('mp_elemental_ref_entries'). This will cache the file for future use. [y/n] y
Downloading 'mp_elemental_ref_entries' from https://figshare.com/ndownloader/files/40344445

variable dump:
file='mp/2023-02-07-mp-elemental-reference-entries.json.gz',
url='https://figshare.com/ndownloader/files/40344445',
reader=<function read_json at 0x7f9898a875b0>,
kwargs={'compression': 'gzip'}
Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 24, in <module>
    from matbench_discovery.energy import get_e_form_per_atom
  File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/energy.py", line 66, in <module>
    pd.read_json(DATA_FILES.mp_elemental_ref_entries, typ="series")
  File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 217, in __getattribute__
    self._on_not_found(key, msg)
  File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 239, in _on_not_found
    load_train_test(key)  # download and cache data file
  File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 111, in load_train_test
    df = reader(url, **kwargs)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 760, in read_json
    json_reader = JsonReader(
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 862, in __init__
    self.data = self._preprocess_data(data)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 874, in _preprocess_data
    data = data.read()
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 301, in read
    return self._buffer.read(size)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/_compression.py", line 118, in readall
    while data := self.read(sys.maxsize):
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'{\n')
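
The final line shows the first two bytes were b'{\n', which suggests the download was plain JSON despite the .json.gz name. As a hedged illustration of how a reader could tolerate that, the sketch below checks for the gzip magic bytes (0x1f 0x8b) before decompressing. read_json_maybe_gzipped is a hypothetical helper, not something pandas or matbench-discovery provides.

```python
import gzip
import json
from typing import Any

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_json_maybe_gzipped(raw: bytes) -> Any:
    """Parse JSON from bytes that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Sniffing the content rather than trusting the file extension makes the reader robust to a mislabeled upload, though fixing the uploaded file is still the cleaner remedy.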


janosh commented on June 12, 2024

Yeah, I was in the process of updating the Figshare files but then got carried away. That error will be fixed before I merge #26.
