mavensdc / cdflib

A Python module for reading NASA's Common Data Format (CDF) files

License: MIT License

Python 99.95% Dockerfile 0.05%

cdflib's Introduction

CDFlib

cdflib is a Python module to read/write CDF (Common Data Format, .cdf) files without needing to install the NASA CDF library.

Python >= 3.8 is required. The core of this package uses only numpy, with no complicated compiler requirements.

Install

To install, open up your terminal/command prompt, and type:

pip install cdflib

Documentation

The full documentation can be found here:

https://cdflib.readthedocs.io/en/latest/


cdflib's Issues

Variable to specify output time format

Nick Hatzigeorgiu suggests:

In the library cdflib.cdfepoch you can provide a variable to specify the output time format.
For example, 2004-05-13T15:08, without seconds and milliseconds.
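A minimal sketch of what such an option could look like, assuming the epoch has already been converted to a `datetime`. The helper name `format_epoch` and its `fmt` parameter are hypothetical and are not part of cdflib:

```python
from datetime import datetime


def format_epoch(dt: datetime, fmt: str = "%Y-%m-%dT%H:%M") -> str:
    """Render a datetime using a caller-supplied strftime pattern."""
    return dt.strftime(fmt)


# The default pattern drops seconds and milliseconds.
print(format_epoch(datetime(2004, 5, 13, 15, 8, 30, 123000)))
```

Passing a different pattern, such as `"%Y-%m-%d"`, would let the caller choose any other layout.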

test_compute_cdftt2000 failure

The following hypothesis example is failing:

=================================== FAILURES ===================================
____________________________ test_compute_cdftt2000 ____________________________

>   ???

tests/test_epochs.py:166:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

dtime = datetime.datetime(2292, 5, 1, 0, 0)

    @given(random_dtime)
    @settings(max_examples=100)
    @example(datetime(1972, 1, 1, 0, 0))
    def test_compute_cdftt2000(dtime):
        random_time = [dtime.year, dtime.month, dtime.day,
                       dtime.hour, dtime.minute, dtime.second,
                       dtime.microsecond // 1000,  # Millisecond
                       randint(0, 999),     # Microsecond
                       randint(0, 999),     # Nanosecond
                       ]
        x = cdfepoch.breakdown(cdfepoch.compute(random_time))
        for i, t in enumerate(x):
>           assert t == random_time[i], f'Time {random_time} was not equal to {x}'
E           AssertionError: Time [2292, 5, 1, 0, 0, 0, 0, 864, 394] was not equal to [1707, 10, 12, 0, 26, 3, 291, 312, 778]
E           assert 1707 == 2292
E             +1707
E             -2292

tests/test_epochs.py:177: AssertionError
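One plausible reading of this failure (an assumption, not something the report confirms) is a signed 64-bit overflow: TT2000 counts nanoseconds from J2000, and 2292-05-01 lies just beyond what an int64 can hold. The sketch below ignores leap seconds and uses hypothetical helper names:

```python
from datetime import datetime, timedelta

J2000 = datetime(2000, 1, 1, 12)  # TT2000 reference epoch, leap seconds ignored


def ns_since_j2000(dt: datetime) -> int:
    """Exact integer nanoseconds between J2000 and dt."""
    delta = dt - J2000
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def fits_in_int64(dt: datetime) -> bool:
    """True if the nanosecond count is representable as a signed 64-bit integer."""
    return -2**63 <= ns_since_j2000(dt) <= 2**63 - 1


def wrapped_datetime(dt: datetime) -> datetime:
    """Date obtained if the nanosecond count wraps around like an int64."""
    wrapped = (ns_since_j2000(dt) + 2**63) % 2**64 - 2**63
    return J2000 + timedelta(microseconds=wrapped // 1000)


print(fits_in_int64(datetime(2292, 5, 1)))
print(wrapped_datetime(datetime(2292, 5, 1)))
```

Under these assumptions the wrapped value lands in 1707, the same year as in the assertion message, which suggests the hypothesis strategy should be bounded to the representable range.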

enforce gzip

For my project I need to enforce compression at the variable level, even if the compressed size is not smaller than the uncompressed size. This fix is really urgent.

cdflib-0.3.12 fails to decompress/read some CDFs

Reading in certain data files with cdflib.CDF() fails with an error in the _read_cpr function. This has been confirmed on multiple installations and operating systems:

cdflib.CDF("mms1_dsp_fast_l2_swd_20150721_v0.4.0.cdf")
Traceback (most recent call last):
  File "", line 1, in
  File "/home/user/.local/lib/python3.7/site-packages/cdflib/__init__.py", line 16, in CDF
    return cdfread.CDF(path, validate=validate)
  File "/home/user/.local/lib/python3.7/site-packages/cdflib/cdfread.py", line 78, in __init__
    new_path = self._uncompress_file(path)
  File "/home/user/.local/lib/python3.7/site-packages/cdflib/cdfread.py", line 714, in _uncompress_file
    data_start, data_size, cType, _ = self._read_ccr(8)
  File "/home/user/.local/lib/python3.7/site-packages/cdflib/cdfread.py", line 740, in _read_ccr
    cType, cParams = self._read_cpr(cproffset)
  File "/home/user/.local/lib/python3.7/site-packages/cdflib/cdfread.py", line 761, in _read_cpr
    cpr = f.read(block_size-8)
ValueError: read length must be non-negative or -1

Example data file (fails):
mms1_dsp_fast_l2_swd_20150721_v0.4.0.cdf.zip

Example data file (succeeds):
mms1_dsp_fast_l2_swd_20150720_v0.4.0.cdf.zip

Both files are read with no issues by the NASA CDF library (and IDL).

_default_pad() missing 1 required positional argument: 'num_elms'

Trying to get data from SyncStatus variable of the following CDF file

SW_OPER_EFIA_LP_1B_20131202T101113_20131202T140109_0501_MDR_EFI_LP.cdf

the following TypeError exception is raised:

In [1]: import cdflib                                                                                                      

In [2]: cdf = cdflib.CDF('SW_OPER_EFIA_LP_1B_20131202T101113_20131202T140109_0501_MDR_EFI_LP.cdf')                         

In [3]: cdf.cdf_info()['zVariables']                                                                                       
Out[3]: 
['Timestamp',
 'SyncStatus',
 'Latitude',
 'Longitude',
 'Radius',
 'U_orbit',
 'Ne',
 'Ne_error',
 'Te',
 'Te_error',
 'Vs',
 'Vs_error',
 'Flags_LP',
 'Flags_Ne',
 'Flags_Te',
 'Flags_Vs']

In [4]: cdf.varget('SyncStatus')                                                                                           
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-8887b529ae74> in <module>
----> 1 cdf.varget('SyncStatus')

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.7/site-packages/cdflib/cdfread.py in varget(self, variable, epoch, starttime, endtime, startrec, endrec, record_range_only, inq, expand, to_np)
    623             return self._read_vardata(vdr_info, epoch=epoch, starttime=starttime, endtime=endtime,
    624                                       startrec=startrec, endrec=endrec, record_range_only=record_range_only,
--> 625                                       expand=expand, to_np=to_np)
    626 
    627     def epochrange(self, epoch = None, starttime = None, endtime = None):

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.7/site-packages/cdflib/cdfread.py in _read_vardata(self, vdr_info, epoch, starttime, endtime, startrec, endrec, record_range_only, expand, to_np)
   2021 
   2022         data = self._read_vvrs(vdr_info, vvr_offsets, vvr_start, vvr_end,
-> 2023                                startrec, endrec, to_np=to_np)            
   2024         if record_range_only:
   2025             return [startrec, endrec]

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.7/site-packages/cdflib/cdfread.py in _read_vvrs(self, vdr_dict, vvr_offs, vvr_start, vvr_end, startrec, endrec, to_np)
   1684             else:
   1685                 filled_data = CDF._convert_np_data(\
-> 1686                                     self._default_pad(vdr_dict['data_type']),\
   1687                                     vdr_dict['data_type'],\
   1688                                     vdr_dict['num_elements'])

TypeError: _default_pad() missing 1 required positional argument: 'num_elms'

Please find the above mentioned file in the ZIP archive:

ftp://swarm-diss.eo.esa.int/Level1b/Entire_mission_data/EFIx_LP/Sat_A/SW_OPER_EFIA_LP_1B_20131202T101113_20131202T140109_0501.CDF.ZIP

Consider using astropy.time for representation of time formats

Astropy Time has a lot of machinery for storing and transforming different time formats and epochs, including support for custom transformations. It looks like a lot of the epoch code could be simplified by using some of that functionality.

CDF Epoch should return datetime objects

Converting the EPOCH to a readable date is good, but it would be great if, instead of only an array of separate components, the modules could also return an array of datetime objects. That would make a lot of tasks simpler.
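A rough sketch of the requested conversion, assuming the component order used elsewhere in these reports (year, month, day, hour, minute, second, millisecond, then optionally microsecond and nanosecond). The function name is hypothetical and not part of cdflib:

```python
from datetime import datetime


def components_to_datetime(parts):
    """Build a datetime from a list of broken-down epoch components."""
    year, month, day, hour, minute, second = parts[:6]
    millisecond = parts[6] if len(parts) > 6 else 0
    microsecond = parts[7] if len(parts) > 7 else 0
    # datetime only resolves microseconds, so any nanosecond field is dropped.
    return datetime(year, month, day, hour, minute, second,
                    millisecond * 1000 + microsecond)


print(components_to_datetime([2004, 5, 13, 15, 8, 30, 123, 456, 789]))
```

Applying this to each row of a broken-down epoch array would give the list of datetime objects the report asks for, at the cost of nanosecond precision.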

OSError: filename.cdf is not a CDF file or a non-supported CDF!

This is my introduction to CDF files, but I have no idea what "non-supported" means or how I could open the file.
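One way to investigate such a file is to look at its first four bytes. The sketch below assumes the magic numbers described in the CDF internal format documentation (`0xCDF30001` for version 3 and `0xCDF26002` for version 2.6/2.7); treat these values and the helper name as assumptions to check against that documentation:

```python
# Assumed magic numbers; verify against the CDF internal format description.
KNOWN_MAGIC = {
    "cdf30001": "CDF version 3",
    "cdf26002": "CDF version 2.6/2.7",
}


def describe_header(first_bytes: bytes) -> str:
    """Classify a file by the hex value of its first four bytes."""
    magic = first_bytes[:4].hex()
    return KNOWN_MAGIC.get(magic, "unrecognised magic number 0x" + magic)


print(describe_header(bytes.fromhex("cdf30001")))
```

In practice the bytes would come from `open(path, "rb").read(4)`. An unrecognised value may mean the file is a different format that happens to use the `.cdf` extension, such as netCDF.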

Astropy epoch module returns lists of length 1

The astropy epoch module usually returns lists, even if the input might not be a list. We should make them consistent between the two modules.

Create unzipped CDF file in tmp directory instead of current directory

Currently, cdflib unzips a compressed cdf file to wherever the cdf file is located. If a user doesn't have write permissions in this directory, then this causes a problem.

Instead, check for a temporary directory, or use the current working directory, to save the unzipped file.
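A minimal sketch of the proposed behaviour using the standard library. `write_uncompressed_copy` is a hypothetical helper, not the cdflib implementation:

```python
import os
import tempfile


def write_uncompressed_copy(data: bytes, suffix: str = ".cdf") -> str:
    """Write decompressed bytes to a new file in the system temp directory."""
    # mkstemp uses tempfile.gettempdir(), which honours TMPDIR/TEMP/TMP and
    # falls back to the current working directory as a last resort.
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


path = write_uncompressed_copy(b"example bytes")
print(path)
os.remove(path)  # the caller is responsible for cleanup
```

The location of the original CDF file never matters here, so a read-only data directory is no longer a problem.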

Cannot set WRITEABLE flag to True of this array

This issue was detected on release 0.3.9 while reading the same kind of data of closed issue #20.

The steps to reproduce it are the following:

1. Download file:
   ftp://swarm-diss.eo.esa.int/Level1b/Entire_mission_data/MAGx_LR/Sat_A/SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0505.CDF.ZIP
2. Unzip downloaded file SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0505.CDF.ZIP
3. Open file SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0505_MDR_MAG_LR.cdf contained in the ZIP archive:
```
import cdflib
cdf = cdflib.CDF('SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0505_MDR_MAG_LR.cdf')
```
4. Get data from SyncStatus variable:
```
x = cdf.varget('SyncStatus')
```

At step 4, the following error is traced:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-8975049777ad> in <module>
----> 1 x = cdf.varget('SyncStatus')

~/.local/opt/anaconda3/envs/swarm/lib/python3.7/site-packages/cdflib/cdfread.py in varget(self, variable, epoch, starttime, endtime, startrec, endrec, record_range_only, inq, expand, to_np)
    543             return self._read_vardata(vdr_info, epoch=epoch, starttime=starttime, endtime=endtime,
    544                                       startrec=startrec, endrec=endrec, record_range_only=record_range_only,
--> 545                                       expand=expand, to_np=to_np)
    546 
    547     def epochrange(self, epoch=None, starttime=None, endtime=None):

~/.local/opt/anaconda3/envs/swarm/lib/python3.7/site-packages/cdflib/cdfread.py in _read_vardata(self, vdr_info, epoch, starttime, endtime, startrec, endrec, record_range_only, expand, to_np)
   2010 
   2011         data = self._read_vvrs(vdr_info, vvr_offsets, vvr_start, vvr_end,
-> 2012                                startrec, endrec, to_np=to_np)
   2013         if record_range_only:
   2014             return [startrec, endrec]

~/.local/opt/anaconda3/envs/swarm/lib/python3.7/site-packages/cdflib/cdfread.py in _read_vvrs(self, vdr_dict, vvr_offs, vvr_start, vvr_end, startrec, endrec, to_np)
   1671                 filled_data = CDF._convert_np_data(
   1672                     self._default_pad(vdr_dict['data_type'],
-> 1673                                       vdr_dict['num_elements']),
   1674                     vdr_dict['data_type'],
   1675                     vdr_dict['num_elements'])

~/.local/opt/anaconda3/envs/swarm/lib/python3.7/site-packages/cdflib/cdfread.py in _default_pad(self, data_type, num_elms)
   2149         dt = np.dtype(dt_string)
   2150         ret = np.frombuffer(pad_value, dtype=dt, count=1)
-> 2151         ret.setflags('WRITEABLE')
   2152         return ret
   2153 

ValueError: cannot set WRITEABLE flag to True of this array
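The traceback is consistent with NumPy refusing to mark an array writable when it is a view over an immutable `bytes` object. A common workaround, sketched here with a hypothetical helper name, is to copy the array so that it owns its memory:

```python
import numpy as np


def writable_from_buffer(raw: bytes, dtype: str, count: int = -1) -> np.ndarray:
    """Decode raw bytes into an array that can be modified in place."""
    # np.frombuffer over bytes gives a read-only view; copy() allocates
    # new memory owned by the array, which is writable by default.
    return np.frombuffer(raw, dtype=dtype, count=count).copy()


pad = writable_from_buffer(b"\x01\x00\x02\x00", "<i2")
pad[0] = 7  # succeeds because the copy is writable
print(pad)
```

The copy costs one extra allocation, which is negligible for a single pad value.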

write_var(): Data_Type: CDF_CHAR , error

cdf_file.write_var(var_info, var_attrs=var_attrs, var_data=var_data)

var_data: {'Rec_Ndim': 1, 'Rec_Shape': [3], 'Num_Records': 1, 'Records_Returned': 0, 'Item_Size': 20, 'Data_Type': 'CDF_CHAR', 'Data': array([[['O2+ FlowV X DQ (STA)'],
['O2+ FlowV Y DQ (STA)'],
['O2+ FlowV Z DQ (STA)']]], dtype='<U20')}

error: unpack requires a buffer of 60 bytes

Move getversion to the __init__.py file?

getversion is located in two different places in the file, should it just be one function?

Using cdflib's write_var causes file size to increase

When I converted MATLAB data files of several MB to CDF files, the file size increased to the GB level.

f-string breaking Python 3.5 compatibility

There's (at least one) f-string at

cdflib/cdflib/cdfread.py

Line 510 in 1d58d72

raise ValueError(f"Variable name '{variable}' not found.")

- the feature was only introduced in Python 3.6 so it breaks under 3.5

Rather than fixing, I wonder if 3.5 support could be dropped? Since I'm sure we all love f-strings! and Numpy dev have already dropped 3.5.

I guess it could be a contentious issue for the less-actively maintained scientific libraries and systems, though my impression is that this is generally improving across communities. Should libraries like this one adopt some common deprecation policy from elsewhere? Maybe from PyHC, sunpy, astropy...?
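If 3.5 support were kept instead, the message could be built with `str.format`, which predates f-strings. A small sketch, with a hypothetical wrapper function:

```python
def not_found_message(variable):
    """Build the error text without an f-string, so it parses on Python 3.5."""
    return "Variable name '{}' not found.".format(variable)


print(not_found_message("Epoch"))
```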

Astropy epoch performance generally bad

While each epoch class uses the same functions, Astropy can be much slower. It is faster for getting the unixtime and encode functions, but it is MUCH slower for the compute and parse functions.

Generally this seems to be stemming from the time it takes to convert an astropy time object to the cdf format, so for example:

Time(1000, format='unix').cdf_tt2000

takes up a lot of time.

We should figure out where the slowdown is happening in astropy, and determine if there is something we can do to speed it up. If not, we should just default to utilizing the functions in the primary CDFepoch module.

Doc error

Key documented as var_spec['Dims_Sizes'] is actually var_spec['Dim_Sizes'] in code

Read from in-memory bytes

When working with a large number of large CDF files (e.g. MMS, ~100 Mb/file), it is more efficient to avoid the additional IO cost of pulling data from a remote location, writing it to disk, and then marshalling it into memory as a Python object. To illustrate, this is the current process when working with an S3 bucket of CDF data:

import cdflib

# `bucket` is assumed to be an existing boto3 S3 Bucket resource
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()["Body"].read()

    # write the object to disk
    with open(key, "wb") as f:
        f.write(body)

    # load the CDF file into a Python object
    cdf = cdflib.CDF(key)

Instead, the serialization to disk (and the associated overhead of writing to disk) could be skipped if we could do:

cdf = cdflib.CDF(io.BytesIO(body))
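A sketch of how a reader could accept either a path or in-memory data. The helpers `open_binary` and `read_at` are hypothetical and only illustrate the idea. They are not part of cdflib:

```python
import io
from typing import BinaryIO, Union


def open_binary(source: Union[str, bytes, bytearray, BinaryIO]) -> BinaryIO:
    """Return a seekable binary file object for a path, raw bytes, or a stream."""
    if isinstance(source, (bytes, bytearray)):
        return io.BytesIO(source)
    if isinstance(source, str):
        return open(source, "rb")
    return source  # assumed to already be a binary file-like object


def read_at(source, offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset`, closing only handles opened here."""
    f = open_binary(source)
    try:
        f.seek(offset)
        return f.read(size)
    finally:
        if f is not source:
            f.close()


print(read_at(b"\xcd\xf3\x00\x01payload", 0, 4).hex())
```

With this kind of indirection, the S3 body could be parsed directly from memory without touching the disk.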

Automatically use cdfepoch.compute() when given a string input to start/end in varget

Implement the following:

cdf_file.varget('Variable1', starttime='2016-01-01', endtime='2016-01-03')

A user can easily convert these times themselves using the cdfepoch module, but this will cut down on their needed code.
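A sketch of the string handling this would need, assuming ISO 8601 input and the component-list form that `cdfepoch.compute` is given elsewhere in these reports. The helper name is hypothetical:

```python
from datetime import datetime


def iso_to_components(text: str):
    """Turn an ISO 8601 string into [year, month, day, hour, minute, second, ms]."""
    dt = datetime.fromisoformat(text)
    return [dt.year, dt.month, dt.day,
            dt.hour, dt.minute, dt.second,
            dt.microsecond // 1000]


print(iso_to_components("2016-01-01"))
```

Inside `varget`, string values for `starttime` and `endtime` could be passed through such a helper before the epoch is computed.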

Convert print statements to warnings

Currently there are a lot of places (e.g. when a variable can't be found) that just contain plain print statements. These should all be upgraded to at least proper warnings, and in some cases I suspect exceptions that halt the program.
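A sketch of the pattern using the standard `warnings` module. The function and its `strict` flag are hypothetical and only illustrate the warning-versus-exception split:

```python
import warnings


def find_variable(variables: dict, name: str, strict: bool = False):
    """Look up a variable, warning (or raising when strict) if it is missing."""
    if name in variables:
        return variables[name]
    message = "Variable name '{}' not found.".format(name)
    if strict:
        raise ValueError(message)
    # stacklevel=2 attributes the warning to the caller, not this helper.
    warnings.warn(message, UserWarning, stacklevel=2)
    return None


print(find_variable({"Epoch": 1}, "Epoch"))
```

Unlike `print`, a warning can be filtered, logged, or turned into an error by the calling application.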

installation error, spacepy is required?

env: Anaconda, python=3.8, Ubuntu

pip install cdflib
Requirement already satisfied: cdflib in ./.local/lib/python3.8/site-packages (0.3.20)
Collecting numpy
  Using cached numpy-1.21.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.8 MB)
Collecting attrs>=19.2.0
  Using cached attrs-21.2.0-py2.py3-none-any.whl (53 kB)
Installing collected packages: numpy, attrs
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacepy 0.2.2 requires h5py>=2.6, which is not installed.
spacepy 0.2.2 requires matplotlib>=1.5, which is not installed.
spacepy 0.2.2 requires networkx>=1.0, which is not installed.
spacepy 0.2.2 requires python-dateutil>=1.4, which is not installed.
spacepy 0.2.2 requires scipy>=0.11, which is not installed.
Successfully installed attrs-21.2.0 numpy-1.21.1

Error while retrieving data from a variable

The following steps have been executed:

1. Download file:
   ftp://swarm-diss.eo.esa.int/Level1b/Entire_mission_data/MAGx_LR/Sat_A/SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0408.CDF.ZIP
2. Unzip downloaded file SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0408.CDF.ZIP
3. Read the file SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0408_MDR_MAG_LR.cdf contained in the ZIP archive:

   import cdflib
   cdf = cdflib.CDF('SW_OPER_MAGA_LR_1B_20180816T000000_20180816T235959_0408_MDR_MAG_LR.cdf')

4. Get data from the SyncStatus variable:

   x = cdf.varget('SyncStatus')

At step 4, the following error is traced:

In [6]: cdf.varget('SyncStatus')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-8887b529ae74> in <module>()
----> 1 cdf.varget('SyncStatus')

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.6/site-packages/cdflib/cdfread.py in varget(self, variable, epoch, starttime, endtime, startrec, endrec, record_range_only, inq, expand, to_np)
    623             return self._read_vardata(vdr_info, epoch=epoch, starttime=starttime, endtime=endtime,
    624                                       startrec=startrec, endrec=endrec, record_range_only=record_range_only,
--> 625                                       expand=expand, to_np=to_np)
    626 
    627     def epochrange(self, epoch = None, starttime = None, endtime = None):

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.6/site-packages/cdflib/cdfread.py in _read_vardata(self, vdr_info, epoch, starttime, endtime, startrec, endrec, record_range_only, expand, to_np)
   2021 
   2022         data = self._read_vvrs(vdr_info, vvr_offsets, vvr_start, vvr_end,
-> 2023                                startrec, endrec, to_np=to_np)            
   2024         if record_range_only:
   2025             return [startrec, endrec]

~/.local/opt/anaconda3/envs/swarmdata/lib/python3.6/site-packages/cdflib/cdfread.py in _read_vvrs(self, vdr_dict, vvr_offs, vvr_start, vvr_end, startrec, endrec, to_np)
   1684             else:
   1685                 filled_data = CDF._convert_np_data(\
-> 1686                                     self._default_pad(vdr_dict['data_type']),\
   1687                                     vdr_dict['data_type'],\
   1688                                     vdr_dict['num_elements'])

TypeError: _default_pad() missing 1 required positional argument: 'num_elms'

epochs.py in cdflib is using the local time zone instead of UTC

Copying a bug report from X. N. Chu (@xnchu), reported to pySPEDAS (with a minor correction to the expected result):

The epoch is converted to unix_time at line 195 in cdf_to_tplot.py
If cdflib is used, it is converted using function unixtime at line 192 in epochs.py.
The problem is line 222:
unixtime.append(datetime.datetime(*date).timestamp())
It assumes local time instead of UTC. Therefore, the time is offset by your local time.
The code to reproduce the error is attached below.
The result should be 2010-01-01/00:00:00.

import pyspedas
import numpy as np
import pandas as pd
trange=['2010-01-01/00:00:00', '2010-01-02/00:00:00']
varname = 'BX_GSE'
data_omni = pyspedas.omni.data(trange=['2010-01-01/00:00:00', '2010-01-02/00:00:00'],notplot=True,varformat=varname,time_clip=True)
data = np.array(data_omni[varname]['y'])
unix_time = np.array(data_omni[varname]['x'])
date_time = pd.to_datetime(data_omni[varname]['x'],unit='s')
print(date_time[0])

I was able to reproduce the problem without astropy installed (if astropy is installed, pytplot doesn't use epochs.py)
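The difference comes from `datetime.timestamp()` interpreting a naive datetime as local time. A sketch of a timezone-independent alternative, with a hypothetical helper name:

```python
from datetime import datetime, timezone


def utc_timestamp(*parts) -> float:
    """Unix time for date components that are already expressed in UTC."""
    # An aware datetime makes timestamp() independent of the local time zone.
    return datetime(*parts, tzinfo=timezone.utc).timestamp()


print(utc_timestamp(2010, 1, 1))
```

The result is the same on every machine, whereas the naive version shifts by the local UTC offset.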

gzip issue

Hi there,

I have recently begun using cdflib, and it works great. However I get this error when I try to read in a certain variable in my .cdf called "Flags_B". Please see attached code.
The .cdf can be found at: https://github.com/eyhl/geomagpy

import cdflib

path = "data/SW_OPER_MAGB_LR_1B_20140914T000000_20140914T235959_0405_MDR_MAG_LR.cdf"
cdf_file = cdflib.CDF(path)
cdf_file.varget("Flags_B")

Produces the error:

Traceback (most recent call last):
  File "/Users/eyu/Google Drive/DTU/BSc_continued/0remake/geomagpy/dev_cdf_loader.py", line 18, in <module>
    cdf_file.varget("Flags_B")
  File "/anaconda3/envs/geomagpy/lib/python3.6/site-packages/cdflib/cdfread.py", line 545, in varget
    expand=expand, to_np=to_np)
  File "/anaconda3/envs/geomagpy/lib/python3.6/site-packages/cdflib/cdfread.py", line 2007, in _read_vardata
    startrec, endrec, to_np=to_np)
  File "/anaconda3/envs/geomagpy/lib/python3.6/site-packages/cdflib/cdfread.py", line 1653, in _read_vvrs
    var_block_data = self._read_vvr_block(vvr_offs[vvr_num])
  File "/anaconda3/envs/geomagpy/lib/python3.6/site-packages/cdflib/cdfread.py", line 2177, in _read_vvr_block
    return gzip.decompress(block[16:])
  File "/anaconda3/envs/geomagpy/lib/python3.6/gzip.py", line 532, in decompress
    return f.read()
  File "/anaconda3/envs/geomagpy/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/anaconda3/envs/geomagpy/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/anaconda3/envs/geomagpy/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x7fA')

It seems to me that Flags_B is already decompressed but cdflib tries to decompress and it fails?

I am running python 3.6 on macOS High Sierra 10.13.6

Hope you can help out!
Best regards,
Eigil
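The `OSError` shows that the block does not start with the gzip magic bytes `\x1f\x8b`. Whether the block is really stored uncompressed is not established by the report, but a defensive reader could check the header first. A sketch with a hypothetical helper:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decompress_if_gzipped(block: bytes) -> bytes:
    """Decompress only when the block carries a gzip header."""
    if block[:2] == GZIP_MAGIC:
        return gzip.decompress(block)
    return block  # assumed to be stored uncompressed


print(decompress_if_gzipped(gzip.compress(b"flags")))
```

A proper fix would more likely rely on the record type stored in the file than on sniffing the payload, so this is a diagnostic aid rather than a recommended patch.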

Add zenodo archiving to make cdflib citeable

It would be nice if future releases could be archived on zenodo to give them a citeable doi; instructions should be here: https://zenodo.org/account/settings/github/

Speed up

I'm currently using varattsget, and the profile implies that this method is currently very inefficient because various functions are opening and closing the actual CDF file in turn. We could probably increase performance by keeping the file open at a "global" level instead of opening and closing it in each function.

   │                 │  └─ 0.711 varattsget  cdflib/cdfread.py:654
   │                 │     ├─ 0.673 _read_varatts  cdflib/cdfread.py:1061
   │                 │     │  ├─ 0.334 _read_aedr_fast  cdflib/cdfread.py:1229
   │                 │     │  │  ├─ 0.323 _read_aedr_fast2  cdflib/cdfread.py:1246
   │                 │     │  │  │  ├─ 0.233 open  pathlib.py:1214
   │                 │     │  │  │  │  ├─ 0.168 _opener  pathlib.py:1076
   │                 │     │  │  │  │  │  └─ 0.157 open  <built-in>:0
   │                 │     │  │  │  │  ├─ 0.042 open  <built-in>:0
   │                 │     │  │  │  │  └─ 0.018 [self]  
   │                 │     │  │  │  ├─ 0.055 [self]  
   │                 │     │  │  │  └─ 0.026 BufferedReader.read  <built-in>:0
   │                 │     │  │  └─ 0.011 [self]  
   │                 │     │  ├─ 0.250 _read_adr  cdflib/cdfread.py:1118
   │                 │     │  │  └─ 0.245 _read_adr2  cdflib/cdfread.py:1160
   │                 │     │  │     ├─ 0.159 open  pathlib.py:1214
   │                 │     │  │     │  ├─ 0.110 _opener  pathlib.py:1076
   │                 │     │  │     │  │  └─ 0.106 open  <built-in>:0
   │                 │     │  │     │  ├─ 0.030 open  <built-in>:0
   │                 │     │  │     │  └─ 0.016 [self]  
   │                 │     │  │     ├─ 0.050 [self]  
   │                 │     │  │     └─ 0.012 BufferedReader.read  <built-in>:0
   │                 │     │  ├─ 0.075 _read_aedr  cdflib/cdfread.py:1257
   │                 │     │  │  └─ 0.075 _read_aedr2  cdflib/cdfread.py:1313
   │                 │     │  │     └─ 0.048 open  pathlib.py:1214
   │                 │     │  │        └─ 0.035 _opener  pathlib.py:1076
   │                 │     │  │           └─ 0.035 open  <built-in>:0
   │                 │     │  └─ 0.014 [self]  
   │                 │     └─ 0.031 _read_vdr_fast  cdflib/cdfread.py:1560
   │                 │        └─ 0.030 _read_vdr_fast2  cdflib/cdfread.py:1577
   │                 │           └─ 0.020 open  pathlib.py:1214
   │                 │              └─ 0.010 _opener  pathlib.py:1076
   │                 │                 └─ 0.010 open  <built-in>:0
   │                 ├─ 0.059 globalattsget  cdflib/cdfread.py:573
   │                 │  ├─ 0.029 _read_adr  cdflib/cdfread.py:1118
   │                 │  │  └─ 0.029 _read_adr2  cdflib/cdfread.py:1160
   │                 │  │     └─ 0.019 open  pathlib.py:1214
   │                 │  │        └─ 0.012 _opener  pathlib.py:1076
   │                 │  │           └─ 0.012 open  <built-in>:0
   │                 │  └─ 0.027 _read_aedr  cdflib/cdfread.py:1257
   │                 │     └─ 0.027 _read_aedr2  cdflib/cdfread.py:1313
   │                 │        └─ 0.014 open  pathlib.py:1214
   │                 ├─ 0.039 varget  cdflib/cdfread.py:433
   │                 │  ├─ 0.022 _read_vdr_fast  cdflib/cdfread.py:1560
   │                 │  │  └─ 0.022 _read_vdr_fast2  cdflib/cdfread.py:1577
   │                 │  │     └─ 0.018 open  pathlib.py:1214
   │                 │  │        └─ 0.012 _opener  pathlib.py:1076
   │                 │  │           └─ 0.012 open  <built-in>:0
   │                 │  └─ 0.011 _read_vardata  cdflib/cdfread.py:2013
   │                 ├─ 0.035 cdf_info  cdflib/cdfread.py:158
   │                 │  └─ 0.030 _get_attnames  cdflib/cdfread.py:886
   │                 │     └─ 0.030 _read_adr  cdflib/cdfread.py:1118
   │                 │        └─ 0.028 _read_adr2  cdflib/cdfread.py:1160
   │                 │           ├─ 0.015 open  pathlib.py:1214
   │                 │           │  └─ 0.012 _opener  pathlib.py:1076
   │                 │           │     └─ 0.011 open  <built-in>:0
   │                 │           └─ 0.010 [self]  
   │                 ├─ 0.035 __setitem__  pandas/core/frame.py:3147
   │                 │  └─ 0.028 _set_item  pandas/core/frame.py:3231
   │                 │     └─ 0.027 _set_item  pandas/core/generic.py:3824
   │                 │        └─ 0.026 insert  pandas/core/internals/managers.py:1176
   │                 │           └─ 0.015 insert  pandas/core/indexes/base.py:5544
   │                 ├─ 0.017 varinq  cdflib/cdfread.py:207
   │                 │  └─ 0.016 varget  cdflib/cdfread.py:433
   │                 │     └─ 0.015 _read_vdr_fast  cdflib/cdfread.py:1560
   │                 │        └─ 0.015 _read_vdr_fast2  cdflib/cdfread.py:1577
   │                 │           └─ 0.011 open  pathlib.py:1214
   │                 ├─ 0.012 to_datetime  cdflib/epochs.py:178
   │                 │  └─ 0.011 breakdown  cdflib/epochs.py:144
   │                 │     └─ 0.011 breakdown_epoch  cdflib/epochs.py:1447
   │                 └─ 0.010 __init__  sunpy/timeseries/timeseriesbase.py:94
   │                    └─ 0.010 time_range  sunpy/timeseries/timeseriesbase.py:171
   │                       └─ 0.010 __init__  sunpy/time/timerange.py:70
   │                          └─ 0.010 parse_time  sunpy/time/time.py:291
   │                             └─ 0.010 wrapper  functools.py:870
   │                                └─ 0.010 convert_time_pandasTimestamp  sunpy/time/time.py:152

Currently reads in all VVR data to get a subset

Even if the user selects 1 record to pull out in a varget(), the program will read in all records, unzip them, and then find the record. This might become an issue on big datasets.

Use dict.get()

Use dict.get() instead of all the try...except blocks in cdfwrite
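For illustration, the two styles side by side. The key name `'Compress'` and the default of `0` are hypothetical placeholders, not taken from cdfwrite:

```python
def compression_with_try(var_spec: dict) -> int:
    """The try/except style the issue wants to replace."""
    try:
        return var_spec["Compress"]
    except KeyError:
        return 0


def compression_with_get(var_spec: dict) -> int:
    """Equivalent lookup using dict.get() with a default."""
    return var_spec.get("Compress", 0)


print(compression_with_get({"Compress": 6}), compression_with_get({}))
```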

Adding package to conda forge

Hi! Just wanted to let you know that I am trying to include a recipe in conda-forge for this package.

conda-forge/staged-recipes#10847

In case any of you want to be listed as a maintainer, just let me know!

Column-major problems

I am in the process of converting my RadynPy package from spacepy.pycdf to cdflib. SpacePy's library relies on the compiled CDF library, which is a dependency I would rather not have, and SpacePy also seems to depend on an old version of NumPy, causing problems with installation.
I compared the results from reading the CDF files produced by a simulation (RADYN (FORTRAN77), sample files here) and found that whilst the dimensions were correct (albeit not for 2D arrays of strings), the data appeared to have been assumed to be in row-major format when converted to numpy arrays, compared to the expected form and what SpacePy produces.
As an example, for the g variable in any of those files. It should appear as

array([[ 2.,  2.,  1.],
       [ 8.,  4.,  3.],
       [18.,  6.,  1.],
       [32.,  2.,  9.],
       [50.,  4.,  3.],
       [ 1.,  1.,  2.],
       [ 0.,  0.,  2.],
       [ 0.,  0.,  6.],
       [ 0.,  0.,  2.]])

but is instead read as

array([[ 2.,  8., 18.],
       [32., 50.,  1.],
       [ 0.,  0.,  0.],
       [ 2.,  4.,  6.],
       [ 2.,  4.,  1.],
       [ 0.,  0.,  0.],
       [ 1.,  3.,  1.],
       [ 9.,  3.,  2.],
       [ 2.,  6.,  2.]])

I presume this is not the expected behaviour, since the CDF library handles these files correctly. I could not see any differences in the way the program handles arrays of different majority, so I made some basic fixes in 520d7b6. I am happy to submit this as a PR and find a way to make these changes work for mainline cdflib if you would be interested in working through this, as I would rather not maintain my own dependency, although we would have to discuss if it is acceptable to your use cases to convert arrays of strings from flat python lists to numpy arrays.
Many thanks!
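The two arrays above contain the same 27 values in the same storage order, only interpreted with different majority. This can be reproduced with NumPy's `order` argument. This is a sketch of the effect, not the change made in 520d7b6:

```python
import numpy as np

# Values in the order they are stored, taken from the incorrectly read array.
flat = [2, 8, 18, 32, 50, 1, 0, 0, 0,
        2, 4, 6, 2, 4, 1, 0, 0, 0,
        1, 3, 1, 9, 3, 2, 2, 6, 2]

row_major = np.array(flat, dtype=float).reshape((9, 3))                # what was read
column_major = np.array(flat, dtype=float).reshape((9, 3), order="F")  # what was expected

print(column_major)
```

Reshaping with `order="F"` fills the first axis fastest, which matches how Fortran programs such as RADYN lay out their arrays.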

_read_vvrs() assumes each VVR is in order

When the locations of VVRs are given to _read_vvrs(), it is assumed that they are in order, i.e.

VVR1 - Records 0-100
VVR2 - Records 101-200
VVR3 - Records 201-300

However, there is no requirement for VVRs to be in the "correct" order.
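A sketch of one way to remove the assumption: sort the VVR descriptors by their first record before reading. The tuple layout `(first_record, last_record, offset)` is hypothetical and chosen only for illustration:

```python
def order_vvrs(vvrs):
    """Return VVR descriptors sorted by the first record they contain."""
    return sorted(vvrs, key=lambda vvr: vvr[0])


# Descriptors as they might appear in file order: (first_record, last_record, offset).
unordered = [(201, 300, 4096), (0, 100, 9000), (101, 200, 512)]
print(order_vvrs(unordered))
```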

cdflib.varget() fails with numpy >= 1.16

The cdflib varget is failing with new numpy versions, starting from 1.16, as shown below. Numpy 1.15.4 works ok. The file in the first example can be generated at:

https://cdaweb.gsfc.nasa.gov/cgi-bin/eval2.cgi?dataset=WI_H0_MFI&index=sp_phys

(choose the option "Create V3.7 CDFs for download..."). The same error message is given for files obtained through the heliopy library.

A file downloaded directly, e.g.
https://cdaweb.gsfc.nasa.gov/pub/data/wind/mfi/mfi_h0/2016/wi_h0_mfi_20160101_v05.cdf

gives a different error, which is also shown below. Both work with numpy 1.15.4.

---- sample script 1---

import cdflib
cdf_file = cdflib.CDF("wi_h0_mfi_20160101_v05.cdf")
cdf_file.varget("Epoch")

--- output ---

C:\Users\Timo\Anaconda3\envs\heliopy\lib\site-packages\cdflib\cdfread.py in varget(self, variable, epoch, starttime, endtime, startrec, endrec, record_range_only, inq, expand, to_np)
    571                     if name.strip().lower() == variable.strip().lower():
    572                         if (self.cdfversion == 3):
--> 573                             vdr_info = self._read_vdr(position)
    574                         else:
    575                             vdr_info = self._read_vdr2(position)

C:\Users\Timo\Anaconda3\envs\heliopy\lib\site-packages\cdflib\cdfread.py in _read_vdr(self, byte_loc)
   1402         if pad_bool:
   1403             byte_stream = vdr[coff:]
-> 1404             pad = self._read_data(byte_stream, data_type, 1, num_elements)
   1405 
   1406         return_dict = {}

C:\Users\Timo\Anaconda3\envs\heliopy\lib\site-packages\cdflib\cdfread.py in _read_data(self, byte_stream, data_type, num_recs, num_elems, dimensions)
   1920             dt = np.dtype(dt_string)
   1921             ret = np.frombuffer(byte_stream, dtype=dt, count=num_recs*num_elems)
-> 1922             ret.setflags('WRITEABLE')
   1923 
   1924         if squeeze_needed:

ValueError: cannot set WRITEABLE flag to True of this array

---- Sample script 2 -----

import cdflib
cdf_file = cdflib.CDF("wi_h0_mfi_20160101_v05.cdf")
cdf_file.varget("Epoch")

--- Sample output 2 ----

C:\Users\Timo\Anaconda3\envs\heliopy\lib\site-packages\cdflib\cdfread.py in varget(self, variable, epoch, starttime, endtime, startrec, endrec, record_range_only, inq, expand, to_np)
    573                             vdr_info = self._read_vdr(position)
    574                         else:
--> 575                             vdr_info = self._read_vdr2(position)
    576                         break
    577                     position = vdr_next

C:\Users\Timo\Anaconda3\envs\heliopy\lib\site-packages\cdflib\cdfread.py in _read_vdr2(self, byte_loc)
   1532             return_dict['dim_sizes'] = dim_sizes
   1533         if (pad_bool):
-> 1534             return_dict['pad'] = pad
   1535         return_dict['compression_bool'] = compression_bool
   1536         if (compression_bool):

UnboundLocalError: local variable 'pad' referenced before assignment
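An UnboundLocalError like this usually means a name is assigned on only some code paths and then read on another. The toy function below is a hypothetical stand-in, not cdflib's _read_vdr2; it only sketches the usual remedy of binding the name before any branch.

```python
def read_record(pad_bool, has_pad_bytes):
    """Toy stand-in for a record parser; names and logic are hypothetical."""
    pad = None  # bind the name up front so every later branch can read it
    if pad_bool and has_pad_bytes:
        pad = b"\x00"  # placeholder for a pad value parsed from the record
    result = {}
    if pad_bool:
        result["pad"] = pad  # safe even when the parsing branch was skipped
    return result


without_bytes = read_record(pad_bool=True, has_pad_bytes=False)
with_bytes = read_record(pad_bool=True, has_pad_bytes=True)
```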

TT2000 conversions appear to be off

For example,

cdflib.cdfepoch.encode([500000000100])

returns: '01-Jan-2000 12:07:15.816.000.100'

However, I believe this is 64.184 seconds less than it should be.

That difference looks like the number of leap seconds up to the year 2000 (32) plus 32.184 seconds, as described in section 2.7 here: https://spdf.gsfc.nasa.gov/pub/software/cdf/doc/cdf371/cdf371ug.pdf

But I'm pretty sure the first calculation, without the subtractions, is correct.
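To make the 64.184 second gap concrete, here is the arithmetic as a small sketch. It assumes TT2000 counts nanoseconds from 2000-01-01 12:00:00 on the TT scale and uses the 32 + 32.184 second offset mentioned above. It only reproduces the two candidate values; whether the encoder should report the TT reading or the UTC reading is the open question.

```python
from datetime import datetime, timedelta

# Assumed zero point of TT2000: J2000, expressed on the TT time scale.
J2000_TT = datetime(2000, 1, 1, 12, 0, 0)
# TT minus UTC around the year 2000: 32 leap seconds + 32.184 s.
TT_MINUS_UTC = timedelta(seconds=32, microseconds=184000) + timedelta(seconds=32)

tt2000_ns = 500000000100
# datetime only resolves microseconds, so carry the leftover nanoseconds separately.
whole_us, leftover_ns = divmod(tt2000_ns, 1000)

reading_tt = J2000_TT + timedelta(microseconds=whole_us)
reading_utc = reading_tt - TT_MINUS_UTC

print(reading_tt, leftover_ns)   # value without the subtraction
print(reading_utc, leftover_ns)  # value with the 64.184 s subtraction
```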

Add simple CDF reading test

It would be good to add a simple test to the test suite that reads in a CDF file.

unixtime doesn't work for a single epoch

When trying to convert a single epoch value, I ended up with the following error (see below my message). My "dt0" variable is a numpy.ndarray of shape (1,) (I extracted only 1 record with varget).

cdflib Version: 0.3.5 (problem still exists with the current state of the code)

I haven't yet found a pretty solution to propose as a PR...

In [6]: cdfepoch.unixtime(dt0)

TypeError Traceback (most recent call last)
in ()
----> 1 cdfepoch.unixtime(dt0)

~/anaconda3/lib/python3.6/site-packages/cdflib/epochs.py in unixtime(cdf_time, to_np)
356 for t in time_list:
357 date = ['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond']
--> 358 for i in range(0,len(t)):
359 if i > 7:
360 continue

TypeError: object of type 'numpy.int64' has no len()
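The traceback suggests the loop is walking over the components of one epoch instead of over a list of epochs, so each t is a bare numpy scalar. The sketch below reproduces that with made-up component values and shows one possible normalisation with np.atleast_2d; it is an illustration, not a patch to cdflib's epochs.py.

```python
import numpy as np

# Broken-down components of a single epoch: a flat, 1-D sequence (made-up values).
single = np.array([2004, 5, 13, 15, 8, 0, 0])

# Iterating a 1-D array yields numpy scalars, which have no len().
first = next(iter(single))
try:
    len(first)
    scalar_has_len = True
except TypeError:
    scalar_has_len = False

# np.atleast_2d promotes the flat sequence to one row per epoch,
# so a loop over rows sees a sequence of components every time.
rows = np.atleast_2d(single)
row_lengths = [len(t) for t in rows]
```

Input that is already two-dimensional passes through np.atleast_2d unchanged, so the same loop could serve one epoch or many.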

Astropy module: forced conversion to a certain type

When you force a conversion to the cdf_epoch16 format with complex numbers, astropy doesn't know how to read them in. The code needs to first split off the imaginary part and read it in separately, as it does when it detects a complex number.

Documentation for cdfread.CDF.cdf_info incorrect

Documentation describes cdfread.CDF.cdf_info as returning a dictionary including keys 'rVariables' and 'zVariables', which have the description:
"the dictionary for zVariable numbers and their corresponding names"
These variables are lists, not dictionaries.

Tempfiles generated while decompressing are not properly deleted

I was running some operations against a massive number of CDF files and suddenly ran out of disk space. When I checked /tmp, I noticed that it was full of .cdf files, even after cleaning out everything in my session and running the GC.

I'm guessing the issue happens here: https://github.com/MAVENSDC/cdflib/blob/master/cdflib/cdfread.py#L723

tempfile files are supposed to delete the underlying file once they go out of scope and get destroyed. Technically the NamedTemporaryFile object goes out of scope once this function exits. However, I'm guessing that since the Path object has some kind of handle on the file, it can't delete the underlying file, which makes sense since otherwise none of the code would work. So the file just sits there even after the CDF object gets deleted, at least until system restart, when /tmp gets cleared. (No idea how this acts on Windows.)
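For comparison, here is how the standard library behaves when a named temporary file is created with delete=False. Whether cdflib creates its file this way is an assumption on my part; the point of the sketch is that such a file survives closing and needs an explicit unlink.

```python
import tempfile
from pathlib import Path

# With delete=False, closing the handle leaves the file on disk.
with tempfile.NamedTemporaryFile(suffix=".cdf", delete=False) as tmp:
    tmp.write(b"decompressed bytes would go here")  # placeholder content
    tmp_path = Path(tmp.name)

still_there_after_close = tmp_path.exists()

# Explicit cleanup, for example from a close() method or a finally block.
tmp_path.unlink()
gone_after_unlink = not tmp_path.exists()
```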

Environment: Python 3.6, library release 0.3.19, RHEL 8.2

Astropy epoch module precision issues

The cdf_epoch16 astropy time format does not keep the picosecond precision that it is intended to do.

First, when an astropy time is converted to cdf_epoch16, only one value is returned instead of two. It might be possible to fix this by overriding the to_value() function of the TimeFromEpoch class.

Second, a loss of precision might be inevitable anyway, since astropy only guarantees nanosecond precision.

How should the MD5 checksum be handled when the whole file is compressed?

See title. I'm unsure if this needs additional code or not.

read error

I tried to read the file and it reported an error, similar to this

_convert_np_data is a NoSelf function that contains a reference to self

cdflib/cdflib/cdfread.py

Line 2203 in 30fe0fd

def _convert_np_data(data, data_type, num_elems): # @NoSelf

cdflib/cdflib/cdfread.py

Line 2211 in 30fe0fd

return data.ljust(num_elems, '\x00').encode(self.string_encoding)
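One way to remove the stray self reference is to pass the encoding in as an argument. The function below is a hypothetical standalone sketch of that idea (the name, the reduced signature, and the "ascii" default are my assumptions), not the library's code.

```python
def convert_str_data(data, num_elems, string_encoding="ascii"):
    """Pad a string with NUL characters to num_elems, then encode it.

    Hypothetical helper: the encoding is an explicit argument, so the
    function works without access to an instance.
    """
    return data.ljust(num_elems, "\x00").encode(string_encoding)


padded = convert_str_data("abc", 5)
```

The alternative would be to turn the function into a regular method so that self.string_encoding is actually available.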

cdflib.CDF is not reading the file

cdflib.CDF is reading the file as None. I am not able to extract any variable.

Keep Leap Seconds up to date

Leap seconds are currently hard coded into the cdfepoch module. I believe we can keep these leap seconds in a file that continually updates.
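As a rough sketch of what reading the table from a file could look like: the parser below assumes a made-up plain-text layout (year, month, day, TAI-UTC in seconds, with # comments). The layout and the function name are hypothetical; a real implementation would have to match whatever format the chosen leap second source uses.

```python
def parse_leap_table(text):
    """Parse lines of 'year month day tai_minus_utc' (hypothetical layout)."""
    table = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        year, month, day, total = line.split()
        table.append((int(year), int(month), int(day), float(total)))
    return table


sample = """\
# year month day TAI-UTC(s)
2015 7 1 36
2017 1 1 37
"""
table = parse_leap_table(sample)
```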

Plotting Example

Suggestion:

In the documentation of cdflib, you could provide an example, either in matplotlib.pyplot or in bokeh.plotting, where the x-axis of a plot is cdfepoch strings.

Release 0.3.19 not available on PyPI

It seems that the latest release 0.3.19 was not yet uploaded to PyPI. Was that intentional?

Getting started...

I have a basic python tuple list of [(Datetime, Float)]. I'm just looking to simply output this to a cdf. I've followed the directions and performed the pip install cdflib. I've also git pulled this repo and tried to follow the sample use. Anyone have any suggestions to help?

Sample use "Use a master CDF file as your template"
- I don't have any CDF file. I'm trying to write one. If a master file is recommended, is one not provided?
- What is happening with the indentation on the first 'if'... Currently neither of the lines following the 1st 'if' are indented and are therefore not inside of it... If this is on purpose what is the purpose of the first 'if'. If they are both meant to be indented then cdf_file.file in the 2nd 'if' is not initialized unless the 1st 'if' is True.
- What is the file /path/to/swea_file.cdf in cdf_file=cdfwrite.CDF('/path/to/swea_file.cdf',cdf_spec=info,delete=True)

cdfwrite.py - "Creates an empty CDF file."

I run cdfwrite.py and get "Process finished with exit code 0." No new file.

Readme doc

Could I contribute some documentation for this project? I am trying to install it and get it working on Windows 10 and am running into issues not mentioned in your Readme doc. I might also contribute other documentation if you are open to that.

My question now is, how do I do this?

cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')

It is in the documentation that way, but it cannot be a command line prompt, can it? Is this something that needs to be changed in a file somewhere?

Converting data in to csv format

Hi
I have a cdf file with motion capture data which I would like to convert to a csv file. I can read the file with the library but not sure how to proceed after that.
Appreciate any help regarding the matter.

Thanks
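Once the variables have been read (for example with varget), writing them out is mostly a job for the standard csv module. The sketch below uses made-up stand-in columns in place of real CDF variables, and it assumes each variable is one-dimensional and that all of them have the same length.

```python
import csv
import io

# Stand-in data; in practice these would be the arrays read from the CDF file.
columns = {
    "time": [0.0, 0.1, 0.2],
    "x": [1.0, 1.5, 2.0],
    "y": [3.0, 2.5, 2.0],
}

# An in-memory buffer keeps the example self-contained; to write a real file,
# use open("out.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns.keys())            # header row with variable names
writer.writerows(zip(*columns.values()))   # one row per record
csv_text = buffer.getvalue()
```

Multi-dimensional variables (for example one x, y, z triple per record) would need to be split into separate columns first.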

The only compression handled is GZIP.

RLE, HUFF, and AHUFF compression types are not supported. Perhaps functions need to be written to handle these types.
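As a starting point, here is a sketch of a decoder for the RLE case. It assumes the scheme only run-length encodes zeros: a 0x00 byte followed by a count byte holding the run length minus one, with all other bytes copied through. That reading should be checked against the CDF Internal Format Description before relying on it; the function name is hypothetical.

```python
def rle0_decode(data):
    """Decode zero-run RLE (assumed scheme: 0x00, then run length minus one)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0:
            if i + 1 >= len(data):
                raise ValueError("truncated zero run")
            out.extend(b"\x00" * (data[i + 1] + 1))  # expand the run of zeros
            i += 2
        else:
            out.append(byte)  # non-zero bytes are stored literally
            i += 1
    return bytes(out)


decoded = rle0_decode(bytes([7, 0, 2, 9]))
```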
