pyaerocom's People

Contributors

alpsjur, augustinmortier, avaldebe, charlienegri, dulte, eivindgw, ejgal, hannasv, hansbrenna, heikoklein, jgliss, jgriesfeller, lewisblake, michaelschulzmetno, ovewh, thorbjoernl, willemvancaspel

pyaerocom's Issues

Review parameter naming and clarify in colocation routines

E.g.:

  • In low-level colocation routines, get rid of var_ref_keep_outliers and var_keep_outliers and handle these directly through the args var_outlier_ranges and var_ref_outlier_ranges.
  • Change the input type of the latter (currently a dict, passed through); it should be a list, handled in the Colocator class just before calling the respective method.
  • Harmonise the naming of ColocationSetup attributes and the input args of the low-level colocation routines (e.g. model_keep_outliers vs. var_keep_outliers). Check all other args.
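As an illustrative sketch (function and argument names are hypothetical, not pyaerocom API), the Colocator could normalise list-style outlier ranges into a per-variable mapping just before calling the low-level routines:

```python
def to_outlier_dict(var_names, ranges):
    """Pair variable names with (low, high) outlier ranges.

    `ranges` may be a single (low, high) tuple applied to all variables,
    or a list with one tuple per variable.
    """
    if isinstance(ranges, tuple):
        ranges = [ranges] * len(var_names)
    if len(ranges) != len(var_names):
        raise ValueError('need one outlier range per variable')
    return dict(zip(var_names, ranges))

to_outlier_dict(['od550aer', 'ang4487aer'], (0, 10))
# {'od550aer': (0, 10), 'ang4487aer': (0, 10)}
```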

Seasalt - DMS/SO4 evaluation at marine sites

The dataset on DMS/SO4 at Amsterdam Island prepared by Dirk in the CRESCENDO project should be worked up. Discussion is needed with Dirk on how to do this; see his emails from October 2018.

introduce a variable numobs

For my satellite validation work it would be nice to plot a field containing the number of level 2 points that went into each grid box of a level 3 product, i.e. a number of observations per day.
The question is whether we want to normalise that to a day / year in the map plots or sum it up over the time frame the plot covers. Both should be in pyaerocom at some point, the latter being used e.g. for precipitation.
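Both aggregation modes described above could be sketched like this (function and argument names are illustrative only, not pyaerocom API):

```python
import numpy as np

def numobs_map(counts_daily, how='sum'):
    """Aggregate daily level 2 counts per grid box over the plotted period.

    counts_daily: array of shape (time, lat, lon) with observation counts
    per day. how='sum' returns the total number of observations,
    how='per_day' normalises to a mean number of observations per day.
    """
    total = counts_daily.sum(axis=0)
    if how == 'per_day':
        return total / counts_daily.shape[0]
    return total

counts = np.array([[[2, 0]], [[4, 1]]])  # 2 days, 1x2 grid
numobs_map(counts)             # array([[6, 1]])
numobs_map(counts, 'per_day')  # array([[3. , 0.5]])
```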

CI service

Integrate CI service for automatic testing, either CircleCI or Travis CI (@jgliss knows Travis, maybe easier)...

Missing tutorials

The following (non-exhaustive) list specifies tutorial notebooks that are still missing or incomplete:

Error with computation of concso4 from OsloCTM3...HIST

Only zero values are returned when trying to read/compute concso4 from OsloCTM3...HIST model

import pyaerocom as pya

mod = 'OsloCTM3v1.01-met2010_AP3-HIST'
r = pya.io.ReadGridded(mod)
data = r.read_var('concso4', start=2010)
data.quickplot_map()

ValueError: Minimum value in data equals maximum value: 0.0

Confusing error message, when not connected to lustre

Confusing error message: it says it can access lustre, but /lustre/storeA is not found.

Initating pyaerocom configuration
Checking database access...
Checking access to: /lustre/storeA
Process test:
Traceback (most recent call last):
  File "/home/hannas/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/hannas/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/lustre/storeA'
Access to lustre database: True
Init data paths for lustre
Expired time: 0.012 s
Supplementary data directory for etopo1 does not exist:
/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/PYAEROCOM/topodata/etopo1/

New helper method clear_cache

A new helper method that deletes all cached data files in the cache directory.

Usage:

import pyaerocom as pya
pya.clear_cache()

Bug in ReadSulphurAasEtAl (read same data multiple times)

The read_file method of class ReadSulphurAasEtAl may be buggy in this part (L163 ++):

for key in station_group: # NOT YEAR OR MONTH, BUT WOULD LIKE TO KEEP THOSE.
    # Enters if the the key is a variable
    if key in variables_present:
        # Looping over all varibales to retrieve
        for var in vars_to_retrieve:
            if var == "wetso4":
                # input unit is kg S/ha
                y = station_group['year'].values
                m = station_group['month'].values
                
                monthly_to_sec = days_in_month(y, m)*24*60*60             
                mass_sulhpor = pd.to_numeric(station_group[key], 
                                             errors='coerce').values
                s[var] = unitconversion_wet_deposition(mass_sulhpor, 
                 "monthly")/monthly_to_sec#monthly_to_sec[:156]
                # output variable is ks so4 m-2s-1
            else:
                # Other variable have the correct unit.
                s[var] = pd.to_numeric(station_group[key], 
                                       errors='coerce').values
            # Adds the variable
            s['variables'].append(var)

E.g. if vars_to_retrieve='sconcso4pr' (file monthly_so4_precip.csv) it will loop over all available variables (variables_present = ['precip_amount_mm', 'deposition_kgS/ha', 'concentration_mgS/L']) and hence, the inner loop (for var in vars_to_retrieve) will be executed 3 times, even though only one variable (here sconcso4pr) was requested. Also, s['variables'].append(var), will thus be a list containing 3 times the same entry (i.e. s['variables']: ['sconcso4pr', 'sconcso4pr', 'sconcso4pr']).

I think there needs to be only one loop over vars_to_retrieve here.

Ultimately, this will duplicate these data in the UngriddedData object that is created in method read, as this method loops over the variables attribute of each StationData object that is created in read_file (cf. L266: temp_vars=station_data['variables'] and L275: for var_count, var in enumerate(temp_vars)).
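A minimal sketch of the restructuring suggested above, with a plain dict standing in for the HDF station group and a hypothetical variable-to-column mapping (not pyaerocom code): loop once over vars_to_retrieve and look up the matching column, so each requested variable is read exactly once:

```python
# Hypothetical mapping from pyaerocom variable names to file columns
VAR_TO_COLUMN = {'sconcso4pr': 'concentration_mgS/L',
                 'wetso4': 'deposition_kgS/ha',
                 'pr': 'precip_amount_mm'}

def read_station(station_group, vars_to_retrieve):
    """Read each requested variable exactly once (sketch, not pyaerocom code)."""
    s = {'variables': []}
    for var in vars_to_retrieve:
        column = VAR_TO_COLUMN[var]
        if column not in station_group:
            continue
        s[var] = station_group[column]  # unit conversion would go here
        s['variables'].append(var)
    return s

data = read_station({'concentration_mgS/L': [1.2, 0.8]}, ['sconcso4pr'])
# data['variables'] == ['sconcso4pr'], no duplicate entries
```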

Not accepting '5minutely' sampling frequency in EBAS

For the station Alert, two datasets for equivalent black carbon mass concentration (var_name 'conceqbc') have the ts_type '5minutely'. When using pyaerocom.to_station_data(), the following error is raised: "NotImplementedError: Cannot resample to input frequency 5minutely. Choose from: dict_keys(['minutely', 'hourly', '3hourly', 'daily', 'weekly', 'monthly', 'season', 'yearly'])".

Code for reproducibility:

import pyaerocom as pya
pya.const.BASEDIR = '/home/notebook/shared-ns1000k/inputs/pyaerocom-testdata/'
read_factory = pya.io.ReadUngridded('EBASMC')
reader = read_factory.get_reader()

data = reader.read(vars_to_retrieve='conceqbc', station_names={'Alert','Summit'})
alert = data.to_station_data('Alert')

Rewrite colocation methods to use xarray (Dask) functionality to avoid memory issues

Currently the ungridded / gridded colocation routine loops over all stations and extracts time series from the model. This can lead to a crash for large model datasets (e.g. multi-year), since the model data needs to be realised in memory during runtime. It may be possible to fix this via dask / xarray functionality.

Needs further investigation...

We need a test-suite

We need to define some sample datasets, which should be small and, ideally, openly accessible. The latter would make it easier to develop a test framework for pyaerocom using these data (which, if too big, could e.g. be accessed from a test environment online). A starting point could be the currently existing minimal test suites, which at the moment can only be run with access to MET Norway servers.

The test suite needs to grow and should run regularly in an automated manner, for instance when pushing to remote or when drafting a release (e.g. using pytest and Travis).

Status

To be discussed; the most relevant test routines still need to be implemented.

Error in GriddedData.quickplot_map

When reading a NetCDF file directly into the GriddedData class, the attribute ts_type is not assigned. This causes an error in quickplot_map:

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/helpers.py in datetime2str(time, ts_type)
224
225 def datetime2str(time, ts_type=None):
--> 226 conv = TS_TYPE_DATETIME_CONV[ts_type]
227 if is_year(time):
228 return str(time)

KeyError: 'Unknown'

Assigning ts_type via

data.suppl_info['ts_type'] = 'daily'

beforehand fixes this problem, but then another problem occurs, which may be due to datetime conversion (pandas, datetime64?) not working properly for my paleo time data (year 905), raising the following error:


OutOfBoundsDatetime Traceback (most recent call last)
~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/griddeddata.py in quickplot_map(self, time_idx, xlim, ylim, **kwargs)
1208 t = cftime_to_datetime64(self.time[time_idx])[0]
-> 1209 tstr = datetime2str(t, self.ts_type)
1210 except:

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/helpers.py in datetime2str(time, ts_type)
228 return str(time)
--> 229 time = to_pandas_timestamp(time).strftime(conv)
230 return time

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/helpers.py in to_pandas_timestamp(value)
196 elif isinstance(value, (str, np.datetime64, datetime, date)):
--> 197 return pd.Timestamp(value)
198 else:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.get_datetime64_nanos()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 901-01-01 12:00:00

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime Traceback (most recent call last)
in
----> 1 fig = data.quickplot_map(time_idx=0)

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/griddeddata.py in quickplot_map(self, time_idx, xlim, ylim, **kwargs)
1210 except:
1211 tstr = datetime2str(self.time_stamps()[time_idx],
-> 1212 self.ts_type)
1213 fig.axes[0].set_title("{} ({}, {})".format(self.name,
1214 self.var_name, tstr))

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/helpers.py in datetime2str(time, ts_type)
227 if is_year(time):
228 return str(time)
--> 229 time = to_pandas_timestamp(time).strftime(conv)
230 return time
231

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/helpers.py in to_pandas_timestamp(value)
195 return value
196 elif isinstance(value, (str, np.datetime64, datetime, date)):
--> 197 return pd.Timestamp(value)
198 else:
199 try:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.get_datetime64_nanos()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 901-01-01 12:00:00

Unit conversions

For O3, EBAS uses ug/m3 although the variable is named vmro3. Models tend to use vmr, so we need to convert between the two. Often a fixed ratio of 1 ppb O3 = 2 ug/m3 is used, but a proper calculation would need temperature, pressure and molecular weights. Could the method convert_units have options for a fixed numerical factor (e.g. 0.5) or MW input, or similar?

Also, some stations at high elevation need thinking about. Often one wants to convert ug/m3 as measured to ug/m3 at STP.
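The proper conversion is just the ideal gas law. A sketch (not the pyaerocom convert_units API; the default conditions T = 293.15 K and p = 1013.25 hPa are my assumption):

```python
R = 8.314  # universal gas constant, J mol-1 K-1

def vmr_ppb_to_ugm3(vmr_ppb, mw_g_mol, T_K=293.15, p_Pa=101325.0):
    """Convert a volume mixing ratio in ppb to a mass concentration in ug m-3."""
    air_molar_density = p_Pa / (R * T_K)  # mol (air) per m3 of air
    return vmr_ppb * 1e-9 * air_molar_density * mw_g_mol * 1e6

vmr_ppb_to_ugm3(1.0, 48.0)  # O3, MW ~48 g/mol: ~2.0 ug/m3, matching the fixed ratio
```

A fixed-factor option in convert_units would then just bypass the T/p dependence.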

Import pyaerocom locally inits wrongly when lustre is not mounted

I work locally and have an empty mount /lustre in the root directory. When trying to import pyaerocom in an IPython console, I get the following error:

Initating pyaerocom configuration
Checking server configuration ...
Checking access to: /lustre/storeA
Process test:
Traceback (most recent call last):
  File "/home/hannas/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/hannas/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/lustre/storeA'
Access to lustre database: True
Init data paths for lustre
Expired time: 0.004 s
Supplementary data directory for etopo1 does not exist:
/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/PYAEROCOM/topodata/etopo1/

Caching of UngriddedData

Right now, caching of UngriddedData objects is being done by pickling an existing instance of such an object (that contains, e.g. EBAS data). The naming convention for the pickled data is:

network_id_varinfo_start_stop.pkl

e.g.

EBASMC_MultipleVars_None_None.pkl

Each pickle file contains a header with the following information (corresponding to the data situation when the cache file was created):

  1. newest file in read directory
  2. newest file date in read directory
  3. revision date
  4. version of pyaerocom Reader class (e.g. ReadEbas.version)
  5. version of UngriddedData class
  6. version of cacher class (CacheHandlerUngridded)

If all of this information (in an existing cache file) matches the current configuration when the data is requested, the cache file is read. Otherwise, the data is re-read from scratch.

Open questions

  • We check against many different versions (e.g. the reading class, the data class, the cacher class). Should we remove this and only check against the pyaerocom version (which should be changed whenever the code changes anyway)?
  • Should we stick to pickle? It is fast and simple, but maybe not so secure. Investigate other formats (e.g. json); the latter could be a method of the UngriddedData class itself (i.e. to_json).
  • Do we need to be more flexible in handling the cache files? E.g. if a cache file exists for a certain network (say EBAS) with certain variables (say scatc550aer, absc550aer) and a request contains variables that are not yet in the cached file, these should be appended to the existing file rather than overwriting it (as in the current procedure).
  • Do we need another naming convention for the files?
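The header check described above amounts to a field-by-field comparison. A sketch with illustrative field names (the actual header layout in CacheHandlerUngridded may differ):

```python
CACHE_HEADER_KEYS = ('newest_file', 'newest_file_date', 'revision_date',
                     'reader_version', 'data_version', 'cacher_version')

def cache_is_valid(header, current):
    """Return True only if every header field matches the current state."""
    return all(header.get(k) == current.get(k) for k in CACHE_HEADER_KEYS)

state = {k: 'x' for k in CACHE_HEADER_KEYS}
cache_is_valid(state, state)                            # True  -> load cache
cache_is_valid(state, {**state, 'revision_date': 'y'})  # False -> re-read data
```

Collapsing all version fields into a single pyaerocom version would shrink CACHE_HEADER_KEYS to four entries without changing this logic.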

Installation via conda (init cache directory)

I have lustre mounted in root with read access only. After installing v0.7.0 using

conda install -c nordicesmhub -c conda-forge pyaerocom

I get the following error when trying to import pyaerocom:

>>> import pyaerocom as pya
Init data paths for lustre
0.0041120052337646484 s
Failed to access CACHEDIR: PermissionError(13, 'Permission denied')Deactivating caching
Failed to access CACHEDIR: PermissionError(13, 'Permission denied')Deactivating caching
Failed to access CACHEDIR: PermissionError(13, 'Permission denied')Deactivating caching
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/augustinm/miniconda3/lib/python3.6/site-packages/pyaerocom/__init__.py", line 93, in <module>
    from . import analysis
  File "/home/augustinm/miniconda3/lib/python3.6/site-packages/pyaerocom/analysis.py", line 24, in <module>
    from pyaerocom.io import ReadUngridded, ReadGridded
  File "/home/augustinm/miniconda3/lib/python3.6/site-packages/pyaerocom/io/__init__.py", line 74, in <module>
    from .readungridded import ReadUngridded
  File "/home/augustinm/miniconda3/lib/python3.6/site-packages/pyaerocom/io/readungridded.py", line 54, in <module>
    class ReadUngridded(object):
  File "/home/augustinm/miniconda3/lib/python3.6/site-packages/pyaerocom/io/readungridded.py", line 68, in ReadUngridded
    _DONOTCACHEFILE = os.path.join(const.OBSDATACACHEDIR, 'DONOTCACHE')
  File "/home/augustinm/miniconda3/lib/python3.6/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

EMEP unit issue with reading wetSO4.

The unit in the following file is mgS m-2. When reading this file, the unit is automatically set to 'unknown' by iris. We need to convert the unit to kg (SO4) m-2 s-1 in the EMEP reading routine.

import pyaerocom as pya
from pyaerocom.io.helpers_units import unitconv_wet_depo_from_emep

reader = pya.io.ReadGridded('EMEP_rv4.1.1.T2.1_ctl')
gridded_data = reader.read(vars_to_retrieve=['wetso4'],
                           start=2008,
                           ts_type='daily')
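For reference, the conversion itself is a molecular weight ratio plus a monthly-total-to-rate division. This sketch is not the unitconv_wet_depo_from_emep implementation, and the molecular weights are my own assumption:

```python
import calendar

MW_SO4 = 96.06   # g/mol, assumed
MW_S = 32.065    # g/mol, assumed

def mgS_m2_month_to_kgSO4_m2_s(value, year, month):
    """Convert a monthly wet deposition total from mg(S) m-2 to kg(SO4) m-2 s-1."""
    seconds = calendar.monthrange(year, month)[1] * 24 * 3600
    return value * 1e-6 * (MW_SO4 / MW_S) / seconds
```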

Init mounted user server at location other than root

Reading the data/paths_user_server.ini file misses some obs network specifications:

pya.const.BASEDIR = '/home/bikascb/aerocom/aerocom-users-database/'
pya.const.read_config(config, keep_basedirs=True)

KeyError Traceback (most recent call last)
in
1 pya.const.BASEDIR = '/home/bikascb/aerocom/aerocom-users-database/'
----> 2 pya.const.read_config(config, keep_basedirs=True)

~/anaconda3/envs/NEGI-Andoya-workshop/lib/python3.6/site-packages/pyaerocom/config.py in read_config(self, config_file, keep_basedirs)
448 self.AERONET_SUN_V3L2_SDA_ALL_POINTS_NAME = cr['obsnames']['AERONET_SUN_V3L2_SDA_ALL_POINTS']
449 # inversions
--> 450 self.AERONET_INV_V3L15_DAILY_NAME = cr['obsnames']['AERONET_INV_V3L15_DAILY']
451 self.AERONET_INV_V3L2_DAILY_NAME = cr['obsnames']['AERONET_INV_V3L2_DAILY']
452

~/anaconda3/envs/NEGI-Andoya-workshop/lib/python3.6/configparser.py in __getitem__(self, key)
1231 def __getitem__(self, key):
1232 if not self._parser.has_option(self._name, key):
-> 1233 raise KeyError(key)
1234 return self._parser.get(self._name, key)
1235

KeyError: 'AERONET_INV_V3L15_DAILY'

Problem with bounds variable in function plot_griddeddata_on_map

When deviating from the default input, the variable bounds is not properly assigned, i.e.:

fig = pya.plot.mapping.plot_griddeddata_on_map(data[0].grid.data,   
data.longitude.points,  data.latitude.points,vmin=0, vmax=300, add_zero=False,  
c_under=None, c_over=None, log_scale=False, discrete_norm=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-85-afe364bfae0f> in <module>
      1 fig = pya.plot.mapping.plot_griddeddata_on_map(data[0].grid.data, data.longitude.points, data.latitude.points,
      2                                               vmin=0, vmax=300, add_zero=False, c_under=None,
----> 3                                               c_over=None, log_scale=False, discrete_norm=False)

~/anaconda3/envs/negi_course/lib/python3.6/site-packages/pyaerocom/plot/mapping.py in plot_griddeddata_on_map(data, lons, lats, var_name, xlim, ylim, vmin, vmax, add_zero, c_under, c_over, log_scale, discrete_norm, cbar_levels, cbar_ticks, color_theme, **kwargs)
    276     disp = ax.pcolormesh(X, Y, data, cmap=cmap, norm=norm)
    277 
--> 278     min_mag = -exponent(bounds[1])
    279     min_mag = 0 if min_mag < 0 else min_mag
    280 

TypeError: 'NoneType' object is not subscriptable

Missing argument var_name in function colocate_gridded_ungridded

The function pyaerocom.colocation.colocate_gridded_ungridded does not have an explicit input argument var_name, as the doc-string indicates. It is still possible to pass var_name as a keyword argument, but it does not appear to do anything: the function actually uses the var_name stored in the GriddedData object, not the name provided as function input (in line 196, the first line after the doc-string).

Regridding of GriddedData to custom resolution

Sometimes, it may be desirable to change the lat / lon resolution of a loaded GriddedData object, e.g. to 5x5 degrees as done by default in method pyaerocom.colocation.colocate_gridded_gridded using the following code:

if regrid_res_deg is not None:
        
        lons = gridded_data_ref.longitude.points
        lats = gridded_data_ref.latitude.points
        
        lons_new = np.arange(lons.min(), lons.max(), regrid_res_deg)
        lats_new = np.arange(lats.min(), lats.max(), regrid_res_deg) 
        
        gridded_data_ref = gridded_data_ref.interpolate(latitude=lats_new, 
                                                        longitude=lons_new)

This strategy should be revised, since the arange method may result in ignoring some data at the edges of the lon / lat coordinates, e.g. for a 5 degree output resolution:

np.arange(-179.5, 179.5, 5)
array([-179.5, -174.5, -169.5, -164.5, -159.5, -154.5, -149.5, -144.5,
       -139.5, -134.5, -129.5, -124.5, -119.5, -114.5, -109.5, -104.5,
        -99.5,  -94.5,  -89.5,  -84.5,  -79.5,  -74.5,  -69.5,  -64.5,
        -59.5,  -54.5,  -49.5,  -44.5,  -39.5,  -34.5,  -29.5,  -24.5,
        -19.5,  -14.5,   -9.5,   -4.5,    0.5,    5.5,   10.5,   15.5,
         20.5,   25.5,   30.5,   35.5,   40.5,   45.5,   50.5,   55.5,
         60.5,   65.5,   70.5,   75.5,   80.5,   85.5,   90.5,   95.5,
        100.5,  105.5,  110.5,  115.5,  120.5,  125.5,  130.5,  135.5,
        140.5,  145.5,  150.5,  155.5,  160.5,  165.5,  170.5,  175.5])

the right end is missing... This might not be a problem since iris knows that the coordinates are continuous, but the safer way would be to use area-weighted regridding: create an empty cube with appropriate coordinates in the output resolution and then use iris' regridding functionality rather than (nearest neighbour) interpolation.

Also, the regridding code shown above, which currently lives in the colocation method, should be moved into the GriddedData.regrid function, e.g. by accepting input args that specify the desired resolution (e.g. lat_res_deg and lon_res_deg) and then creating an empty Cube that is used to regrid.
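One way to avoid the missing edge cells when building the target grid is to generate cell centres offset by half the resolution from the domain bounds (a numpy-only sketch; the iris Cube construction and regridding are omitted):

```python
import numpy as np

def cell_centres(res_deg, low, high):
    """Cell centres of a regular grid covering [low, high] completely."""
    return np.arange(low + res_deg / 2.0, high, res_deg)

lons = cell_centres(5, -180, 180)  # 72 cells: -177.5, -172.5, ..., 177.5
lats = cell_centres(5, -90, 90)    # 36 cells:  -87.5,  -82.5, ...,  87.5
```

Unlike the snippet above, the last cell centre (177.5) is present, so no data at the right edge is dropped.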

Reading UngriddedData AeronetSunV3Lev2.daily

Latitude is stored in both _LATINDEX = 2 and _LONINDEX = 3 of the ungridded data object.

It is correct in the metadata dictionary.

{'var_info': OrderedDict([('od550aer', OrderedDict([('units', '1')]))]),
 'latitude': 45.3139,
 'longitude': 12.508299999999998,
 'altitude': 10.0,
 'station_name': 'AAOT',
 'PI': 'Brent_Holben',
 'ts_type': 'daily',
 'data_id': 'AeronetSunV3Lev2.daily',
 'variables': ['od550aer'],
 'instrument_name': 'sun_photometer',
 'data_revision': '20190920'}

Example:

import pyaerocom as pya

obs_data = pya.io.ReadUngridded('AeronetSunV3Lev2.daily', ['od550aer', 'ang4487aer']).read()

_LATINDEX = 2
_LONINDEX = 3
data = obs_data._data
print(data[:10,_LATINDEX])
print(data[:10,_LONINDEX])

Duplicate values

There are duplicate values in GAWTADSulphurSubset/data/monthly_so4_aero.csv, e.g. K-puszta, 2012-06.
How should pyaerocom deal with them?
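A first step could be to detect such duplicates before deciding on a policy (drop, average, or raise). A minimal sketch over (station, month) keys:

```python
from collections import Counter

def find_duplicates(keys):
    """Return every (station, month) key that occurs more than once."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]

rows = [('K-puszta', '2012-06'), ('K-puszta', '2012-06'), ('Alert', '2012-06')]
find_duplicates(rows)  # [('K-puszta', '2012-06')]
```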

Where to put masks for region filtering

I still think it is better to keep additional resources for pyaerocom (e.g. HTAP masks) in a separate repository, especially if these are binary files.

For that purpose I created pyaerocom-suppl which should contain supplementary data such as masks.

In pyaerocom itself, access to these masks will be provided automatically whenever they are needed: on first access they are downloaded from pyaerocom-suppl into the local MyPyaerocom directory (created automatically; it is also the place where other things are stored, such as cached UngriddedData objects). On subsequent accesses, the locally stored masks can be used.

I think this is better and keeps the distributed pyaerocom source code smaller, without losing any functionality or introducing user inconvenience (except if the first access to the masks happens while the user has no internet connection).
That way, it will also not impact the size of the pyaerocom repository if the masks are changed / updated in the future (only the download URL then needs to be modified in pyaerocom).

Check source ts_type and naming convention of ColocatedData

Currently, the file naming convention of the colocated data object includes both the source ts_type of the model and the actual ts_type of the colocated data, e.g.:

scatc550dryaer_REF-EBAS-Lev3_MOD-GFDL-AM4-met2010-monthly_20100101_20101231_monthly_WORLD-noMOUNTAINS.nc

I think this is not needed and makes the file naming confusing to understand. Only the actual ts_type should be in the file name; the source ts_type of both model and obs remains accessible in the metadata of the colocated NetCDF file.

Keep in mind commit 8fc9c3e

EBAS reading

Check data level.

  • EbasSQLRequest class (retrieve only files that are at a given level)
  • ReadEbas (think about level strategy)

packages

Shouldn't the installation guide also explain, or automatically install, all needed packages?

Colocation with climatological obsdata

Colocation with climatological obs data should be possible.

This would be specified in the ColocationSetup class, e.g. as attribute obs_use_climatology, and would by default compute climatological time series within a time frame of +/- 5 years around the year(s) to be analysed.

Things to keep in mind / think about:

  • what if the model data is not a single year?
  • computation of climatological time series: this should be handled in the data objects (via iris Cube aggregation for GriddedData; for UngriddedData it should be done in to_station_data)
  • what constraints do we need for climatological data retrieval (e.g. minimum coverage of available years in the data, etc.)?
  • and, and, and

NEGI-ANDOYA COURSE - MISSING FEATURES (NICE TO HAVE)

Please insert features (by commenting this issue) that you are missing in pyaerocom or that you think would be nice to have.

Note

This issue is not for reporting bugs. If you find bugs in the current version, please open a new issue.

Error in CacheHandlerUngridded if cache directory is not defined

I have no write access to

/lustre/storeA/project/aerocom/user_data/pyaerocom_cache/

and get the following error in the ReadUngridded class when trying to read Aeronet data:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-26-cc14f066d63b> in <module>
----> 1 aeronet_data = obs_reader.read(vars_to_retrieve='od550aer')
      2 type(aeronet_data) #displays data type

~/Desktop/Dev/pyaerocom/pyaerocom/pyaerocom/io/readungridded.py in read(self, datasets_to_read, vars_to_retrieve, **kwargs)
    295         for ds in self.datasets_to_read:
    296             self.logger.info('Reading {} data'.format(ds))
--> 297             data.append(self.read_dataset(ds, vars_to_retrieve, **kwargs))
    298             self.logger.info('Successfully imported {} data'.format(ds))
    299         self.data = data

~/Desktop/Dev/pyaerocom/pyaerocom/pyaerocom/io/readungridded.py in read_dataset(self, dataset_to_read, vars_to_retrieve, **kwargs)
    240             # initate cache handler
    241             cache = CacheHandlerUngridded(reader, vars_available, **kwargs)
--> 242             if cache.check_and_load():
    243                 all_avail = True
    244                 for var in vars_available:

~/Desktop/Dev/pyaerocom/pyaerocom/pyaerocom/io/cachehandler_ungridded.py in check_and_load(self)
    149 
    150     def check_and_load(self):
--> 151         if not os.path.isfile(self.file_path):
    152             logger.info('No cache file available for query of dataset '
    153                         '{}'.format(self.dataset_to_read))

~/Desktop/Dev/pyaerocom/pyaerocom/pyaerocom/io/cachehandler_ungridded.py in file_path(self)
    128         """Full file path of cache file for query"""
    129         if self.CACHE_DIR is None:
--> 130             raise IOError('pyaerocom cache directory is not defined')
    131         return os.path.join(self.CACHE_DIR, self.file_name)
    132 

OSError: pyaerocom cache directory is not defined

Trying to load 1D data without longitude / latitude dimensions into GriddedData fails

I have a 1D NetCDF file with time series data and tried to load it using

data = pya.GriddedData(obsfile)

This did not work and raised the following error:

---------------------------------------------------------------------------
NetcdfError                               Traceback (most recent call last)
<ipython-input-34-09fc41919dde> in <module>
----> 1 data = pya.GriddedData(obsfile)

~/anaconda3/envs/NEGI-Andoya-workshop/lib/python3.6/site-packages/pyaerocom/griddeddata.py in __init__(self, input, var_name, convert_unit_on_init, **suppl_info)
    115         self._area_weights = None
    116         if input:
--> 117             self.load_input(input, var_name)
    118         for k, v in suppl_info.items():
    119             if k in self.suppl_info:

~/anaconda3/envs/NEGI-Andoya-workshop/lib/python3.6/site-packages/pyaerocom/griddeddata.py in load_input(self, input, var_name)
    404             from pyaerocom.io.iris_io import load_cube_custom
    405             from pyaerocom.io import FileConventionRead
--> 406             self.grid = load_cube_custom(input, var_name)
    407 
    408             try:

~/anaconda3/envs/NEGI-Andoya-workshop/lib/python3.6/site-packages/pyaerocom/io/iris_io.py in load_cube_custom(file, var_name, grid_io, file_convention)
     81                               'specify var_name. Input file contains the '
     82                               'following variables: {}'.format(file, 
---> 83                                                                vars_avail))
     84         cube = cube_list[0]
     85         var_name = cube.var_name

NetcdfError: Could not load single cube from /home/matildah/Negi-AndoyaKurs/mount/MET_observations_svalbard/SN99710/2000/01.nc. Please specify var_name. Input file contains the following variables: ['air_pressure_at_sea_level', 'air_temperature_2m', 'latitude', 'longitude', 'relative_humidity', 'surface_air_pressure_2m', 'wind_from_direction_10m', 'wind_speed_10m']

GriddedData should not perform checks when reading files


FileConventionError Traceback (most recent call last)
in
----> 1 mask = pya.GriddedData(path)

~/Desktop/pyaerocom/pyaerocom/griddeddata.py in __init__(self, input, var_name, convert_unit_on_init, **meta)
145
146 if input:
--> 147 self.load_input(input, var_name)
148
149 self.update_meta(**meta)

~/Desktop/pyaerocom/pyaerocom/griddeddata.py in load_input(self, input, var_name)
626 elif isinstance(input, str) and os.path.exists(input):
627 from pyaerocom.io.iris_io import load_cube_custom
--> 628 self.grid = load_cube_custom(input, var_name)
629 self.metadata["from_files"].append(input)
630

~/Desktop/pyaerocom/pyaerocom/io/iris_io.py in load_cube_custom(file, var_name, file_convention, perform_checks)
215 grid_io = const.GRID_IO
216 if grid_io.CHECK_TIME_FILENAME:
--> 217 cube = _check_correct_time_dim(cube, file, file_convention)
218 else:
219 logger.warning("WARNING: Automatic check of time "

~/Desktop/pyaerocom/pyaerocom/io/iris_io.py in _check_correct_time_dim(cube, file, file_convention)
437
438 raise FileConventionError('Unknown file convention: {}'
--> 439 .format(file_convention))
440
441 finfo = file_convention.get_info_from_file(file)

FileConventionError: Unknown file convention: None

Bug in isinstance?

The following case is happening:

type(mod_data)
Out[68]: pyaerocom.griddeddata.GriddedData
isinstance(mod_data, pya.GriddedData)
Out[69]: False
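This is most likely not a bug in isinstance itself: isinstance compares class objects, not class names, so an object created before a module reload fails the check against the reloaded class. A standalone demonstration of that mechanism (my hypothesis for the observed behaviour, not a confirmed diagnosis):

```python
def make_class():
    class GriddedData:  # stand-in; each call creates a distinct class object
        pass
    return GriddedData

Old = make_class()
New = make_class()  # same name, different class object (like after a reload)
obj = Old()

isinstance(obj, Old)  # True
isinstance(obj, New)  # False, although both classes share the same name
```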

Definition of NMB wrong

@AugustinMortier Hi Augustin, the definition of NMB in the evaluation interface is wrong. Currently it is

NMB = 1/N x sum(Mi - Oi)

Instead, it should be:

NMB = sum(Mi - Oi) / sum(Oi)

Can you fix that?

Note also that we filter out negative values in Oi before computing the bias (as NMB is only meaningful for expectation values exceeding 0).

Cheers,
Jonas
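For clarity, the corrected definition including the non-negative obs filter, as a short sketch (not the evaluation-interface code):

```python
import numpy as np

def nmb(model, obs):
    """Normalised mean bias: sum(Mi - Oi) / sum(Oi), using only points with Oi >= 0."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = obs >= 0  # drop negative obs before computing the bias
    return float((model[ok] - obs[ok]).sum() / obs[ok].sum())

nmb([2.0, 2.0], [1.0, 1.0])  # 1.0, i.e. the model is biased high by 100%
```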
