Comments (13)
@pfuhe1 Could you please attach or send me a sample script that reproduces this?
from cmor.
@pfuhe1 can you compile with debug so that the trace tells us where the core dump happens? Or run it via gdb.
from cmor.
@doutriaux1 I have been having trouble reproducing this crash reliably, so will do a bit more testing myself before sending you a script.
I am also unsure whether I have compiled in debug mode correctly. I set the environment variable CFLAGS='-g' when I compiled, but this doesn't seem to change the trace that is output when it crashes. Do I have to specify some other debug options or set them another way?
from cmor.
I have come back to this issue again, and have produced a simple script that uses cmor to write random data to a file. It then loops, writing over the file many times.
It seems to have a memory leak and crashes after a while from running out of memory. I'm wondering if this is due to the same issue as above.
I don't think I can attach the files here, so I'm sending you the script and some example output by email.
from cmor.
Thank you so much for doing this. Can you please post the script here? It will help with debugging.
from cmor.
# This is a dummy version of the ACCESS Post Processor.
# Peter Uhe
# 24 July 2014
#
import numpy as np
import datetime
import cmor
#
#main function to post-process files
#
def app(opts):
    #
    # Set up the CMOR stuff.
    #
    print 'cmor setup'
    cmor.setup(inpath=opts['table_path'],
               netcdf_file_action=cmor.CMOR_REPLACE_3,
               set_verbosity=cmor.CMOR_NORMAL,
               exit_control=cmor.CMOR_NORMAL,
               logfile=opts['logfile'], create_subdirectories=1)
    #
    # Define the dataset.
    #
    cmor.dataset(outpath=opts['outpath'],
                 experiment_id=opts['experiment_id'],
                 institution=opts['institution'],
                 source=opts['source'],
                 calendar=opts['calendar'],
                 realization=opts['realization'],
                 contact=opts['contact'],
                 history=opts['history'],
                 comment=opts['comment'],
                 references=opts['references'],
                 model_id=opts['model_id'],
                 forcing=opts['forcing'],
                 initialization_method=opts['initialization_method'],
                 physics_version=opts['physics_version'],
                 institute_id=opts['institution_id'],
                 parent_experiment_id=opts['parent_experiment_id'],
                 branch_time=opts['branch_time'],
                 parent_experiment_rip=opts['parent_experiment_rip'])
    #
    # Load the CMIP tables into memory.
    #
    tables=[]
    tables.append(cmor.load_table('CMIP5_grids'))
    tables.append(cmor.load_table(opts['cmip_table']))
    #manually create time axis for monthly data
    min_tvals=[]
    max_tvals=[]
    cmor_tName='time'
    tvals=[]
    axis_ids=[]
    for year in range(opts['tstart'],opts['tend']+1):
        for mon in range(1,13):
            tvals.append(datetime.date(year,mon,15).toordinal()-1)
    # set up time values and bounds
    for i,ordinaldate in enumerate(tvals):
        model_date = datetime.date.fromordinal(int(ordinaldate)+1)
        #min bound is first day of month
        model_date=model_date.replace(day=1)
        min_tvals.append(model_date.toordinal()-1)
        #max_bound is first day of next month
        tyr=model_date.year+model_date.month/12
        tmon=model_date.month%12+1
        model_date=model_date.replace(year=tyr,month=tmon)
        max_tvals.append(model_date.toordinal()-1)
        #correct date to middle of month
        mid=(max_tvals[i]-min_tvals[i])/2.
        tvals[i]=min_tvals[i]+mid
    tval_bounds = np.column_stack((min_tvals, max_tvals))
    #set up cmor time axis:
    cmor.set_table(tables[1])
    time_axis_id = cmor.axis(table_entry=cmor_tName,
                             units='days since 0001-01-01', length=len(tvals),
                             coord_vals=tvals[:], cell_bounds=tval_bounds[:],
                             interval=None)
    axis_ids.append(time_axis_id)
    #
    # Define the CMOR variable.
    #
    cmor.set_table(tables[1])
    in_missing = float(1.e20)
    print 'defining cmor variable'
    variable_id = cmor.variable(table_entry=opts['vcmip'], units=opts['in_units'],
                                axis_ids=axis_ids, type='f', missing_value=in_missing)
    #
    # Write the data
    #
    data_vals=np.array(np.random.rand(len(tvals)),dtype=np.float32)
    try:
        print 'writing...'
        cmor.write(variable_id, data_vals[:], ntimes_passed=np.shape(data_vals)[0]) #assuming time is the first dimension
    except Exception, e:
        raise Exception("ERROR writing data!")
    #
    # Close the CMOR file.
    #
    try:
        path = cmor.close(variable_id, file_name=True)
    except:
        raise Exception("ERROR closing cmor file!")
    return path

if __name__ == "__main__":
    # from pympler import tracker
    import resource
    # from guppy import hpy
    # Example dictionary containing metadata used by the post-processor
    opts={'initialization_method': 1, 'calculation': '', 'vin': ['temp_global_ave'], 'branch_time': 109207.0, 'vcmip': 'thetaoga', 'positive': '', 'tend': 1852, 'tstart': 1850, 'realization': 1, 'forcing': 'GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a)', 'infile': '/g/data1/p66/ACCESSDIR/har599/ACCESS/output/hg2-r11Mhd/history///ocn/ocean_scalar.nc-*', 'model_id': 'ACCESS-test', 'parent_experiment_id': 'piControl', 'cmip_table': 'CMIP5_Omon', 'in_units': 'K', 'version_number': 'v20130710', 'notes': 'branch date is 300-01-01', 'physics_version': 1, 'axes_modifier': 'dropX', 'experiment_id': 'historical', 'parent_experiment_rip': 'r1i1p1'}
    opts['source']=('ACCESS-test 2011. '
        'Atmosphere: AGCM v1.0 (N96 grid-point, 1.875 degrees EW x approx 1.25 degree NS, 38 levels); '
        'ocean: NOAA/GFDL MOM4p1 (nominal 1.0 degree EW x 1.0 degrees NS, tripolar north of 65N, '
        'equatorial refinement to 1/3 degree from 10S to 10 N, cosine dependent NS south of 25S, 50 levels); '
        'sea ice: CICE4.1 (nominal 1.0 degree EW x 1.0 degrees NS, tripolar north of 65N, '
        'equatorial refinement to 1/3 degree from 10S to 10 N, cosine dependent NS south of 25S); '
        'land: MOSES2 (1.875 degree EW x approx. 1.25 degree NS, 4 levels')
    opts['logfile']=None
    opts['institution']='CSIRO-BOM'
    opts['institution_id']='CSIRO-BOM'
    opts['calendar']='proleptic_gregorian'
    opts['contact']='dummy'
    opts['history']='dummy'
    opts['references']='dummy'
    opts['comment']='dummy'
    opts['outpath']='/short/p66/pfu599'
    opts['table_path']='/g/data1/p66/pfu599/post_processor/branches/APP1-0/cmip5-cmor-tables/Tables'
    # Memory profiler setup
    # tr = tracker.SummaryTracker()
    # tr.print_diff()
    # hp=hpy()
    # new=hp.heap()
    # Loop over many times rewriting the same file.
    for i in range(10):
        print i
        print app(opts)
        # Memory profiling
        # tr.print_diff()
        # old=new
        # new=hp.heap()
        # diff=new-old
        # print diff
        # print diff.byrcs[0].byid
        print 'Memory usage: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
from cmor.
That's it. Sorry about the length of the script.
from cmor.
perfect! Thx!
from cmor.
You need to change the lines setting opts['outpath'] and opts['table_path'] for your machine.
Note I am running cmor 2.9.1 with python 2.7.6 and numpy 1.8.0.
I have also just run the script on an old machine I still have access to, which has cmor 2.8.3 installed along with python 2.6 and numpy 1.3.0, and there the problem with increasing memory usage doesn't occur.
from cmor.
I use the same system for which Peter reported the memory leak.
More or less by accident I found that it's due to building with a particular copy of the uuid library that was on the machine. Using a new version built from source fixes the leak.
Now testing whether this fixes the intermittent crashes from the full processing.
from cmor.
@MartinDix this is great news! Please let us know. I will tweak configure to make sure we use the correct uuid version.
from cmor.
This wasn't the real problem.
A lucky observation showed that the crashes occurred when writing a 4D file after a 3D file. This allowed creating an example small enough to run in TotalView, which then pointed to the line
free(cmor_axes[cmor_naxes].wrapping);
at the end of cmor_axis in cmor_axes.c, https://github.com/PCMDI/cmor/blob/master/Src/cmor_axes.c#L1343
The wrapping pointer is only allocated for longitude axes. cmor_axes is an external variable, so it keeps the value of the freed pointer between calls. If the first file has dimensions (T,Y,X), cmor_axes[2].wrapping gets allocated and freed. If the next file then has dimensions (T,Z,Y,X), the cmor_axis call for Y still sees a non-null value for cmor_axes[2].wrapping, which it tries to free again.
Sometimes this gives the double free error that Peter originally reported. Other times it gives other more or less obscure crashes.
I've created an example script https://gist.github.com/MartinDix/6b2624d620da79c4e9f9
Adding a print
printf("Wrapping %d %p\n", cmor_naxes, cmor_axes[cmor_naxes].wrapping);
before the free then gives
% python cmor_testscript.py
Wrapping 0 (nil)
Wrapping 1 (nil)
Wrapping 2 0x3270b50
writing...
/short/p66/mrd599/CMIP5/output/CMOR-test/CMOR-test/historical/mon/atmos/ts/r1i1p1//ts_Amon_CMOR-test_historical_r1i1p1_185001-185012.nc
Wrapping 0 (nil)
Wrapping 1 (nil)
Wrapping 2 0x3270b50
Wrapping 3 0x32b1a70
writing...
*** glibc detected *** python: corrupted double-linked list: 0x0000000003270b40 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75e76)[0x2b2719f81e76]
/lib64/libc.so.6(+0x79caa)[0x2b2719f85caa]
/lib64/libc.so.6(__libc_malloc+0x71)[0x2b2719f866b1]
/apps/netcdf/4.3.2/lib/libnetcdf.so.7(+0x6a7ce)[0x2b274af017ce]
In this case it's crashed at some point after the actual free, but just where it crashes seems to depend on array sizes, netcdf library versions etc.
I think the fix is to add
cmor_axes[cmor_naxes].wrapping = NULL;
after the free. This seems to have fixed things here.
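For anyone following along, here is a minimal sketch of the free-then-reset pattern. It is not the CMOR source: axis_t, axes, define_axis and slot_is_clear are hypothetical names that only mimic the situation described above, where a file-scope table keeps its contents between calls and the scratch pointer is allocated only for longitude axes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_AXES 8

/* Hypothetical stand-in for an axis record; only the relevant field. */
typedef struct {
    char *wrapping;   /* scratch buffer, allocated only for longitude axes */
} axis_t;

/* File-scope table: its contents persist between calls. */
static axis_t axes[MAX_AXES];

/* Define axis number `slot`. Returns 0 on success, -1 if malloc fails. */
int define_axis(int slot, int is_longitude)
{
    assert(slot >= 0 && slot < MAX_AXES);

    if (is_longitude) {
        axes[slot].wrapping = malloc(16);
        if (axes[slot].wrapping == NULL)
            return -1;
        memset(axes[slot].wrapping, 0, 16);
    }

    /* ... work that may use the scratch buffer would go here ... */

    free(axes[slot].wrapping);    /* free(NULL) is a harmless no-op */
    axes[slot].wrapping = NULL;   /* without this the slot keeps a dangling pointer */
    return 0;
}

/* Returns 1 if the slot holds no pointer, so a later free() is safe. */
int slot_is_clear(int slot)
{
    assert(slot >= 0 && slot < MAX_AXES);
    return axes[slot].wrapping == NULL;
}
```

Calling define_axis(2, 1) and then define_axis(2, 0) reuses slot 2 first as a longitude axis and then as a non-longitude axis. With the reset in place, the second call only ever passes NULL to free; without it, the second call would free the same block twice.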
from cmor.
wow! Nice catch! Will fix and add your script to the test suite! Thanks!
from cmor.