
cice5's Introduction

Overview

This repository contains the trunk from the subversion (svn) repository of the Los Alamos Sea Ice Model, CICE, including release tags through version 5.1.2.

More recent versions are found in the CICE and Icepack repositories, which are maintained by the CICE Consortium.

If you expect to make any changes to the code, we recommend that you work in the CICE and Icepack repositories. Changes made to code in this repository will not be accepted, other than critical bug fixes.


cice5's People

Contributors

aekiss, aidanheerdegen, eclare108213, marshallward, nichannah, penguian, rmholmes, russfiedler


cice5's Issues

diagnostic bug in CICE when outputting 4d fields

Background:
For each grid point and ice category, cice computes its thermodynamics over nkice layers (this is 4 in ACCESS-OM2).

Under the BL99 thermodynamics option, the conductive salinity profile (aka 'ice internal salinity') is fixed and prescribed by a function, so we know what values to expect. For a given point and thickness category, the expected values for the four layers are [0.64920187, 2.354581, 3.0310922, 3.1892977].

The Issue:
However, if Sinz is output (f_sinz is the namelist field flag), the values do not appear as I expect.

Here's an example taken from the 1° model, given an Xarray DataArray:

In [4]: type(sinz)
Out[4]: xarray.core.dataarray.DataArray

In [5]: sinz.shape
Out[5]: (1, 5, 4, 300, 360)

Selecting a point for a given time (the time dimension has length 1 in this case), lat and lon (a point that has ice, i.e. aice > 0), and the first thickness category:

In [6]: sinz.isel(time=0,ni=30,nj=40,nc=0).values
Out[6]: array([0.64920187, 0.64920187, 0.64920187, 0.64920187], dtype=float32)

We do not see the four distinct layer values we expect.

Consider the same time,lat,lon for all layers and thickness categories:

In [8]: sinz.isel(time=0,ni=30,nj=40).shape
Out[8]: (5, 4)
In [9]: sinz.isel(time=0,ni=30,nj=40).values
Out[9]: 
array([[0.64920187, 0.64920187, 0.64920187, 0.64920187],
       [0.64920187, 2.354581  , 2.354581  , 2.354581  ],
       [2.354581  , 2.354581  , 3.0310922 , 3.0310922 ],
       [3.0310922 , 3.0310922 , 3.0310922 , 3.1892977 ],
       [3.1892977 , 3.1892977 , 3.1892977 , 3.1892977 ]], dtype=float32)

The values appear to be ordered along the wrong dimensions. What I think is the correct answer can be recovered (for this time, lat, lon) by:

In [10]: temp = sinz.isel(time=0,ni=30,nj=40).values

In [11]: temp.reshape((4,5)).transpose()
Out[11]: 
array([[0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ]], dtype=float32)

These code snippets were produced in IPython using xarray, but the same result is obtained with a variety of tools, including inspecting the netCDF file with ncview.

If there is an issue here, this may affect other 4d fields like Tinz, but these may be harder to diagnose from their values, as they aren't prescribed and fixed like Sinz.

An example output file with the Sinz variable can be found on gadi at:

/home/548/sxa548/access-om2-sample_output/iceh.2018-08-15.nc
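
The same reinterpretation can be applied to the whole field rather than one point at a time. The sketch below assumes the bug is purely a swap of the category and layer axes when the field is written (as the single-point example above suggests), that the dimension order is (time, nc, layer, nj, ni), and that the variable in the sample file is named Sinz; none of that is guaranteed.

import xarray as xr

ds = xr.open_dataset('iceh.2018-08-15.nc')    # e.g. the sample file above
sinz = ds['Sinz']

vals = sinz.values                            # shape (1, 5, 4, 300, 360)
t, nc, nk, nj, ni = vals.shape
# reinterpret the flat (nc*layer) block at each grid point as (layer, nc),
# then swap the axes back so the result is dimensioned (time, nc, layer, nj, ni)
fixed = vals.reshape(t, nk, nc, nj, ni).swapaxes(1, 2)
sinz_fixed = sinz.copy(data=fixed)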

cice output has the wrong date

Since the switch to libaccessom2, restarted CICE jobs have an incorrect date. Clearly the libaccessom2 model time synchronisation checks are also not working correctly.

halo update bugs in ACCESS driver?

Hi @russfiedler, following on from #68, these also look wrong to me (they should be field_type_scalar, otherwise their signs will be flipped across the tripole seam in halo updates):

call ice_HaloUpdate(um_tmlt, halo_info, field_loc_center,field_type_vector)
call ice_HaloUpdate(um_bmlt, halo_info, field_loc_center,field_type_vector)

and

call ice_HaloUpdate(um_swflx, halo_info,field_loc_center,field_type_vector)
call ice_HaloUpdate(um_lwflx, halo_info,field_loc_center,field_type_vector)
call ice_HaloUpdate(um_shflx, halo_info,field_loc_center,field_type_vector)
call ice_HaloUpdate(um_press, halo_info,field_loc_center,field_type_vector)
call ice_HaloUpdate(um_co2, halo_info, field_loc_center, field_type_vector)
call ice_HaloUpdate(um_wnd, halo_info, field_loc_center, field_type_vector)

Initialise offset in pack_coupling_array

At line 462 of https://github.com/COSIMA/cice5/blob/master/drivers/auscom/cpl_interface.F90
in subroutine pack_coupling_array, the variable offset should be initialised to 0.
The current lack of initialisation causes errors such as

Segmentation fault: address not mapped to object

History: in commit c132689, which resolves issue #21, the variable offset should have been initialised to 0 at line 465 of subroutine pack_coupling_array. Compare with line 436 in subroutine unpack_coupling_array in the same commit.

Add compression level namelist parameter

Netcdf compression level affects cice runtime, e.g. in 3mo runs with daily output (/g/data3/hh5/tmp/cosima/access-om2-01/01deg_jra55v13_ryf8485_spinup7_newexe):

level                  Timer 12: ReadWrite   file size
0 (no compression*)    1510 s                1600 MB
1                      1833 s                 252 MB
5                      2501 s                 236 MB

It would be nice to be able to control this speed/space trade-off via a namelist parameter setting the compression level.
Only a few code changes are required - see 9e69c99.

*This test still used nf90_def_var_deflate, but at level 0 - perhaps it would be faster to skip nf90_def_var_deflate entirely when the level is 0. Before switching to netCDF4 (d2ef6b1) IO took about 1300 s. It's unclear whether the extra 200 s in the table is the cost of netCDF4 or of the level-0 deflate step.

Slack discussion: https://arccss.slack.com/archives/C9Q7Y1400/p1557809112231100
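
As a quick sanity check of what level a given history file was actually written with, the deflate settings can be queried per variable. This is just a sketch using the netCDF4-python filters() call; the file name is a placeholder.

import netCDF4

ds = netCDF4.Dataset('iceh.2018-08-15.nc')
for name, var in ds.variables.items():
    f = var.filters()    # e.g. {'zlib': True, 'complevel': 1, ...}, or None for netCDF3 files
    if f and f.get('zlib'):
        print(name, 'complevel =', f['complevel'])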

Support non-BGC configs

I should have thought a bit harder before merging in @hakaseh's support for coupled BGC (10b3527).

CICE now expects to find surface nitrate and algae in the coupling fields, so it won't work with our usual physics-only configurations.

I guess this could be fixed with lots of #ifdefs, but that would give us two executables for each resolution. Is there a more elegant way to do it?

diag bugs with AusCOM driver (affecting ACCESS-OM2)

Thanks to Stewart Allen for flagging this issue (see Slack discussion https://arccss.slack.com/archives/C6PP0GU9Y/p1627269245007400).

These diagnostics are identical in the ACCESS-OM2 output, but shouldn't be:

  • fresh and fresh_ai
  • fsalt and fsalt_ai
  • fhocn and fhocn_ai
  • fswthru and fswthru_ai

This issue probably also affects these diagnostics and their _ai counterparts: alvdr, alidr, alvdf, alidf, fNO, fNH, fN, fSil, but I haven't checked.

I did a test run in /home/156/aek156/payu/1deg_jra55_iaf_cice_diag_test, which gives these test results for equality of some diagnostics and their _ai counterparts:

import xarray as xr

ds = xr.open_dataset('/scratch/v45/aek156/access-om2/archive/1deg_jra55_iaf_cice_diag_test/output132/ice/OUTPUT/iceh.1968-02.nc')
allvars = list(ds.variables.keys())
# monthly-mean variables (ending in '_m') that also have an '_ai' counterpart
vs = [v for v in allvars if v[:-2]+'_ai_m' in allvars]

for v in vs:
    # True means the diagnostic is identical to its _ai counterpart
    print(v, ds[v].equals(ds[v[:-2]+'_ai_m']))

prints

snow_m False
rain_m False
fswabs_m False
flat_m False
fsens_m False
flwup_m False
evap_m False
fresh_m True
fsalt_m True
fhocn_m True
fswthru_m True

Backport weight-per-block set by file from CICE6

Presently the amount of work done in each block is estimated as a linear function of latitude. This is obviously not a very close approximation to the amount of ice work at a given geographic location.

CICE6 has an option to set the work weight for each grid point using a file containing a 2d field. We think this will allow a much more accurate specification of the amount of work and hence better load balancing.

This issue will back-port the CICE6 functionality to our version of CICE.
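
For illustration only, one plausible way to build such a 2d weight field is from time-mean ice concentration in an existing history file. The variable name aice_m, the file names, and the choice of concentration as a proxy for work are all assumptions, and the file format CICE6 actually expects should be checked against its documentation.

import xarray as xr

ds = xr.open_dataset('iceh.1968-02.nc')             # assumed input history file
weight = ds['aice_m'].mean('time').fillna(0.0)      # crude proxy for ice work per cell
weight.rename('work_weight').to_netcdf('cice_block_weights.nc')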

abort_ice does not pass an error code to MPI_abort

This is the same issue Martin Dix discovered:

https://accessdev.nci.org.au/trac/ticket/318

The call signature for MPI_Abort is
https://www.open-mpi.org/doc/v1.10/man3/MPI_Abort.3.php

MPI_ABORT(COMM, ERRORCODE, IERROR)
    INTEGER COMM, ERRORCODE, IERROR

comm        Communicator of tasks to abort.
errorcode   Error code to return to invoking environment.

It is called here:
https://github.com/OceansAus/cice5/blob/master/mpi/ice_exit.F90#L61
but the value of ierr passed as the error code is never set in the routine, so it is whatever the compiler happened to initialise it to.

They had a similar issue with oasis_abort:

https://portal.enes.org/oasis/faq-forum/oasis3-forum/real-coupled-models/548853210

Apparently they decided oasis_abort should default to a non-zero value; it is an abort call, after all.

Put atm coupling field halo updates after ocean send

Presently the halo updates in the CICE ACCESS coupling code slow the ocean down: the ocean is waiting to receive from the ice while the ice does halo updates.

This issue moves the halo updates to after the ocean communication.

restart and input directories are the same

The information model for MOM (and I think other models) is for inputs to be read from one directory, outputs to be saved to a different directory, and restarts to be saved into another, unique directory. In this way the contents of the restart directory can be copied/linked into the input directory for the next run.

Currently sicemass, u_star and the coupling fields are read from restart_dir

https://github.com/OceansAus/cice5/blob/fe7300227107bde802a217ff0d6ef7f92a6eb6c2/drivers/auscom/CICE_RunMod.F90#L106
https://github.com/OceansAus/cice5/blob/05597824ac633a1c6ce444ac78b651f3844092e1/drivers/auscom/CICE_InitMod.F90#L170

and written to restart_dir

https://github.com/OceansAus/cice5/blob/fe7300227107bde802a217ff0d6ef7f92a6eb6c2/drivers/auscom/CICE_RunMod.F90#L228

This can cause issues: if these files are symbolic links, then writing to them will overwrite the previous version of the restart file.

I would like to have separate INPUT and RESTART directories.

Thoughts, @nicjhan @aekiss @marshallward?

Can't output Tinz

As reported here, ACCESS-OM2 1deg_jra55_ryf aborts when f_tinz is set to anything other than 'x'. It aborts the first time the data would be written.

Abort with message Unknown Error: Unrecognized error code in file /g/data/v45/aek156/CHUCKABLE/access-om2/src/cice5/ParallelIO/src/clib/pio_darray_int.c at line 687

This is the offending line: https://github.com/NCAR/ParallelIO/blob/7e242f78bd1b4766518aff44fda17ff50eed6188/src/clib/pio_darray_int.c#L687

Possibly related: #62 (comment)

It has been possible to output Tinz in other runs, e.g. 0.1° IAF.

Valgrind error in CICE

I have been running valgrind on ACCESS-OM2 to track down a segfault. Currently I am getting the following error message before a crash. I don't know whether this is what triggers the crash, so it may be a general problem with CICE.

==31611== Invalid write of size 8
==31611== at 0x50C326: ice_gather_scatter_mp_scatter_global_dbl_ (ice_gather_scatter.f90:959)
==31611== by 0x5AFA4F: ice_read_write_mp_ice_read_nc_xy_ (ice_read_write.f90:1163)
==31611== by 0x41E93A: cpl_forcing_handler_mp_get_u_star_ (cpl_forcing_handler.f90:251)
==31611== by 0x40F5EC: cice_init (CICE_InitMod.f90:199)
==31611== by 0x40F5EC: cice_initmod_mp_cice_initialize_ (CICE_InitMod.f90:63)
==31611== by 0x40C841: MAIN__ (CICE.f90:56)
==31611== by 0x40C7DD: main (in /short/x77/nah599/access-om2/bin/cice_auscom_1440x1080_480p_maxblocks_4.exe)
==31611== Address 0x2c6fdc80 is 8 bytes after a block of size 22,312 alloc'd
==31611== at 0x4C2A8FA: malloc (vg_replace_malloc.c:298)
==31611== by 0x9279AB: _mm_malloc (in /short/x77/nah599/access-om2/bin/cice_auscom_1440x1080_480p_maxblocks_4.exe)
==31611== by 0x8A4E07: for_alloc_allocatable (in /short/x77/nah599/access-om2/bin/cice_auscom_1440x1080_480p_maxblocks_4.exe)

The command line I used to run this was:

mpirun --mca orte_base_help_aggregate 0 -wdir /short/x77/nah599/access-om2/work/025deg_jra55_ryf/atmosphere -np 1 /short/public/access-om2/bin/yatm_c2868e5b.exe : -wdir /short/x77/nah599/access-om2/work/025deg_jra55_ryf/ocean -np 1455 /short/x77/nah599/access-om2/bin/fms_ACCESS-OM_quantify_load_imbalance.x : -wdir /short/x77/nah599/access-om2/work/025deg_jra55_ryf/ice -np 393 -x LD_PRELOAD=/home/599/nah599/more_home/usr/local/lib/valgrind/libmpiwrap-amd64-linux.so /home/599/nah599/more_home/usr/local/bin/valgrind --main-stacksize=200000000 --max-stackframe=200000000 --error-limit=no --freelist-vol=10000000 --suppressions=/short/v45/nah599/more_home/mom-run-scheduler/valgrind_suppressions.txt /short/x77/nah599/access-om2/bin/cice_auscom_1440x1080_480p_maxblocks_4.exe

Diagnostic output and restarts not compressed

Slack conversation:

Hi All. Just a quick query on what you all think we should do about CICE output. Right now I have a few problems with the CICE output — I dislike the single file per month, I don’t know why we need to have all CICE output stored in ice/OUTPUT/ rather than just ice/ and, finally, it is uncompressed! A quick test with 025deg output indicates that, for monthly output, CICE output is costing us 4 times the MOM output. By individually compressing each file, we could reduce the ice storage by a factor >5 and the total storage by a factor of 2.5. Obviously this is a no-brainer, and we should do it.
At the same time we could also consider trimming down the number of files. I would like this from a user point of view, but maybe I am just old-fashioned. It would require us to have a postprocessing script to collate monthly files in (say) annual files. My quick tests today indicated it would only save a few %, and we would have to re-build the cookbook database once we change the file structure. Any thoughts on whether we should do this?
Finally, should we automate compression and/or collation of CICE output within payu for future runs?

aidan [3:39 PM]
Should look first to see how simple it might be to accomplish the compression part in CICE itself

andy [4:40 PM]
OK, yes, let’s see what CICE can do. In the meantime, Aidan, do you have time to attempt a postprocessing script for us to trawl through existing cice files and do a straight compression? I can then test on some of our less important datasets before we set it going for real.

aidan [4:40 PM]
I’ll add it to the list (and prioritise!)
Can you say EXACTLY what you want done, preferably with an example directory and description of before and after

andy [4:55 PM]
No worries. In my testing, I copied a CICE OUTPUT directory to /home/157/amh157/v45/amh157/temp. There I made a parallel directory OUTPUT_PROCESSED, and tested a few of the files with the following command:
nccopy -d 5 -7 OUTPUT/iceh.2256-01.nc OUTPUT_PROCESSED/iceh.2256-01.nc
Basically, I guess the best strategy is to nccopy every file like that and overwrite the old one??
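
A minimal sketch of that postprocessing step, assuming nccopy is on PATH and that writing the compressed copies into a parallel OUTPUT_PROCESSED directory (as in the test above) is preferable to overwriting in place:

import subprocess
from pathlib import Path

src = Path('OUTPUT')
dst = Path('OUTPUT_PROCESSED')
dst.mkdir(exist_ok=True)

for f in sorted(src.glob('iceh.*.nc')):
    # -d 5: deflate level 5; -7: netCDF-4 classic-model output
    subprocess.run(['nccopy', '-d', '5', '-7', str(f), str(dst / f.name)], check=True)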

Investigate using parallel IO

It may be worth trying to compile with parallel IO using PIO (setenv IO_TYPE pio).

We currently compile CICE with serial IO (setenv IO_TYPE netcdf in bld/build.sh), so one CPU does all the IO and we end up with an Amdahl's law situation that limits the scalability with large core counts.

At 0.1° CICE is IO-bound when doing daily outputs (see Timer 12 in ice_diag.d), and the time spent in CICE IO accounts for almost all the time MOM waits for CICE (oasis_recv in access-om2.out), so the whole coupled model is waiting on one CPU. With daily CICE output at 0.1° this is ~19% of the model runtime (it's only ~2% without daily CICE output). Lowering the compression level to 1 (#33) has helped (the MOM wait was 23% with level 5), and omitting static field output (#32) would also help.

Also I understand that PIO doesn't support compression - is that correct?

@russfiedler had these comments on Slack:

  • I have a feeling that the CICE parallel IO hadn't really been tested, or there was some problem with it.
  • We would have to update the netCDF versions being used in CICE for a start.
  • The distributors of PIO note that they need netCDF 4.6.1 and HDF5 1.10.4 or later for their latest version. There's a bug in parallel collective IO in earlier HDF5 versions. The NCI version of netCDF 4.6.1 is built with HDF5 1.10.2! Marshall noted above that Rui found a performance drop-off when moving from 1.10.2 to 1.10.4.
  • The gather is done on all the small tiles, so each PE sends a single horizontal slab several times to the root PE for each level.
  • The number of MPI calls is probably the main issue. It looks like there's an individual send/recv for each tile rather than either a bulk send of the tiles or something more funky using MPI_Gather(v) and MPI_Type_create_subarray.

Slack discussion: https://arccss.slack.com/archives/C9Q7Y1400/p1557272377089800

Code cleanup

The CICE code and particularly the AUSCOM driver has accumulated some mess - the usual bad formatting, unused variables, pointless code changes, etc.

It would be nice to clean this up mainly for readability but also to make it easier to track and apply changes from other CICE repositories such as:

https://github.com/CICE-Consortium

and

https://github.com/NCAR/CICE

The code cleanup should not change results.

Add option to not output static data in history files

The grid-related fields below are static but are included in every output .nc file, wasting a lot of runtime and storage. It would be good to provide a namelist flag which would write this grid data to a separate .nc file once per run, and omit it from all other .nc files. It looks like this would require code changes, e.g. at

! define information for required time-invariant variables

float TLON(nj, ni) ;
float TLAT(nj, ni) ;
float ULON(nj, ni) ;
float ULAT(nj, ni) ;
float NCAT(nc) ;
float tmask(nj, ni) ;
float blkmask(nj, ni) ;
float tarea(nj, ni) ;
float uarea(nj, ni) ;
float dxt(nj, ni) ;
float dyt(nj, ni) ;
float dxu(nj, ni) ;
float dyu(nj, ni) ;
float HTN(nj, ni) ;
float HTE(nj, ni) ;
float ANGLE(nj, ni) ;
float ANGLET(nj, ni) ;
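
Until such a namelist flag exists, a rough post-processing sketch (not the proposed code change) can strip the static fields listed above from an existing history file with xarray; the variable list is taken from the listing and the file names are placeholders.

import xarray as xr

static = ['TLON', 'TLAT', 'ULON', 'ULAT', 'NCAT', 'tmask', 'blkmask',
          'tarea', 'uarea', 'dxt', 'dyt', 'dxu', 'dyu', 'HTN', 'HTE',
          'ANGLE', 'ANGLET']

ds = xr.open_dataset('iceh.2018-08-15.nc')
# drop whichever of the static fields are present and write a trimmed copy
ds.drop_vars(static, errors='ignore').to_netcdf('iceh.2018-08-15.trimmed.nc')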

WOMBAT requires 10m winds to be passed in OM mode

This is also needed if the new Langmuir mixing parameterisation is to be used in the OM version. In fully coupled mode the winds are passed, but not when running in OM mode.

I'd rather not hard-code this, as it would break existing configs. Using CPP preprocessing (it's in the MOM code as ACCESS_WND) is nasty, but I'll probably have to do it for the moment. I'd rather this were done on the fly via a flag that gets read in somewhere, using the OASIS error codes to test whether the field should be passed or not.

NetCDF large file option is ignored for history files

CICE has a namelist option to turn on netCDF large file support. This flag is only applied to restarts, not to diagnostic output.

We have been getting this error when writing output:

ice: Error in nf90_enddef: NetCDF: One or more variable sizes violate format constraints

This error usually happens when the file size is too big for the netCDF file format in use. We hope that enabling large file support for the diagnostic output will make this go away.
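
A quick way to check which format the existing history files were actually written in (a sketch; the file name is a placeholder):

import netCDF4

ds = netCDF4.Dataset('iceh.2018-08-15.nc')
# e.g. 'NETCDF3_CLASSIC', 'NETCDF3_64BIT_OFFSET' (large file support) or 'NETCDF4'
print(ds.data_model)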

All non-master error output going to a single file and being overwritten

We seem to be losing error messages in CICE. At the moment we're guessing that it's because we have all (non-master) error output going to a single file. The error messages coming from one PE are being overwritten by info/debug output coming from others.

This issue should fix the code so that error messages are sent to stderr.

An alternative is to have an individual output file for each PE; however, this has downsides, such as producing too many files and making it difficult to find the one with the relevant error message.

Crash in thickness_changes (ice_therm_vertical.f90)

It turns out the crash I thought was in MATM (COSIMA/matm#4) is actually in CICE.

This is not a CICE issue, but I thought it important to document in case someone else has the same problem.

To recap, this is the ACCESS-OM-1deg JRA55 RYF config, but with a new (KDS50) vertical level scheme. I have interpolated the initial conditions but as far as I know nothing else depends on the ocean vertical grid.

The crash is a divide by zero; the initial traceback has no information:

Image              PC                Routine            Line        Source             
cice_auscom_360x3  000000000092C391  Unknown               Unknown  Unknown
cice_auscom_360x3  000000000092A4CB  Unknown               Unknown  Unknown
cice_auscom_360x3  00000000008D9274  Unknown               Unknown  Unknown
cice_auscom_360x3  00000000008D9086  Unknown               Unknown  Unknown
cice_auscom_360x3  0000000000857A49  Unknown               Unknown  Unknown
cice_auscom_360x3  00000000008623F9  Unknown               Unknown  Unknown
libpthread-2.12.s  00002B9E5C3CE7E0  Unknown               Unknown  Unknown
cice_auscom_360x3  0000000000624132  Unknown               Unknown  Unknown
cice_auscom_360x3  0000000000621D5C  Unknown               Unknown  Unknown
cice_auscom_360x3  00000000005F92C6  Unknown               Unknown  Unknown
cice_auscom_360x3  000000000040E75C  Unknown               Unknown  Unknown
cice_auscom_360x3  000000000040C47D  Unknown               Unknown  Unknown
cice_auscom_360x3  000000000040C41E  Unknown               Unknown  Unknown
libc-2.12.so       00002B9E5C5FAD1D  __libc_start_main     Unknown  Unknown
cice_auscom_360x3  000000000040C329  Unknown               Unknown  Unknown

even though I recompiled CICE with -g. If I load the core dump with gdb, I get this info:

#4  ice_therm_vertical::thickness_changes (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:1556
#5  0x0000000000621d5c in ice_therm_vertical::thermo_vertical (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:421
#6  0x00000000005f92c6 in ice_step_mod::step_therm1 (dt=Cannot access memory at address 0x1
) at ice_step_mod.f90:481
#7  0x000000000040e75c in ice_step () at CICE_RunMod.f90:323
#8  cice_runmod::cice_run () at CICE_RunMod.f90:180
#9  0x000000000040c47d in icemodel () at CICE.f90:57
#10 0x000000000040c41e in main ()
#11 0x00002b9e5c5fad1d in __libc_start_main () from /lib64/libc.so.6
#12 0x000000000040c329 in _start ()
(gdb) where
#0  0x00002b9e5c60e495 in raise () from /lib64/libc.so.6
#1  0x00002b9e5c60fc75 in abort () from /lib64/libc.so.6
#2  0x0000000000861d4c in for__signal_handler ()
#3  <signal handler called>
#4  ice_therm_vertical::thickness_changes (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:1556
#5  0x0000000000621d5c in ice_therm_vertical::thermo_vertical (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:421
#6  0x00000000005f92c6 in ice_step_mod::step_therm1 (dt=Cannot access memory at address 0x1
) at ice_step_mod.f90:481
#7  0x000000000040e75c in ice_step () at CICE_RunMod.f90:323
#8  cice_runmod::cice_run () at CICE_RunMod.f90:180
#9  0x000000000040c47d in icemodel () at CICE.f90:57
#10 0x000000000040c41e in main ()
#11 0x00002b9e5c5fad1d in __libc_start_main () from /lib64/libc.so.6
#12 0x000000000040c329 in _start ()
(gdb) bt full                                                                                                                                  
#0  0x00002b9e5c60e495 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002b9e5c60fc75 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000000000861d4c in for__signal_handler ()
No symbol table info available.
#3  <signal handler called>
No symbol table info available.
#4  ice_therm_vertical::thickness_changes (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:1556
        phi_i_mushy = 0.84999999999999998
        qbot0 = 0
        qbotp = 0
        qbotm = 0
        hstot = 0
        wk1 = 0
        qbot = 0
        ts = 0
        ti = 0
        tmlts = 0
        ij = 30936576
        j = 21206080
        i = 33728
        dzi = (( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...) ...)
#5  0x0000000000621d5c in ice_therm_vertical::thermo_vertical (nx_block=Cannot access memory at address 0x1
) at ice_therm_vertical.f90:421
        my_task = 7
        dhi = 0
        ij = 30936576
        fadvocn = (( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...) ...)
        iage = (( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...) ...)

The line number is probably not reliable (the -O2 flag was still on), so I think the crash is here:

https://github.com/OceansAus/cice5/blob/49b36d4bfb97328e818d428d0b8438144dbd69a1/source/ice_therm_vertical.F90#L1554

as qbotp = 0.

I'm guessing there is some issue with the ice initial conditions, but I don't know why changing the ocean vertical grid would affect the ice. Any ideas?

sectrobin block distribution scheme for CICE

We have been using roundrobin to distribute CICE blocks amongst ranks; however, 'sectrobin' looks more suitable because it may decrease communication overhead, allowing us to put more blocks per rank and hence get better load balancing.

It is also just nice to have in terms of flexibility.

Update CICE halos after all coupling calls and before time step

Presently, ice field halos are updated immediately after receiving from the atmosphere/ocean. This is bad because it means halo updates occur between the coupling calls that send to and receive from the ocean, so the ocean ends up waiting on ice halo updates, which can be slow.

This issue moves all halo updates to a single place directly after the coupling but before the time step.

chio namelist value ignored

The value of chio is only used once, to calculate cpchr here:

cpchr = -cp_ocn*rhow*chio

but then cpchr is redefined here before it is used for anything:
if (trim(fbot_xfer_type) == 'Cdn_ocn') then
   ! Note: Cdn_ocn has already been used for calculating ustar
   ! (formdrag only) --- David Schroeder (CPOM)
   cpchr = -cp_ocn*rhow*Cdn_ocn(i,j)
else ! fbot_xfer_type == 'constant'
   ! 0.006 = unitless param for basal heat flx ala McPhee and Maykut
   cpchr = -cp_ocn*rhow*0.006_dbl_kind
endif

fbot_xfer_type = 'constant' is the default, which effectively hard-codes chio=0.006, ignoring whatever was set in the namelist file.
This bug seems to have been introduced in the upgrade to CICE 5.1.2 in 2015.
Thanks to Paul Sandery for pointing out the insensitivity to this parameter.

Broken timer

@russfiedler said:

Looks like one of the timers in CICE got broken in the layout update. It's the one that measures the time CICE is waiting for the ocean.
Timer 18: waiting_o 10817.66 seconds
Timer stats (node): min = 10817.63 seconds
                    max = 10817.66 seconds
                    mean = 10817.64 seconds
