gsimcli's Introduction

gsimcli:

Geostatistical SIMulation for the homogenisation and interpolation of CLImate data

What is it

gsimcli is a method to homogenise climate data using geostatistical stochastic simulation methods.

It is presented here as an open source Python project. Some of its modules are intended to serve as useful libraries for other projects.

Development

gsimcli is implemented using Direct Sequential Simulation (DSS) [1]. The method description and its application have already been published [2].

This research project is hosted at NOVA IMS (Lisbon, Portugal) and it is funded by the "Fundação para a Ciência e Tecnologia" (FCT), Portugal, through the research project PTDC/GEO-MET/4026/2012. See the approval and funding notice.

The outcomes of this project also include three peer-reviewed papers published in scientific journals. See the complete list of the Project Publications below.

Note by the programmer

This software is no longer being developed. Of course, development may continue in any fork.

The latest and last version is available in the master branch.

The Issues page lists the tasks and ideas that were not implemented and/or completed, as well as known limitations. Those may be a source of ideas for any eventual future development.

Documentation

The documentation (user manual) is hosted at readthedocs.org: http://gsimcli.readthedocs.org

Browse and post issues and contributions here: https://github.com/iled/gsimcli/issues

Dependencies

License

GPLv3

References

[1]: Soares, Amílcar. Direct Sequential Simulation and Cosimulation. Mathematical Geology 33, no. 8 (2001): 911-926. http://link.springer.com/article/10.1023/A:1012246006212.

[2]: Costa, AC, and A Soares. Homogenization of Climate Data: Review and New Perspectives Using Geostatistics. Mathematical Geosciences 41, no. 3 (November 28, 2009): 291-305. doi:10.1007/s11004-008-9203-3.

Project Publications

Scientific Journals

Ribeiro, S., Caineta, J., Costa, A. C., Henriques, R. (2016) gsimcli: a geostatistical procedure for the homogenisation of climatic time series. International Journal of Climatology. doi: 10.1002/joc.4929

Ribeiro, S., Caineta, J., Costa, A. C., Henriques, R., Soares, A. (2016) Detection of inhomogeneities in precipitation time series in Portugal using direct sequential simulation. Atmospheric Research 171, 147–158. doi: 10.1016/j.atmosres.2015.11.014

Ribeiro, S., Caineta, J., Costa, A. C. (2015) Review and discussion of homogenisation methods for climate data. Physics and Chemistry of the Earth 94, 167–179. doi: 10.1016/j.pce.2015.08.007

Proceedings

Ribeiro, S., Caineta, J., Costa, A. C., Soares, A. (2015). Establishment of detection and correction parameters for a geostatistical homogenisation approach. Procedia Environmental Sciences, 27, 83-88. doi: 10.1016/j.proenv.2015.07.115

Caineta, J., Ribeiro, S., Soares, A., Costa, A. C. (2015). Workflow for the homogenisation of climate data using geostatistical simulation. In: Conference Proceedings of the 15th SGEM GeoConference on Informatics, Geoinformatics and Remote Sensing. Albena, Bulgaria, 16-25 June 2015, Vol. 1, pp. 921-929.

Ribeiro, S., Caineta, J., Costa, A. C., Henriques, R. (2015). Analysing the detection and correction parameters in the homogenisation of climate data series using gsimcli. In: F. Bacao, M. Y. Santos, M. Painho (Eds.), The 18th AGILE International Conference on Geographic Information Science, Lisbon, Portugal, 9-12 June 2015.

Caineta, J., Ribeiro, S., Henriques, R., Costa, A. C. (2015). A Package for the homogenisation of climate data using geostatistical simulation. In: GEOProcessing 2015: The Seventh International Conference on Advanced Geographic Information Systems, Applications, and Services, Lisbon, Portugal, 22-27 February 2015.

Other Publications

Caineta, J., Ribeiro, S., Henriques, R., Soares, A., Costa, A. C. (2014). Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series. In: Geophysical Research Abstracts, Vol. 16, EGU2014-7605, European Geosciences Union General Assembly 2014. (Vienna, Austria, 27 April – 2 May 2014)

Caineta, J., Ribeiro, S., Costa, A. C., Henriques, R., Soares, A. (2014). Inhomogeneities detection in annual precipitation time series in Portugal using direct sequential simulation. In: Geophysical Research Abstracts, Vol. 16, EGU2014-7849, European Geosciences Union General Assembly 2014. (Vienna, Austria, 27 April – 2 May 2014)

Ribeiro, S., Caineta, J., Henriques, R., Soares, A., Costa, A. C. (2014). Advantages and applicability of commonly used homogenisation methods for climate data. In: Geophysical Research Abstracts, Vol. 16, EGU2014-7725, European Geosciences Union General Assembly 2014. (Vienna, Austria, 27 April – 2 May 2014)

gsimcli's People

Contributors

iled

gsimcli's Issues

To Do

This is a list of some tasks that were not completed or not even started.

They differ from the list of ideas in the sense that these should have a higher importance and/or impact in the user experience, according to the existing features.

Of course, even more important are the known bugs and limitations.

This list follows no specific order or categories.

TODO

  1. Add an option in cost.py for absolute or relative counting of days/months.
  2. Check/fix if convert_gslib is creating the file even when it is not going to write it.
  3. Try to automate the numbers formatting when writing files.
  4. [Refactor] Remove references to stations merging.
  5. Confirm if the DSS binary is floor'ing variograms' range. Act accordingly.
  6. Option to append year (maybe this?).
  7. Integrate keys in the overall process.
  8. Deal with stations outside of the grid. Check grid limits and put them aside?
  9. [Refactor] Review the need for flag (optional or mandatory?)
  10. Confirm the desired behaviour of this call to dsspar.update.
  11. Check the exclusions when using the option to skip outliers in the scores calculation:
    • station: exclude the year
    • network: exclude the station
  12. Validate rows (scores GUI), maybe with QValidator.
  13. Test/fix homogenisation with one single station.
  14. Maybe add an option in cost.py to append a column with the year when resolution != 'y' or when sum_year == True.
  15. Check/fix station_order when header == False.
  16. Fix the hack in flush_varnames related with the 'flag'.
  17. Make summary of the merge_output compatible to when not using batch.
  18. Check/fix fill_station (here and here) for non-annual data. Also, why do we have two functions? Some refactoring was lost along the way...
  19. [Refactor] Merge loadcheck with PointSet.load, or to the parent class.
  20. Omit results per decade in the results.xls of the batch decade.
  21. Add an option not to fill no_data, or to fill it with another value (e.g., average or user defined). Make it independent, by first inserting no_data values where needed.
  22. Handle variography definition when using batch networks without batch decade.
  23. save_settings after loading does not update the original file. Maybe ask when closing.
  24. Check filling stations in monthly data.
  25. Add buttons to change tab.
  26. Create a gsimcli_results.xls file with a single hard data.
  27. There are other TODOs spread over the code.
  28. Most likely there is a lot of code cleaning to do (including removing commented out blocks that are not useful anymore).

New features and ideas

This is a list of some new features and ideas that I had, which were never implemented or tried out.

They differ from the to-do list in the sense that the to-do items should have a higher importance and/or impact on the user experience, according to the existing features.

Of course, even more important are the known bugs and limitations.

They are grouped into two categories:

  • Method: related with the methodology itself.
  • Application: strictly related with the software.

Method

  1. Test using octants in DSS (Does simulation work? Is it only for two part search?).
  2. Add and test the option to crop the first n years. It is already implemented here, but not in the front-end. Venema (2012) excludes the first and last 5 years (see page 95).
  3. Implement the score power of detection (Venema, 2012).
  4. Test smaller search radius.
  5. Test candidate as soft data.
  6. Add and test mask/buffer. It should highly reduce the simulation time.
  7. Implement the score CMRSE Anomalies.
  8. Try DSS version with local means.
  9. Test PCA/clusters.
  10. Calculate and flag abrupt changes between corrected and observed values, e.g., column with (obs-corr)/corr.
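
Idea 10 above (flagging abrupt changes between corrected and observed values) could be sketched as follows. This is only an illustration: the function name `flag_abrupt_changes` and the `threshold` value are hypothetical and not part of gsimcli; only the ratio (obs-corr)/corr comes from the list above.

```python
import math


def flag_abrupt_changes(observed, corrected, threshold=0.5):
    """Return one (relative_change, flagged) pair per value.

    relative_change is (obs - corr) / corr, as suggested in the idea list.
    The threshold is an arbitrary illustrative default, not a gsimcli setting.
    """
    rows = []
    for obs, corr in zip(observed, corrected):
        # The ratio is undefined for a zero or missing corrected value.
        if corr == 0 or math.isnan(obs) or math.isnan(corr):
            rows.append((math.nan, False))
            continue
        change = (obs - corr) / corr
        rows.append((change, abs(change) > threshold))
    return rows


# Usage: the second value was halved by the correction, so it is flagged.
result = flag_abrupt_changes([100.0, 300.0], [100.0, 150.0])
print(result)  # [(0.0, False), (1.0, True)]
```

Returning the ratio together with the flag keeps the extra column available for inspection, while the flag itself stays a simple threshold test that can be tuned per variable.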

Application

  1. More outputs:
    • histograms
    • maps
    • other graphics
    • visualize variograms
  2. Class to encapsulate/manage all the files of the gsimcli procedure (input, output, intermediary, parameters).
  3. Convert between file types:
    • GSLIB to COST-HOME
    • COST-HOME to CSV/gsimcli
    • GSLIB to CSV
    • CSV to GSLIB
    • GSLIB add/remove header
  4. Ask the user the number of divisions over ZZ.
  5. Add functions to split hard data.
  6. Handle different versions of the DSS (running and reading/loading parameters).
  7. Add a cute QSplashScreen.
  8. Save results by station.
  9. Show simstats progress.
  10. Handle simstats parameters.
  11. Try to use multiprocessing to detect irregularities.
  12. Try optimizations (Cython, Numba, PyPy, Pyston).
  13. Maybe replace parameters handling with ConfigParser.
  14. Load/save data/parameters in binary format.
  15. Improve clearing the GUI settings, default/reset.
  16. Estimate remaining time.
  17. Provide quick launch/settings.
  18. Handle user errors/missing parameters.
  19. Load/save grid and variog files.
  20. Show variog file in batch decades.
  21. mp_exec with symlink instead of copy (not sure if it would work).
  22. Convert to COST-HOME when finalizing batch decade.
  23. To/from gsimcliparam.
  24. Allow gsimcli_results without batch_decade.
  25. Develop a CLI version, including docopt.
  26. Bundle an .exe (cx_freeze, PyInstaller, Cython, Nuitka).
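
As an illustration of idea 3 above (converting between file types), here is a minimal sketch of a GSLIB to CSV conversion. It assumes the common GSLIB (Geo-EAS) text layout: a title line, a line with the number of variables, one line per variable name, then whitespace-separated data rows. The function name `gslib_to_csv` is hypothetical, and whether gsimcli's own files follow exactly this layout is an assumption.

```python
import csv
import io


def gslib_to_csv(gslib_text):
    """Convert GSLIB (Geo-EAS) formatted text into CSV text.

    The GSLIB title line is dropped, since CSV has no place for it.
    """
    lines = [ln for ln in gslib_text.splitlines() if ln.strip()]
    nvars = int(lines[1].split()[0])  # second line: number of variables
    names = [ln.strip() for ln in lines[2:2 + nvars]]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    for ln in lines[2 + nvars:]:
        values = ln.split()
        if len(values) != nvars:
            raise ValueError("expected %d values, got: %r" % (nvars, ln))
        writer.writerow(values)
    return out.getvalue()


# Usage with a tiny three-variable file.
sample = "stations\n3\nx\ny\nvalue\n1 2 3.5\n4 5 6.5\n"
print(gslib_to_csv(sample))
```

Values are copied as text rather than parsed as numbers, so the original number formatting is preserved in the CSV output.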

Known bugs and limitations

Here is a list of known bugs and limitations. I grouped them into two categories:

  • Major relevance: the user has to be careful about these.
  • Minor relevance: small issues that should not be noticeable to the user, and may be in parts of the code that are not being used in the last version.

There is no particular order within the categories.

Major relevance

  1. Does not support file paths with :/ (which can happen in POSIX systems).

  2. In some unknown occasions, using the menu option to "restore/load settings" may add a duplicate network (double check the list and remove as needed).

  3. Make sure all the candidate stations are within the grid limits, otherwise it will throw an error while trying to extract the corresponding vertical line.

  4. Related to the previous point, be careful to define the grid dimensions in order to accommodate the radius (tolerance) around the stations. Example:

    • the grid is 10 x 10 nodes
    • one station is located at node (2, 2)
    • if using tolerance 3, it will surpass the grid dimensions on the west and north sides
    • you need to increase the grid to at least 11 x 11 (in the aforementioned directions)
  5. When calculating the scores for monthly data in the cost-home format, it will only use the first selected checkbox.
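
Limitations 3 and 4 above can be checked before running a simulation. The sketch below is not part of gsimcli: the function name `exceeded_sides`, the index of the first node (`first=1`) and the side labels are assumptions made for illustration. It reports the sides only as low or high along each axis, because how those map to compass directions depends on the grid orientation.

```python
def exceeded_sides(node, tolerance, grid_size, first=1):
    """List the grid sides surpassed by the tolerance around a station node.

    node      -- (x, y) node indices of the station
    tolerance -- radius, in nodes, searched around the station
    grid_size -- (nx, ny) number of nodes along each axis
    first     -- index of the first node (assumed 1 here; use 0 if zero-based)
    """
    x, y = node
    nx, ny = grid_size
    last_x = first + nx - 1
    last_y = first + ny - 1

    sides = []
    if x - tolerance < first:
        sides.append("x_low")
    if x + tolerance > last_x:
        sides.append("x_high")
    if y - tolerance < first:
        sides.append("y_low")
    if y + tolerance > last_y:
        sides.append("y_high")
    return sides


# The example from the list: 10 x 10 grid, station at (2, 2), tolerance 3.
print(exceeded_sides((2, 2), 3, (10, 10)))  # ['x_low', 'y_low']
# A station far enough from the borders raises no warning.
print(exceeded_sides((5, 5), 3, (10, 10)))  # []
```

An empty list means the station and its tolerance fit inside the grid; any other result indicates the directions in which the grid would need to be enlarged.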

Minor relevance

  1. cost2gslib only adds the column with the 'year' for annual and monthly data.
  2. drill has some offset in the fetched location (not in use).
  3. GsimcliParam.load is being used as if it were a static method (it should not be a problem because most likely the first condition will short circuit).
  4. Using the option to save intermediary files, coupled with using a tolerance, may throw an error while saving.
  5. benchmark is not updating the header when using 'monthly'.
