Code Monkey home page Code Monkey logo

quetzal-crumbs's Introduction

Quetzal-CRUMBS

Website becheler.github.io License: GPL v3 PyPI version Supported Python Versions Lines of code

This python library is meant to be used along other software from the Quetzal framework to perform iDDC modeling and inference.

iDDC modeling (Integrated Distributional, Demographic, Coalescent modeling) is a methodology for statistical phylogeography. It relies heavily on spatial models and methods to explain how past processes (sea level change, glaciers dynamics, climate modifications) shaped the present spatial distribution of genetic lineages.

๐Ÿš€ The project website lives here.

In short:

  1. Access the high resolution paleoclimatic database CHELSA to download world files
  2. Fit a Species Distribution model and save (persist) the fitted model
  3. Distribute the fitted model on cluster nodes and reconstruct landscape habitability dynamics across millennia.
  4. Assemble the layers and prepare the dynamic landscape for simulation-based inference
  5. If using Quetzal-EGGS spatial simulators, retrieve parameters and simulated data
  6. Use Decrypt to perform robustness analysis of species delimitation methods.

What problem does this library solve?

iDDC modeling is quite a complex workflow and Quetzal-CRUMBS allows to simplify things quite a lot.

For example, to estimate the current habitat of a species using CHELSA-Trace21k model and reconstruct its high-resolution dynamics along the last 21.000 years (averaged across 4 different ML classifiers), with nice visualizations and GIS operations at the end, you can just do the following:

# Pull the docker image with all the dependencies
docker pull arnaudbecheler/quetzal-nest:latest

# Run the docker image synchronizing you working directory
docker run --user $(id -u):$(id -g) --rm=true -it \
  -v $(pwd):/srv -w /srv \
  becheler/quetzal-nest:latest /bin/bash

This will start your docker container. Once inside, copy/paste the following code in a script.sh file, and then run it with chmod u+x script.sh && ./script.sh.

#!/bin/bash

# your DNA sampling points (shapefile)
sample='sampling-points/sampling-points.shp'

# for present to LGM analysis (but much longer computations) use instead:
# chelsaTimes=$(seq -s ',' -200 1 20)
chelsaTimes=20,0,-50

# spatial buffer around sampling points (in degree)
buffer=2.0

# for proper SDM (but much longer computations) use instead:
# biovars=dem,bio
biovars=dem,bio01

crumbs-get-gbif \
      --species "Heteronotia binoei" \
      --points $sample \
      --limit 30 \
      --year "1950,2022" \
      --buffer $buffer \
      --output occurrences

mkdir -p occurrences
mv occurrences.* occurrences/

crumbs-fit-sdm \
      --presences occurrences/occurrences.shp \
      --variables $biovars \
      --nb-background 100 \
      --buffer $margin \
      --cleanup \
      --sdm-file 'my-fitted-sdm.bin'

#ย This part can be massively parallelized on dHTC grids
crumbs-extrapolate-sdm \
     --sdm-file 'my-fitted-sdm.bin' \
     --timeID 20

crumbs-extrapolate-sdm \
    --sdm-file 'my-fitted-sdm.bin' \
    --timeID 19

What is nice is that you can leverage these long computations for publication analyses using dHTC (distributed Hight Throughput Computing) and distribute this load on a cluster grid: check out quetzal_on_OSG!

Contact and troubleshooting

๐Ÿ’ฅ A problem? A bug? Outrageous! ๐Ÿ™€ Please let me know by opening an issue or sending me an email so I can fix this! โ›‘๏ธ

๐Ÿ›Ž๏ธ In need of assistance about this project? Just want to chat? Let me know and let's have a zoom meeting!


๐Ÿš€ Updating the package (tip note for the dev)

  • Create a feature branch, make updates to it.
  • Test the feature
  • Bump the version in setup.cfg
  • Bump the version of the whl file in .circleci/config.yml
  • Update the ChangeLog
  • Push to GitHub

๐ŸŒˆ When you have a successful build on https://app.circleci.com/pipelines/github/Becheler/quetzal-CRUMBS:

  • create a Pull Request (PR) to the develop branch
  • Merge the PR if it looks good.
  • When that build succeeds, create a PR to the main branch, review it, and merge.
  • Go get a beer and bless this new version with some luuuv ๐Ÿบ ๐Ÿ’ž

Workflow from https://circleci.com/blog/publishing-a-python-package/.


๐Ÿ“š References

  • Karger, Dirk Nikolaus; Nobis, M., Normand, Signe; Graham, Catherine H., Zimmermann, N.E. (2021). CHELSA-TraCE21k: Downscaled transient temperature and precipitation data since the last glacial maximum. Geoscientific Model Development - Discussions

  • Version 1.0 Karger, Dirk Nikolaus; Nobis, M., Normand, Signe; Graham, Catherine H., Zimmermann, N.E. (2021). CHELSA-TraCE21k: Downscaled transient temperature and precipitation data since the last glacial maximum. EnviDat.

  • Bocedi, G., Peโ€™er, G., Heikkinen, R. K., Matsinos, Y., & Travis, J. M. (2012). Projecting speciesโ€™ range expansion dynamics: sources of systematic biases when scaling up patterns and processes. Methods in Ecology and Evolution, 3(6), 1008-1018.

  • Baird, S. J., & Santos, F. (2010). Monte Carlo integration over stepping stone models for spatial genetic inference using approximate Bayesian computation. Molecular ecology resources, 10(5), 873-885.

  • Estoup, A., Baird, S. J., Ray, N., Currat, M., CORNUET, J. M., Santos, F., ... & Excoffier, L. (2010). Combining genetic, historical and geographical data to reconstruct the dynamics of bioinvasions: application to the cane toad Bufo marinus. Molecular ecology resources, 10(5), 886-901.

quetzal-crumbs's People

Contributors

becheler avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

quetzal-crumbs's Issues

in SDM step, an extra layer is added for present times (CHELSA time 20)

VRT file content:

<VRTDataset rasterXSize="1267" rasterYSize="1024">
  <SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform>  1.2834152654445001e+02,  8.3333333000000006e-03,  0.0000000000000000e+00, -9.7501392098500190e+00,  0.0000000000000000e+00, -8.3333333000000006e-03</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_-50.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="2">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_0.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="3">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_20.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="4">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_20.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
</VRTDataset>

Unable to allocate array with shape and data type

  • Quetzal-CRUMBS - Temporal interpolation for missing layers
    ... reading multiband raster sdm_outputs/suitability.tif
    - number of existing bands is 3
    - bands will be macthed against (in years BP): [0, 2000, 7000]
    - number of missing bands is 6998
    - request memory allocation 33.84 GB
    ... creating new array
    Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 145, in
    main(sys.argv[1:])
    File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 138, in main
    return temporal_interpolation(
    File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 100, in temporal_interpolation
    new_array = fill_known_bands(ma.zeros(new_shape, dtype=src_array.dtype), src_array, band_to_yearBP)
    File "/usr/local/lib/python3.8/dist-packages/numpy/ma/core.py", line 8154, in call
    result = self._func.call(*args, **params).view(MaskedArray)
    numpy.core._exceptions.MemoryError: Unable to allocate 33.8 GiB for an array with shape (7001, 1024, 1267) and data type float32

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.