quetzal-framework / quetzal-crumbs Goto Github PK

Python library for iDDC modeling using Quetzal

License: GNU General Public License v3.0

Python 99.09% Shell 0.91%

iddc coalescence modeling python spatial

quetzal-crumbs's Introduction

Quetzal-CRUMBS

This python library is meant to be used along other software from the Quetzal framework to perform iDDC modeling and inference.

iDDC modeling (Integrated Distributional, Demographic, Coalescent modeling) is a methodology for statistical phylogeography. It relies heavily on spatial models and methods to explain how past processes (sea level change, glaciers dynamics, climate modifications) shaped the present spatial distribution of genetic lineages.

🚀 The project website lives here.

In short:

Access the high resolution paleoclimatic database CHELSA to download world files
Fit a Species Distribution model and save (persist) the fitted model
Distribute the fitted model on cluster nodes and reconstruct landscape habitability dynamics across millennia.
Assemble the layers and prepare the dynamic landscape for simulation-based inference
If using Quetzal-EGGS spatial simulators, retrieve parameters and simulated data
Use Decrypt to perform robustness analysis of species delimitation methods.

What problem does this library solve?

iDDC modeling is quite a complex workflow and Quetzal-CRUMBS allows to simplify things quite a lot.

For example, to estimate the current habitat of a species using CHELSA-Trace21k model and reconstruct its high-resolution dynamics along the last 21.000 years (averaged across 4 different ML classifiers), with nice visualizations and GIS operations at the end, you can just do the following:

# Pull the docker image with all the dependencies
docker pull arnaudbecheler/quetzal-nest:latest

# Run the docker image synchronizing you working directory
docker run --user $(id -u):$(id -g) --rm=true -it \
  -v $(pwd):/srv -w /srv \
  becheler/quetzal-nest:latest /bin/bash

This will start your docker container. Once inside, copy/paste the following code in a script.sh file, and then run it with chmod u+x script.sh && ./script.sh.

#!/bin/bash

# your DNA sampling points (shapefile)
sample='sampling-points/sampling-points.shp'

# for present to LGM analysis (but much longer computations) use instead:
# chelsaTimes=$(seq -s ',' -200 1 20)
chelsaTimes=20,0,-50

# spatial buffer around sampling points (in degree)
buffer=2.0

# for proper SDM (but much longer computations) use instead:
# biovars=dem,bio
biovars=dem,bio01

crumbs-get-gbif \
      --species "Heteronotia binoei" \
      --points $sample \
      --limit 30 \
      --year "1950,2022" \
      --buffer $buffer \
      --output occurrences

mkdir -p occurrences
mv occurrences.* occurrences/

crumbs-fit-sdm \
      --presences occurrences/occurrences.shp \
      --variables $biovars \
      --nb-background 100 \
      --buffer $margin \
      --cleanup \
      --sdm-file 'my-fitted-sdm.bin'

# This part can be massively parallelized on dHTC grids
crumbs-extrapolate-sdm \
     --sdm-file 'my-fitted-sdm.bin' \
     --timeID 20

crumbs-extrapolate-sdm \
    --sdm-file 'my-fitted-sdm.bin' \
    --timeID 19

What is nice is that you can leverage these long computations for publication analyses using dHTC (distributed Hight Throughput Computing) and distribute this load on a cluster grid: check out quetzal_on_OSG!

Contact and troubleshooting

💥 A problem? A bug? Outrageous! 🙀 Please let me know by opening an issue or sending me an email so I can fix this! ⛑️

🛎️ In need of assistance about this project? Just want to chat? Let me know and let's have a zoom meeting!

🚀 Updating the package (tip note for the dev)

Create a feature branch, make updates to it.
Test the feature
Bump the version in setup.cfg
Bump the version of the whl file in .circleci/config.yml
Update the ChangeLog
Push to GitHub

🌈 When you have a successful build on https://app.circleci.com/pipelines/github/Becheler/quetzal-CRUMBS:

create a Pull Request (PR) to the develop branch
Merge the PR if it looks good.
When that build succeeds, create a PR to the main branch, review it, and merge.
Go get a beer and bless this new version with some luuuv 🍺 💞

Workflow from https://circleci.com/blog/publishing-a-python-package/.

📚 References

Karger, Dirk Nikolaus; Nobis, M., Normand, Signe; Graham, Catherine H., Zimmermann, N.E. (2021). CHELSA-TraCE21k: Downscaled transient temperature and precipitation data since the last glacial maximum. Geoscientific Model Development - Discussions
Version 1.0 Karger, Dirk Nikolaus; Nobis, M., Normand, Signe; Graham, Catherine H., Zimmermann, N.E. (2021). CHELSA-TraCE21k: Downscaled transient temperature and precipitation data since the last glacial maximum. EnviDat.
Bocedi, G., Pe’er, G., Heikkinen, R. K., Matsinos, Y., & Travis, J. M. (2012). Projecting species’ range expansion dynamics: sources of systematic biases when scaling up patterns and processes. Methods in Ecology and Evolution, 3(6), 1008-1018.
Baird, S. J., & Santos, F. (2010). Monte Carlo integration over stepping stone models for spatial genetic inference using approximate Bayesian computation. Molecular ecology resources, 10(5), 873-885.
Estoup, A., Baird, S. J., Ray, N., Currat, M., CORNUET, J. M., Santos, F., ... & Excoffier, L. (2010). Combining genetic, historical and geographical data to reconstruct the dynamics of bioinvasions: application to the cane toad Bufo marinus. Molecular ecology resources, 10(5), 886-901.

quetzal-crumbs's People

Contributors

Stargazers

Watchers

Quetzal-CRUMBS - Temporal interpolation for missing layers
... reading multiband raster sdm_outputs/suitability.tif
- number of existing bands is 3
- bands will be macthed against (in years BP): [0, 2000, 7000]
- number of missing bands is 6998
- request memory allocation 33.84 GB
... creating new array
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 145, in
main(sys.argv[1:])
File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 138, in main
return temporal_interpolation(
File "/usr/local/lib/python3.8/dist-packages/crumbs/interpolate.py", line 100, in temporal_interpolation
new_array = fill_known_bands(ma.zeros(new_shape, dtype=src_array.dtype), src_array, band_to_yearBP)
File "/usr/local/lib/python3.8/dist-packages/numpy/ma/core.py", line 8154, in call
result = self._func.call(*args, **params).view(MaskedArray)
numpy.core._exceptions.MemoryError: Unable to allocate 33.8 GiB for an array with shape (7001, 1024, 1267) and data type float32

fix: avoid 'rasterio.errors.RasterioIOError not recognized as a supported file format' if wrong timesID given to sdm

crumbs.sdm with wrong CHELSA timesID (eg 50 instead of -50 ) results in a cryptic error

avoid "Directory CHELSA_world is not empty and will not be deleted"

In running sdm.py with --cleanup:

/usr/local/lib/python3.8/dist-packages/crumbs/get_chelsa.py:227: UserWarning: Directory CHELSA_world is not empty and will not be deleted.
  warnings.warn("Directory " + output_dir + " is not empty and will not be deleted.")

Dependencies bug on Ubuntu 22.04 LTS

Weird error only on 22.04 - other OS run fine.

Problem description here: https://stackoverflow.com/questions/72874772/pytest-core-dump-on-unbutu-22-04-lts?noredirect=1#comment128720025_72874772

Bug seems caused by adding Mayavi dependencies in the requirements:

# Mayavi dependencies
pyqt5==5.15.7
pillow==9.1.1
nose
traitsui==7.2.1

OSG does not support multicore autodetection

In sdm.py and interpolate.py: add an option to pass the number of available cores (around 8 in OSG)

in SDM step, an extra layer is added for present times (CHELSA time 20)

VRT file content:

<VRTDataset rasterXSize="1267" rasterYSize="1024">
  <SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform>  1.2834152654445001e+02,  8.3333333000000006e-03,  0.0000000000000000e+00, -9.7501392098500190e+00,  0.0000000000000000e+00, -8.3333333000000006e-03</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_-50.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="2">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_0.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="3">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_20.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="Float32" band="4">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource resampling="average">
      <SourceFilename relativeToVRT="1">averaged/suitability_20.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1267" RasterYSize="1024" DataType="Float32" BlockXSize="1267" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <DstRect xOff="0" yOff="0" xSize="1267" ySize="1024" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
</VRTDataset>