moja-global / taswira

Taswira: An interactive tool for visualising GCBM output.

License: Mozilla Public License 2.0

Python 98.37% Dockerfile 1.63%

python gcbm flint

taswira's Issues

Make passing config file optional by adding a set of default configs

Is your feature request related to a problem? Please describe.
Users have to pass a JSON formatted configuration file to start Taswira. This i

Describe the solution you'd like
I believe that we can make this configuration file optional by hard-coding a set of configs for all the common ecosystem indicators. We can build on the example config file that is available in the README:

[
  {
    "database_indicator": "NPP",
    "file_pattern": "NPP*.tiff",
    "graph_units": "Ktc",
    "palette": "Greens"
  },
  {
    "database_indicator": "NBP",
    "file_pattern": "NBP*.tiff",
    "palette": "Greens"
  },
  {
    "database_indicator": "NEP",
    "file_pattern": "NEP*.tiff",
    "palette": "Reds"
  },
  {
    "title": "AG Biomass",
    "database_indicator": "Aboveground Biomass",
    "file_pattern": "AG_Biomass_C_*.tiff",
    "palette": "YlGnBu",
    "graph_units": "Mtc"
  }
]
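A minimal sketch of how the optional config could work, assuming the defaults are simply the README example above embedded in the code. The names `DEFAULT_INDICATORS` and `load_indicators` are hypothetical and not part of Taswira's current code.

```python
import json
from typing import List, Optional

# Hypothetical built-in defaults, copied from the README example config.
DEFAULT_INDICATORS = [
    {"database_indicator": "NPP", "file_pattern": "NPP*.tiff",
     "graph_units": "Ktc", "palette": "Greens"},
    {"database_indicator": "NBP", "file_pattern": "NBP*.tiff",
     "palette": "Greens"},
    {"database_indicator": "NEP", "file_pattern": "NEP*.tiff",
     "palette": "Reds"},
    {"title": "AG Biomass", "database_indicator": "Aboveground Biomass",
     "file_pattern": "AG_Biomass_C_*.tiff", "palette": "YlGnBu",
     "graph_units": "Mtc"},
]


def load_indicators(config_path: Optional[str] = None) -> List[dict]:
    """Return the user's config if a path is given, else the defaults."""
    if config_path is None:
        # Copy each entry so callers cannot mutate the built-in defaults.
        return [dict(item) for item in DEFAULT_INDICATORS]
    with open(config_path, encoding="utf-8") as config_file:
        return json.load(config_file)
```

With this shape, the CLI would only need to make its config argument optional and pass `None` when the user omits it.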

KeyError in ingestion.py when trying to run Taswira

Describe the bug
Whenever I try to run Taswira, it fails with a KeyError raised from the ingestion.py file of the visualisation tool.

To Reproduce
Steps to reproduce the behavior:

Go to the directory GCBM.Visualisation_Tool.
Activate the conda environment 'taswira' that you created on your machine.
Run the following command:
$ taswira indicators.json sample_data/sample_1/output_files/spatial sample_data/sample_1/output_files/compiled_gcbm_output.db
(In place of sample_data/sample_1/output_files/spatial, I passed the corresponding path on my machine.)
See the error:
KeyError: '2016'
Expected behavior
Normally my browser should automatically open up with Taswira's interface.

Screenshots

Operating Environment:

Ubuntu 20.04.1 LTS


Remove `Science` folder

The Science folder in the root of this repository was imported from a template when this repository was created. We don't have any use of this folder and so it can be safely deleted.

Restructure and Publish CLI on PyPI using Flit

You can install Taswira by creating a Conda environment using the included environment.yml file. This arrangement allows a cross-platform way for setting up the development environment.

This is a little too complicated for people who only want to use the tool and do not need to set up a development environment.

In this document, I propose that we should upload the tool on PyPI so that users can install by running:

$ pip install taswira

PyPI

The Python Package Index (PyPI) is a repository of software written in Python. It functions as the primary source for pip which is a package installer that by default ships with all modern distributions of Python.

This makes PyPI the perfect place to publish our project.

Flit

setuptools is the most popular tool for packaging Python projects. It's great but requires a lot of complicated configuration. Flit is much easier to configure, as it supports the modern pyproject.toml package specification format.

Why Restructure Repository?

PyPI uses the repository's README on the project's page on its website. This is why I would need to change the README to describe the tool and remove the GSoC information.

At the moment the tool lives in a sub-directory named taswira. I would need to move the contents of that sub-directory to the root of the repo. I would also like to change the repository's name to taswira and set its description to "An interactive visualisation tool for GCBM output". This will help increase the visibility of the tool and allow people to find it easily.

Publish a moja global Taswira package

Hey @abhineet97 - our DevOps group have been on a tear publishing images for popular moja global tools. Could we please add some CI to this repo and add your Docker image as a GitHub package?

If so, I wondered if you'd like to label your service taswira in light of your ambition to be model agnostic (#35)?

Add Contributors

Using this issue to create the contributors table.

@all-contributors please add @gmajan for projectManagement
@all-contributors please add @kaskou for review

DEFLATE-compressed raster processing is slow

Describe the bug

When using DEFLATE-compressed raster files, the tool takes a significant amount of time to start. This is because Terracotta is optimized to work with ZSTD-compressed Cloud Optimized GeoTIFFs (COGs).

Possible Resolutions

Optimize raster files on-the-fly using Terracotta's optimize-rasters feature.
Ask the user to reconfigure GCBM and provide COGs instead.
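Either resolution first needs a way to tell which rasters are DEFLATE-compressed. The sketch below reads the Compression tag (259) from a TIFF's first image directory using only the standard library. It is an illustration rather than what Taswira does: it handles classic TIFF only (not BigTIFF), and the function name `tiff_compression` is made up for this example. In practice, rasterio exposes the same information.

```python
import struct

COMPRESSION_TAG = 259  # TIFF "Compression" tag
COMPRESSION_NAMES = {
    1: "none",
    5: "lzw",
    8: "deflate",      # Adobe DEFLATE
    32946: "deflate",  # legacy DEFLATE code
    50000: "zstd",
}


def tiff_compression(data: bytes) -> str:
    """Return the compression scheme named in a classic TIFF's first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", data[2:4])
    if magic != 42:
        raise ValueError("only classic TIFF is handled in this sketch")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for index in range(entry_count):
        start = ifd_offset + 2 + 12 * index  # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack(
            order + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value is stored in the first 2 bytes of the
            # 4-byte value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"other ({value})")
    return "none"  # the TIFF default when the tag is absent
```

A caller could pass `pathlib.Path(raster).read_bytes()` for small files and only trigger optimization when the result is `"deflate"`.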

Taswira as a general purpose spatial+non-spatial visualization tool

Is your feature request related to a problem? Please describe.
At the moment Taswira is able to visualize only GCBM output. It has the table names, raster patterns and other things hard-coded into it for the sole purpose of visualizing GCBM output. It would greatly increase the utility of Taswira if it were able to visualize arbitrary combinations of spatial and non-spatial data.

Describe the solution you'd like
I think we can solve this problem if we offload the task of specifying table names, column names, raster patterns, SQL queries, etc. to the user.
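One way this could look, as a rough sketch: each indicator carries its own raster pattern and a user-written SQL query that returns (year, value) rows. Everything here is hypothetical, including the `IndicatorSource` class, the `fetch_series` helper, and the `my_results` table used for the demonstration. None of it is Taswira's or GCBM's actual schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndicatorSource:
    """User-supplied description of one indicator (hypothetical shape)."""
    title: str
    file_pattern: str  # glob for the matching raster files
    query: str         # SQL returning (year, value) rows


def fetch_series(conn: sqlite3.Connection,
                 source: IndicatorSource) -> Dict[int, float]:
    """Run the user's query and return a {year: value} mapping."""
    return {int(year): value for year, value in conn.execute(source.query)}


def demo() -> Dict[int, float]:
    # An in-memory database stands in for an arbitrary non-spatial data set.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_results (year INTEGER, indicator TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO my_results VALUES (?, ?, ?)",
        [(2010, "NPP", 1.5), (2011, "NPP", 2.0), (2010, "NEP", 0.3)],
    )
    source = IndicatorSource(
        title="NPP",
        file_pattern="NPP*.tiff",
        query=("SELECT year, SUM(value) FROM my_results "
               "WHERE indicator = 'NPP' GROUP BY year ORDER BY year"),
    )
    return fetch_series(conn, source)
```

Because the query text comes straight from the user's config, this design suits a locally run tool. It should not be exposed to untrusted input.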

Application Architecture

This is a diagram of the planned architecture of the project.

The diagram was made using Structurizr and follows the C4 Model

Add Welcome Bot to create an inclusive environment for new contributors

Is your feature request related to a problem? Please describe.
Welcome is a simple bot that greets new users with maintainer-defined comments.

It combines three plugins: new-issue-welcome (a comment posted on a user's first issue), new-pr-welcome (a comment posted on pull requests from first-time contributors to your repository) and first-pr-merge (a comment posted when a first-time contributor's pull request is merged).

Describe the solution you'd like
We can set up the Welcome bot by adding the GitHub App to our organization's repositories and configuring .github/config.yml with the messages we want.

This will make the new contributors feel welcomed and at ease in interacting with the community.

Additional context
Screenshots that depict the Welcome bot in action:

Pitch: Use Dash for the front-end

The front-end would be responsible for showing the raster files served by the back-end. In this document, I describe my rationale for using the Dash framework for this purpose.

Dash

Dash is a Python framework for creating web applications. Under the hood, it uses Flask, React.js and Plotly.js (a graphing library). It's made specifically for building data visualisation apps, which is how I came to know about it.

Why Dash?

The CLI tool taswira and the back-end are all written in Python. Wouldn't it be great if we could build even the front-end in Python?

This was my primary motivation behind selecting Dash. The idea of creating a web app written entirely in Python is incredibly appealing to me. It's something that I've never done. I believe that it would be a great learning opportunity for me.

The other reason is that Dash and Terracotta both use Flask. This means that it should be a straightforward process to combine the two. Dash has even provided us with the necessary documentation for this.

Apart from all that, Leaflet, the map library that I've planned to use, has Dash-specific bindings already available. This, I believe, would greatly simplify the process of building the front-end.

Remove Terracotta

Terracotta utilizes the Rasterio Python Library for working with the raster files. Rasterio is a convenience wrapper that makes it easy to work with GDAL.

This means that, theoretically, we can do away with Terracotta and directly work with Rasterio.

Unable to create test raster files

Initially, I had planned on generating random GCBM raster files for testing Terracotta. But recently I found out that GCBM uses ZSTD compression. This doesn't pose a problem when ingesting the files to Terracotta. However, the problem seems to appear when I try to create a raster using rasterio (more info here).

It appears that to fix this, I'll need to build rasterio from source with ZSTD support. And apart from that, I'll also need to figure out how I'll share this build so that others can also run the tests easily.

This is something that I want to figure out. However, it doesn't seem important at the moment. So, I'm putting it here for later.

Convert DEFLATE-compressed rasters to ZSTD before initializing

Is your feature request related to a problem? Please describe.
Terracotta is not designed to work with DEFLATE-compressed raster files. It is made for Cloud Optimized GeoTIFFs (COGs) that use ZSTD compression. It does work with DEFLATE-compressed files, but processing them is considerably slower.

Describe the solution you'd like
We can convert these DEFLATE-compressed files to ZSTD before starting Terracotta. Thankfully, Terracotta provides us with convenient methods for doing this. See here.
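A small sketch of how the conversion step might be driven from Python by shelling out to Terracotta's command line. The helper name `build_optimize_command` is hypothetical. The `optimize-rasters` subcommand and its `-o` and `--compression` options are written from memory of Terracotta's CLI, so treat them as assumptions and confirm them with `terracotta optimize-rasters --help` for the installed version.

```python
from typing import List, Sequence


def build_optimize_command(raster_files: Sequence[str],
                           output_folder: str) -> List[str]:
    """Build the argument list for converting rasters to ZSTD-compressed COGs.

    The flags used here are assumptions about Terracotta's CLI; check them
    against the installed version before relying on this.
    """
    if not raster_files:
        raise ValueError("at least one raster file is required")
    return [
        "terracotta", "optimize-rasters",
        "-o", output_folder,
        "--compression", "zstd",
        *raster_files,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)` before the server starts, with the ingestion step then pointed at the output folder instead of the original files.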

Describe alternatives you've considered
Asking the user to reconfigure GCBM on their end to produce COGs. This is what the program is currently doing.

Fix typo in docstring

In the docstring of the find_units function of this module, "from" has been accidentally written. It should be removed.

Project Plan

This is my project plan. Each section below represents a Milestone/Deliverable. Under each section is a list of tasks that must be completed for the milestone/deliverable to be considered achieved/delivered.

This is a living document and hence will be updated continuously as the project goes along. The list of changes can be viewed using the "edited" drop-down menu that's available above.

Community Bonding (May 4 - June 1)

Study the existing internal tool and understand its working.
Identify elements from the internal tool that can be reused.
Understand how to use spatial data in the browser.
Evaluate different frameworks and select the best one.
Learn more about the selected frameworks.
Set up coding environment (code editor, linting and styling tools, Python, etc.)

Terracotta Back-End (June 1 - July 3)

Set up the code repository, ensuring that it conforms with moja global's standards.
Add Terracotta and add associated documentation for setting up development environment.
Create an ingestion script that loads GCBM data into Terracotta.
Implement a command line interface for the tool.
Add tests for checking the implementation of the above features.
Test and verify the integration (using Terracotta's inbuilt exploration interface).

Dash Integration (July 3 - July 31)

Pitch a plan for describing how Dash integration would work.
Implement overlaying raster files onto Leaflet.js Maps.
Identify the different controls that can be used for interacting with data.
Implement the interaction controls.
Set up continuous integration.

Final Touches (August 1 - August 24)

Test tool using different data sets and then fix any encountered bugs.
Identify potential features and enhancements for post-GSoC development.
Dockerize the environment to aid easy installation and development.
Update documentation and add installation instructions.
~~Publish tool on PyPI and then setup continuous deployment.~~ (see #31 for an explanation of why this was not possible)

A buffer of 2 weeks has been taken to account for any unforeseeable absence on my end (about which I'll inform as early as possible).

Pitch: Use Terracotta to Serve Raster Data

The GCBM spatial output is a list of tiled, DEFLATE compressed GeoTIFF files. In this form, the raster data cannot be used in my project's web application.

Also, the associated non-spatial data is not embedded into this raster data. It is present in a separate SQLite database.

To be able to create an interactive front-end, I need an easy way to be able to access this raster data along with its associated non-spatial data.

Terracotta

Terracotta is a tile server written in Python. It takes some raster data and then serves it through an easy to use HTTP API. This API can solve the problems that I mentioned above.

It can serve the various tiles of a raster in the form of PNGs, which can then be easily rendered in a browser. The nature of the API also allows us to easily overlay these images onto a map, as shown in this preview.

The API has the ability to return any arbitrary metadata associated with a particular raster. This can solve my problem of accessing non-spatial data on the front-end.
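To make the two API features above concrete, here is a sketch that builds the URLs a front-end would request. The helper names are hypothetical. The `/singleband/...` and `/metadata/...` routes and the `colormap` query parameter reflect my understanding of Terracotta's HTTP API and should be checked against its documentation. The `{z}/{x}/{y}` placeholders are left unfilled on purpose, since that is the template form Leaflet tile layers expect.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def _key_path(keys: Sequence[object]) -> str:
    """Join dataset keys (e.g. name and year) into a URL path fragment."""
    return "/".join(quote(str(key), safe="") for key in keys)


def singleband_tile_template(base_url: str, keys: Sequence[object],
                             colormap: Optional[str] = None) -> str:
    """Return a Leaflet-style tile URL template for one raster data set."""
    url = (f"{base_url.rstrip('/')}/singleband/{_key_path(keys)}"
           "/{z}/{x}/{y}.png")
    if colormap:
        url += "?" + urlencode({"colormap": colormap})
    return url


def metadata_url(base_url: str, keys: Sequence[object]) -> str:
    """Return the URL that serves the metadata stored for one data set."""
    return f"{base_url.rstrip('/')}/metadata/{_key_path(keys)}"
```

For example, `singleband_tile_template("http://localhost:5000", ["NPP", 2010], "greens")` yields `http://localhost:5000/singleband/NPP/2010/{z}/{x}/{y}.png?colormap=greens`.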

Terracotta & GCBM Output

Terracotta is not exactly a library. It is a command line app. It is used to serve and explore a directory of spatial data. You pass it a filename pattern as an argument. It parses the directory based on it, categorizes the raster data and then starts a server.

This makes it a great tool for exploring GCBM's spatial data. For example, the following are the instructions for using it with the sample data available here.

First, rename the files so that they follow a pattern that Terracotta can understand. In bash:

$ for f in *.*; do mv -- "$f" "$(echo "$f" | sed 's/_//g' | sed -r 's/(.*)([0-9][0-9][0-9][0-9])(.*)/\1_\2\3/')"; done
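For reference, the renaming rule encoded by the two sed expressions can be sketched in Python: drop every underscore, then put a single underscore in front of the last four-digit run, which is taken to be the year. The function name `terracotta_name` is made up for this illustration.

```python
import re


def terracotta_name(filename: str) -> str:
    """Mirror the sed pipeline: strip underscores, then insert one
    underscore before the last run of four digits (the year)."""
    no_underscores = filename.replace("_", "")
    # The greedy first group makes the regex pick the last four-digit run.
    return re.sub(r"(.*)([0-9]{4})(.*)", r"\1_\2\3", no_underscores, count=1)


print(terracotta_name("AG_Biomass_C_2010.tiff"))  # AGBiomassC_2010.tiff
print(terracotta_name("NPP_2016.tiff"))           # NPP_2016.tiff
```

Names without a four-digit run come back with their underscores removed and nothing inserted, so they would not match the {name}_{year} pattern used in the next step.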

Next, start a Terracotta server by executing:

$ terracotta serve -r './{name}_{year}.tif'

Finally, connect this server to Terracotta's Preview interface:

$ terracotta connect localhost:5000

This will open an interface in your web browser, which you can then use to view raster data overlaid onto a map.

Terracotta & My Project

As mentioned above, Terracotta is not a library. So, how will I use it in my project?

For that, I plan on doing two things:

First, I'll use Terracotta's Python API to generate a database of raster data. This will allow me to add the non-spatial data as metadata. Also, instead of having Terracotta parse a directory, this would allow me to pass it just a single database.

Second, even though Terracotta is not advertised as a library, its code still follows a well organized and modular structure. This means that all of its features can be invoked programmatically by calling the right function (which would be this one in my case) with the right arguments.

Conclusion

I also looked at other solutions, like MapServer, before settling on Terracotta. None of them offered the combined benefits of Python, an HTTP API, raster optimization tools, etc.

So, in conclusion, Terracotta seemed to be the most pragmatic choice for my project.