moja-global / taswira
Taswira: An interactive tool for visualising GCBM output.
License: Mozilla Public License 2.0
Is your feature request related to a problem? Please describe.
Users have to pass a JSON formatted configuration file to start Taswira. This i
Describe the solution you'd like
I believe that we can make this configuration file optional by hard-coding a set of default configs for all the common ecosystem indicators. We can build on the example config file that is available in the README:
[
{
"database_indicator": "NPP",
"file_pattern": "NPP*.tiff",
"graph_units": "Ktc",
"palette": "Greens"
},
{
"database_indicator": "NBP",
"file_pattern": "NBP*.tiff",
"palette": "Greens"
},
{
"database_indicator": "NEP",
"file_pattern": "NEP*.tiff",
"palette": "Reds"
},
{
"title": "AG Biomass",
"database_indicator": "Aboveground Biomass",
"file_pattern": "AG_Biomass_C_*.tiff",
"palette": "YlGnBu",
"graph_units": "Mtc"
}
]
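A minimal sketch of how such a fallback could work, assuming the defaults are the four entries from the README example above. The names `DEFAULT_CONFIG` and `load_config` are hypothetical and are not taken from Taswira's code.

```python
import json

# Hypothetical built-in defaults, copied from the README example config.
DEFAULT_CONFIG = [
    {"database_indicator": "NPP", "file_pattern": "NPP*.tiff",
     "graph_units": "Ktc", "palette": "Greens"},
    {"database_indicator": "NBP", "file_pattern": "NBP*.tiff",
     "palette": "Greens"},
    {"database_indicator": "NEP", "file_pattern": "NEP*.tiff",
     "palette": "Reds"},
    {"title": "AG Biomass", "database_indicator": "Aboveground Biomass",
     "file_pattern": "AG_Biomass_C_*.tiff", "palette": "YlGnBu",
     "graph_units": "Mtc"},
]


def load_config(path=None):
    """Return the user's config if a path is given, else a copy of the defaults."""
    if path is None:
        # Copy each entry so callers cannot mutate the built-in defaults.
        return [dict(entry) for entry in DEFAULT_CONFIG]
    with open(path, encoding="utf-8") as config_file:
        return json.load(config_file)
```

With this shape, the command-line option for the config file could simply default to `None`, and existing users who pass a file would see no change in behaviour.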
Describe the bug
Whenever I try to run Taswira, it fails with a KeyError raised in the ingestion.py file of the visualization tool.
To Reproduce
Steps to reproduce the behavior:
Operating Environment:
Additional context
The Science folder in the root of this repository was imported from a template when this repository was created. We don't have any use for this folder, so it can be safely deleted.
You can install Taswira by creating a Conda environment using the included environment.yml file. This arrangement provides a cross-platform way to set up the development environment.
This is more complicated than necessary for people who only want to use the tool and do not need to set up a development environment.
In this document, I propose that we should upload the tool on PyPI so that users can install by running:
$ pip install taswira
The Python Package Index (PyPI) is a repository of software written in Python. It is the primary source for pip, the package installer that ships by default with all modern distributions of Python.
This makes PyPI the natural place to publish our project.
setuptools is the most popular tool for packaging Python projects. It's great but requires a lot of complicated configuration. Flit is much easier to configure, as it supports the modern pyproject.toml package specification format.
PyPI uses the repository's README on a project's page in its website. This is why I would need to change the README to describe the tool and would need to remove the GSoC information.
At the moment the tool lives in a sub-directory named taswira, so I would need to move the contents of that sub-directory to the root of the repo. I would also like to change the repository's name to taswira and set its description to "An interactive visualisation tool for GCBM output". This will increase the visibility of the tool and make it easier for people to find.
Hey @abhineet97 - our DevOps group have been on a tear publishing images for popular moja global tools. Could we please add some CI to this repo and add your Docker image as a GitHub package?
If so, I wondered if you'd like to label your service taswira in light of your ambition to be model agnostic (#35)?
Using this issue to create the contributors table.
@all-contributors please add @gmajan for projectManagement
@all-contributors please add @kaskou for review
Describe the bug
When using DEFLATE-compressed raster files, the tool takes a significant amount of time to start. This is because Terracotta is optimized to work with ZSTD-compressed raster files (i.e., Cloud Optimized GeoTIFFs).
Possible Resolutions
Is your feature request related to a problem? Please describe.
At the moment, Taswira can visualize only GCBM output. The table names, raster patterns and other details are hard-coded for the sole purpose of visualizing GCBM output. Taswira would be far more useful if it could visualize arbitrary combinations of spatial and non-spatial data.
Describe the solution you'd like
I think we can solve this problem by offloading the task of specifying table names, column names, raster patterns, SQL queries, etc. to the user.
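As an illustration of what user-supplied table and column names might look like, here is a minimal sketch using Python's built-in sqlite3 module. The configuration keys, the `indicators` table and every function name are hypothetical; they do not reflect Taswira's actual schema or code.

```python
import sqlite3

# Hypothetical user-supplied description of where the non-spatial data lives.
user_config = {
    "table": "indicators",
    "name_column": "indicator",
    "year_column": "year",
    "value_column": "value",
}


def quote_identifier(name):
    """Quote an SQL identifier so user-supplied names cannot break the query."""
    return '"' + name.replace('"', '""') + '"'


def build_query(cfg):
    """Build a yearly-totals query from the user's table and column names."""
    table = quote_identifier(cfg["table"])
    name = quote_identifier(cfg["name_column"])
    year = quote_identifier(cfg["year_column"])
    value = quote_identifier(cfg["value_column"])
    return (
        f"SELECT {year}, SUM({value}) FROM {table} "
        f"WHERE {name} = ? GROUP BY {year} ORDER BY {year}"
    )


def yearly_totals(conn, cfg, indicator):
    """Return (year, total) rows for one indicator."""
    return conn.execute(build_query(cfg), (indicator,)).fetchall()


def demo():
    """Run the query against a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indicators (indicator TEXT, year INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO indicators VALUES (?, ?, ?)",
        [("NPP", 2010, 1.5), ("NPP", 2010, 2.5), ("NPP", 2011, 3.0), ("NEP", 2010, 9.0)],
    )
    return yearly_totals(conn, user_config, "NPP")
```

Identifiers are quoted and the indicator value is passed as a bound parameter, so a user-written configuration cannot inject arbitrary SQL through these fields.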
This is a diagram of the planned architecture of the project.
The diagram was made using Structurizr and follows the C4 Model
Is your feature request related to a problem? Please describe.
Welcome is a simple way to greet new users with maintainer-defined comments.
It combines three plugins: new-issue-welcome (a comment posted on a user's first issue), new-pr-welcome (a comment posted on pull requests from first-time contributors to the repository) and first-pr-merge (a comment posted when a first-time contributor's pull request is merged).
Describe the solution you'd like
We can set up the Welcome bot by adding the GitHub App to our organization's repositories and configuring .github/config.yml with the messages we want to post.
This will make the new contributors feel welcomed and at ease in interacting with the community.
Additional context
Screenshots that depict the Welcome bot in action:
The front-end would be responsible for showing the raster files served by the back-end. In this document, I describe my rationale for using the Dash framework for this purpose.
Dash is a Python framework for creating web applications. Under the hood, it uses Flask, React.js and Plotly.js (a graphing library). It's made specifically for building data visualisation apps, which is how I came to know about it.
The CLI tool taswira and the back-end are both written in Python. Wouldn't it be great if we could build the front-end in Python too?
This was my primary motivation behind selecting Dash. The idea of creating a web app written entirely in Python is incredibly appealing to me. It's something that I've never done. I believe that it would be a great learning opportunity for me.
The other reason is that Dash and Terracotta both use Flask. This means that it should be a straightforward process to combine the two. Dash has even provided us with the necessary documentation for this.
Apart from all that, Leaflet, the map library that I've planned to use, has Dash-specific bindings already available. This, I believe, would greatly simplify the process of building the front-end.
Terracotta utilizes the Rasterio Python Library for working with the raster files. Rasterio is a convenience wrapper that makes it easy to work with GDAL.
This means that, theoretically, we can do away with Terracotta and directly work with Rasterio.
Initially, I had planned on generating random GCBM raster files for testing Terracotta. But recently I found out that GCBM uses ZSTD compression. This doesn't pose a problem when ingesting the files into Terracotta. However, a problem appears when I try to create a raster using rasterio (more info here).
It appears that to fix this, I'll need to build rasterio from source with ZSTD support. Apart from that, I'll also need to figure out how to share this build so that others can run the tests easily.
This is something that I want to figure out. However, it doesn't seem important at the moment. So, I'm putting it here for later.
Is your feature request related to a problem? Please describe.
Terracotta is not designed to work with DEFLATE-compressed raster files. It is made for Cloud Optimized GeoTIFFs (COGs) that use ZSTD compression. It does work with DEFLATE-compressed files, but processing is considerably slower.
Describe the solution you'd like
We can convert these DEFLATE-compressed files into ZSTD-compressed ones before starting Terracotta. Thankfully, Terracotta provides us with convenient methods for doing this. See here.
Describe alternatives you've considered
Asking the user to reconfigure GCBM on their end to produce COGs. This is what the program is currently doing.
In the docstring of the find_units function of this module, the word "from" has been written by accident. It should be removed.
This is my project plan. Each section below represents a Milestone/Deliverable. Under each section is a list of tasks that must be completed for the milestone/deliverable to be considered achieved/delivered.
This is a living document and hence will be updated continuously as the project goes along. The list of changes can be viewed using the "edited" drop-down menu that's available above.
Community Bonding (May 4 - June 1)
Terracotta Back-End (June 1 - July 3)
Dash Integration (July 3 - July 31)
Final Touches (August 1 - August 24)
A buffer of 2 weeks has been included to account for any unforeseeable absence on my end (about which I'll give notice as early as possible).
The GCBM spatial output is a list of tiled, DEFLATE-compressed GeoTIFF files. In this form, the raster data cannot be used in my project's web application.
Also, the associated non-spatial data is not embedded in this raster data. It is stored in a separate SQLite database.
To be able to create an interactive front-end, I need an easy way to be able to access this raster data along with its associated non-spatial data.
Terracotta is a tile server written in Python. It takes some raster data and then serves it through an easy to use HTTP API. This API can solve the problems that I mentioned above.
It can serve the various tiles of a raster in the form of PNGs, which can then be easily rendered in a browser. The nature of the API also allows us to easily overlay these images onto a map, as shown in this preview.
The API has the ability to return any arbitrary metadata associated with a particular raster. This can solve my problem of accessing non-spatial data on the front-end.
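To make the two capabilities above concrete, here is a small sketch that builds the URLs a front-end would request. It assumes Terracotta's `/singleband/{keys}/{z}/{x}/{y}.png` and `/metadata/{keys}` routes as I understand them from its documentation, a server on localhost:5000, and a hypothetical key scheme of (name, year); check the routes against the installed Terracotta version before relying on them.

```python
from urllib.parse import urlencode


def tile_url(base, keys, z, x, y, colormap=None):
    """Build the URL of one PNG tile for the raster identified by `keys`."""
    url = f"{base}/singleband/{'/'.join(keys)}/{z}/{x}/{y}.png"
    if colormap:
        # Assumed optional query parameter that selects the colour map.
        url += "?" + urlencode({"colormap": colormap})
    return url


def metadata_url(base, keys):
    """Build the URL that returns the metadata stored with a raster."""
    return f"{base}/metadata/{'/'.join(keys)}"


def tile_template(base, keys):
    """Build a {z}/{x}/{y} template of the kind a map library tile layer expects."""
    return f"{base}/singleband/{'/'.join(keys)}/{{z}}/{{x}}/{{y}}.png"
```

For example, `tile_url("http://localhost:5000", ["NPP", "2010"], 8, 40, 90, "greens")` yields `http://localhost:5000/singleband/NPP/2010/8/40/90.png?colormap=greens`.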
Terracotta is not exactly a library. It is a command line app. It is used to serve and explore a directory of spatial data. You pass it a filename pattern as an argument. It parses the directory based on it, categorizes the raster data and then starts a server.
This makes it a great tool for exploring GCBM's spatial data. For example, the following are the instructions for using it with the sample data available here.
First, rename the files so that they follow a pattern that Terracotta can understand. In bash:
$ for f in *.*; do mv -- "$f" "$(echo "$f" | sed 's/_//g' | sed -r 's/(.*)([0-9][0-9][0-9][0-9])(.*)/\1_\2\3/')"; done
Next, start a Terracotta server by executing:
$ terracotta serve -r './{name}_{year}.tif'
Finally, connect this server to Terracotta's Preview interface:
$ terracotta connect localhost:5000
This will open an interface in your web browser, which you can then use to view raster data overlaid onto a map.
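The renaming step in the instructions above can also be sketched in Python, which may be handy on systems without bash and sed. This is only an illustration: `terracotta_name` and `rename_all` are hypothetical helpers, and the function defaults to a dry run so that nothing is renamed until the proposed names have been checked.

```python
import re
from pathlib import Path

# Greedy prefix, then a 4-digit run taken to be the year, then the remainder.
YEAR_PATTERN = re.compile(r"(.*)(\d{4})(.*)")


def terracotta_name(filename):
    """Drop all underscores, then put a single one in front of the year."""
    stripped = filename.replace("_", "")
    return YEAR_PATTERN.sub(r"\1_\2\3", stripped, count=1)


def rename_all(directory, dry_run=True):
    """Return (old, new) name pairs; rename the files only if dry_run is False."""
    renames = []
    for path in sorted(Path(directory).iterdir()):
        # Mirror the shell glob *.* by skipping names without a dot.
        if not path.is_file() or "." not in path.name:
            continue
        new_name = terracotta_name(path.name)
        if new_name != path.name:
            renames.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return renames
```

For instance, `terracotta_name("AG_Biomass_C_2010.tiff")` gives `AGBiomassC_2010.tiff`, which matches a `{name}_{year}` pattern once the file extension in the pattern agrees with the files on disk.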
As mentioned above, Terracotta is not a library. So, how will I use it in my project?
For that, I plan on doing two things:
First, I'll use Terracotta's Python API to generate a database of raster data. This will allow me to add the non-spatial data as metadata. It also means that, instead of having Terracotta parse a directory, I can pass it just a single database.
Second, even though Terracotta is not advertised as a library, its code still follows a well organized and modular structure. This means that all of its features can be invoked programmatically by calling the right function (which would be this one in my case) with the right arguments.
I also looked at other solutions, like MapServer, before settling on Terracotta. None of them offered the combined benefits of Python, an HTTP API, raster optimization tools, etc.
So, in conclusion, Terracotta seemed to be the most pragmatic choice for my project.