thraxil / apomixis
distributed image server prototype
should be able to set a maximum disk usage cap for each node. once that max is reached, node changes writeable to False.
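a rough sketch of the disk cap check, assuming the node keeps everything under one storage directory. `directory_size`, `is_writeable` and `max_bytes` are made-up names for illustration, not existing apomixis code.

```python
import os


def directory_size(root):
    """Total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def is_writeable(root, max_bytes):
    """Hypothetical check: the node stays writeable while under its cap."""
    return directory_size(root) < max_bytes
```

walking the whole directory on every write would be slow on a big node, so the result would probably be computed periodically and stored on the node record.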
web interface for adding a node, removing a node, pinging nodes, etc.
figure out if it improves performance and/or keeps processes from swelling with large images.
image retrieval should do smart things wrt etags
image retrieval should do smart things with if-modified-since headers
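one possible shape for the etag / if-modified-since handling, as a sketch only. it assumes the ETag is derived from the image hash and that the stored file's modification time is available as a timezone-aware UTC datetime. `is_not_modified` is a hypothetical helper, not part of apomixis.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop a weak-validator prefix (W/) so tags compare by value."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True if the request can be answered with 304 Not Modified.

    If-None-Match takes precedence over If-Modified-Since when both
    are present. Splitting on commas is fine for hash-based tags,
    which never contain commas themselves.
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        candidates = {_strip_weak(t) for t in inm.split(",")}
        return "*" in candidates or _strip_weak(etag) in candidates

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # unparseable date: serve the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution
        return last_modified.replace(microsecond=0) <= since
    return False
```

since images are addressed by hash, the content behind a given URL should never change, so the hash itself makes a natural strong ETag.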
need a good README, and some install/config documentation.
reverse(), pop(), reverse() seems stupid. need a better python queue implementation.
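the standard library's `collections.deque` is probably what's wanted here: it pops from the left end in constant time, so no reversing is needed. a minimal example (the queue contents are placeholders):

```python
from collections import deque

queue = deque()
queue.append("a")  # enqueue at the right end
queue.append("b")
queue.append("c")

first = queue.popleft()  # dequeue from the left end, O(1)
```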
given a hash, a node should be able to report which nodes in the cluster have copies, what the ring looks like, maybe what scaled versions are cached, etc. mainly for debug purposes.
create a django-apomixis application to make it easier for a django project to use apomixis as its image backend. templatetags and ORM managers similar to what sorl has.
when writing, try to write to the desired number of locations. when reading, always try to read from the current location before falling back to others.
stash should only accept requests from nodes in the cluster
if a client requests an image thumbnail that's redundant (ie, asking for a thumbnail bigger than the original full size image), we should issue a 301 permanent redirect to the "canonical" size.
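a sketch of the decision, assuming sizes are compared by width only and that the "canonical" size is the original's own width. `resolve_size` is a hypothetical name; the real size syntax may be richer than a single number.

```python
def resolve_size(requested_width, original_width):
    """Return (status, width): 301 with the canonical width when the
    request is redundant, otherwise 200 with the requested width."""
    if requested_width > original_width:
        return 301, original_width
    return 200, requested_width
```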
bootstrap installs importlib because celery/amqplib requires it for 2.6. Unfortunately, python 2.7, instead of just ignoring it, balks and the bootstrap fails. So bootstrapping needs to be conditional depending on python version (yuck!)
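one way to make it conditional, sketched with a hypothetical `bootstrap_requirements` helper. `importlib` became part of the standard library in python 2.7, so the backport is only needed below that.

```python
import sys


def bootstrap_requirements(base, version_info=sys.version_info):
    """Return the packages to install for the given python version.

    The importlib backport is appended only for interpreters older
    than 2.7.
    """
    requirements = list(base)
    if tuple(version_info[:2]) < (2, 7):
        requirements.append("importlib")
    return requirements
```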
apply the circuit breaker pattern to handling of failed nodes to prevent thundering herd problems. Basically, if pinging (or writing to) a node fails, we set its last_failed and wait announce_frequency seconds to try pinging again. If it fails a second ping, we wait 2 * announce_frequency seconds before making the next attempt, then 3 * announce_frequency seconds, and so on.
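the backoff described above could look roughly like this. `NodeBreaker` and its method names are invented for the sketch; only `last_failed` and `announce_frequency` come from the description. times are plain numbers in seconds so the logic is easy to test.

```python
class NodeBreaker:
    """Tracks consecutive failures for one node and decides when the
    next ping or write attempt is allowed."""

    def __init__(self, announce_frequency):
        self.announce_frequency = announce_frequency
        self.failures = 0
        self.last_failed = None

    def record_failure(self, now):
        self.failures += 1
        self.last_failed = now

    def record_success(self):
        # a successful ping closes the breaker again
        self.failures = 0
        self.last_failed = None

    def may_attempt(self, now):
        if self.failures == 0:
            return True
        # wait grows linearly: 1x, 2x, 3x ... announce_frequency
        wait = self.failures * self.announce_frequency
        return now - self.last_failed >= wait
```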
every once in a while, walk the storage directory, calculate the sha1 of each file, verify that it matches the sha1 that it was stored as and repair/rebalance if necessary.
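a sketch of the verification pass. it assumes each stored file is named after its sha1 hex digest plus an extension, which may not match the actual on-disk layout; `find_corrupt_files` is a hypothetical name, and the repair/rebalance step is left out.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_files(storage_root):
    """Return paths whose contents no longer match the hash in their name."""
    corrupt = []
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for name in filenames:
            expected = os.path.splitext(name)[0]  # assumed naming scheme
            path = os.path.join(dirpath, name)
            if sha1_of_file(path) != expected:
                corrupt.append(path)
    return corrupt
```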
check the content type of the post request
the current gossip protocol tries to minimize intra-cluster communication somewhat by not pinging a node if that node has been seen within announce_frequency seconds. This is good most of the time but, occasionally, maybe once every 10 * announce_frequency, we should force the pings to happen.
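the forced ping could be a small addition to the existing check. this is a sketch with made-up names (`should_ping`, `last_forced_ping`, `force_multiple`); it assumes the time of the last forced ping is tracked per node, in seconds.

```python
def should_ping(now, last_seen, last_forced_ping, announce_frequency,
                force_multiple=10):
    """Decide whether to ping a node.

    Normally a node seen within announce_frequency seconds is skipped,
    but once force_multiple * announce_frequency seconds have passed
    since the last forced ping, ping it regardless.
    """
    if now - last_forced_ping >= force_multiple * announce_frequency:
        return True
    return now - last_seen >= announce_frequency
```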
retrieve should only accept requests from nodes in the cluster
when an image is retrieved, pull it down from the first node that has a copy, but also start a background task that checks that the other N nodes that should have it do have it and attempts to repair if they don't.
status page should be locked down
For the use case of running a "cluster" that is just a single node, it should work without running a celeryd process in addition to the web process.
Currently, if there is no celeryd, the nodes table in the database will never populate with even the single self-entry. Without any entries there, stash() won't write anywhere.
The fix would probably be to populate the nodes table at some point without having to go through celery. Alternatively, stash() could detect the case that there's only a single node (itself) and just write to disk without needing to query the nodes table.
when an image is uploaded, writes to other nodes should be done as asynchronous background tasks.
This could be an option, i.e., the client could say "return as fast as possible" and everything gets done in the background, or "i'll wait" and get a status report on how many nodes it was successfully written to, etc.
when an image is retrieved, add some HTTP headers with the actual height/width so a smart client could make use of them
cache the rings, to avoid having to re-calculate them each time.
cache is invalidated any time a node enters or leaves (but we can leave the cache alone otherwise)
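a sketch of a ring cache with that invalidation rule. the ring construction here (sorting nodes by the sha1 of their id) is only a stand-in for however apomixis actually builds its rings, and `RingCache` is a hypothetical class.

```python
import hashlib


class RingCache:
    """Caches the computed ring and drops it only when membership changes."""

    def __init__(self):
        self._nodes = set()
        self._ring = None
        self.builds = 0  # counts recomputations, for debugging

    def add_node(self, node_id):
        if node_id not in self._nodes:
            self._nodes.add(node_id)
            self._ring = None  # a node entered: invalidate

    def remove_node(self, node_id):
        if node_id in self._nodes:
            self._nodes.remove(node_id)
            self._ring = None  # a node left: invalidate

    def ring(self):
        if self._ring is None:
            # placeholder ring: nodes ordered by the hash of their id
            self._ring = sorted(
                (hashlib.sha1(n.encode("utf-8")).hexdigest(), n)
                for n in self._nodes
            )
            self.builds += 1
        return self._ring
```

re-adding a node that is already present leaves the cache alone, which matches "we can leave the cache alone otherwise".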
when uploading an image, accept a list of sizes to create eagerly (probably in a celery task). The idea is that often, the client knows in advance what size thumbnails will be needed later, so future requests can be faster if these thumbs are pre-made. This will probably need to propagate to /stash.
for forward compatibility, all API calls should include the apomixis version number somehow.
announce should only accept requests from nodes in the cluster