samuelsmithhk / seasaw

0.0 0.0 0.0 29.22 MB

Python 77.80% HTML 19.71% CSS 1.57% JavaScript 0.92%

seasaw's Issues

Lazy load implementation to keep track of new changes in datastore

As a developer
I need to track the datastore for newly added images
So that the indexer and frontend servers can make informed decisions about query results

Acceptance Criteria

Addition of new data to indexer files when datastore has inserts

Create vagrantfile

As a developer
I need a vagrantfile
So that other developers can quickly spin up a working development environment.

Acceptance Criteria

After running the command 'vagrant up' in my project environment, I should have a working developer environment with my basic MVP running.

create indexer

As a developer
I need to have an indexer
So that indexing can work smoothly without intervention from actual data

Acceptance criteria:

Able to create .pickle files on developer's working environment

Download images to form zip file

As a developer
I need a program to download images for the datastore and zip them
So that I can access the visual recognition service of Bluemix to obtain classifiers.

Acceptance Criteria

Before I start the visual recognition service, I should have zip files with at least 20 images per zip
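A minimal sketch of the zipping step, assuming the images are already on local disk. The function name `zip_images`, the archive naming scheme, and the choice to hold back a trailing group of fewer than 20 images are illustrative assumptions, not the project's actual code.

```python
import zipfile
from pathlib import Path

MIN_IMAGES_PER_ZIP = 20  # threshold taken from the acceptance criteria


def zip_images(image_paths, out_dir, batch_size=MIN_IMAGES_PER_ZIP):
    """Write image_paths into zip files holding batch_size images each.

    A trailing group smaller than batch_size is not zipped; its paths are
    returned so the caller can hold them until more images arrive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [Path(p) for p in image_paths]
    archives = []
    full = len(paths) - len(paths) % batch_size  # images that fill whole zips
    for start in range(0, full, batch_size):
        archive = out_dir / f"images_{start // batch_size:04d}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in paths[start:start + batch_size]:
                zf.write(path, arcname=path.name)
        archives.append(archive)
    return archives, paths[full:]
```

Returning the leftover paths instead of writing an undersized zip keeps every archive at or above the 20-image minimum.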

Connect results database to the project

As a developer
I need to connect the results database to the project
So that I can persist the results from my data collection process.

Acceptance Criteria
When I query the datastore API, it should return results based on the data currently stored in the database

Get project setup

As a developer
I need a working project
So that I can begin my development of features

Assumptions

The work will be primarily done in Python
We will be using git for SCM and issue tracking
We will use travis-ci for continuous integration

Acceptance criteria

When I pull from git, I will have a project structure
When I push to git, a travis-ci build will run. If it passes, I will be allowed to make a pull request

datasource api only running on one port

The API for the datasource is only running on one port, but it is supposed to run on two

Create Swagger page for data collection api

As a developer
I need documentation for usage of APIs
So that I can use the APIs to work out how to interact with the data collection component.

Acceptance criteria
When I hit the url http://192.168.33.10:25280/doc (after a vagrant up) I should see the documentation for the component's API

Build a scraper for Youtube

As a user
I need data from Youtube
So that I can use that to build my indexes

Assumptions / Acceptance Criteria

The scraper will be initiated with the application, provided the datasource is enabled
In the inventory, there will be a list of seed searches, one will be chosen at random on startup. A result will be picked at random from this search, and then the scraper will go through the related videos of that result.
The scraper needs to ensure that it does not scrape the same video twice, and it will not stop while the datasource is running
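The crawl described above can be sketched as a walk over related videos with a set of already-seen IDs. `search` and `related` are hypothetical callables standing in for whatever YouTube lookups the scraper uses, and `max_videos` is an optional cap added only so the sketch can terminate in a test.

```python
import random


def crawl(seed_searches, search, related, handle, max_videos=None, rng=random):
    """Walk related videos from a random seed, never handling a video twice.

    search(query) -> list of video IDs and related(video_id) -> list of
    video IDs are hypothetical callables; handle(video_id) does the scraping.
    """
    seen = set()
    # Pick one seed search at random, then one result of that search at random.
    frontier = [rng.choice(search(rng.choice(seed_searches)))]
    while frontier and (max_videos is None or len(seen) < max_videos):
        video_id = frontier.pop()
        if video_id in seen:
            continue  # reached again through another video's related list
        seen.add(video_id)
        handle(video_id)
        frontier.extend(v for v in related(video_id) if v not in seen)
    return seen
```

With `max_videos=None` the loop only ends when no unseen related videos remain; a long-running scraper would presumably pick a fresh seed at that point, which this sketch leaves out.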

Bring tag data into the API

As a developer
I need the tags for the video as a part of the api
So that I can show the user more easily the reason for their results

Acceptance Criteria

When I hit the API with a request for a result, if that result has been processed by the indexer, the response should contain a property "tags", which will be a simple list of tag strings
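One way the response could be assembled, as a sketch only. The shapes of `result` (a dict with an "id" key) and `tags_by_result_id` (a mapping from result ID to the indexer's tags) are assumptions; only the "tags" property itself comes from the acceptance criteria.

```python
def build_result_response(result, tags_by_result_id):
    """Return the API payload for one result, adding "tags" when indexed.

    Both argument shapes are assumed for illustration; results the indexer
    has not processed yet are returned without a "tags" property.
    """
    payload = dict(result)  # copy so the stored result is not mutated
    tags = tags_by_result_id.get(result["id"])
    if tags is not None:
        payload["tags"] = [str(tag) for tag in tags]
    return payload
```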

Optimize visual Recognition to avoid connection issues

As a developer
I need to create zip file and delete images saved locally
So that we can overcome redundancy

Acceptance Criteria:

Once the zip file has been passed to the cloud visual recognition service and the service responds with 200, the zip file and the local images it contains are no longer needed and are deleted
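A minimal sketch of the cleanup rule, assuming the upload is wrapped in a callable that returns the HTTP status code. `upload` is hypothetical and stands in for the real request to the visual recognition service.

```python
import os


def upload_then_clean(zip_path, image_paths, upload):
    """Upload zip_path and delete local files only after an HTTP 200.

    upload(zip_path) -> int status code is a hypothetical callable wrapping
    the request to the visual recognition service.
    """
    status = upload(zip_path)
    if status != 200:
        return False  # keep everything so the upload can be retried
    for path in [zip_path, *image_paths]:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up
    return True
```

Deleting only after a 200 means a failed upload never costs the local copies.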

Optimize Indexer for real time data

As a developer
I need to continuously update Inverted index and IDF
So that frontend can index based on latest data available

Assumptions / Acceptance Criteria

Every time the indexer is invoked, the pickles should be rewritten with both the new and the old data.
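The rewrite-with-old-and-new-data behaviour might look like the sketch below. The single-pickle layout (`{"postings": ..., "doc_ids": ...}`), the function name, and the plain `log(N / df)` form of IDF are assumptions for illustration; the project may store separate pickles or use a smoothed IDF.

```python
import math
import pickle
from collections import defaultdict
from pathlib import Path


def update_index(index_path, new_docs):
    """Merge new_docs into the pickled inverted index and recompute IDF.

    new_docs maps a document ID to its list of terms (e.g. tags). The
    pickle layout used here is an assumption for this sketch.
    """
    index_path = Path(index_path)
    if index_path.exists():
        with index_path.open("rb") as fh:
            state = pickle.load(fh)  # old data from earlier runs
    else:
        state = {"postings": {}, "doc_ids": set()}

    postings = defaultdict(set, state["postings"])
    doc_ids = set(state["doc_ids"])
    for doc_id, terms in new_docs.items():
        doc_ids.add(doc_id)
        for term in terms:
            postings[term].add(doc_id)

    # Plain IDF: log(N / df). Other variants add smoothing.
    idf = {term: math.log(len(doc_ids) / len(ids))
           for term, ids in postings.items()}

    with index_path.open("wb") as fh:
        pickle.dump({"postings": dict(postings), "doc_ids": doc_ids}, fh)
    return dict(postings), idf
```

IDF is recomputed from scratch on each call because adding any document changes N for every term.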

create a skeleton project for data collector

As a developer
I need a skeleton project
So that I can begin work on the data collection component of the project

Acceptance Criteria

I should have a Python file that, when started, provides a basic running HTTP service

Create database for storing of data collection results

As a developer
I need a place to store my results
So that we can persist them for later use by the indexer and front end

Assumptions
A simple, single-table relational database will be enough for our problem. It will need to be stored in the cloud so that other developers do not need to download the entire db with the repo.

Create API For Data Collector

As a developer
I need an API for the datastore
So that I can access the datastore for indexing purposes.

Design

/results will allow me to get a bunch of result IDs
/results/ (GET) will get me the result - containing title, url, and urls to the frames
/doc will bring up the documentation so I know how to use the above tools

Acceptance Criteria

When I hit the urls above, I should see what is described.

Notes

Datastore is not yet implemented, so the results will be fake.

Cloud storage of index files

As a developer
I need the index files to be persisted in the cloud
So that I can run the indexer from any machine and get a globally-consistent index (well, close-to).

Acceptance Criteria

When we start up the project, the indexer should first reach out to the cloud to get the latest version of the index.
As the indexer runs, it should periodically write to the cloud
Before writing, it should first check whether there are any changes in the cloud version, and merge the two.
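The merge step could be as simple as a per-term union, sketched here under the assumption that an index maps each term to a set of result IDs. The issue does not say how conflicts should be resolved, so the union rule is an assumption.

```python
def merge_indexes(local, remote):
    """Union two inverted indexes of the form {term: set of result IDs}.

    Neither input is modified; the merged index is returned as a new dict.
    """
    merged = {term: set(ids) for term, ids in remote.items()}
    for term, ids in local.items():
        merged.setdefault(term, set()).update(ids)
    return merged
```

A union never loses entries, which suits an append-only index, but it cannot propagate deletions; that would need extra bookkeeping.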

Proxy not working in Python3

Upgrading to Python3, the proxy is throwing an error.

"TypeError: 'str' does not support the buffer interface"

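This message is what older Python 3 releases raise when a `str` is passed where bytes are expected, for example to a socket or a file opened in binary mode (newer releases word it as "a bytes-like object is required, not 'str'"). The issue does not show where in the proxy the error occurs, so the following is only a sketch of the usual fix: encode text before writing it to the socket. `send_text` is a hypothetical helper.

```python
import socket


def send_text(sock, text, encoding="utf-8"):
    """Encode text to bytes before sending; Python 3 sockets reject str."""
    sock.sendall(text.encode(encoding))


# socket.socketpair() returns two connected sockets, so this runs offline.
left, right = socket.socketpair()
try:
    send_text(left, "GET / HTTP/1.0\r\n\r\n")
    received = right.recv(1024)
finally:
    left.close()
    right.close()
```

Data read back with `recv` is likewise bytes and needs `.decode()` before being treated as text.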
Optimize imagedownloader to download images in parallel rather than sequentially

As a developer
I need the imagedownloader to download images in parallel
So that when I run image downloader the images are ready to be sent to bluemix

Acceptance Criteria

Multiprocessing download phase of image downloader for parallel execution
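A sketch of a concurrent download phase. Note that it swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`) rather than the multiprocessing the issue names, on the assumption that downloading is I/O-bound; `fetch` and `download_all` are illustrative names, not the project's.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch every URL concurrently; returns {url: bytes or Exception}."""
    urls = list(urls)  # allow generators, since urls is iterated twice

    def safe(url):
        try:
            return fetch_one(url)
        except Exception as exc:  # keep one failure from aborting the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe, urls)))
```

Passing `fetch_one` as a parameter lets the pool logic be exercised without network access.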

Front end design

As a developer
I need a working front end design
So that the UI and UX can be previewed and testing can be done when the other components are finished

Acceptance Criteria:

a working webpage/webapp
video playing API figured out

Handling already processed videos from reloading (Database)

As a developer
I need already-processed videos to be recorded in the database
So that I do not reload the indexer with same data

Acceptance Criteria

When the indexer loads for the first time, it should load the already-processed videos from the database
Videos that have already been processed shouldn't be fetched again
After every 10 videos, the indexer should update the database with the new videos added to the indexer
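The three criteria above can be sketched as a small tracker. `load_ids` and `save_ids` are hypothetical callables standing in for the database read and write, and the class name is illustrative.

```python
class ProcessedTracker:
    """Remember processed video IDs and persist new ones in batches."""

    def __init__(self, load_ids, save_ids, flush_every=10):
        # load_ids() -> iterable of IDs and save_ids(list_of_ids) are
        # hypothetical callables wrapping the database.
        self._seen = set(load_ids())
        self._pending = []
        self._save_ids = save_ids
        self._flush_every = flush_every

    def should_process(self, video_id):
        return video_id not in self._seen

    def mark_processed(self, video_id):
        if video_id in self._seen:
            return
        self._seen.add(video_id)
        self._pending.append(video_id)
        if len(self._pending) >= self._flush_every:
            self.flush()

    def flush(self):
        """Write any pending IDs to the database."""
        if self._pending:
            self._save_ids(list(self._pending))
            self._pending.clear()
```

Calling `flush()` on shutdown would avoid losing up to nine unsaved IDs between batches.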

Set up vagrant

As a developer
Get vagrant running

Acceptance Criteria

After running the command 'vagrant up' in my project environment, I should have a working developer environment.

a function that returns properly regenerated video from frame images

Processing PNG files

As a developer
I need the indexer to handle png files
So that all frames are indexed without loss of data for any result_id

Acceptance Criteria

All png files should be processed by bluemix visual service
png files that are greater than 2 MB are not processed by bluemix
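A sketch of separating out the oversized frames before upload, so they can be handled some other way instead of being silently dropped. The 2 MB figure comes from the issue and is read here as 2 MiB, which is an assumption; `split_by_size` is an illustrative name.

```python
from pathlib import Path

MAX_PNG_BYTES = 2 * 1024 * 1024  # "2 MB" from the issue, read as 2 MiB


def split_by_size(png_paths, limit=MAX_PNG_BYTES):
    """Return (within_limit, too_big) lists of paths based on file size."""
    within_limit, too_big = [], []
    for path in map(Path, png_paths):
        if path.stat().st_size > limit:
            too_big.append(path)
        else:
            within_limit.append(path)
    return within_limit, too_big
```

What to do with the `too_big` list (for example, re-encoding or downscaling) is left open, since the issue does not specify it.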
