seasaw's Issues
Lazy load implementation to keep track of new changes in datastore
As a developer
I need to keep track of the datastore for additions of new images
So that the indexer and frontend servers can make informed decisions about query results
Acceptance Criteria
- New data is added to the indexer files when the datastore has inserts
Create vagrantfile
As a developer
I need a vagrantfile
So that other developers can quickly spin up a working development environment.
Acceptance Criteria
- After running the command 'vagrant up' in my project environment, I should have a working developer environment with my basic MVP running.
create indexer
As a developer
I need to have an indexer
So that indexing can work smoothly without intervention from actual data
Acceptance criteria:
- The indexer is able to create .pickle files in the developer's working environment
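As a rough illustration of the acceptance criterion above, the sketch below builds a small inverted index and writes it to a .pickle file. The function names and the input shape (a dict of result IDs to tag lists) are assumptions made for this example, not the project's actual code.

```python
import pickle
from collections import defaultdict


def build_inverted_index(tags_by_result):
    """Map each lower-cased tag to the sorted list of result IDs carrying it."""
    index = defaultdict(set)
    for result_id, tags in tags_by_result.items():
        for tag in tags:
            index[tag.lower()].add(result_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def save_index(index, path):
    """Write the index to a .pickle file (hypothetical file layout)."""
    with open(path, "wb") as handle:
        pickle.dump(index, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Read an index back from a .pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.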
Download images to form zip file
As a developer
I need a program to download images for the datastore and zip them
So that I can access the visual recognition service of Bluemix to obtain classifiers.
Acceptance Criteria
- Before I start the visual recognition service, I should have a zip file with at least 20 images per zip
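The zipping step could look roughly like the sketch below. It assumes the images are already on disk; `zip_images` and `MIN_IMAGES_PER_ZIP` are names invented for this example, and only the threshold of 20 comes from the criterion above.

```python
import zipfile
from pathlib import Path

MIN_IMAGES_PER_ZIP = 20  # taken from the acceptance criterion above


def zip_images(image_paths, zip_path, minimum=MIN_IMAGES_PER_ZIP):
    """Bundle image files into one zip, refusing batches that are too small."""
    paths = [Path(p) for p in image_paths]
    if len(paths) < minimum:
        raise ValueError(f"need at least {minimum} images, got {len(paths)}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store files flat inside the archive, without local directories.
            archive.write(path, arcname=path.name)
    return zip_path
```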
Connect results database to the project
As a developer
I need to connect the results database to the project
So that I can persist the results from my data collection process.
Acceptance Criteria
When I query the datastore API, it should return results based on the data currently stored in the database
Get project setup
As a developer
I need a working project
So that I can begin my development of features
Assumptions
- The work will be primarily done in Python
- We will be using git for SCM and issue tracking
- We will use travis-ci for continuous integration
Acceptance criteria
- When I pull from git, I will have a project structure
- When I push to git, a travis-ci build will run. If it passes, I will be allowed to make a pull request
datasource api only running on one port
The API for the datasource is only running on one port, but it is supposed to run on two
Create Swagger page for data collection api
As a developer
I need documentation for usage of APIs
So that I can use the APIs to work out how to interact with the data collection component.
Acceptance criteria
When I hit the url http://192.168.33.10:25280/doc (after a vagrant up) I should see the documentation for the component's API
Build a scraper for Youtube
As a user
I need data from Youtube
So that I can use that to build my indexes
Assumptions / Acceptance Criteria
- The scraper will be initiated with the application, provided the datasource is enabled
- In the inventory, there will be a list of seed searches, one will be chosen at random on startup. A result will be picked at random from this search, and then the scraper will go through the related videos of that result.
- The scraper needs to make sure it does not scrape the same video twice, and it will not stop while the datasource is running
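The crawl described above can be sketched as a small generator. The `search`, `related`, and `is_running` callables are placeholders supplied by the caller (no real YouTube API is used here), and the breadth-first order is an assumption.

```python
import random
from collections import deque


def crawl(seed_searches, search, related, is_running, rng=random):
    """Yield video IDs not seen before, starting from a random seed search.

    search(query) and related(video_id) are hypothetical callables that
    return lists of video IDs; is_running() reports whether the datasource
    is still enabled.
    """
    seen = set()
    results = search(rng.choice(seed_searches))
    if not results:
        return
    queue = deque([rng.choice(results)])
    while queue and is_running():
        video_id = queue.popleft()
        if video_id in seen:
            continue  # never scrape the same video twice
        seen.add(video_id)
        yield video_id
        queue.extend(v for v in related(video_id) if v not in seen)
```

Unlike the behaviour described in the issue, this sketch ends when the related-video queue runs dry; a long-running scraper would presumably pick a new seed search at that point.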
Bring tag data into the API
As a developer
I need the tags for the video as a part of the api
So that I can show the user more easily the reason for their results
Acceptance Criteria
- When I hit the API with a request for a result, if that result has been processed by the indexer, the response should contain a property "tags", which will be a simple list of tag strings
Optimize visual Recognition to avoid connection issues
As a developer
I need to create zip file and delete images saved locally
So that we can overcome redundancy
Acceptance Criteria:
- Once the zip file has been passed to the cloud visual recognition service, the zip file and the images in it are no longer needed and are deleted after a 200 response from the cloud service
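A minimal sketch of that cleanup rule, assuming the caller already has the HTTP status code of the upload. The function name and arguments are hypothetical.

```python
import os


def cleanup_after_upload(status_code, zip_path, image_paths):
    """Delete the zip and its source images only after a 200 response.

    Returns True when files were removed, False when they were kept.
    """
    if status_code != 200:
        return False  # keep everything so the upload can be retried
    for path in [zip_path, *image_paths]:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone, nothing to do
    return True
```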
Optimize Indexer for real time data
As a developer
I need to continuously update the inverted index and IDF
So that the frontend can index based on the latest data available
Assumptions / Acceptance Criteria
- Every time the indexer is invoked, the pickles should be rewritten with both the new data and the old data.
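One way to combine old and new data before rewriting the pickles is sketched below. The issue does not state an IDF formula, so the common `log(N / df)` form is an assumption here, as are the function names and the index shape (tag mapped to a list of result IDs).

```python
import math


def merge_postings(old_index, new_index):
    """Union the result IDs per tag from the old and the new index."""
    merged = {tag: set(ids) for tag, ids in old_index.items()}
    for tag, ids in new_index.items():
        merged.setdefault(tag, set()).update(ids)
    return {tag: sorted(ids) for tag, ids in merged.items()}


def compute_idf(index):
    """Compute log(N / df) per tag, where N is the number of distinct results."""
    documents = {result_id for ids in index.values() for result_id in ids}
    total = len(documents)
    # Tags with no results are skipped to avoid dividing by zero.
    return {tag: math.log(total / len(ids)) for tag, ids in index.items() if ids}
```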
create a skeleton project for data collector
As a developer
I need a skeleton project
So that I can begin work on the data collection component of the project
Acceptance Criteria
- I should have a Python file that, when started, provides a basic running HTTP service
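A skeleton of this kind can be put together with the standard library alone. The handler below and its JSON body are illustrative assumptions, not the project's actual service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class SkeletonHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a small JSON status document."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=0):
    """Create a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), SkeletonHandler)
```

Calling `make_server(8080).serve_forever()` would start the service on a fixed port; the port number is only an example.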
Create database for storing of data collection results
As a developer
I need a place to store my results
So that we can persist them for later use by the indexer and front end
Assumptions
A simple, single-table relational database will be enough for our problem. It will need to be stored in the cloud so that other developers do not need to download the entire db with the repo.
Create API For Data Collector
As a developer
I need an API for the datastore
So that I can access the datastore for indexing purposes.
Design
- /results will allow me to get a bunch of result IDs
- /results/ (GET) will get me the result - containing title, url, and urls to the frames
- /doc will bring up the documentation so I know how to use the above tools
Acceptance Criteria
- When I hit the urls above, I should see what is described.
Notes
- Datastore is not yet implemented, so the results will be fake.
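The design above, with fake results as the note says, might be prototyped as a plain routing function before any HTTP framework is involved. The path for a single result is assumed here to take the form `/results/<id>`, and the fake record is invented for illustration.

```python
# Fake data standing in for the datastore, which is not implemented yet.
FAKE_RESULTS = {
    "1": {
        "title": "Example video",
        "url": "http://example.com/videos/1",
        "frames": ["http://example.com/frames/1/0.png"],
    },
}


def route(path):
    """Return (status, payload) for a GET request path."""
    parts = [part for part in path.split("/") if part]
    if parts == ["results"]:
        return 200, sorted(FAKE_RESULTS)  # list of result IDs
    if len(parts) == 2 and parts[0] == "results":
        result = FAKE_RESULTS.get(parts[1])
        if result is None:
            return 404, {"error": "unknown result"}
        return 200, result
    if parts == ["doc"]:
        return 200, {"endpoints": ["/results", "/results/<id>", "/doc"]}
    return 404, {"error": "not found"}
```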
Cloud storage of index files
As a developer
I need the index files to be persisted in the cloud
So that I can run the indexer from any machine and get a globally-consistent index (well, close-to).
Acceptance Criteria
- When we start up the project, the indexer should first reach out to the cloud to get the latest version of the index.
- As the indexer runs, it should periodically write to the cloud
- Before writing, it should first check whether there are any changes in the cloud version, and merge the two.
Proxy not working in Python3
Upgrading to Python3, the proxy is throwing an error.
"TypeError: 'str' does not support the buffer interface"
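This message usually means a `str` was handed to an API that expects bytes, which Python 2 tolerated and Python 3 does not. The proxy code is not shown in the issue, so the sketch below only illustrates the general fix on a plain socket; the helper names are hypothetical.

```python
import socket


def send_text(sock, text, encoding="utf-8"):
    """Encode text to bytes before writing it to the socket."""
    sock.sendall(text.encode(encoding))


def recv_text(sock, size=4096, encoding="utf-8"):
    """Read up to size bytes from the socket and decode them to text."""
    return sock.recv(size).decode(encoding)
```

Passing the string directly, as in `sock.sendall("GET / HTTP/1.1\r\n")`, raises a `TypeError` on Python 3; newer versions word it as "a bytes-like object is required, not 'str'".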
Optimize imagedownloader to download images in parallel rather than sequentially
As a developer
I need the imagedownloader to download images in parallel
So that when I run image downloader the images are ready to be sent to bluemix
Acceptance Criteria
- The download phase of the image downloader uses multiprocessing for parallel execution
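The parallel download phase could be sketched as below. Note that this example uses a thread pool rather than multiple processes, since downloads are I/O-bound; `ProcessPoolExecutor` exposes the same `map` interface if processes are preferred. The `fetch` callable is a placeholder for whatever downloads a single image.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=8):
    """Run fetch(url) for every URL concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

If any `fetch` call raises, the exception surfaces when its result is collected, so a real downloader would probably catch errors per URL.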
Front end design
As a developer
I need a working front end design
So that the UI and UX can be previewed and testing can be done when other components are finished
Acceptance Criteria:
- a working webpage/webapp
- video playing API figured out
Prevent already processed videos from reloading (Database)
As a developer
I need the videos already processed to be inserted into the database
So that I do not reload the indexer with the same data
Acceptance Criteria
- While loading the indexer for the first time, it should load the videos already processed from the database
- Videos already processed shouldn't be fetched
- After 10 videos, the indexer should update the database with the new videos added to the indexer
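The three criteria above can be sketched with a small bookkeeping class. SQLite stands in for the project's database here, and the table name, class name, and batching behaviour are assumptions for illustration.

```python
import sqlite3

BATCH_SIZE = 10  # from the acceptance criterion above


class ProcessedStore:
    """Track processed video IDs and persist them in batches."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS processed (video_id TEXT PRIMARY KEY)"
        )
        # Load everything processed in earlier runs on first start.
        rows = self.connection.execute("SELECT video_id FROM processed")
        self.known = {row[0] for row in rows}
        self.pending = []

    def should_process(self, video_id):
        return video_id not in self.known

    def mark_processed(self, video_id):
        self.known.add(video_id)
        self.pending.append(video_id)
        if len(self.pending) >= BATCH_SIZE:
            self.flush()

    def flush(self):
        """Write the pending IDs to the database and clear the buffer."""
        self.connection.executemany(
            "INSERT OR IGNORE INTO processed (video_id) VALUES (?)",
            [(video_id,) for video_id in self.pending],
        )
        self.connection.commit()
        self.pending.clear()
```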
Set up vagrant
As a developer
Get vagrant running
Acceptance Criteria
After running the command 'vagrant up' in my project environment, I should have a working developer environment.
Bug fix for handling frames that are empty in results fetched
Situation: ImageDownloader would give an error while trying to download a frame that is in fact not present.
Bug Fix: ImageDownloader should only download frames that are fetched by results/result_id
Set up Visual recognition using bluemix
As a developer
I need to set up Bluemix visual recognition
So that I can get visual concept tags from the images, which can be used by the indexer
Acceptance Criteria
Correct identification of visual concepts and accumulation of data
Video regeneration
As a developer
I need a video regenerator function
So that videos can be generated from the frames that were processed and provide preview of the entire video for the user
Acceptance Criteria
- a function that returns properly regenerated video from frame images
Processing PNG files
As a developer
I need the indexer to handle png files
So that all frames are indexed without loss of data for any result_id
Acceptance Criteria
- All png files should be processed by bluemix visual service
- png files that are greater than 2 MB are not processed by bluemix
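A size check of this kind could be done before upload, as in the sketch below. Whether the service counts 2 MB as 2 * 1024 * 1024 bytes is an assumption, and the function name is hypothetical.

```python
import os

MAX_PNG_BYTES = 2 * 1024 * 1024  # assumed meaning of the 2 MB limit


def split_by_size(paths, limit=MAX_PNG_BYTES):
    """Split file paths into (accepted, skipped) by size on disk."""
    accepted, skipped = [], []
    for path in paths:
        if os.path.getsize(path) > limit:
            skipped.append(path)  # too large for the service
        else:
            accepted.append(path)
    return accepted, skipped
```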