esizikova / rips-shoah-2012
Project done as part of a summer internship at RIPS UCLA with the Shoah Foundation
Pass the csv file into a dictionary
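A minimal sketch of one way to do this in Python 3 with the standard csv module. The column names (TermID, Label) and the sample rows are placeholders for illustration; the real file layout is not described in this issue.

```python
import csv
import io


def csv_to_dict(handle, key_column):
    """Map each row's key_column value to the full row (itself a dict)."""
    reader = csv.DictReader(handle)
    return {row[key_column]: row for row in reader}


# Hypothetical sample data standing in for the real csv file.
sample = io.StringIO("TermID,Label\n101,camps\n102,ghettos\n")
terms = csv_to_dict(sample, "TermID")
print(terms["101"]["Label"])
```

For a file on disk, passing open(path, newline="", encoding="utf-8") instead of the StringIO object should work the same way. If the key column has repeated values, later rows overwrite earlier ones.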
In order to properly evaluate the results, it is important to know how many videos are assigned to each keyword and whether there is any overlap between keywords. This is probably an SQL issue.
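A rough sketch of how both counts could be obtained with SQL from Python, assuming the assignments live in a two-column table. The table name video_keyword, its columns, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (video, keyword) assignment.
conn.execute("CREATE TABLE video_keyword (video_id INTEGER, keyword TEXT)")
conn.executemany(
    "INSERT INTO video_keyword VALUES (?, ?)",
    [(1, "camps"), (2, "camps"), (2, "ghettos"), (3, "ghettos"), (3, "camps")],
)

# Number of distinct videos assigned to each keyword.
counts = dict(
    conn.execute(
        "SELECT keyword, COUNT(DISTINCT video_id) "
        "FROM video_keyword GROUP BY keyword"
    )
)

# Number of videos shared by each pair of keywords (self-join on video_id).
overlap = conn.execute(
    "SELECT a.keyword, b.keyword, COUNT(*) "
    "FROM video_keyword a "
    "JOIN video_keyword b "
    "  ON a.video_id = b.video_id AND a.keyword < b.keyword "
    "GROUP BY a.keyword, b.keyword"
).fetchall()

print(counts)
print(overlap)
```

The pair count assumes each (video, keyword) assignment appears only once in the table; duplicates would inflate it.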
Imanol strongly believes that it would be easiest for us to convert some data files from csv to SQLite so that we can use them in Python or other analysis tools.
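One possible way to do the conversion with only the Python 3 standard library is sketched below. The function name, table name, and sample rows are assumptions, and every column is stored as TEXT for simplicity.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(handle, conn, table):
    """Create `table` from the csv header row and insert all remaining rows."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Hypothetical sample data; the real files are not described in this issue.
sample = io.StringIO("TermID,Label\n101,camps\n102,ghettos\n")
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(sample, conn, "terms")
rows = conn.execute('SELECT "TermID", "Label" FROM terms ORDER BY "TermID"').fetchall()
print(rows)
```

Replacing ":memory:" with a file path would write a database file that other tools can open. Rows whose length differs from the header would raise an error, so ragged files would need cleaning first.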
We will probably need several metrics that measure different degrees of "difference"
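As a starting point, here are two candidate metrics, sketched under the assumption that "difference" means the difference between two lists of search results (for example, the results for an original query and for its translation). Both function names and the video IDs are illustrative only.

```python
def jaccard_distance(results_a, results_b):
    """0.0 when both result sets are identical, 1.0 when they share nothing."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def overlap_at_k(ranked_a, ranked_b, k):
    """Fraction of the top-k positions that the two ranked lists share."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


# Hypothetical ranked result lists for one query in two languages.
original = ["v1", "v2", "v3", "v4"]
translated = ["v2", "v1", "v5", "v4"]

print(jaccard_distance(original, translated))
print(overlap_at_k(original, translated, 2))
```

The first metric ignores ranking entirely, while the second only looks at the top of each list, so together they capture two different degrees of difference.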
We have to have the Lucene output for the top 226 queries that @imanol907 has kindly hand translated.
Translating locations will be a big problem, I think, since Google Translate does not recognize many of them as words from a particular language. We should check how last year's group dealt with this.
Report Coordinator responsibility
As most of the programming will be in Python, it will be important for everyone to become familiar with it.
As Roja suggested, we might be able to add context to our project to help us with the translation. There might be an API for Wikipedia or some other application that we could use.
We need to have Java code that will input the TermID, Search Label, Label etc. and provide it in a suitable format for Indexing/Lucene manipulations.
Revise the work statement and submit to Mike for approval
Create an easily navigated text file for the Shoah Thesaurus
self-explanatory
A project will help us keep track of who codes what, and also put all the code in an easy-to-access format.
Research the thesaurus standard used in the Shoah database as suggested by the sponsors.
Self Explanatory
@christiequaranta assisting
Since the 7/1 meeting with sponsors, we need to redesign the way we represent search vectors and what we will evaluate them on
Make it clear what kind of goals we will be pursuing as of the meeting.
How to access Lucene's indexing and searching capabilities. The goal is for everyone to get a good grip on Lucene.
self explanatory
Python gives a lot of encoding errors when pasting translated text to a text file. Eric fixed the initial Spanish translation, but for other languages, especially with more difficult alphabets, the problem will be worse. We have to figure out a way that would avoid this.
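One approach that may avoid these errors is to always name the encoding explicitly when writing and reading the text files. The sketch below assumes Python 3 and UTF-8; the file name, the tab-separated layout, and the sample strings are illustrative placeholders only.

```python
import os
import tempfile

# Illustrative strings in different scripts; not taken from the project data.
translations = {
    "Spanish": "campo de concentración",
    "Russian": "концентрационный лагерь",
    "Mandarin Chinese": "集中营",
}


def save_translations(path, mapping):
    """Write one 'language<TAB>text' line per entry, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        for language, text in mapping.items():
            out.write(f"{language}\t{text}\n")


def load_translations(path):
    """Read the file back with the same explicit encoding."""
    with open(path, encoding="utf-8") as src:
        return dict(line.rstrip("\n").split("\t", 1) for line in src)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "translations.txt")
    save_translations(path, translations)
    restored = load_translations(path)

print(restored == translations)
```

Relying on the platform's default encoding is a common cause of such errors, which is why the encoding argument is passed in both directions here.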
Leo to set:
Dependent on #2
Transfer stuff, including report templates, etc, from Windows to Linux.
Currently, the proposed languages are: Spanish, German, Russian, Persian, Swahili, Arabic, Mandarin Chinese. (Possibly add more)
After Mike does his review as in #1, send to Shoah
self explanatory
Work through Lucene code, initialization and be able to make simple queries that would return the related terms in an easy format.
Make a visual transcript for everyone from our internal meeting.
Thursday, July 5
Git has a powerful way of sharing code with version control, and it would be a great idea to implement this in our project.
We agreed with @eschwartz1991 that I'll finish Russian and German by Friday (July 12) morning, then Eric does Persian and Swahili during Friday (July 12), and Elena runs Arabic and Mandarin Chinese over the weekend, if no other bugs arise.
Filter the queries from the beginning of time in order to obtain a list of 100 queries
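The filtering criterion is not spelled out here. The sketch below assumes the list should hold the 100 most frequent queries after trimming whitespace and ignoring case; the function name and sample log are hypothetical.

```python
from collections import Counter


def top_queries(query_log, n=100):
    """Return the n most frequent queries, normalized to lowercase."""
    normalized = (q.strip().lower() for q in query_log if q.strip())
    return [query for query, _ in Counter(normalized).most_common(n)]


# Hypothetical query log; blank entries are skipped.
log = ["camps", "Camps ", "ghetto", "", "camps", "ghetto", "liberation"]
print(top_queries(log, n=2))
```

Queries with equal counts keep the order in which they first appear in the log, so the cut-off at 100 may be somewhat arbitrary among ties.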
self-explanatory