cegme / grisham
We are looking at ways of modeling paper and user preferences.
Can we use Lucene or Solr for our text search?
Can we do this while still respecting our algorithms (e.g. weighting results using the user model)?
The current graph visualization code is broken.
Additionally, it uses arbor.js.
Let's explore d3.js at http://d3js.org/ because it is the more popular package.
We need to make sure the d3.js visualization is dynamic and can respond to events such as clicks.
Here is a problem for you @virup @clintpgeorge @supriyan:
Given two papers, how would you write a db query to find the "shortest" path between them?
Vertices are papers and edges are citations/references. We can think of the edges as being undirected. It is certainly possible that no path exists between two papers.
Can you implement a solution to this?
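One possible starting point, sketched in Python rather than as a db query: load the rows of the references table as (citing, cited) pairs, treat them as undirected edges, and run a breadth-first search. The function names and the pair layout are assumptions made for illustration, not the project's actual schema.

```python
from collections import defaultdict, deque


def build_graph(references):
    """Build an undirected adjacency map from (citing_id, cited_id) pairs."""
    graph = defaultdict(set)
    for citing, cited in references:
        graph[citing].add(cited)
        graph[cited].add(citing)
    return graph


def shortest_path(graph, source, target):
    """Return the list of paper ids on a shortest path, or None if no path exists."""
    if source == target:
        return [source]
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == target:
                # Walk the parent links back to the source.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical rows, standing in for the references table.
rows = [(1, 2), (2, 3), (4, 3), (5, 6)]
graph = build_graph(rows)
print(shortest_path(graph, 1, 4))
print(shortest_path(graph, 1, 5))
```

Breadth-first search finds a path with the fewest edges because every edge counts the same. The same idea could probably be expressed in the database with a recursive query, but that has not been tried here.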
Password protect the website while it is under construction. A simple .htaccess password should be sufficient.
In order to understand the data we need to look at the arrangement of papers.
We need a graph (preferably interactive) to be able to explore our current dataset.
We could use a Python package such as http://networkx.lanl.gov/.
You can get the connection info from the references table in the db.
Would anybody be able to do this ASAP?
Check this out: it shows papers by topic and has an interesting visualization model. http://cs.stanford.edu/~karpathy/nipspreview/
Here is another problem for you @virup @clintpgeorge @supriyan:
We want to calculate a global importance factor for all the papers in the data set.
This is similar to PageRank. The value of a paper, CR(p), should be the probability that I land on p if I am randomly looking for an important paper.
A paper with citations should have a higher value than a paper with no citations.
A paper with P citations should have a smaller value than a paper with G citations of citations, where |P| - |G| < sigma.
The references of a paper do not affect the paper's score. (Although we should have a self-citation penalty.)
Also, can we compute these values using SQL?
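A minimal sketch of one way to get such a score, assuming a plain PageRank-style power iteration over (citing, cited) pairs. The name citation_rank, the damping factor, and the iteration count are assumptions. The self-citation penalty is not modeled: a paper citing itself is simply ignored.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """Return a dict paper -> score, where the scores sum to 1.

    citations is an iterable of (citing, cited) pairs.
    """
    papers = set()
    out_links = {}
    for citing, cited in citations:
        papers.update((citing, cited))
        if citing == cited:
            continue  # simplification: ignore a paper citing itself
        out_links.setdefault(citing, set()).add(cited)

    n = len(papers)
    if n == 0:
        return {}

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[p] for p in papers if p not in out_links)
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited_set in out_links.items():
            share = damping * rank[citing] / len(cited_set)
            for cited in cited_set:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical data: a and b cite c, and c cites d.
scores = citation_rank([("a", "c"), ("b", "c"), ("c", "d")])
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

Score flows only from the citing paper to the cited paper, so a paper's own reference list does not raise its score, and a citation from a well-cited paper counts for more than one from an uncited paper.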
Can we add user feedback to improve/further define topics?
One way is to drag and drop the order of the words in the topic page.
This is opening up the ML black box.
Set up the database to work on the 32-core machine. Possibly switch to Greenplum instead of Postgres.
The size of a node is dependent on the number of papers that cite the paper
The length of a link is the time between the two papers.
The thickness/color of the line may be the similarity of the papers and the user model.
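As a rough illustration of the mapping above, the sketch below turns paper years and citation pairs into node and link records that a front end could consume. The field names (size, length, weight), the use of publication year as "time", and the optional similarity lookup are all assumptions.

```python
def build_graph_data(papers, citations, similarity=None):
    """Build node and link records for a citation graph.

    papers maps paper id -> publication year.
    citations is a list of (citing, cited) pairs.
    similarity optionally maps (citing, cited) -> a similarity score.
    """
    similarity = similarity or {}
    cited_by = {pid: 0 for pid in papers}
    for _citing, cited in citations:
        cited_by[cited] += 1

    # Node size grows with the number of papers that cite the paper.
    nodes = [{"id": pid, "size": 1 + cited_by[pid]} for pid in sorted(papers)]

    links = []
    for citing, cited in citations:
        links.append({
            "source": citing,
            "target": cited,
            # Link length is the time between the two papers, in years.
            "length": abs(papers[citing] - papers[cited]),
            # Thickness/color can be driven by this weight.
            "weight": similarity.get((citing, cited), 0.0),
        })
    return {"nodes": nodes, "links": links}


# Hypothetical data for illustration.
data = build_graph_data({1: 2001, 2: 2005, 3: 2010}, [(2, 1), (3, 1), (3, 2)])
print(data["nodes"])
print(data["links"])
```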
The query.php operation rank_realtime has some bugs in its calculation of KL divergence.
https://github.com/cegme/grisham/blob/master/web/query.php#L174-178
Pi and pi .sum should be around the whole query instead of just the first. #L175 should be
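For reference, here is a sketch of the textbook KL divergence in which the sum covers every term of the two distributions. It is not the code from query.php. The function name, the normalization step, and the epsilon floor are assumptions.

```python
import math


def kl_divergence(p, q, epsilon=1e-12):
    """Return KL(p || q) for two equal-length sequences of non-negative weights.

    Both inputs are normalized to sum to 1. q is floored at epsilon so that
    a zero in q does not cause a division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    p_total = sum(p)
    q_total = sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("p and q must each have a positive sum")

    divergence = 0.0
    # The sum runs over all terms, not only the first one.
    for p_i, q_i in zip(p, q):
        p_i = p_i / p_total
        q_i = max(q_i / q_total, epsilon)
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            divergence += p_i * math.log(p_i / q_i)
    return divergence


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```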
Can we allow the user to save their topic settings across sessions?
Remove all site-specific links and calls from the web site.
Can we let the user commit/undo/redo the changes to topic definitions?
Can we use the Microsoft Academic API to augment the data we have in the DB?
Can we have links to actual publications and author pages?
Can we allow users to add more than one topic during their search?
We want to allow users to explore more than one topic. We need to combine the single topic algorithms.
@supriyan Hey, would you like to make this paper a technical report? I don't see us doing any further work on it. /cc @virup @clintpgeorge
@virup @clintpgeorge @supriyan
Do you guys think we could push out a paper to SIAM DM http://www.siam.org/meetings/sdm13/ ?
The deadline is 10/12.
It would be a < 9 page paper. We would have to develop some of our ideas more and provide nice evaluation/performance numbers, but I think it is possible if you guys are up for a challenge.
Check out the themes at the bottom of that link.
Can we allow users to develop virtual topics? These are topics that are weighted combinations of existing topics.
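A small sketch of what a virtual topic could look like, assuming each existing topic is stored as a word-to-probability mapping. The name combine_topics and the data layout are hypothetical.

```python
def combine_topics(topics, weights):
    """Return a virtual topic as a weighted mixture of existing topics.

    topics maps topic name -> {word: probability}.
    weights maps topic name -> non-negative weight; the weights are
    normalized here, so they do not need to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    combined = {}
    for name, weight in weights.items():
        for word, prob in topics[name].items():
            combined[word] = combined.get(word, 0.0) + (weight / total) * prob
    return combined


# Hypothetical topics for illustration.
topics = {
    "db": {"query": 0.6, "index": 0.4},
    "ml": {"model": 0.7, "query": 0.3},
}
virtual = combine_topics(topics, {"db": 1, "ml": 3})
print(virtual)
```

If every input topic is a probability distribution over words, the mixture is one too, so it could be passed to the same ranking code as a single topic.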
Move the current website from neo to dsr.cise.ufl.edu/demo/Gresham
Keyword search does not include user topics.
The keyword search should include the user preferences specified by the User Model.