cegme / grisham

We are looking at ways of modeling paper and user preferences.

Makefile 0.01% Python 0.34% C 0.08% TeX 17.98% Shell 0.13% C++ 0.13% PHP 1.76% HTML 29.67% CSS 1.18% JavaScript 48.65% PLpgSQL 0.06%

grisham's People

Forkers

clintpgeorge jlederluis phanijella

grisham's Issues

Use Lucene/Solr for text search

Can we use Lucene or Solr for our text search?
Can we do this while still respecting our algorithms (e.g., weighting results using the user model)?
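Here is a minimal sketch of one way to keep the user model in the loop. Lucene/Solr returns candidate papers with their text relevance scores, and the application then re-ranks them. The function `rerank`, the per-paper `user_weight` map, and the blending factor `alpha` are hypothetical. The weights are assumed to come from the user model, scaled to [0, 1]. Nothing here is a Solr API.

```python
from typing import Dict, List, Tuple


def rerank(hits: List[Tuple[str, float]],
           user_weight: Dict[str, float],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Blend text-search relevance with a (hypothetical) user-model weight.

    hits        -- (paper_id, text_score) pairs from the search engine
    user_weight -- paper_id -> preference in [0, 1]; missing ids count as 0
    alpha       -- 1.0 means text relevance only, 0.0 means user model only
    """
    if not hits:
        return []
    # Raw Lucene scores are unbounded, so scale them to [0, 1] by the top hit.
    top = max(score for _, score in hits)
    blended = []
    for paper_id, score in hits:
        text_part = score / top if top > 0 else 0.0
        user_part = user_weight.get(paper_id, 0.0)
        blended.append((paper_id, alpha * text_part + (1 - alpha) * user_part))
    # Highest combined score first.
    return sorted(blended, key=lambda pair: pair[1], reverse=True)
```

For example, `rerank([("p1", 8.0), ("p2", 4.0), ("p3", 2.0)], {"p3": 1.0})` ranks p3 first (0.625), then p1 (0.5), then p2 (0.25). Re-ranking only the returned hits leaves the heavy text matching to Solr. The trade-off is that papers the text search missed never surface.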

Rewrite the Graph visualization code

The current graph visualization code is broken.
It also uses arbor.js.
Let's explore d3.js (http://d3js.org/), since it is the more popular package.
We need to make sure the d3.js version is dynamic and can respond to events such as clicks.

6 degrees of separation

Here is a problem for you @virup @clintpgeorge @supriyan,

Given two papers, how would you write a DB query to find the "shortest" path between them?

Vertices are papers and edges are citations/references. We can treat the edges as undirected. It is certainly possible that no path exists between two papers.

Can you implement a solution to this?
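Here is a sketch of the query as a recursive CTE. It runs through Python's built-in `sqlite3` so the example is self-contained. The table `paper_refs(citing, cited)` is a hypothetical stand-in for the references table, since REFERENCES is itself an SQL keyword.

The query works as follows:
- Each edge is added in both directions, which makes the graph undirected.
- Each partial walk carries the ids it has visited, so it never loops.
- The shortest path is the walk that reaches the target with the smallest depth.

`max_depth` bounds the search, and `None` means no path exists within that many hops. In PostgreSQL, the path would more naturally be an integer array checked with `e.b <> ALL(w.path)` rather than the `instr` string test used here.

```python
import sqlite3
from typing import List, Optional

SHORTEST_PATH_SQL = """
WITH RECURSIVE
  -- Treat citations as undirected by listing every edge in both directions.
  edges(a, b) AS (
    SELECT citing, cited FROM paper_refs
    UNION
    SELECT cited, citing FROM paper_refs
  ),
  -- Walk outward from the source; `path` is a comma-delimited list of visited
  -- ids, so a walk never revisits a paper.
  walk(node, depth, path) AS (
    SELECT :src, 0, ',' || :src || ','
    UNION ALL
    SELECT e.b, w.depth + 1, w.path || e.b || ','
    FROM walk AS w
    JOIN edges AS e ON e.a = w.node
    WHERE instr(w.path, ',' || e.b || ',') = 0
      AND w.depth < :max_depth
  )
SELECT path FROM walk WHERE node = :dst ORDER BY depth LIMIT 1
"""


def shortest_path(conn: sqlite3.Connection, src: int, dst: int,
                  max_depth: int = 6) -> Optional[List[int]]:
    """Return the paper ids on a shortest path from src to dst, or None."""
    row = conn.execute(SHORTEST_PATH_SQL,
                       {"src": src, "dst": dst, "max_depth": max_depth}).fetchone()
    if row is None:
        return None  # no path within max_depth hops
    return [int(x) for x in row[0].strip(",").split(",")]
```

This enumerates every cycle-free walk up to `max_depth`. That is fine for a sketch, but it can blow up on a dense citation graph. A breadth-first search in application code, visiting each paper once, scales better.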

Password protect the website

Password protect the website while it is under construction. A simple .htaccess password should be sufficient.

Visualize the paper data

In order to understand the data we need to look at the arrangement of papers.

We need a graph (preferably interactive) so we can explore our current dataset.

We could use a Python package such as NetworkX: http://networkx.lanl.gov/.

You can get the connection info from the references table in the db.

Would anybody be able to do this ASAP?

NIPS paper topic visualization

Check this out: it shows papers by topic and has an interesting visualization model. http://cs.stanford.edu/~karpathy/nipspreview/

/cc @virup @clintpgeorge @supriyan

CitationRank (like PageRank except for papers/citations)

Here is another problem for you @virup @clintpgeorge @supriyan

We want to calculate a global importance factor for all the papers in the data set.
This is similar to PageRank. The value CR(p) of a paper p should be the probability that, if I am randomly looking for an important paper, I land on p.

A paper with citations should have a higher value than a paper with no citations.

A paper with P citations should have a smaller value than a paper with G citations of citations, where |P| - |G| < sigma.

The references of a paper do not affect the paper's score (although we should have a self-citation penalty).

Also, can we compute these values using SQL?
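Here is a power-iteration sketch of CitationRank that borrows PageRank's random-surfer model. With probability `damping`, the reader follows a reference from the current paper to the cited paper; otherwise they jump to a random paper. That gives CR(p) = (1 - d)/N + d · Σ_{q cites p} CR(q)/|refs(q)|, plus the redistributed mass described below.

Citations flow from citing to cited papers, so a paper's own references do not raise its score. The self-citation penalty is an assumption. The caller passes the (citing, cited) pairs it considers self-citations (e.g., shared authors), and those edges carry only `self_penalty` of their usual share. Mass from papers with no references, and the share withheld from self-citations, is spread uniformly so the scores stay a probability distribution. All names and defaults are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, List, Tuple


def citation_rank(papers: Iterable[str],
                  citations: Iterable[Tuple[str, str]],
                  self_citations: AbstractSet[Tuple[str, str]] = frozenset(),
                  damping: float = 0.85,
                  self_penalty: float = 0.5,
                  max_iter: int = 100,
                  tol: float = 1e-10) -> Dict[str, float]:
    """PageRank-style scores; every id in `citations` must appear in `papers`."""
    papers = list(papers)
    n = len(papers)
    # citing paper -> list of (cited paper, edge weight); duplicate rows collapsed.
    refs: Dict[str, List[Tuple[str, float]]] = {p: [] for p in papers}
    for citing, cited in set(citations):
        if citing == cited:
            continue  # a paper listing itself gains nothing
        weight = self_penalty if (citing, cited) in self_citations else 1.0
        refs[citing].append((cited, weight))

    cr = {p: 1.0 / n for p in papers}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in papers}  # random-jump share
        for q in papers:
            if not refs[q]:
                continue  # no references: handled by the redistribution below
            share = damping * cr[q] / len(refs[q])
            for p, weight in refs[q]:
                new[p] += share * weight  # self-citations pass on only part
        # Mass from reference-less papers plus the withheld self-citation share
        # is spread evenly, so the scores keep summing to 1.
        missing = 1.0 - sum(new.values())
        for p in papers:
            new[p] += missing / n
        delta = sum(abs(new[p] - cr[p]) for p in papers)
        cr = new
        if delta < tol:
            break
    return cr
```

A citation is worth more when the citing paper itself scores highly. Two papers with similar citation counts are therefore separated by how well cited their citers are, which seems to be what the sigma criterion above is after. Each iteration is a join of the current scores with the references table followed by a GROUP BY on the cited paper. The same loop could therefore plausibly be driven from SQL, with one UPDATE per iteration.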

Add user feedback to define topics

Can we add user feedback to improve/further define topics?
One way is to let users drag and drop the words on the topic page to change their order.

This is opening up the ML black box.
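Here is one simple, hypothetical way to turn a drag-and-drop reordering into new word weights. It assumes a topic is stored as a word-to-probability map. The topic's existing set of probabilities is handed out in the user's new order, so the topic stays normalized and the user's top word gets the largest weight. How edited weights would feed back into the topic model (for example, as a prior on the next training run) is left open.

```python
from typing import Dict, List


def apply_user_order(weights: Dict[str, float], new_order: List[str]) -> Dict[str, float]:
    """Reassign a topic's word weights to match a user's drag-and-drop order.

    weights   -- word -> probability for one topic (hypothetical storage format)
    new_order -- the same words, most important first, as arranged by the user
    """
    if sorted(new_order) != sorted(weights):
        raise ValueError("new_order must list exactly the topic's words")
    # Reuse the topic's existing probabilities, largest first, so the set of
    # values is unchanged and the topic still sums to the same total.
    ranked = sorted(weights.values(), reverse=True)
    return dict(zip(new_order, ranked))
```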

Database access using 32 cores

Set up the database to work on the 32-core machine. Possibly switch to Greenplum instead of Postgres.

Graph visualization meanings

The size of a node depends on the number of papers that cite it.

The length of a link is the time between the two papers.

The thickness/color of the line may represent the similarity between the papers and the user model.
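Here is a sketch that encodes these meanings in the node/link JSON shape commonly used with d3's force layout. It assumes each paper has a publication year and that a per-citation similarity score is computed elsewhere; the `similarity` map is hypothetical.

The output fields are:
- `cited_by` on each node counts the distinct papers citing it.
- `distance` on each link is the year gap between the two papers.
- `similarity` on each link is left for the front end to map to thickness or color.

On the d3 side, `forceLink().id(d => d.id)` would resolve the string ids, and its `.distance(...)` accessor could read the `distance` field.

```python
import json
from typing import Dict, List, Tuple


def to_d3_graph(years: Dict[str, int],
                citations: List[Tuple[str, str]],
                similarity: Dict[Tuple[str, str], float]) -> str:
    """Build {"nodes": [...], "links": [...]} JSON for a d3 force layout.

    years      -- paper id -> publication year (assumed available)
    citations  -- (citing, cited) pairs from the references table
    similarity -- hypothetical (citing, cited) -> score for thickness/color
    """
    edges = sorted(set(citations))  # drop duplicate rows, deterministic order
    cited_by = {paper: 0 for paper in years}
    for _, cited in edges:
        cited_by[cited] += 1  # node size: how many papers cite it
    nodes = [{"id": p, "cited_by": cited_by[p]} for p in sorted(years)]
    links = [{
        "source": citing,
        "target": cited,
        "distance": abs(years[citing] - years[cited]),  # link length in years
        "similarity": similarity.get((citing, cited), 0.0),
    } for citing, cited in edges]
    return json.dumps({"nodes": nodes, "links": links})
```

Papers published in the same year get distance 0, so the front end would likely want to enforce a minimum link length.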

Bug in DB KL computation

The query.php operation rank_realtime has some bugs in its calculation of KL divergence.

https://github.com/cegme/grisham/blob/master/web/query.php#L174-178

The first is the inconsistent use of Pi and pi.
Second, it looks like the sum should enclose the whole query instead of just the first term. #L175 should be
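To check whatever fix lands in query.php, here is a reference computation of the textbook definition D_KL(P ‖ Q) = Σ_i P_i log(P_i / Q_i), where the summation covers the whole term for every i. It uses the natural log. It follows the usual conventions: a term with P_i = 0 contributes 0, and the result is infinite when some Q_i = 0 while P_i > 0. The parameter names `p` and `q` are generic, not taken from query.php.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i P_i * log(P_i / Q_i), using the natural log."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # 0 * log(0 / q_i) is taken to be 0
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)  # the whole term sits inside the sum
    return total
```

The divergence is asymmetric, so swapping which distribution is passed as P and which as Q can change the ranking.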

User save settings

Can we allow users to save their topic settings across sessions?

Make website independent of location

Remove all site-specific links and calls from the website.

User version control

Can we let the user commit/undo/redo the changes to topic definitions?
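Here is a minimal sketch of commit/undo/redo using the classic two-stack history. It assumes a topic definition can be represented as a word-to-weight dict; the `TopicHistory` class is hypothetical. Committing pushes the current definition onto the undo stack and clears the redo stack, because a fresh edit invalidates any undone branch.

```python
from typing import Dict, List

Topic = Dict[str, float]  # hypothetical: word -> weight for one topic


class TopicHistory:
    """Two-stack commit/undo/redo history for a user's topic definition."""

    def __init__(self, initial: Topic) -> None:
        self.current: Topic = dict(initial)
        self._undo: List[Topic] = []
        self._redo: List[Topic] = []

    def commit(self, new_topic: Topic) -> None:
        self._undo.append(self.current)
        self.current = dict(new_topic)  # copy so later caller edits don't leak in
        self._redo.clear()  # a new change discards the undone branch

    def undo(self) -> bool:
        if not self._undo:
            return False
        self._redo.append(self.current)
        self.current = self._undo.pop()
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        self._undo.append(self.current)
        self.current = self._redo.pop()
        return True
```

Persisting `current` together with both stacks (for example, as JSON per user) would also cover saving topic settings across sessions.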

Microsoft Academic API integration

Can we use the Microsoft Academic API to augment the data we have in the DB?
Can we have links to actual publications and author pages?

Multiple topic exploration

Can we allow users to add more than one topic during their search?
We want to allow users to explore more than one topic, so we need to combine the single-topic algorithms.
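One hypothetical way to reuse the single-topic algorithms unchanged is to fold the selected topics into a single word distribution as a weighted mixture. That mixture is then fed to the existing single-topic ranking. The `combine_topics` function and its equal-weight default are assumptions. A mixture is only one option; scoring each topic separately and merging the result lists is another.

```python
from typing import Dict, List, Optional


def combine_topics(topics: List[Dict[str, float]],
                   weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Weighted mixture of several topic word distributions.

    topics  -- each a word -> probability map summing to 1
    weights -- hypothetical user-chosen importance per topic (default: equal)
    """
    if not topics:
        raise ValueError("need at least one topic")
    if weights is None:
        weights = [1.0] * len(topics)
    if len(weights) != len(topics):
        raise ValueError("need exactly one weight per topic")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mixed: Dict[str, float] = {}
    for topic, w in zip(topics, weights):
        for word, prob in topic.items():
            # Normalizing the weights keeps the mixture a probability distribution.
            mixed[word] = mixed.get(word, 0.0) + (w / total) * prob
    return mixed
```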

Technical Report

@supriyan Hey, would you like to make this paper a technical report? I don't see us doing any further work on it. /cc @virup @clintpgeorge

SIAM DM 13 paper

@virup @clintpgeorge @supriyan
Do you guys think we could push out a paper to SIAM DM (http://www.siam.org/meetings/sdm13/)?

The deadline is 10/12.

It would be a paper of fewer than 9 pages. We would have to develop some of our ideas further and provide nice evaluation/performance numbers, but I think it is possible if you guys are up for a challenge.

Check out the themes at the bottom of that link.
