Code Monkey home page Code Monkey logo

cbrec's Introduction

############################################################################

              Content-Based Recommendation Generator (CBRec v1.0)      

############################################################################


README:
=======
A Python library which generates content-based recommendations for a set of 
items described by textual metadata using four possible vector space methods,
namely TF-IDF, LSI, RP and LDA. The library can be used in command line or 
directly in a Python program. It takes as input a JSON file which contains
an array of hashes that describe the metadata of items and generates an out-
put JSON file which contains the same item hashes augmented with two more att-
ributes, namely (i) rec attribute which contains the top-N recommendations for 
each item, represented by an array of item IDs and (ii) rec_scores attribute
which contains the top-N similarity scores, represented by an array of float
numbers.

FILES:
======
The library contains the following files:
   
    data.py           Data class for items (text extraction, preprocessing)
    vector_space.py   Vector space class supporting TF-IDF, LSI, RP and LDA
    generate.py       Main class responsible for genereting recommendations
    utils.py          Unbuffered stdout class
    example.json      Example JSON file with 1000 TED talks

USAGE:
======
Usage:
    generate.py --input=<path> --output=<path> [options]

Options:
    -v, --version                      show program's version number and exit
    -h, --help                         show this help message and exit
    -d, --debug                        print status and debug messages [default: False]
    -r, --display                      display recommendations per item [default: False]
    -i, --input=<path>                 path to JSON file to be used as input
    -o, --output=<path>                path to JSON file to be used as output
    --extract=<attributes>             comma separated JSON attributes to be used [default: All]
    --preprocess                       whether to preprocess text or not  [default: False]
    --method=<TFIDF|LSI|RP|LDA>        vector space method to represent the items [default: LSI]
    --k=<integer>                      number of topics for LSI, RP and LDA [default: 100]
    --N=<integer>                      number of recommendations [default: 5]

EXAMPLE:
========
$ python generate.py --input=example.json --output=out.json --debug
{'--N': '5',
 '--debug': True,
 '--display': False,
 '--extract': 'All',
 '--help': False,
 '--input': 'example.json',
 '--k': '100',
 '--method': 'LSI',
 '--output': 'out.json',
 '--preprocess': False,
 '--version': False}
[+] Loading items:
    -> Extracting text................................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.

$ python generate.py --input=example.json --output=out.json --debug --preprocess --N=10 --extract=title,description
{'--N': '10',                                                                                                                           
 '--debug': True,                                                                                                                       
 '--display': False,                                                                                                                    
 '--extract': 'title,description',                                                                                                      
 '--help': False,                                                                                                                       
 '--input': 'example.json',                                                                                                             
 '--k': '100',                                                                                                                          
 '--method': 'LSI',                                                                                                                     
 '--output': 'out.json',                                                                                                                
 '--preprocess': True,                                                                                                                  
 '--version': False}                                                                                                                    
[+] Loading items:                                                                                                                      
    -> Extracting text................................[OK]                                                                              
    -> Preprocessing text.............................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.


DEPENDENCIES:
============
1) Install python: http://www.python.org/getit/
2) Install pip: http://www.pip-installer.org/en/latest/installing.html
3) Then:
$ pip install docopt
$ pip install json
$ pip install pyyaml
$ pip install numpy
$ pip install scipy
$ pip install gensim
$ pip install nltk
$ python
>>> import nltk
>>> nltk.download()

TROUBLESHOOTING:
================ 
Q: How can I use the library with items stored in other formats than JSON?
A: You have to convert your file to JSON.
Q: How can I use the library directly with an item hash?
A: Simply import the library in Python and initialize a generator object with 
   the item hash of your preference.
Q: Is there any attribute that is required to be present in the item metadata?
A: Yes the 'id' attribute is mandatory.

CONTACT:
========
Nikolaos Pappas 
Idiap Research Institute
Centre du Parc, 
CH 1920 Martigny, 
Switzerland
E-mail:  [email protected] 
Website: http://people.idiap.ch/npappas/ 


---
Last update:
16 Dec, 2013

cbrec's People

Contributors

nik0spapp avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.