pathologicalhandwaving / cbrec
Content-based Recommendation Generator
License: GNU General Public License v3.0
This project forked from idiap/cbrec
############################################################################
           Content-Based Recommendation Generator (CBRec v1.0)
############################################################################

README:
=======
A Python library which generates content-based recommendations for a set of
items described by textual metadata, using one of four vector space methods:
TF-IDF, LSI, RP or LDA. The library can be used from the command line or
directly in a Python program. It takes as input a JSON file containing an
array of hashes that describe the metadata of the items, and generates an
output JSON file containing the same item hashes augmented with two more
attributes:

  (i)  rec          the top-N recommendations for each item, represented by
                    an array of item IDs
  (ii) rec_scores   the top-N similarity scores, represented by an array of
                    float numbers

FILES:
======
The library contains the following files:

  data.py            Data class for items (text extraction, preprocessing)
  vector_space.py    Vector space class supporting TF-IDF, LSI, RP and LDA
  generate.py        Main class responsible for generating recommendations
  utils.py           Unbuffered stdout class
  example.json       Example JSON file with 1000 TED talks

USAGE:
======
Usage: generate.py --input=<path> --output=<path> [options]

Options:
  -v, --version                  show program's version number and exit
  -h, --help                     show this help message and exit
  -d, --debug                    print status and debug messages
                                 [default: False]
  -r, --display                  display recommendations per item
                                 [default: False]
  -i, --input=<path>             path to JSON file to be used as input
  -o, --output=<path>            path to JSON file to be used as output
  --extract=<attributes>         comma separated JSON attributes to be used
                                 [default: All]
  --preprocess                   whether to preprocess text or not
                                 [default: False]
  --method=<TFIDF|LSI|RP|LDA>    vector space method to represent the items
                                 [default: LSI]
  --k=<integer>                  number of topics for LSI, RP and LDA
                                 [default: 100]
  --N=<integer>                  number of recommendations
                                 [default: 5]

EXAMPLE:
========
$ python generate.py --input=example.json --output=out.json --debug
{'--N': '5',
 '--debug': True,
 '--display': False,
 '--extract': 'All',
 '--help': False,
 '--input': 'example.json',
 '--k': '100',
 '--method': 'LSI',
 '--output': 'out.json',
 '--preprocess': False,
 '--version': False}
[+] Loading items:
    -> Extracting text................................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.

$ python generate.py --input=example.json --output=out.json --debug --preprocess --N=10 --extract=title,description
{'--N': '10',
 '--debug': True,
 '--display': False,
 '--extract': 'title,description',
 '--help': False,
 '--input': 'example.json',
 '--k': '100',
 '--method': 'LSI',
 '--output': 'out.json',
 '--preprocess': True,
 '--version': False}
[+] Loading items:
    -> Extracting text................................[OK]
    -> Preprocessing text.............................[OK]
[+] Creating the vector space:
    -> Computing the dictionary.......................[OK]
    -> Creating the bag-of-words space................[OK]
    -> Creating the LSI space.........................[OK]
[+] Generating recommendations........................[OK]
[+] Saving to output file.............................[OK]
[x] Finished.

DEPENDENCIES:
=============
1) Install python: http://www.python.org/getit/
2) Install pip: http://www.pip-installer.org/en/latest/installing.html
3) Then:
   $ pip install docopt
   $ pip install pyyaml
   $ pip install numpy
   $ pip install scipy
   $ pip install gensim
   $ pip install nltk
   $ python
   >>> import nltk
   >>> nltk.download()

   The json module ships with the Python standard library and needs no
   separate installation.

TROUBLESHOOTING:
================
Q: How can I use the library with items stored in formats other than JSON?
A: You have to convert your file to JSON.

Q: How can I use the library directly with an item hash?
A: Simply import the library in Python and initialize a generator object
   with the item hash of your preference.

Q: Is there any attribute that is required to be present in the item
   metadata?
A: Yes, the 'id' attribute is mandatory.

CONTACT:
========
Nikolaos Pappas
Idiap Research Institute
Centre du Parc, CH 1920 Martigny, Switzerland
E-mail: [email protected]
Website: http://people.idiap.ch/npappas/

---
Last update: 16 Dec, 2013
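
ILLUSTRATIVE SKETCH:
====================
The input and output shapes described in the README (an array of item hashes
with a mandatory 'id', augmented with rec and rec_scores) can be sketched
with a small TF-IDF example. This is not the library's code: it uses only
the Python standard library rather than gensim, and the names tokenize,
tfidf_vectors, cosine and recommend, as well as the three sample items, are
hypothetical and chosen purely for illustration.

```python
import json
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase/whitespace tokenisation, for illustration only.
    return text.lower().split()


def tfidf_vectors(texts):
    """Return one unit-length sparse TF-IDF vector (term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(items, n=5, fields=None):
    """Return copies of the item hashes augmented with 'rec' and 'rec_scores'."""
    texts = []
    for item in items:
        keys = fields or [k for k in item if k != "id"]
        texts.append(" ".join(str(item.get(k, "")) for k in keys))
    vectors = tfidf_vectors(texts)
    augmented = []
    for i, item in enumerate(items):
        scored = [
            (cosine(vectors[i], vectors[j]), items[j]["id"])
            for j in range(len(items))
            if j != i
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:n]
        augmented.append(
            dict(item, rec=[i_id for _, i_id in top], rec_scores=[s for s, _ in top])
        )
    return augmented


# Hypothetical items; only the 'id' attribute is required by the README.
example_items = [
    {"id": 1, "title": "Machine learning for music", "description": "audio models"},
    {"id": 2, "title": "Machine learning for images", "description": "vision models"},
    {"id": 3, "title": "Cooking pasta at home", "description": "simple recipes"},
]
example_output = recommend(example_items, n=2, fields=["title", "description"])
print(json.dumps(example_output, indent=2))
```

Each printed hash keeps its original attributes and gains rec (item IDs,
best match first) and rec_scores (the matching cosine similarities), which
mirrors the output file layout described under README above. The fields
argument plays a role similar to --extract, and n to --N.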