
dBTM

Citation

Source code for the paper: Tracking Brand-Associated Polarity-Bearing Topics in User Reviews. Runcong Zhao, Lin Gui, Hanqi Yan and Yulan He (TACL).

Environment

The dependency packages are listed in requirements.txt and can be installed with pip:

(envfordbtm)$ pip install -r requirements.txt

Data

The preprocessed data in data/beauty_makeupalley/ can be used directly with dBTM and O-dBTM. It can also be used with the baselines BTM, dJST and TBIP after minor changes to match their input formats. The original data comes from MakeupAlley, a review website for beauty products. The other dataset used in the paper is HotelRec, a hotel recommendation dataset based on TripAdvisor.

To use a custom dataset, first create the directory data/{dataset_name}/time. The following files must be inside this folder:

  • counts.npz: a [num_documents, num_words] sparse CSR matrix containing the word counts for each document.
  • brand_indices.npy: a [num_documents] vector where each entry is an integer in the set {0, 1, ..., num_brands - 1}, indicating the brand of the corresponding document in counts.npz.
  • score_indices.npy: a [num_documents] vector where each entry is an integer in the set {-1, 0, 1}, indicating the review polarity of the corresponding document in counts.npz.
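A minimal sketch of producing these three files with NumPy and SciPy follows. It assumes the document-term counts, brand ids and polarity labels are already in memory. The function `write_time_files` and its sanity checks are illustrative and not part of the repository. The path layout follows the description above literally.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def write_time_files(time_dir, counts, brand_indices, score_indices):
    """Write counts.npz, brand_indices.npy and score_indices.npy into time_dir.

    counts:        [num_documents, num_words] word counts (dense or sparse).
    brand_indices: [num_documents] ints in {0, ..., num_brands - 1}.
    score_indices: [num_documents] ints in {-1, 0, 1} (review polarity).
    """
    counts = sparse.csr_matrix(counts)
    brand_indices = np.asarray(brand_indices, dtype=np.int64)
    score_indices = np.asarray(score_indices, dtype=np.int64)

    # One brand id and one polarity label per row of the count matrix.
    num_documents = counts.shape[0]
    if brand_indices.shape != (num_documents,) or score_indices.shape != (num_documents,):
        raise ValueError("expected one brand index and one score per document")
    if (brand_indices < 0).any():
        raise ValueError("brand indices must be non-negative")
    if not np.isin(score_indices, [-1, 0, 1]).all():
        raise ValueError("score indices must be in {-1, 0, 1}")

    os.makedirs(time_dir, exist_ok=True)
    sparse.save_npz(os.path.join(time_dir, "counts.npz"), counts)
    np.save(os.path.join(time_dir, "brand_indices.npy"), brand_indices)
    np.save(os.path.join(time_dir, "score_indices.npy"), score_indices)


if __name__ == "__main__":
    # Hypothetical toy corpus: 3 reviews, 4 vocabulary words, 2 brands.
    root = tempfile.mkdtemp()
    out_dir = os.path.join(root, "data", "toy", "time")
    write_time_files(
        out_dir,
        counts=[[2, 0, 1, 0], [0, 1, 0, 3], [1, 1, 0, 0]],
        brand_indices=[0, 1, 0],
        score_indices=[1, -1, 0],
    )
    print(sorted(os.listdir(out_dir)))  # ['brand_indices.npy', 'counts.npz', 'score_indices.npy']
```

Storing the counts as CSR keeps the file small for realistic vocabularies. Reading it back with `scipy.sparse.load_npz` returns a sparse matrix rather than a dense array.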

Then create the directory data/{dataset_name}/clean. The following files must be inside this folder:

  • brand_map.txt: a [num_brands]-length file where each line denotes the name of the brand in the corpus.
  • vocabulary.txt: a [num_words]-length file where each line denotes the corresponding word in the vocabulary.
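The snippet below sketches one way to write the two text files and cross-check them against the files in the time folder. The vocabulary length must equal the number of columns of counts.npz, and every brand index must point at a line of brand_map.txt. The helpers `write_lines`, `read_lines` and `check_dataset` are hypothetical and only reflect the format described in this README, not the repository's own loaders.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def write_lines(path, items):
    """Write one entry per line, the format of brand_map.txt and vocabulary.txt."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(f"{item}\n")


def read_lines(path):
    """Read a one-entry-per-line file back into a list of strings."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def check_dataset(time_dir, clean_dir):
    """Return a list of inconsistencies between the time/ and clean/ files."""
    counts = sparse.load_npz(os.path.join(time_dir, "counts.npz"))
    brands = np.load(os.path.join(time_dir, "brand_indices.npy"))
    scores = np.load(os.path.join(time_dir, "score_indices.npy"))
    brand_map = read_lines(os.path.join(clean_dir, "brand_map.txt"))
    vocabulary = read_lines(os.path.join(clean_dir, "vocabulary.txt"))

    num_documents, num_words = counts.shape
    problems = []
    if len(vocabulary) != num_words:
        problems.append(
            f"vocabulary.txt has {len(vocabulary)} lines but counts.npz has {num_words} columns"
        )
    if len(brands) != num_documents or len(scores) != num_documents:
        problems.append("index vectors do not have one entry per document")
    if brands.size and brands.max() >= len(brand_map):
        problems.append("a brand index has no matching line in brand_map.txt")
    return problems


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    time_dir, clean_dir = os.path.join(root, "time"), os.path.join(root, "clean")
    os.makedirs(time_dir)
    os.makedirs(clean_dir)
    # Hypothetical toy data: 2 reviews, 3 words, 2 brands.
    sparse.save_npz(os.path.join(time_dir, "counts.npz"), sparse.csr_matrix([[1, 0, 2], [0, 1, 1]]))
    np.save(os.path.join(time_dir, "brand_indices.npy"), np.array([0, 1]))
    np.save(os.path.join(time_dir, "score_indices.npy"), np.array([1, -1]))
    write_lines(os.path.join(clean_dir, "brand_map.txt"), ["brand_a", "brand_b"])
    write_lines(os.path.join(clean_dir, "vocabulary.txt"), ["matte", "shade", "glow"])
    print(check_dataset(time_dir, clean_dir))  # []
```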

Learning

To train dBTM, run the pretraining script and then dBTM.py:

(envfordbtm)$ python setup/poisson_factorization_pretrain_t.py  --data=beauty_makeupalley
(envfordbtm)$ python dBTM.py
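Judging by its name, the pretraining script fits a Poisson factorization of the count matrix before the main model is trained. The sketch below is for illustration only. It fits counts ≈ Poisson(θβ) by maximum likelihood using the Lee-Seung multiplicative updates for the KL divergence, which is the same objective as Poisson maximum likelihood. The repository's poisson_factorization_pretrain_t.py may use a different inference procedure, priors and output format. `poisson_nmf` and `poisson_log_likelihood` are hypothetical names.

```python
import numpy as np


def poisson_nmf(counts, num_topics, num_iters=200, seed=0, eps=1e-10):
    """Fit counts[d, v] ~ Poisson((theta @ beta)[d, v]) by maximum likelihood.

    Uses Lee-Seung multiplicative updates for the KL divergence, which do not
    decrease the Poisson log-likelihood (up to the small eps added for
    numerical safety). Returns non-negative theta [num_documents, num_topics]
    and beta [num_topics, num_words].
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    num_documents, num_words = counts.shape
    # Random positive initialisation (an illustrative choice).
    theta = rng.gamma(1.0, 1.0, size=(num_documents, num_topics)) + eps
    beta = rng.gamma(1.0, 1.0, size=(num_topics, num_words)) + eps
    for _ in range(num_iters):
        # Update document-topic intensities.
        ratio = counts / (theta @ beta + eps)
        theta *= (ratio @ beta.T) / (beta.sum(axis=1) + eps)
        # Update topic-word intensities using the new theta.
        ratio = counts / (theta @ beta + eps)
        beta *= (theta.T @ ratio) / (theta.sum(axis=0)[:, None] + eps)
    return theta, beta


def poisson_log_likelihood(counts, theta, beta, eps=1e-10):
    """Poisson log-likelihood, dropping the constant -sum(log(counts!))."""
    counts = np.asarray(counts, dtype=float)
    rate = theta @ beta + eps
    return float(np.sum(counts * np.log(rate) - rate))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic count matrix: 20 documents, 8 words (not the paper's data).
    counts = rng.poisson(2.0, size=(20, 8))
    theta, beta = poisson_nmf(counts, num_topics=3)
    print(theta.shape, beta.shape)  # (20, 3) (3, 8)
    print(poisson_log_likelihood(counts, theta, beta))
```

A by-product of the β update is that the total fitted rate equals the total word count after every iteration. This gives a quick sanity check on a fit.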

To analyze the outputs, run:

(envfordbtm)$ python analyze_dBTM.py
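A common step when inspecting topic model outputs is to list the highest-weight words of each topic. The sketch assumes a [num_topics, num_words] topic-word weight matrix and the word list from vocabulary.txt. This README does not describe how dBTM.py stores its topics, so `top_words` and its inputs are assumptions.

```python
import numpy as np


def top_words(topic_word, vocabulary, n=10):
    """Return the n highest-weight words of each topic.

    topic_word: [num_topics, num_words] array of non-negative weights.
    vocabulary: list of num_words strings, in vocabulary.txt order.
    """
    topic_word = np.asarray(topic_word)
    if topic_word.shape[1] != len(vocabulary):
        raise ValueError("topic_word columns must match the vocabulary size")
    # argsort of the negated weights gives column indices in descending weight order.
    order = np.argsort(-topic_word, axis=1)[:, :n]
    return [[vocabulary[j] for j in row] for row in order]


if __name__ == "__main__":
    vocab = ["matte", "shade", "glow", "dry", "smell"]  # hypothetical vocabulary
    weights = np.array([[0.5, 0.1, 0.3, 0.05, 0.05],
                        [0.0, 0.6, 0.1, 0.2, 0.1]])
    print(top_words(weights, vocab, n=2))  # [['matte', 'glow'], ['shade', 'dry']]
```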

For O-dBTM, run:

(envfordbtm)$ python setup/poisson_factorization_pretrain_t.py  --data=beauty_makeupalley
(envfordbtm)$ python OdBTM.py
(envfordbtm)$ python analyze_OdBTM.py

For BTM:

(envfordbtm)$ python setup/poisson_factorization_individual_t.py  --data=beauty_makeupalley
(envfordbtm)$ python btm.py
(envfordbtm)$ python analyze_BTM.py

For TBIP:

(envfordbtm)$ python setup/poisson_factorization_individual_t.py  --data=beauty_makeupalley
(envfordbtm)$ python tbip.py
(envfordbtm)$ python analyze_BTM.py

For dJST:

./djst -est -config ../mozilla.train.config
./djst -est -config ../mozilla.test.config
python analyze_dJST.py

Coherence & Uniqueness

Set up a local Palmetto instance following https://github.com/dice-group/Palmetto/wiki/How-Palmetto-can-be-used, then run:

python run6.py
python ave_uniqueness.py
python ave_coherence.py
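For reference, the sketch below computes topic uniqueness (TU) under a commonly used definition. For each top word of a topic, take 1/cnt, where cnt is the number of topics whose top-word list contains that word. Average these values over the topic's words, then over topics. The sketch also computes NPMI coherence from document co-occurrence in a local corpus. This is a stand-in only: Palmetto scores coherence against its own reference corpus and offers several measures. ave_uniqueness.py and ave_coherence.py may differ in detail. The document-presence matrix can be built from the count matrix, e.g. `sparse.load_npz(path).toarray() > 0`.

```python
import itertools
from collections import Counter

import numpy as np


def topic_uniqueness(topics):
    """Average topic uniqueness over topics.

    topics: list of top-word lists, one list per topic.
    For each topic, average 1 / cnt(w) over its top words, where cnt(w) is the
    number of topics whose top-word list contains w; then average over topics.
    """
    cnt = Counter(w for words in topics for w in set(words))
    per_topic = [np.mean([1.0 / cnt[w] for w in words]) for words in topics]
    return float(np.mean(per_topic))


def npmi_coherence(word_ids, doc_word_presence):
    """Mean NPMI over all pairs of a topic's top words (document co-occurrence).

    word_ids: column indices of the topic's top words.
    doc_word_presence: [num_documents, num_words] array, True where the word occurs.
    """
    X = np.asarray(doc_word_presence, dtype=bool)
    scores = []
    for i, j in itertools.combinations(word_ids, 2):
        p_i, p_j = X[:, i].mean(), X[:, j].mean()
        p_ij = (X[:, i] & X[:, j]).mean()
        if p_ij == 0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ij == 1:
            scores.append(1.0)  # both occur in every document: upper bound by convention
        else:
            scores.append(np.log(p_ij / (p_i * p_j)) / -np.log(p_ij))
    return float(np.mean(scores))


if __name__ == "__main__":
    print(topic_uniqueness([["a", "b"], ["a", "c"]]))  # 0.75
    presence = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=bool)
    print(npmi_coherence([0, 1], presence))  # 1.0
```

TU is 1 when no top word is shared between topics and 1/K when all K topics have identical top words. NPMI ranges from -1 (the words never co-occur) to 1 (the words always co-occur).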

References

Part of our code is based on: Text-Based Ideal Points by Keyon Vafa, Suresh Naidu, and David Blei (ACL 2020). https://github.com/keyonvafa/tbip
