hdp's Issues

Training set

Please provide a proper training set and a good README. Thanks in advance.

Problem Installing hdp

I am trying to install hdp, and I get this error:

In file included from state.h:4:0,
from state.cpp:1:
corpus.h: In constructor ‘document::document()’:
corpus.h:18:17: error: ‘NULL’ was not declared in this scope
words = NULL;
^
corpus.h: In destructor ‘document::~document()’:
corpus.h:34:22: error: ‘NULL’ was not declared in this scope
if (words != NULL)
^
In file included from state.cpp:1:0:
state.h: At global scope:
state.h:146:63: error: ‘NULL’ was not declared in this scope
hdp_state * target_state=NULL);
^
In file included from state.h:4:0,
from hdp.h:4,
from hdp.cpp:1:
corpus.h: In constructor ‘document::document()’:
corpus.h:18:17: error: ‘NULL’ was not declared in this scope
words = NULL;
^
corpus.h: In destructor ‘document::~document()’:
corpus.h:34:22: error: ‘NULL’ was not declared in this scope
if (words != NULL)
^
In file included from hdp.h:4:0,
from hdp.cpp:1:
state.h: At global scope:
state.h:146:63: error: ‘NULL’ was not declared in this scope
hdp_state * target_state=NULL);
^
hdp.cpp: In member function ‘void hdp::run(const char*)’:
hdp.cpp:147:97: error: call to ‘double hdp_state::split_sampling(int, int, int, int, int, hdp_state*)’ uses the default argument for parameter 6, which is not yet defined
double prob_split = proposed_state->split_sampling(num_scans, d0, d1, t0, t1);
^
Makefile:13: recipe for target 'hdp' failed
make[1]: *** [hdp] Error 1
make[1]: Leaving directory '/srv/scratch/shared/surya/imk1/TFBindingPredictionProject/src/hdp/hdp'
Makefile:4: recipe for target 'make_all' failed
make: *** [make_all] Error 2

I installed the latest version of gsl, and I think that I updated the Makefiles in hdp and hdp-faster correctly. Do you know what might be causing this problem?

Thanks so much!
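
For anyone hitting the same errors: they typically come from newer GCC versions no longer declaring NULL through the headers the project already includes. A minimal sketch of a possible fix, assuming that is the cause here rather than something project-specific, is to include a header that declares NULL near the top of corpus.h and state.h:

// corpus.h / state.h -- sketch of a possible fix, not a confirmed patch:
// <cstddef> declares NULL for C++ translation units.
#include <cstddef>

Once NULL is declared, the default argument hdp_state * target_state = NULL at state.h:146 parses again, which should also clear the split_sampling() "default argument for parameter 6" error. Replacing NULL with nullptr throughout and building with -std=c++11 is an equivalent alternative.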

Could someone shed some light on how the likelihood is calculated?

In state.cpp, the likelihood used for the table assignment is computed differently from the one used for the word assignment, and I couldn't find the details in the paper. I've been confused by this for days and have found little information, so I'd really appreciate it if someone could explain the details or point me to related materials. Thank you!
[screenshot of the relevant likelihood code in state.cpp]
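
For reference, the two likelihood terms follow Teh et al. (2006): sampling a word's table uses the predictive probability of that single word under a topic, while sampling a table's topic uses the joint predictive probability of all words seated at the table, with the topic's word distribution integrated out. A sketch of the two expressions under a symmetric Dirichlet(η) prior over a vocabulary of size V (my reading of the paper, not a line-by-line trace of state.cpp):

% Word assignment: predictive probability of a single word x_{ji} under topic k,
% given all other words currently assigned to k.
f_k^{-x_{ji}}(x_{ji}) = \frac{n_{k x_{ji}}^{-ji} + \eta}{n_{k\cdot}^{-ji} + V\eta}

% Table assignment: joint predictive probability of all words x_{jt} at table t,
% a ratio of Dirichlet-multinomial normalizing constants.
f_k^{-x_{jt}}(x_{jt}) =
  \frac{\Gamma(n_{k\cdot}^{-jt} + V\eta)}{\Gamma(n_{k\cdot}^{-jt} + n_{jt\cdot} + V\eta)}
  \prod_{w} \frac{\Gamma(n_{kw}^{-jt} + n_{jtw} + \eta)}{\Gamma(n_{kw}^{-jt} + \eta)}

Here n_{kw} is the count of word type w assigned to topic k, n_{jtw} is the count of word type w at table t of document j, a dot denotes summation over that index, and the -ji / -jt superscripts mean the counts exclude the word or table being resampled. The table-assignment expression is presumably evaluated in log space with lgamma, which is why it looks so different from the single-word case in the code.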

I got an error when executing hdp

gsl: ../gsl/gsl_rng.h:200: ERROR: invalid n, either 0 or exceeds maximum value of generator
Default GSL error handler invoked.
How can I fix it? Thanks!
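
That message is what GSL's gsl_rng_uniform_int() raises when it is asked for a uniform integer in [0, n) with n equal to 0 (or larger than the generator's range), so the sampler is most likely being asked to choose among zero items, for example because the corpus was read as empty. A minimal sketch of the kind of guard that makes the failure point obvious (sample_index is a hypothetical helper, not a function from the hdp code):

#include <gsl/gsl_rng.h>
#include <cassert>

// Hypothetical helper: draw a uniform index in [0, n).
// gsl_rng_uniform_int() aborts with "invalid n, either 0 or exceeds maximum
// value of generator" when n == 0, so check the count before sampling.
static unsigned long sample_index(const gsl_rng* r, unsigned long n)
{
    assert(n > 0 && "sampling from an empty set; was the corpus read correctly?");
    return gsl_rng_uniform_int(r, n);
}

In practice the first thing to check is whether the data file is in the expected format and was parsed as non-empty.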

The program uses too many resources and the process gets killed

Hello, I have been testing hdp recently, but during testing the process gets killed because it uses too many system resources. Am I using it incorrectly, or is the failure caused by resource limits in my own environment?

cat data.txt

3 2 6
0:1 1:2 2:3
0:2 1:1 2:3
0:3 1:1 2:2
0:1 1:3 2:2
0:2 1:2 2:2
0:1 1:2 2:3

hdp --algorithm train --data data.txt  --directory train_dir

Program starts with following parameters:
algorithm:          = train
data_path:          = data.txt
directory:          = train_dir
max_iter            = 1000
save_lag            = 100
init_topics         = 0
random_seed         = 1716260426
gamma_a             = 1.00
gamma_b             = 1.00
alpha_a             = 1.00
alpha_b             = 1.00
eta                 = 0.50
#restricted_scans   = 5
split-merge         = no
sampling hyperparam = no

reading data from data.txt
Killed

cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

uname -a
Linux simon28li 4.18.0-383.el8.aarch64 #1 SMP Wed Apr 20 15:39:57 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
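
For what it is worth, hdp expects corpora in Blei's lda-c format, where every line is one document written as [number of unique terms] [term]:[count] [term]:[count] .... A standalone header line such as 3 2 6 is not part of that format, and the per-document lines above are missing the leading term count, so the file may be parsed very differently from what was intended; that is only a guess about the cause of the kill, but it is worth ruling out. The toy corpus above in the expected format would look like:

3 0:1 1:2 2:3
3 0:2 1:1 2:3
3 0:3 1:1 2:2
3 0:1 1:3 2:2
3 0:2 1:2 2:2
3 0:1 1:2 2:3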

Memory-related errors during training

I've tried to run hdp and hdp-faster on Ubuntu 16.04 and on Windows 10. On Windows, hdp raises a segmentation fault and hdp-faster throws std::bad_alloc. On Ubuntu, training runs are killed after a few minutes.

Final results aren't saved in `--directory` when `--sample_hyper` is "yes" with some `--random_seed` values and datasets

I found that with some datasets, the final results of the training phase aren't stored under --directory if I use a random_seed of 13712 while also hyper-sampling the concentration parameters. Only state.log is produced; none of the other output files appear.

To reproduce the problem:

  1. Download this training corpus from the PAN @ CLEF 2017 competition
  2. Run the regular hdp (not the fast variant) on the LDA-C corpus of the fifth training problem set, like:
hdp.exe --data ..\pan17_train\problem005\ldac_corpus.dat --algorithm train --directory ..\output --sample_hyper yes --save_lag -1 --random_seed 13712

(I used gensim to generate the LDA-C corpora)

The program runs to completion and no error is raised. However, the output directory contains only the state.log file and the interim outputs, whereas we would also expect mode.bin, mode-topics.dat, and mode-word-assignments.dat. As far as I can tell, the combination of --sample_hyper yes and --random_seed 13712 causes this fault on certain datasets.
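
One unverified guess at the mechanism: if the mode.* files are only written when an iteration's likelihood beats the best value seen so far, then a likelihood that becomes NaN under particular hyperparameter draws would fail every such comparison, and the final files would never appear even though the run exits cleanly. I have not confirmed that this is how the code behaves; the sketch below only illustrates the kind of defensive check that would expose it (maybe_save_mode and best_likelihood are illustrative names, not identifiers from hdp):

#include <cmath>
#include <cstdio>

// Illustrative guard, not the actual hdp code: a NaN likelihood silently
// fails every ">" comparison, so a "save when improved" rule never fires.
void maybe_save_mode(double likelihood, double& best_likelihood)
{
    if (std::isnan(likelihood)) {
        std::fprintf(stderr, "warning: likelihood is NaN; mode files will not be updated\n");
        return;
    }
    if (likelihood > best_likelihood) {
        best_likelihood = likelihood;
        // write mode.bin, mode-topics.dat, mode-word-assignments.dat here
    }
}

If state.log from the failing run contains nan values, that would support this explanation.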
