hred-qs's Introduction

Hierarchical Recurrent Encoder-Decoder code (HRED) for Query Suggestion.

This code accompanies the paper:

"A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion", by Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie, to appear in CIKM'15.

The pre-print of the paper is available at: http://arxiv.org/abs/1507.02221.

-- Data processing

The dataset must consist of two files:

data.ses: each line is a sequence of tab-separated strings (queries) and represents one query session.

data.rnk: each line is a sequence of tab-separated integers giving the rank of the clicked documents for each query in the session. The model does not currently use this file, so each line can be set to a tab-separated list of 0s.

Then run:

./convert-text2dict.py data

This will create the preprocessed dataset for training.
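
If no click ranks are available, the data.rnk file described above can be filled with zeros. The following is a minimal sketch, assuming one integer per query on each line; the function names `placeholder_ranks` and `write_placeholder_rnk` are hypothetical and not part of this repository.

```python
def placeholder_ranks(session_line):
    """Return a tab-separated list of 0s, one per query in the session line."""
    queries = session_line.rstrip("\n").split("\t")
    return "\t".join("0" for _ in queries)


def write_placeholder_rnk(ses_path, rnk_path):
    """Write one line of placeholder ranks for every session in ses_path."""
    with open(ses_path, encoding="utf-8") as ses, \
            open(rnk_path, "w", encoding="utf-8") as rnk:
        for line in ses:
            rnk.write(placeholder_ranks(line) + "\n")
```

Calling `write_placeholder_rnk("data.ses", "data.rnk")` before the conversion step would produce a rank file with the same number of lines as the session file.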

-- Training

Create a prototype by modifying state.py and launch:

python train.py --prototype your_prototype
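
As a rough illustration of what a prototype might look like, the sketch below assumes that prototypes are plain functions in state.py that return a dictionary of settings. The base function `prototype_state` and every key name shown here are hypothetical; check state.py for the names the code actually uses.

```python
def prototype_state():
    """Hypothetical base configuration (keys are placeholders, not the real ones)."""
    return {
        "batch_size": 64,
        "hidden_dim": 256,
        "learning_rate": 0.001,
    }


def your_prototype():
    """Start from the base configuration and override only what changes."""
    state = prototype_state()
    state["batch_size"] = 32
    state["hidden_dim"] = 128
    return state
```

Building each prototype on top of a shared base keeps experiment-specific changes short and easy to compare.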

hred-qs's Issues

Missing evaluation include

Looks like the evaluation file is missing

Traceback (most recent call last):
  File "train.py", line 7, in <module>
    from evaluation import *
ImportError: No module named evaluation

'context_to_indices' is not defined in suggest.py

Hi sordonia, thanks for your comment on the last issue.
However, a NameError occurred when running suggest.py.

Error message:
Traceback (most recent call last):
  File "suggest.py", line 110, in <module>
    main()
  File "suggest.py", line 102, in main
    seqs = context_to_indices(lines, model)
NameError: global name 'context_to_indices' is not defined

documentation

First, I would like to thank you for releasing this great work as open-source.

Would you mind adding some documentation to the project (docstrings for functions and modules, comments in the code, a project wiki, an explanation of the order of execution, etc.) to improve reproducibility?

It is really time-consuming to read the whole project line by line and try to infer the meaning of every variable.

Thank you in advance!

About test results

I trained and tested on AOL data.

But the results after running samply.py are not very good.

Can you help me analyze it?

Or can you teach me how to use the "Learning to Rank" method in this experiment?

Dataset

Hi,
Can you please provide the processed train and test data that you used in the paper?

background data

Hello dear sordonia,
you wrote:

"we sort the query log by query timestamp and we use the queries submitted before 1 May, 2006 as our background data to estimate the proposed model and the baselines. The next two weeks of data are used as a training set for tuning the ranking models. The remaining two weeks are split into the validation and the test set."

What does "background data" mean? Why didn't you use the background data in the training phase?

an error

Hi,
When I run your project (python train.py --prototype prototype_test), I encounter the following error. How should I fix it? Thank you!

Traceback (most recent call last):
  File "train.py", line 204, in <module>
    main(args)
  File "train.py", line 93, in main
    model = SessionEncoderDecoder(state)
  File "/home/dixin/work/hred-qs/session_encdec.py", line 586, in __init__
    self.encoder.build_encoder(training_x, xmask=training_hs_mask)
  File "/home/dixin/work/hred-qs/session_encdec.py", line 167, in build_encoder
    f_enc, sequences=[xe, xmask], outputs_info=o_enc_info)
  File "/home/work/anaconda/lib/python2.7/site-packages/theano/scan_module/scan.py", line 1041, in scan
    scan_outs = local_op(*scan_inputs)
  File "/home/work/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 611, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/work/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 538, in make_node
    inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 2) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
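
The error message above describes a dtype mismatch: the initial state passed to scan is float32, while the step function returns float64. The same kind of upcast can be reproduced outside Theano. The sketch below uses NumPy only, with hypothetical names (`step`, `h0`, `w64`, `w32`); it illustrates the mechanism and is not the project's fix.

```python
import numpy as np


def step(h, w):
    """One recurrence step; the result dtype follows NumPy's promotion rules."""
    return np.tanh(h * w)


h0 = np.zeros(4, dtype=np.float32)   # initial state in float32
w64 = np.ones(4, dtype=np.float64)   # a float64 array entering the step
w32 = w64.astype(np.float32)         # the same values cast down to float32

upcast = step(h0, w64)   # float32 * float64 is promoted to float64
same = step(h0, w32)     # float32 throughout, so the state dtype is preserved
```

In Theano, one common cause is a float64 constant or parameter used while `floatX` is set to float32, and casting the offending value to `theano.config.floatX` is a usual remedy. Which value is responsible here cannot be determined from the traceback alone.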

more detail

I want to reproduce your work, but I encountered some problems when running the code. I hope you can provide a more detailed readme,

so that I can run these programs step by step :)

What does rank file mean?

Hi sordonia, thanks for your great work.
I have some problems running convert-text2dict.py.
I understand that [session file] means the query series, but what about [rank_file]?

Preprocessing

Hello,
Good idea, but I have a question about the preprocessing. In the paper you wrote "apply a spelling corrector". Which spelling corrector did you use?
