
clickmodel_wc's Introduction

ClickModelsWC is a small set of Python scripts implementing user click models, based on the Yandex implementation (https://github.com/varepsilon/clickmodels).

A Click Model is a probabilistic graphical model used to predict search engine click data from past observations.

This project aims to implement recently proposed click models and is intended to be easy to read and easy to modify. If it is not, please let me know how to improve it :)

Models Implemented

  • Temporal Hidden Click Model (THCM): Danqing Xu, Yiqun Liu, Min Zhang, Shaoping Ma. Incorporating revisiting behaviors into click models. WSDM (2012).
  • Temporal Click Model (TCM): Wanhong Xu, Eren Manavoglu, Erick Cantú-Paz. Temporal click model for sponsored search. SIGIR (2010).
  • Partially Observable Markov Model (POM): Kuansan Wang, Nikolas Gloy, Xiaolong Li. Inferring search behaviors using partially observable Markov (POM) model. WSDM (2010).
  • Dynamic Bayesian Network (DBN): Olivier Chapelle, Ya Zhang. A dynamic Bayesian network click model for web search ranking. WWW (2009). This implementation is identical to the Yandex version.
  • User Browsing Model (UBM): Georges Dupret, Benjamin Piwowarski. A user browsing model to predict search engine click data from past observations. SIGIR (2008). This implementation follows the inference method of the original paper exactly, which differs slightly from the Yandex version.
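To make the UBM entry concrete, here is a minimal sketch of how UBM scores one session, assuming its parameters have already been learned. In UBM, the result at rank r is clicked with probability alpha(u) * gamma(r, d): alpha(u) is the probability that the URL attracts a click once examined, and gamma(r, d) is the probability of examining rank r when the last click above it is d ranks earlier. The function names, the dictionary layout, and the convention d = r when there is no earlier click are assumptions for illustration; this is not the code in bin/.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ubm_click_probabilities(
    urls: Sequence[str],
    clicks: Sequence[int],
    alpha: Dict[str, float],
    gamma: Dict[Tuple[int, int], float],
) -> List[float]:
    """Return P(C_r = 1 | clicks above r) for each rank r of one session.

    Hypothetical parameter layout (an assumption, not the bin/ format):
      alpha[url]    : probability that the URL attracts a click once examined
      gamma[(r, d)] : probability of examining rank r (1-based) when the last
                      click above it is d ranks earlier (d = r if no click yet)
    """
    probs = []
    last_click = 0  # virtual rank 0 stands for "no click so far"
    for rank, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        distance = rank - last_click
        # Examination hypothesis: a click needs examination AND attraction.
        probs.append(alpha[url] * gamma[(rank, distance)])
        if clicked:
            last_click = rank
    return probs


def ubm_log_likelihood(
    urls: Sequence[str],
    clicks: Sequence[int],
    alpha: Dict[str, float],
    gamma: Dict[Tuple[int, int], float],
) -> float:
    """Natural-log likelihood of the observed click vector under UBM."""
    probs = ubm_click_probabilities(urls, clicks, alpha, gamma)
    return sum(
        math.log(p) if clicked else math.log(1.0 - p)
        for p, clicked in zip(probs, clicks)
    )
```

For example, with clicks [1, 0, 1] the third result is scored with gamma[(3, 2)], because the last click above it is at rank 1.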

Files

README.md

This file.

bin/

Directory with the scripts.

data/

Directory with the sample dataset.

Format of the Input Data

A small tab-separated example can be found under sample/. The directory contains five files:

  • query_id: maps each query to a unique id.
    • e.g.: "test 1 10 5" means that the query "test" has the unique id 1, occurs in 10 sessions in the search logs, and 5 of those sessions contain a click.
  • query_class: the probability of each search intent for each query.
    • e.g.: "test 0.25 0.25 0.25 0.25" means that the query "test" has four search intents, each with probability 0.25. If intent information is not needed, set "query_id 1" for each query, i.e. a single intent with probability 1.
  • url_id: maps each URL to a unique id.
  • train_data, test_data: search logs in which each line represents one query session. Each line has 10 tab-separated fields; within each bracketed field, values are separated by spaces (" "):
    • query_id [url_id * 10] [click * 10] [click_time * 10] [mouse_feature_1 * 10] [mouse_feature_2 * 10] [mouse_feature_3 * 10] [mouse_feature_4 * 10] [mouse_feature_5 * 10] [mouse_feature_6 * 10]
    • click: 1 represents click, 0 represents no click
    • click_time: >0 represents click time in seconds, -1 represents no click
    • mouse_feature_1: The leftmost position the mouse cursor reaches in the result's display area
    • mouse_feature_2: The user's total rightward movement length in the result's display area
    • mouse_feature_3: The total dwell time the cursor spends in the result's display area, ignoring its horizontal coordinate
    • mouse_feature_4: The total dwell time the cursor spends in the result's display area
    • mouse_feature_5: The total time the cursor hovers over the result's display area
    • mouse_feature_6: The number of cursor actions (scrolls, text selections, moves) that occur in the result's display area
    • Note: set the mouse features to 0 if your search logs do not contain mouse movement information.
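The session format above can be read with a short parser like the following minimal sketch, which assumes exactly 10 results per session and 6 mouse features as documented. The names Session and parse_session_line are hypothetical and do not come from the scripts in bin/.

```python
from dataclasses import dataclass
from typing import List

NUM_RESULTS = 10         # results per session in the documented format
NUM_MOUSE_FEATURES = 6   # mouse_feature_1 .. mouse_feature_6


@dataclass
class Session:
    query_id: str
    url_ids: List[str]
    clicks: List[int]                  # 1 = click, 0 = no click
    click_times: List[float]           # seconds, or -1 for no click
    mouse_features: List[List[float]]  # mouse_features[k][r]: feature k+1 at rank r+1


def parse_session_line(line: str) -> Session:
    """Parse one tab-separated query-session line of train_data / test_data."""
    fields = line.rstrip("\r\n").split("\t")
    # query_id, url ids, clicks, click times, then the six mouse features
    expected_fields = 1 + 3 + NUM_MOUSE_FEATURES
    if len(fields) != expected_fields:
        raise ValueError(f"expected {expected_fields} fields, got {len(fields)}")

    # Every field after query_id is a space-separated list with one value per rank.
    groups = [field.split() for field in fields[1:]]
    for i, group in enumerate(groups, start=2):
        if len(group) != NUM_RESULTS:
            raise ValueError(f"field {i} has {len(group)} values, expected {NUM_RESULTS}")

    clicks = [int(x) for x in groups[1]]
    if any(c not in (0, 1) for c in clicks):
        raise ValueError("click values must be 0 or 1")

    return Session(
        query_id=fields[0],
        url_ids=groups[0],
        clicks=clicks,
        click_times=[float(x) for x in groups[2]],
        mouse_features=[[float(x) for x in g] for g in groups[3:]],
    )
```

Raising on malformed lines, rather than silently padding them, makes format problems in converted logs visible early.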

Usage

In bin/config_sample.py, select the click models to run (e.g. TEST_MODELS = ['THCM', 'UBM', 'DBN', 'POM']).

From bin/, run: python wc_click_model_inference_by_id.py ../sample

Output

An output/ directory is created automatically inside the target data directory, and the model results are written to it:

  • model_name.model: parameters of the model learned from train_data
  • model_name.model.perplexity: perplexity of the model evaluated on test_data
  • model_name.model.relevance: query-URL relevance estimated by the model
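A common definition of click perplexity in the click-model literature is, for each rank r, 2 raised to minus the average log2-likelihood of the observed clicks at that rank, then averaged over ranks for an overall value. The sketch below implements that definition; whether model_name.model.perplexity is computed exactly this way is an assumption, and click_perplexity is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-10  # clamp probabilities so log2 never sees 0


def click_perplexity(
    predicted: Sequence[Sequence[float]],
    observed: Sequence[Sequence[int]],
) -> Tuple[List[float], float]:
    """Per-rank and average click perplexity.

    predicted[s][r]: model's click probability for session s at rank r
    observed[s][r]:  1 if that result was clicked, else 0
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need the same, non-zero number of predicted and observed sessions")
    num_sessions = len(predicted)
    num_ranks = len(predicted[0])

    per_rank = []
    for r in range(num_ranks):
        log_lik = 0.0
        for probs, clicks in zip(predicted, observed):
            q = min(max(probs[r], EPS), 1.0 - EPS)
            log_lik += math.log2(q) if clicks[r] else math.log2(1.0 - q)
        # Perplexity at rank r: 2 ** (-average log2-likelihood over sessions).
        per_rank.append(2.0 ** (-log_lik / num_sessions))

    return per_rank, sum(per_rank) / num_ranks
```

Lower is better: a model that always predicts 0.5 scores exactly 2, and a perfect predictor approaches 1.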

Contributors

kurakimai
