
MatchZoo's Introduction


Facilitating the design, comparison and sharing of deep text matching models.
(MatchZoo is a general-purpose text matching toolkit, designed to make it easy to implement, compare, and share the latest deep text matching models.)


🔥 News: MatchZoo-py (the PyTorch version of MatchZoo) is now available.

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

Task                        Text 1     Text 2      Objective
Paraphrase Identification   string 1   string 2    classification
Textual Entailment          text       hypothesis  classification
Question Answering          question   answer      classification/ranking
Conversation                dialog     response    classification/ranking
Information Retrieval       query      document    ranking
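
Each objective corresponds to a task object in code. For example, alongside the Ranking task used below, a classification task can be declared like this (a short sketch using the MatchZoo 2.x task API):

import matchzoo as mz

# num_classes tells the task (and hence the model's output layer) how many labels to predict.
classification_task = mz.tasks.Classification(num_classes=2)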

Get Started in 60 Seconds

To train a Deep Structured Semantic Model (DSSM), import matchzoo and prepare the input data.

import matchzoo as mz

train_pack = mz.datasets.wiki_qa.load_data('train', task='ranking')
valid_pack = mz.datasets.wiki_qa.load_data('dev', task='ranking')

Preprocess your input data in three lines of code, keeping track of parameters to be passed into the model.

preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
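
The preprocessor's fitted context stores data-dependent parameters gathered during fit_transform, such as the input shapes that the model will need below; you can inspect it directly:

print(preprocessor.context['input_shapes'])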

Make use of MatchZoo customized loss functions and evaluation metrics:

ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
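
As a side note, MatchZoo metrics are plain callables of the form metric(y_true, y_pred) on NumPy arrays, so you can sanity-check them on toy data (the numbers below are made up for illustration):

import numpy as np

metric = mz.metrics.MeanAveragePrecision()
# One query with three candidates: relevance labels vs. model scores.
print(metric(np.array([0, 1, 0]), np.array([0.1, 0.6, 0.3])))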

Initialize the model and fine-tune the hyper-parameters.

model = mz.models.DSSM()
model.params['input_shapes'] = preprocessor.context['input_shapes']
model.params['task'] = ranking_task
model.guess_and_fill_missing_params()
model.build()
model.compile()

Generate pair-wise training data on the fly, and evaluate model performance on validation data using customized callbacks.

train_generator = mz.PairDataGenerator(train_processed, num_dup=1, num_neg=4, batch_size=64, shuffle=True)
valid_x, valid_y = valid_processed.unpack()
evaluate = mz.callbacks.EvaluateAllMetrics(model, x=valid_x, y=valid_y, batch_size=len(valid_x))
history = model.fit_generator(train_generator, epochs=20, callbacks=[evaluate], workers=5, use_multiprocessing=False)
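
A trained model can then be persisted and restored; a minimal sketch, assuming the MatchZoo 2.x save/load API (model.save and mz.load_model):

model.save('my-dssm-model')
loaded_model = mz.load_model('my-dssm-model')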

References

Tutorials

English Documentation

中文文档

If you're interested in cutting-edge research progress, please take a look at awaresome neural models for semantic match.

Install

MatchZoo depends on Keras and TensorFlow. There are two ways to install MatchZoo:

Install MatchZoo from PyPI:

pip install matchzoo

Install MatchZoo from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo.git
cd MatchZoo
python setup.py install

Models

  1. DRMM: this model is an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.

  2. MatchPyramid: this model is an implementation of Text Matching as Image Recognition.

  3. ARC-I: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.

  4. DSSM: this model is an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.

  5. CDSSM: this model is an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search.

  6. ARC-II: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.

  7. MV-LSTM: this model is an implementation of A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations.

  8. aNMM: this model is an implementation of aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model.

  9. DUET: this model is an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search.

  10. K-NRM: this model is an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling.

  11. CONV-KNRM: this model is an implementation of Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search.

  12. Models under development: Match-SRNN, DeepRank, BiMPM, and more (see the sketch below for listing what your installed version provides).
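
All of the above share the same parameter interface, so swapping architectures usually means changing only the model class. To enumerate the models your installed version actually ships, something like the following should work (mz.models.list_available() is assumed from the MatchZoo 2.x API):

import matchzoo as mz

# Print every model class bundled with the installed MatchZoo release.
for model_class in mz.models.list_available():
    print(model_class)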

Citation

If you use MatchZoo in your research, please use the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
} 

Development Team

  • faneshion (Fan Yixing): Core Dev, ASST PROF, ICT
  • bwanglzu (Wang Bo): Core Dev, M.S. TU Delft
  • uduse (Wang Zeyi): Core Dev, B.S. UC Davis
  • pl8787 (Pang Liang): Core Dev, ASST PROF, ICT
  • yangliuy (Yang Liu): Core Dev, PhD. UMASS
  • wqh17101 (Wang Qinghua): Documentation, B.S. Shandong Univ.
  • ZizhenWang (Wang Zizhen): Dev, M.S. UCAS
  • lixinsu (Su Lixin): Dev, PhD. UCAS
  • zhouzhouyang520 (Yang Zhou): Dev, M.S. CQUT
  • rgtjf (Tian Junfeng): Dev, M.S. ECNU

Contribution

Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper/project/component/tool, send a pull request to this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg

Project Organizers

  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage

License

Apache-2.0

Copyright (c) 2015-present, Yixing Fan (faneshion)

MatchZoo's People

Contributors

adedzy, aneesh-joshi, bwanglzu, caiyinqiong, changqf, chriskuei, crystina-z, faneshion, githubclj, hkvision, houjp, jellying, jibrilfrej, lixinsu, matthew-z, niuox, pl8787, rgtjf, sleepybag, stanpcf, uduse, wqh17101, wsdm2019-dapa, yangliuy, zenogantner, zhouzhouyang520, zizhenwang


MatchZoo's Issues

Running bash run_mvlstm.sh in MatchZoo/examples/wikiqa failed

mldl@mldlUB1604:/ub16_prj/MatchZoo/examples/wikiqa$ bash run_mvlstm.sh
Using TensorFlow backend.
2017-12-14 03:37:07.053967: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:37:07.053990: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:37:07.054015: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:37:07.054020: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:37:07.054025: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:37:07.142388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-12-14 03:37:07.142703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 950M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.65GiB
2017-12-14 03:37:07.142718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-12-14 03:37:07.142723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-12-14 03:37:07.142734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0)
{
"inputs": {
"test": {
"phase": "EVAL",
"input_type": "ListGenerator",
"relation_file": "./data/WikiQA/relation_test.txt",
"batch_list": 10
},
"predict": {
"phase": "PREDICT",
"input_type": "ListGenerator",
"relation_file": "./data/WikiQA/relation_test.txt",
"batch_list": 10
},
"train": {
"relation_file": "./data/WikiQA/relation_train.txt",
"input_type": "PairGenerator",
"batch_size": 100,
"batch_per_iter": 5,
"phase": "TRAIN",
"query_per_iter": 50,
"use_iter": false
},
"share": {
"vocab_size": 18670,
"use_dpool": false,
"embed_size": 50,
"target_mode": "ranking",
"text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"embed_path": "./data/WikiQA/embed_glove_d50",
"text1_maxlen": 10,
"train_embed": false,
"text2_maxlen": 40
},
"valid": {
"phase": "EVAL",
"input_type": "ListGenerator",
"relation_file": "./data/WikiQA/relation_valid.txt",
"batch_list": 10
}
},
"global": {
"optimizer": "adadelta",
"num_iters": 400,
"save_weights_iters": 10,
"learning_rate": 0.0001,
"test_weights_iters": 400,
"weights_file": "examples/wikiqa/weights/mvlstm.wikiqa.weights",
"model_type": "PY",
"display_interval": 10
},
"outputs": {
"predict": {
"save_format": "TREC",
"save_path": "predict.test.wikiqa.txt"
}
},
"losses": [
{
"object_name": "rank_hinge_loss",
"object_params": {
"margin": 1.0
}
}
],
"metrics": [
"ndcg@3",
"ndcg@5",
"map"
],
"net_name": "MVLSTM",
"model": {
"model_py": "mvlstm.MVLSTM",
"setting": {
"dropout_rate": 0.5,
"hidden_size": 50,
"topk": 100
},
"model_path": "./matchzoo/models/"
}
}
[./data/WikiQA/embed_glove_d50]
Embedding size: 18677
Traceback (most recent call last):
File "matchzoo/main.py", line 328, in
main(sys.argv)
File "matchzoo/main.py", line 320, in main
train(config)
File "matchzoo/main.py", line 67, in train
share_input_conf['embed'] = convert_embed_2_numpy(embed_dict, embed = embed)
File "/home/mldl/ub16_prj/MatchZoo/matchzoo/utils/rank_io.py", line 93, in convert_embed_2_numpy
embed[k] = np.array(embed_dict[k])
IndexError: index 18670 is out of bounds for axis 0 with size 18670
mldl@mldlUB1604:/ub16_prj/MatchZoo/examples/wikiqa$

Running bash run_dssm.sh in MatchZoo/examples/wikiqa failed

mldl@mldlUB1604:/ub16_prj/MatchZoo/examples/wikiqa$ bash run_dssm.sh
Using TensorFlow backend.
2017-12-14 03:34:23.080444: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:23.080467: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:23.080490: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:23.080496: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:23.080514: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:23.169856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-12-14 03:34:23.170205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 950M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.65GiB
2017-12-14 03:34:23.170236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-12-14 03:34:23.170242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-12-14 03:34:23.170271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0)
{
"inputs": {
"test": {
"phase": "EVAL",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_test.txt",
"dtype": "dssm"
},
"predict": {
"phase": "PREDICT",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_test.txt",
"dtype": "dssm"
},
"train": {
"relation_file": "./data/WikiQA/relation_train.txt",
"input_type": "Triletter_PairGenerator",
"batch_size": 100,
"batch_per_iter": 5,
"dtype": "dssm",
"phase": "TRAIN",
"query_per_iter": 50,
"use_iter": false
},
"share": {
"vocab_size": 3314,
"embed_size": 1,
"target_mode": "ranking",
"text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt"
},
"valid": {
"phase": "EVAL",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_valid.txt",
"dtype": "dssm"
}
},
"global": {
"optimizer": "adam",
"num_iters": 400,
"save_weights_iters": 10,
"learning_rate": 0.0001,
"test_weights_iters": 400,
"weights_file": "examples/wikiqa/weights/dssm.wikiqa.weights",
"model_type": "PY",
"display_interval": 10
},
"outputs": {
"predict": {
"save_format": "TREC",
"save_path": "predict.test.wikiqa.txt"
}
},
"losses": [
{
"object_name": "rank_hinge_loss",
"object_params": {
"margin": 1.0
}
}
],
"metrics": [
"ndcg@3",
"ndcg@5",
"map"
],
"net_name": "DSSM",
"model": {
"model_py": "dssm.DSSM",
"setting": {
"dropout_rate": 0.9,
"hidden_sizes": [
300
]
},
"model_path": "./matchzoo/models/"
}
}
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'train'] in TRAIN, [u'test', u'valid'] in EVAL.
[./data/WikiQA/corpus_preprocessed.txt]
Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/WikiQA/relation_train.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_PairGenerator', u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'batch_size': 100, u'batch_per_iter': 5, u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'TRAIN', 'embed': array([[-0.18291523],
[-0.00574826],
[-0.13887608],
...,
[-0.17844775],
[-0.1465386 ],
[-0.13503003]], dtype=float32), u'query_per_iter': 50, u'use_iter': False}
[./data/WikiQA/relation_train.txt]
Instance size: 20360
Pair Instance Count: 8995
[Triletter_PairGenerator] init done
{u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523],
[-0.00574826],
[-0.13887608],
...,
[-0.17844775],
[-0.1465386 ],
[-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_test.txt]
Instance size: 2341
List Instance Count: 237
[Triletter_ListGenerator] init done
{u'relation_file': u'./data/WikiQA/relation_valid.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523],
[-0.00574826],
[-0.13887608],
...,
[-0.17844775],
[-0.1465386 ],
[-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_valid.txt]
Instance size: 1126
List Instance Count: 122
[Triletter_ListGenerator] init done
[DSSM] init done
[layer]: Input [shape]: [None, 3314]
[Memory] Total Memory Use: 294.5273 MB Resident: 301596 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input [shape]: [None, 3314]
[Memory] Total Memory Use: 294.5273 MB Resident: 301596 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300]
[Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300]
[Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dot [shape]: [None, 1]
[Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0
[Model] Model Compile Done.
[12-14-2017 03:34:23] [Train:train] Traceback (most recent call last):
File "matchzoo/main.py", line 328, in
main(sys.argv)
File "matchzoo/main.py", line 320, in main
train(config)
File "matchzoo/main.py", line 151, in train
verbose = 0
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
TypeError: fit_generator() got an unexpected keyword argument 'shuffle'
Using TensorFlow backend.
2017-12-14 03:34:25.341013: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:25.341035: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:25.341060: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:25.341064: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:25.341069: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 03:34:25.406950: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-12-14 03:34:25.407200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 950M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.65GiB
2017-12-14 03:34:25.407216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-12-14 03:34:25.407220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-12-14 03:34:25.407230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0)
{
"inputs": {
"test": {
"phase": "EVAL",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_test.txt",
"dtype": "dssm"
},
"predict": {
"phase": "PREDICT",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_test.txt",
"dtype": "dssm"
},
"train": {
"relation_file": "./data/WikiQA/relation_train.txt",
"input_type": "Triletter_PairGenerator",
"batch_size": 100,
"batch_per_iter": 5,
"dtype": "dssm",
"phase": "TRAIN",
"query_per_iter": 50,
"use_iter": false
},
"share": {
"vocab_size": 3314,
"embed_size": 1,
"target_mode": "ranking",
"text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
"word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt"
},
"valid": {
"phase": "EVAL",
"input_type": "Triletter_ListGenerator",
"batch_list": 10,
"relation_file": "./data/WikiQA/relation_valid.txt",
"dtype": "dssm"
}
},
"global": {
"optimizer": "adam",
"num_iters": 400,
"save_weights_iters": 10,
"learning_rate": 0.0001,
"test_weights_iters": 400,
"weights_file": "examples/wikiqa/weights/dssm.wikiqa.weights",
"model_type": "PY",
"display_interval": 10
},
"outputs": {
"predict": {
"save_format": "TREC",
"save_path": "predict.test.wikiqa.txt"
}
},
"losses": [
{
"object_name": "rank_hinge_loss",
"object_params": {
"margin": 1.0
}
}
],
"metrics": [
"ndcg@3",
"ndcg@5",
"map"
],
"net_name": "DSSM",
"model": {
"model_py": "dssm.DSSM",
"setting": {
"dropout_rate": 0.9,
"hidden_sizes": [
300
]
},
"model_path": "./matchzoo/models/"
}
}
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'predict'] in PREDICT.
[./data/WikiQA/corpus_preprocessed.txt]
Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'PREDICT', 'embed': array([[-0.18291523],
[-0.00574826],
[-0.13887608],
...,
[-0.17844775],
[-0.1465386 ],
[-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_test.txt]
Instance size: 2341
List Instance Count: 237
[Triletter_ListGenerator] init done
[DSSM] init done
[layer]: Input [shape]: [None, 3314]
[Memory] Total Memory Use: 289.7930 MB Resident: 296748 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input [shape]: [None, 3314]
[Memory] Total Memory Use: 289.7930 MB Resident: 296748 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300]
[Memory] Total Memory Use: 290.1719 MB Resident: 297136 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300]
[Memory] Total Memory Use: 290.1719 MB Resident: 297136 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dot [shape]: [None, 1]
[Memory] Total Memory Use: 290.4727 MB Resident: 297444 Shared: 0 UnshareData: 0 UnshareStack: 0
Traceback (most recent call last):
File "matchzoo/main.py", line 328, in
main(sys.argv)
File "matchzoo/main.py", line 322, in main
predict(config)
File "matchzoo/main.py", line 245, in predict
model.load_weights(weights_file)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2566, in load_weights
f = h5py.File(filepath, mode='r')
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 269, in init
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to open file: name = 'examples/wikiqa/weights/dssm.wikiqa.weights.400', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
mldl@mldlUB1604:/ub16_prj/MatchZoo/examples/wikiqa$

DeepRank model

Do you plan to release code for your DeepRank model (CIKM'17) as part of MatchZoo?

Segmentation fault running DSSM on another dataset

python matchzoo/main.py --phase train --model_file examples/config/dssm_ranking.config 
Using TensorFlow backend.
2018-01-08 11:47:26.702599: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
{
  "inputs": {
    "test": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "predict": {
      "phase": "PREDICT", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "train": {
      "relation_file": "./data/relation_train.txt", 
      "input_type": "Triletter_PairGenerator", 
      "batch_size": 100, 
      "batch_per_iter": 5, 
      "dtype": "dssm", 
      "phase": "TRAIN", 
      "query_per_iter": 3, 
      "use_iter": true
    }, 
    "share": {
      "vocab_size": 3484, 
      "embed_size": 10, 
      "target_mode": "ranking", 
      "text1_corpus": "./data/corpus_preprocessed.txt", 
      "text2_corpus": "./data/corpus_preprocessed.txt", 
      "word_triletter_map_file": "./data/word_triletter_map.txt"
    }, 
    "valid": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/relation_valid.txt", 
      "dtype": "dssm"
    }
  }, 
  "global": {
    "optimizer": "adam", 
    "num_iters": 10, 
    "save_weights_iters": 10, 
    "learning_rate": 0.0001, 
    "test_weights_iters": 10, 
    "weights_file": "examples/weights/dssm_ranking.weights", 
    "model_type": "PY", 
    "display_interval": 10
  }, 
  "outputs": {
    "predict": {
      "save_format": "TREC", 
      "save_path": "predict.test.dssm_ranking.txt"
    }
  }, 
  "losses": [
    {
      "object_name": "rank_hinge_loss", 
      "object_params": {
        "margin": 1.0
      }
    }
  ], 
  "metrics": [
    "ndcg@3", 
    "ndcg@5", 
    "map"
  ], 
  "net_name": "dssm", 
  "model": {
    "model_py": "dssm.DSSM", 
    "setting": {
      "dropout_rate": 0.5, 
      "hidden_sizes": [
        100, 
        30
      ]
    }, 
    "model_path": "matchzoo/models/"
  }
}
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'train'] in TRAIN, [u'test', u'valid'] in EVAL.
[./data/corpus_preprocessed.txt]
        Data size: 71849
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/relation_train.txt', u'vocab_size': 3484, u'embed_size': 10, u'target_mode': u'ranking', u'input_type': u'Triletter_PairGenerator', u'text1_corpus': u'./data/corpus_preprocessed.txt', u'batch_size': 100, u'batch_per_iter': 5, u'text2_corpus': u'./data/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'TRAIN', 'embed': array([[-0.18291523, -0.00574826, -0.13887608, ..., -0.13666791,
         0.00907838,  0.13784599],
       [ 0.03368587,  0.13503729,  0.00107509, ...,  0.18584302,
         0.03414046, -0.14042418],
       [ 0.03610065,  0.19066425,  0.11800677, ...,  0.14983599,
        -0.09182639, -0.0633784 ],
       ..., 
       [ 0.1179866 , -0.19746014,  0.08622313, ..., -0.02868197,
        -0.07183626,  0.06968395],
       [-0.02044802,  0.17994043, -0.0810562 , ...,  0.03050527,
         0.03873055, -0.14228183],
       [ 0.04971068,  0.16548306,  0.08958763, ...,  0.0537957 ,
         0.04853643,  0.09921838]], dtype=float32), u'query_per_iter': 3, u'use_iter': True}
[./data/relation_train.txt]
        Instance size: 32953
[Triletter_PairGenerator] init done
{u'relation_file': u'./data/relation_test.txt', u'vocab_size': 3484, u'embed_size': 10, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/corpus_preprocessed.txt', u'text2_corpus': u'./data/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523, -0.00574826, -0.13887608, ..., -0.13666791,
         0.00907838,  0.13784599],
       [ 0.03368587,  0.13503729,  0.00107509, ...,  0.18584302,
         0.03414046, -0.14042418],
       [ 0.03610065,  0.19066425,  0.11800677, ...,  0.14983599,
        -0.09182639, -0.0633784 ],
       ..., 
       [ 0.1179866 , -0.19746014,  0.08622313, ..., -0.02868197,
        -0.07183626,  0.06968395],
       [-0.02044802,  0.17994043, -0.0810562 , ...,  0.03050527,
         0.03873055, -0.14228183],
       [ 0.04971068,  0.16548306,  0.08958763, ...,  0.0537957 ,
         0.04853643,  0.09921838]], dtype=float32)}
[./data/relation_test.txt]
        Instance size: 25535
List Instance Count: 1445
[Triletter_ListGenerator] init done
{u'relation_file': u'./data/relation_valid.txt', u'vocab_size': 3484, u'embed_size': 10, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/corpus_preprocessed.txt', u'text2_corpus': u'./data/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523, -0.00574826, -0.13887608, ..., -0.13666791,
         0.00907838,  0.13784599],
       [ 0.03368587,  0.13503729,  0.00107509, ...,  0.18584302,
         0.03414046, -0.14042418],
       [ 0.03610065,  0.19066425,  0.11800677, ...,  0.14983599,
        -0.09182639, -0.0633784 ],
       ..., 
       [ 0.1179866 , -0.19746014,  0.08622313, ..., -0.02868197,
        -0.07183626,  0.06968395],
       [-0.02044802,  0.17994043, -0.0810562 , ...,  0.03050527,
         0.03873055, -0.14228183],
       [ 0.04971068,  0.16548306,  0.08958763, ...,  0.0537957 ,
         0.04853643,  0.09921838]], dtype=float32)}
[./data/relation_valid.txt]
        Instance size: 24919
List Instance Count: 1443
[Triletter_ListGenerator] init done
[DSSM] init done
[layer]: Input  [shape]: [None, 3484] 
 [Memory] Total Memory Use: 249.0977 MB          Resident: 261197824 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Input  [shape]: [None, 3484] 
 [Memory] Total Memory Use: 249.1133 MB          Resident: 261214208 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP    [shape]: [None, 30] 
 [Memory] Total Memory Use: 250.2773 MB          Resident: 262434816 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP    [shape]: [None, 30] 
 [Memory] Total Memory Use: 250.5195 MB          Resident: 262688768 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Dot    [shape]: [None, 1] 
 [Memory] Total Memory Use: 250.6992 MB          Resident: 262877184 Shared: 0 UnshareData: 0 UnshareStack: 0 
[Model] Model Compile Done.
Segmentation fault: 11

Missing data

Hi,

I have tried to set up the project as described, however the data is missing.
Looking at the model config files, I can see that they reference unavailable folders. If you try to set up the project in a vanilla environment, you will find something like this:

IOError: [Errno 2] No such file or directory: u'../data/mq2007/embed.idf'

`weights` directory not found

I followed the README.md, cloned the repo and ran python matchzoo/main.py --phase train --model_file examples/toy_example/config/arci_ranking.config and this error is shown:

Traceback (most recent call last):
  File "matchzoo/main.py", line 328, in <module>
    main(sys.argv)
  File "matchzoo/main.py", line 320, in main
    train(config)
  File "matchzoo/main.py", line 178, in train
    model.save_weights(weights_file % (i_e+1))
  File "/home/zeyi/.virtualenvs/match-zoo/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2586, in save_weights
    f = h5py.File(filepath, 'w')
  File "/home/zeyi/.virtualenvs/match-zoo/local/lib/python2.7/site-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/zeyi/.virtualenvs/match-zoo/local/lib/python2.7/site-packages/h5py/_hl/files.py", line 105, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 98, in h5py.h5f.create
IOError: Unable to create file (unable to open file: name = 'examples/toy_example/weights/arci_ranking.weights.10', errno = 2, error message = 'No such file or directory', flags = 13, o_flags = 242)

I resolved this by mkdir examples/toy_example/weights. However, this issue needs to be addressed.

There are two possible solutions that come to my mind (see the sketch after this list):

  1. make the directory and touch a .keep file, which is the convention for keeping an empty directory in git
  2. check os.path.exists(path) before saving weights to that directory; if no such path exists, call os.mkdir(path)
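
A minimal sketch of option 2 in Python (the helper name is illustrative, not part of MatchZoo):

import os

def ensure_weights_dir(weights_file):
    # Create the parent directory (e.g. examples/toy_example/weights/) if it is
    # missing, so model.save_weights() can create the HDF5 file inside it.
    weights_dir = os.path.dirname(weights_file)
    if weights_dir and not os.path.exists(weights_dir):
        os.makedirs(weights_dir)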

GPU support

Hi there!

Any info on when (and if) we will have GPU support for the models?

Thanks!

DRMM_TKS

Is it possible to provide further details on DRMM_TKS (I mean, the paper it is based on)?

ImportError: No module named 'resource'

When I use "python matchzoo/main.py --phase train --model_file examples/toy_example/config/arci_ranking.config" to test, the following problem arises:

D:\tool\MatchZoo-master\matchzoo>python main.py --phase train --model_file ./models/arci_ranking.config
Using TensorFlow backend.
Traceback (most recent call last):
File "main.py", line 20, in <module>
File "D:\tool\MatchZoo-master\matchzoo\utils\__init__.py", line 9, in <module>
from .utility import import_class
File "D:\tool\MatchZoo-master\matchzoo\utils\utility.py", line 5, in <module>
import resource
ImportError: No module named 'resource'

What can I do?

Working examples?

Do you have any working examples that can help me learn more about the toolkit?

Missing example data

Hi, is it possible to provide the file of /data/example/ranking/word_triletter_map.txt?

How to run MatchZoo distributed

Hi, I want to run MatchZoo distributed with TensorFlow. Is this feasible? If feasible, can you say something about how to achieve it? Thank you.

I met errors when I run "python example/toy_example/test_preparation_for_classification.py"

[kkk@MatchZoo]$ python examples/toy_example/test_preparation_for_classify.py
Traceback (most recent call last):
File "examples/toy_example/test_preparation_for_classify.py", line 7, in
from preparation import *
ImportError: No module named preparation

Obviously, you said I should run "python examples/test_preparation_for_classification.py". This is a mistake, right? I wonder why the usage is different from the description in your readme...
And are you really sure that you can run the code on your server?

Error when training toy_example

An error occurs when training the examples.

Command executed:
python matchzoo/main.py --phase train --model_file examples/toy_example/config/arci_ranking.config

Error message:
Traceback (most recent call last):
File "matchzoo/main.py", line 328, in <module>
main(sys.argv)
File "matchzoo/main.py", line 320, in main
train(config)
File "matchzoo/main.py", line 178, in train
model.save_weights(weights_file % (i_e+1))
File "/home/hadoop/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 2586, in save_weights
f = h5py.File(filepath, 'w')
File "/home/hadoop/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 271, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/home/hadoop/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 107, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 98, in h5py.h5f.create
IOError: Unable to create file (Unable to open file: name = 'examples/toy_example/weights/arci_ranking.weights.10', errno = 2, error message = 'no such file or directory', flags = 13, o_flags = 242)

Is the problem with this file?
examples/toy_example/weights/arci_ranking.weights.10

Questions about ndcg@ and map accuracy

Hello, please forgive me for asking in Chinese. There are a few questions I don't quite understand:

  1. The ndcg and map numbers obtained by the models on GitHub are far from the numbers reported in the paper (A Deep Relevance Matching Model for Ad-hoc Retrieval), and the same holds for the papers that proposed the other models. Is this caused by using different data? I'm not sure what is going on.
  2. Many of the model implementations are not built strictly following the models and hyper-parameters in the papers. Is it rigorous to present the papers' numbers in that case? For example, the ARC-I model in the paper uses a multi-layer perceptron after the convolution, while the arci model in the code directly attaches a softmax layer.
  3. What is the use case of classify? Can it be seen as a special case of ranking?
    Thanks in advance for your answers.

WeChat

Your WeChat MatchZoo group is full, and I can't join you. Could you please add me? My WeChat ID is: hshrimp. Thank you.

What should the results be like if I set the validation set and the prediction set the same?

I ran into a problem when I set the training, validation, and prediction datasets to be the same and ran the train and predict pipeline.

I used MatchPyramid as the matching model, with a 0.0001 learning rate and 400 epochs for training. It showed that during the last iterations the accuracy on the eval set had reached 0.99+. However, when running prediction, loading the model from the last iteration, and predicting on the training set, the accuracy was about 0.8.

Anyone knows what's wrong?

Fix error on Windows

The resource lib is Unix-specific (see the Python docs: 35.11. resource — Resource usage information).

This causes "No module named 'resource'" on the Windows platform. Modify matchzoo\utils\utility.py to work around the error:

import os
import sys
import traceback

WIN = False
try:
    import resource  # Unix-only module
except ImportError:
    WIN = True

def show_layer_info(layer_name, layer_out):
    print('[layer]: %s\t[shape]: %s \n%s' % (
        layer_name, str(layer_out.get_shape().as_list()), show_memory_use()))

def show_memory_use():
    if WIN:
        # resource is unavailable on Windows, so skip the memory report.
        return ""
    rusage_denom = 1024.
    if sys.platform == 'darwin':
        rusage_denom = rusage_denom * rusage_denom
    ru = resource.getrusage(resource.RUSAGE_SELF)
    total_memory = 1. * (ru.ru_maxrss + ru.ru_ixrss + ru.ru_idrss + ru.ru_isrss) / rusage_denom
    strinfo = "\x1b[33m [Memory] Total Memory Use: %.4f MB \t Resident: %ld Shared: %ld UnshareData: %ld UnshareStack: %ld \x1b[0m" % \
              (total_memory, ru.ru_maxrss, ru.ru_ixrss, ru.ru_idrss, ru.ru_isrss)
    return strinfo

What's the logic of the implementation of rank hinge loss?

Given a positive-labeled sample s1 and a negative-labeled sample s2, the neural model outputs two prediction values, which are then fed into the rank hinge loss. I got a little confused about the logic of computing this loss in Keras when diving into the MatchZoo code; how do you handle this? Thank you in advance.
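
For reference, the usual Keras formulation looks roughly like the sketch below. It assumes (as MatchZoo's pair generator arranges it, to the best of my reading) that each positive sample is immediately followed by its paired negative sample in the batch:

from keras import backend as K

def rank_hinge_loss(y_true, y_pred, margin=1.0):
    # Even rows hold scores for positive samples, odd rows their paired negatives.
    y_pos = y_pred[0::2]
    y_neg = y_pred[1::2]
    # Penalize whenever the negative score comes within `margin` of the positive one.
    return K.mean(K.maximum(0.0, margin + y_neg - y_pos))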

Use of Python 3

Good morning,
I would like to use this toolkit to run experiments and develop/edit deep models. My question is: can I run it with Python 3 (with the other requirements), or should I use Python 2.7?
Thanks

Small bug: toy_example/weights/ folder does not exist

When I tried to run the toy example, I got the error:
Unable to open file: name = 'examples/toy_example/weights/arci_ranking.weights.10', errno = 2, error message = 'no such file or directory'

The error was due to the weights folder not having been created. After manually adding the folder, the problem was solved.

failure of "python setup.py install" ???

[root@training2 MatchZoo]# rm /usr/lib/python2.7/site-packages/MatchZoo-0.2.0-py2.7.egg
rm: remove regular file ‘/usr/lib/python2.7/site-packages/MatchZoo-0.2.0-py2.7.egg’? y
[root@training2 MatchZoo]#
[root@training2 MatchZoo]#
[root@training2 MatchZoo]# python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to MatchZoo.egg-info/requires.txt
writing MatchZoo.egg-info/PKG-INFO
writing top-level names to MatchZoo.egg-info/top_level.txt
writing dependency_links to MatchZoo.egg-info/dependency_links.txt
reading manifest file 'MatchZoo.egg-info/SOURCES.txt'
writing manifest file 'MatchZoo.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/matchzoo
creating build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/list_generator.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/pair_generator.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/point_generator.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/preparation.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
copying build/lib/matchzoo/inputs/preprocess.py -> build/bdist.linux-x86_64/egg/matchzoo/inputs
creating build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/DynamicMaxPooling.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/Match.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/MatchTensor.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/NonMasking.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/SparseFullyConnectedLayer.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
copying build/lib/matchzoo/layers/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo/layers
creating build/bdist.linux-x86_64/egg/matchzoo/losses
copying build/lib/matchzoo/losses/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo/losses
copying build/lib/matchzoo/losses/rank_losses.py -> build/bdist.linux-x86_64/egg/matchzoo/losses
creating build/bdist.linux-x86_64/egg/matchzoo/metrics
copying build/lib/matchzoo/metrics/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo/metrics
copying build/lib/matchzoo/metrics/evaluations.py -> build/bdist.linux-x86_64/egg/matchzoo/metrics
copying build/lib/matchzoo/metrics/rank_evaluations.py -> build/bdist.linux-x86_64/egg/matchzoo/metrics
creating build/bdist.linux-x86_64/egg/matchzoo/utils
copying build/lib/matchzoo/utils/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo/utils
copying build/lib/matchzoo/utils/rank_io.py -> build/bdist.linux-x86_64/egg/matchzoo/utils
copying build/lib/matchzoo/utils/roc_auc.py -> build/bdist.linux-x86_64/egg/matchzoo/utils
copying build/lib/matchzoo/utils/utility.py -> build/bdist.linux-x86_64/egg/matchzoo/utils
copying build/lib/matchzoo/__init__.py -> build/bdist.linux-x86_64/egg/matchzoo
copying build/lib/matchzoo/main.py -> build/bdist.linux-x86_64/egg/matchzoo
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/list_generator.py to list_generator.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/pair_generator.py to pair_generator.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/point_generator.py to point_generator.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/preparation.py to preparation.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/inputs/preprocess.py to preprocess.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/DynamicMaxPooling.py to DynamicMaxPooling.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/Match.py to Match.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/MatchTensor.py to MatchTensor.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/NonMasking.py to NonMasking.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/SparseFullyConnectedLayer.py to SparseFullyConnectedLayer.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/layers/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/losses/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/losses/rank_losses.py to rank_losses.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/metrics/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/metrics/evaluations.py to evaluations.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/metrics/rank_evaluations.py to rank_evaluations.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/utils/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/utils/rank_io.py to rank_io.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/utils/roc_auc.py to roc_auc.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/utils/utility.py to utility.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/matchzoo/main.py to main.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying MatchZoo.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MatchZoo.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MatchZoo.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MatchZoo.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MatchZoo.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/MatchZoo-0.2.0-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing MatchZoo-0.2.0-py2.7.egg
Copying MatchZoo-0.2.0-py2.7.egg to /usr/lib/python2.7/site-packages
Adding MatchZoo 0.2.0 to easy-install.pth file

Installed /usr/lib/python2.7/site-packages/MatchZoo-0.2.0-py2.7.egg
Traceback (most recent call last):
File "setup.py", line 38, in
'tqdm >= 4.19.4'
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 73, in run
self.do_egg_install()
File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 101, in do_egg_install
cmd.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 380, in run
self.easy_install(spec, not self.no_deps)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 604, in easy_install
return self.install_item(None, spec, tmpdir, deps, True)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 655, in install_item
self.process_distribution(spec, dist, deps)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 701, in process_distribution
distreq.project_name, distreq.specs, requirement.extras
TypeError: __init__() takes exactly 2 arguments (4 given)

About the word embedding

Is the word embedding in WikiQA based on GloVe?
How do you train, or otherwise obtain, the word embedding?
Is the embedding also trained on the WikiQA dataset?
Thank you

Is "embed_path" in examples/wikiqa/config/drmm_wikiqa.config wrong?

Looking at the configuration file examples/wikiqa/config/drmm_wikiqa.config, I find this:

"embed_size": 300,
"embed_path": "./data/WikiQA/embed.idf",

this file "embed.idf" is generated from "cat word_stats.txt | cut -d ' ' -f 1,4 > embed.idf". And its content like this:
0 4.013749
1 4.035216
2 5.650094
3 8.964280

So is this a configuration bug for "embed_path"?

How to set the batch size for prediction?

Hi all, I think it is possible to set the training batch size to 100 and the prediction batch size to 10, right?
So I tried different prediction batch sizes (1, 10, 50, 100) and got different results after predicting.
This is binary classification using match_pyramid, predicting 42,155 samples in total:
size=1: numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1
size=10: predicts and outputs results for 42,142 samples
size=50: predicts and outputs results for 42,142 samples
size=100: predicts and outputs results for 42,092 samples
Does anyone know what went wrong?

The group is already full

The WeChat QR-code group already has more than 100 members; please add me: w16402151618.

An error for "steps_per_epoch = display_interval"

In the main() function there is a bug with the display_interval parameter: it seems to be set wrongly when train() calls fit_generator, because in Keras, steps_per_epoch does not mean a display interval.

for i_e in range(num_iters):
    for tag, generator in train_gen.items():
        genfun = generator.get_batch_generator()
        print('[%s]\t[Train:%s] ' % (time.strftime('%m-%d-%Y %H:%M:%S', time.localtime(time.time())), tag), end='')
        history = model.fit_generator(
            genfun,
            steps_per_epoch=display_interval,
            epochs=1,
            shuffle=False,
            verbose=0
        )  # callbacks=[eval_map]

Prediction question for the classify task

I ran the classify task, and there is a question: the test accuracy during training is as high as 80%, but the test accuracy during predict is only 60%. How can I deal with this?

Thanks!

Classification task error

I tried to run a MatchZoo classification task using the commands
sudo python main.py --phase train --model_file ./models/matchpyramid_classify.config
sudo python main.py --phase predict --model_file ./models/matchpyramid_classify.config
but I got an error:
Traceback (most recent call last):
File "main.py", line 304, in
main(sys.argv)
File "main.py", line 298, in main
predict(config)
File "main.py", line 248, in predict
list_counts = input_data['list_counts']
KeyError: 'list_counts'
"
would please tell me why this happend?

running "python setup.py install" failed

[ 8/11] Cythonizing /tmp/easy_install-gyR2M6/h5py-2.7.1/temp/easy_install-sqsKKi/Cython-0.27.3/Cython/Plex/Actions.py
[ 9/11] Cythonizing /tmp/easy_install-gyR2M6/h5py-2.7.1/temp/easy_install-sqsKKi/Cython-0.27.3/Cython/Plex/Scanners.py
[10/11] Cythonizing /tmp/easy_install-gyR2M6/h5py-2.7.1/temp/easy_install-sqsKKi/Cython-0.27.3/Cython/Runtime/refnanny.pyx
[11/11] Cythonizing /tmp/easy_install-gyR2M6/h5py-2.7.1/temp/easy_install-sqsKKi/Cython-0.27.3/Cython/Tempita/_tempita.py
warning: no files found matching '2to3-fixers.txt'
warning: no files found matching 'Doc/*'
warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Utility'
/tmp/easy_install-gyR2M6/h5py-2.7.1/temp/easy_install-sqsKKi/Cython-0.27.3/Cython/Plex/Scanners.c:19:20: fatal error: Python.h: No such file or directory
 #include "Python.h"
compilation terminated.
error: Setup script exited with error: command 'gcc' failed with exit status 1

[root@hadoop208 MatchZoo]# uname -a
Linux hadoop208 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@hadoop208 MatchZoo]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

[root@hadoop208 MatchZoo]# cat /proc/version
Linux version 3.10.0-514.16.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Apr 12 15:04:24 UTC 2017

A few parameters I don't quite understand; can you explain? Thanks!

My relation_train.txt contains 2,000,000 rows of data pairs.
In XXX.config there are several parameters:
"num_iters": 22300,
"query_per_iter": 70,
"batch_per_iter": 5,
"batch_size": 100,

What do these mean? I'm not sure whether my understanding is correct:
batch_size means one batch runs 100 rows of data;
batch_per_iter means each iteration runs 5 batches;
num_iters means 22300 iterations are run in total.

So in total it would run 5 * 100 * 22300 = 11,150,000 rows of data pairs?

DeepRank model

When will the DeepRank model (proposed in CIKM'17) be released in MatchZoo?

ModuleNotFoundError : No module named 'jieba'

Hi, first, thank you for your dedication to this library.

I tried to run the script run_data.sh and encountered the following error asking me to install 'jieba'. It seems to be a Chinese segmentation library, but I don't have any plan to use it for Chinese texts. I wonder if I still have to install jieba, or is there any way to circumvent this issue. Thank you.

d3b122:WikiQA$ ./run_data.sh
--2018-01-10 16:17:59--  https://download.microsoft.com/download/E/5/F/E5FCFCEE-7005-4814-853D-DAA7C66507E0/WikiQACorpus.zip
Resolving download.microsoft.com... 23.35.220.223, 2600:140b:4:285::e59, 2600:140b:4:284::e59, ...
Connecting to download.microsoft.com|23.35.220.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7094233 (6.8M) [application/octet-stream]
Saving to: `WikiQACorpus.zip'

100%[=====================================================================================================================================>] 7,094,233   14.7M/s   in 0.5s    

2018-01-10 16:18:00 (14.7 MB/s) - `WikiQACorpus.zip' saved [7094233/7094233]

Archive:  WikiQACorpus.zip
   creating: WikiQACorpus/emnlp-table/
  inflating: WikiQACorpus/emnlp-table/WikiQA.CNN.dev.rank  
  inflating: WikiQACorpus/emnlp-table/WikiQA.CNN.test.rank  
  inflating: WikiQACorpus/emnlp-table/WikiQA.CNN-Cnt.dev.rank  
  inflating: WikiQACorpus/emnlp-table/WikiQA.CNN-Cnt.test.rank  
  inflating: WikiQACorpus/eval.py    
  inflating: WikiQACorpus/Guidelines_Phase1.pdf  
  inflating: WikiQACorpus/Guidelines_Phase2.pdf  
  inflating: WikiQACorpus/WikiQA.tsv  
  inflating: WikiQACorpus/WikiQA-dev.ref  
  inflating: WikiQACorpus/WikiQA-dev.tsv  
  inflating: WikiQACorpus/WikiQA-dev.txt  
  inflating: WikiQACorpus/WikiQA-dev-filtered.ref  
  inflating: WikiQACorpus/WikiQASent.pos.ans.tsv  
  inflating: WikiQACorpus/WikiQA-test.ref  
  inflating: WikiQACorpus/WikiQA-test.tsv  
  inflating: WikiQACorpus/WikiQA-test.txt  
  inflating: WikiQACorpus/WikiQA-test-filtered.ref  
  inflating: WikiQACorpus/WikiQA-train.ref  
  inflating: WikiQACorpus/WikiQA-train.tsv  
  inflating: WikiQACorpus/WikiQA-train.txt  
  inflating: WikiQACorpus/LICENSE.pdf  
  inflating: WikiQACorpus/README.txt  
--2018-01-10 16:18:00--  http://nlp.stanford.edu/data/glove.840B.300d.zip
Resolving nlp.stanford.edu... 171.64.67.140
Connecting to nlp.stanford.edu|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.840B.300d.zip [following]
--2018-01-10 16:18:00--  https://nlp.stanford.edu/data/glove.840B.300d.zip
Connecting to nlp.stanford.edu|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2176768927 (2.0G) [application/zip]
Saving to: `glove.840B.300d.zip'

100%[===================================================================================================================================>] 2,176,768,927  442K/s   in 80m 59s 

2018-01-10 17:39:00 (438 KB/s) - `glove.840B.300d.zip' saved [2176768927/2176768927]

Archive:  glove.840B.300d.zip
  inflating: glove.840B.300d.txt     
--2018-01-10 17:40:09--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu... 171.64.67.140
Connecting to nlp.stanford.edu|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2018-01-10 17:40:09--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: `glove.6B.zip'

100%[=====================================================================================================================================>] 862,182,613  389K/s   in 37m 25s 

2018-01-10 18:17:35 (375 KB/s) - `glove.6B.zip' saved [862182613/862182613]

Archive:  glove.6B.zip
  inflating: glove.6B.50d.txt        
  inflating: glove.6B.100d.txt       
  inflating: glove.6B.200d.txt       
  inflating: glove.6B.300d.txt       
Traceback (most recent call last):
  File "prepare_mz_data.py", line 10, in <module>
    from preparation import Preparation
  File "../../matchzoo/inputs/preparation.py", line 11, in <module>
    import preprocess
  File "../../matchzoo/inputs/preprocess.py", line 4, in <module>
    import jieba
ModuleNotFoundError: No module named 'jieba'
load word dict ...
Traceback (most recent call last):
  File "gen_w2v.py", line 126, in <module>
    word_dict = load_word_dict(word_dict_file)
  File "gen_w2v.py", line 107, in load_word_dict
    for line in tqdm(io.open(word_map_file, encoding='utf8')):
FileNotFoundError: [Errno 2] No such file or directory: 'word_dict.txt'
Traceback (most recent call last):
  File "norm_embed.py", line 14, in <module>
    with codecs.open(infile, 'r', encoding='utf8') as f:
  File "/home1/irteam/anaconda3/lib/python3.6/codecs.py", line 895, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'embed_glove_d300'
load word dict ...
Traceback (most recent call last):
  File "gen_w2v.py", line 126, in <module>
    word_dict = load_word_dict(word_dict_file)
  File "gen_w2v.py", line 107, in load_word_dict
    for line in tqdm(io.open(word_map_file, encoding='utf8')):
FileNotFoundError: [Errno 2] No such file or directory: 'word_dict.txt'
Traceback (most recent call last):
  File "norm_embed.py", line 14, in <module>
    with codecs.open(infile, 'r', encoding='utf8') as f:
  File "/home1/irteam/anaconda3/lib/python3.6/codecs.py", line 895, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'embed_glove_d50'
cat: word_stats.txt: No such file or directory
Traceback (most recent call last):
  File "gen_hist4drmm.py", line 8, in <module>
    from preprocess import cal_hist
  File "../../matchzoo/inputs/preprocess.py", line 4, in <module>
    import jieba
ModuleNotFoundError: No module named 'jieba'
Traceback (most recent call last):
  File "gen_binsum4anmm.py", line 12, in <module>
    from preprocess import cal_binsum
  File "../../matchzoo/inputs/preprocess.py", line 4, in <module>
    import jieba
ModuleNotFoundError: No module named 'jieba'
Done ...

Installation fails with an error

Many thanks to the development team for their efforts. I am currently running into two problems:
1. When something goes wrong there is no way to communicate. Could you create a QQ group or something similar? I cannot get added on WeChat.
2. An error is reported during installation:

Scanners.c
Creating library build\temp.win32-2.7\Release\users\sony\appdata\local\temp\easy_install-ctqurr\h5py-2.7.1\temp\easy_install-mfrl5r\Cython-0.27.3\Cython\Plex\Scanners.lib and object build\temp.win32-2.7\Release\users\sony\appdata\local\temp\easy_install-ctqurr\h5py-2.7.1\temp\easy_install-mfrl5r\Cython-0.27.3\Cython\Plex\Scanners.exp
LINK : fatal error LNK1104: cannot open file "build\temp.win32-2.7\Release\users\sony\appdata\local\temp\easy_install-ctqurr\h5py-2.7.1\temp\easy_install-mfrl5r\Cython-0.27.3\Cython\Plex\Scanners.pyd.manifest"
error: Setup script exited with error: command 'F:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\link.exe' failed with exit status 1104

SparseFullyConnectedLayer Missing

Hi, I'm trying to set up DSSM, but in dssm.py SparseFullyConnectedLayer is used and the definition is missing. Can you upload the relevant files? Thanks!

How to run MatchZoo on Windows

I tried to run setup.py with Python, but it displays an error. This is the error message:

SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

error: no commands supplied

C:\Users\MCLAB\AppData\Local\conda\conda\envs\Tensorflow\lib\site-packages\IPython\core\interactiveshell.py:2870: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

My Python version is 3.5.4, and I have already installed the TensorFlow library.
