qa4mre's Introduction
NAIST-QACLEF ---------------------------------------------------------------------------- Author : Philip Arthur Release Year : 2013 This software is the technical implementation of the following paper: http://www.phontron.com/paper/arthur13clef.pdf In the collaboration with Graham Neubig for the training algorithm (T-MERT) that maximizes the C@1 objective function. http://phontron.com/tmert Note: the file 'tmert.py' is the slight modification of original tmert. The modification is to encapsulate run_mert method so the script can be imported from other python script. License: ---------------------------------------------------------------------------- All rights reserved. This program and the accompanying materials are made available under the terms of the GNU Lesser General Public License (LGPL) version 2.1 which accompanies this distribution, and is available at http://www.gnu.org/licenses/lgpl-2.1.html This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. Release LOG --------------------------------------------------------------------------- v1.1.0 - Add POS TAG - Add Word Splitter - Add Matrix view to Report - Change from Tupple based to List based - Change Porter Stemmer to Word Net Lemmatizer - Save Preprocessed Data separately - Readjust l from 1.0 to 0.5 v1.0.0 - Initial Release Installation: ---------------------------------------------------------------------------- [ FUNDAMENTAL ] - Python 2.7 The software is mainly written in python. So you need to install python first. Currently it supports only python 2.7.x. http://www.python.org/download/releases/2.7/ - Java Java is used in some part where program needs to connect with stanford's jar. Also include java in your environment variable "PATH". The java script in lib path was compiled with javac 1.6.0_27 http://www.oracle.com/technetwork/java/javase/overview/index.html [ SPECIALIZED ] - NLTK NLTK is and stands for natural language toolkit. This software is used for preprocessing. http://nltk.org/install.html - Stanford NER Stanford Named Entity Recognition is used as a part of preprocessing. Please download the jar http://nlp.stanford.edu/software/CRF-NER.shtml, EXTRACT and PUT the name of its path in configuration.py. The default configuration is '/usr/share/stanford-ner/stanford-ner.jar' Please change it according to your machine. The used tagset is english.conll.4class.distsim.crf.ser.gz [it bounds with correference resolution method in preprocessing.py] - Stanford Parser Stanford Parser is used as part of preprocessing to split the input sentence. http://nlp.stanford.edu/software/lex-parser.shtml same with StanfordNER please EXTRACT and PUT the name of its path in configuration.py. Technical Usage Questions ------------------------------------------------------------------------------ 1. What is the main script? - The main script is 'qaclef.py' 2. I want to work with QA4MRE GS 2011 data only [train and test with the data] how can I? - By executing 'python qaclef.py --selftest --data 2011' 3. I want to work with multiple data and test on them, how can I? - By executing 'python qaclef.py --selftest --data 2011 2012 2013' *example when working with 2011, 2012, 2013* 4. I want to train the system with 2011 and test it on 2012, how can I? - By executing 'python qaclef.py --data 2011 --test 2012 --train' 5. I want to change the parameter (e.g. the ngram or the threshold), how can I - adding parameter --ngram=[some_positive_integer] and --threshold=[some_float] would do. Some Technical Details ------------------------------------------------------------------------------- 1. This software runs good in at least 4GB memory. 2. The software basically stores its output on file system. You can find the detail of the output in 'cache' folder. 3. The software is equipped with 'by-passing' feature that will by pass some process, if it founds the saved output in 'cache' folder (according to its process). - Preprocess -> will look on '[edition]-preprocessed.txt' .) It will take the particular edition data on corresponding file. .) To force preprocessing add '--preprocess' flag .) Run software with '--preprocessonly' to do preprocessing only. - TMERT Training -> will look on 'weight.txt' .) To force training add '--train' flag Cache ------------------------------------------------------------------------------- As it is mentioned above, the software will write its process in file system. However not all of them are used to by pass processes. Some of them are intended for debugging and analyzing. They are: - 1 - 5 [preprocessing-details] -> for preprocessing - final.txt -> final test model (written in json) - qa-mert_input.txt -> generated input for tmert.py. If you wish to run tmert seperately, this is the generated train data input for latest run Report ------------------------------------------------------------------------------- The system can generate human readable document by adding '--report' for the execution. The report is written to 'report/report.txt' FAQ ------------------------------------------------------------------------------- 1. How can I add feature to the system? - Basically we add details of the algorithm in metric.py and just map it on scoring.py 2. How can I refine the preprocessing of the system? - By modifying preprocessing.py 3. How can I modify the model of the system. - Please take a look at model_builder.py Contact ----------------------------- Philip Arthur: [email protected] Graham Neubig: [email protected]
qa4mre's People
qa4mre's Issues
The data set is unavailable.
The data set described in file configuration.py is
input = {
'CLEF_2011_GS':'http://celct.fbk.eu/QA4MRE/scripts/downloadFile.php?file=/websites/ResPubliQA/resources/past_campaigns/2011/Training_Data/Goldstandard/QA4MRE-2011-EN_GS.xml',
'CLEF_2012_GS': 'http://celct.fbk.eu/QA4MRE/scripts/downloadFile.php?file=/websites/ResPubliQA/resources/past_campaigns/2012/Main_Task/Training_Data/Goldstandard/Parallel_Aligned/QA4MRE-2012-EN_GS_SYNC.xml',
'CLEF_2013_GS': 'http://celct.fbk.eu/QA4MRE/scripts/downloadFile.php?file=/websites/ResPubliQA/resources/past_campaigns/2013/Main_Task/Training_Data/Goldstandard/QA4MRE-2013-EN_GS.xml'
}
But it can not be download now.
Can anybody provide other resource link to get it?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.