Comments (4)
@LuchenTan Here's an example of a test case you can adapt for tokenization:
https://github.com/lintool/twitter-tools/blob/master/twitter-tools-core/src/test/java/cc/twittertools/index/TokenizationTest.java
from anserini.
My tokenizer will produce different results from this tokenizer. For example,
it applies no stemmer and performs URL removal. Anyway, I will update the
tokenizer tonight and modify this test class.
On Mon, Nov 2, 2015 at 3:49 PM, Jimmy Lin wrote:
> @LuchenTan Here's an example of a test case you can adapt for tokenization:
Luchen Tan
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
@LuchenTan Yes, the result of the tokenization will be different, but you can still use the general format of the test case.
Not much interest in working on tweet search (anymore)...