Comments (6)
Gov2 indexer: Total 25172934 documents indexed in 00:59:28
emptydocids.txt 32245
from anserini.
The indexed documents plus the empty documents sum to the number (25,205,179
documents) reported on the official site: http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm
25172934 + 32245 = 25205179
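
The check above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and not part of Anserini, and the throughput figure assumes that 00:59:28 in the indexer output means hours:minutes:seconds.

```python
# Figures reported in this thread.
indexed_docs = 25_172_934      # documents indexed by the Gov2 indexer
empty_docs = 32_245            # count reported for emptydocids.txt
official_total = 25_205_179    # count on the official Gov2 summary page

# Indexed plus empty documents should equal the official collection size.
total = indexed_docs + empty_docs
assert total == official_total

# Indexing time 00:59:28, read as hours:minutes:seconds (an assumption).
hours, minutes, seconds = (int(part) for part in "00:59:28".split(":"))
elapsed_seconds = hours * 3600 + minutes * 60 + seconds

# Average throughput over the whole run.
docs_per_second = indexed_docs / elapsed_seconds
print(f"{total} documents in total, about {docs_per_second:.0f} docs/s")
```

The average rate is only a rough basis for comparing machines, since it also depends on the thread count and heap size used for the run.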
What are the specs of your machine? I.e., exact processor model? For example our main machine "streeling" is:
- 2 x Intel Xeon E5-2680 v3 2.5GHz (12 cores)
- 768 GB RAM
Detailed specs would help compare performance.
less /proc/cpuinfo
reports Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz.
Processor numbers go up to 39. It has 128 GB RAM.
However, I use java -server -Xms20g -Xmx20g
and 15 threads.
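
For anyone else reporting specs, the same details can be pulled out of /proc/cpuinfo programmatically. The sketch below assumes the usual x86 Linux layout with `processor` and `model name` keys; `summarize_cpuinfo` is a hypothetical helper written for this example, not an Anserini utility.

```python
from pathlib import Path


def summarize_cpuinfo(text):
    """Return (model name, logical processor count) from /proc/cpuinfo text."""
    model = None
    logical = 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            logical += 1  # one entry per logical processor
        elif key == "model name" and model is None:
            model = value.strip()
    return model, logical


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        print(summarize_cpuinfo(cpuinfo.read_text()))
```

Processor numbering starts at 0, so a highest index of 39 corresponds to 40 logical processors.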
The Intel spec is here: E5-2650 v3 @ 2.30GHz. The main difference is that we have 12 cores to your 10 (and hence more cache), and also a higher clock frequency. But the hardware is quite similar.
Merged to master cfb10c9