
tantivy-cli's Introduction

Join the chat at https://discord.gg/MT27AG5EVE. License: MIT.

tantivy-cli

tantivy-cli is the command line interface for the tantivy search engine. It provides indexing and search capabilities, and is suitable for smaller projects.

For a more complete solution around tantivy, you may use

Tutorial: Indexing Wikipedia with Tantivy CLI

Introduction

In this tutorial, we will create a brand new index containing the articles of the English Wikipedia.

Installing the tantivy CLI.

There are a couple of ways to install tantivy-cli.

If you are a Rust programmer, you probably have cargo installed and you can just run cargo install tantivy-cli.

Creating the index: new

Let's create a directory in which your index will be stored.

    # create the directory
    mkdir wikipedia-index

We will now initialize the index and create its schema. The schema defines the list of your fields, and for each field:

  • its name
  • its type (the wizard below offers Text, u64, i64, f64, Date, Facet and Bytes)
  • how it should be indexed.

You can find more information about the latter on tantivy's schema documentation page

In our case, our documents will contain

  • a title
  • a body
  • a url

We want the title and the body to be tokenized and indexed. We also want to add the term frequency and term positions to our index.

Running tantivy new will start a wizard that will help you define the schema of the new index.

Like all the other commands of tantivy, you will have to pass it your index directory via the -i or --index parameter as follows:

    tantivy new -i wikipedia-index

Answer the questions as follows:


    Creating new index 
    Let's define its schema! 



    New field name  ? title
    Choose Field Type (Text/u64/i64/f64/Date/Facet/Bytes) ? Text
    Should the field be stored (Y/N) ? Y
    Should the field be indexed (Y/N) ? Y
    Should the term be tokenized? (Y/N) ? Y
    Should the term frequencies (per doc) be in the index (Y/N) ? Y
    Should the term positions (per doc) be in the index (Y/N) ? Y
    Add another field (Y/N) ? Y
    
    
    
    New field name  ? body
    Choose Field Type (Text/u64/i64/f64/Date/Facet/Bytes) ? Text
    Should the field be stored (Y/N) ? Y
    Should the field be indexed (Y/N) ? Y
    Should the term be tokenized? (Y/N) ? Y
    Should the term frequencies (per doc) be in the index (Y/N) ? Y
    Should the term positions (per doc) be in the index (Y/N) ? Y
    Add another field (Y/N) ? Y
    
    
    
    New field name  ? url
    Choose Field Type (Text/u64/i64/f64/Date/Facet/Bytes) ? Text
    Should the field be stored (Y/N) ? Y
    Should the field be indexed (Y/N) ? N
    Add another field (Y/N) ? N


    [
    {
        "name": "title",
        "type": "text",
        "options": {
            "indexing": "position",
            "stored": true
        }
    },
    {
        "name": "body",
        "type": "text",
        "options": {
            "indexing": "position",
            "stored": true
        }
    },
    {
        "name": "url",
        "type": "text",
        "options": {
            "indexing": "unindexed",
            "stored": true
        }
    }
    ]


After the wizard has finished, a meta.json file should exist at wikipedia-index/meta.json. It is fairly human-readable JSON, so you can check its content.

It contains two sections:

  • segments (currently empty, but we will change that soon)
  • schema

Indexing the document: index

Tantivy's index command offers a way to index a JSON file. The file must contain one JSON object per line. The structure of each JSON object must match that of our schema definition.

    {"body": "some text", "title": "some title", "url": "http://somedomain.com"}
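
If your data is not already in this format, you need to produce one JSON object per line yourself. The following is a minimal sketch using only the Rust standard library, assuming the three-field schema of this tutorial; the helper names escape_json and to_json_line are made up for illustration, and a real project would more likely use a JSON library.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one document as a single JSON line with the fields
/// `body`, `title` and `url` (the schema assumed in this tutorial).
fn to_json_line(title: &str, body: &str, url: &str) -> String {
    format!(
        "{{\"body\": \"{}\", \"title\": \"{}\", \"url\": \"{}\"}}",
        escape_json(body),
        escape_json(title),
        escape_json(url)
    )
}
```

Calling to_json_line("some title", "some text", "http://somedomain.com") yields the example line shown above. Newlines inside a field are escaped, so each document stays on one line.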

For this tutorial, you can download a corpus with the 5 million+ English Wikipedia articles in the right format here: wiki-articles.json (2.34 GB). Make sure to decompress the file. Alternatively, if you have bzcat installed, you can skip this step and read the file while it is still compressed.

    bunzip2 wiki-articles.json.bz2

If you are in a rush you can download 100 articles in the right format here (11 MB).

The index command will index your documents. By default it uses 3 threads, with a buffer of 1GB split across these threads.

    cat wiki-articles.json | tantivy index -i ./wikipedia-index

You can change the number of threads with the -t parameter, and the total buffer size shared by the threads with the -m parameter. Note that tantivy's memory usage is greater than just this buffer size.

On my computer (8 core Xeon(R) CPU X3450 @ 2.67GHz), with 8 threads, indexing Wikipedia takes around 9 minutes.

While tantivy is indexing, you can peek at the index directory to check what is happening.

    ls ./wikipedia-index

The main file is meta.json.

You should also see a lot of files with a UUID as filename, and different extensions. Our index is in fact divided into segments. Each segment acts as an individual smaller index. Its name is simply a UUID.
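
To see how many segments currently exist, you can group the file names by the part before the first dot. This is a small sketch under the assumption that every segment file is named "<uuid>.<extension>"; the function name group_by_segment is made up, and files starting with a dot (for example lock files) are simply skipped.

```rust
use std::collections::BTreeMap;

/// Group index file names by segment id (the part before the first dot).
/// Returns a map from segment id to the list of extensions seen for it.
fn group_by_segment(file_names: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut segments: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in file_names {
        // meta.json and hidden files do not belong to a segment.
        if *name == "meta.json" || name.starts_with('.') {
            continue;
        }
        if let Some((segment_id, extension)) = name.split_once('.') {
            segments
                .entry(segment_id.to_string())
                .or_default()
                .push(extension.to_string());
        }
    }
    segments
}
```

Feeding it the names returned by a directory listing gives one map entry per segment, so the number of entries is the number of segments at that moment.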

If you decided to index the complete Wikipedia, you may also see some of these files disappear. Having too many segments can hurt search performance, so tantivy automatically starts merging segments.

Serve the search index: serve

Tantivy's CLI also embeds a search server. You can run it with the following command.

    tantivy serve -i wikipedia-index

By default, it will serve on port 3000.

You can search for the top 20 most relevant documents for the query Barack Obama by accessing the following URL in your browser:

http://localhost:3000/api/?q=barack+obama&nhits=20

By default this query is treated as barack OR obama. You can also search for documents that contain both terms by adding a + sign before each term in your query.

http://localhost:3000/api/?q=%2Bbarack%20%2Bobama&nhits=20

Also, - makes it possible to exclude the documents containing a specific term.

http://localhost:3000/api/?q=-barack%20%2Bobama&nhits=20

Finally, tantivy handles phrase queries.

http://localhost:3000/api/?q=%22barack%20obama%22&nhits=20
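
The URLs above percent-encode characters such as +, the space and the double quote. If you build these URLs from a program, a sketch like the following may help; it assumes the /api/ endpoint and the q and nhits parameters shown above, and the helper names percent_encode and build_search_url are made up for illustration.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build a search URL for a server started with `tantivy serve`.
/// `host` is assumed to include the port, for example "localhost:3000".
fn build_search_url(host: &str, query: &str, nhits: usize) -> String {
    format!(
        "http://{}/api/?q={}&nhits={}",
        host,
        percent_encode(query),
        nhits
    )
}
```

For example, build_search_url("localhost:3000", "+barack +obama", 20) reproduces the second URL above. Spaces are encoded as %20 here rather than as +, which avoids any confusion with the + query operator.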

Search the index via the command line

You may also use the search command to stream all documents matching a specific query. The documents are returned in an unspecified order.

    tantivy search -i wikipedia-index -q "barack obama"
    tantivy search -i hdfs --query "*" --agg '{"severities":{"terms":{"field":"severity_text"}}}'

Benchmark the index: bench

Tantivy's CLI provides a simple benchmark tool. You can run it with the following command.

    tantivy bench -i wikipedia-index -n 10 -q queries.txt

tantivy-cli's People

Contributors

asdfuser, currymj, ehiggs, erichutchins, evanxg852000, fmassot, fulmicoton, hntd187, keens, leiysky, lvheyang, maiha, misawa, mosuka, nocduro, osyoyu, pablocastellano, petr-tik, pseitz, remram44, scampi, vishalsodani, wuranbo, yanns


tantivy-cli's Issues

panic when meta.json does not exist

$ tantivy index --index ~/
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Indexing failed : Error(PathDoesNotExist(\"meta.json\"), State { next_error: None, backtrace: None })"', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

tantivy-cli should report that meta.json does not exist and exit, rather than panic.
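
A sketch of the kind of check this asks for, using only the standard library. It is not the actual tantivy-cli code: the function name ensure_index_exists and the wording of the message are assumptions made for illustration.

```rust
use std::path::Path;

/// Return an error message instead of panicking when the directory
/// does not contain a meta.json file.
fn ensure_index_exists(index_dir: &Path) -> Result<(), String> {
    let meta_path = index_dir.join("meta.json");
    if meta_path.is_file() {
        Ok(())
    } else {
        // Hypothetical wording; the real CLI may phrase this differently.
        Err(format!(
            "no meta.json found in {}; create the index with `tantivy new` first",
            index_dir.display()
        ))
    }
}
```

The caller can print the message to stderr and exit with a non-zero status, which keeps the failure readable for users who pointed -i at the wrong directory.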

What should be the schema for a non json log text file

Hi @fulmicoton

I want to index and benchmark a text file containing millions of logs with each line having this structure

[Thu Jun 09 06:07:04 2005] [notice] LDAP: Built with OpenLDAP LDAP SDK

How can I define the schema for this file, given that a schema is mandatory with tantivy?
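
One possible approach, offered as a sketch rather than as an answer from the maintainers: split each line into a timestamp, a level and a message, define three text fields with those (hypothetical) names, and convert every line to a JSON object before indexing. The function name parse_log_line is made up, and the parser assumes every line follows exactly the "[timestamp] [level] message" layout shown above.

```rust
/// Split a line of the form `[<timestamp>] [<level>] <message>`
/// into its three parts. Returns None if the line has another shape.
fn parse_log_line(line: &str) -> Option<(&str, &str, &str)> {
    // Drop the opening bracket of the timestamp.
    let rest = line.strip_prefix('[')?;
    // The timestamp ends where the level's bracket starts.
    let (timestamp, rest) = rest.split_once("] [")?;
    // The level ends at the next closing bracket followed by a space.
    let (level, message) = rest.split_once("] ")?;
    Some((timestamp, level, message))
}
```

For the sample line, this returns the timestamp "Thu Jun 09 06:07:04 2005", the level "notice" and the message "LDAP: Built with OpenLDAP LDAP SDK". Lines that do not match return None, so they can be logged and skipped.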

Panic at "tantivy new"

Creating a new index immediately panics after the first question (with or without an existing directory).

Tantivy displays:

Creating new index
Let's define it's schema!

New field name ? thread 'main' panicked at 'Failed to read line', src/libcore/option.rs:1034:5

I'm using tantivy-cli 0.10.0

Field name must match the pattern

Does not work:

D:\Projects\tantivy-cli\target\debug>tantivy new -i wikipedia-index

Creating new index
Let's define it's schema!

New field name ? title
Error: Field name must match the pattern [_a-zA-Z0-9]+
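
The report does not say why "title" is rejected. One plausible cause, stated here purely as an assumption, is that the answer was read together with a Windows line ending, so the name actually checked was "title" followed by a carriage return. The sketch below shows a validator for the pattern [_a-zA-Z0-9]+ and a helper that trims the input first; both function names are made up.

```rust
/// Check a field name against the pattern `[_a-zA-Z0-9]+`.
fn is_valid_field_name(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Trim surrounding whitespace (including "\r\n") before validating,
/// and return the cleaned name when it is valid.
fn clean_field_name(raw_input: &str) -> Option<&str> {
    let name = raw_input.trim();
    if is_valid_field_name(name) {
        Some(name)
    } else {
        None
    }
}
```

With this, the raw input "title\r\n" is accepted as "title", while a name containing a space is still rejected.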

Ignore fields that aren't indexed

Currently, passing a json file with fields that aren't indexable leads to an error like this

Failed to add document doc NoSuchFieldInSchema("asin")

I think it will be more user-friendly, if we ignore fields that we don't need to index and log it to stdout, instead of stopping indexing. This will reduce friction for new users - they will not have to change/preprocess their json files and can gradually add new fields and reindex.
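
A rough sketch of the proposed behaviour, not taken from the tantivy-cli code base: fields that are present in the schema are kept, and the names of the others are collected so they can be logged. The document is modelled as a list of (name, value) pairs to keep the example free of dependencies, and the function name split_known_fields is made up.

```rust
use std::collections::HashSet;

/// Split a document into the fields known to the schema and the
/// names of the fields that should be ignored (and logged).
fn split_known_fields<'a>(
    schema_fields: &HashSet<&str>,
    document: &[(&'a str, &'a str)],
) -> (Vec<(&'a str, &'a str)>, Vec<&'a str>) {
    let mut kept = Vec::new();
    let mut ignored = Vec::new();
    for &(name, value) in document {
        if schema_fields.contains(name) {
            kept.push((name, value));
        } else {
            ignored.push(name);
        }
    }
    (kept, ignored)
}
```

With a schema containing title and body, a document that also carries an asin field would be indexed with its title and body, and asin would be reported as ignored instead of aborting the run.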

Remove Cargo.lock from repository

I think Cargo.lock should be added to .gitignore. Currently it changes after each compilation on a different machine.
Also while changing .gitignore, can we also add the following 2 lines to it:

.idea/
*.iml

To ignore IDEA IDE files?

Reach parity with core tantivy

Going through the quickstart guide, I noticed that the CLI offers 2 field options (int or text), while core tantivy offers text, u64, i64, DateTime, Facet and Bytes.

Most likely this will be the first point of call for new users, who want to check tantivy out.

Going forward, if tantivy-cli wants to stay a real and relevant part of tantivy, we will need to invest in continuous feature parity.

It might be easier if we subsume tantivy-cli/src under bin/tantivy-cli in core tantivy. This will help us check, if new fields are included in cli application as well as extend the API endpoint.

We can also use CI to cross-compile and deploy ready tantivy-cli binaries for people to download and play around with.

Rephrase Question: Should the field be fast

When creating a new index with tantivy, one of the questions is

Should the field be fast (Y/N)?

I'd guess that no one would like to deny that question. Maybe add some more context to this one :-)

Error while compiling source

I have fresh version cloned from this repo and when I try to run cargo build I get the following error:

error[E0034]: multiple applicable items in scope
   --> /Users/klangner/.cargo/registry/src/github.com-1ecc6299db9ec823/downcast-0.9.2/src/lib.rs:120:38
    |
120 |     fn is_type(&self) -> bool { self.type_id() == TypeId::of::<T>() }
    |                                      ^^^^^^^ multiple `type_id` found
    |
note: candidate #1 is defined in the trait `Any`
   --> /Users/klangner/.cargo/registry/src/github.com-1ecc6299db9ec823/downcast-0.9.2/src/lib.rs:29:5
    |
29  |     fn type_id(&self) -> TypeId { TypeId::of::<Self>() }
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = help: to disambiguate the method call, write `Any::type_id(&self)` instead
note: candidate #2 is defined in the trait `std::any::Any`
    = help: to disambiguate the method call, write `std::any::Any::type_id(&self)` instead

error: aborting due to previous error

For more information about this error, try `rustc --explain E0034`.

I am on macOS with Rust version 1.34.

I can compile tantivy itself without problems.

It looks like the problem could be fixed with newer versions of the dependencies. I can fix it and create a PR. This will require upgrading tantivy and making some changes, since the API changed.

Escaping "=" equal symbols in search using served index

I've experienced that searches which contain = return too many results. My q parameter equals: name=anything which gets encoded to name%3Danything when searching via served index.

Searching via CLI does not have this issue.

Steps to reproduce:

Download tantivy static binary for macOS

https://github.com/tantivy-search/tantivy-cli/releases/download/0.4.2/tantivy-cli-0.4.2-x86_64-apple-darwin.tar.gz

(can also reproduce on Linux x86_64)

Create new index

$ ./repro/tantivy new --index repro/index

Creating new index
Let's define it's schema!



New field name  ? text
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? n
Should the field be indexed (Y/N) ? y
Should the field be tokenized (Y/N) ? y
Should the term frequencies (per doc) be in the index (Y/N) ? n
Add another field (Y/N) ? y



New field name  ? name
Text or unsigned 32-bit integer (T/I) ? y
Error: Invalid input. Options are (T/I)
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? y
Should the field be indexed (Y/N) ? n
Add another field (Y/N) ? n

Index our two documents from data.json

# data.json
{"name":"bob","text":"this is text for bob, mainly it contains: name=\"bob\" which we want to find"}
{"name":"mary","text":"this is text for mary, mainly it contains: zame=\"mary\" which we do not want to find"}
$ cat repro/data.json | repro/tantivy index -i repro/index
Commit succeed, docstamp at 2
Waiting for merging threads
Terminated successfully!

Serve index on port 3000

$ ./repro/tantivy serve -i ./repro/index
listening on http://localhost:3000

Search for anything containing an equals symbol

$ curl "http://localhost:3000/api/?q=name%3Danything"

Expected Output

No matches

Actual Output

{
  "q": "name=askljdkfj",
  "num_hits": 1,
  "hits": [
    {
      "doc": {
        "name": [
          "bob"
        ]
      }
    }
  ],
  "timings": {
    "timings": [
      {
        "name": "search",
        "duration": 54,
        "depth": 0
      },
      {
        "name": "fetching docs",
        "duration": 70,
        "depth": 0
      }
    ]
  }
}

CLI panics when `agg` option is omitted

I am trying to run a search and the command parsing code is panicking:

% tantivy search --index tmp/index-1 --query 'NumAliphaticRings: 1'       
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/commands/search.rs:20:52
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

line 20 is the agg code here:

pub fn run_search_cli(matches: &ArgMatches) -> Result<(), String> {
    let index_directory = PathBuf::from(matches.value_of("index").unwrap());
    let query = matches.value_of("query").unwrap();
    let agg = Some(matches.value_of("aggregation").unwrap());

It should probably be a more defensive unwrap with a default value.
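
A sketch of the more defensive handling suggested here. Because the example must be self-contained, a plain HashMap stands in for clap's ArgMatches; the struct SearchArgs and the function parse_search_args are made up for illustration. The point is that a lookup already returns an Option, so the optional aggregation can be kept as it is instead of being unwrapped and wrapped again in Some.

```rust
use std::collections::HashMap;

/// Parsed arguments of the search command (hypothetical type).
#[derive(Debug, PartialEq)]
struct SearchArgs<'a> {
    index: &'a str,
    query: &'a str,
    aggregation: Option<&'a str>,
}

/// Read the arguments from a map standing in for clap's `ArgMatches`.
/// Required arguments yield an error when missing; the optional
/// aggregation simply stays `None`.
fn parse_search_args<'a>(matches: &HashMap<&str, &'a str>) -> Result<SearchArgs<'a>, String> {
    let index = matches
        .get("index")
        .copied()
        .ok_or("missing required argument: index")?;
    let query = matches
        .get("query")
        .copied()
        .ok_or("missing required argument: query")?;
    // No unwrap: the lookup result is already an Option.
    let aggregation = matches.get("aggregation").copied();
    Ok(SearchArgs { index, query, aggregation })
}
```

With only index and query present, the function returns Ok with aggregation set to None, which is the case that panics in the report.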

Add grok or regex support.

Like Elasticsearch 5.0 has done: https://www.elastic.co/guide/en/elasticsearch/reference/master/grok-processor.html.
With regex support, tantivy-cli would be more practical, e.g. it could use an Nginx or Apache log directly as the input file.

@fulmicoton what are your thoughts? This is what I want to do with https://github.com/BurntSushi/fst in my own project.

So I will take it.

But as a Rust newbie I may take some time. If you think it is a bad idea, I will still do it in my fork to familiarize myself with the code base of tantivy. ^_^

Update to use tantivy 0.13

FWIW, I have tried updating tantivy in Cargo.toml but this is not enough.

   Compiling tantivy-cli v0.13.0 (/root/tantivy/tantivy-cli)                          
error[E0423]: expected function, tuple struct or tuple variant, found struct `Field`                                   
  --> src/commands/bench.rs:28:30
   |                                                                                                                   
28 |         .map(|(field_id, _)| Field(field_id as u32))
   |                              ^^^^^ did you mean `Field { /* fields */ }`?
                                                           
error[E0412]: cannot find type `Error` in crate `tantivy`
  --> src/commands/merge.rs:10:28        
   |                                                                                                                   
10 | fn error_msg(err: tantivy::Error) -> String {
   |                            ^^^^^ not found in `tantivy`                          
   |                                               
help: possible candidates are found in other modules, you can import them into scope                                   
   |                                                                                                                   
3  | use clap::Error;                  
   |
3  | use core::fmt::Error;                          
   |                                                                                                                   
3  | use iron::Error;
   |
3  | use iron::error::Error;
   |
     and 10 other candidates

error[E0423]: expected function, tuple struct or tuple variant, found struct `Field`
  --> src/commands/search.rs:31:23
   |
31 |         .map(|(i, _)| Field(i as u32))
   |                       ^^^^^ did you mean `Field { /* fields */ }`?

error[E0423]: expected function, tuple struct or tuple variant, found struct `Field`
  --> src/commands/serve.rs:95:27
   |
95 |             .map(|(i, _)| Field(i as u32))
   |                           ^^^^^ did you mean `Field { /* fields */ }`?
                                                                                                                                                                                                                                              
error[E0599]: no method named `iter` found for opaque type `impl std::iter::Iterator` in the current scope                                                                                                                                    
  --> src/commands/bench.rs:25:10                                                                                                                                                                                                             
   |
25 |         .iter()
   |          ^^^^ method not found in `impl std::iter::Iterator`

error[E0277]: the `?` operator can only be applied to values that implement `std::ops::Try`
  --> src/commands/merge.rs:24:37
   |
24 |       let segment_meta: SegmentMeta = index
   |  _____________________________________^
25 | |         .writer(HEAP_SIZE)?
26 | |         .merge(&segments)?
   | |__________________________^ the `?` operator cannot be applied to type `impl std::future::Future`
   |
   = help: the trait `std::ops::Try` is not implemented for `impl std::future::Future`
   = note: required by `std::ops::Try::into_result`

error[E0277]: the `?` operator can only be applied to values that implement `std::ops::Try`
  --> src/commands/merge.rs:32:5
   |
32 | /     Index::open_in_dir(&path)?
33 | |         .writer_with_num_threads(1, 40_000_000)?
34 | |         .garbage_collect_files()?;
   | |_________________________________^ the `?` operator cannot be applied to type `impl std::future::Future`
   |
   = help: the trait `std::ops::Try` is not implemented for `impl std::future::Future`
   = note: required by `std::ops::Try::into_result`

error[E0599]: no method named `iter` found for opaque type `impl std::iter::Iterator` in the current scope
  --> src/commands/search.rs:23:10
   |
23 |         .iter()
   |          ^^^^ method not found in `impl std::iter::Iterator`

error[E0061]: this function takes 2 arguments but 1 argument was supplied
  --> src/commands/search.rs:39:33
   |
39 |         let mut scorer = weight.scorer(segment_reader)?;
   |                                 ^^^^^^ -------------- supplied 1 argument
   |                                 |
   |                                 expected 2 arguments

error[E0308]: mismatched types
  --> src/commands/search.rs:41:15
   |
41 |         while scorer.advance() {
   |               ^^^^^^^^^^^^^^^^ expected `bool`, found `u32`

error[E0599]: no method named `filter` found for struct `tantivy::tokenizer::SimpleTokenizer` in the current scope
  --> src/commands/serve.rs:79:18
   |
79 |                 .filter(RemoveLongFilter::limit(40))
   |                  ^^^^^^ method not found in `tantivy::tokenizer::SimpleTokenizer`
   | 
  ::: /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.13.0/src/tokenizer/simple_tokenizer.rs:7:1
   |
7  | pub struct SimpleTokenizer;
   | --------------------------- doesn't satisfy `_: std::iter::Iterator`
   |
   = note: the method `filter` exists but the following trait bounds were not satisfied:
           `tantivy::tokenizer::SimpleTokenizer: std::iter::Iterator`
           which is required by `&mut tantivy::tokenizer::SimpleTokenizer: std::iter::Iterator`

error[E0599]: no method named `iter` found for opaque type `impl std::iter::Iterator` in the current scope
  --> src/commands/serve.rs:87:14
   |
87 |             .iter()
   |              ^^^^ method not found in `impl std::iter::Iterator`

warning: unused import: `futures::Future`
 --> src/commands/merge.rs:4:5
  |
4 | use futures::Future;
  |     ^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error: aborting due to 12 previous errors

Some errors have detailed explanations: E0061, E0277, E0308, E0412, E0423, E0599.
For more information about an error, try `rustc --explain E0061`.
error: could not compile `tantivy-cli`.

To learn more, run the command again with --verbose.

How to define schema for source code?

I have no prior experience with tantivy, and I want to use it to search a huge amount of source code (like the Chromium's repository). What's the proper way to define a schema for it? Thanks.

Search command println panics with broken pipes

I commonly use head to preview search results but this causes a panic. I resorted to some silly bash tricks to filter out this error.

$ tantivy search -i wikipedia-index -q "barack obama" | head >/dev/null
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', src/libstd/io/stdio.rs:955:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Would you be amenable to swapping println for writeln and ignoring broken pipes? Something like:

@@ -39,7 +41,16 @@ fn run_search(directory: &Path, query: &str) -> tantivy::Result<()> {
             let doc_id = scorer.doc();
             let doc = store_reader.get(doc_id)?;
             let named_doc = schema.to_named_doc(&doc);
-            println!("{}", serde_json::to_string(&named_doc).unwrap());
+            if let Err(e) = writeln!(
+                io::stdout(),
+                "{}",
+                serde_json::to_string(&named_doc).unwrap()
+            ) {
+                if e.kind() != ErrorKind::BrokenPipe {
+                    eprintln!("{}", e.to_string());
+                    process::exit(1)
+                }
+            }
             scorer.advance();
         }
     }
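
The same idea can be written as a small helper that is generic over the writer, which also makes it testable without a real pipe. This is a sketch and not the code that was merged; the function name write_line_or_stop is made up.

```rust
use std::io::{self, ErrorKind, Write};

/// Write one line to `out`.
/// Returns Ok(true) when the line was written, Ok(false) when the
/// reader has gone away (broken pipe), and Err for any other error.
fn write_line_or_stop<W: Write>(out: &mut W, line: &str) -> io::Result<bool> {
    match writeln!(out, "{}", line) {
        Ok(()) => Ok(true),
        // The consumer (for example `head`) closed its end: stop quietly.
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}
```

In the search loop, the caller would pass a locked io::stdout() handle and break out of the loop as soon as the function returns Ok(false).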

Failed to open field term dictionary in composite file. Is the field indexed

I am toying with the CLI. I got my docs indexed but when I try to search with
RUST_BACKTRACE=1 tantivy search -i test -q 'foo'

I get this panic:

thread 'main' panicked at 'Failed to open field term dictionary in composite file. Is the field indexed', libcore/option.rs:917:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_fmt
   4: core::panicking::panic_fmt
   5: core::option::expect_failed
   6: tantivy::core::segment_reader::SegmentReader::inverted_index
   7: <core::iter::Map<I, F> as core::iter::iterator::Iterator>::fold
   8: <tantivy::query::term_query::term_query::TermQuery as tantivy::query::query::Query>::weight
   9: <&'a mut I as core::iter::iterator::Iterator>::next
  10: <tantivy::query::boolean_query::boolean_query::BooleanQuery as tantivy::query::query::Query>::weight
  11: tantivy::commands::search::run_search_cli
  12: tantivy::main
  13: std::rt::lang_start::{{closure}}
  14: main

My schema looks like this (it has many fields with same options but some are u64 and some are text + one _doc field):

    {
      "name": "backend_access.response.headers.Set-Cookie.0",
      "type": "text",
      "options": {
        "indexing": {
          "record": "position",
          "tokenizer": "en_stem"
        },
        "stored": true
      }
    },
    {
      "name": "response.headers.Server.0",
      "type": "text",
      "options": {
        "indexing": {
          "record": "position",
          "tokenizer": "en_stem"
        },
        "stored": true
      }
    },
    {
      "name": "_doc",
      "type": "text",
      "options": {
        "indexing": {
          "record": "position",
          "tokenizer": "en_stem"
        },
        "stored": true
      }
    }

Overall better output from tantivy-cli

... Especially #23 made it clear that the fact that tantivy-cli waits for documents from stdin if no input file is given can be confusing.

We may also want to use indicatif to show the progress, at least when indexing from a file.

Incorrect meta.json format

I think the JSON format is not quite right in this version of tantivy-cli. I can't get searching to work with the code as-is. There might be some issues with the version of tantivy that it depends on (0.4.3). Since 0.4.3, there have been some changes to the way tantivy serializes the Schema.

Steps to reproduce:

cargo install tantivy-cli
mkdir index
tantivy new -i index
Creating new index 
Let's define it's schema! 



New field name  ? text
Text or unsigned 32-bit integer (T/I) ? T
Should the field be stored (Y/N) ? Y
Should the field be indexed (Y/N) ? Y
Should the field be tokenized (Y/N) ? Y
Should the term frequencies (per doc) be in the index (Y/N) ? Y
Should the term positions (per doc) be in the index (Y/N) ? Y
Add another field (Y/N) ? N

[
  {
    "name": "text",
    "type": "text",
    "options": {
      "indexing": "position",
      "stored": true
    }
  }
]

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Error(CorruptedFile(\"meta.json\"), State { next_error: Some(ErrorImpl { code: Message(\"invalid type: sequence, expected struct Schema\"), line: 3, column: 12 }), backtrace: None })"', src/libcore/result.rs:916:4
note: Run with `RUST_BACKTRACE=1` for a backtrace

I can also reproduce this issue by manually creating the meta.json file and running tantivy search.

Here's the JSON that is created with tantivy new:

{
  "segments": [],
  "schema": [
    {
      "name": "text",
      "type": "text",
      "options": {
        "indexing": "position",
        "stored": true
      }
    }
  ],
  "opstamp": 0
}

I was able to get it working here: nicot@99fe211.

Compilation issue when running `cargo install tantivy-cli`

I had an error when trying to install tantivy-cli on osx.

Ran cargo install tantivy-cli.

cargo --version
cargo 1.46.0

rustc --version
rustc 1.46.0

This is the output

   Compiling tantivy-cli v0.13.1
error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:11:26
   |
11 |     let index_arg = Arg::with_name("index")
   |                          ^^^^^^^^^
   |                          |
   |                          function or associated item not found in `clap::Arg<'_>`
   |                          help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:32:27
   |
32 |                 .arg(Arg::with_name("host")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:37:27
   |
37 |                 .arg(Arg::with_name("port")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:49:27
   |
49 |                 .arg(Arg::with_name("file")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:54:27
   |
54 |                 .arg(Arg::with_name("num_threads")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:60:27
   |
60 |                 .arg(Arg::with_name("memory_size")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:66:27
   |
66 |                 .arg(Arg::with_name("nomerge")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:74:27
   |
74 |                 .arg(Arg::with_name("query")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:85:27
   |
85 |                 .arg(Arg::with_name("queries")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0599]: no function or associated item named `with_name` found for struct `clap::Arg<'_>` in the current scope
  --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:91:27
   |
91 |                 .arg(Arg::with_name("num_repeat")
   |                           ^^^^^^^^^
   |                           |
   |                           function or associated item not found in `clap::Arg<'_>`
   |                           help: there is an associated function with a similar name: `get_name`

error[E0308]: mismatched types
   --> /Users/alr/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.13.1/src/main.rs:105:9
    |
105 |     let (subcommand, some_options) = cli_options.subcommand();
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^   ------------------------ this expression has type `std::option::Option<(&str, &clap::ArgMatches)>`
    |         |
    |         expected enum `std::option::Option`, found tuple
    |
    = note: expected enum `std::option::Option<(&str, &clap::ArgMatches)>`
              found tuple `(_, _)`

error: aborting due to 11 previous errors

Some errors have detailed explanations: E0308, E0599.
For more information about an error, try `rustc --explain E0308`.
error: failed to compile `tantivy-cli v0.13.1`, intermediate artifacts can be found at `/var/folders/xx/111r6kc974z_tmqc70rymxkm0000gn/T/cargo-installKbo9Oi`

Caused by:
  could not compile `tantivy-cli`.

To learn more, run the command again with --verbose.

Is there anything on my system not up-to-date enough? Any more information that I can provide?

Thanks!

How to delete a document?

Is there a way using the CLI to delete a document, or to have some way to replace a duplicate document given a certain field name?
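
One possible workaround, assuming the index can be rebuilt from its source file, is to deduplicate the input on the chosen field before indexing it again. A minimal Python sketch of that idea follows; the helper name `dedupe_json_lines` and the example field `url` are illustrative assumptions and not part of tantivy-cli:

```python
import json


def dedupe_json_lines(lines, key):
    """Return one JSON line per distinct value of `key`, keeping the latest version."""
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # A later document with the same key value replaces the earlier one.
        latest[doc[key]] = line
    return list(latest.values())
```

The returned lines could then be written to a file and piped into `tantivy index -i` on a freshly created index directory. This replaces duplicates only at rebuild time; it does not delete anything from an existing index.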

thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?'

I followed the instructions for tantivy-cli and encountered the following message:

100000 Docs
126767571 docs / hour
200000 Docs
61979511 docs / hour
300000 Docs
45240748 docs / hour
400000 Docs
80860935 docs / hour
500000 Docs
37683017 docs / hour
600000 Docs
43229461 docs / hour
700000 Docs
40245879 docs / hour
800000 Docs
53577161 docs / hour
900000 Docs
63745697 docs / hour
1000000 Docs
29509239 docs / hour
1100000 Docs
38593585 docs / hour
1200000 Docs
29582937 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
note: Run with RUST_BACKTRACE=1 for a backtrace.
1300000 Docs
37100076 docs / hour
1400000 Docs
34933046 docs / hour
1500000 Docs
50002639 docs / hour
1600000 Docs
36648185 docs / hour
1700000 Docs
43451182 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
1800000 Docs
36208233 docs / hour
1900000 Docs
36207170 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
2000000 Docs
34476517 docs / hour
2100000 Docs
37284904 docs / hour
2200000 Docs
34497714 docs / hour
2300000 Docs
51197990 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
2400000 Docs
35314496 docs / hour
2500000 Docs
35938516 docs / hour
2600000 Docs
35805693 docs / hour
2700000 Docs
45073260 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
2800000 Docs
40951638 docs / hour
2900000 Docs
32014644 docs / hour
3000000 Docs
53152959 docs / hour
3100000 Docs
41247139 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
3200000 Docs
48792619 docs / hour
3300000 Docs
44430117 docs / hour
3400000 Docs
23955508 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
3500000 Docs
35455778 docs / hour
3600000 Docs
42539936 docs / hour
3700000 Docs
39232883 docs / hour
3800000 Docs
36802974 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
3900000 Docs
32920675 docs / hour
4000000 Docs
35512050 docs / hour
4100000 Docs
42407731 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
4200000 Docs
59604287 docs / hour
4300000 Docs
42813997 docs / hour
4400000 Docs
40106933 docs / hour
4500000 Docs
48629958 docs / hour
4600000 Docs
41602207 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
4700000 Docs
43152739 docs / hour
4800000 Docs
39880076 docs / hour
4900000 Docs
33744674 docs / hour
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
5000000 Docs
40354826 docs / hour
Commit succeed, docstamp at 5032119
Waiting for merging threads
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
thread '<unnamed>' panicked at 'Encounterred a field that is not supposed to be indexed. Have you modified the index?', /checkout/src/libcore/option.rs:823:4
ERROR:tantivy::indexer::index_writer: Some merging thread failed Error(ErrorInThread("Failed to join merging thread."), State { next_error: Some(Error(ErrorInThread("Merging thread failed."), State { next_error: None, backtrace: None })), backtrace: None })
thread 'main' panicked at 'called Result::unwrap() on an Err value: "Indexing failed : Error(ErrorInThread("Failed to join merging thread."), State { next_error: Some(Error(ErrorInThread("Merging thread failed."), State { next_error: None, backtrace: None })), backtrace: None })"', /checkout/src/libcore/result.rs:860:4

real 7m54.105s
user 22m39.868s
sys 3m58.152s

Steps to reproduce:

1. mkdir wikipedia-index
2. tantivy new -i wikipedia-index (see the meta file here: meta.json.txt)
3. time cat wiki-articles.json | tantivy index -i ./wikipedia-index

Do you need any further info?

I do create f64 field, it creates text field

When I create an f64 field with the wizard, it creates a text field instead:

sandbox/samples/tantivy » tantivy new -i csmobile-index2

Creating new index
First define its schema!



New field name  ? price
Choose Field Type (Text/u64/i64/f64/Date/Facet/Bytes) ? f64
Should the field be stored (Y/N) ? Y
Should the field be indexed (Y/N) ? Y
Should the term be tokenized? (Y/N) ? Y
Should the term frequencies (per doc) be in the index (Y/N) ? Y
Should the term positions (per doc) be in the index (Y/N) ? Y
Add another field (Y/N) ? N

[
  {
    "name": "price",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  }
]
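
The mismatch can also be checked mechanically against the printed schema. The following is a minimal Python sketch, assuming the schema is the JSON array shown above; the helper `find_type_mismatches` is a hypothetical name, and the label `"f64"` for the expected type is an assumption based on the wizard prompt:

```python
import json


def find_type_mismatches(schema, expected):
    """Return (name, expected_type, actual_type) for each field whose type differs."""
    mismatches = []
    for field in schema:
        name = field["name"]
        if name in expected and field["type"] != expected[name]:
            mismatches.append((name, expected[name], field["type"]))
    return mismatches


# Schema as printed by the wizard in this report.
schema = json.loads("""
[{"name": "price", "type": "text",
  "options": {"indexing": {"record": "position", "tokenizer": "en_stem"},
              "stored": true}}]
""")

# "f64" as the expected type label is an assumption taken from the wizard prompt.
print(find_type_mismatches(schema, {"price": "f64"}))
```

For the schema in this report, the check flags the `price` field as having type `text` where `f64` was requested.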

Process hangs on futex when indexing

$ tantivy new --index .

Creating new index 
Let's define it's schema! 



New field name  ? data
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? n
Should the field be indexed (Y/N) ? t
Error: Invalid input. Options are (Y/N)
Should the field be indexed (Y/N) ? y
Should the term be tokenized? (Y/N) ? y
Should the term frequencies (per doc) be in the index (Y/N) ? y
Should the term positions (per doc) be in the index (Y/N) ? y
Add another field (Y/N) ? n

[
  {
    "name": "data",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": false
    }
  }
]
$ tantivy index --index .

Then the process appears to hang (or maybe it takes over half an hour to index 1033 files?). I dumped the thread stacks as follows:

$ gdb -batch -ex "thread apply all bt" -p 20259
[New LWP 20260]
[New LWP 20261]
[New LWP 20262]
[New LWP 20263]
[New LWP 20264]
[New LWP 20265]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb0586213e8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.

Thread 7 (Thread 0x7fb026bff700 (LWP 20265)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb0586215f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb058621600, cond=0x7fb0586215d0) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb0586215d0, mutex=0x7fb058621600) at pthread_cond_wait.c:655
#3  0x0000560bb5a93156 in <chan::Iter<T> as core::iter::iterator::Iterator>::next ()
#4  0x0000560bb5a60eb9 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x0000560bb5a825a7 in <F as alloc::boxed::FnBox<A>>::call_box ()
#6  0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#7  std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#8  std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#9  0x00007fb0593577fc in start_thread (arg=0x7fb026bff700) at pthread_create.c:465
#10 0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fb0373ff700 (LWP 20264)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb0586215f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb058621600, cond=0x7fb0586215d0) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb0586215d0, mutex=0x7fb058621600) at pthread_cond_wait.c:655
#3  0x0000560bb5a93156 in <chan::Iter<T> as core::iter::iterator::Iterator>::next ()
#4  0x0000560bb5a60eb9 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x0000560bb5a825a7 in <F as alloc::boxed::FnBox<A>>::call_box ()
#6  0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#7  std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#8  std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#9  0x00007fb0593577fc in start_thread (arg=0x7fb0373ff700) at pthread_create.c:465
#10 0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fb0477ff700 (LWP 20263)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb0586215f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb058621600, cond=0x7fb0586215d0) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb0586215d0, mutex=0x7fb058621600) at pthread_cond_wait.c:655
#3  0x0000560bb5a93156 in <chan::Iter<T> as core::iter::iterator::Iterator>::next ()
#4  0x0000560bb5a60eb9 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x0000560bb5a825a7 in <F as alloc::boxed::FnBox<A>>::call_box ()
#6  0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#7  std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#8  std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#9  0x00007fb0593577fc in start_thread (arg=0x7fb0477ff700) at pthread_create.c:465
#10 0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fb057bff700 (LWP 20262)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb058621748) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb0586216f0, cond=0x7fb058621720) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb058621720, mutex=0x7fb0586216f0) at pthread_cond_wait.c:655
#3  0x0000560bb5c40b82 in std::sys::unix::condvar::Condvar::wait () at libstd/sys/unix/condvar.rs:78
#4  std::sys_common::condvar::Condvar::wait () at libstd/sys_common/condvar.rs:51
#5  std::sync::condvar::Condvar::wait () at libstd/sync/condvar.rs:212
#6  std::thread::park () at libstd/thread/mod.rs:800
#7  0x0000560bb5b7f3c4 in <std::sync::mpsc::Receiver<T>>::recv ()
#8  0x0000560bb5b82764 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#9  0x0000560bb5b7bbcf in <F as alloc::boxed::FnBox<A>>::call_box ()
#10 0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#11 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#12 std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#13 0x00007fb0593577fc in start_thread (arg=0x7fb057bff700) at pthread_create.c:465
#14 0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fb0583fe700 (LWP 20261)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb058621388) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb058621390, cond=0x7fb058621360) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb058621360, mutex=0x7fb058621390) at pthread_cond_wait.c:655
#3  0x0000560bb5a03c59 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#4  0x0000560bb5c62976 in <F as alloc::boxed::FnBox<A>>::call_box ()
#5  0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#6  std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#7  std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#8  0x00007fb0593577fc in start_thread (arg=0x7fb0583fe700) at pthread_create.c:465
#9  0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fb0585ff700 (LWP 20260)):
#0  0x00007fb059361d5d in __libc_read (fd=0, buf=0x7fb057e12000, nbytes=8192) at ../sysdeps/unix/sysv/linux/read.c:26
#1  0x0000560bb5c44617 in std::sys::unix::fd::FileDesc::read () at libstd/sys/unix/fd.rs:58
#2  std::sys::unix::stdio::Stdin::read () at libstd/sys/unix/stdio.rs:24
#3  <std::io::stdio::StdinRaw as std::io::Read>::read () at libstd/io/stdio.rs:77
#4  <std::io::stdio::Maybe<R> as std::io::Read>::read () at libstd/io/stdio.rs:117
#5  <std::io::buffered::BufReader<R> as std::io::Read>::read () at libstd/io/buffered.rs:229
#6  <std::io::stdio::StdinLock<'a> as std::io::Read>::read () at libstd/io/stdio.rs:309
#7  0x0000560bb5c4448b in <std::io::stdio::Stdin as std::io::Read>::read () at libstd/io/stdio.rs:289
#8  0x0000560bb5a03166 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#9  0x0000560bb5c62d16 in <F as alloc::boxed::FnBox<A>>::call_box ()
#10 0x0000560bb5c4ec48 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h290a492c91ea9d98 () at /checkout/src/liballoc/boxed.rs:798
#11 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#12 std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#13 0x00007fb0593577fc in start_thread (arg=0x7fb0585ff700) at pthread_create.c:465
#14 0x00007fb058e6db5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fb059b81900 (LWP 20259)):
#0  0x00007fb05935e072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb0586213e8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7fb0586213f0, cond=0x7fb0586213c0) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fb0586213c0, mutex=0x7fb0586213f0) at pthread_cond_wait.c:655
#3  0x0000560bb5c91ac0 in tantivy::commands::index::run_index_cli ()
#4  0x0000560bb5c8eb5c in tantivy::main ()
#5  0x0000560bb5c68de3 in std::rt::lang_start::{{closure}} ()
#6  0x0000560bb5c88ac8 in main ()

I've reproduced this on OS X 10.11 and Ubuntu 17.10.
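
Thread 2 in the dump above is blocked in a read on fd 0, so the indexer may simply be waiting for documents on standard input; the `tantivy index --index .` call above was not given any. Below is a minimal Python sketch of producing that input, assuming the command expects one JSON document per line on stdin (as in the `cat wiki-articles.json | tantivy index ...` usage) and that the files are plain text. The helper `files_to_json_lines` is hypothetical, and the field name `data` is taken from the schema above:

```python
import json
import sys
from pathlib import Path


def files_to_json_lines(directory, field="data"):
    """Yield one single-line JSON document per regular file in `directory`."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            # json.dumps escapes newlines, so each document stays on one line.
            yield json.dumps({field: text})


# When run as a script with one argument, print the documents to stdout.
if __name__ == "__main__" and len(sys.argv) == 2:
    for line in files_to_json_lines(sys.argv[1]):
        print(line)
```

Saved under a name of your choosing, the script's output could be piped into `tantivy index --index <index-dir>`, with the source files kept in a directory separate from the index.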

Thread Merge Error

When running through the wiki indexing tutorial, I keep getting the following error, whether I run multi-threaded or single-threaded. I am on WSL on an Intel i9-13900K.

 ○ cat wiki-articles.json | tantivy index -i ./wiki
65523 Docs
235882816 docs / hour 78.24 Mb/s
101293 Docs
128397080 docs / hour 49.50 Mb/s
145485 Docs
159090880 docs / hour 69.63 Mb/s
181873 Docs
120802736 docs / hour 52.18 Mb/s
227277 Docs
163452752 docs / hour 61.36 Mb/s
264562 Docs
121315152 docs / hour 77.24 Mb/s
281652 Docs
61521108 docs / hour 84.31 Mb/s
297545 Docs
57208676 docs / hour 44.33 Mb/s
thread 'merge_thread_0' panicked at 'Logic Error in Tantivy (Please report). Facet field should have required a`term_ordinal_mapping`.', /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.19.0/src/indexer/merger.rs:262:48
stack backtrace:
   0:     0x565357daacf3 - std::backtrace_rs::backtrace::libunwind::trace::he615646ea344481f
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x565357daacf3 - std::backtrace_rs::backtrace::trace_unsynchronized::h6ea8eaac68705b9c
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x565357daacf3 - std::sys_common::backtrace::_print_fmt::h7ac486a935ce0bf7
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x565357daacf3 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1b5a095d3db2e28f
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x565357cba50e - core::fmt::write::h445545b92224a1cd
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/fmt/mod.rs:1209:17
   5:     0x565357d847e4 - std::io::Write::write_fmt::h55a43474c6520b00
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/io/mod.rs:1682:15
   6:     0x565357dabfcf - std::sys_common::backtrace::_print::h65d20526fdb736b0
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x565357dabfcf - std::sys_common::backtrace::print::h6555fbe12a1cc41b
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x565357dabbcf - std::panicking::default_hook::{{closure}}::hbdf58083140e7ac6
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:267:22
   9:     0x565357dacc68 - std::panicking::default_hook::haef8271c56b74d85
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:286:9
  10:     0x565357dacc68 - std::panicking::rust_panic_with_hook::hfd45b6b6c12d9fa5
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:688:13
  11:     0x565357dac732 - std::panicking::begin_panic_handler::{{closure}}::hf591e8609a75bd4b
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:579:13
  12:     0x565357dac69c - std::sys_common::backtrace::__rust_end_short_backtrace::h81899558795e4ff7
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:137:18
  13:     0x565357dac671 - rust_begin_unwind
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
  14:     0x565357bd1492 - core::panicking::panic_fmt::h4235fa9b4675b332
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
  15:     0x565357cbe830 - core::panicking::panic_display::h29316b4d8276b58d
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:139:5
  16:     0x565357cbe7db - core::panicking::panic_str::hdb14547b30ece385
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:123:5
  17:     0x565357bd1606 - core::option::expect_failed::h1c9d589f6e6349ed
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:1879:5
  18:     0x565357e3c4eb - core::option::Option<T>::expect::h90c32df798761fb3
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:741:21
  19:     0x565357e3c4eb - tantivy::indexer::merger::IndexMerger::write_fast_fields::h172e81bd94b084c9
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.19.0/src/indexer/merger.rs:262:48
  20:     0x565357e3c4eb - tantivy::indexer::merger::IndexMerger::write::h5fe7bf5b003800f5
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.19.0/src/indexer/merger.rs:1044:9
  21:     0x565357e923f7 - tantivy::indexer::segment_updater::merge::h862fa030ede427f4
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.19.0/src/indexer/segment_updater.rs:126:20
  22:     0x565357e923f7 - tantivy::indexer::segment_updater::SegmentUpdater::start_merge::{{closure}}::hc441b8015b7d0d6e
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.19.0/src/indexer/segment_updater.rs:520:19
  23:     0x565357e923f7 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h254358538d0e53d8
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panic/unwind_safe.rs:271:9
  24:     0x565357dd9a99 - std::panicking::try::do_call::h64870ef1c7a730c2
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:483:40
  25:     0x565357dd9a99 - std::panicking::try::h9603a79a35137070
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:447:19
  26:     0x565357dd9a99 - std::panic::catch_unwind::h38c3b0ccd1c1d895
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panic.rs:137:14
  27:     0x565357dd9a99 - rayon_core::unwind::halt_unwinding::he06f952ab153d619
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/unwind.rs:17:5
  28:     0x565357dd9a99 - rayon_core::registry::Registry::catch_unwind::h064d2137f6238c00
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:335:27
  29:     0x565357dd9a99 - rayon_core::spawn::spawn_job::{{closure}}::h138c7f51fcfa6197
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/spawn/mod.rs:97:13
  30:     0x565357dd9a99 - <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute::h0f869ba85dee0982
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/job.rs:163:9
  31:     0x565357bd6ffa - rayon_core::job::JobRef::execute::h91fb3c5f9c51df7d
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/job.rs:58:9
  32:     0x565357bd6ffa - rayon_core::registry::WorkerThread::execute::he04f598845442008
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:804:9
  33:     0x565357bd6ffa - rayon_core::registry::WorkerThread::wait_until_cold::hb69082dd664ee9e1
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:781:17
  34:     0x565357cfd18c - rayon_core::registry::WorkerThread::wait_until::hf0637ef37b4e32b9
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:755:13
  35:     0x565357cfd18c - rayon_core::registry::main_loop::h1670fc76fc3ad382
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:889:5
  36:     0x565357cfd18c - rayon_core::registry::ThreadBuilder::run::h12e761e1159afda7
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:53:18
  37:     0x565357cfd18c - <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn::{{closure}}::h25c1c5b7dbd15b13
                               at /home/ielm/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.10.1/src/registry.rs:98:20
  38:     0x565357cfd18c - std::sys_common::backtrace::__rust_begin_short_backtrace::h8b37295483e7d1ad
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:121:18
  39:     0x565357cff5a5 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h05654c7e3a46641c
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/thread/mod.rs:551:17
  40:     0x565357cff5a5 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h7c1a298f02a6b896
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panic/unwind_safe.rs:271:9
  41:     0x565357cff5a5 - std::panicking::try::do_call::h8db3226a983be2e8
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:483:40
  42:     0x565357cff5a5 - std::panicking::try::hc07a5a5c73a94da3
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:447:19
  43:     0x565357cff5a5 - std::panic::catch_unwind::heab44d0a4e39ab1b
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panic.rs:137:14
  44:     0x565357cff5a5 - std::thread::Builder::spawn_unchecked_::{{closure}}::h63788cda0c8357da
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/thread/mod.rs:550:30
  45:     0x565357cff5a5 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h46e072364d33c7ec
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
  46:     0x565357dae135 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h4273f95ec44459b3
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
  47:     0x565357dae135 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h70f28fa4ddc269e5
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
  48:     0x565357dae135 - std::sys::unix::thread::Thread::new::thread_start::h85a9c16b988e2bd0
                               at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys/unix/thread.rs:108:17
  49:     0x7fb973b88609 - start_thread
  50:     0x7fb973958133 - clone
  51:                0x0 - <unknown>
Rayon: detected unexpected panic; aborting
[1]    5023 broken pipe  cat wiki-articles.json |
       5024 abort        tantivy index -i ./wiki

Merge failed with "out of bounds" error

Hi, I've been indexing documents (./tantivy index -i index --nomerge) and then merging every 5000 documents with ./tantivy merge -i index.

It's been working, but now I am unable to merge successfully and receive the error below.

Does anyone have suggestions on how to remedy this, or what other information I can provide to help debug this?

thread '<unnamed>' panicked at 'index out of bounds: the len is 196916 but the index is 201379', /checkout/src/libcore/slice/mod.rs:2051:10
stack backtrace:
   0:     0x55b065363292 - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h0c827793368fbc95
                               at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x55b06535610b - std::panicking::default_hook::{{closure}}::h78ff2c77689d1451
                               at libstd/sys_common/backtrace.rs:71
   2:     0x55b065355217 - std::panicking::rust_panic_with_hook::hcd548010674c2347
                               at libstd/panicking.rs:227
                               at libstd/panicking.rs:463
   3:     0x55b0653582fc - std::panicking::begin_panic_fmt::hb088e4aa19a35612
                               at libstd/panicking.rs:350
   4:     0x55b06536718a - core::panicking::panic_fmt::hed27deb097b8209d
                               at libstd/panicking.rs:328
   5:     0x55b06536713f - core::panicking::panic_bounds_check::h7b819858526f3dc9
                               at libcore/panicking.rs:58
   6:     0x55b0651af65d - <tantivy::indexer::merger::IndexMerger as tantivy::core::segment::SerializableSegment>::write::hcfbc180f0ad9e058
   7:     0x55b0651865e0 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9497c20b2a3a7790
   8:     0x55b065193002 - <F as alloc::boxed::FnBox<A>>::call_box::h3b1fe427063780b8
   9:     0x55b0653509e3 - std::sys::unix::thread::Thread::new::thread_start::haaed00ecdbc477ec
                               at /checkout/src/liballoc/boxed.rs:648
  10:     0x7f07c48dc063 - start_thread
  11:     0x7f07c43fb62c - clone
  12:                0x0 - <unknown>
thread 'main' panicked at 'Merge failed: Canceled', libcore/result.rs:945:5
stack backtrace:
   0:     0x55b065363292 - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h0c827793368fbc95
                               at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x55b06535610b - std::panicking::default_hook::{{closure}}::h78ff2c77689d1451
                               at libstd/sys_common/backtrace.rs:71
   2:     0x55b065355217 - std::panicking::rust_panic_with_hook::hcd548010674c2347
                               at libstd/panicking.rs:227
                               at libstd/panicking.rs:463
   3:     0x55b0653582fc - std::panicking::begin_panic_fmt::hb088e4aa19a35612
                               at libstd/panicking.rs:350
   4:     0x55b06536718a - core::panicking::panic_fmt::hed27deb097b8209d
                               at libstd/panicking.rs:328
   5:     0x55b06537c380 - core::result::unwrap_failed::hb79615b688308113
   6:     0x55b06537bfb1 - tantivy::commands::merge::run_merge_cli::h3a81bcfc9e57e723
   7:     0x55b0653aa2f3 - tantivy::main::hc657201e713a0921
   8:     0x55b0650f0152 - std::rt::lang_start::{{closure}}::h10ca1e4558686e03
   9:     0x55b0653a3d58 - main
  10:     0x7f07c4334b44 - __libc_start_main
  11:     0x55b0650eff01 - <unknown>
  12:                0x0 - <unknown>

Prefix search

Hello, I want to find "obama" by searching for only "oba" or just "ama".
By the way, is wildcard or regex search supported?

Match Some Features from Whoosh

I'm not sure how to split this up and I don't want to spam with issues.

Whoosh is a great pure-Python search engine that accomplishes most of the goals of this project (sans Rust-level performance). Matching some of its features would make tantivy more useful for production applications.

A couple of features that tantivy could use that would make it more useful/production ready:

  • More Schema types: KEYWORD, ID, STORED, DATETIME, BOOLEAN, etc.
  • Analyzers in general. I don't think I saw them when looking at the docs. They can be useful for things like creating a comma-separated field in the schema (though in Rust that might be more strongly typed, and I'm not sure why we're using this instead of the KEYWORD type in our application; presumably there is some difference).
  • General NLP tools, such as those described at https://whoosh.readthedocs.io/en/latest/stemming.html

panic when indexing dir that does not exist

$ tantivy index --index /xxx
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Indexing failed : Error(PathDoesNotExist(\"/xxx\"), State { next_error: None, backtrace: None })"', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
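
For illustration, here is a minimal sketch of how a command-line tool could report a missing index directory as an ordinary error instead of unwrapping it into a panic. It uses only the Rust standard library. The names `CliError` and `check_index_dir` are hypothetical and are not taken from tantivy-cli's source.

```rust
use std::fmt;
use std::path::{Path, PathBuf};
use std::process::ExitCode;

// Hypothetical error type for this sketch; not tantivy-cli's own.
#[derive(Debug, PartialEq)]
enum CliError {
    IndexDirMissing(PathBuf),
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CliError::IndexDirMissing(path) => {
                write!(f, "index directory {} does not exist", path.display())
            }
        }
    }
}

// Returns Ok only when `path` exists and is a directory.
fn check_index_dir(path: &Path) -> Result<(), CliError> {
    if path.is_dir() {
        Ok(())
    } else {
        Err(CliError::IndexDirMissing(path.to_path_buf()))
    }
}

fn main() -> ExitCode {
    // Falls back to the path from the report when no argument is given.
    let dir = std::env::args().nth(1).unwrap_or_else(|| "/xxx".to_string());
    match check_index_dir(Path::new(&dir)) {
        Ok(()) => {
            println!("index directory found: {dir}");
            ExitCode::SUCCESS
        }
        Err(err) => {
            // A plain message and a non-zero exit code, with no backtrace.
            eprintln!("error: {err}");
            ExitCode::FAILURE
        }
    }
}
```

Propagating the error up to `main` keeps the failure visible while sparing the user the panic message and the `RUST_BACKTRACE` hint.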

CLI panics if schema contains a json_object field type

I am indexing chemical structures, my basic schema:

{
  "index_settings": {
    "docstore_compression": "lz4"
  },
  "segments": [
    {
      "segment_id": "ecf472f4-5e12-42f4-989e-d2571f8ab745",
      "max_doc": 10,
      "deletes": null
    }
  ],
  "schema": [
    {
      "name": "smile",
      "type": "text",
      "options": {
        "indexing": {
          "record": "position",
          "fieldnorms": true,
          "tokenizer": "default"
        },
        "stored": true
      }
    },
    {
      "name": "description",
      "type": "json_object",
      "options": {
        "stored": true,
        "indexing": {
          "record": "position",
          "fieldnorms": true,
          "tokenizer": "default"
        }
      }
    }
  ],
  "opstamp": 11
}

and when I try to get the cli to run a query against this index:

% RUST_BACKTRACE=true tantivy serve --index=tmp/index 
thread 'main' panicked at 'unhandled type', /Users/xlange/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.15.3/src/schema/field_entry.rs:233:38
stack backtrace:
   0: std::panicking::begin_panic
   1: <<tantivy::schema::field_entry::FieldEntry as serde::de::Deserialize>::deserialize::FieldEntryVisitor as serde::de::Visitor>::visit_map
   2: <tantivy::schema::schema::Schema as serde::de::Deserialize>::deserialize
   3: tantivy::core::index::load_metas
   4: tantivy::core::index::Index::open
   5: tantivy::commands::serve::run_serve_cli
   6: tantivy::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
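
The trace shows the CLI was built against tantivy 0.15.3, and the panic is raised while the schema is being deserialized. One plausible reading is that this version does not recognise the `json_object` type. The sketch below shows, with the standard library only, how an unrecognised type name can be turned into an error rather than a panic. `FieldKind`, `parse_field_kind` and the list of accepted names are illustrative assumptions, not tantivy's actual deserialization code.

```rust
// Illustrative subset of field kinds; not tantivy's real type list.
#[derive(Debug, PartialEq)]
enum FieldKind {
    Text,
    U64,
    I64,
    F64,
    Date,
    Bytes,
}

// Maps a schema type name to a kind, or explains why it was rejected.
fn parse_field_kind(name: &str) -> Result<FieldKind, String> {
    match name {
        "text" => Ok(FieldKind::Text),
        "u64" => Ok(FieldKind::U64),
        "i64" => Ok(FieldKind::I64),
        "f64" => Ok(FieldKind::F64),
        "date" => Ok(FieldKind::Date),
        "bytes" => Ok(FieldKind::Bytes),
        other => Err(format!(
            "unsupported field type `{other}`; the index may have been written by a newer version"
        )),
    }
}

fn main() {
    // The two type names that appear in the schema from the report.
    for name in ["text", "json_object"] {
        match parse_field_kind(name) {
            Ok(kind) => println!("{name}: parsed as {kind:?}"),
            Err(msg) => eprintln!("{name}: {msg}"),
        }
    }
}
```

If the version mismatch is indeed the cause, rebuilding the CLI against the tantivy version that wrote the index would be the first thing to try.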

Not compiled

D:\Projects>cargo install tantivy-cli
Updating registry https://github.com/rust-lang/crates.io-index
Compiling httparse v1.2.1
Compiling libc v0.2.19
Compiling ansi_term v0.9.0
Compiling unicode-normalization v0.1.3
Compiling modifier v0.1.0
Compiling memchr v0.1.11
Compiling traitobject v0.0.1
Compiling num_cpus v1.2.1
Compiling ansi_term v0.8.0
Compiling aho-corasick v0.5.3
Compiling strsim v0.5.2
Compiling utf8-ranges v0.1.3
Compiling itoa v0.1.1
Compiling typeable v0.1.2
Compiling dtoa v0.2.2
Compiling byteorder v0.4.2
Compiling traitobject v0.0.3
Compiling unsafe-any v0.4.1
Compiling error v0.1.9
Compiling rand v0.3.15
Compiling unicode-width v0.1.4
Compiling itertools v0.4.19
Compiling matches v0.1.4
Compiling regex-syntax v0.3.9
Compiling unicode-bidi v0.2.4
Compiling sequence_trie v0.0.13
Compiling bitflags v0.7.0
Compiling lazy_static v0.1.16
Compiling log v0.3.6
Compiling tempdir v0.3.5
Compiling rustc-serialize v0.3.22
Compiling mime v0.2.2
Compiling hpack v0.2.0
Compiling vec_map v0.6.0
Compiling serde v0.8.21
Compiling crossbeam v0.2.10
Compiling chan v0.1.18
Compiling byteorder v0.5.3
Compiling typemap v0.3.3
Compiling plugin v0.2.6
Compiling winapi-build v0.1.1
Compiling getopts v0.2.14
Compiling semver v0.1.20
Compiling solicit v0.4.4
Compiling conduit-mime-types v0.7.3
Compiling rustc_version v0.1.7
Compiling pulldown-cmark v0.0.3
Compiling unicase v1.4.0
Compiling tempfile v2.1.4
Compiling num-traits v0.1.36
Compiling winapi v0.2.8
Compiling serde_json v0.8.4
Compiling skeptic v0.6.1
Compiling num-integer v0.1.32
Compiling num-complex v0.1.35
Compiling num-iter v0.1.32
Compiling num-bigint v0.1.35
Compiling idna v0.1.0
Compiling kernel32-sys v0.2.2
Compiling ascii v0.7.1
Compiling combine v2.0.0
Compiling num-rational v0.1.35
Compiling url v1.2.4
Compiling num v0.1.36
Compiling serde v0.6.15
Compiling uuid v0.1.18
Compiling num_cpus v0.2.13
Compiling unicode-segmentation v0.1.3
Compiling gcc v0.3.41
Compiling bincode v0.4.1
Compiling language-tags v0.2.2
Compiling lz4 v1.20.0
Compiling lz4-sys v1.0.1+1.7.3
Compiling tantivy v0.2.0
Build failed, waiting for other jobs to finish...
error: failed to compile tantivy-cli v0.2.0, intermediate artifacts can be found at C:\Users\chvl\AppData\Local\Temp\cargo-install.x08urL3FwU1p

Caused by:
failed to run custom build command for tantivy v0.2.0
process didn't exit successfully: C:\Users\chvl\AppData\Local\Temp\cargo-install.x08urL3FwU1p\release\build\tantivy-55cf4812ba42e161\build-script-build (exit code: 101)
--- stderr
thread 'main' panicked at 'Failed to make simdcomp: The system cannot find the file specified. (os error 2)', C:\Users\chvl\.cargo\registry\src\github.com-1ecc6299db9ec823\tantivy-0.2.0\build.rs:11
note: Run with RUST_BACKTRACE=1 for a backtrace.


D:\Projects>cargo -V
cargo 0.15.0-nightly (298a012 2016-12-20)

D:\Projects>rustc --version
rustc 1.14.0 (e8a012324 2016-12-16)

Help me please
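
The message "Failed to make simdcomp" together with `os error 2` (file not found) suggests that the build script tried to start an external program that is not installed or not on `PATH`. That the program is `make` is an assumption drawn from the wording of the message, not something the log confirms. As a rough sketch, a build step can distinguish a missing tool from a failed one as follows. `run_tool` is a hypothetical helper and is not part of tantivy's `build.rs`.

```rust
use std::io::ErrorKind;
use std::process::Command;

// Runs an external tool and turns a spawn failure into a readable message.
fn run_tool(program: &str, args: &[&str]) -> Result<(), String> {
    match Command::new(program).args(args).status() {
        Ok(status) if status.success() => Ok(()),
        Ok(status) => Err(format!("`{program}` exited with {status}")),
        // This branch corresponds to "os error 2": the program was not found.
        Err(err) if err.kind() == ErrorKind::NotFound => Err(format!(
            "`{program}` was not found on PATH; install it or add it to PATH before building"
        )),
        Err(err) => Err(format!("could not start `{program}`: {err}")),
    }
}

fn main() {
    // "make" is an assumption based on the panic message in the report.
    if let Err(msg) = run_tool("make", &["--version"]) {
        eprintln!("build prerequisite check failed: {msg}");
        std::process::exit(1);
    }
    println!("make is available");
}
```

Running a check like this before `cargo install` would show whether the tool is reachable from the same shell.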

thread 'main' panicked at 'Unknown error while starting watching directory "wikipedia-index" [...] tantivy-0.9.1/src/directory/mmap_directory.rs:167:17

Hi there :) I went through the example in the readme and got an error while creating the schema.

It installed successfully with:

cargo install tantivy-cli

So I made the directory:

mkdir wikipedia-index

And went through the interactive schema creation:

export RUST_BACKTRACE=1; tantivy new -i wikipedia-index

Creating new index 
Let's define it's schema! 



New field name  ? title
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? y
Should the field be indexed (Y/N) ? y
Should the term be tokenized? (Y/N) ? y
Should the term frequencies (per doc) be in the index (Y/N) ? y
Should the term positions (per doc) be in the index (Y/N) ? y
Add another field (Y/N) ? y



New field name  ? body
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? y
Should the field be indexed (Y/N) ? y
Should the term be tokenized? (Y/N) ? y
Should the term frequencies (per doc) be in the index (Y/N) ? y
Should the term positions (per doc) be in the index (Y/N) ? y
Add another field (Y/N) ? y



New field name  ? url
Text or unsigned 32-bit integer (T/I) ? t
Should the field be stored (Y/N) ? y
Should the field be indexed (Y/N) ? n
Add another field (Y/N) ? n

[
  {
    "name": "title",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "body",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": null,
      "stored": true
    }
  }
]

thread 'main' panicked at 'Unknown error while starting watching directory "wikipedia-index"', /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:167:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:70
             at src/libstd/sys_common/backtrace.rs:58
             at src/libstd/panicking.rs:200
   2: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:215
             at src/libstd/panicking.rs:478
   3: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:385
   4: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:340
   5: tantivy::directory::mmap_directory::MmapDirectory::new
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:167
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libcore/result.rs:520
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:164
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:236
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:263
   6: tantivy::directory::mmap_directory::MmapDirectory::open
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/directory/mmap_directory.rs:294
   7: tantivy::commands::new::run_new_cli
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.9.1/src/core/index.rs:100
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.9.0/src/commands/new.rs:16
   8: tantivy::main
             at /home/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-cli-0.9.0/src/main.rs:139
   9: std::rt::lang_start::{{closure}}
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libstd/rt.rs:64
  10: main
  11: __libc_start_main
  12: _start

I'm using Ubuntu 16.04 in case that matters. I'll probably end up using the actual library anyway but it would have been nice to test out the cli program first.

Thanks!
