benashford / rs-es

A Rust client for the ElasticSearch REST API

License: Apache License 2.0

rs-es

Introduction

An ElasticSearch client for Rust via the REST API, targeting ElasticSearch 2.0 and higher.

Other clients

For later versions of ElasticSearch you probably want the official client.

Documentation

Full documentation for rs-es.

Building and installation

Versions 0.12.0 and higher have been tested with the prevailing "stable", "beta" and "nightly" versions of rustc at the time of their release. They should also work with earlier versions of rustc, although some dependencies may use or require language features that are only available in recent versions.

crates.io

Available from crates.io.

ElasticSearch compatibility

The default version of ElasticSearch supported is 2.0. Higher versions will also work as long as the particular part of the ES API is compatible with the version 2 spec.

Newer versions of ElasticSearch do have some incompatibilities in some areas, therefore these are not supported by this library.

However, starting with version 0.12.1 there is experimental support for ES 5 using the es5 feature flag. The intention is this support will become more complete over time and will become the new baseline supported compatible version.

Design goals

There are two primary goals: 1) to be a full implementation of the ElasticSearch REST API, and 2) to be idiomatic both with ElasticSearch and Rust conventions.

The second goal is more difficult to achieve than the first, as there are some areas which conflict. A small example of this is the word type: it refers to the type of an ElasticSearch document, but it is also a reserved word for defining types in Rust. This means we cannot name a field type, for instance, so in this library the document type is always referred to as doc_type instead.
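As a minimal illustration of the naming conflict, the sketch below uses a hypothetical `DocRef` struct (not an rs-es type) with a `doc_type` field, and builds the `/{index}/{type}/{id}` path that ElasticSearch 2.x uses to address a single document.

```rust
// Hypothetical struct, for illustration only; not part of rs-es.
// `type` is a Rust keyword, so a field declared as `type: String` would not compile.
struct DocRef {
    index: String,
    doc_type: String,
    id: String,
}

impl DocRef {
    // Builds the REST path for a single document: /{index}/{type}/{id}
    fn path(&self) -> String {
        format!("/{}/{}/{}", self.index, self.doc_type, self.id)
    }
}

fn main() {
    let doc = DocRef {
        index: "index_name".to_string(),
        doc_type: "type_name".to_string(),
        id: "ID_VALUE".to_string(),
    };
    println!("{}", doc.path());
}
```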

Usage guide

The client

The Client wraps a single HTTP connection to a specified ElasticSearch host/port.

(At present there is no connection pooling: each client has one connection, so if you need multiple connections you will need multiple clients. This may change in the future.)

use rs_es::Client;

let mut client = Client::init("http://localhost:9200").expect("connection failed");

Operations

The Client provides various operations, which are analogous to the various ElasticSearch APIs.

In each case the Client has a function which returns a builder-pattern object that allows additional options to be set. The function itself will require mandatory parameters, everything else is on the builder (e.g. operations that require an index to be specified will have index as a parameter on the function itself).

An example of optional parameters is routing. The routing parameter can be set on operations that support it with:

op.with_routing("user123")

See the ElasticSearch guide for the full set of options and what they mean.
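The split between mandatory function parameters and optional builder parameters can be sketched with standard-library Rust alone. Everything below (`Client`, `IndexOp`, the `url` method standing in for `send`) is a hypothetical simplification, not the rs-es implementation, whose builders may take `&mut self` rather than `self`.

```rust
// Hypothetical types sketching the pattern; these are not the rs-es definitions.
struct Client {
    base_url: String,
}

struct IndexOp<'a> {
    client: &'a Client,
    index: &'a str,
    doc_type: &'a str,
    id: Option<&'a str>,
    routing: Option<&'a str>,
}

impl Client {
    // Mandatory parameters are arguments of the function that creates the builder.
    fn index<'a>(&'a self, index: &'a str, doc_type: &'a str) -> IndexOp<'a> {
        IndexOp { client: self, index, doc_type, id: None, routing: None }
    }
}

impl<'a> IndexOp<'a> {
    // Optional parameters are set through chained `with_*` methods.
    fn with_id(mut self, id: &'a str) -> Self {
        self.id = Some(id);
        self
    }

    fn with_routing(mut self, routing: &'a str) -> Self {
        self.routing = Some(routing);
        self
    }

    // Stand-in for `send`: returns the URL that would be requested.
    fn url(&self) -> String {
        let mut url = format!("{}/{}/{}", self.client.base_url, self.index, self.doc_type);
        if let Some(id) = self.id {
            url.push('/');
            url.push_str(id);
        }
        if let Some(routing) = self.routing {
            url.push_str("?routing=");
            url.push_str(routing);
        }
        url
    }
}

fn main() {
    let client = Client { base_url: "http://localhost:9200".to_string() };
    let op = client.index("index_name", "type_name").with_id("1").with_routing("user123");
    println!("{}", op.url());
}
```

Keeping the mandatory values in the constructor function means an operation cannot be built without them, while each optional value stays a single chained call.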

`index`

An implementation of the Index API.

let mut index_op = client.index("index_name", "type_name");

This returns an IndexOperation, on which additional options can be set. For example, to set an ID and a TTL:

index_op.with_id("ID_VALUE").with_ttl("100d");

The document to be indexed has to implement the Serialize trait from the serde library. This can be achieved by either implementing or deriving that on a custom type, or by manually creating a Value object.

Calling send submits the index operation and returns an IndexResult:

index_op.with_doc(&document).send();

`get`

An implementation of the Get API.

Index and ID are mandatory, but type is optional. Some examples:

// Finds a document of any type with the given ID
let result_1 = client.get("index_name", "ID_VALUE").send();

// Finds a document of a specific type with the given ID
let result_2 = client.get("index_name", "ID_VALUE").with_doc_type("type_name").send();

`delete`

An implementation of the Delete API.

Index, type and ID are mandatory.

let result = client.delete("index_name", "type_name", "ID_VALUE").send();

`refresh`

Sends a refresh request.

use rs_es::Client;

let mut client = Client::init("http://localhost:9200").expect("connection failed");
// Refresh all indexes
let result = client.refresh().send();

// Refresh specific indexes
let result = client.refresh().with_indexes(&["index_name", "other_index_name"]).send();

`search_uri`

An implementation of the Search API using query strings.

Example:

use rs_es::Client;

let mut client = Client::init("http://localhost:9200").expect("connection failed");
let result = client.search_uri()
                   .with_indexes(&["index_name"])
                   .with_query("field:value")
                   .send::<String>();

`search_query`

An implementation of the Search API using the Query DSL.

use rs_es::Client;
use rs_es::query::Query;

let mut client = Client::init("http://localhost:9200").expect("connection failed");
let result = client.search_query()
                   .with_indexes(&["index_name"])
                   .with_query(&Query::build_match("field", "value").build())
                   .send::<String>();

A search query also supports scan and scroll, sorting, and aggregations.

`count_uri`

An implementation of the Count API using query strings.

Example:

use rs_es::Client;

let mut client = Client::init("http://localhost:9200").expect("connection failed");
let result = client.count_uri()
                   .with_indexes(&["index_name"])
                   .with_query("field:value")
                   .send();

`count_query`

An implementation of the Count API using the Query DSL.

use rs_es::Client;
use rs_es::query::Query;

let mut client = Client::init("http://localhost:9200").expect("connection failed");
let result = client.count_query()
                   .with_indexes(&["index_name"])
                   .with_query(&Query::build_match("field", "value").build())
                   .send();

`bulk`

An implementation of the Bulk API. This is the preferred way of indexing (or deleting, when Delete-by-Query is removed) many documents.

use rs_es::operations::bulk::Action;

let result = client.bulk(&vec![Action::index(document1),
                               Action::index(document2).with_id("id")]);

In this case the document can be anything that implements ToJson.

Sorting

Sorting is supported on all forms of search (by query or by URI), and related operations (e.g. scan and scroll).

use rs_es::Client;
use rs_es::query::Query;
use rs_es::operations::search::{Order, Sort, SortBy, SortField};

let mut client = Client::init("http://localhost:9200").expect("connection failed");
let result = client.search_query()
                   .with_query(&Query::build_match_all().build())
                   .with_sort(&Sort::new(vec![
                       SortBy::Field(SortField::new("fieldname", Some(Order::Desc)))
                   ]))
                   .send::<String>();

This is quite unwieldy for simple cases, although it does support the more exotic combinations that ElasticSearch supports. There are therefore also a number of convenience functions for the simpler cases, e.g. sorting by a field in ascending order:

// Omitted the rest of the query
.with_sort(&Sort::field("fieldname"))

Results

Each of the defined operations above returns a result. Specifically this is a struct that is a direct mapping to the JSON that ElasticSearch returns.

One of the most common return types is that of the search operations, which also mirrors the JSON that ElasticSearch returns. The top level contains two fields: shards, which holds counts of successful and failed operations per shard, and hits, which contains the search results. The results are themselves a struct with two fields: total, the total number of matching results, and hits, a vector of individual results.

The individual results contain meta-data for each hit (such as the score) as well as the source document (unless the query set the various options which would disable or alter this).

The type of the source document can be anything that implements Deserialize. An ElasticSearch search may return many different types of document, and by default ElasticSearch does not enforce any schema, so the structure of a returned document may need to be validated before it is deserialised. In this case a search can return a Value, from which data can be extracted and/or converted to other structures.
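The nesting of a search result described in this section can be pictured with plain structs. The sketch below is an assumption-laden mirror of that shape: the type and field names (`SearchResult`, `Hits`, `Hit`, `ShardCounts`) are illustrative and may differ from the real rs-es structs, and no JSON parsing is involved.

```rust
// Hypothetical mirror of the JSON shape described above. Type and field names
// are illustrative and may differ from the real rs-es structs.
struct ShardCounts {
    total: u64,
    successful: u64,
    failed: u64,
}

struct Hit<T> {
    id: String,
    score: Option<f64>,
    source: Option<T>, // absent when the query disabled source retrieval
}

struct Hits<T> {
    total: u64,
    hits: Vec<Hit<T>>,
}

struct SearchResult<T> {
    shards: ShardCounts,
    hits: Hits<T>,
}

// Returns the source documents that are present, skipping hits without one.
fn sources<T>(result: SearchResult<T>) -> Vec<T> {
    result.hits.hits.into_iter().filter_map(|hit| hit.source).collect()
}

fn main() {
    let result = SearchResult {
        shards: ShardCounts { total: 5, successful: 5, failed: 0 },
        hits: Hits {
            total: 2,
            hits: vec![
                Hit { id: "1".to_string(), score: Some(1.4), source: Some("toto".to_string()) },
                Hit { id: "2".to_string(), score: None, source: None },
            ],
        },
    };
    println!(
        "{} of {} shards succeeded, {} failed",
        result.shards.successful, result.shards.total, result.shards.failed
    );
    println!("{} matching documents", result.hits.total);
    for hit in &result.hits.hits {
        println!("hit {} scored {:?}", hit.id, hit.score);
    }
    println!("{:?}", sources(result));
}
```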

The Query DSL

ElasticSearch offers a rich DSL for searches. It is JSON-based, and therefore easy to use and compose from a dynamic language (e.g. Ruby). Rust is statically typed, so things are different: the rs_es::query module defines a set of builder objects which can be composed to the same ends.

For example:

use rs_es::query::Query;

let query = Query::build_bool()
    .with_must(vec![Query::build_term("field_a",
                                      "value").build(),
                    Query::build_range("field_b")
                          .with_gte(5)
                          .with_lt(10)
                          .build()])
    .build();

The resulting Query value can be used in the various search/query functions exposed by the client.

The implementation makes much use of conversion traits which are used to keep a lid on the verbosity of using such a builder pattern.
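To show how conversion traits can keep such builders terse, here is a small standalone sketch. `JsonVal` and `RangeQuery` are hypothetical simplifications, not the real rs_es::query types; the point is only that a method bounded by `Into<JsonVal>` accepts an integer or a string without the caller wrapping it.

```rust
// Hypothetical, simplified types; the real rs_es::query types are richer.
#[derive(Debug, PartialEq)]
enum JsonVal {
    Str(String),
    Int(i64),
}

impl From<&str> for JsonVal {
    fn from(s: &str) -> Self {
        JsonVal::Str(s.to_string())
    }
}

impl From<i64> for JsonVal {
    fn from(n: i64) -> Self {
        JsonVal::Int(n)
    }
}

#[derive(Debug, Default, PartialEq)]
struct RangeQuery {
    field: String,
    gte: Option<JsonVal>,
    lt: Option<JsonVal>,
}

impl RangeQuery {
    fn new(field: &str) -> Self {
        RangeQuery { field: field.to_string(), ..Default::default() }
    }

    // Accepting `Into<JsonVal>` lets callers pass `5` or `"2015-01-01"` directly.
    fn with_gte<V: Into<JsonVal>>(mut self, value: V) -> Self {
        self.gte = Some(value.into());
        self
    }

    fn with_lt<V: Into<JsonVal>>(mut self, value: V) -> Self {
        self.lt = Some(value.into());
        self
    }
}

fn main() {
    let numeric = RangeQuery::new("field_b").with_gte(5).with_lt(10);
    let dated = RangeQuery::new("created").with_gte("2015-01-01");
    println!("{:?}", numeric);
    println!("{:?}", dated);
}
```

Without the `Into` bound, each call site would need an explicit `JsonVal::Int(5)` or `JsonVal::Str(...)`, which is the verbosity the conversion traits avoid.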

Scan and scroll

When a query produces a large result set, the most efficient way to load it is scan and scroll. This is preferred to simple pagination with the from option because ElasticSearch keeps resources open server-side, so the next page carries on from where the previous one stopped rather than executing an additional query. The downside is that it requires more memory and open file handles on the server, which could become a problem if there were many unfinished scrolls. For this reason ElasticSearch recommends a short time-out for such operations, after which it closes all resources whether or not the client has finished. The client is responsible for fetching the next page within the time-out.

To use scan and scroll, begin with a search query request, but instead of calling send call scan:

let scan = client.search_query()
                 .with_indexes(&["index_name"])
                 .with_query(&Query::build_match("field", "value").build())
                 .scan(Duration::minutes(1))
                 .unwrap();

(Disclaimer: any use of unwrap in this or other examples is for the purposes of brevity; real code should handle errors in accordance with the needs of the application.)

Then scroll can be called multiple times to fetch each page. Finally close will tell ElasticSearch the scan has finished and it can close any open resources.

let first_page = scan.scroll(&mut client);
// omitted - calls of subsequent pages
scan.close(&mut client).unwrap();

The result of the call to scan does not include a reference to the client, hence the need to pass in a reference to the client in subsequent calls. The advantage of this is that the same client can be used for actions based on each scroll.

Scan and scroll with an iterator

Also supported is an iterator which will scroll through a scan.

let scan_iter = scan.iter(&mut client);

The iterator holds a mutable reference to the client, so the same client cannot be used concurrently. However, the iterator automatically calls close when it is dropped, so that the consumer can use iterator functions like take or take_while without having to decide when to call close.

The type of each value returned from the iterator is Result<SearchHitsHitsResult, EsError>. If an error is returned then it must be assumed the iterator is closed. The type SearchHitsHitsResult is the same as returned in a normal search (the verbose name is intended to mirror the structure of JSON returned by ElasticSearch).
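The close-on-drop behaviour can be sketched without any network access. In the example below, `MockClient` and `ScanIter` are hypothetical stand-ins (not rs-es types): the mock counts open scroll contexts, the iterator yields plain `u32` hits instead of `Result<SearchHitsHitsResult, EsError>`, and its `Drop` implementation plays the role of `close`.

```rust
// Hypothetical stand-ins for the client and scan iterator; not the rs-es definitions.
struct MockClient {
    pages: Vec<Vec<u32>>, // pages the "server" will return, in order
    open_scrolls: u32,    // server-side scroll contexts still open
}

struct ScanIter<'a> {
    client: &'a mut MockClient, // exclusive borrow for the iterator's lifetime
    next_page: usize,
    buffer: std::vec::IntoIter<u32>, // hits of the current page not yet yielded
}

impl MockClient {
    fn scan(&mut self) -> ScanIter<'_> {
        self.open_scrolls += 1;
        ScanIter { client: self, next_page: 0, buffer: Vec::new().into_iter() }
    }
}

impl<'a> Iterator for ScanIter<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            if let Some(hit) = self.buffer.next() {
                return Some(hit);
            }
            // Current page exhausted: fetch the next one, or stop if there is none.
            let page = self.client.pages.get(self.next_page)?.clone();
            self.next_page += 1;
            self.buffer = page.into_iter();
        }
    }
}

impl<'a> Drop for ScanIter<'a> {
    // Releases the server-side scroll even if the consumer stops early.
    fn drop(&mut self) {
        self.client.open_scrolls -= 1;
    }
}

fn main() {
    let mut client = MockClient {
        pages: vec![vec![1, 2], vec![3, 4], vec![5]],
        open_scrolls: 0,
    };
    let first_three: Vec<u32> = client.scan().take(3).collect();
    println!("{:?}, open scrolls: {}", first_three, client.open_scrolls);
}
```

Because the iterator borrows the client mutably, the borrow checker rejects any other use of the client until the iterator has been dropped, which matches the restriction described above.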

Aggregations

Aggregations are also supported, on an experimental basis.

client.search_query().with_indexes(&[index_name]).with_aggs(&aggs).send();

Where aggs is a rs_es::operations::search::aggregations::Aggregations. For convenience, conversion traits are implemented for common patterns: specifically the tuple (&str, Aggregation) for a single aggregation, and Vec<(&str, Aggregation)> for multiple aggregations.

Bucket aggregations (i.e. those that define a bucket that can contain sub-aggregations) can also be specified as a tuple (Aggregation, Aggregations).

use rs_es::operations::search::aggregations::Aggregations;
use rs_es::operations::search::aggregations::bucket::{Order, OrderKey, Terms};
use rs_es::operations::search::aggregations::metrics::Min;

let aggs = Aggregations::from(("str",
                               (Terms::field("str_field").with_order(Order::asc(OrderKey::Term)),
                                Aggregations::from(("int",
                                                    Min::field("int_field"))))));

The above would, when used within a search_query operation, generate a JSON fragment within the search request:

"str": {
    "terms": {
        "field": "str_field",
        "order": {"_term": "asc"}
    },
    "aggs": {
        "int": {
            "min": {
                "field": "int_field"
            }
        }
    }
}
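The tuple conversions can be illustrated with a self-contained sketch. The `Aggregation` and `Aggregations` types below are hypothetical simplifications: sub-aggregations are a field of the `Terms` variant rather than a tuple, ordering is omitted, and `to_json` is a hand-written stand-in for real serialization. The `min` wrapper follows the usual ElasticSearch shape for a min aggregation.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for the rs-es aggregation types.
#[derive(Debug, PartialEq)]
enum Aggregation {
    Min { field: String },
    Terms { field: String, sub: Aggregations },
}

#[derive(Debug, Default, PartialEq)]
struct Aggregations(BTreeMap<String, Aggregation>);

// A single named aggregation converts from a tuple.
impl From<(&str, Aggregation)> for Aggregations {
    fn from(pair: (&str, Aggregation)) -> Self {
        Aggregations::from(vec![pair])
    }
}

// Several named aggregations convert from a vector of tuples.
impl From<Vec<(&str, Aggregation)>> for Aggregations {
    fn from(pairs: Vec<(&str, Aggregation)>) -> Self {
        Aggregations(pairs.into_iter().map(|(n, a)| (n.to_string(), a)).collect())
    }
}

impl Aggregation {
    fn to_json(&self) -> String {
        match self {
            Aggregation::Min { field } => format!("{{\"min\":{{\"field\":\"{}\"}}}}", field),
            Aggregation::Terms { field, sub } => {
                let mut json = format!("{{\"terms\":{{\"field\":\"{}\"}}", field);
                if !sub.0.is_empty() {
                    json.push_str(&format!(",\"aggs\":{}", sub.to_json()));
                }
                json.push('}');
                json
            }
        }
    }
}

impl Aggregations {
    fn to_json(&self) -> String {
        let parts: Vec<String> = self
            .0
            .iter()
            .map(|(name, agg)| format!("\"{}\":{}", name, agg.to_json()))
            .collect();
        format!("{{{}}}", parts.join(","))
    }
}

fn main() {
    let inner = Aggregations::from(("int", Aggregation::Min { field: "int_field".to_string() }));
    let aggs = Aggregations::from((
        "str",
        Aggregation::Terms { field: "str_field".to_string(), sub: inner },
    ));
    println!("{}", aggs.to_json());
}
```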

Most, but not all, aggregations are currently supported. See the documentation of the aggregations module for details.

For example, to get a reference to the result of the Terms aggregation called str (see above):

let terms_result = result.aggs_ref()
    .unwrap()
    .get("str")
    .unwrap()
    .as_terms()
    .unwrap();

EXPERIMENTAL: the structure of results may change as it currently feels quite cumbersome.

Unimplemented features

The ElasticSearch API is made up of a large number of smaller APIs, the vast majority of which are not yet implemented, although the most frequently used ones (searching, indexing, etc.) are.

Some, non-exhaustive, specific TODOs

Add a CONTRIBUTING.md
Handling API calls that don't deal with JSON objects.
Documentation.
Potentially: Concrete (de)serialization for aggregations and aggregation results
Metric aggregations can have an empty body (check: all or some of them?) when used as a sub-aggregation underneath certain other aggregations.
Performance (ensure use of persistent HTTP connections, etc.).
All URI options are just String (or things that implement ToString), sometimes the values will be arrays that should be coerced into various formats.
Check type of "timeout" option on Search...

Licence

   Copyright 2015-2017 Ben Ashford

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

rs-es's Issues

Re-enable Nightly

The Nightly Travis build is currently disabled due to an issue with an upstream dependency (Serde, in particular Quasi) being built on Nightly.

We should upgrade the dependency as soon as a fix is available, and re-enable the nightly build.

Support Inner Hits

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html

Implement Top Hits aggregation

Set an index for a field

Hello,

I was wondering how I can set an index for a certain field. Basically I have an array of strings work_roles that appears in the following way

"work_roles": {
    "type": "string"
}

and my goal is to make it in this way:

"work_roles": {
    "type": "string",
    "index": "not_analyzed"
}

Unfortunately I wasn't able to find anything in the docs. Am I missing something?

I guess it should be something like this.

Implement Multi GET

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-get.html

Reduce use of macros in processing results of Bucket aggregations.

There are a lot of macros defined in .../aggregations/bucket.rs, many of which could be simple functions.

Support Percolate

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html

Support Version in searches

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-version.html

the trait core::convert::From<rs_es::query::BoolFilter> is not implemented for the type Box<rs_es::query::Filter>

Code

use rs_es::Client;
use rs_es::operations::search::{Sort, SortField, Order};
use rs_es::query::{Filter, Query};

// [...]
fn main() {
  // [...]

  // I'm just trying to compile this, it's the snippet inside the README 
  let query = Query::build_filtered(
    Filter::build_bool()
      .with_must(vec![
        Filter::build_term("field_a", "value").build(),
        Filter::build_range("field_b")
          .with_gte(5)
          .with_lt(10)
          .build()
      ])
  )
  .with_query(Query::build_query_string("some value").build())
  .build();

  let result = es.search_query()
    .with_indexes(ES_INDEXES)
    //.with_query(&Query::build_match_all().build())
    .with_query(&query)
    .with_sort(&Sort::new(vec![
      SortField::new("updated_at", Some(Order::Desc)).build()
    ]))
    .send()
    .ok()
    .unwrap();
}

Error

src/main.rs:26:15: 26:36 error: the trait core::convert::From<rs_es::query::BoolFilter> is not implemented for the type Box<rs_es::query::Filter> [E0277]
src/main.rs:26 let query = Query::build_filtered(
^~~~~~~~~~~~~~~~~~~~~
src/main.rs:26:15: 26:36 help: run rustc --explain E0277 to see a detailed explanation
src/main.rs:26:15: 26:36 note: required by rs_es::query::Query::build_filtered
error: aborting due to previous error
Could not compile sample.

Caused by:
Process didn't exit successfully: rustc src/main.rs --crate-name sample --crate-type bin -g --out-dir /Users/giovanni/Desktop/sample/target/debug --emit=dep-info,link -L dependency=/Users/giovanni/Desktop/sample/target/debug -L dependency=/Users/giovanni/Desktop/sample/target/debug/deps --extern postgres=/Users/giovanni/Desktop/sample/target/debug/deps/libpostgres-2680c17c75628e6e.rlib --extern rustc_serialize=/Users/giovanni/Desktop/sample/target/debug/deps/librustc_serialize-79a17eda1cd94e46.rlib --extern rs_es=/Users/giovanni/Desktop/sample/target/debug/deps/librs_es-5ee9d634caeceacd.rlib --extern postgres_array=/Users/giovanni/Desktop/sample/target/debug/deps/libpostgres_array-06c7843d8da23f1e.rlib -L native=/Users/giovanni/Desktop/sample/target/debug/build/openssl-09576f2f9776fa80/out -L native=/Users/giovanni/Desktop/sample/target/debug/build/openssl-sys-extras-52d5315fb71d3c6d/out (exit code: 101)

Platform
Machine

Darwin lifestream.Speedport_W723_V_Typ_A_1_01_011 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64

Version: 2.1.1, Build: 40e2c53/2015-12-15T13:05:55Z, JVM: 1.8.0_66

Rust

rustc 1.6.0

rs-es
Both 0.2 and master

Implement Search templates

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html

If a way can be found to implement in a way that's compatible with idiomatic Rust.

search query with different result types

ElasticSearch can return objects from different types in a query. I think it's perfect to represent this with a Rust enum but I don't see how to manage this with rs-es 😕

with json it might be a bit clearer.

Let's take an ElasticSearch query returning 2 results, each of a different type:

{
    "_shards": {
       ...
    }, 
    "hits": {
        "hits": [
            {
                "_id": "AVQQZtpeUZ0aRT9bKks0", 
                "_index": "my_index", 
                "_score": 1.3870806, 
                "_source": {
                    "name": "toto"
                }, 
                "_type": "FirstType"
            }, 
            {
                "_id": "AVQQZtfmUZ0aRT9bKiL6", 
                "_index": "my_index", 
                "_score": 1.3752246, 
                "_source": {
                    "age": 12
                }, 
                "_type": "SecondType"
            }
        ]
     }
}

I would like to represent this in rs-es with

struct FirstType {
    name: String
}
struct SecondType {
    age: i32
}
enum Document {
    FirstType(FirstType),
    SecondType(SecondType)
}

and the es result would be SearchResult<Document>

The serde serialization could use the elastic search _type field to build the right type.

The problem is that this _type field is at the same level as the _source field, so it is not available when defining the serialization of Document.
I thus don't see a way to handle this apart from handling it directly in rs-es (and I'm not sure how to do that 😖).

Do you see an easy way to do this? I'm willing to implement it, but I don't see a clean and non-API-breaking way to do it.

Reduce duplication with "scripted" fields

The Sort By Script option when searching, the Function query, and scripted aggregations all have a concept of a script. These appear to be mostly similar in terms of options and generated JSON. We should de-duplicate these implementations as they may make as-yet-to-be-implemented queries easier to implement.

Support Index Boost

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-index-boost.html

Implement Multi Search API

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html

Miscellaneous build failures on latest Rust, and even more on nightly

Investigate and fix...

The utility of them is not obvious, however.

Support for Elasticsearch 2.x

Do you have plans to support Elasticsearch 2.x? There are a bunch of changes, but it'd be good to be able to run against the latest version.

Add convenient option for deleting using Scan-and-Scroll that doesn't fetch the source document

e.g. use query_then_fetch to 0

Increase Hyper dependency to 0.9

Implement the breaking changes in the Search API for ES 2.0

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_search_changes.html

Implement Highlighting

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

Implement More Like This query

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

Replace rustc_serialize with Serde

Serde is being pitched as the replacement for rustc_serialize.

Implement Post Filters

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-post-filter.html

Support Explain on searches

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-explain.html

(documentation doesn't explain what is returned, so some exploration will be required)

Implement Suggesters

Implement Validate API

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html

could not compile with serde_macros dependency

Please take a look at serde-deprecated/quasi#35 (comment). I have exactly the same problem, but couldn't solve it by using

[dependencies.rs-es]
version = "0.3.2"
features = ["nightly_without_ssl"]

Implement Count API

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html

Implement Update API

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

Implement the breaking changes in Aggregations for ES 2.0

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_aggregation_changes.html

Support Shard Preference

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html

Tests for Geo queries

Specific tests would be useful for Geo queries

Implement Rescoring

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html

Implement significant terms aggregation

The documentation implies it is an experimental feature, so it may be best to wait until it stabilises.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html

Implement Cluster APIs

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster.html

Implement script fields

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html

Improve error handling

Currently the errors returned from any function that returns a Result are all EsError. There are five varieties of these depending on whether the error was generated: 1) internally within rs-es, 2) by ElasticSearch itself, 3) from the HTTP library, 4) miscellaneous IO errors, and 5) JSON encoding/decoding errors.

In particular, server-side errors can convey a lot of useful information. Rather than converting to a String, we should consider parsing the JSON response and making that available.
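One possible shape for such an error type is sketched below. The variant names and payloads are assumptions made for illustration, not the current rs-es definition; the `Server` variant keeps the status and response body separate instead of flattening them into one String, and `read_doc` is a hypothetical helper that only demonstrates the `?` conversion.

```rust
use std::fmt;
use std::io;

// Hypothetical sketch of a five-variant error type matching the categories
// listed above; names and payloads are assumptions, not the rs-es API.
#[derive(Debug)]
enum EsError {
    Internal(String),                     // generated within the client library
    Server { status: u16, body: String }, // returned by ElasticSearch itself
    Http(String),                         // from the HTTP library
    Io(io::Error),                        // miscellaneous IO errors
    Json(String),                         // JSON encoding/decoding errors
}

impl fmt::Display for EsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EsError::Internal(msg) => write!(f, "internal error: {}", msg),
            EsError::Server { status, body } => write!(f, "server error {}: {}", status, body),
            EsError::Http(msg) => write!(f, "HTTP error: {}", msg),
            EsError::Io(err) => write!(f, "IO error: {}", err),
            EsError::Json(msg) => write!(f, "JSON error: {}", msg),
        }
    }
}

impl std::error::Error for EsError {}

// Lets `?` convert IO errors automatically.
impl From<io::Error> for EsError {
    fn from(err: io::Error) -> Self {
        EsError::Io(err)
    }
}

// Hypothetical helper: reads a document from disk, mapping failures to EsError.
fn read_doc(path: &str) -> Result<String, EsError> {
    let contents = std::fs::read_to_string(path)?;
    Ok(contents)
}

fn main() {
    match read_doc("/definitely/not/a/real/path.json") {
        Ok(doc) => println!("read {} bytes", doc.len()),
        Err(err) => println!("{}", err),
    }
    let server = EsError::Server { status: 404, body: "{\"found\":false}".to_string() };
    println!("{}", server);
}
```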
