elastomer-client's Introduction

Making a stupid simple Elasticsearch client so your project can be smarter!

Client

The client provides a one-to-one mapping to the Elasticsearch API endpoints. The API is decomposed into logical sections and accessed according to what you are trying to accomplish. Each logical section is represented as a client class and a top-level accessor is provided for each.

Cluster

API endpoints dealing with cluster level information and settings are found in the Cluster class.

require 'elastomer_client/client'
client = ElastomerClient::Client.new

# the current health summary
client.cluster.health

# detailed cluster state information
client.cluster.state

# the list of all index templates
client.cluster.templates

Index

The methods in the Index class deal with the management of indexes in the cluster. This includes setting up type mappings and adjusting settings. The actual indexing and search of documents are handled by the Docs class (discussed next).

require 'elastomer_client/client'
client = ElastomerClient::Client.new

index = client.index('books')
index.create(
  :settings => { 'index.number_of_shards' => 3 },
  :mappings => {
    :_source => { :enabled => true },
    :properties => {
      :author => { :type => 'keyword' },
      :title  => { :type => 'text' }
    }
  }
)

index.exists?

index.delete

Docs

The Docs class handles the indexing and searching of documents. Each instance is scoped to an index and optionally a document type.

require 'elastomer_client/client'
client = ElastomerClient::Client.new

docs = client.docs('books')

docs.index({
  :_id    => 1,
  :author => 'Mark Twain',
  :title  => 'The Adventures of Huckleberry Finn'
})

docs.search({:query => {:match_all => {}}})

Performance

By default ElastomerClient uses Net::HTTP (via Faraday) to communicate with Elasticsearch. You may find that Excon performs better for your use case. To enable Excon, add it to your bundle and then change your ElastomerClient initialization as follows:

ElastomerClient::Client.new(url: YOUR_ES_URL, adapter: :excon)

Retries

You can add retry logic to your Elastomer client connection using Faraday's Retry middleware. The ElastomerClient::Client.new method can accept a block, which you can use to customize the Faraday connection. Here's an example:

retry_options = {
  max: 2,
  interval: 0.05,
  methods: [:get]
}

ElastomerClient::Client.new do |connection|
  connection.request :retry, retry_options
end

Compatibility

This client is tested against:

  • Ruby version 3.2
  • Elasticsearch versions 5.6 and 8.13

Development

Get started by cloning and running a few scripts:

Bootstrap the project

script/bootstrap

Start an Elasticsearch server in Docker

To run ES 5 and ES 8:

docker compose --project-directory docker --profile all up

To run only ES 8:

docker compose --project-directory docker --profile es8 up

To run only ES 5:

docker compose --project-directory docker --profile es5 up

Run tests against a version of Elasticsearch

ES 8

ES_PORT=9208 rake test

ES 5

ES_PORT=9205 rake test

Releasing

  1. Create a new branch from main
  2. Bump the version number in lib/elastomer_client/version.rb
  3. Update CHANGELOG.md with info about the new version
  4. Commit your changes and tag the commit with a version number starting with the prefix "v" e.g. v4.0.2
  5. Execute rake build. This will place a new gem file in the pkg/ folder.
  6. Run gem install pkg/elastomer-client-{VERSION}.gem to install the new gem locally
  7. Start an irb session, require "elastomer_client/client" and make sure things work as you expect
  8. Once everything is working as you expect, push both your commit and your tag, and open a pull request
  9. Request review from a maintainer and wait for the pull request to be approved. Once it is approved, you can merge it to main yourself. After that, pull down a fresh copy of main and then...
  10. [Optional] If you intend to release a new version to Rubygems, run rake release
  11. [Optional] If necessary, manually push the new version to rubygems.org
  12. 🕺 💃 🎉

elastomer-client's People

Contributors

abraham, bleything, brianmario, chrisbloom7, chrismwendt, composerinteralia, dependabot[bot], dewski, djdefi, ekroon, elireisman, fuentesjr, georgebrock, grantr, gregmefford, jasonkim, juruen, kag728, kevinsawicki, lerebear, look, misalcedo, ndonewar, richa-d, shayfrendt, stephenotalora, tmm1, twp, wags, zkoppert

elastomer-client's Issues

Handle version differences with URL parameter fields

Certain document hash keys are "special" and get converted into URL parameters rather than being indexed with the document.

These fields are added to a document. They get pulled out and added to the query
string (rather than being passed in as a parameter to the index method). Any unknown underscore fields are ignored and indexed as usual.

SPECIAL_KEYS = %w[index type id version version_type op_type routing parent timestamp ttl consistency replication refresh].freeze

These URL parameters are different between versions. _consistency got re-implemented as _wait_for_active_shards and has different values, so it's not a direct mapping.

Additionally, ES 5 no longer allows unknown URL parameters. So we need to not send it or force the caller to deal with the error. And vice-versa, there is a problem as well: if someone uses a new
parameter from ES 5, it will end up in a document as a field unless we pull it out and put it on the URL as a parameter (which ES 2 will ignore), or pull it out and ignore it.

We need to decide how to handle these differences in the library.
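One possible policy, sketched below, is to keep a per-version allow-list and split each document three ways: keys that become URL parameters, keys that are known from another version and get dropped (so the caller can be warned), and everything else, which stays in the document. The allow-lists, the constant names, and the split_document helper are illustrative assumptions, not the library's actual behaviour.

```ruby
# Hypothetical per-version allow-lists; real lists would be taken from the
# Elasticsearch documentation for each supported version.
URL_PARAMS_BY_MAJOR_VERSION = {
  2 => %w[id routing parent timestamp ttl consistency refresh version version_type],
  5 => %w[id routing parent refresh version version_type wait_for_active_shards]
}.freeze

# Every key that is a URL parameter in at least one supported version.
KNOWN_URL_PARAMS = URL_PARAMS_BY_MAJOR_VERSION.values.flatten.uniq.freeze

# Split a document into URL params, dropped keys, and the remaining document.
def split_document(document, major_version)
  allowed = URL_PARAMS_BY_MAJOR_VERSION.fetch(major_version)
  result = { params: {}, dropped: [], document: {} }

  document.each do |key, value|
    name = key.to_s
    bare = name.start_with?("_") ? name[1..-1] : nil

    if bare && allowed.include?(bare)
      # Valid URL parameter for this version: move it out of the document.
      result[:params][bare.to_sym] = value
    elsif bare && KNOWN_URL_PARAMS.include?(bare)
      # A URL parameter in some other version: neither send it nor index it.
      result[:dropped] << bare.to_sym
    else
      # Unknown underscore fields and ordinary fields are indexed as usual.
      result[:document][key] = value
    end
  end

  result
end

result = split_document({ _id: 1, _consistency: "quorum", title: "t" }, 5)
```

Returning the dropped keys rather than silently discarding them leaves the decision (warn, raise, or ignore) to the caller.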

References:

Add response status to response object

The Elasticsearch 1.0 api is removing the ok attributes that have historically been used to detect successful responses. In several of these apis it is unclear how to detect a successful response from the response body.

I've resisted this so far, but I wonder if now might be the time to start returning a Response object instead of the raw response hash. My initial vision for this is as simple as possible:

class Response < Hash
  attr_accessor :status, :headers
end

This would allow api consumers to do things like:

response = docs.index(document)
if response.status != 201
   # fail
end

I'd avoid adding abstractions like success? unless there was a clear benefit from having them in the low level client. IMO those kinds of things should be at a higher level (even though we don't have a higher level yet). The Response object is necessary in the client because the information would otherwise be unretrievable.
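A slightly fuller sketch of the same idea follows, showing how such an object might be populated from a parsed body. The build class method and the sample values are assumptions made for illustration; only the Hash subclass with status and headers comes from the proposal above.

```ruby
# A Hash that also carries the HTTP status and headers of the response.
class Response < Hash
  attr_accessor :status, :headers

  # Hypothetical constructor: copy the parsed body and attach HTTP metadata.
  def self.build(body, status:, headers: {})
    response = new
    response.update(body)
    response.status = status
    response.headers = headers
    response
  end
end

response = Response.build({ "_id" => "1" }, status: 201)

# Existing callers keep treating the response as a plain hash ...
document_id = response["_id"]

# ... while new callers can also inspect the status code.
created = response.status == 201
```

Because the object is still a Hash, code written against the raw response hash should keep working unchanged.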

/cc @github/search

Incorporate new ES 5.X features

We need to pull in new features from ES 5.X that are not yet supported by the gem. When the gem is used with an ES 2.X cluster, a sensible empty result should be returned from these methods.

  • Delete by query docs @look #185
  • Task management API docs @elireisman #187
  • Raise IllegalArgument exceptions when returned from Elasticsearch #184
  • Upgrades tp + github/github deployment(s) of elastomer-client to pull in VersionSupport shims for ES2~ES5 transition period @elireisman @look (multiple patches)
  • Pattern for parameter validation @TwP here

Cannot provide `routing` parameter to Bulk Index API when targeting Elasticsearch 7+

As per these docs a number of underscored parameters (such as _routing and _retry_on_conflict) have been removed from certain Elasticsearch (ES) APIs as of ES 7.

However, it appears that this client will rewrite non-underscored versions of some of these params to underscored versions before forwarding them to ES:

SPECIAL_KEYS = %w[id type index version version_type routing parent timestamp ttl consistency refresh retry_on_conflict]
SPECIAL_KEYS_HASH = SPECIAL_KEYS.inject({}) { |h, k| h[k] = "_#{k}"; h }

# Internal: convert special key parameters to their wire representation
# and apply any override document parameters.
def prepare_params(document, params)
  params = convert_special_keys(params)
  if document.is_a? Hash
    params = from_document(document).merge(params)
  end
  params.delete(:_id) if params[:_id].nil? || params[:_id].to_s.empty?
  params
end

That prevents consumers of this library from specifying a routing parameter to the Bulk Index API of ES 7+: the call shown below fails because the routing parameter gets converted to _routing:

index.bulk do |request|
  request.index({ foo: "bar" }, routing: "123")
end

The error I've observed looks like this:

Elastomer::Client::IllegalArgument: {"root_cause"=>[{"type"=>"illegal_argument_exception", "reason"=>"Action/metadata line [1] contains an unknown parameter [_routing]"}], "type"=>"illegal_argument_exception", "reason"=>"Action/metadata line [1] contains an unknown parameter [_routing]"}
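One way to address this would be to make the key conversion aware of the target version. The sketch below is a standalone illustration, not the library's code: the es_major_version argument and the UNDERSCORED_IN_ES7 list are assumptions (the list reflects my reading that only the id, type, and index metadata keys keep their underscore in ES 7+ bulk action lines).

```ruby
SPECIAL_KEYS = %w[id type index version version_type routing parent
                  timestamp ttl consistency refresh retry_on_conflict].freeze

# Assumption: on ES 7+ only these keys keep the underscore prefix.
UNDERSCORED_IN_ES7 = %w[id type index].freeze

# Convert special keys to their wire representation for the given version.
def convert_special_keys(params, es_major_version)
  params.each_with_object({}) do |(key, value), converted|
    name = key.to_s
    needs_prefix =
      if !SPECIAL_KEYS.include?(name)
        false
      elsif es_major_version >= 7
        UNDERSCORED_IN_ES7.include?(name)
      else
        true
      end

    converted[needs_prefix ? :"_#{name}" : key] = value
  end
end

es5_params = convert_special_keys({ id: 1, routing: "123" }, 5)
es7_params = convert_special_keys({ id: 1, routing: "123" }, 7)
```

With this approach the ES 5 call still produces _routing, while the ES 7 call passes routing through untouched.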

Add setting to automatically refresh the index

Lots of nondeterministic test failures are caused by forgetting to refresh the index after writing or before reading. It would be convenient if elastomer-client had a setting which handled refreshing automatically.

For performance, it could be "smart" and only refresh if a write has happened since the last refresh.
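The "smart" variant could be as small as a dirty flag. The sketch below wraps any object that responds to refresh; AutoRefresher and FakeIndex are hypothetical names, and FakeIndex only exists so the example runs without a cluster.

```ruby
# Tracks whether a write has happened since the last refresh and refreshes
# lazily, right before the next read.
class AutoRefresher
  def initialize(index)
    @index = index
    @dirty = false
  end

  # Run a write operation and remember that the index is now stale.
  def write
    result = yield
    @dirty = true
    result
  end

  # Refresh only if something was written since the last refresh, then read.
  def read
    if @dirty
      @index.refresh
      @dirty = false
    end
    yield
  end
end

# Stand-in for a real index object; it just counts refresh calls.
class FakeIndex
  attr_reader :refresh_count

  def initialize
    @refresh_count = 0
  end

  def refresh
    @refresh_count += 1
  end
end

index = FakeIndex.new
refresher = AutoRefresher.new(index)
refresher.write { :indexed }
refresher.read { :first_search }
refresher.read { :second_search }
```

Two reads after one write trigger a single refresh, and a read with no preceding write triggers none.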

Running script/test fails

When I run script/test, I get the following error message:

bundler: command not found: testrb
Install missing gem executables with `bundle install`

This is despite a successful run of script/bootstrap:

Gem environment up-to-date
Installing hooks into /home/administrator/elastomer-client
Successfully installed hooks into /home/administrator/elastomer-client

I installed Ruby 2.2.4 using rbenv on Ubuntu 14.04.

Has anyone encountered this error before? I have not been able to determine if it is a missing gem issue or something else. Thank you!

Create a docs folder

This is ambitious, but I would love to have a docs folder that walks a brand new user through the basics of ElasticSearch. The goal of this is a pedagogical tool where the user could learn by doing.

The docs would walk a user through setting up ElasticSearch, creating an index, adding documents, searching, etc. Topics would be introduced slowly and we could link to the ElasticSearch guide to help people figure out how to use that resource, too.

Just a thought. It's a big effort that would require some high-level planning to get going.

cc @github/search

Tests fail on Ruby 2.6+ with "No server available" error

Steps to reproduce:

git pull origin master
script/bootstrap
bundle exec rake test

...
/Users/chrisbloom7/src/github/elastomer-client/test/test_helper.rb:42:in `<top (required)>': No server available at http://localhost:9200 (RuntimeError)
rake aborted!

Triggered by call to $client.available?. When trying to connect with same params from script/console the same check returns true.

Doing some byebug spelunking I tracked it back to an unhandled ArgumentError: unknown keyword: write_timeout exception in

def ping
  response = head "/", action: "cluster.ping"
  response.success?
rescue StandardError
  false
end
alias_method :available?, :ping
, which tracks back to this call in the standard Net::HTTP lib for Ruby 2.6

      @socket = BufferedIO.new(s, read_timeout: @read_timeout,
                               write_timeout: @write_timeout,
                               continue_timeout: @continue_timeout,
                               debug_output: @debug_output)

A Google search reveals the same error message reported in a Webmock issue (bblimke/webmock#786) around the time Ruby 2.6 was in RC. Webmock 3.5 was released to address it.

Remove the docs.add alias

Is this used anywhere? I saw it used in an elastomer test and was surprised that it exists.

If it's not needed for legacy code, I say remove it. Better to be consistent with the ES action names and be clear that there is no 'add' action.

Attempting to delete a document with id nil deletes the type or index

We should tighten our url generation so that nil parameters can't accidentally delete higher level objects.

>> es = Elastomer::Client.new(:url => "http://localhost:19200")
=> #<Elastomer::Client:0x007f91e31e1500 @url="http://localhost:19200", @host="localhost", @port=19200, @read_timeout=4, @open_timeout=2, @adapter=:excon>
>> es.index('hello').create({})
=> {"ok"=>true, "acknowledged"=>true}
>> es.index('hello').exists?
=> true
>> es.index('hello').docs.delete(:id => nil)
=> {"ok"=>true, "acknowledged"=>true}
>> es.index('hello').exists?
=> false
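A stricter URL expansion could refuse to build a path when a required segment is nil or blank, instead of silently dropping it. The following is a minimal standalone sketch; the expand_path helper and the {name} template syntax are assumptions for illustration, and values are not URL-escaped here.

```ruby
# Expand "/{index}/{type}/{id}" style templates. Every placeholder must have
# a non-blank value; otherwise the request is never built.
def expand_path(template, values)
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1).to_sym
    value = values[name]

    if value.nil? || value.to_s.strip.empty?
      raise ArgumentError, "#{name} is required but was #{value.inspect}"
    end

    value.to_s
  end
end

path = expand_path("/{index}/{type}/{id}", index: "hello", type: "doc", id: 1)

begin
  expand_path("/{index}/{type}/{id}", index: "hello", type: "doc", id: nil)
rescue ArgumentError => error
  message = error.message
end
```

Failing early means a delete with a nil id can never collapse into a request against the type or the index.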

StringIO can change `body` String encodings

We ran into an issue where the encoding of a body string passed to a GET request was being changed by the underlying Excon Faraday adapter. The encoding was changed from UTF-8 to ASCII, and it caused exceptions further up the stack when we attempted to render analytics information about individual Elasticsearch requests.

Excon wraps the body of a request with a StringIO instance. This allows Excon to read the body in chunks and send those chunks in the HTTP request. Excon calls StringIO#binmode to force binary encoding such that the chunks are not affected by multibyte UTF-8 characters. The problem is that the encoding of the underlying String is also changed by this binmode call.

We should modify the Elastomer::Client#extract_body method to return frozen Strings. This will prevent this type of error, and others like it, from occurring in the future.
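A rough sketch of what a freezing extract_body might look like is shown below. It is a standalone function written for illustration (the real method's signature and JSON handling may differ); the point is only that the returned String is frozen, so an in-place mutation such as String#force_encoding raises instead of silently altering it.

```ruby
require "json"

# Return the request body as a frozen String (or nil when there is no body).
def extract_body(body)
  case body
  when nil
    nil
  when String
    # Copy before freezing so the caller's own String is left untouched.
    body.frozen? ? body : body.dup.freeze
  else
    JSON.generate(body).freeze
  end
end

body = extract_body({ query: { match_all: {} } })

begin
  body.force_encoding(Encoding::BINARY)
rescue RuntimeError => error
  # FrozenError is a RuntimeError subclass on current Rubies.
  mutation_error = error
end
```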

Upgrade Scroller to support arbitrary URL parameters

Scroller currently only supports a handful of parameters. This prevents things like query string parameters from working on scroll requests:

@index = opts.fetch(:index, nil)
@type = opts.fetch(:type, nil)
@scroll = opts.fetch(:scroll, '5m')
@size = opts.fetch(:size, 50)
@search_type = opts.fetch(:search_type, nil)

The original opts just need to be propagated through the methods.
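Propagating the options could look roughly like the sketch below: keep the whole opts hash, apply the existing defaults, and hand everything except the path components on as request parameters. ScrollOptions is a hypothetical stand-in for the relevant part of Scroller, not the actual class.

```ruby
# Keeps every option the caller passed instead of picking out a fixed set.
class ScrollOptions
  DEFAULTS = { scroll: "5m", size: 50 }.freeze

  # Options that end up in the URL path rather than the query string.
  PATH_KEYS = %i[index type].freeze

  attr_reader :index, :type

  def initialize(opts = {})
    @opts  = DEFAULTS.merge(opts)
    @index = @opts[:index]
    @type  = @opts[:type]
  end

  # Query string parameters: defaults plus anything else the caller supplied.
  def request_params
    @opts.reject { |key, _value| PATH_KEYS.include?(key) }
  end
end

options = ScrollOptions.new(index: "books", routing: "abc", size: 10)
params = options.request_params
```

Here routing survives into the request parameters even though the class knows nothing about it.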

Large scroll IDs break scan queries

When I ran a scan/scroll query against a 0.90.5 cluster today, it was failing in a very odd fashion. The initial scan search request worked just fine, but the subsequent scroll request was failing with our beloved OpaqueIdError.

I tracked this down to a very large scroll ID being returned by ElasticSearch. When we send this very large scroll ID via a URL param, it causes some sort of problem with haproxy or another part of our infrastructure - the URL is just too long. The solution is to send the scroll ID as the body of the request. The ES documentation for the scan/scroll queries actually recommends this usage.

@github/search

Bulk helper should instrument entire operation

The Bulk helper can issue multiple requests behind the scenes if requested. In this case we should instrument the entire operation as a single action in addition to the underlying scan/bulk request actions.

Converting underscore-prefixed SPECIAL_KEYS into params is confusing

Allowing the user of this client to put underscore-prefixed params (e.g. _routing) into the query document, then extracting them out before sending the request to Elasticsearch is confusing because it violates the "one-to-one mapping" property of this client. Elasticsearch itself doesn't support such underscore-prefixed params, so I'm curious why this behavior was introduced. This behavior is also inconsistent between Docs methods (e.g. get doesn't have this behavior, while index does).

The culprit is from_document, which implicates methods like docs.index.

Request for Comment

After fighting against the Tire gem for the past year or so, I'm really craving a stupid simple ElasticSearch client that has a one-to-one correlation with the ElasticSearch API. When the github/id was started, I recommended using the eson gem. It is a beautifully crafted gem that provides the one-to-one mapping, but it is completely frustrating to use - lots of clever code in there that is poorly documented.

The only clever thing I'm doing in this client code is to break down the ElasticSearch API into logical chunks. All the API calls that deal with cluster management are handled in a Client::Cluster class. All the API calls that deal with index management are handled in a Client::Index class.

I'd love to hear people's thoughts on this approach in general. Also having some more eyes on this code would be much appreciated.

After the client code is fleshed out, all of our search niceties from .com will be pulled into the gem so the code can be used in gists2 and other places. A short list of things to pull in:

  • index management code (mappings, settings, creation, etc)
  • adapter code for getting information out of ActiveRecord models and into searchable documents
  • query parsing
  • filter building

cc @github/search @jbarnette @wfarr

Logging

We need to add some logging to the Client class. For each request the client makes, we should output a detailed debug message to a logger of some sort. I really like how Tire does it; logging a curl command that you can copy/paste and use directly. Also, outputting the status code of the response would be good, too ...

[200] curl -XGET 'http://localhost:9200/index/_search' -d '...'

Not sure how to handle large bodies and what not. And I don't think we need to log responses.
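As a starting point, the log line could be produced by a small formatter like the one sketched below. The curl_log_line helper and the MAX_LOGGED_BODY limit are made up for illustration; truncating the body is just one possible answer to the large-body question.

```ruby
require "json"

# Hypothetical cut-off so huge request bodies do not flood the log.
MAX_LOGGED_BODY = 200

# Format a request as "[status] curl -XMETHOD 'url' -d 'body'".
def curl_log_line(status, method, url, body = nil)
  line = "[#{status}] curl -X#{method.to_s.upcase} '#{url}'"
  return line if body.nil?

  payload = body.is_a?(String) ? body : JSON.generate(body)
  payload = "#{payload[0, MAX_LOGGED_BODY]}..." if payload.length > MAX_LOGGED_BODY

  # Escape single quotes so the line can be pasted into a shell as-is.
  payload = payload.gsub("'") { "'\\''" }

  "#{line} -d '#{payload}'"
end

line = curl_log_line(200, :get, "http://localhost:9200/index/_search",
                     { query: { match_all: {} } })
```

Note that a truncated body is no longer a valid request, so a real implementation might only truncate at debug levels where copy/paste is not expected.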

cc @github/search

Get all unit tests passing for ES 2.4 and ES 5.6

Now that we have a CI environment configured for Elasticsearch 5.6, we need to get all the unit tests passing for both ES 2.4 and ES 5.6. The end goal is that the elastomer-client gem be compatible with both versions while breaking as little backwards compatibility as possible.

After we have green tests we can proceed to pull in new features from ES 5.X into the gem.

Things To Fix

  • Warmers aren't a thing anymore in 5.x @look #161
  • :payloads at index-time are not supported @juruen #155
  • Scan/scroll changes @juruen #163
  • Index mappings: string type becomes text for analyzed string fields, keyword for non-analyzed (1st to come up is title but there will be lots) @look #166
  • filtered query deprecation (replaced with bool queries) @look #169
  • misc field naming convention changes in mappings (.percolator) @elireisman #167
  • analyzer needs to be passed as a json object in the body (in Index#analyze from Elastomer::Client::Index#test_0009_analyzes text and returns tokens) @juruen #168
  • Suggest output removed (output in Elastomer::Client::Index::when an index exists#test_0014_performs suggestion queries) @look #173
  • Expected param change (consistency in Elastomer::Client::Docs#test_0004_extracts underscore attributes from the document) @look #178
  • seeing some errors using client accessor in Warmer tests now, probably just need to use test-global $client here? @elireisman #172
  • Exceptions aren't raised the same way ([Elastomer::Client::QueryParsingError] exception expected, not Class: <Elastomer::Client::RequestError>) @look #175

Elasticsearch 5.6 Travis CI build

To start the upgrade work for the elastomer-client gem, it would be very helpful to get an Elasticsearch 5.6 Travis CI build script up and running. We can then progressively make all the tests go green from there.

I am working through our list of breaking changes in ES 5.X and identifying what we need to update here in the gem. But a failing CI build will help that effort along.

Build on Travis CI

If we're going to accept 3rd-party PRs, we need to build on travis because our internal ci is not secure. @github/atom-ci, you have experience running both travis and janky, and we'd welcome your thoughts here.

@bhuga and I talked today about possibly extending chatterbox with an org hook that watches build statuses, thus allowing us to do /chatterbox subscribe elastomer-client-status to get build statuses from travis in chat. No idea when or if that will happen, but I'm preserving it here just to note that it would be ✨.

/cc @github/search

Exclude source retrieval in delete_by_query

The delete_by_query call should exclude the document source from being retrieved when performing the scan operation. We only need the index, type, and id of the documents in order to delete them. Excluding source retrieval will reduce load on the search cluster, and it will reduce the size of our HTTP packets.

@query = query.dup
@query.delete :_source if @query.key? :_source
@query.delete "_source" if @query.key? "_source"
@query[:_source] = false 

Example implementation to put into the DeleteByQuery initializer.

/cc @chrismwendt @grantr

Proper versioning

Today I tried to do a bulk transfer in haystack and ran into issues using the action_count bulk chunking feature. Turns out the version in haystack wasn't new enough to have it! But I had to look at the gem source to find out.

Even if we're not 1.0 quality yet, we should still have 0.x version numbers and tagged releases to be a little more obvious with features.

No way to correlate queries with their notifications

I have just a few query types and no way to correlate them to the notifications that come back for measuring. For example, 'Get me the list of all users who have spoken in chat rooms you can search' is a very different query from 'Get me the messages that match this query'.

It would be great if the params on Docs#search would pass through to the payload in ActiveSupport::Notifications, so that I could do something like:

res = Chat::ES.messages_client.search(users_query, search_type: :count, category: :users)

And later:

ActiveSupport::Notifications.subscribe('request.client.elastomer') do |name, start_time, end_time, _, payload|

  puts ":tada:" if payload[:category] == :users
  # Or maybe payload[:params][:category]. yknow whatevs.
end

Test API coverage against the spec

ES has a well defined API spec here: https://github.com/elasticsearch/elasticsearch/tree/master/rest-api-spec/api

We should test our API coverage against specific versions. On that note, we should probably explicitly note which versions we support (where support means we strive for 100% coverage) and which we don't (coverage might be incomplete). We could even add methods in the client for determining whether the connected version of ES is officially supported.

Reopen the connection on timeout

When the read_timeout is reached in Elastomer::Client, an exception is raised and the Ruby code aborts the request handling process. The Elasticsearch cluster will continue processing the search request and will eventually send a response - the HTTP connection is still open. Subsequent calls using this same connection can return search results for a previous request that timed out. This scenario is the reason that the OpaqueId middleware exists.

A more proactive solution would be to not reuse the connection when a read timeout is reached. Instead, the connection should be discarded and a new connection established. This will prevent subsequent search requests from triggering the OpaqueId error condition.
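The discard-and-reconnect idea can be sketched independently of any HTTP library. In the example below, ReconnectingClient is a hypothetical wrapper, the connection factory block stands in for whatever builds the real connection, and Timeout::Error stands in for the client's read-timeout exception.

```ruby
require "timeout"

# Lazily builds a connection and throws it away after a timeout, so a late
# response to the timed-out request can never be read by the next request.
class ReconnectingClient
  def initialize(&connection_factory)
    @connection_factory = connection_factory
    @connection = nil
  end

  def request
    @connection ||= @connection_factory.call
    yield @connection
  rescue Timeout::Error
    # Discard the connection; the next request will open a fresh one.
    @connection = nil
    raise
  end
end

connections_opened = 0
client = ReconnectingClient.new do
  connections_opened += 1
  Object.new # stand-in for a real connection object
end

first = client.request { |connection| connection }

begin
  client.request { |_connection| raise Timeout::Error, "read timeout" }
rescue Timeout::Error
  # The caller still sees the timeout, as it does today.
end

second = client.request { |connection| connection }
```

The timeout is re-raised, so caller-visible behaviour stays the same; only the connection reuse changes.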

/cc @grantr @tmm1

Percolate API

Percolate is not supported yet, but since it's changing completely in 1.0, maybe we should wait until that is released.

A friend of mine using percolate in production describes it as an ops nightmare currently, so I doubt we'll be needing it until the 1.0 revamp.

Breaking changes in Elasticsearch 1.0

Elasticsearch 1.0 breaks a bunch of stuff. Current list of changes is at http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/breaking-changes.html.

As far as I can tell, the following will require changes to Elastomer or github.com:

New stats url format

The stats urls now rely on url path instead of parameters for filtering. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_stats_and_info_apis.html

New admin api url formats

Some of the admin url formats have changed. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_indices_apis.html

Cannot index document with type wrapper

I don't think we ever did this, but if we did it's not allowed by default anymore. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_index_request.html

All search apis take a :query element in the body

Yay! This was the source of much ambiguity (and a few Elastomer bugs). http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_search_requests.html

Top level :filter element renamed to :post_filter

We used to use this heavily in dotcom, less so now. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_search_requests.html

New multifield syntax

I believe we use these in a few places in dotcom. Minor syntax change. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_multi_fields.html

No english stopwords in standard and pattern analyzers

If we want to continue using stopwords, we'll have to add a stopword filter manually. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_stopwords.html

New fuzzy query parameters

I think we use fuzzy queries for autocomplete queries. http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_parameters.html

No more ok elements in responses

I think haystack relies on this, and maybe some tests and other things? http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_return_values.html

That's what stood out to me. Everyone working with ES should take a look and see if they'll have to change anything.

/cc @github/search

Shim library API for ES 2.x to 5.x upgrade transition

An issue to track updates to the elastomer-client lib to shim various API calls for the 2.x to 5.x ES upgrade. I ran the test suite against a local ES 5.6 docker image and I get 69 failures that we can bucket into these fixes so far (WIP, I'll be updating and refining this list as we go):

  • Warmers aren't a thing anymore in 5.x
  • Scan/scroll changes
  • Index mappings: string type becomes text for analyzed string fields, keyword for non-analyzed (1st to come up is title but there will be lots)
  • Strict boolean field values
  • misc field naming convention changes in mappings (.percolator, _type)
  • Index mappings: invalid params (suggest's payload param came up 1st but there will be more)

@TwP @look

@github/platform-data

Add ability to trigger basic auth and token auth via config

We've monkey patched our local elastomer wrapper to inject basic auth credentials into the URL that is ultimately sent to the elastomer client. This seems like a useful feature that would be easy enough to add to the client directly and trigger if a basic_auth or token_auth param is passed to the initializer.

Consider utilizing Elasticsearch Transport for communication not standard HTTP API

For better performance, consider using the Elasticsearch Transport which handles connecting to multiple nodes in the cluster, rotating across connections, logging and tracing requests and responses, maintaining failed connections, discovering nodes in the cluster, and provides an abstraction for data serialization and transport.

This should also allow the use of Excon, Patron, and other adapters, much as you do today, since they are supported by Elasticsearch's official client.

Make the hot_threads test deterministic

The hot_threads test fails nondeterministically, which causes false negatives during builds.

Elastomer::Client::Nodes#test_0005_gets the hot threads for the node(s) [/var/lib/jenkins/workspace/elastomer-client/test/client/nodes_test.rb:44]:
Expected /cpu usage by thread/ to match "

Ignore basic_auth unless username and password are present

In the recent 3.2.0 release, passing in a param of basic_auth: { username: nil, password: nil } will trigger basic auth. We should make sure it is only triggered if there is a username and password present. This will be helpful for folks that are defining their params like so:

params = {
  # ...
  basic_auth: {
    username: ENV["basic_auth_username"],
    password: ENV["basic_auth_password"]
  }
}

where the ENV vars may be blank or unset in some environments.

Currently one has to do something like this:

params = {
  # ...
}

if %w[basic_auth_username basic_auth_password].any? { |v| ENV[v].present? }
  params[:basic_auth] = {
    username: ENV["basic_auth_username"],
    password: ENV["basic_auth_password"]
  }
end

# or 

params = {
  # ...
  basic_auth: if %w[basic_auth_username basic_auth_password].any? { |v| ENV[v].present? }
                {
                  username: ENV["basic_auth_username"],
                  password: ENV["basic_auth_password"]
                }
              end
}
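Until the client handles this itself, the check can be folded into a small helper that only emits basic_auth when both values are present. The basic_auth_params helper and the url value below are hypothetical; the sketch uses plain Ruby rather than ActiveSupport's present?.

```ruby
# Return a basic_auth entry only when both credentials are non-blank.
def basic_auth_params(username, password)
  blank = [username, password].any? { |value| value.nil? || value.strip.empty? }
  return {} if blank

  { basic_auth: { username: username, password: password } }
end

params = { url: "http://localhost:9200" }.merge(
  basic_auth_params(ENV["basic_auth_username"], ENV["basic_auth_password"])
)
```

Note that this requires both values, which matches the issue title, whereas the workarounds above trigger when either variable is set.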

Release to rubygems.org

Not having elastomer-client on rubygems is a bit of a hassle for gems like elastomer-indexing and elastomer-cli. These gems have to vendor all their dependencies because elastomer-client isn't published.

I don't see a reason not to publish elastomer-client, or any of the elastomer-* gems. I'd like to publish them all to rubygems going forward. We can keep the repositories private, but there's nothing secret in the code.

@TwP what do you think?

detect whether update_aliases body includes actions

The Cluster#update_aliases method assumes that the first parameter is an array of alias actions. If I give it the request body as expected by ES (a hash with an :actions key), the request fails because update_aliases adds a wrapper {:actions => [<parameters>]}.

This is counterintuitive. I expect the update_aliases command to accept the request body documented in the ES docs.

I think this is a pretty easy fix: don't add the wrapper if the parameter is a hash with an :actions key.
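The proposed fix amounts to a small normalization step, sketched here as a standalone function. The name normalize_alias_actions and the sample alias action are illustrative; accepting a single action hash as well as an array is an extra assumption.

```ruby
# Accept either the full ES request body (a hash with an actions key) or a
# bare list of alias actions, and always return the full request body.
def normalize_alias_actions(actions)
  if actions.is_a?(Hash) && (actions.key?(:actions) || actions.key?("actions"))
    actions
  else
    # Avoid Array(actions): it would turn a single action hash into pairs.
    { actions: actions.is_a?(Array) ? actions : [actions] }
  end
end

add_alias = { add: { index: "books-v2", alias: "books" } }

from_array = normalize_alias_actions([add_alias])
from_body  = normalize_alias_actions({ actions: [add_alias] })
```

Both calls yield the same request body, so existing callers that pass an array keep working.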

Also, the get_aliases and update_aliases methods should have their own tests in test/client/cluster_test.rb. They are currently only tested in test/client/index_test.rb.
/cc @TwP

Complete the API

There are a few esoteric API calls that are not yet implemented. Just putting an issue in here that these need to be taken care of.

The list of missing API calls is found in the source code in the client/index.rb file and the client/docs.rb file. Those lists are taken from the right-hand menu on the ElasticSearch API reference.

cc @github/search

Elastomer uses deprecated `register_middleware` with Faraday in Ruby 2.1

Ruby 2.1 requires Faraday 0.9.0.rc7 (as implied by this commit: lostisland/faraday@87950cd).

With Faraday 0.9.0.rc7, loading elastomer in Faraday mode fails:

/Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/faraday-0.9.0.rc7/lib/faraday.rb:99:in `method_missing': undefined method `register_middleware' for #<Faraday::Connection:0x007fcccfc815b0> (NoMethodError)
    from /Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/elastomer-0.2.0/lib/elastomer/middleware/opaque_id.rb:68:in `<top (required)>'
    from /Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/elastomer-0.2.0/lib/elastomer/client.rb:255:in `require'
    from /Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/elastomer-0.2.0/lib/elastomer/client.rb:255:in `block in <top (required)>'
    from /Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/elastomer-0.2.0/lib/elastomer/client.rb:255:in `each'
    from /Users/ben/github/chat/vendor/bundle/ruby/2.1.0/gems/elastomer-0.2.0/lib/elastomer/client.rb:255:in `<top (required)>'
    from /Users/ben/github/chat/lib/chat.rb:8:in `require'
    from /Users/ben/github/chat/lib/chat.rb:8:in `<top (required)>'
    from /Users/ben/github/chat/config/application.rb:3:in `require'
    from /Users/ben/github/chat/config/application.rb:3:in `<top (required)>'

cc @mislav @github/chat

Style guide cleanup

We are following GitHub's Ruby style guide, but there are many places in the codebase where the style guide has been ignored:

  • whitespace between parentheses and values
  • single character variable names should be eschewed
  • quotes "good" should be used for strings instead of apostrophes 'bad'
  • hashrockets => should be used instead of the JSON style : in hashes
  • trailing whitespace
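As an illustration of the list above, here is the same small method written both ways. The method names and the sample hash are made up for this example and do not come from the codebase:

```ruby
# Discouraged: padded parentheses, a single-character variable name,
# apostrophes for strings, and a JSON-style hash key.
def tag_discouraged( h )
  h.merge( { format: 'paperback' } )
end

# Preferred: no padding inside parentheses, a descriptive variable name,
# double quotes, and a hashrocket.
def tag_preferred(book)
  book.merge({ :format => "paperback" })
end
```

Both methods return the same hash; only the style differs.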

Please add others to this list so I can do a sweep and fix my wayward coding style.

/cc @grantr @chrismwendt

Using action_count or request_size swallows bulk response

Normally a bulk request returns the bulk response:

client.bulk do |bulk_request|
  bulk_request.index({"foo" => "bar"}, :id => '1', :index => 'test', :type => 'test')
end

=> {"took"=>61, "errors"=>false, "items"=>[{"index"=>{"_index"=>"test", "_type"=>"test", "_id"=>"1", "_version"=>1, "status"=>201}}]}

When using the :action_count or :request_size options to chunk bulk requests behind the scenes, the response is lost:

client.bulk(:action_count => 1) do |bulk_request|
  bulk_request.index({"foo" => "bar"}, :id => '1', :index => 'test', :type => 'test')
  bulk_request.index({"foo" => "bar"}, :id => '2', :index => 'test', :type => 'test')
end

=> nil

I think when :action_count or :request_size is used, the return value should be an array of response hashes, one per request sent.

I can't decide what the return value should be if no requests are made. The non-chunking bulk returns nil, but maybe the chunking version should return an empty array.
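A rough sketch of the array-returning behavior, assuming a hypothetical ChunkedBulk class and a sender callable that stand in for the real bulk helper and HTTP request. Neither name exists in the library, and only :action_count chunking is modeled:

```ruby
# Hypothetical stand-in for the chunking bulk helper. The sender is any
# callable that takes an array of actions and returns a response hash.
class ChunkedBulk
  attr_reader :responses

  def initialize(action_count, &sender)
    @action_count = action_count
    @sender       = sender
    @actions      = []
    @responses    = []
  end

  # Queue one action and flush once the chunk is full.
  def add(action)
    @actions << action
    flush if @actions.size >= @action_count
  end

  # Send any queued actions as one request and remember the response.
  def flush
    return if @actions.empty?
    @responses << @sender.call(@actions)
    @actions = []
  end
end

# Yields the helper, flushes what is left, and returns every response.
# Returns an empty array when the block queues nothing.
def chunked_bulk(action_count, sender)
  bulk = ChunkedBulk.new(action_count, &sender)
  yield bulk
  bulk.flush
  bulk.responses
end

# Fake sender for illustration only; a real one would issue the request.
fake_sender = lambda { |actions| { "errors" => false, "items" => actions } }

responses = chunked_bulk(1, fake_sender) do |bulk|
  bulk.add({ "index" => { "_id" => "1" } })
  bulk.add({ "index" => { "_id" => "2" } })
end
```

In this sketch the empty case returns an empty array, so callers can always iterate over the result. Whether that is better than returning nil is still an open question.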

Is it worth making the multiple-request bulk helper a separate class? We use it heavily, so changing it would be painful, but it has opinions about how to handle multiple requests that maybe don't belong in the single-request bulk helper. Perhaps we could move it to a utility gem and supplement it with other tools like a standard scan/bulk looper.

Compatibility layer for 0.90 -> 1.0 transition

We may need some compatibility code to maintain backward compatibility with 0.90 while we transition to 1.0. Ideally this compatibility layer would connect to both versions of ES, and work with code expecting either version.

Here's a list of compatibilities that might be useful:

  • where ok was removed from responses, put it back
  • where exists changed to found in responses, put it back alongside found
  • accept requests with or without a top-level :query element in count, validate, and delete_by_query
  • return both flat and nested settings from get_settings in the same response
  • return mappings both with and without the "mappings" element in the same response
  • translate field query to query string

Would this be useful? We don't have to implement all of these. Some things could be upgraded before the transition, like removing ok and field queries.
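To make the first two items concrete, here is a minimal sketch of a response shim. The backfill_response name is hypothetical, and the sketch assumes the caller applies it only to successful responses from endpoints that used to return ok:

```ruby
# Hypothetical shim: give a 1.0-style response the 0.90-style keys that
# older callers still read. Returns a new hash; the input is not modified.
def backfill_response(response)
  patched = response.dup

  # 1.0 renamed "exists" to "found"; expose both when only "found" is set.
  if patched.key?("found") && !patched.key?("exists")
    patched["exists"] = patched["found"]
  end

  # 1.0 dropped "ok"; restore it as true unless the response already has it.
  # Assumption: this is only called on successful responses.
  patched["ok"] = true unless patched.key?("ok")

  patched
end

new_style = { "_index" => "books", "_id" => "1", "found" => true }
old_and_new = backfill_response(new_style)
```

Code written against either version can then read "found" or "exists" from the same hash.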

/ref https://github.com/github/search/issues/12
/cc @github/search
