Solr River Plugin for ElasticSearch
The Solr River plugin allows you to import data from Apache Solr into elasticsearch.
To install the plugin, run:
bin/plugin -install javanna/elasticsearch-river-solr/1.0.0
Versions
Solr River Plugin | ElasticSearch |
master | 0.19.3 -> master |
1.0.0 | 0.19.3 -> master |
You might be able to use the river with older versions of elasticsearch, but the tests included with the project pass only with version 0.19.3 or higher. Version 0.19.3 is the first one that uses Lucene 3.6.
Getting Started
The Solr River lets you query a running Solr instance and index the returned documents in elasticsearch.
It uses the SolrJ library to communicate with Solr.
The SolrJ version in use and distributed with the plugin is 3.6.1.
Although it's recommended to send queries using the same version that is installed on the Solr server, it's possible to query other Solr versions.
The default response format is javabin, but you can work around compatibility issues by switching to the xml format with the wt parameter.
All the common query parameters are supported.
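As a minimal sketch of that switch, the snippet below builds a river definition that sets wt to xml. The helper name build_river_meta and the Solr URL are assumptions made for illustration; only the type, url, q and wt keys come from this README.

```python
import json


def build_river_meta(solr_url, query="*:*", wt="javabin"):
    """Return a river definition as a JSON string.

    build_river_meta is a hypothetical helper, not part of the plugin.
    """
    meta = {
        "type": "solr",
        "solr": {"url": solr_url, "q": query, "wt": wt},
    }
    return json.dumps(meta, indent=2)


# Assumed Solr location; replace it with your own server.
print(build_river_meta("http://localhost:8983/solr/", wt="xml"))
```

The printed JSON can be used as the body of the PUT request that registers the river.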
Installation
Here is how you can create the river and index data from Solr by providing only the Solr URL and the query to execute:
curl -XPUT localhost:9200/_river/solr_river/_meta -d '
{
"type" : "solr",
"solr" : {
"url" : "http://localhost:8080/solr/",
"q" : "*:*"
}
}'
All parameters are optional. The following example request lists every supported parameter together with its default value.
{
"type" : "solr",
"solr" : {
"url" : "http://localhost:8983/solr/",
"q" : "*:*",
"fq" : "",
"fl" : "",
"wt" : "javabin",
"qt" : "",
"uniqueKey" : "id",
"rows" : 10
},
"index" : {
"index" : "solr",
"type" : "import",
"bulk_size" : 100,
"max_concurrent_bulk" : 10,
"mapping" : "",
"settings": ""
}
}
The fq and fl parameters can be provided as either an array or a single value. You can provide your own mapping and index settings when creating the river; they are used only if the index has to be created. The index is created if it does not already exist; otherwise the documents are added to the existing one.
The documents are indexed using the bulk API. You can control the size of each bulk request (bulk_size, default 100) and the maximum number of concurrent bulk operations (max_concurrent_bulk, default 10). Once that limit is reached, indexing slows down while it waits for one of the running bulk operations to finish; no documents are lost.
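The options described above could be combined as in the following sketch, which passes fq and fl as arrays and overrides both bulk settings. The filter queries, field names, bulk values and the river_request helper are illustrative assumptions; the request is built but not sent, so the snippet runs without a live node.

```python
import json
import urllib.request


def river_request(es_url, river_name, meta):
    """Build (but do not send) the PUT request that registers a river.

    river_request is a hypothetical helper, not part of the plugin.
    """
    return urllib.request.Request(
        url=f"{es_url}/_river/{river_name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        method="PUT",
    )


meta = {
    "type": "solr",
    "solr": {
        "url": "http://localhost:8983/solr/",
        "q": "*:*",
        # fq and fl given as arrays; these field names are made up.
        "fq": ["category:books", "in_stock:true"],
        "fl": ["id", "title", "author"],
    },
    "index": {
        "index": "solr",
        "type": "import",
        # Example values only; the defaults are 100 and 10.
        "bulk_size": 500,
        "max_concurrent_bulk": 5,
    },
}

request = river_request("http://localhost:9200", "solr_river", meta)
print(request.get_method(), request.full_url)
# To register the river against a running node:
# urllib.request.urlopen(request)
```

Larger bulks with fewer concurrent operations, as assumed here, are only one possible trade-off; suitable values depend on document size and cluster capacity.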
Limitations
- only stored fields can be retrieved from Solr, so only stored fields can be indexed in elasticsearch
- the river is not meant to keep elasticsearch in sync with Solr, only to import data once. It is possible, however, to register the river multiple times in order to import different sets of documents, even from different Solr instances.
- it's recommended to create the mapping based on the existing Solr schema so that the correct text analysis is applied while importing the documents. In the future there might be an option to generate it automatically from the Solr schema.
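Registering the river several times, as mentioned in the second limitation, could look like the sketch below. It assumes each registration uses a distinct river name so that one definition does not overwrite another; the river names, hosts and queries are hypothetical.

```python
import json

# Hypothetical import sets: river name -> (Solr URL, query).
IMPORTS = {
    "solr_river_books": ("http://localhost:8983/solr/", "category:books"),
    "solr_river_music": ("http://otherhost:8983/solr/", "category:music"),
}


def river_definitions(imports, es_url="http://localhost:9200"):
    """Map each river's _meta endpoint to its JSON definition."""
    definitions = {}
    for name, (solr_url, query) in imports.items():
        meta = {"type": "solr", "solr": {"url": solr_url, "q": query}}
        definitions[f"{es_url}/_river/{name}/_meta"] = json.dumps(meta)
    return definitions


for endpoint, body in river_definitions(IMPORTS).items():
    # Each pair can be sent with: curl -XPUT <endpoint> -d '<body>'
    print(endpoint, body)
```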
License
This software is licensed under the Apache 2 license, quoted below.
Copyright 2012 Luca Cavanna
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.