mallocator / elasticsearch-exporter
A small script to export data from one Elasticsearch cluster into another.
License: Apache License 2.0
Add an option to use the create call instead of the index call, so that documents that already exist in the index don't get overwritten if they have already been modified.
The option is exported, but I don't think it's enabled/used in the code (in particular in the es driver).
Line 471 of es.js:
if (result.statusCode < 200 && result.statusCode > 299)
The status code can never be both less than 200 and greater than 299, so this condition can never be true.
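A minimal sketch of the presumably intended check (the surrounding es.js code is not shown here, so the function wrapper is hypothetical):

```javascript
// A status code cannot be both below 200 and above 299, so the original
// condition is always false. The intent was presumably to flag any
// status *outside* the 2xx success range, which needs || instead of &&:
function isErrorStatus(statusCode) {
    return statusCode < 200 || statusCode > 299;
}

isErrorStatus(200); // false: success
isErrorStatus(404); // true: outside the 2xx range
```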
Hello,
Firstly, thanks for your module: it's very simple and useful!
I get an error when I try to duplicate an index; the mapping is not correct.
node exporter.js -a -i -j <new_index>
In the console log I get this message:
<...>
Waiting for mapping on target host to be ready, queue length 400
Waiting for mapping on target host to be ready, queue length 450
Waiting for mapping on target host to be ready, queue length 500
Host phmbusllogb01:9200 responded to PUT request on endpoint /lanceur_bkp with an error
Mapping is now ready. Starting with 500 queued hits.
Host phmbusllogb01:9200 responded to PUT request on endpoint /lanceur_bkp with an error
Mapping is now ready. Starting with 0 queued hits.
Processed 100 of 2268 entries (4%)
Processed 700 of 2268 entries (31%)
<...>
When I go to http://:9200/<new_index>/_mapping I don't get the original mapping but the dynamic mapping.
In the ES log:
<...>creating index, cause [api], shards [5]/[0], mappings [mappings]
<...>update_mapping XXXXX
Edit :
In DEBUG mode on the ES side I get an exception:
[2014-06-25 14:15:41,229][DEBUG][cluster.service ] [XXXXX] processing [routing-table-updater]: execute
[2014-06-25 14:15:41,230][DEBUG][cluster.service ] [XXXXX] processing [routing-table-updater]: no change in cluster_state
[2014-06-25 14:15:44,458][DEBUG][cluster.service ] [XXXXX] processing [create-index [lanceur_bkp], cause [api]]: execute
[2014-06-25 14:15:45,399][DEBUG][http.netty ] [XXXXX] Caught exception while handling client http traffic, closing connection [id: 0x1ddde1db, /XXXXX:56038 => /XXXXX:9200]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2014-06-25 14:15:45,399][DEBUG][http.netty ] [XXXXX] Caught exception while handling client http traffic, closing connection [id: 0x85b4aa05, /XXXXX:56134 => /XXXXX:9200]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
On the master branch I get this error:
Elasticsearch Exporter - Version 1.3.2
Caught exception in Main process: TypeError: Cannot read property 'maxSockets' of undefined
TypeError: Cannot read property 'maxSockets' of undefined
at Object.exports.reset (/home1/Elasticsearch-Exporter-1.3.3/drivers/es.js:41:39)
at Object.exports.export (/home1/Elasticsearch-Exporter-1.3.3/exporter.js:263:26)
at Object.<anonymous> (/home1/Elasticsearch-Exporter-1.3.3/exporter.js:283:19)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:906:3
To work around this issue, I force the maxSockets value in es.js:
vi drivers/es.js (line 41)
...
//http.globalAgent.maxSockets = opts.maxSockets;
http.globalAgent.maxSockets = 30;
...
While running exporter.js to export individual indices, I've found that sometimes, it'll exit with the following error:
Caught exception in Main process: TypeError: Cannot read property 'hits' of undefined
TypeError: Cannot read property 'hits' of undefined
at IncomingMessage.<anonymous> (/data1/opt/elasticsearch-exporter/node_modules/elasticsearch-exporter/drivers/es.js:272:31)
at IncomingMessage.EventEmitter.emit (events.js:117:20)
at _stream_readable.js:920:16
at process._tickCallback (node.js:415:13)
If I re-run it on the index it failed to dump, it usually works, though there's a chance it'll throw the same error.
Caught exception in Main process: Error: ENOENT, no such file or directory 'undefined.data'
Here's a fix:
@@ -145,6 +145,10 @@
     exports.lineCount = Math.ceil(count/2);
     callback(exports.lineCount);
 });
+exports.lineCount = 0;
+callback(exports.lineCount);
 function getNewlineMatches(buffer) {
Add a call to fetch index statistics before actually running the exporter. This way we can get the number of total documents, documents per index/type/etc. This might simplify some calls as well as make it possible in the future to act more intelligently based on the layout of the source database (and the target as well).
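As a sketch, the per-index totals could come from a single _stats call up front (the response shape below follows the ES stats API, though field names vary between versions; the helper name is made up):

```javascript
// Summarize the parsed body of GET /_stats into per-index document counts.
function docCountsPerIndex(statsResponse) {
    var counts = {};
    for (var index in statsResponse.indices) {
        counts[index] = statsResponse.indices[index].primaries.docs.count;
    }
    return counts;
}

// Example with a trimmed-down response body:
var sample = {
    indices: {
        logs_a: { primaries: { docs: { count: 1200 } } },
        logs_b: { primaries: { docs: { count: 340 } } }
    }
};
// docCountsPerIndex(sample) -> { logs_a: 1200, logs_b: 340 }
```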
Change the option flags to represent the new support for multiple database types.
Option to specify basic authentication against ES
I was referring to the README for taking a backup of an Elasticsearch index as a file. It seems the documentation has a typo: it refers to exporter.js as exports.js.
Regards,
Arun
When the request is Unauthorized, you get a message like:
SyntaxError: Unexpected token U
This is because it tries to parse the data object, which is not valid JSON but a plain-text response like:
401 Unauthorized /_status
You could print this unauthorized message so the user knows why it fails.
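A sketch of a guard that surfaces the real reason instead of the parse error (the function and message wording are hypothetical):

```javascript
// Raise a readable error for auth failures and other non-JSON bodies
// instead of letting JSON.parse throw "Unexpected token U".
function parseResponse(statusCode, body) {
    if (statusCode === 401) {
        throw new Error('Request was not authorized: ' + body.trim());
    }
    try {
        return JSON.parse(body);
    } catch (e) {
        throw new Error('Non-JSON response (status ' + statusCode + '): ' + body.trim());
    }
}
```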
I have a single node (node1) and one cluster (clus1). Both of them are protected by basic auth. I tried running this:
node exporter.js -a node1 -b clust1
I got the following output, which did not imply any failed transfer, and an exit code of 0.
But the indices actually didn't get transferred.
I had to run the following to get it working:
node exporter.js -a node1 -b clust1 -A superadmin:password -B superadmin:password
It seems like auth: opts.sourceAuth or auth: opts.targetAuth is missing each time http.request is performed, so it's not possible to use the exporter with basic auth without fixing it manually.
http.request({
    host: opts.sourceHost,
    port: opts.sourcePort,
    auth: opts.sourceAuth,
When using
$ node exporter.js -i nodes_1 -j nodes_1_export_test
I get a
Processed 144400 of 671344 entries (22%)
{ [Error: connect EADDRNOTAVAIL]
code: 'EADDRNOTAVAIL',
errno: 'EADDRNOTAVAIL',
syscall: 'connect' }
Number of calls: 14450
Fetched Entries: 144490
Processed Entries: 144490
Source DB Size: 671344
or
Caught exception in Main process: Error: connect EADDRNOTAVAIL
Error: connect EADDRNOTAVAIL
at errnoException (net.js:646:11)
at connect (net.js:525:18)
at net.js:584:9
at asyncCallback (dns.js:84:16)
at Object.onanswer [as oncomplete]
always around the same time (21%-22%). Elasticsearch is still alive. The index to be copied has a size of 1GB.
$ node -v
v0.6.12
$ npm list
├── [email protected]
└─┬ [email protected]
├── [email protected]
└── [email protected]
Transferring an index from one server to another by query failed after 300k docs were received.
Thanks for the great tool.
When the script is running to copy test_v1 to test_v2, test_v1 is constantly updated with new entries.
Is there a way to copy only the new entries from test_v1 to test_v2 after the bulk copy is done?
Thanks for your help!
npm install elasticsearch-exporter
...
node node_modules/elasticsearch-exporter/exporter.js -a 10.223.240.225:9200 -g data -r true
Reading mapping from ElasticSearch
{ [Error: getaddrinfo ENOTFOUND] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo' }
{ [Error: getaddrinfo ENOTFOUND] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo' }
{ [Error: getaddrinfo ENOTFOUND] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo' }
{ [Error: getaddrinfo ENOTFOUND] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo' }
[deployer@el3 migrate]$ curl http://10.223.240.225:9200/
{
"ok" : true,
"status" : 200,
"name" : "CTM EL3",
"version" : {
"number" : "0.90.7",
"build_hash" : "36897d07dadcb70886db7f149e645ed3d44eb5f2",
"build_timestamp" : "2013-11-13T12:06:54Z",
"build_snapshot" : false,
"lucene_version" : "4.5.1"
},
"tagline" : "You Know, for Search"
I exported a file with Node.js v0.10.9 using the script from the master source.
The export works, but the file cannot be imported because the starting text is null.
I saw that null is written to test whether the file exists, so can you change it to an empty string?
We experience memory issues when we try to export lots of data. Around 4 Million hits.
The problem seems to be related to these lines:
https://github.com/mallocator/Elasticsearch-Exporter/blob/master/drivers/es.js#L173-182
We tried to throttle the requests to 10 at a time, but no luck. Any ideas?
Hi guys,
This looks like a great project for our needs as we increasingly rely on elasticsearch and manually updating locally is a bit of a pain. I'd love to get involved in supporting this (once I tune up my familiarity with node).
It took me a little while to figure out how best to pass parameters, but now I think it's working. Can someone help me confirm? Does this look right? Do you just get timeouts from time-to-time or is this abnormal? Should I be seeing some other logging in the console if it's working successfully?
node exporter.js -a site:####obscured#####@api.searchbox.io -b localhost
Elasticsearch Exporter - Version 1.3.0
Reading source statistics from ElasticSearch
Reading mapping from ElasticSearch
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
connect ETIMEDOUT
There are a few other URL variations I can hit searchbox with (for example, below), but it doesn't seem to change the results.
api.searchbox.io/api-key/####obscured#####
FWIW, our elasticsearch index is roughly 500K documents, 350MB.
Thanks for the help!
I was trying to move a logstash index from an Elasticsearch instance that is getting data from logstash to a bare ES. The data moves cleanly, but the mapping created on the target is not correct: it does not contain the .raw fields that exist in the original ES.
Without those, a lot of Kibana dashboards are just failing.
Right now, this tool is very slow.
Are there any plans to allow forking/threading so that a large cluster export can be split into separate simultaneous export tasks that glue the data back together at the end?
Making this work automatically with some sensible defaults (CPU cores, whatever) would be great too.
Add an option where you can apply changes to a target database based on timestamps (which need to be active in the index mapping).
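A sketch of what the delta request could look like (this assumes _timestamp is enabled in the mapping, as the request says; the helper name is made up):

```javascript
// Build a query that only matches docs indexed/updated after a given
// point in time, for applying incremental changes to the target database.
function deltaQuery(sinceMillis) {
    return {
        query: {
            range: {
                _timestamp: { gt: sinceMillis }
            }
        }
    };
}
```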
Hi,
When trying to import, Elasticsearch-Exporter always seems to stall out around 30-40% or so. I'm also seeing some MaxListeners errors; is this related?
Weirdly, even if I truncate the JSON .data file it stalls out around the same percentage, which makes me think something weird is going on with queueing. After trying a few different indexes/mappings it still hangs.
Output:
https://gist.github.com/oceanplexian/5961335
Tried node v0.11.3 & v0.10.5, same results...
Any ideas?
Hi,
I am trying to export/import from ES 0.9 to ES 1.0 (different live clusters).
When I run it in simulate mode on the donor cluster, I get:
Elasticsearch Exporter - Version 1.3.0
Reading source statistics from ElasticSearch
Reading mapping from ElasticSearch
Stopping further execution, since this is only a test run. No operations have been executed on the target database.
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 623066 documents
After running the actual export/import I get:
Number of calls: 15582
Fetched Entries: 623066 documents
Processed Entries: 623066 documents
Source DB Size: 623066 documents
But the receiving cluster (ES 1.0) only reports:
Elasticsearch Exporter - Version 1.3.0
Reading source statistics from ElasticSearch
Reading mapping from ElasticSearch
Reading mapping from ElasticSearch
Stopping further execution, since this is only a test run. No operations have been executed on the target database.
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 170515 documents
I don't get any errors in the logs, but the number of exported/imported articles doesn't match.
Is there any way to find out why this happens?
I'm trying to do a simple export and I get this error... any clue?
node exporter.js -a localhost -i index1 -g /es/dump -l true
Elasticsearch Exporter - Version 1.3.1
Reading source statistics from ElasticSearch
Caught exception in Main process: Error: ENOENT, open 'undefined.data'
Error: ENOENT, open 'undefined.data'
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 0 documents
ElasticSearch returns more docs than there are unique docs stored in the cluster if it's set up with more than one node. This leads to the exported-docs counter being higher than the actual number of exported docs (some docs seem to be returned more than once).
To enhance the report keep track of doc IDs and report a count at the end.
This can also be used to skip docs in the bulk import and improve performance there.
This option should be optional, as it could take up a large amount of memory in a big export. Hopefully duplicate docs are exported around the same time as the original ones, so that after a short while the IDs don't need to be stored anymore.
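The optional ID tracking could look roughly like this (a sketch; the hit shape follows the ES search response, the names are made up):

```javascript
// Track seen document IDs so duplicates from multi-node scans can be
// counted for the report and optionally skipped during the bulk import.
function makeDuplicateTracker() {
    var seen = {};
    var duplicates = 0;
    return {
        // Returns true the first time an ID is seen, false on repeats.
        firstSighting: function (hit) {
            if (seen[hit._id]) {
                duplicates++;
                return false;
            }
            seen[hit._id] = true;
            return true;
        },
        duplicateCount: function () {
            return duplicates;
        }
    };
}
```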
When I export data from my development server, the settings (analyzers, specifically) and mappings on my index are not exported.
How can I transfer data asynchronously or multithreaded?
I saw the note about max sockets, but I can't find how to speed up the transfer in the config or driver.
Would be nice to have an additional config file, holding some source field manipulation.
For example:
Source:
_source :{ field_a: "value a", field_b: "value b", field_to_delete: "this field I want to be removed", field_to_be_replaced: "this field will have another value" }
Config:
_config: { field_to_delete: delete _source["field_to_delete"], field_to_be_replaced: "this field has a new value", field_to_be_inserted: "this field is completely new", field_with_filter: _source["field_a"].length, field_with_filter_2: _source["field_b"].replace("b", "c") }
Result:
_source: { field_a: "value a", field_b: "value b", field_to_be_replaced: "this field has a new value", field_to_be_inserted: "this field is completely new", field_with_filter: 7, field_with_filter_2: "value c" }
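One way such a transform could be implemented is as a per-field rule map applied to each _source before re-indexing (a sketch; the rule format here is an interpretation of the request, not an existing feature):

```javascript
// Apply per-field transforms to a _source object: null drops a field,
// a function computes a value from the source, anything else replaces it.
function transformSource(source, transforms) {
    var result = {};
    for (var key in source) {
        result[key] = source[key];
    }
    for (var field in transforms) {
        var rule = transforms[field];
        if (rule === null) {
            delete result[field];          // drop the field entirely
        } else if (typeof rule === 'function') {
            result[field] = rule(result);  // computed from other fields
        } else {
            result[field] = rule;          // static replacement / insertion
        }
    }
    return result;
}

var out = transformSource(
    { field_a: 'value a', field_b: 'value b', field_to_delete: 'x' },
    {
        field_to_delete: null,
        field_to_be_inserted: 'this field is completely new',
        field_with_filter: function (src) { return src.field_a.length; }
    }
);
// out.field_with_filter is 7 ('value a'.length); out.field_to_delete is gone
```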
Since the script supports retries, we can add a stat that tells us how many times the script had to retry a call and how often a call succeeded on the first try.
possible fix:
@@ -69,7 +69,7 @@ function createParentDir(opts) {
     var dir = '';
     path.dirname(opts.targetFile).split(path.sep).forEach(function(dirPart){
         dir += dirPart + path.sep;
-        if (!fs.existsSync(dir)) {
+        if (typeof(fs.existsSync) != "undefined" && !fs.existsSync(dir)) {
             fs.mkdirSync(dir);
         }
I work with Elasticsearch 1.0.0-1.
I used Elasticsearch-Exporter to export all indices to a local file. The meta of the local file looks like this:
{
    "test": {
        "mappings": {
            "mappings": {
                "key_sum": {
                    "properties": {
below is my patch:
--- node_modules/elasticsearch-exporter/drivers/es.js 2014-03-15 18:25:24.114289938 +0800
+++ node_modules/elasticsearch-exporter/drivers/es.js 2014-03-15 18:25:56.246290802 +0800
@@ -34,13 +34,11 @@
if (opts.sourceType) {
getSettings(opts, data, callback);
} else if (opts.sourceIndex) {
- getSettings(opts, { mappings: data[opts.sourceIndex] }, callback);
+ getSettings(opts, data[opts.sourceIndex] , callback);
} else {
var metadata = {};
for (var index in data) {
- metadata[index] = {
- mappings: data[index]
- };
+ metadata[index] = data[index];
}
getSettings(opts, metadata, callback);
I exported to a file successfully, but importing it to a different ES server is now failing with the following error:
$ node exporter.js -b localhost -j myindex -f reuters
Reading mapping from meta file reuters.meta
Creating index mapping in target ElasticSearch instance
Mapping is now ready. Starting with 0 queued hits.
Caught exception in Main process: TypeError: Cannot read property 'length' of null
TypeError: Cannot read property 'length' of null
at ReadStream.<anonymous> (/home/eric/Elasticsearch-Exporter-master/drivers/file.js:75:50)
at ReadStream.EventEmitter.emit (events.js:100:17)
at emitReadable_ (_stream_readable.js:418:10)
at emitReadable (_stream_readable.js:412:7)
at onEofChunk (_stream_readable.js:395:3)
at readableAddChunk (_stream_readable.js:139:7)
at ReadStream.Readable.push (_stream_readable.js:123:10)
at onread (fs.js:1532:12)
at Object.wrapper [as oncomplete]
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 0 documents
Peak Memory Used: 0 bytes (0%)
Total Memory: 26057216 bytes
My run fails:
node exporter.js -a XXXXXXX -b 127.0.0.1 -t logstash-adm-log4j-2013.08.06
Reading mapping from ElasticSearch
Creating type mapping in target ElasticSearch instance
Caught exception in Main process: TypeError: Cannot read property 'total' of undefined
TypeError: Cannot read property 'total' of undefined
at IncomingMessage.<anonymous> (/home/rtoma/elasticsearchExporter/drivers/es.js:149:64)
at IncomingMessage.EventEmitter.emit (events.js:117:20)
at _stream_readable.js:910:16
at process._tickCallback (node.js:415:13)
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 0 documents
Peak Memory Used: 0 bytes
Total Memory: 7195904 bytes
Sniffing the traffic, this failure is the result of the 2nd ES call:
$ curl -i -s -d "{\"fields\":[\"_source\",\"_timestamp\",\"_version\",\"_routing\",\"_percolate\",\"_parent\",\"_ttl\"],\"query\":{\"match_all\":{}}}" 'http://XXXXX:9200/_search?search_type=scan&scroll=5m'
HTTP/1.1 503 Service Unavailable
Content-Type: application/json; charset=UTF-8
Content-Length: 159
{"error":"EsRejectedExecutionException[rejected execution of [org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2]]","status":503}
Any clue what's wrong here?
Under the import/export examples area, the file options are wrong:
I think you switched -f with -g.
Apparently ES doesn't support scan requests when using aliases for the index name.
A way to support this would be to make an initial request to ES to find out if the job is actually running against an alias.
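For the lookup itself, the body of GET /_alias/&lt;name&gt; maps each concrete index to its alias metadata, so resolving an alias to real index names could be sketched as (names are illustrative; the response shape may vary by ES version):

```javascript
// Given the parsed JSON body of GET /_alias/<name>, return the concrete
// index names behind the alias so the scan can target them directly.
function indicesForAlias(aliasResponse) {
    return Object.keys(aliasResponse).sort();
}

// Example body for an alias "logs" spanning two daily indices:
var sample = {
    'logstash-2014.06.24': { aliases: { logs: {} } },
    'logstash-2014.06.25': { aliases: { logs: {} } }
};
```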
λ parabellum Elasticsearch-Exporter → λ git master → node exporter.js -a 10.251.76.43 -b 10.251.76.42
Warning: compression has been set for target file, but no target file is being used!
Number of calls: 0
Fetched Entries: 0 documents
Processed Entries: 0 documents
Source DB Size: 0 documents