solr2solr's Introduction

What is added in this fork

  • Sorting, for example sort:'date asc'
  • ‘Processing’ of documents, e.g.:
process: (doc) ->
    delete doc._version_

    doc #return doc in the end
  • Pagination with cursors: add cursorMark: '*' to start, or pass your previous cursor mark to continue
  • Selecting only certain fields: fl: 'id,name,description'
  • ‘Insecure’ HTTPS: the tool sets process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0' itself, which disables TLS certificate verification, so you don't need to do anything
  • Basic auth added in vlad-x/solr2solr from which this repo was forked:
from:
    # ...
    user: 'r00t'
    password: 'qwerty'
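
The cursor pagination mentioned in the list above follows Solr's general cursorMark protocol: start with '*', send back the nextCursorMark from each response, and stop once the mark no longer changes. The following is a minimal sketch of that loop, not this tool's actual code. `makeFakeSolr`, `fetchPage`, and `copyWithCursor` are hypothetical names, the fake core lives in memory, and the calls are synchronous only to keep the example short.

```javascript
// Hypothetical in-memory stand-in for a Solr core. It mimics the cursor
// protocol: the mark is '*' at the start and an opaque string afterwards.
function makeFakeSolr(docs) {
  return function fetchPage({ cursorMark, rows }) {
    const start = cursorMark === '*' ? 0 : Number(cursorMark);
    const page = docs.slice(start, start + rows);
    // Like Solr, hand back the same mark once there is nothing left to read.
    const nextCursorMark = page.length === 0 ? cursorMark : String(start + page.length);
    return { docs: page, nextCursorMark };
  };
}

// Walk the whole result set, calling onBatch for every non-empty page.
// Returns the last cursor mark so a later run could continue from it.
function copyWithCursor(fetchPage, rows, onBatch) {
  let cursorMark = '*';
  for (;;) {
    const { docs, nextCursorMark } = fetchPage({ cursorMark, rows });
    if (docs.length > 0) onBatch(docs);
    if (nextCursorMark === cursorMark) return cursorMark; // no progress: done
    cursorMark = nextCursorMark;
  }
}

const fetchPage = makeFakeSolr([{ id: 'a' }, { id: 'b' }, { id: 'c' }]);
const batches = [];
const lastMark = copyWithCursor(fetchPage, 2, (docs) => batches.push(docs.map((d) => d.id)));
console.log(batches, lastMark);
```

Against a real Solr instance, cursors require the sort to include the uniqueKey field, for example sort: 'date asc, id asc' if the uniqueKey is id.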

solr2solr - a simple Solr migration and test data fabrication tool

This tool will query a given Solr index and copy it to another. Along the way it will give you the opportunity to change field names, drop fields altogether, and fabricate new fields.

The goal of this tool isn't to be a suitable means to move large production indices around, though if your index is smallish, it will serve that purpose. It is instead meant to facilitate the development lifecycle during which schemas are constantly changing and real data isn't yet available, or isn't available in a quantity to stress Solr.

Install

solr2solr is a command line tool and should ideally be installed globally with -g:

$ npm install -g solr2solr

Configuration

Copy the example config file from the root of the GitHub repo into a directory on your machine.

from and to are pass-through configurations for the node-solr library. These are the defaults:

var DEFAULTS = {
  host: '127.0.0.1',
  port: '8983',
  core: '', // if defined, should begin with a slash
  path: '/solr' // should also begin with a slash
}
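
As a rough illustration of how these defaults might combine with a from or to block, the sketch below merges user settings over the defaults and builds a base URL. The `resolveConnection` helper and the host + port + path + core URL layout are assumptions made for this example, inferred from the comments in the defaults; the library's own logic may differ.

```javascript
const DEFAULTS = {
  host: '127.0.0.1',
  port: '8983',
  core: '',      // if defined, should begin with a slash
  path: '/solr'  // should also begin with a slash
};

// Hypothetical helper: user-supplied settings win over the defaults.
function resolveConnection(userConfig) {
  const merged = Object.assign({}, DEFAULTS, userConfig);
  // Assumed URL layout: http://host:port + path + core
  merged.baseUrl = 'http://' + merged.host + ':' + merged.port + merged.path + merged.core;
  return merged;
}

console.log(resolveConnection({}).baseUrl);
console.log(resolveConnection({ host: 'search.example.com', core: '/products' }).baseUrl);
```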

query is the query sent to the from Solr to select documents. Leave this at *:* if you want to copy everything, or change it to something else if you want to copy a smaller set of documents.

rows indicates how many rows to copy at a time. solr2solr will go through your index from start to finish by this increment. Tune this number based on the size of a document in your index and on how many times you want each document duplicated (see duplicate below), so that your node process does not run out of memory.
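
The batching described above can be pictured as a loop over start offsets. This is a sketch of the idea only, assuming Solr's standard start and rows query parameters; `batchOffsets` is a hypothetical helper, not part of solr2solr.

```javascript
// Hypothetical helper: list the { start, rows } windows needed to walk
// numFound documents in steps of `rows`.
function batchOffsets(numFound, rows) {
  if (!Number.isInteger(rows) || rows <= 0) {
    throw new RangeError('rows must be a positive integer');
  }
  const windows = [];
  for (let start = 0; start < numFound; start += rows) {
    windows.push({ start, rows });
  }
  return windows;
}

// 2500 documents copied 1000 at a time need three requests.
const windows = batchOffsets(2500, 1000);
console.log(windows);
```

A smaller rows value means more round trips to Solr but fewer documents held in memory at once.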

duplicate is a configuration that lets you multiply your index during the copy. When duplicate has enabled set to true, solr2solr will manipulate the idField of each document to make it unique, and it will create one extra document per numberOfTimes. So, if numberOfTimes is set to 2, you'll get 3 documents for every source document: the original and 2 dupes.
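
To make the arithmetic concrete, here is a minimal sketch of the duplication step. How solr2solr actually rewrites the id is not described here, so the "-dup-N" suffix and the `duplicateDocs` helper are assumptions for illustration; enabled, idField, and numberOfTimes are the option names from the text.

```javascript
// Hypothetical helper: keep each original and add numberOfTimes copies,
// giving every copy a unique id by appending an assumed "-dup-N" suffix.
function duplicateDocs(docs, duplicate) {
  if (!duplicate || !duplicate.enabled) return docs.slice();
  const out = [];
  for (const doc of docs) {
    out.push(doc);
    for (let n = 1; n <= duplicate.numberOfTimes; n++) {
      const copy = Object.assign({}, doc);
      copy[duplicate.idField] = doc[duplicate.idField] + '-dup-' + n;
      out.push(copy);
    }
  }
  return out;
}

const result = duplicateDocs(
  [{ id: 'doc1', title: 'Hello' }],
  { enabled: true, idField: 'id', numberOfTimes: 2 }
);
console.log(result.map((d) => d.id));
```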

copy is a list of fields to copy from index to index verbatim.

transform is a list of fields to copy from index to index while changing the field name from source to destination.
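
The copy and transform steps can be sketched as building a new document field by field. The { source, destination } shape of the transform entries, the `mapDocument` helper, and the choice to drop any field that is not listed are assumptions for this example; check the example config file for the real format.

```javascript
// Hypothetical helper: build the destination document from a source document.
// `copy` is a list of field names; `transform` is assumed to be a list of
// { source, destination } pairs. Fields listed in neither are dropped.
function mapDocument(doc, copy, transform) {
  const out = {};
  for (const field of copy) {
    if (field in doc) out[field] = doc[field];
  }
  for (const { source, destination } of transform) {
    if (source in doc) out[destination] = doc[source];
  }
  return out;
}

const mapped = mapDocument(
  { id: '1', name: 'Widget', desc: 'A small widget', internal: 'x' },
  ['id', 'name'],
  [{ source: 'desc', destination: 'description' }]
);
console.log(mapped);
```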

fabricate is a list of new fields to create per document in the new index. The name of the new field is given in the field name, and the data for that field is created by the fabricate function, which is passed the document and the row number being processed.

fabricate: (fields, index) ->
  # fields is the source document; index is the row number being processed
  switch index % 5
    when 0 then 'Swahili'
    when 1 then 'Klingon'
    when 2 then 'Skrull'
    when 3 then 'Pig Latin'
    when 4 then 'English'
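
The sketch below shows, in JavaScript, how a fabricate entry might be applied to each document as it is copied. The `language` field name, the { name, fabricate } entry shape, and the `applyFabricate` helper are assumptions for illustration; the function body mirrors the example above.

```javascript
const LANGUAGES = ['Swahili', 'Klingon', 'Skrull', 'Pig Latin', 'English'];

// Assumed entry shape: the new field's name plus a function that receives
// the document and the row number being processed.
const fabricateEntries = [
  { name: 'language', fabricate: (fields, index) => LANGUAGES[index % 5] }
];

// Hypothetical helper: add every fabricated field to a copy of each document.
function applyFabricate(docs, entries) {
  return docs.map((doc, index) => {
    const out = Object.assign({}, doc);
    for (const entry of entries) {
      out[entry.name] = entry.fabricate(doc, index);
    }
    return out;
  });
}

const fabricated = applyFabricate([{ id: 'a' }, { id: 'b' }, { id: 'c' }], fabricateEntries);
console.log(fabricated.map((d) => d.language));
```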

Execution

From the directory where you placed the config file, run

$ solr2solr

and the tool will begin copying data. It will write to the console each time it goes to fetch another batch of data from Solr.

solr2solr's People

Contributors

dbashford, beyondcompute, vlad-x
