
vind's Introduction


Please note: This project is discontinued; no further bug fixes or features will be developed.

Vind (faɪnd) is a modular Java library which aims to lower the hurdle of integrating information discovery facilities in Java projects. It should help programmers reach a good solution in a reasonable amount of time, improve the maintainability of software projects, and simplify the centralized management of information discovery services, including monitoring and reporting.


Design principles

In Vind we try to design an API that follows these three design principles:

1. Versatility: Vind will be used in many different projects, so we aimed to keep the dependency footprint small, which avoids version clashes in downstream projects.

2. Backend Agnostic: Wherever possible and feasible, the library has to be abstracted from the underlying search framework. This enables us to change the backend without migrating application software.

3. Flat learning curve: We aimed to keep the learning curve rather flat, so we tried to use Java built-in constructs whenever possible. Additionally, we tried to follow the concept: easy things should be easy, complex things can (but do not have to) be complex.

The search lib is modular and currently implements the following layers:

(Figure: Search Lib Architecture)

Versioning & release policy

Each Vind release is labeled in the repository with a tag following the schema vind-va.b.c, where vind-v marks the tag as a Vind version and a.b.c is the release number, which is also used as the artifact version. Prior to vind-v1.3.0 there was no clear versioning policy.

From 1.3.0 on, releases strictly follow the policy shown in the following diagram:

(Figure: Versioning policy)

Regarding the release process, every release except a hot fix is preceded by a release candidate, which is tested on a staging environment as close as possible to production. Release candidates use the naming schema described above followed by an RCa suffix, where a is the release candidate iteration number. For example, vind-v1.3.0-RC1 would be the first release candidate for version 1.3.0; if it is rejected due to bugs found in the staging environment, it is released again after fixing as vind-v1.3.0-RC2. After approval, the artifact is released as vind-v1.3.0.

From vind-v1.3.0 on, all hot fixes are handled in a release-specific maintenance branch, which means the development branch should always be at an a.b.0-SNAPSHOT version.
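The tag schema described above is regular enough to check mechanically, for example in a release script. The following is a minimal sketch, assuming the only allowed forms are vind-va.b.c and vind-va.b.c-RCn; the class ReleaseTag and its methods are hypothetical and not part of Vind.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for Vind release tags such as vind-v1.3.0 or vind-v1.3.0-RC2. */
final class ReleaseTag {
    private static final Pattern TAG =
            Pattern.compile("vind-v(\\d+)\\.(\\d+)\\.(\\d+)(?:-RC(\\d+))?");

    final int major;
    final int minor;
    final int patch;
    final int candidate; // release candidate iteration, 0 for a final release

    private ReleaseTag(int major, int minor, int patch, int candidate) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.candidate = candidate;
    }

    /** Accepts only complete tags; anything else is rejected. */
    static ReleaseTag parse(String tag) {
        Matcher m = TAG.matcher(tag);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Vind release tag: " + tag);
        }
        int candidate = m.group(4) == null ? 0 : Integer.parseInt(m.group(4));
        return new ReleaseTag(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), candidate);
    }

    /** The a.b.c release number, which is also used as the artifact version. */
    String releaseNumber() {
        return major + "." + minor + "." + patch;
    }

    boolean isReleaseCandidate() {
        return candidate > 0;
    }
}
```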


If you'd like to take a deeper look into the lib, or if you are interested in our future goals, have a look at our blog series. There we give an outlook on the next development steps and introduce new features.

How to use

The modules of the Vind lib are provided as Maven artifacts and thus can be seamlessly integrated into new and existing Java software projects. Vind decouples the API from the actual indexing components. The first backend, which is also the reference implementation, is built on top of Apache Solr. The lib integrates an in-memory indexer on top of an Embedded Solr Server, which enables developers to start without setting up a complex infrastructure. Furthermore, Vind includes a backend maintenance component which makes it easy to set up Vind index collections and keep them in sync with the Vind version.

Get detailed documentation of all functions and features, or dive deeper into the Vind API with the Javadoc.

How to contribute

Vind is an Open Source project so everyone is encouraged to improve it. Don't hesitate to report bugs, provide fixes or share new ideas with us. We have various ways for contribution:

  • use the issue tracker - report bugs, suggest features or give hints how we can improve the documentation.
  • discuss issues with the community - two brains are better than one.
  • write code - no patch is too small. So even fixing typos helps to improve Vind.

Release Process

  • Snapshot: Pushes to branch develop are automatically deployed to Sonatype snapshots.

  • Release: Stable releases need to be performed manually:

    1. make sure all changes have been pushed to the repository and all tests are working fine.
    2. run mvn release:prepare
    3. run mvn release:perform
    4. run (cd target/checkout; mvn nexus-staging:release)
    5. push changes to the repository: git push && git push --tags


Free use of this software is granted under the terms of the Apache License Version 2.0. See the License for more details.


Vind is led by Red Bull Media House Technology and was initiated in 2017.


The Changelog provides a complete list of changes in older releases.

vind's People


alfonso-noriega, goerge, ja-fra, luaks, pilzm, purthaler, stefan-sachs, tkurz, wernerharing



vind's Issues

multiple values in suggestion single value field

Unexpected behavior with the Vind 1.2.3 Solr schema: when a document is indexed with a different suggestion value while an old value from a previous version of the index is still stored in the field dynamic_suggest.string_fieldname, the new suggestion field dynamic_suggest_analyzed_fieldname gets two values, even though it is defined as a single-valued field.
This is due to the definition of the copy rule from dynamic_suggest.string_fieldname to dynamic_suggest_analyzed_fieldname.

10 Error logs per monitoring server search

The error message "Cannot get scope for non existing field descriptor", logged by the class com.rbmhtechnology.vind.api.query.filter.Filter, appears many times per search in the logs, even though the search works.

Enable global meta data for batch commit identification

Current State

Vind provides some methods to index documents:

  • void index(Document... doc)
  • void index(List<Document> doc)
  • void indexBean(List<Object> t)
  • void indexBean(Object... t)

Internally, all of these methods trigger an indexing process but not a commit (which is intended behavior, as the server itself can handle commits internally much more efficiently). Note that there are methods for commit, which guarantee that all indexing processes are committed (with all the negative consequences regarding performance).


In applications that support Read-Your-Writes, this behavior might be a problem, because the application has to guarantee an always up-to-date index and is thus forced to use many hard commits.


Vind could support version numbering for indexing processes, so an application could check which version is the latest one that has been indexed (and thus control, via an additional method, whether the necessary indexing has already been processed). This could be an internal counter or a counter managed by the application, which could lead to the following API:

  • long index(List<Document> doc)
  • void index(List<Document> doc, long version)

Note that the other methods would work analogously. To get the latest index version, there could be methods like:

  • long getLatestVersion()
  • boolean isVersionIndexed(long version)

In addition, each Document could have an additional version field.
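A minimal sketch of the proposed API, assuming an internal counter: each index call returns the version assigned to its batch, and the backend reports via a callback which version has become searchable. VersionedIndexer, onCommit and the Consumer standing in for the backend are hypothetical names for illustration only, not Vind API; an application-supplied counter (index(List<Document> doc, long version)) would work analogously.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch of the proposed versioned indexing API; not part of Vind. */
final class VersionedIndexer<D> {
    private final Consumer<List<D>> backend;                     // stands in for the real index call
    private final AtomicLong nextVersion = new AtomicLong(0);    // next version to hand out
    private final AtomicLong latestVisible = new AtomicLong(-1); // -1: nothing searchable yet

    VersionedIndexer(Consumer<List<D>> backend) {
        this.backend = backend;
    }

    /** Submits a batch (without committing) and returns the version assigned to it. */
    long index(List<D> docs) {
        long version = nextVersion.getAndIncrement();
        backend.accept(docs);
        return version;
    }

    /** Called once every batch up to and including upToVersion is searchable. */
    void onCommit(long upToVersion) {
        latestVisible.accumulateAndGet(upToVersion, Math::max);
    }

    long getLatestVersion() {
        return latestVisible.get();
    }

    boolean isVersionIndexed(long version) {
        return version <= latestVisible.get();
    }
}
```

The caller of onCommit must only report a version once every batch up to and including it has been committed; with concurrent writers, batches may reach the backend out of order.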

Provide before and after filter for java.util.Date

Please add methods similar to

com.rbmhtechnology.searchlib.api.query.filter.Filter.after(String, ZonedDateTime)
com.rbmhtechnology.searchlib.api.query.filter.Filter.before(String, ZonedDateTime)
to be used with java.util.Date.

It would be easy to convert the Date to a ZonedDateTime outside the search lib. However, since the lib provides a field descriptor to handle java.util.Date (com.rbmhtechnology.searchlib.model.SingleValueFieldDescriptor.UtilDateFieldDescriptor), the search lib should do this conversion itself to make sure it is done in a consistent way.
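A sketch of the conversion the issue asks to centralize, assuming UTC is the desired zone (the issue does not state which zone the lib uses). A java.util.Date is a plain instant, so converting through toInstant() loses nothing. DateFilters is a hypothetical helper; overloads such as Filter.after(String, Date) could delegate to it before calling the existing ZonedDateTime variants.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;

/** Hypothetical helper: the single place that converts java.util.Date for date filters. */
final class DateFilters {
    private DateFilters() {}

    /** A Date is an instant in time, so converting it in UTC preserves it exactly. */
    static ZonedDateTime toZonedDateTime(Date date) {
        return date.toInstant().atZone(ZoneOffset.UTC);
    }
}
```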

Solr parse error when creating filters on fields shared by nested and parent docs

Externally reported issue which happens on a nested search or suggestion filtered by a field that is a member of both the parent and the nested document:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at Expected identifier at pos 52 str='{!child of="_type_:asset" v='({!parent which='_type_:asset' v='_type_:marker AND dynamic_multi_stored_facet_string_static_entityType:"asset"'}'
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(
        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(
        at org.apache.solr.client.solrj.SolrRequest.process(
        at org.apache.solr.client.solrj.SolrClient.query(
        at org.apache.solr.client.solrj.SolrClient.query(
        at com.rbmhtechnology.vind.solr.backend.SolrSearchServer.execute(
        at com.rbmhtechnology.vind.monitoring.MonitoringSearchServer.execute(
        at com.redbullmediabase.mediamanager.core.index.AbstractVindSearchEngine.performSuggestionSearchAndGetSuggestionsFromResponse(
        at com.redbullmediabase.mediamanager.core.index.AbstractVindSearchEngine.getSuggestions(
        at com.redbullmediabase.mediamanager.core.mam.index.AssetVindSearchEngine.getSuggestions(
        at com.redbullmediabase.mediamanager.manager.module.assets.controller.AssetsModuleController.suggest(
        at sun.reflect.GeneratedMethodAccessor2063.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(

Update suggestion handler result to NamedList

Currently the suggestion handler returns a Map object instead of a NamedList, as Solr usually does. This is inherited from previous suggestion handler versions, but it should be changed to NamedList, as it is more efficient and is the result type expected from a Solr handler. Some Vind modifications are needed to support this return type.

Set session per query

The Reporting server, among other information, logs info about the session and the user.
Currently the session is set when instantiating the reporting server, but it must be possible to log different sessions with the same reporting server instance.

Atomic update takes too long

In a specific use case, an atomic update takes 2 seconds to update a document.

  • find out the reason.
  • find possible fix.

NOT filter in Solr needs a positive base operator

In Solr filter syntax, a NOT operator is not valid as a stand-alone expression, as it is calculated as a subtraction:
'NOT status:active' is parsed as '-status:active'

For simple operations like the one mentioned above, Solr is able to interpret it, but more complex ones of the style 'NOT status:active AND (NOT due_date:[* TO NOW])' will not give the expected results.
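A common workaround, sketched here rather than taken from Vind, is to give each negated clause an explicit positive base by rendering NOT x as (*:* -x), so it matches all documents except x even when nested inside a larger expression. SolrNot is a hypothetical helper.

```java
/** Hypothetical helper that renders NOT clauses with an explicit positive base. */
final class SolrNot {
    private SolrNot() {}

    /** Renders NOT x as (*:* -x): "all documents minus x", valid at any nesting level. */
    static String not(String clause) {
        return "(*:* -" + clause + ")";
    }

    /** Joins clauses with AND inside one pair of parentheses. */
    static String and(String... clauses) {
        return "(" + String.join(" AND ", clauses) + ")";
    }
}
```

With this, the expression from the issue becomes ((*:* -status:active) AND (*:* -due_date:[* TO NOW])).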

Report creation fails with SocketTimeout

The report creation fails with SocketTimeouts.

build	10-Jul-2018 12:41:58	12:41:58.348 [main] WARN  c.r.v.m.utils.ElasticSearchClient - - Try 0 - Error in query scroll request query: Read timed out
build	10-Jul-2018 12:41:58 Read timed out
build	10-Jul-2018 12:41:58		at Method)
build	10-Jul-2018 12:41:58		at
build	10-Jul-2018 12:41:58		at
build	10-Jul-2018 12:41:58		at
build	10-Jul-2018 12:41:58		at
build	10-Jul-2018 12:41:58		at
build	10-Jul-2018 12:41:58		at

This may be due to the ES scroll queries not being closed while a large timeout of 30 minutes is set. As mentioned here, the scroll should be explicitly cleared.
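The usual pattern is to release the scroll context in a finally block as soon as the last page has been read, instead of relying on a long keep-alive. The sketch below assumes a minimal ScrollClient interface standing in for the real Elasticsearch client calls; ScrollClient, ScrollPage, ScrollReader and the methods start, next and clear are hypothetical names, and the 1m keep-alive is an illustrative value that is renewed on every request.

```java
import java.util.ArrayList;
import java.util.List;

/** One page of scroll results (hypothetical type). */
final class ScrollPage {
    final String scrollId;
    final List<String> hits;

    ScrollPage(String scrollId, List<String> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

/** Hypothetical scroll API standing in for the real Elasticsearch client calls. */
interface ScrollClient {
    ScrollPage start(String query, String keepAlive);
    ScrollPage next(String scrollId, String keepAlive);
    void clear(String scrollId);
}

final class ScrollReader {
    private ScrollReader() {}

    /** Reads all hits and always releases the scroll context, even on failure. */
    static List<String> readAll(ScrollClient client, String query) {
        List<String> all = new ArrayList<>();
        String scrollId = null;
        try {
            ScrollPage page = client.start(query, "1m"); // short keep-alive, renewed per call
            scrollId = page.scrollId;
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = client.next(scrollId, "1m");
                scrollId = page.scrollId;
            }
        } finally {
            if (scrollId != null) {
                client.clear(scrollId); // free server-side resources immediately
            }
        }
        return all;
    }
}
```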

Improve Filter&Facet report

Design a reporting model for filters and facets that moves away from the current Java POJO representation toward a more user-friendly format.

Page behaviour is inconsistent

I was replacing Slice by Page in a certain use case and ended up asking myself whether a page is 0-based or 1-based.

I guess it is 1-based, because there is this FulltextSearch constructor:

FulltextSearch() {
     this.searchString = "*";
     this.resultSet = new Page(1, SearchConfiguration.get(SearchConfiguration.SEARCH_RESULT_PAGESIZE,10));

But if I look into the constructor of Page itself, I see that

    public Page(int page, int pagesize) {
        if(page < 0) {
            log.error("Page number can not be lower than 0: {}",page);
            throw new IllegalArgumentException("Page should not be a negative value:" + page);
        }
        this.page = page;
        this.pagesize = pagesize;
        type =;

states that a page with value 0 is fine, since only negative values are rejected.

But if I fire up a search with

    final FulltextSearch search = Search.fulltext().page(0,10);

I end up with a Solr exception:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/my_collection: 'start' parameter cannot be negative

I guess the Page constructor should be changed to be consistent here.
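One consistent option is to make Page strictly 1-based and translate it to Solr's 0-based start offset as (page - 1) * pagesize. Under such a translation a page value of 0 presumably yields a negative start, which would match the error above. OneBasedPage is a hypothetical sketch, not the current Vind class.

```java
/** Hypothetical 1-based page, sketching the consistent behaviour the issue asks for. */
final class OneBasedPage {
    final int page;
    final int pageSize;

    OneBasedPage(int page, int pageSize) {
        if (page < 1) {
            throw new IllegalArgumentException("Page numbers start at 1, got: " + page);
        }
        if (pageSize < 1) {
            throw new IllegalArgumentException("Page size must be positive, got: " + pageSize);
        }
        this.page = page;
        this.pageSize = pageSize;
    }

    /** Solr 'start' offset: page 1 maps to 0, page 2 to pageSize, and so on. */
    int start() {
        return (page - 1) * pageSize;
    }
}
```

Rejecting 0 in the constructor moves the error from the Solr server to the point where the page is created.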

Add configuration support via environment variables

Currently the Vind configuration is mostly done via a properties file. To make the library cloud-ready, configuration via environment variables is needed.
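A minimal sketch of environment lookup, assuming a naming scheme in which a property key such as server.solr.host is read from VIND_SERVER_SOLR_HOST; the prefix and the mapping are assumptions, not a documented Vind convention. The environment map is injected so the class can be tested without touching System.getenv().

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup of configuration keys in environment variables. */
final class EnvConfig {
    private final Map<String, String> env;

    /** Pass System.getenv() in production; a plain map in tests. */
    EnvConfig(Map<String, String> env) {
        this.env = env;
    }

    /** Maps e.g. "server.solr.host" to "VIND_SERVER_SOLR_HOST" (assumed naming scheme). */
    static String toEnvName(String key) {
        return "VIND_" + key.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(env.get(toEnvName(key)));
    }
}
```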



Homogenize monitoring field types

Some of the monitoring fields currently have a type that depends on the originating Vind component (e.g. an interval facet on a numeric field will have start and end typed as long/float, while a date interval will return dates). This creates issues when writing the JSON to Elasticsearch, and probably to other non-structured databases.

To solve this, identify these fields and translate them to the same type (e.g. dates to timestamps).
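A sketch of such a normalization step, assuming every numeric or temporal bound should be written as one JSON number type: dates become epoch milliseconds and all numbers become doubles, so start and end keep the same type regardless of the facet. MonitoringValues is a hypothetical helper.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.util.Date;

/** Hypothetical normalizer mapping interval bounds to one JSON-friendly type. */
final class MonitoringValues {
    private MonitoringValues() {}

    /** Dates become epoch milliseconds, every number becomes a double; other values pass through. */
    static Object normalize(Object value) {
        if (value instanceof Date) {
            return (double) ((Date) value).getTime();
        }
        if (value instanceof ZonedDateTime) {
            return (double) ((ZonedDateTime) value).toInstant().toEpochMilli();
        }
        if (value instanceof Instant) {
            return (double) ((Instant) value).toEpochMilli();
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return value;
    }
}
```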

Add health check functionality

At the moment Vind does not provide functionality for health checks (e.g. ping), so clients have to use custom implementations (for example, exposing a Solr client and using the Spring Boot actuator SolrHealthIndicator). It would be nice if Vind could offer some functionality to support these health checks.
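A sketch of what such a check could look like, assuming the backend exposes some ping call (represented here by a Callable<Boolean>) and that a ping which fails, returns false or exceeds a timeout counts as DOWN. BackendHealth is hypothetical; a Spring Boot HealthIndicator could wrap it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical health check around a backend ping; not an existing Vind API. */
final class BackendHealth {
    enum Status { UP, DOWN }

    private final Callable<Boolean> ping; // e.g. a call to the backend's ping endpoint
    private final long timeoutMillis;

    BackendHealth(Callable<Boolean> ping, long timeoutMillis) {
        this.ping = ping;
        this.timeoutMillis = timeoutMillis;
    }

    /** Reports DOWN when the ping throws, returns false, or does not answer in time. */
    Status check() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = executor.submit(ping);
            Boolean ok = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Boolean.TRUE.equals(ok) ? Status.UP : Status.DOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return Status.DOWN;
        } catch (Exception e) {
            return Status.DOWN; // execution failure or timeout
        } finally {
            executor.shutdownNow();
        }
    }
}
```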

Possibility to index document into two solr servers of different version

In order to enable migration strategies from one Solr version to another, it would be helpful if Vind supported indexing into two Solr servers of different versions at the same time. In such a case, an application could build up the index in the new Solr server in parallel to the existing one. As soon as both Solr servers contain the same number of documents, the application could switch to the new Solr server for querying.
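The idea can be sketched as a decorator that forwards each batch to both servers and exposes the switch-over condition from the issue. DocumentSink and DualWriteIndexer are hypothetical types, and documents are plain strings to keep the sketch self-contained.

```java
import java.util.List;

/** Hypothetical minimal indexing interface standing in for a Vind search server. */
interface DocumentSink {
    void index(List<String> docs);
    long count();
}

/** Writes every batch to the current and the new backend; reads stay on the current one. */
final class DualWriteIndexer implements DocumentSink {
    private final DocumentSink current;
    private final DocumentSink next;

    DualWriteIndexer(DocumentSink current, DocumentSink next) {
        this.current = current;
        this.next = next;
    }

    @Override
    public void index(List<String> docs) {
        current.index(docs); // the existing index stays authoritative
        next.index(docs);    // the new Solr version is filled in parallel
    }

    @Override
    public long count() {
        return current.count();
    }

    /** Switch-over condition from the issue: both backends hold the same number of documents. */
    boolean inSync() {
        return current.count() == next.count();
    }
}
```

As written, a failure on the new server surfaces only after the old server has already accepted the batch, and the new index still needs a back-fill of the existing documents before the counts can match.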

Support Term Query Parser for huge ID searches

There are some use cases where we want to search for a large set of document IDs, but there is no other search filter that identifies this specific group of documents. Hence we need to search via the IDs only, to offer the user further possibilities to sort, page and apply additional filters.

The current problem is that this group can contain up to 5000 document IDs. In the future, this may even grow to 30-50k.

As the standard query parser only supports up to 1024 boolean clauses, please offer the possibility to use the term query parser instead.
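Solr's terms query parser takes the whole list as a single parameter, for example {!terms f=id}a,b,c, and with its default method should not expand into one boolean clause per ID, so the 1024-clause limit should not apply. The builder below is a hypothetical sketch; it assumes the default comma separator and rejects IDs that contain one.

```java
import java.util.List;

/** Hypothetical builder for a Solr terms query parser filter over document IDs. */
final class TermsFilter {
    private TermsFilter() {}

    /** Builds e.g. {!terms f=id}a,b,c using the parser's default comma separator. */
    static String ids(String field, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("At least one ID is required");
        }
        for (String id : ids) {
            if (id.contains(",")) {
                throw new IllegalArgumentException("ID contains the default separator: " + id);
            }
        }
        return "{!terms f=" + field + "}" + String.join(",", ids);
    }
}
```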

Add log writer to Demos

Add the simple log writer and a logger plus configuration to the demo in Vind so there is an example of usage.

vind's dependency stack includes the Elasticsearch client

The com.rbmhtechnology.vind:monitoring-api module of vind depends on the Elasticsearch client:

+--- com.rbmhtechnology.vind:log-writer:1.2.1
|    \--- com.rbmhtechnology.vind:monitoring-api:1.2.1
|         +--- com.rbmhtechnology.vind:vind-api:1.2.1 (*)
|         +--- com.fasterxml.jackson.core:jackson-databind:2.7.5 -> 2.8.3 (*)
|         +--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.7.5 -> 2.8.3
|         |    +--- com.fasterxml.jackson.core:jackson-core:2.8.3
|         |    +--- com.fasterxml.jackson.core:jackson-databind:2.8.3 (*)
|         |    \--- com.fasterxml.jackson.core:jackson-annotations:2.8.0 -> 2.8.3
|         +--- io.redlink.utils:utils:1.1.0
|         |    +--- org.slf4j:slf4j-api:1.7.25 -> 1.7.12
|         |    \--- org.apache.commons:commons-lang3:3.5
|         \--- io.searchbox:jest:5.3.3 -> 2.0.3

If vind is used inside a Spring Boot app (at least 1.x; 2.x needs to be confirmed), this triggers the configuration of the Elasticsearch health endpoint.

Is this dependency necessary? Or do we need to configure it somehow?

Passing several children searches

In our case, we search for documents which have child documents matching different filter criteria.

Currently only one childrenSearch can be defined:

// first set of filter criteria
final FulltextSearch childSearch1 = Search.fulltext()

final FulltextSearch search = Search.fulltext()
   .andChildrenSearch(childSearch1, indexer.getAtomDocumentFactory());

which results in

(_type_:asset AND dynamic_multi_filter_string_parent:"xyz") AND 
    {!parent which='_type_:asset' v='_type_:atom AND 
      dynamic_multi_filter_string_field_1:"VALUE1" AND

But we need to search for parents which

  • have children matching our first set of criteria and
  • have children matching our second set of criteria and so on

Basically, we want to end up with something like this:

(_type_:asset AND dynamic_multi_filter_string_parent:"xyz") AND 
    {!parent which='_type_:asset' v='_type_:atom AND 
      dynamic_multi_filter_string_field_1:"VALUE1" AND
    {!parent which='_type_:asset' v='_type_:atom AND 
      dynamic_multi_filter_string_field_1:"ANOTHER_VALUE1" AND

Something like this could be imagined:

// first set of filter criteria
final FulltextSearch childSearch1 = Search.fulltext()

// second set of filter criteria
final FulltextSearch childSearch2 = Search.fulltext()

final FulltextSearch search = Search.fulltext()
   .andChildrenSearches(indexer.getAtomDocumentFactory(), childSearch1, childSearch2);
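A sketch of how andChildrenSearches could render its query, assuming each child search can be reduced to a list of conditions: every group becomes its own {!parent} clause, so all conditions of a group must hold on the same child, and the clauses are joined with AND. ChildClauses is hypothetical, and values containing single quotes would need escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical renderer: one block-join clause per group of child criteria. */
final class ChildClauses {
    private ChildClauses() {}

    /** All conditions of one group must hold on the same child document. */
    static String parentOf(String parentType, String childType, List<String> conditions) {
        String childQuery = "_type_:" + childType + " AND " + String.join(" AND ", conditions);
        return "{!parent which='_type_:" + parentType + "' v='" + childQuery + "'}";
    }

    /** A parent matches only if, for every group, some child satisfies that whole group. */
    static String allGroups(String parentType, String childType, List<List<String>> groups) {
        return groups.stream()
                .map(group -> parentOf(parentType, childType, group))
                .collect(Collectors.joining(" AND "));
    }
}
```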

TermFacet ignores facet limit property

When a facet limit is set for a search, the TermFacet json implementation is completely ignoring it due to a missing 'limit' parameter in the json generated.
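For reference, a terms facet in Solr's JSON Facet API carries its limit as a property of the facet itself; when it is missing, Solr applies its default limit instead of the configured one. TermsFacetJson is a hypothetical helper, and names and fields are not JSON-escaped in this sketch.

```java
import java.util.Locale;

/** Hypothetical rendering of one terms facet for Solr's JSON Facet API, including its limit. */
final class TermsFacetJson {
    private TermsFacetJson() {}

    static String render(String name, String field, int limit, int minCount) {
        return String.format(Locale.ROOT,
                "{\"%s\":{\"type\":\"terms\",\"field\":\"%s\",\"limit\":%d,\"mincount\":%d}}",
                name, field, limit, minCount);
    }
}
```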

Suggestion: override of default operator

The default logical operator in the suggestions handler is hard-coded to "AND". This should be fixed, providing the option of setting "OR" instead if wished.

Wrong filters in 1.2.3

With vind 1.2.1 the following search

{"q":"*","filter":"((static_status='passive') OR (static_status='active')) AND ((static_partitionID='MV-1HP6U6PQS1W11') OR (static_partitionID='MV-1HP6TNXVH1W11') OR (static_partitionID='MV-1HP6UG2V51W11'))","timeZone":"null","sort":[{'direction':'Desc','field':'static_recordLastUpdateTimestamp'}],"result":{"sliceSize":21,"offset":0},"nestedDocSearchFlag":false,"nestedDocOp":"OR","nestedDocFactory":null,"nestedDocSearch":null,"facetFlag":false,"facetMinCount":1,"facetLimit":10,"facet":{},"geoDistance":null,"searchContext":"null","strictFlag":true}

resulted in these filters (only status and partitionID are relevant here)


which is the expected fq.

When using vind 1.2.3 the same code produces this search

{"q":"*","filter":"((static_status='active') OR (static_status='passive')) AND ((static_partitionID='MV-1HP6U6PQS1W11') OR (static_partitionID='MV-1HP6UG2V51W11') OR (static_partitionID='MV-1HP6TNXVH1W11'))","timeZone":"null","sort":[{'direction':'Desc','field':'static_recordLastUpdateTimestamp'}],"result":{"sliceSize":21,"offset":0},"nestedDocSearchFlag":false,"nestedDocOp":"OR","nestedDocFactory":null,"nestedDocSearch":[],"facetFlag":false,"facetMinCount":1,"facetLimit":10,"facet":{},"geoDistance":null,"searchContext":"null","strictFlag":true}

The filters in the search are the same as before (except the order). However, the generated fq for solr is broken since it now generates this:


Support hierarchical paths as field values

Currently, hierarchical paths (e.g. taxonomy fields) are not considered. They have to be supported by a field descriptor and fit properly into the suggestion infrastructure.
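One common approach, also used by Solr's PathHierarchyTokenizerFactory at analysis time, is to index every ancestor prefix of a path, so that a filter or facet on sports/winter also matches sports/winter/ski. PathValues is a hypothetical helper showing the expansion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical expansion of a taxonomy path into all of its ancestor prefixes. */
final class PathValues {
    private PathValues() {}

    /** "sports/winter/ski" becomes [sports, sports/winter, sports/winter/ski]. */
    static List<String> prefixes(String path, String delimiter) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String segment : path.split(Pattern.quote(delimiter))) {
            if (segment.isEmpty()) {
                continue; // tolerate leading, trailing or doubled delimiters
            }
            if (current.length() > 0) {
                current.append(delimiter);
            }
            current.append(segment);
            result.add(current.toString());
        }
        return result;
    }
}
```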

Children search with AND filter searches in all child documents instead of one document

When performing a children search with an AND filter, the resulting query checks each condition against any child document instead of requiring a single child document to match all of them:

    final FulltextSearch atomSearch = Search.fulltext()

    final FulltextSearch search = Search.fulltext()
        .andChildrenSearch(atomSearch, indexer.getAtomDocumentFactory());

result in the following query:

(_type_:asset AND dynamic_multi_filter_string_parent:"xyz") AND 
    {!parent which='_type_:asset' v='_type_:atom AND dynamic_multi_filter_string_field_1:"VALUE1"'} AND 
    {!parent which='_type_:asset' v='_type_:atom AND dynamic_multi_filter_string_field_2:"VALUE2"'}

instead of:

(_type_:asset AND dynamic_multi_filter_string_parent:"xyz") AND 
    {!parent which='_type_:asset' v='_type_:atom AND 
      dynamic_multi_filter_string_field_1:"VALUE1" AND

Suggestions not working after upgrading from 1.2.0 to 1.2.1

We did an upgrade from vind 1.2.0 to vind 1.2.1 and updated all our collections to the new config version. Unfortunately the suggestions do not work anymore after the upgrade.

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://....: java.lang.IllegalStateException: Type mismatch: dynamic_multi_stored_suggest_string_company was indexed as SORTED_SET
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(
        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(
        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(
        at org.apache.solr.client.solrj.SolrRequest.process(
        at org.apache.solr.client.solrj.SolrClient.query(
        at org.apache.solr.client.solrj.SolrClient.query(
        at com.rbmhtechnology.vind.solr.backend.SolrSearchServer.execute(
        at com.rbmhtechnology.vind.solr.backend.SolrSearchServer.execute(
        at com.rbmhtechnology.vind.monitoring.MonitoringSearchServer.execute(
        at com.rbmhtechnology.vind.monitoring.MonitoringSearchServer.execute(

Reindexing the data did not solve the problem. Removing all documents and indexing again seems to solve it. However, due to the amount of data, that is not an option for us.

Please provide a way we can do the upgrade without deleting all the data from the index.

Provide Docker-Image for Solr Backend

In order to simplify testing vind integration with a "real" backend, it would be convenient to provide a ready-to-use docker image containing the vind-schema and -extensions.

Scoped facets

Add the possibility to define for which field use case (Filter, Suggest or Facet) the facet is computed.

check Collection manager 404 / success update

While running a collection update from a private repository the collection manager tool logs a 404 when updating but still displays the successful update message (and successfully updates the collection).

make MonitoringServer configurable: exception resilient

Request from an integration:

Can we make the MonitoringSearchServer configurable so it only logs the monitoring exceptions and performs the search nevertheless? In my opinion, the tracking is not important enough to let the search fail if there is a problem only with tracking.
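A sketch of the requested behaviour, assuming a configuration flag: the search result is computed first, and a monitoring failure is then either logged and swallowed or rethrown, depending on the flag. ResilientMonitoring and its constructor arguments are hypothetical; errors from the search itself still propagate.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper: runs the search and treats monitoring as best effort. */
final class ResilientMonitoring<Q, R> {
    private static final Logger LOG = Logger.getLogger(ResilientMonitoring.class.getName());

    private final Function<Q, R> search;         // the actual backend search
    private final BiConsumer<Q, R> monitor;      // writes the monitoring entry
    private final boolean failOnMonitoringError; // the proposed configuration switch

    ResilientMonitoring(Function<Q, R> search, BiConsumer<Q, R> monitor, boolean failOnMonitoringError) {
        this.search = search;
        this.monitor = monitor;
        this.failOnMonitoringError = failOnMonitoringError;
    }

    R execute(Q query) {
        R result = search.apply(query); // search errors still propagate
        try {
            monitor.accept(query, result);
        } catch (RuntimeException e) {
            if (failOnMonitoringError) {
                throw e;
            }
            LOG.log(Level.WARNING, "Monitoring failed; returning the search result anyway", e);
        }
        return result;
    }
}
```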

Precedence of configuration settings using environment variables

According to the documentation of vind 1.2 :

The properties are overwritten following the ordering: Default Properties < Environment Variables < Property File

This behaviour is unlike e.g. Spring and Typesafe Config, which use “Default Properties < Property File < Environment Variables”.

This means we have to provide all settings in every environment as environment variables as we cannot simply provide a property file for development which can be overwritten using environment variables. For productive deployment environment variables are easy to handle but locally one might want to provide defaults using a file instead of manually having to configure IDE env vars.
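The precedence requested here amounts to applying the layers from lowest to highest priority, letting each later layer overwrite earlier keys. A minimal sketch, assuming environment variables have already been translated to property keys; LayeredConfig is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical merge implementing Default Properties < Property File < Environment Variables. */
final class LayeredConfig {
    private LayeredConfig() {}

    /** Later layers override earlier ones, so the environment wins over the file. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layersLowestFirst) {
        Map<String, String> result = new HashMap<>();
        for (Map<String, String> layer : layersLowestFirst) {
            result.putAll(layer);
        }
        return result;
    }
}
```

Usage: LayeredConfig.merge(defaults, fileProperties, environment) lets a developer keep a local property file while production overrides individual keys through the environment.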
