
kafka-connect-solr's People

Contributors

betheunique, ilosamart, jcustenborder, ramyogi


kafka-connect-solr's Issues

Timeout configuration?

While indexing large documents we get a timeout:

Caused by: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Timeout occurred while waiting response from server at: http://<>/_shard7_replica_n52
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:125)
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:46)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.directUpdate(BaseCloudSolrClient.java:551)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1037)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:897)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:829)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:228)
at com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkTask.process(CloudSolrSinkTask.java:53)
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.put(SolrSinkTask.java:75)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
... 10 more
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://<>/shard7_replica_n52
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:667)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:245)
at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$directUpdate$0(BaseCloudSolrClient.java:525)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
... 3 more

Is it possible to introduce a timeout parameter in the configuration, like the one we have for the ZK connection timeout?
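
A hedged sketch of what such a setting could look like inside the connector's client construction. The SolrJ builder methods below exist in SolrJ 8.x; the config keys and their wiring into this connector are assumptions for illustration:

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// Hypothetical config keys such as solr.connect.timeout.ms / solr.socket.timeout.ms
// could feed these builder calls.
CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181"), Optional.empty())
    .withConnectionTimeout(15000)  // ms to establish the HTTP connection
    .withSocketTimeout(120000)     // ms to wait for a response before timing out
    .build();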

ZK connection timeout config variable?

org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:560)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zookeeper.cert.scopussearch.net:2181/author within 15000 ms
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:201)
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:125)
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:115)
at org.apache.solr.common.cloud.ZkStateReader.&lt;init&gt;(ZkStateReader.java:355)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getZkStateReader(ZkClientClusterStateProvider.java:175)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:160)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.connect(BaseCloudSolrClient.java:331)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:839)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:829)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:228)
at com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkTask.process(CloudSolrSinkTask.java:53)
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.put(SolrSinkTask.java:75)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper test.zookeeper.com/test within 15000 ms
at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:250)
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:192)
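
For reference, a hedged sketch of where a configurable ZK timeout could plug in; CloudSolrClient exposes (deprecated) setters for the ZooKeeper connect and session timeouts, though exposing them through the connector config is an assumption:

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("test.zookeeper.com:2181"), Optional.of("/test"))
    .build();
client.setZkConnectTimeout(30000); // ms to establish the ZooKeeper session (default 15000)
client.setZkClientTimeout(30000);  // ZooKeeper session timeout in ms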

Maven shade plugin

Hi Jeremy,

I have updated pom.xml with maven shade plugin. Could you please take a look at the PR ?

#5

Thank you.

ERROR Failed to create job for config/httpsolr.properties

@jcustenborder : Please help me...
Here is my httpsolr.properties
name=httpsolr
topics=syslogkafka
tasks.max=2
connector.class=com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector
solr0.url=http://m:8983/solr/
solr0.topic=syslogkafka
solr0.core.name=divtest01

PS: divtest01 is a collection created on one of the Solr servers within the CDH cluster.

When I try to start the connector, I get the error below:
/usr/bin/connect-standalone /etc/schema-registry/connect-avro-standalone.properties config/httpsolr.properties

[2017-04-07 10:03:24,321] ERROR Failed to create job for config/httpsolr.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2017-04-07 10:03:24,334] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:99)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid (use the endpoint /{connectorType}/config/validate to get a full list of errors)
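
The error message itself points at the validate endpoint. A hedged example of calling it on the worker's REST port (host and port assumed) to see the full list of configuration errors:

curl -s -X PUT -H "Content-Type: application/json" \
  --data '{"connector.class":"com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector","topics":"syslogkafka","tasks.max":"2","solr0.url":"http://m:8983/solr/","solr0.topic":"syslogkafka","solr0.core.name":"divtest01"}' \
  http://localhost:8083/connector-plugins/HttpSolrSinkConnector/config/validate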

Topic partitions

Hi,

Is it possible to use the connector if the topic has many partitions (e.g. 20)?

Many thanks

Basic Authentication is missing in the first request

We're running Solr 8.4.1 with basic authentication enabled for all types of requests and paths.
Next to it we're running Kafka Connect with the kafka-connect-solr plugin installed and the default solrj library replaced with solr-solrj-8.4.1.jar.
The kafka-connect-solr plugin (HttpSolrSinkConnector) has been configured with solr.username and solr.password:

{
  "name": "solr-sink-stage",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
    "solr.url": "https://test-solr:8983/solr/collection-test-kafkaconnect1",
    "tasks.max": "1",
    "topics": "topic1",
    "name": "solr-sink-stage",
    "solr.username": "user",
    "solr.password": "password"
  },
  "tasks": [
    {
      "connector": "solr-sink-stage",
      "task": 0
    }
  ],
  "type": "sink"
}

But a very simple test shows the first request arriving without Basic Authentication headers:

/ # nc -lp 8983
POST /solr/collection-test-kafkaconnect1/update?wt=javabin&version=2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Type: application/javabin
Transfer-Encoding: chunked
Host: test-solr:8983
Connection: Keep-Alive

Content

We expected this commit to fix it, but it seems it does not:
036fe18

Is it possible to force kafka-connect-solr to do Preemptive Basic Authentication and add authentication headers to all requests including the first one?
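
A hedged sketch of one way preemptive Basic auth could be wired in: build the SolrJ client on top of an Apache HttpClient whose interceptor attaches the Authorization header before every request, including the first. The httpclient/SolrJ calls below are real 4.x/8.x APIs; the integration point in the connector is an assumption:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PreemptiveAuthClient {
  public static HttpSolrClient build(String url, String user, String pass) {
    String token = Base64.getEncoder()
        .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
    // The interceptor runs before each request is sent, so no 401 challenge is needed.
    HttpRequestInterceptor auth =
        (request, context) -> request.setHeader("Authorization", "Basic " + token);
    CloseableHttpClient http = HttpClients.custom().addInterceptorFirst(auth).build();
    return new HttpSolrClient.Builder(url).withHttpClient(http).build();
  }
}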

Destination Solr Document fails to preserve field order

Examining a topic in Kafka shows field ordering as desired for the Solr update.

Yet this connector generates a randomly ordered SolrInputDocument field list (neither front-to-back nor back-to-front).

User impact: SolrDocuments in the destination cloud are inconsistent with design specs.

Performance impact: Solr Cloud will suffer performance impacts from a misordered document schema. High-performance inverted indexes are frequently designed to minimize the time to find a field occurrence of a term when doing constrained field-based queries.

Suggested remedy: I'm not sure about the code involved, but the SinkRecord(?) needs to retrieve topic fields, in order, into an ordered map. JSON objects are notorious for ordering randomness. A LinkedHashMap is one pattern that can be used to preserve the original topic field ordering into a SolrInputDocument.
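
A hedged sketch of the suggested remedy for records with a schema: iterate the Struct's fields in schema order when building the document. SolrInputDocument keeps insertion order internally (it is backed by a LinkedHashMap), so whatever order the sink writes is the order Solr receives:

import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.solr.common.SolrInputDocument;

public class OrderedDocBuilder {
  public static SolrInputDocument fromStruct(Struct value) {
    SolrInputDocument doc = new SolrInputDocument();
    // schema().fields() returns the fields in declaration order
    for (Field field : value.schema().fields()) {
      doc.setField(field.name(), value.get(field));
    }
    return doc;
  }
}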

Can't compile because of missing dependency

Hello, first of all: thanks for the project!

I'm trying to experiment with it, but I can't compile, since the Bintray repo is no longer available (https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/) and one of the dependencies uses it: com.palantir.docker.compose:docker-compose-rule-core:jar:0.34.0.

[ERROR] Failed to execute goal on project kafka-connect-solr: Could not resolve dependencies for project com.github.jcustenborder.kafka.connect:kafka-connect-solr:jar:0.1-SNAPSHOT: Could not find artifact com.palantir.docker.compose:docker-compose-rule-core:jar:0.34.0

Recent versions (>=1.0.0) are available in Maven Central: https://search.maven.org/artifact/com.palantir.docker.compose/docker-compose-rule-core .
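
A hedged sketch of the dependency bump in pom.xml; the 1.0.0 version is taken from the Maven Central link above, and compatibility with the project's tests would still need to be verified:

&lt;dependency&gt;
  &lt;groupId&gt;com.palantir.docker.compose&lt;/groupId&gt;
  &lt;artifactId&gt;docker-compose-rule-core&lt;/artifactId&gt;
  &lt;version&gt;1.0.0&lt;/version&gt;
  &lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;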

java.lang.String cannot be cast to java.util.Map

I am trying to send plain JSON messages from Kafka to Solr. I made it work on my local computer, but when I try it in the cluster (which has the Cloudera distribution of Solr, version 4.10.3) it doesn't work. I thought it might be a solrj version problem, but I also use the same Solr server version on the local computer, where it works well.

I have also tried to change the solrj client version in the dependencies from 7.5.0 to 4.10.3, but that creates another error, NoClassDefFound: some classes are not present in solrj client version 4.10.3.

Below is the error message I receive from the connect-distributed logs (there are no error messages on the Solr side).

I would really appreciate it if anybody could help or point me in the right direction on this issue.
Thank you very much in advance

[2018-11-08 16:36:52,834] ERROR Request to collection [mh-eval-layer] failed due to (0) java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map, retry? 0 (org.apache.solr.client.solrj.impl.CloudSolrClient:919)
[2018-11-08 16:36:52,835] ERROR WorkerSinkTask{id=SolrSink1-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask:577)
org.apache.kafka.connect.errors.RetriableException: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.put(SolrSinkTask.java:79)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:999)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
at com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkTask.process(CloudSolrSinkTask.java:53)
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.put(SolrSinkTask.java:75)
... 11 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at org.apache.solr.common.cloud.DocRouter.getRouteField(DocRouter.java:53)
at org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:46)
at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:38)

Usable with AWS Cloud Search?

We are thinking about migrating from Solr to AWS CloudSearch and wonder whether the connector would also be usable with CloudSearch (or could be made usable)?

CloudSolrSinkConnector seems to read topics as solr collection

Hi Jeremy,

I use the HttpSolrSinkConnector and it works fine. I want to use the CloudSolrSinkConnector too, with this configuration:

name=cloudsolr
topics=source-db-11
tasks.max=2
connector.class=com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkConnector
solr.collection.name=solrCollection
solr.zookeeper.hosts=...list of zookeeper nodes....
solr.commit.within=5000

but I get the error below. Can you help me please?
Many thanks

[2017-12-11 15:42:56,113] ERROR Task cloudsolr-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:482)
org.apache.solr.common.SolrException: Collection not found: source-db-11
at org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(CloudSolrClient.java:1139)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:822)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:793)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:178)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:195)
at com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkTask.process(CloudSolrSinkTask.java:53)
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.put(SolrSinkTask.java:74)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
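
A hedged workaround, consistent with the RegexRouter transform used in the SOLRj 7.3.0 issue further down this list: rename the topic to the collection name so the connector resolves the correct collection (the replacement value here is illustrative):

transforms=routeToCollection
transforms.routeToCollection.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.routeToCollection.regex=.*
transforms.routeToCollection.replacement=solrCollection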

Use topic name to compute the collection to send the updates in "Standard Solr"

In "Standard Solr" mode, there is no way to route messages dynamically to multiple collections using RegexRouter as noted in the docs.

HttpSolrSinkTask should use topic to send the documents to a solr collection with that very same name.
Either appending the collection to the base solrUrl in ...


Or using the overloaded method process(SolrClient client, String collection) in
UpdateResponse response = updateRequest.process(client);
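
A hedged sketch of the second option; SolrRequest does provide this overload, and the topic would come from the SinkRecord batch being processed (variable names assumed):

// route the update to the collection named after the record's topic
UpdateResponse response = updateRequest.process(client, record.topic());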

Retry Attempt without data loss when sink side solr is down during runtime

We need to retry (without any data loss), based on a count from the connector config, when the Solr server goes down during runtime.

In my case, it works perfectly when both the connector and Solr are in a running state [Active].
But when only the Solr server is down, there is no retry; the data that should have been passed to Solr is lost.

Sample connector config

curl -X POST -H "Content-Type: application/json" --data '{
"name": "t1",
"config": {
"tasks.max": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
"topics": "TRAN",
"solr.queue.size": "100",
"solr.commit.within": "10",
"solr.url": "http://192.168.2.221:27052/solr/TRAN",
"errors.retry.delay.max.ms":"5000",
"errors.retry.timeout":"600000",
"errors.tolerance":"all",
"errors.log.enable":"true",
"errors.log.include.messages":"false",
"errors.deadletterqueue.topic.name":"DEAD_TRAN",
"errors.deadletterqueue.topic.replication.factor":"1",
"retry.backoff.ms":"1000",
"reconnect.backoff.ms":"5000",
"reconnect.backoff.max.ms":"600000"
}
}' http://localhost:8083/connectors

Error Information

(error screenshot attached in the original issue; not reproduced here)

Problems using Kafka Connect Solr - Invalid Value type java.lang.String is not a supported type.

Hi all !!

@jcustenborder
@didiez
@ilosamart

Currently, I'm trying to create a connector that takes messages from a Kafka topic and sends them to a standalone Solr core. The topic messages were created using jcustenborder's spooldir CSV and line-by-line connectors. All messages were parsed to Avro and have a schema in the Schema Registry service.


My problem is the following.

ERROR [solr_connector_test_0046|task-0] WorkerSinkTask{id=solr_connector_test_0046-0} Task threw an uncaught and unrecoverable exception. The task is being killed and will not recover until manually restarted. Error: Only Schema (org.apache.kafka.connect.data.Struct) or Schema less (java.util.Map) are supported. java.lang.String is not a supported type. (org.apache.kafka.connect.runtime.WorkerSinkTask:612)

According to the error, it seems that messages should be stored with a schema, and I believe they are. I checked a lot of forums, and the only recommendation I found was to change the key converter of the source connector to a string converter, but when I try this it doesn't work. I'm also confused by the error because the messages are not pure strings.

My source connector has the following config:

{
  "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
  "name": "CREANGEL_administrator_DATA_DRUID_006_csv",
  "topic": "CREANGEL_administrator_DATA_DRUID_006_csv",
  "input.path": "/data/CREANGEL/administrator/DATA_DRUID_006/unprocessed",
  "finished.path": "/data/CREANGEL/administrator/DATA_DRUID_006/processed",
  "error.path": "/data/CREANGEL/administrator/DATA_DRUID_006/error",
  "input.file.pattern": ".*\\.csv",
  "processing.file.extension": ".*\\.csv",
  "csv.first.row.as.header": "True",
  "schema.generation.enabled": "true",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}

My sink connector has the following configuration:

{
  "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
  "name": "solr_connector_test_0046",
  "topics": "CREANGEL_administrator_DATA_DRUID_011_csv",
  "solr.url": "http://192.168.230.94:15555/"
}

My Kafka Connect was deployed in a Docker container; I use the image confluentinc/cp-kafka-connect-base:6.1.0.
I installed the Solr connector manually, as suggested in the Solr sink connector documentation.
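
A hedged guess at a fix: the sink config above sets no converters, so the worker's defaults apply and may hand the task plain strings. Mirroring the source connector's Avro converters on the sink may resolve the "java.lang.String is not a supported type" error:

{
  "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
  "name": "solr_connector_test_0046",
  "topics": "CREANGEL_administrator_DATA_DRUID_011_csv",
  "solr.url": "http://192.168.230.94:15555/",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}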

Docker-ize Kafka Connect

Thanks for uploading a Kafka Connect example with Solr! Could you also Docker-ize Kafka Connect, to make it easier and more standardized for people to use?


Kafka Connect - connection could not be established between Kafka and Solr
Even though I give a wrong Solr connection URL, no error is thrown.

Below are my configurations.
If the config is wrong, please guide me to resolve this issue.

name=KAFKA_CONNECT_SOLAR_CONNECTOR_3
connector.class=com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector
solr.core.name=PRC_TOKENS_CORE
tasks.max=1
topics=PRE_KAFKA_CONNECT_TEST
solr.url=http://localhost:8985/
solr.username=rabeeshv
solr.password=password12345
solr.queue.size = 100


Solr Version - v5.3.0
kafka - v2.5

Invalid version (expected 2, but 60) or the data in not in 'javabin' format :ConcurrentUpdateSolrClient

[2017-12-08 13:34:48,710] WARN Failed to parse error response from http://:8983/solr/ due to: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format (org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:343)
[2017-12-08 13:34:48,710] ERROR error (org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:540)
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://:8983/solr/: Not Found

Standard Solr Mode: Failure in updating documents due to malformed update URL

I am facing an issue trying to run the connector in Standard Solr mode, where the ingestion of documents from the topic fails because the Solr update URL being created is not correct.

With the below configuration of the connector -

{
  "name" : "httpSolrSinkConnector1",
  "config" : {
    "connector.class" : "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
    "tasks.max" : "1",
    "topics" : "authors",
    "solr.url" : "http://indexer:8983/",
    "key.converter":"io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url":"http://schema-registry:8081",
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://schema-registry:8081"
  }
}

I get the following error trace

pipelinesetup-connect-1          | [2022-10-24 06:55:37,850] WARN Failed to parse error response from http://indexer:8983 due to: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format (org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient)
pipelinesetup-connect-1          | [2022-10-24 06:55:37,850] ERROR error (org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient)
pipelinesetup-connect-1          | org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://indexer:8983: Not Found
pipelinesetup-connect-1          | 
pipelinesetup-connect-1          | 
pipelinesetup-connect-1          | 
pipelinesetup-connect-1          | request: http://indexer:8983/update?wt=javabin&version=2
pipelinesetup-connect-1          |      at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:385)
pipelinesetup-connect-1          |      at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:183)
pipelinesetup-connect-1          |      at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
pipelinesetup-connect-1          |      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
pipelinesetup-connect-1          |      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
pipelinesetup-connect-1          |      at java.lang.Thread.run(Thread.java:748)

As you can see the Solr URL generated is http://indexer:8983/update but it should be http://indexer:8983/solr/authors/update

The connector works if I change the solr.url in the config to

"solr.url" : "http://indexer:8983/solr/authors",

This however will force the user to create one connector per topic.

To further make sense of the error stacktrace, especially
Invalid version (expected 2, but 60) or the data in not in 'javabin' format

which the README mentions as being caused by a version mismatch: here it is caused because, when we hit the malformed URL, we get

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/update. Reason:
<pre>    Not Found</pre></p>
</body>
</html>

Here the first < has the ASCII value 60, which probably explains the error.

I am using Solr version 8.2.0 and my SolrJ version is the same too.

Cannot start kafka-connect process due to "java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory"

I am running this on CentOS 8. Kafka, ZooKeeper, Kafka Connect and Solr are all running on the same server.

Following is the terminal output:
execution error.txt

My plugin path defined in /usr/local/kafka-connect/connect-standalone.properties is: plugin.path=/usr/local/kafka-connect,/usr/local/kafka/libs,/usr/share/java

Here are all the jar files after compilation:

[root@solr lib]# pwd
/usr/local/kafka-connect/kafka-connect-solr/jcustenborder-kafka-connect-solr-0.1-SNAPSHOT/lib
[root@solr lib]# ls
checker-qual-2.10.0.jar httpclient-4.5.6.jar jetty-io-9.4.19.v20190610.jar
commons-io-2.5.jar httpcore-4.4.10.jar jetty-util-9.4.19.v20190610.jar
commons-math3-3.6.1.jar httpmime-4.5.6.jar jsr305-3.0.2.jar
connect-utils-0.4.164.jar j2objc-annotations-1.3.jar kafka-connect-solr-0.1-SNAPSHOT.jar
error_prone_annotations-2.3.4.jar jackson-annotations-2.11.2.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
failureaccess-1.0.1.jar jackson-core-2.11.2.jar reflections-0.9.11.jar
freemarker-2.3.28.jar jackson-databind-2.11.2.jar solr-solrj-8.2.0.jar
guava-28.2-jre.jar javassist-3.21.0-GA.jar stax2-api-3.1.4.jar
http2-client-9.4.19.v20190610.jar jetty-alpn-client-9.4.19.v20190610.jar woodstox-core-asl-4.4.1.jar
http2-common-9.4.19.v20190610.jar jetty-alpn-java-client-9.4.19.v20190610.jar zookeeper-3.5.5.jar
http2-hpack-9.4.19.v20190610.jar jetty-client-9.4.19.v20190610.jar zookeeper-jute-3.5.5.jar
http2-http-client-transport-9.4.19.v20190610.jar jetty-http-9.4.19.v20190610.jar
[root@solr lib]#

It is a standalone system. I built solr for testing purposes. Here is my properties file:

[root@solr etc]# cat httpsolr.properties

name=connsolr
topics=conn
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector
solr.url=http://solr:8983/solr/lsl_conn
solr.topic=lsl_conn
solr.core.name=lsl_conn
transforms=prune,key,cast
transforms.prune.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.prune.whitelist=CreatedAt,Id,Text,Source,Truncated
transforms.key.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.key.field=Id
transforms.cast.type=org.apache.kafka.connect.transforms.Cast$Key
transforms.cast.spec=string
[root@solr etc]# ^C
[root@solr etc]# pwd
/usr/local/kafka-connect/kafka-connect-solr/jcustenborder-kafka-connect-solr-0.1-SNAPSHOT/etc

Am I missing something here? Do I need all the attributes for my setup? It is not a cluster. Do I need these "transforms" or "solr.topic"?
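
A hedged fix for the NoClassDefFoundError itself: org/apache/commons/logging/LogFactory lives in the commons-logging artifact, which is missing from the lib listing above. Dropping it next to the other jars may resolve the startup failure (version chosen as an assumption):

curl -o /usr/local/kafka-connect/kafka-connect-solr/jcustenborder-kafka-connect-solr-0.1-SNAPSHOT/lib/commons-logging-1.2.jar \
  https://repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar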

Transaction across Solr and Kafka acks handled?

Team,

How are transactions across Solr and Kafka handled, i.e. between a document being indexed into Solr and the corresponding message in Kafka being acknowledged?
What if a Solr commit is successful but the ack for an individual message to Kafka fails? Will you handle this by deleting that one document from Solr?

Thanks

Exiting WorkerSinkTask due to unrecoverable exception

When there is a problem with a message and Solr is not able to index it and returns an exception, we get into the state below. How do we resolve this, skipping the message and moving forward? Even if I restart, it stays stuck on this message.
My connector configuration includes the following, and it is still stuck:
behavior.on.malformed.documents=warn
solr.commit.within=100
errors.tolerance=all
errors.log.enable=true

org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:560)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)

While connecting to solr , I am getting java.lang.IllegalArgumentException

I am trying to send data to Solr from Kafka but am getting the errors mentioned below.

[2017-03-29 11:03:16,989] INFO HttpSolrSinkConnectorConfig values:
solr.commit.within = -1
solr.core.name = girish1
solr.ignore.unknown.fields = false
solr.password = [hidden]
solr.url = http://192.168.1.5:8983/solr/
solr.username =
(com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnectorConfig:180)
[2017-03-29 11:03:16,990] INFO Creating Solr client. (com.github.jcustenborder.kafka.connect.solr.SolrSinkTask:48)
[2017-03-29 11:03:17,348] ERROR Task girishsolr-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:142)
java.lang.IllegalArgumentException
at java.util.concurrent.LinkedBlockingQueue.&lt;init&gt;(LinkedBlockingQueue.java:261)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.&lt;init&gt;(ConcurrentUpdateSolrClient.java:146)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Builder.build(ConcurrentUpdateSolrClient.java:684)
at com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkTask.client(HttpSolrSinkTask.java:39)
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.start(SolrSinkTask.java:49)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:221)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2017-03-29 11:03:17,352] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:143)
[2017-03-29 11:03:17,352] ERROR Task girishsolr-0 threw an uncaught and unrecoverable exception during shutdown (org.apache.kafka.connect.runtime.WorkerTask:123)
java.lang.NullPointerException
at com.github.jcustenborder.kafka.connect.solr.SolrSinkTask.stop(SolrSinkTask.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.close(WorkerSinkTask.java:127)
at org.apache.kafka.connect.runtime.WorkerTask.doClose(WorkerTask.java:121)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2017-03-29 11:03:26,485] INFO Reflections took 11541 ms to scan 279 urls, producing 12476 keys and 82021 values (org.reflections.Reflections:229)

Environment Details -
Confluent Version - 3.1.0
Solr Version - 6.3.0
Java Version - openjdk version "1.8.0_111"

command - connect-standalone $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties /opt/confluent/connectors/kafka-connect-solr/config/httpsolr.properties

$CONFLUENT_HOME - /opt/confluent/confluent-3.1.0
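
A hedged observation: java.util.concurrent.LinkedBlockingQueue(int) throws IllegalArgumentException when the capacity is not positive, so the queue size handed to ConcurrentUpdateSolrClient likely resolved to zero or a negative value here. If the connector version in use supports it (other issues in this list show the property), setting it explicitly may help:

solr.queue.size=100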

Run update to solr without overwriting existing data

I've already indexed one topic into Solr. I'm now trying to index a separate but related topic into the same collection; due to Solr's default behavior, this results in the existing data being overwritten. Is there a way to tell the connector that the topic being indexed is an update and should therefore be appended rather than overwritten? I've tried looking at Kafka transforms to get around this.

My topics -

topic 1 -
{
  "foo": "example",
  "bar": 1,
  "id": 10
}

topic 2 -
{
  "baz": 2,
  "id": 10
}

Indexing in this order will leave only topic 2 data in solr.

What solr requires -

{
  "baz": 
    {
      "add": [2]
    }
}

Is it possible to tell the connector to include the add field?
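
For reference, a hedged sketch of a Solr atomic-update document; the uniqueKey field must always be present, and a modifier such as "add", "set", "inc" or "remove" wraps the new value:

{
  "id": 10,
  "baz": { "add": [2] }
}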

SOLRj 7.3.0 and UpdateRequest.setCommitWithin error

Thanks a lot for this awesome plugin. It is so great.

Hi,
as per https://www.confluent.io/hub/jcustenborder/kafka-connect-solr
I have installed jcustenborder/kafka-connect-solr:0.1.33.
After that I replaced solr-solrj-8.2.0.jar with solr-solrj-7.3.0.jar and added one more jar, noggit-0.8.jar.

I created the connector with the config below, but received the exception underneath:

{
  "connector.class": "com.github.jcustenborder.kafka.connect.solr.CloudSolrSinkConnector",
  "tasks.max": "1",
  "topics": "test",
  "solr.zookeeper.hosts": "test.zk:2181",
  "transforms": "dropPrefix",
  "transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.dropPrefix.regex": ".*",
  "transforms.dropPrefix.replacement": "test_collection",
  "solr.commit.within": "1"
}

java.lang.NoSuchMethodError: org.apache.solr.client.solrj.request.UpdateRequest.setCommitWithin(I)Lorg/apache/solr/client/solrj/request/AbstractUpdateRequest;
at com.github.jcustenborder.kafka.connect.solr.Operations.addOperation(Operations.java:45)
at com.github.jcustenborder.kafka.connect.solr.Operations.delete(Operations.java:72)

Solr Collection

How do I set a specific Solr collection name in the sink config?
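
A hedged answer based on the other issues in this list: the CloudSolrSinkConnector resolves the collection from the topic name, so a RegexRouter transform can map the topic onto the desired collection (the replacement value is illustrative; the same pattern appears in the SOLRj 7.3.0 issue above):

transforms=dropPrefix
transforms.dropPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.dropPrefix.regex=.*
transforms.dropPrefix.replacement=my_collection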
