Comments (17)
I would like to share the exception the Solr client throws. Ideally this would not occur, but when it does we need a way to proceed, so please suggest a solution, or I can take a look at your code and provide a pull request.
Caused by: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://localhost/solr/realtimeindexing_shard6_replica_n7: Exception writing document id 12345678 to the index; possible analysis error: Document contains at least one immense term in field="abc" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 48, 46, 49, 49, 48, 51, 47, 80, 104, 121, 115, 82, 101, 118, 76, 101, 116, 116, 46, 57, 51, 46, 49, 51, 48, 54, 48, 51, 80]...', original message: bytes can be at most 32766 in length; got 38490. Perhaps the document has an indexed string field (solr.StrField) which is too large
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:125)
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:46)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.directUpdate(BaseCloudSolrClient.java:549)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1037)
The reason is that only the exceptions below are caught:
catch (SolrServerException | IOException ex) {
  // Only these two exception types are handled, and both are wrapped in
  // RetriableException, so Connect retries the same batch.
  throw new RetriableException(ex);
}
The SolrCloud RouteException above is different.
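For illustration, a minimal sketch of what distinguishing fatal from retriable failures in the send path could look like (the method shape, and the choice to treat RouteException as fatal, are assumptions for discussion, not this connector's actual code):

import java.io.IOException;
import java.util.Collection;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public final class SendSketch {
  static void send(CloudSolrClient client, Collection<SolrInputDocument> docs) {
    try {
      client.add(docs);
    } catch (CloudSolrClient.RouteException ex) {
      // Analysis errors like the "immense term" above fail on every retry,
      // so rethrowing as a plain ConnectException fails the task fast
      // instead of retrying the same batch forever. Real handling would
      // need to inspect the cause, since RouteException can also wrap
      // genuinely transient routing/network failures that should be retried.
      throw new ConnectException(ex);
    } catch (SolrServerException | IOException ex) {
      throw new RetriableException(ex);
    }
  }
}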
Isn't this fatal though?
> Caused by: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://localhost/solr/realtimeindexing_shard6_replica_n7: Exception writing document id 12345678 to the index; possible analysis error: Document contains at least one immense term in field="abc" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 48, 46, 49, 49, 48, 51, 47, 80, 104, 121, 115, 82, 101, 118, 76, 101, 116, 116, 46, 57, 51, 46, 49, 51, 48, 54, 48, 51, 80]...', original message: bytes can be at most 32766 in length; got 38490. Perhaps the document has an indexed string field (solr.StrField) which is too large
Correct, this is the reason. But it should be retried and then go to a dead letter queue so that further messages can proceed. In the current situation it is stuck, not moving at all.
That's not how the dead letter topic works unfortunately. It's only for deserialization errors.
How can this issue be resolved when a message contains an unrecoverable error? Just ignore that message and proceed? Since I run Solr with a managed schema, some message data arrives in an unexpected shape. Any suggestion you could provide would be very helpful.
The Elasticsearch plugin provides an option to drop invalid messages and proceed, so we could probably add a similar flag and catch this kind of exception to move on. Right now the task is stuck and the partition offset is not moving, so some kind of handling is needed, e.g.:
catch (ConnectException convertException) {
  if (dropInvalidMessage) {
    log.error(
        "Can't convert record from topic {} with partition {} and offset {}. "
            + "Error message: {}",
        sinkRecord.topic(),
        sinkRecord.kafkaPartition(),
        sinkRecord.kafkaOffset(),
        convertException.getMessage()
    );
  } else {
    throw convertException;
  }
}
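For illustration, a minimal sketch of how such a flag might be declared with Kafka's ConfigDef (the key name drop.invalid.message, the class name, and the default are assumptions, mirroring the Elasticsearch connector's behavior.on.malformed.documents idea):

import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;

public class SolrSinkConnectorConfigSketch extends AbstractConfig {

  // Hypothetical key, not an existing option of this connector.
  public static final String DROP_INVALID_MESSAGE_CONFIG = "drop.invalid.message";

  public static ConfigDef configDef() {
    return new ConfigDef()
        .define(
            DROP_INVALID_MESSAGE_CONFIG,
            ConfigDef.Type.BOOLEAN,
            false, // default keeps today's fail-fast behavior
            ConfigDef.Importance.MEDIUM,
            "If true, log and skip records that Solr rejects instead of failing the task.");
  }

  public final boolean dropInvalidMessage;

  public SolrSinkConnectorConfigSketch(Map<String, ?> originals) {
    super(configDef(), originals);
    this.dropInvalidMessage = getBoolean(DROP_INVALID_MESSAGE_CONFIG);
  }
}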
+1
This is a valid situation - I'd also welcome an option to 'unblock', e.g. log+skip (the proposed behavior.on.malformed.documents=warn).
@ramyogi did you find a solution, settle on an alternative, or fork?
We could potentially add support for something like this. My concern is that most if not all of the examples were problems caused by infrastructure failing, meaning we'd fail on the next message anyway.
Hmm, valid concern!
In the case of infrastructure failures (timeouts, network issues, or other temporary problems) it certainly wouldn't make sense to skip messages or move them to a dead-letter queue.
Would something more fine-grained be feasible?
TBH I'm not familiar with the Solr Java client lib, so I don't know about its exception/error behaviour.
I think it boils down to a limitation of the Solr API. This connector specifically uses add(Collection docs) to index documents, which is the fastest way to write data to Solr: each batch of records handed to put is converted and sent. Do you all know of a way for me to figure out which document failed? That would be immensely helpful in this use case. The alternative is to use add(SolrInputDocument doc), which comes with the warning: "Adds a single document. Many SolrClient implementations have drastically slower indexing performance when documents are added individually. Document batching generally leads to better indexing performance and should be used whenever possible."
Without being able to figure out which document(s) failed, I'd have to report the entire batch as failed.
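One possible answer, sketched under the assumption that replaying a failed batch is acceptable (add() is an upsert on the unique key, so re-adding documents that already succeeded should be harmless): fall back to per-document add() only when the batch call fails, to isolate the rejected document(s). Names here are illustrative, not the connector's code:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public final class BatchFallback {

  // Fast path: one add() for the whole batch. Slow path, only on failure:
  // replay the batch one document at a time and collect the rejects.
  static List<SolrInputDocument> addIsolatingFailures(
      SolrClient client, Collection<SolrInputDocument> batch) {
    List<SolrInputDocument> rejected = new ArrayList<>();
    try {
      client.add(batch);
      return rejected;
    } catch (Exception batchFailure) {
      for (SolrInputDocument doc : batch) {
        try {
          client.add(doc); // re-adding earlier successes just overwrites them
        } catch (Exception docFailure) {
          rejected.add(doc); // this one can be logged, skipped, or DLQ'd
        }
      }
      return rejected;
    }
  }
}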
Yes Jeremy, we should be able to log the document's unique id when this situation occurs. Right now this kind of error leaves things completely stuck with the offset not moving at all; manually we need to delete the record and resume the Kafka Connect process. I am looking forward to your suggestion, then I can prepare a PR.
The issue is that it's not going to be a single document, it's going to be a batch of documents. If we send 50 documents to Solr and one is bad, which one is it?
Is there a config setting to change the batch size?
For a recovery process (still involving manual work), one could:
- stop/delete the connect task
- restart with batch size = 1
- process until stuck again at the exact record causing the failure
- restart with 'skip on error' enabled
- stop and set the config back to its initial state (batch size > 1, 'skipOnError' = false)
Or, as an alternative, move the entire failing batch to a dead-letter topic for further manual analysis / manual 'replay'.
It depends on the use case, but it might be preferable to the whole sink task freezing and blocking.
> Yes Jeremy, we should be able to log the document's unique id when this situation occurs. Right now this kind of error leaves things completely stuck with the offset not moving at all; manually we need to delete the record and resume the Kafka Connect process. I am looking forward to your suggestion, then I can prepare a PR.
The problem is I am unclear on how to determine which document is the offending one. If we dumped to the log, we would have to dump the whole batch, which could be 500 documents depending on your batch size. The part I really don't like is throwing away a batch over a single document or two.
> Is there a config setting to change the batch size?
> For a recovery process (still involving manual work), one could:
> - stop/delete the connect task
> - restart with batch size = 1
> - process until stuck again at the exact record causing the failure
> - restart with 'skip on error' enabled
> - stop and set the config back to its initial state (batch size > 1, 'skipOnError' = false)
You can do this with max.poll.records. It's a standard Kafka setting. You might need to set it at the worker level depending on the Kafka Connect version.
> Or, as an alternative, move the entire failing batch to a dead-letter topic for further manual analysis / manual 'replay'.
> It depends on the use case, but it might be preferable to the whole sink task freezing and blocking.
Unfortunately DLQ functionality is only for deserialization errors at the moment. There is no API to produce messages to a DLQ from a connector; I'd literally have to create a producer to do so, and that would mean pulling in all of the producer settings.
> Unfortunately DLQ functionality is only for deserialization errors at the moment. There is no API to produce messages to a DLQ from a connector; I'd literally have to create a producer to do so, and that would mean pulling in all of the producer settings.
Note: Kafka 2.6 shipped KIP-610, which brings exactly this capability; see the sketch below.
Though with batching and the given error scenario the limitation is the same: the entire batch would have to be 'skipped' and sent to the DLQ.
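For illustration, a minimal sketch of a sink task using the KIP-610 errant record reporter (it requires errors.tolerance=all and errors.deadletterqueue.topic.name in the connector config; the task skeleton, class name, and process() helper are assumptions, not this connector's code):

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.ErrantRecordReporter;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class SolrSinkTaskSketch extends SinkTask {

  private ErrantRecordReporter reporter;

  @Override
  public void start(Map<String, String> props) {
    try {
      // Null when the worker has no DLQ / error tolerance configured.
      reporter = context.errantRecordReporter();
    } catch (NoSuchMethodError | NoClassDefFoundError e) {
      reporter = null; // pre-2.6 worker without KIP-610
    }
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    for (SinkRecord record : records) {
      try {
        process(record);
      } catch (Exception e) {
        if (reporter != null) {
          reporter.report(record, e); // routes this record to the DLQ topic
        } else {
          throw new ConnectException(e);
        }
      }
    }
  }

  private void process(SinkRecord record) {
    // ... convert the record and add it to Solr (omitted in this sketch) ...
  }

  @Override
  public void stop() {}

  @Override
  public String version() {
    return "0.0.0-sketch";
  }
}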
> The part I really don't like is throwing away a batch over a single document or two.
Yes, this would probably apply to most use cases.