Comments (18)
Fail to describe collection
Could you offer the full client-side message and also the server-side log? We can't see any details in the error message you offered from Milvus.
Hi @xiaofan-luan, below please find the client-side stack trace. I'll see if I can get a server-side log.
java.lang.Exception: Fail to describe collection
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
File <command-3658777111610979>, line 88
78 client.upsert(COLLECTION_NAME, {"doc_id": row[0], "vector": row[1]})
80 # -- Read data from Milvus using the Spark connector --
81 df = spark.read.format("milvus") \
82 .option("milvus.host", "milvus-standalone.alpha.k8s.dev.303net.net") \
83 .option("milvus.port", "50051") \
84 .option("milvus.collection.name", COLLECTION_NAME) \
85 .option("milvus.collection.vectorField", "vector") \
86 .option("milvus.collection.vectorDim", str(NUM_DIMENSIONS)) \
87 .option("milvus.collection.primaryKeyField", "doc_id") \
---> 88 .load() \
89 .select("doc_id", "vector")
91 df.show(n=20, truncate=False)
File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
51 )
52 return res
File /databricks/spark/python/pyspark/sql/readwriter.py:314, in DataFrameReader.load(self, path, format, schema, **options)
312 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
313 else:
--> 314 return self._df(self._jreader.load())
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1349 command = proto.CALL_COMMAND_NAME +\
1350 self.command_header +\
1351 args_command +\
1352 proto.END_COMMAND_PART
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
1359 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
186 def deco(*a: Any, **kw: Any) -> Any:
187 try:
--> 188 return f(*a, **kw)
189 except Py4JJavaError as e:
190 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o943.load.
: java.lang.Exception: Fail to describe collection
at zilliztech.spark.milvus.MilvusCollection.initCollectionMeta(MilvusCollection.scala:57)
at zilliztech.spark.milvus.MilvusCollection.milvusCollectionMeta$lzycompute(MilvusCollection.scala:22)
at zilliztech.spark.milvus.MilvusCollection.milvusCollectionMeta(MilvusCollection.scala:22)
at zilliztech.spark.milvus.MilvusCollection.collectionSchema$lzycompute(MilvusCollection.scala:24)
at zilliztech.spark.milvus.MilvusCollection.collectionSchema(MilvusCollection.scala:24)
at zilliztech.spark.milvus.MilvusCollection.schema(MilvusCollection.scala:35)
at zilliztech.spark.milvus.Milvus.inferSchema(Milvus.scala:127)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:91)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:138)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:333)
at scala.Option.flatMap(Option.scala:271)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:331)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
Caused by: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
at io.milvus.grpc.MilvusServiceGrpc$MilvusServiceBlockingStub.describeCollection(MilvusServiceGrpc.java:4421)
at io.milvus.client.AbstractMilvusGrpcClient.describeCollection(AbstractMilvusGrpcClient.java:691)
at io.milvus.client.MilvusServiceClient.lambda$describeCollection$9(MilvusServiceClient.java:343)
at io.milvus.client.MilvusServiceClient.retry(MilvusServiceClient.java:259)
at io.milvus.client.MilvusServiceClient.describeCollection(MilvusServiceClient.java:343)
at zilliztech.spark.milvus.MilvusCollection.initCollectionMeta(MilvusCollection.scala:54)
... 24 more
Caused by: io.netty.handler.codec.http2.Http2Exception: First received frame was not SETTINGS. Hex dump for first 5 bytes: 485454502f
at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:109)
at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.verifyFirstFrameIsSettings(Http2ConnectionHandler.java:353)
at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:247)
at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
@xiaofan-luan this is something about HTTP vs HTTP2.
These seem similar:
grpc/grpc-java#2905
https://stackoverflow.com/questions/73331402/grpc-client-http2exception-first-received-frame-was-not-settings-hex-dump-fo
Any idea? Might this be something to do with dependency libraries? protobuf, grpc... ?
The server-side logs are clean, no errors.
serverside-logs-last-50-lines.txt
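As a side note, the hex dump in the Http2Exception decodes to plain ASCII, which shows what the server actually sent back. A quick sketch (plain Python, nothing Milvus-specific):

```python
# The gRPC client expects the first HTTP/2 frame from the server to be
# SETTINGS; instead it received these five bytes.
first_bytes = bytes.fromhex("485454502f")
print(first_bytes.decode("ascii"))  # -> HTTP/
# "HTTP/" is the start of an HTTP/1.x status line (e.g. "HTTP/1.1 400 ..."),
# i.e. the endpoint answered with plain HTTP/1.x rather than HTTP/2.
```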
I think this is not a bug. It might be because you enabled TLS on the server side but connected without TLS, or something similar.
/assign @dgoldenberg-ias
please share some info about how you config tls or deploy milvus?
/unassign
Why is "milvus.port" set to "50051"?
Is 50051 mapped to port 19530 of Milvus?
Hi, responding to your comments, @yanliang567, @yhmo:
- We did not configure TLS or do anything explicit related to it.
- Regarding the ports, here's the explanation from our devops: "50051 is the default port for the grpc client; we're forwarding it to 19530. We restrict the 19530 port from the client side."
  Note that 50051 generally works from the client side when using pymilvus, e.g. the MilvusClient.
- How the standalone Milvus is currently configured: I'm attaching the values yaml file with all the settings.
Generally, I think there may be a classpath issue whereby grpc is getting http1 dependency jars loaded ahead of http2 dependencies. Can you provide a list of http2 dependencies you think should be there? or http1 jars we could check for on the classpath?
Of note, the code is running as a python notebook in Databricks, with the spark_milvus_1_0_0_SNAPSHOT.jar installed into the Libraries of the Spark cluster.
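If it helps, here is a rough way we could check for the suspected classpath issue from the notebook. The helper below is hypothetical (not part of any Milvus API); it just scans a Java classpath string for netty/grpc codec jars, and the commented-out lines show how the driver JVM's classpath can be read via py4j in a Databricks/PySpark session:

```python
def find_suspect_jars(classpath: str) -> list:
    """Return classpath entries that mention netty or grpc HTTP codecs."""
    keywords = ("netty-codec-http", "grpc-netty", "grpc-core")
    return [
        entry
        for entry in classpath.split(":")
        if any(k in entry for k in keywords)
    ]

# In the Databricks notebook, the driver JVM's classpath can be read via py4j:
#   cp = spark.sparkContext._jvm.java.lang.System.getProperty("java.class.path")
#   print("\n".join(find_suspect_jars(cp)))
# Duplicate or version-mismatched netty-codec-http2 / grpc-netty entries
# would support the classpath theory.
```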
Caused by: io.netty.handler.codec.http2.Http2Exception: First received frame was not SETTINGS. Hex dump for first 5 bytes: 485454502f
I don't quite understand that, since this seems to be very related to the network.
All the dependencies are listed in the Maven POM: https://github.com/milvus-io/milvus-sdk-java
Trying the latest 2.3.5 might help, since we upgraded the grpc and netty versions.
I think you mean upgrading to 2.4.0, right? We're actually on 2.3.12, just checked.
-- Side note --
I've re-run my test on Databricks 12.2 LTS (which includes Apache Spark 3.3.2, Scala 2.12); 3.3.2 is the version your Spark connector is built for. Spark is all the way up to 3.5.1 now, and Databricks supports Spark up to 3.5.0.
I get the same error with Spark 3.3.2. Just thinking that you may want to build the connector for various Spark versions, or maybe upgrade the code to the latest Spark. The Spark APIs have changed. For instance, if I do:
<!-- <spark.version>3.3.2</spark.version> -->
<spark.version>3.4.1</spark.version>
I get a build error:
[ERROR] /Users/dgoldenberg/tech/milvus/spark-milvus/src/main/java/org/apache/spark/sql/execution/datasources/parquet/MVectorizedParquetRecordReader.java:[268,43] incompatible types: org.apache.spark.sql.execution.vectorized.WritableColumnVector cannot be converted to org.apache.spark.sql.execution.vectorized.ConstantColumnVector
No, I mean the SDK version.
Use Milvus-SDK-java 2.3.5.
Not sure how this is applicable. We're developing using Python. We're using PyMilvus 2.4.0.
How/where would the Milvus Java SDK come into the picture?
It seems that the Spark connector is using the Java connector.
Could you try to ping your Milvus cluster from the Spark cluster and check whether the network is available?
"seems that spark connector is using java connector."
Milvus's Spark connector is written in Java; there is no separate Spark connector that uses a Java connector. There is one JAR, the Milvus Spark connector, and it's in Java, hence the Java stack trace.
"could you try to ping your milvus cluster from spark cluster and check if network is avalaible?"
We don't need to do that. Look at the code: we first establish a connection to Milvus using the pymilvus MilvusClient, then create a collection and feed several documents into it. All of that succeeds, which means the network is fine, the port is correct, and Milvus Standalone is working.
Let's establish that and get past these points, please.
What's not working is the Spark connector. My guess is that there is some classpath issue where HTTP1 dependency libraries are being loaded instead of HTTP2.
Agreed on the analysis. Is there any suggestion on how to verify the problem?
And are we sure that pymilvus and Spark are running in the same network environment?
Might grpc/grpc-java#2905 be the issue?
Hi @xiaofan-luan, I don't know if that is "the issue". It seems to me that at least for the Milvus Spark connector, HTTP2 is expected. Is that true? This doesn't seem to be documented but seems likely. Then it'd follow that we need to enable TLS.
There is no mention of it in https://milvus.io/docs/integrate_with_spark.md.
I'm thinking we'd then want to follow https://milvus.io/docs/tls.md.
If this is what's happening, I think it'd definitely be worth mentioning in the Spark-Milvus Connector User Guide.
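One way to check what is actually listening on that port, independent of any client library, is to send the HTTP/2 client connection preface over a raw socket and look at the reply. This is a sketch; the `probe` helper and its classification strings are hypothetical, and the host/port in the example come from the stack trace above:

```python
import socket

# 24-byte HTTP/2 client connection preface (RFC 7540, section 3.5).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

def classify_first_bytes(first: bytes) -> str:
    """Guess the kind of endpoint from the first bytes of its reply."""
    if first.startswith(b"HTTP/"):
        # Matches the 485454502f hex dump from the exception: a plain
        # HTTP/1.x endpoint typically answers the preface with an error line.
        return "plaintext HTTP/1.x"
    if not first:
        return "no reply (endpoint may require TLS)"
    return "something else (possibly HTTP/2)"

def probe(host: str, port: int) -> str:
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(H2_PREFACE)
        return classify_first_bytes(sock.recv(5))

# e.g. probe("milvus-standalone.alpha.k8s.dev.303net.net", 50051)
```

If this reports "plaintext HTTP/1.x", the port is being answered by something speaking HTTP/1.x (for instance a proxy in the forwarding path), which would reproduce exactly the "First received frame was not SETTINGS" failure.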