Comments (9)
It seems that the Milvus service lost its connection to etcd; please double-check that etcd is healthy. Usually Milvus loses the connection because etcd responds too slowly, so check the network and the SSD volumes backing etcd.
/assign @lukezx3
/unassign
from milvus.
Hi, yes, etcd is unhealthy, but why? It should only have to handle an insertion of 100k records, 10k at a time. This also prevents the collection from getting loaded, and every basic function call (get_load_state, load_collection) takes several minutes to execute or times out. How do I fix this?
Try restarting etcd, and please provide the etcd logs if it is still unhealthy.
Hey! Yeah, I restarted multiple times: etcd whenever it went unhealthy, and Milvus standalone whenever it suddenly stopped. Do you need the etcd logs or the standalone logs? I provided some standalone log output in my problem statement.
ETCD LOGS
{"level":"info","ts":"2024-07-19T20:26:18.770Z","caller":"traceutil/trace.go:171","msg":"trace[2110741098] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4238; }","duration":"1.621665079s","start":"2024-07-19T20:26:17.148Z","end":"2024-07-19T20:26:18.770Z","steps":["trace[2110741098] 'process raft request' (duration: 631.780521ms)","trace[2110741098] 'get key's previous created_revision and leaseID' (duration: 989.727471ms)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:18.770Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:17.148Z","time spent":"1.621708247s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"info","ts":"2024-07-19T20:26:21.714Z","caller":"traceutil/trace.go:171","msg":"trace[1128419264] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4239; }","duration":"1.566294941s","start":"2024-07-19T20:26:20.147Z","end":"2024-07-19T20:26:21.714Z","steps":["trace[1128419264] 'process raft request' (duration: 1.566191688s)"],"step_count":1}
{"level":"warn","ts":"2024-07-19T20:26:21.714Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:20.147Z","time spent":"1.566372068s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"4.371402076s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506362 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[512952376] linearizableReadLoop","detail":"{readStateIndex:4697; appliedIndex:4696; }","duration":"558.406335ms","start":"2024-07-19T20:26:27.714Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[512952376] 'read index received' (duration: 15.459µs)","trace[512952376] 'applied index is now lower than readState.Index' (duration: 558.390209ms)"],"step_count":2}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[2096251078] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4240; }","duration":"5.125443211s","start":"2024-07-19T20:26:23.147Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[2096251078] 'process raft request' (duration: 753.901257ms)","trace[2096251078] 'get key's previous created_revision and leaseID' (duration: 4.37124003s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:23.147Z","time spent":"5.125515046s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"558.550463ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:"health" ","response":"range_response_count:0 size:5"}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[583032573] range","detail":"{range_begin:health; range_end:; response_count:0; response_revision:4240; }","duration":"558.604589ms","start":"2024-07-19T20:26:27.714Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[583032573] 'agreement among raft nodes before linearized reading' (duration: 558.454877ms)"],"step_count":1}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:27.714Z","time spent":"558.644924ms","remote":"127.0.0.1:57750","response type":"/etcdserverpb.KV/Range","request count":0,"request size":8,"response count":0,"response size":29,"request content":"key:"health" "}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.126042125s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506364 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:30.756Z","caller":"traceutil/trace.go:171","msg":"trace[1742953296] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4241; }","duration":"2.482687125s","start":"2024-07-19T20:26:28.274Z","end":"2024-07-19T20:26:30.756Z","steps":["trace[1742953296] 'process raft request' (duration: 356.438578ms)","trace[1742953296] 'get key's previous created_revision and leaseID' (duration: 2.125923413s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:28.274Z","time spent":"2.482726543s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:28.274Z","time spent":"2.482540496s","remote":"127.0.0.1:57750","response type":"/etcdserverpb.Maintenance/Alarm","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2024-07-19T20:26:36.224Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"4.25964568s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506366 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:36.225Z","caller":"traceutil/trace.go:171","msg":"trace[91793388] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4242; }","duration":"4.926674898s","start":"2024-07-19T20:26:31.298Z","end":"2024-07-19T20:26:36.225Z","steps":["trace[91793388] 'process raft request' (duration: 666.76842ms)","trace[91793388] 'get key's previous created_revision and leaseID' (duration: 4.259539302s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:36.225Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:31.298Z","time spent":"4.926785859s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"info","ts":"2024-07-19T20:26:21.714Z","caller":"traceutil/trace.go:171","msg":"trace[1128419264] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4239; }","duration":"1.566294941s","start":"2024-07-19T20:26:20.147Z","end":"2024-07-19T20:26:21.714Z","steps":["trace[1128419264] 'process raft request' (duration: 1.566191688s)"],"step_count":1}
If you look at your trace, etcd seems to be extremely slow: 1.5s per put (we expect around 10ms).
What kind of disk are you using for etcd? I would recommend an SSD (it does not strictly have to be a local SSD) or EBS.
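To check whether the disk is the bottleneck, here is a rough stdlib-only sketch (the `fsync_latency_ms` helper is a name made up for illustration, not an etcd tool). etcd fsyncs its write-ahead log on every put, so fsync latency on the backing volume puts a floor under put latency:

```python
import os
import tempfile
import time

def fsync_latency_ms(samples=50, payload=b"x" * 512):
    """Measure fsync latency on the current working disk, in milliseconds."""
    times = []
    with tempfile.NamedTemporaryFile() as f:
        for _ in range(samples):
            f.write(payload)
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())  # roughly what etcd does for every WAL append
            times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[samples // 2], times[-1]  # median and worst case

median_ms, worst_ms = fsync_latency_ms()
print(f"fsync median {median_ms:.2f} ms, worst {worst_ms:.2f} ms")
```

On a healthy SSD the median should be well under a millisecond; if this shows tens or hundreds of milliseconds inside the VM, the virtual disk (not Milvus) is the likely cause of the 1.5s puts.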
Hi, I'm using an M1 MacBook Pro, I think the 2020 version. I'm not exactly sure what disk it has, but it has 16GB of RAM. I'm running all my code and the Milvus Docker containers on a VMware machine, though, which I've given 12GB of RAM and 4 CPU cores. I'm not sure how to keep etcd from going unhealthy or how to resolve the 1.5s issue. What do you suggest?
100k records doesn't seem like a huge amount of data. 12GB should be enough to run it, and a Mac has an SSD for sure.
- Could you show me your code for inserting into Milvus? Do you flush every time you insert?
- How many partitions do you have?
Hi, yes, I have 16 partitions (the default, because I'm using partition_key_field). The following is my code:
```python
import random

# client is an already-connected pymilvus MilvusClient
for j in range(10):
    k = j * 10000
    testData = []
    for i in range(k, k + 10000):
        testDict = {}
        vector = [round(random.uniform(-1.0, 1.0), 16) for _ in range(1536)]
        text = "text" + str(i)
        if i < k + 1000:
            sku = "sku0"
        elif i < k + 2000:
            sku = "sku1"
        elif i < k + 3000:
            sku = "sku2"
        elif i < k + 4000:
            sku = "sku3"
        elif i < k + 5000:
            sku = "sku4"
        elif i < k + 6000:
            sku = "sku5"
        elif i < k + 7000:
            sku = "sku6"
        elif i < k + 8000:
            sku = "sku7"
        elif i < k + 9000:
            sku = "sku8"
        else:
            sku = "sku9"
        metadataKey = "m" + str(i)
        metadataValue = "meta" + str(i)
        metadata = {metadataKey: metadataValue}
        testDict["vector"] = vector
        testDict["text"] = text
        testDict["sku"] = sku
        testDict["metadata"] = metadata
        testData.append(testDict)
    insertData = client.insert(
        collection_name="partitions",
        data=testData,
        timeout=3600
    )
    insertData2 = client.insert(
        collection_name="noPartitions",
        data=testData,
        timeout=3600
    )
```
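As an aside, the batch-building loop above can be factored into a small helper; this is just a sketch (`build_batch` is a made-up name, not a pymilvus API), which replaces the long if/elif chain with integer division:

```python
import random

def build_batch(k, batch_size=10000, dim=1536, skus=10):
    """Build rows [k, k + batch_size), bucketed into `skus` sku values."""
    bucket = batch_size // skus  # rows per sku bucket (1000 in the original)
    rows = []
    for i in range(k, k + batch_size):
        rows.append({
            "vector": [round(random.uniform(-1.0, 1.0), 16) for _ in range(dim)],
            "text": "text" + str(i),
            # (i - k) // bucket yields 0..skus-1, matching the if/elif chain
            "sku": "sku" + str(min((i - k) // bucket, skus - 1)),
            "metadata": {"m" + str(i): "meta" + str(i)},
        })
    return rows
```

Each 10k batch would then be inserted with `client.insert(collection_name="partitions", data=build_batch(j * 10000))`, same as before.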