Comments (9)

yanliang567 commented on July 22, 2024

It seems that the Milvus service lost its connection to etcd; please double-check that etcd is healthy. Usually Milvus loses the connection because etcd responds too slowly, so check the network and the SSD volumes used by etcd.
/assign @lukezx3
/unassign
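
One way to script that health check is etcd's HTTP `/health` endpoint, which returns a small JSON body such as `{"health":"true"}`. The sketch below is a minimal example that also reports how long the request took. It assumes etcd's client port 2379 is reachable at `localhost`; in the default Milvus docker-compose setup it may only be reachable inside the Docker network, in which case running `etcdctl endpoint health` inside the etcd container does the same job. `check_etcd` and `parse_health` are hypothetical helper names.

```python
import http.client
import json
import time
import urllib.request

# Assumption: etcd's client port (2379) is reachable at this URL. In a default
# Milvus docker-compose setup it may only be reachable inside the Docker network.
ETCD_HEALTH_URL = "http://localhost:2379/health"


def parse_health(body: str) -> bool:
    """Return True if an etcd /health response body reports a healthy member."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # etcd reports health as the string "true" or "false".
    return str(payload.get("health", "")).lower() == "true"


def check_etcd(url: str = ETCD_HEALTH_URL, timeout: float = 5.0) -> tuple[bool, float]:
    """Query the health endpoint; return (healthy, seconds the request took)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (etcd typically answers 503 when unhealthy),
        # refused connections, and timeouts all land here.
        return False, time.monotonic() - start
    return parse_health(body), time.monotonic() - start


if __name__ == "__main__":
    healthy, elapsed = check_etcd()
    print(f"etcd healthy={healthy} ({elapsed:.3f} s)")
```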

lukezx3 commented on July 22, 2024

Hi, yes, etcd is unhealthy, but why? It should easily handle inserting only 100k records, 10k at a time. This also prevents the collection from loading, and every basic function call (get_load_state, load_collection) takes several minutes to execute or times out. How can I fix this?

jaime0815 commented on July 22, 2024

Try restarting etcd. If it is still unhealthy, please provide the etcd logs.

lukezx3 commented on July 22, 2024

Hey! Yes, I have restarted multiple times: etcd when it became unhealthy, and Milvus standalone when it suddenly stopped. Do you need the etcd logs or the standalone logs? I included some of the standalone log output in my problem statement.

lukezx3 commented on July 22, 2024

ETCD LOGS

{"level":"info","ts":"2024-07-19T20:26:18.770Z","caller":"traceutil/trace.go:171","msg":"trace[2110741098] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4238; }","duration":"1.621665079s","start":"2024-07-19T20:26:17.148Z","end":"2024-07-19T20:26:18.770Z","steps":["trace[2110741098] 'process raft request' (duration: 631.780521ms)","trace[2110741098] 'get key's previous created_revision and leaseID' (duration: 989.727471ms)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:18.770Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:17.148Z","time spent":"1.621708247s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"info","ts":"2024-07-19T20:26:21.714Z","caller":"traceutil/trace.go:171","msg":"trace[1128419264] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4239; }","duration":"1.566294941s","start":"2024-07-19T20:26:20.147Z","end":"2024-07-19T20:26:21.714Z","steps":["trace[1128419264] 'process raft request' (duration: 1.566191688s)"],"step_count":1}
{"level":"warn","ts":"2024-07-19T20:26:21.714Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:20.147Z","time spent":"1.566372068s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"4.371402076s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506362 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[512952376] linearizableReadLoop","detail":"{readStateIndex:4697; appliedIndex:4696; }","duration":"558.406335ms","start":"2024-07-19T20:26:27.714Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[512952376] 'read index received' (duration: 15.459µs)","trace[512952376] 'applied index is now lower than readState.Index' (duration: 558.390209ms)"],"step_count":2}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[2096251078] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4240; }","duration":"5.125443211s","start":"2024-07-19T20:26:23.147Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[2096251078] 'process raft request' (duration: 753.901257ms)","trace[2096251078] 'get key's previous created_revision and leaseID' (duration: 4.37124003s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:23.147Z","time spent":"5.125515046s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"558.550463ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:"health" ","response":"range_response_count:0 size:5"}
{"level":"info","ts":"2024-07-19T20:26:28.273Z","caller":"traceutil/trace.go:171","msg":"trace[583032573] range","detail":"{range_begin:health; range_end:; response_count:0; response_revision:4240; }","duration":"558.604589ms","start":"2024-07-19T20:26:27.714Z","end":"2024-07-19T20:26:28.273Z","steps":["trace[583032573] 'agreement among raft nodes before linearized reading' (duration: 558.454877ms)"],"step_count":1}
{"level":"warn","ts":"2024-07-19T20:26:28.273Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:27.714Z","time spent":"558.644924ms","remote":"127.0.0.1:57750","response type":"/etcdserverpb.KV/Range","request count":0,"request size":8,"response count":0,"response size":29,"request content":"key:"health" "}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.126042125s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506364 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:30.756Z","caller":"traceutil/trace.go:171","msg":"trace[1742953296] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4241; }","duration":"2.482687125s","start":"2024-07-19T20:26:28.274Z","end":"2024-07-19T20:26:30.756Z","steps":["trace[1742953296] 'process raft request' (duration: 356.438578ms)","trace[1742953296] 'get key's previous created_revision and leaseID' (duration: 2.125923413s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:28.274Z","time spent":"2.482726543s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
{"level":"warn","ts":"2024-07-19T20:26:30.756Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:28.274Z","time spent":"2.482540496s","remote":"127.0.0.1:57750","response type":"/etcdserverpb.Maintenance/Alarm","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2024-07-19T20:26:36.224Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"4.25964568s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587880153017506366 > put:<key:"by-dev/kv/gid/timestamp" value_size:8 >","response":"size:5"}
{"level":"info","ts":"2024-07-19T20:26:36.225Z","caller":"traceutil/trace.go:171","msg":"trace[91793388] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4242; }","duration":"4.926674898s","start":"2024-07-19T20:26:31.298Z","end":"2024-07-19T20:26:36.225Z","steps":["trace[91793388] 'process raft request' (duration: 666.76842ms)","trace[91793388] 'get key's previous created_revision and leaseID' (duration: 4.259539302s)"],"step_count":2}
{"level":"warn","ts":"2024-07-19T20:26:36.225Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-07-19T20:26:31.298Z","time spent":"4.926785859s","remote":"172.18.0.4:52020","response type":"/etcdserverpb.KV/Put","request count":1,"request size":35,"response count":0,"response size":29,"request content":"key:"by-dev/kv/gid/timestamp" value_size:8 "}
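
The slow requests in a log like this can be summarized with a short script. The sketch below scans etcd's log lines with regular expressions rather than a JSON parser (in the copy above, the inner quotes of `request content` are not escaped, so the lines are not reliably valid JSON) and lists every `apply request took too long` warning with its `took` and `expected-duration` values. It only handles single-unit Go duration strings such as `4.371402076s`, `558.550463ms`, or `15.459µs`; compound values like `1m2s` are rejected. `slow_applies` and `parse_duration` are hypothetical names.

```python
import re
import sys

# Seconds per unit for the single-unit Go duration strings that etcd logs.
# Go prints microseconds with a mu sign; both mu variants and "us" are accepted.
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "μs": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([a-zµμ]+)$")
_SLOW_APPLY = re.compile(
    r'"msg":"apply request took too long","took":"([^"]+)","expected-duration":"([^"]+)"'
)


def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as '558.550463ms' to seconds."""
    match = _DURATION.match(text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def slow_applies(lines):
    """Yield (took, expected) in seconds for each 'apply request took too long' line."""
    for line in lines:
        match = _SLOW_APPLY.search(line)
        if match:
            took, expected = (parse_duration(g) for g in match.groups())
            yield took, expected


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        found = list(slow_applies(log_file))
    for took, expected in found:
        print(f"apply took {took * 1000:.0f} ms (expected <= {expected * 1000:.0f} ms)")
    if found:
        print(f"{len(found)} slow applies; worst {max(t for t, _ in found):.2f} s")
```

For the excerpt above it would report four slow applies, the worst taking about 4.37 s. To capture a full log first, something like `docker logs milvus-etcd > etcd.log 2>&1` should work, assuming the container is named `milvus-etcd` as in the default Milvus docker-compose file.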

xiaofan-luan commented on July 22, 2024

{"level":"info","ts":"2024-07-19T20:26:21.714Z","caller":"traceutil/trace.go:171","msg":"trace[1128419264] put","detail":"{key:by-dev/kv/gid/timestamp; req_size:35; response_revision:4239; }","duration":"1.566294941s","start":"2024-07-19T20:26:20.147Z","end":"2024-07-19T20:26:21.714Z","steps":["trace[1128419264] 'process raft request' (duration: 1.566191688s)"],"step_count":1}

If you look at your trace, etcd seems to be extremely slow: 1.5 s for a single put (we expect around 10 ms).

What kind of disk are you using for etcd? I would recommend an SSD (it doesn't strictly have to be an SSD) or EBS.
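
To answer the disk question empirically, one option is a small probe of synced-write latency, since etcd syncs its write-ahead log to disk on every write. The sketch below is loosely modeled on the commonly used `fio --fdatasync=1` test for etcd: it appends small blocks (2300 bytes, an assumption borrowed from that fio recipe) to a scratch file, syncs after each one, and reports the 50th and 99th percentile latency. Run it inside the VM, against the directory or Docker volume that actually backs etcd's data. A commonly cited guideline for etcd is a 99th-percentile sync latency below roughly 10 ms. `sync_write_latencies` and `percentile` are hypothetical helpers.

```python
import math
import os
import sys
import tempfile
import time

# etcd's WAL syncs with fdatasync on Linux; fall back to fsync where Python does
# not expose fdatasync (e.g. macOS, where fsync also does not force the drive
# cache to flush, so numbers there look optimistic).
_sync = getattr(os, "fdatasync", os.fsync)


def sync_write_latencies(directory: str, writes: int = 200, block_size: int = 2300) -> list[float]:
    """Append `writes` blocks of `block_size` bytes to a scratch file in
    `directory`, syncing after each one; return per-write latency in seconds."""
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="sync-probe-")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            _sync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    lat = sync_write_latencies(target)
    print(f"p50 {percentile(lat, 50) * 1000:.2f} ms, p99 {percentile(lat, 99) * 1000:.2f} ms")
```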

lukezx3 commented on July 22, 2024

Hi, I'm using an M1 MacBook Pro, the 2020 version I think. I'm not sure exactly which disk it has, but it has 16 GB of RAM. I'm running all my code and the Milvus Docker containers on a VMware virtual machine, though, which I've given 12 GB of RAM and 4 CPU cores. I'm not sure how to keep etcd from becoming unhealthy or how to resolve the 1.5 s latency. What do you suggest?

xiaofan-luan commented on July 22, 2024

100k rows is not a huge amount of data. 12 GB should be enough to run it, and the Mac certainly has an SSD.

  1. Could you show me your code for inserting into Milvus? Do you call flush after every insert?
  2. How many partitions do you have?
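
Related to question 1, a minimal sketch of inserting in fixed-size batches with no flush call in between, timing each batch so that a slowdown in later batches (for example, from a slow etcd) becomes visible. `timed_batch_insert` is a hypothetical helper, not part of pymilvus; `client` is assumed to behave like a pymilvus `MilvusClient` with an `insert(collection_name=..., data=...)` method.

```python
import time
from typing import Any, Sequence


def timed_batch_insert(client: Any, collection_name: str, rows: Sequence[dict],
                       batch_size: int = 10_000) -> list[float]:
    """Insert `rows` in batches with no flush in between; return seconds per batch.

    Assumption: `client` has insert(collection_name=..., data=...), as a
    pymilvus MilvusClient does.
    """
    durations = []
    for start in range(0, len(rows), batch_size):
        batch = list(rows[start:start + batch_size])
        began = time.perf_counter()
        client.insert(collection_name=collection_name, data=batch)
        durations.append(time.perf_counter() - began)
    # No explicit flush here: Milvus seals growing segments on its own.
    return durations
```

If the per-batch times grow steadily while the batch size stays constant, that points at the server side rather than the client loop.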

lukezx3 commented on July 22, 2024

Hi, yes, I have 16 partitions (the default, because I'm using partition_key_field). Here is my code:

```python
import random

# `client` is a pymilvus MilvusClient created earlier (setup not shown in the post)
for j in range(10):
    k = j * 10000
    testData = []
    for i in range(k, k + 10000):

        testDict = {}
        vector = [round(random.uniform(-1.0, 1.0), 16) for _ in range(1536)]
        text = "text" + str(i)

        if i < k + 1000:
            sku = "sku0"
        elif i < k + 2000:
            sku = "sku1"
        elif i < k + 3000:
            sku = "sku2"
        elif i < k + 4000:
            sku = "sku3"
        elif i < k + 5000:
            sku = "sku4"
        elif i < k + 6000:
            sku = "sku5"
        elif i < k + 7000:
            sku = "sku6"
        elif i < k + 8000:
            sku = "sku7"
        elif i < k + 9000:
            sku = "sku8"
        else:
            sku = "sku9"

        metadataKey = "m" + str(i)
        metadataValue = "meta" + str(i)
        metadata = {metadataKey: metadataValue}

        testDict["vector"] = vector
        testDict["text"] = text
        testDict["sku"] = sku
        testDict["metadata"] = metadata
        testData.append(testDict)

    # Insert this 10k-row batch into both collections
    insertData = client.insert(
        collection_name="partitions",
        data=testData,
        timeout=3600
    )
    insertData2 = client.insert(
        collection_name="noPartitions",
        data=testData,
        timeout=3600
    )
```
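
A rough size estimate for the loop above, assuming the `vector` field is a FLOAT_VECTOR stored as float32 (4 bytes per dimension, so rounding to 16 decimal places adds no precision): each 10k-row batch carries about 61 MB of raw vector data, and since every batch is inserted into both collections, the whole run sends roughly 1.2 GB, not counting the scalar fields, JSON metadata, or indexes.

```python
DIM = 1536                # vector dimension used in the insert loop
BYTES_PER_FLOAT32 = 4     # assumption: the field is a FLOAT_VECTOR (float32)
ROWS_PER_BATCH = 10_000
BATCHES = 10
COLLECTIONS = 2           # "partitions" and "noPartitions"


def vector_bytes(rows: int, dim: int = DIM) -> int:
    """Raw float32 payload size for `rows` vectors of dimension `dim`."""
    return rows * dim * BYTES_PER_FLOAT32


per_batch = vector_bytes(ROWS_PER_BATCH)
total = per_batch * BATCHES * COLLECTIONS
print(f"per batch: {per_batch / 1e6:.1f} MB, whole run: {total / 1e9:.2f} GB")
```

Running it prints `per batch: 61.4 MB, whole run: 1.23 GB`.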
