

yanliang567 commented on September 26, 2024

/assign @congqixia
/unassign


xiaofan-luan commented on September 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.14
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory: 256G
- GPU:
- Others:

Current Behavior

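Our application submits data one record at a time. This produced about 440,000 segments in the Dropped state and 500,000 segments in the Flushed state, and when the cluster ran short of resources the Milvus service went down. On restart, the datacoord node fails with the following error while reading its data from etcd (datacoord log):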

[2024/05/24 04:52:56.013 +00:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToWatch] [nodeID=1318] [channelName=by-dev-rootcoord-dml_6_444366786892873263v0] ["check interval"=15m0s]
{"level":"warn","ts":"2024-05-24T04:52:56.019Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005c6e00/xxxxxxxx:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = trying to send message larger than max (3386659 vs. 2097152)"}
[2024/05/24 04:52:56.271 +00:00] [WARN] [datacoord/channel_manager.go:639] ["fail to update"] [updates="["{type=Delete,nodeID=1055,channels="[by-dev-rootcoord-dml_11_444366786892873263v5, by-dev-rootcoord-dml_13_444366786893627283v2, by-dev-rootcoord-dml_2_444366786893627212v0, by-dev-rootcoord-dml_4_444366786893627212v2, by-dev-rootcoord-dml_6_444366786892873263v0, by-dev-rootcoord-dml_7_444366786892873263v1, by-dev-rootcoord-dml_8_444366786892873263v2, by-dev-rootcoord-dml_9_444366786892873263v3]"}","{type=Add,nodeID=1317,channels="[by-dev-rootcoord-dml_13_444366786893627283v2, by-dev-rootcoord-dml_7_444366786892873263v1]"}","{type=Add,nodeID=1316,channels="[by-dev-rootcoord-dml_2_444366786893627212v0, by-dev-rootcoord-dml_8_444366786892873263v2]"}","{type=Add,nodeID=1214,channels="[by-dev-rootcoord-dml_4_444366786893627212v2, by-dev-rootcoord-dml_9_444366786892873263v3]"}","{type=Add,nodeID=1318,channels="[by-dev-rootcoord-dml_11_444366786892873263v5, by-dev-rootcoord-dml_6_444366786892873263v0]"}"]"] [error="rpc error: code = ResourceExhausted desc = trying to send message larger than max (3386659 vs. 2097152)"]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_13_444366786893627283v2] [timerCount=12]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_7_444366786892873263v1] [timerCount=11]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_2_444366786893627212v0] [timerCount=10]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_8_444366786892873263v2] [timerCount=9]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_4_444366786893627212v2] [timerCount=8]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_9_444366786892873263v3] [timerCount=7]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_11_444366786892873263v5] [timerCount=6]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:155] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_6_444366786892873263v0] [timerCount=5]
[2024/05/24 04:52:56.271 +00:00] [WARN] [datacoord/server.go:516] ["DataCoord Cluster Manager failed to start up"] [error="rpc error: code = ResourceExhausted desc = trying to send message larger than max (3386659 vs. 2097152)"]
[2024/05/24 04:52:56.271 +00:00] [ERROR] [datacoord/server.go:314] ["DataCoord init failed"] [error="rpc error: code = ResourceExhausted desc = trying to send message larger than max (3386659 vs. 2097152)"] [stack="github.com/milvus-io/milvus/internal/datacoord.(*Server).Init.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:314\ngithub.com/milvus-io/milvus/internal/util/sessionutil.(*Session).ProcessActiveStandBy\n\t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil/session_util.go:1103\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).Register.func2\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:266"]
[2024/05/24 04:52:56.271 +00:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=1316] [channelName=by-dev-rootcoord-dml_8_444366786892873263v2] ["timeout interval"=15m0s] [runningTimerCount=5]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

milvus-logs20240524_00.tar.gz

No response

Anything else?

No response

You should not run flush on every insertion.

You are hitting this issue because you have too many segments and


xiaofan-luan commented on September 26, 2024

You can raise the RPC message size limit to recover the cluster temporarily, but this is not how Milvus is designed to be used.
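For a sense of scale, the numbers in the datacoord log above can be checked directly. 2097152 bytes is exactly 2 MiB, which matches the default client-side send limit of the etcd Go client (`MaxCallSendMsgSize` in `go.etcd.io/etcd/client/v3`, which falls back to 2 MiB when left at zero), while the request datacoord tried to send was 3386659 bytes, about 3.23 MiB. The sketch below is plain arithmetic under stated assumptions: the helper names and the 1.5x headroom factor are hypothetical, and it does not show which Milvus setting controls this limit in a given version. The etcd server also enforces its own `--max-request-bytes` limit (1.5 MiB by default), so raising only the client-side limit may not be enough.

```python
import math

MIB = 1024 * 1024

# Values copied from the datacoord log in this issue.
limit_bytes = 2_097_152      # "max" reported by the etcd client
message_bytes = 3_386_659    # size of the etcd request datacoord tried to send


def fits(message: int, limit: int) -> bool:
    """Return True if a message of `message` bytes is within `limit` bytes."""
    return message <= limit


def min_limit_mib(message: int, headroom: float = 1.5) -> int:
    """Smallest whole number of MiB covering `message` times a headroom factor.

    The 1.5x headroom is an arbitrary assumption, not a Milvus recommendation.
    """
    return math.ceil(message * headroom / MIB)


print(limit_bytes / MIB)                  # 2.0
print(round(message_bytes / MIB, 2))      # 3.23
print(fits(message_bytes, limit_bytes))   # False
print(min_limit_mib(message_bytes))       # 5
```

As the comment above says, a larger limit only buys time: per the maintainers, the oversized request comes from having too many segments, so reducing the segment count is the real fix.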


xiaofan-luan commented on September 26, 2024

The recommended approach is to write all the data and then flush once, or not flush at all.
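The recommended pattern (batched inserts, at most one flush at the end) can be sketched as follows. This is a minimal sketch, assuming a pymilvus ORM `Collection`-like object that exposes `insert(data)` and `flush()`; the `SupportsInsertFlush` protocol, the `batched` and `load_rows` helpers, and the default batch size of 1000 are hypothetical and not part of pymilvus. Rows are passed through unchanged, so they must already be in a format your collection's `insert` accepts.

```python
from typing import Any, Iterable, Iterator, List, Protocol


class SupportsInsertFlush(Protocol):
    """Hypothetical minimal interface; a pymilvus ORM Collection has insert() and flush()."""

    def insert(self, data: Any) -> Any: ...

    def flush(self) -> Any: ...


def batched(rows: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Any] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_rows(
    collection: SupportsInsertFlush,
    rows: Iterable[Any],
    batch_size: int = 1000,      # assumption: tune to your row size
    flush_at_end: bool = False,  # default: do not flush at all
) -> int:
    """Insert rows in batches with no flush in between; optionally flush once at the end.

    Returns the number of insert calls made.
    """
    calls = 0
    for batch in batched(rows, batch_size):
        # No flush() here: per the maintainers, inserted data is visible before flushing.
        collection.insert(batch)
        calls += 1
    if flush_at_end:
        collection.flush()  # a single flush after all data has been written
    return calls
```

Compared with one insert plus one flush per record, loading 2,500 rows this way makes three insert calls and at most one flush.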


goldbridge18 commented on September 26, 2024

We will switch from single inserts to batch inserts.


xiaofan-luan commented on September 26, 2024

The key point is that you should not call flush during insertion. The data is visible even if it has not been flushed.


stale commented on September 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
