Comments (13)

xiaofan-luan commented on September 26, 2024

I guess the blocker here is that the more segments you have, the more upload operations there are.

Once you have 10,000 segments and each upload takes 20 ms, that adds up to 200 s.

Let's use a pool of (CPU cores × 8) workers for concurrent file flushes.

from milvus.

yanliang567 commented on September 26, 2024

/unassign

XuanYang-cn commented on September 26, 2024

It's not stuck. L0 compaction runs more and more slowly as the number of segments increases, and all the L0 tasks are pending execution, which leaves MixCompaction unable to be scheduled.

We need to refine the scheduler to avoid this situation.

xiaofan-luan commented on September 26, 2024

Does L0 compaction become slower because of the BF (bloom filter)?

xiaofan-luan commented on September 26, 2024

Maybe we need an extra stage in L0 compaction:

  1. Concurrently load BFs into a DataNode (maybe cached).
  2. Concurrently split all L0 segments into the existing segments.
  3. Lock L1/L2 compaction.
  4. Copy delta logs to the compacted result if necessary.
  5. Release the lock.

I would assume this will relieve compaction pressure and reduce the lock holding time.

xiaofan-luan commented on September 26, 2024

We may also need a caching mechanism for BFs to make loading fast in the future.

Maybe caching them on the log node is a good idea? @XuanYang-cn @congqixia @tedxu

XuanYang-cn commented on September 26, 2024

> Does L0 compaction become slower because of the BF?

@xiaofan-luan
L0 compaction relies on the BFs cached in the DataNode, because we want the DataNode to keep working when L0 is disabled. So currently, the L0 compaction process doesn't need to load BFs from S3; they're all in the DataNode already.

We need tracing to know where the time goes: as the segment count increased to 8000, the L0 compaction p99 latency reached 15 minutes.

In addition, L0 compaction tasks bypass the scheduler's max task number, so hundreds of L0 tasks were submitted and no MixCompaction could be scheduled. MixCompaction not being schedulable is the root cause of the growing segment count.

#31270 should prevent the scheduler from appending endless L0 tasks.
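The failure described here is a starvation problem: L0 tasks ignore the scheduler's cap and occupy every slot, so MixCompaction never starts. The following is a hypothetical Go sketch (not the code in #31270) of a scheduler that enforces a global cap (`maxParallel`) and additionally limits how many running slots L0 tasks may hold (`maxL0`), so a queued MixCompaction task can still be picked up. All type and field names are illustrative.

```go
package main

import "fmt"

type taskType int

const (
	l0Task taskType = iota
	mixTask
)

type task struct {
	id  int
	typ taskType
}

// scheduler never runs more than maxParallel tasks and lets L0 tasks hold
// at most maxL0 of those slots.
type scheduler struct {
	maxParallel, maxL0 int
	queue              []task
	running            []task
}

func (s *scheduler) submit(t task) { s.queue = append(s.queue, t) }

// schedule starts every queued task that fits, scanning in FIFO order and
// skipping (but keeping) L0 tasks once the L0 quota is used up.
func (s *scheduler) schedule() []task {
	l0Running := 0
	for _, t := range s.running {
		if t.typ == l0Task {
			l0Running++
		}
	}
	var started, remaining []task
	for _, t := range s.queue {
		switch {
		case len(s.running) >= s.maxParallel:
			remaining = append(remaining, t)
		case t.typ == l0Task && l0Running >= s.maxL0:
			remaining = append(remaining, t) // keep scanning so mix tasks can start
		default:
			s.running = append(s.running, t)
			started = append(started, t)
			if t.typ == l0Task {
				l0Running++
			}
		}
	}
	s.queue = remaining
	return started
}

func main() {
	s := &scheduler{maxParallel: 4, maxL0: 2}
	for i := 0; i < 100; i++ {
		s.submit(task{id: i, typ: l0Task})
	}
	s.submit(task{id: 100, typ: mixTask})
	for _, t := range s.schedule() {
		fmt.Println("started task", t.id, "type", t.typ)
	}
}
```

Even with hundreds of L0 tasks queued ahead of it, the mix task starts in the same scheduling round because the L0 quota leaves free slots.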

XuanYang-cn commented on September 26, 2024

@wangting0128 please help verify. Also, could you enable tracing during the tests?
/assign @wangting0128

XuanYang-cn commented on September 26, 2024

A typical trace of a batch L0 compaction:
[Image: tracing screenshot]

XuanYang-cn commented on September 26, 2024

A very obvious and quick enhancement: change the sequential (one-by-one) upload to a batch upload.

xiaofan-luan commented on September 26, 2024

Makes sense to me.

wangting0128 commented on September 26, 2024

After verification, under the current mechanism, there are two ways to alleviate the problem of slow compaction speed under concurrent DML.

Verified image: 2.4-20240318-506534c2

  1. Increase the number of DataNodes:

     [Screenshots from 2024-03-18 20:49:19, 20:49:37, 20:49:52]

  2. Adjust config parameters to improve compaction concurrency:

    dataCoord:
      compaction:
        workerMaxParallelTaskNum: 5
        maxParallelTaskNum: 20

     [Screenshots from 2024-03-18 20:52:02, 20:52:23]

After verification, this issue can be closed.
