Comments (5)

xiaofan-luan commented on September 27, 2024

/assign @bigsheeper
Can you help check this?

from milvus.

bigsheeper commented on September 27, 2024

At 11:02, querynode 2 was stopped (screenshot: 3Zo3JSg5MK).

However, for a long time afterward, querycoord kept trying to have querynode 2 watch the DML channel (screenshot: BABNnHbXOj).

Maybe we need to check whether the querynode is in the stopping state before sending the watch DML channel request.

@weiliu1031
cc @xiaofan-luan

xiaofan-luan commented on September 27, 2024

> Maybe we need to check whether the querynode is in the stopping state before sending the watch DML channel request.

func (scheduler *taskScheduler) checkStale(task Task) error {
	// Excerpt: the task is stale if any of its target nodes is no longer registered.
	for step, action := range task.Actions() {
		log := log.With(
			zap.Int64("nodeID", action.Node()),
			zap.Int("step", step))

		if scheduler.nodeMgr.Get(action.Node()) == nil {
			log.Warn("the task is stale, the target node is offline")
			return merr.WrapErrNodeNotFound(action.Node())
		}
	}
	return nil
}

For a stopping node, should we also mark the task as stale?

ThreadDao commented on September 27, 2024

Fixed in 2.3-20240426-d56bec07-amd64:
https://argo-workflows.zilliz.cc/archived-workflows/qa/653c80c7-677d-49eb-90c9-0123d0b0d7b8?nodeId=zong-chaos-cluster-233-1

weiliu1031 commented on September 27, 2024

> For a stopping node, should we also mark the task as stale?

We only mark grow tasks as stale, because when a querynode stops gracefully, qc (querycoord) still needs to release all channels/segments on the stopping node. So the check logic was added for grow tasks only.