Comments (14)
According to these issues:
#32714
#32894
It seems this situation is caused by high write throughput: some datanodes became unhealthy, which left too many growing segments on some querynodes. (But why did the number of already-sealed loaded segments decrease? I am not clear about what happened internally.)
I redeployed the querynodes and reloaded the collection, and memory usage returned to normal:
Hope this information is helpful.
- Should I scale up the datanodes?
- How can I control the write rate or concurrency for large-scale data scenarios?
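One client-side way to bound the write rate is to send fixed-size batches and pause between them. The following is only a minimal sketch, not a Milvus feature: `throttled_insert`, `insert_fn`, `batch_size`, and `max_rows_per_sec` are hypothetical names, and `insert_fn` stands for whatever wrapper forwards a batch to your own pymilvus insert call.

```python
import time
from typing import Callable, Iterable, List, Sequence


def batched(rows: Sequence[dict], batch_size: int) -> Iterable[List[dict]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


def throttled_insert(
    rows: Sequence[dict],
    insert_fn: Callable[[List[dict]], None],
    batch_size: int = 1000,
    max_rows_per_sec: float = 5000.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> int:
    """Send `rows` in batches, pausing so the average rate stays under the cap.

    `insert_fn` is a hypothetical wrapper around the real client call.
    Returns the number of batches sent.
    """
    min_interval = batch_size / max_rows_per_sec  # seconds budgeted per full batch
    sent = 0
    for batch in batched(rows, batch_size):
        started = time.monotonic()
        insert_fn(batch)
        sent += 1
        # Sleep for whatever is left of this batch's time budget.
        remaining = min_interval - (time.monotonic() - started)
        if remaining > 0:
            sleep_fn(remaining)
    return sent
```

The default values are placeholders; a suitable cap depends on how fast the datanodes and index building can absorb the writes in a given deployment.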
from milvus.
@tianshihan818 thank you for the issue and updates. It seems that something is stuck in that querynode, but we cannot tell without the Milvus logs. Could you please refer to this doc to export the complete Milvus logs for investigation?
BTW, which index type are you running?
/assign @tianshihan818
from milvus.
@yanliang567 Thanks! Here are the logs:
milvus-log.tar.gz
from milvus.
/assign @congqixia
please help to take a look.
from milvus.
Additional info:
The initial logs were lost, so I reproduced the test (concurrent inserts) and collected the logs above, i.e.
milvus-log.tar.gz
This time the insert API first reported an error at 22:50 on June 17th: pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=message send timeout: TimeoutError)>
(My local time is 8 hours ahead of the clock on the machine nodes, so the corresponding entries in the logs should be around 14:50.)
I also checked the metrics and the pod list during this test period.
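To line up a client-side error time with the log timestamps, the 8-hour offset can be applied mechanically. This is a small sketch under assumptions: the year 2024 is assumed (the thread only gives the month and day), and `to_log_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Offset stated in the thread: client time is 8 hours ahead of the log timestamps.
LOCAL_AHEAD_OF_LOGS = timedelta(hours=8)


def to_log_time(local_time: datetime) -> datetime:
    """Convert a client-side timestamp to the time expected in the Milvus logs."""
    return local_time - LOCAL_AHEAD_OF_LOGS


# Year is an assumption; only "June 17th 22:50" appears in the thread.
first_error_local = datetime(2024, 6, 17, 22, 50)
print(to_log_time(first_error_local).strftime("%m-%d %H:%M"))  # 06-17 14:50
```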
- Pod list
![企业微信截图_17186918952022](https://private-user-images.githubusercontent.com/154961771/340563647-437c2dcd-6bb8-4c84-a502-cef00d8ac45b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTYzNjQ3LTQzN2MyZGNkLTZiYjgtNGM4NC1hNTAyLWNlZjAwZDhhYzQ1Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mNjE4ZDRiZjE5NGRiYjcyN2M0MDZmYzA1MzNmZjhjNjhlZWFiMjE4ZmM3ODA2NDFhMDU0MDQ2OTY3YzZjNDVjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.woilh1SS3xeJuEk4Misbq5rwXAtGSMpYJ-SyLHDICc0)
- Memory
![企业微信截图_17186908729771](https://private-user-images.githubusercontent.com/154961771/340564220-6e3363d8-cef2-4e2b-bd83-5d7f437fc2c7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY0MjIwLTZlMzM2M2Q4LWNlZjItNGUyYi1iZDgzLTVkN2Y0MzdmYzJjNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03MDM4YTRiOTM4MmNiZjhjOGUzOWI4MzQ3MDQ1NWNkMGY2YzMyNGMzNGY0NGRjYzhkMWIwMjE0ZDRmZjY4NWI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.STsJua2QXjhAxcQkTJfsLC_bJR0XvzVGl2A6gUDXSrI)
- Segment Loaded Num
![企业微信截图_17186912067480](https://private-user-images.githubusercontent.com/154961771/340564396-e3b0b4cc-6466-475c-a83d-bf41e3b0b0a3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY0Mzk2LWUzYjBiNGNjLTY0NjYtNDc1Yy1hODNkLWJmNDFlM2IwYjBhMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wY2JiMWQ2MjcxMTU0NjdlYmJkYWU3ZjZjOWY2ODJhYzk5NDFmYWQyODEyYTA3ZDgyZTBhNDljZTM2YzU2ZDFkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Q2hBuB9sRw5CbBZl9HcgeiWXkQOuzoZxDEOnRceLiAQ)
- Queryable Entity Num
![企业微信截图_17186912832872](https://private-user-images.githubusercontent.com/154961771/340564585-c617e85b-f948-4fdf-be91-d7ba083df26c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY0NTg1LWM2MTdlODViLWY5NDgtNGZkZi1iZTkxLWQ3YmEwODNkZjI2Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mMzAwOGFkNzQwYjUyOTRmNTUxZTQ4YjJlNDBkMWVlMjE2ZGU1ZmRjMDBkYThiNmZkZTFmMjZhZjk3ZTk0NTBhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.PqnnLyOM0IkGmL816bzZqRGY7NOA2vmJi_105Zz2XW8)
- Goroutines Total
![企业微信截图_17186922195118](https://private-user-images.githubusercontent.com/154961771/340564945-2145dcb6-3693-4fdf-9bd1-882796402e04.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY0OTQ1LTIxNDVkY2I2LTM2OTMtNGZkZi05YmQxLTg4Mjc5NjQwMmUwNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00ZTliYjY4MTU1NmQ2ZmY4NjhjYzA1YWVhZmE1NmI2OGRlMDI1MzQ5ZTYyNDJlNzc2MWJhZmYxNWExOTNkNjkwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.0YWCXE2t7975CScnaj5E-VTcYDv9Tw09vhU0Q6CDWW0)
- Goroutines
![企业微信截图_17186923342838](https://private-user-images.githubusercontent.com/154961771/340565395-743f780b-ad7d-4700-bf78-ceb71bc3eef8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY1Mzk1LTc0M2Y3ODBiLWFkN2QtNDcwMC1iZjc4LWNlYjcxYmMzZWVmOC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hYWIzOTdlNTVkMTRjYTdhZThlZDY5Zjk0ZWM3ZDA3MmUzYzE5MjU3ZjcxMjZiOGFiY2FmY2Y1ODI5YjcwMDhkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.cQKhKUUq8tIzP3L7Soz3MW-xoZvUIbN_7yHp_qz844c)
- CPU
![企业微信截图_17186924613424](https://private-user-images.githubusercontent.com/154961771/340566042-a3242aa9-a8ff-4899-84b3-867e96b4a9fd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY2MDQyLWEzMjQyYWE5LWE4ZmYtNDg5OS04NGIzLTg2N2U5NmI0YTlmZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zNTA3NjlmYTlkZDc0N2I3Zjk5Zjc1NDMzMmMwOGI3N2QwNzVhZjViNGI2YjU1ZjQ4NzQ3M2JjZWFhNjEwNThiJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.RTuX0INAN5Am6DCYuRBV5MTMpotdL0tYicnU2rvM-Wo)
- OS Threads
![企业微信截图_17186925929468](https://private-user-images.githubusercontent.com/154961771/340566473-e1b07855-9a76-442d-9109-54ad6689982e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY2NDczLWUxYjA3ODU1LTlhNzYtNDQyZC05MTA5LTU0YWQ2Njg5OTgyZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00YWQ2MDlhY2I5ODZjOWJmMmZjOTQzNzgwNzk1OWZmOWZiNTIyM2U3NzBkMWM1MTlmNmU5YWI2MTkzMzNhNDdlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.VSbHsRcFAkKomBzESofwUKc2zUngHzC5ySBfHdT-rhY)
- GC Max duration seconds
![企业微信截图_17186926669084](https://private-user-images.githubusercontent.com/154961771/340566745-942af394-07b0-4f8d-8a7d-aca5147cd0da.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTY2NzQ1LTk0MmFmMzk0LTA3YjAtNGY4ZC04YTdkLWFjYTUxNDdjZDBkYS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xYmQ3ODMzMTllZjQyMzIyMzcxNDM2YmViMGM4ZDZjY2IwNTM1YTkzM2U5M2ZlYWE1ZGNlNWUyNjE0MzQ5ZWI5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.YTh6cG6vPMByjxyPhA0vGuoyD17ifIQefY6CgApSVu8)
- Other metrics
![企业微信截图_17186945946792](https://private-user-images.githubusercontent.com/154961771/340576254-4546b0f9-2829-4e4a-8d81-0bf5c7faf4ad.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTc2MjU0LTQ1NDZiMGY5LTI4MjktNGU0YS04ZDgxLTBiZjVjN2ZhZjRhZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMTQwN2NkM2ZhNzE2OWQzNTFiYzE4ZGZhMzVmMmQwMjNkMGE5MTJhZGQ5NTlmNGVlZDI2MTM3NzJiNzZmOTJhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.HeHQyCBflZsqpzOQMlsQjI-mMgGVR_SfRXmcRMu1ddU)
![企业微信截图_17186947094234](https://private-user-images.githubusercontent.com/154961771/340576284-4649e2de-bd62-43f1-bc94-5d54fd3bf188.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTc2Mjg0LTQ2NDllMmRlLWJkNjItNDNmMS1iYzk0LTVkNTRmZDNiZjE4OC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04N2Y4MmU4NWJiNGQwOTI0YTFjY2RmMzBhNzRlM2JiYjNlZWI2M2Y3YTIzNTkwZjliZTg0YmZjMThlOWE2MGI3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Ye29Zw9tKCVJTWw056eeCINRV0X_ER75UoRgMJS4l1I)
from milvus.
I continued inserting, and the memory usage of querynode miltest-milvus-querynode-5559dc65d-knhn7 kept increasing. When it reached the quota water level, the same error as before was reported: pymilvus.exceptions.MilvusException: <MilvusException: (code=9, message=quota exceeded[reason=memory quota exceeded, please allocate more resources])>
![企业微信截图_17186966014880](https://private-user-images.githubusercontent.com/154961771/340585938-a0f67c93-a288-4665-a248-52b9ffe2ac11.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwMDUzNjAsIm5iZiI6MTcyMDAwNTA2MCwicGF0aCI6Ii8xNTQ5NjE3NzEvMzQwNTg1OTM4LWEwZjY3YzkzLWEyODgtNDY2NS1hMjQ4LTUyYjlmZmUyYWMxMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwM1QxMTExMDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01MDAzYjcwMGNjZjRhNDg0MTM0ODNlNjQ1YWI5MWIyNTk3MTk3ZWRiODU1NDA1NGIwYWRiMzYzNDFmYTlhNjkwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.hmTKEPpNgh9Gz8iwXR6-5-HV-19o71LPyHl-mMcsLDo)
The logs for this run are here:
milvus-log-new.tar.gz
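On the client side, a writer can at least slow down when it sees this error instead of retrying at full speed. The sketch below is a generic exponential backoff, not Milvus-specific behavior: `call_with_backoff` and `is_quota_error` are hypothetical helpers, and matching on the "quota exceeded" text of the message is an assumption based on the error shown above. Backing off only helps if the querynode memory actually drops in the meantime.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_quota_error(exc: Exception) -> bool:
    """Heuristic (assumption): treat any error mentioning 'quota exceeded' as retryable."""
    return "quota exceeded" in str(exc)


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying quota errors with exponentially growing pauses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not a quota error, or the final failure.
            if not is_quota_error(exc) or attempt == max_attempts - 1:
                raise
            sleep_fn(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```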
from milvus.
You can try increasing the shard number.
In our design, the best practice is to have 100 to 200 million entities per shard.
In your case, with roughly 1B entities, we would recommend using 8 shards.
The reason you see very high CPU on one querynode is that it is the shard delegator. If index building cannot keep up, data may accumulate. Also, since only the delegator forwards requests, it may become the bottleneck under heavy search load.
If you could use pprof to check why the memory is high, that would also be helpful.
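The shard sizing guideline above can be turned into a quick calculation. This is just the arithmetic behind the 100 to 200 million per-shard range, with `shard_count_range` as a hypothetical helper name; it is not an official sizing tool.

```python
import math


def shard_count_range(
    total_entities: int,
    min_per_shard: int = 100_000_000,
    max_per_shard: int = 200_000_000,
) -> tuple:
    """Return (fewest, most) shards that keep each shard within the per-shard range."""
    fewest = math.ceil(total_entities / max_per_shard)
    most = math.ceil(total_entities / min_per_shard)
    return max(1, fewest), max(1, most)


low, high = shard_count_range(1_000_000_000)
print(low, high)  # 5 10
```

For roughly 1B entities this gives a range of 5 to 10 shards, and the recommended value of 8 falls inside it.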
from milvus.
@tianshihan818
The segment is still being loaded into the delegator querynode, which is not by design.
Could you please provide a birdwatcher backup and the output of the `show segment-loaded-grpc` command?
https://github.com/milvus-io/birdwatcher/releases/tag/v1.0.4
from milvus.
Thanks for your reply! @xiaofan-luan @congqixia
I used birdwatcher to get the pprof heap files:
bw_pprof_heap.240619-045745.tar.gz
The segment-loaded-grpc results are in this file:
show_segment_loaded_grpc.log
from milvus.
@tianshihan818 thanks for the quick reply. Could you please provide the backup file as well?
from milvus.
@congqixia The etcd backup is here:
bw_etcd_ALL.240619-060804.bak.gz
from milvus.
The DataNode panicked due to known bugs, which should be fixed by #33829.
from milvus.
@xiaofan-luan Hello! Thanks for sharing the technical details. I verified in the logs that the "delegator" is indeed on the abnormal querynode. Is this "delegator" responsible for forwarding load and search requests and gathering the search results?
So in the concurrent insert case, once `collection.load()` has been called, the querynode automatically loads newly inserted data later. My collection's consistency_level is set to "Strong", so if index building (this test uses the IVF_SQ8 index type; ideally the index files use around 30% of the memory of the raw data) cannot keep up with the insert speed, the querynode loads the raw data into memory instead of the index files, which causes the high memory usage. Is that right?
I am still not clear on these points:
- Why does only the "delegator" querynode bear the load pressure in this case?
- Would it be better to load all the data only after all inserts have finished?
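To put the "around 30% of the raw data size" figure in perspective, here is a rough back-of-the-envelope sketch. The vector dimension of 128 is purely an assumption (the thread does not state it), the entity count uses the "roughly 1B" mentioned earlier, and float32 vectors are assumed; the function names are hypothetical.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of the raw float32 vectors only (scalar fields and overhead ignored)."""
    return num_vectors * dim * bytes_per_value


def estimated_index_bytes(raw_bytes: int, index_ratio: float = 0.30) -> float:
    """Index size using the ~30% ratio quoted in the thread for IVF_SQ8."""
    return raw_bytes * index_ratio


# Assumed workload: 1B vectors of dimension 128 (the dimension is hypothetical).
raw = raw_vector_bytes(1_000_000_000, 128)
print(f"raw vectors:  {raw / GIB:.1f} GiB")
print(f"IVF_SQ8 est.: {estimated_index_bytes(raw) / GIB:.1f} GiB")
```

Under these assumptions, a growing segment that is served from raw data takes roughly three times the memory of the same data served from its index, which is consistent with the memory pressure described above.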
from milvus.
The delegator holds all the growing segment data. Once a segment is sealed and its index is built, all other nodes can take over the work.
@XuanYang-cn could upgrading to 2.4.5 solve the problem? Do we need a special fix for this segment not found error?
from milvus.