Comments (6)

PKUFlyingPig avatar PKUFlyingPig commented on August 22, 2024
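  1. Only prefill is performed.
  2. Your understanding is correct; vllm itself is implemented the same way.
  3. The improvement comes from several sources: (1) the performance of the kernels used in the prefill phase; (2) once the rate goes up, requests queue because vllm cannot process prefills in time, so it batches several queued prefills together and computes them at once.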

from distserve.

YLSnowy avatar YLSnowy commented on August 22, 2024

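(1) I can see that you did implement your own kernels, but I also see that you call the new kernels from vllm, so I assumed both systems end up calling the same kernels. If they do not, and the gain comes from the kernel change, isn't the comparison unfair?
(2) Regarding higher rates: I noticed that, to keep the comparison relatively fair, your experiments always run distserve at 2x the rate of vllm (in the 66B setting, 4 GPUs vs 8 GPUs). As the rate increases, queueing in distserve therefore becomes more severe, and distserve ends up computing more prefill batches together. But both vllm and distserve use tp4 for prefill, so in theory shouldn't the TTFT latency of distserve be somewhat higher?
(3) This matches what we reproduced: without queueing, TTFT is on par with vllm, but once requests start to queue, the TTFT latency of distserve is clearly higher than that of vllm, and performance drops.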

PKUFlyingPig avatar PKUFlyingPig commented on August 22, 2024

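(1) In the paper's experiments we made vllm call the new kernels as well, for a fair comparison. Earlier I had assumed you were running the official vllm directly.
(2) Yes. Looking at prefill alone, distserve is absorbing 2x the rate with the same compute resources, so in theory its TTFT latency will be higher.
(3) Goodput requires a request to meet both the TTFT SLO and the TPOT SLO. In the scenario in (2), vllm gives up a large amount of TPOT performance in exchange for better TTFT. distserve can trade off between the two SLOs: if the TTFT requirement is strict, distserve raises the P:D ratio, for example 7 GPUs for prefill and 1 GPU for decode, and reaches a better TTFT latency by tuning toward the optimal ratio.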
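
To make point (2) concrete, here is a minimal sketch of how queueing delay grows when the same prefill instance absorbs twice the rate. It is an illustration only, under assumptions that do not come from this thread: Poisson arrivals, a fixed prefill time per request, one request served at a time with no batching (an M/D/1 queue). The function name and the numbers are hypothetical.

```python
def md1_mean_ttft(rate: float, prefill_time: float) -> float:
    """Mean TTFT (queueing wait + prefill time) for an M/D/1 queue.

    rate: request arrival rate in requests per second (assumed Poisson).
    prefill_time: fixed time to prefill one request, in seconds (assumed).
    """
    utilization = rate * prefill_time
    if utilization >= 1.0:
        # The instance cannot keep up; the queue grows without bound.
        return float("inf")
    # Pollaczek-Khinchine mean wait for deterministic service times.
    mean_wait = utilization * prefill_time / (2.0 * (1.0 - utilization))
    return mean_wait + prefill_time


# Hypothetical numbers: the same tp4 prefill instance at rate r and at 2r.
base = md1_mean_ttft(rate=1.0, prefill_time=0.2)
doubled = md1_mean_ttft(rate=2.0, prefill_time=0.2)
print(f"mean TTFT at r:  {base:.3f} s")
print(f"mean TTFT at 2r: {doubled:.3f} s")
```

Under these assumptions the prefill time itself is unchanged and only the waiting term grows with the rate, which is consistent with TTFT matching when there is no queueing and diverging once requests start to queue.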

YLSnowy avatar YLSnowy commented on August 22, 2024

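So my understanding is that when distserve uses the same prefill configuration as vllm, that is, the same tp, pp, and number of GPUs, the difference in rate makes the TTFT latency of distserve somewhat higher.
Then how do the results in your paper show a performance improvement? The configuration is stated clearly in the paper: in the 66B case, vllm uses tp4 and distserve uses 4-1-2-2, so by the reasoning above TTFT latency should not improve?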

PKUFlyingPig avatar PKUFlyingPig commented on August 22, 2024

The metric compared in the paper is goodput: only requests that meet both the TTFT SLO and the TPOT SLO count toward effective throughput.
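
As a rough illustration of that definition, the sketch below counts only requests that satisfy both SLOs. The `RequestStats` type, the `goodput` function, and all numbers are hypothetical; the paper's exact procedure (for example, any SLO attainment target) may differ from this simplification.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    ttft: float  # time to first token, in seconds
    tpot: float  # time per output token, in seconds


def goodput(requests, duration_s, ttft_slo, tpot_slo):
    """Requests per second that meet BOTH the TTFT and the TPOT SLO."""
    satisfied = sum(
        1 for r in requests if r.ttft <= ttft_slo and r.tpot <= tpot_slo
    )
    return satisfied / duration_s


# Hypothetical measurements collected over a 2-second window.
stats = [
    RequestStats(ttft=0.20, tpot=0.05),  # meets both SLOs
    RequestStats(ttft=0.50, tpot=0.05),  # misses the TTFT SLO
    RequestStats(ttft=0.20, tpot=0.20),  # misses the TPOT SLO
    RequestStats(ttft=0.10, tpot=0.08),  # meets both SLOs
]
print(goodput(stats, duration_s=2.0, ttft_slo=0.25, tpot_slo=0.10))
```

A system with a low TTFT but a poor TPOT gets no credit for those requests under this metric, which is why a lower TTFT alone does not imply higher goodput.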

YLSnowy avatar YLSnowy commented on August 22, 2024

Thanks! Much appreciated!
