
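darkyzhou commented on May 29, 2024
  1. Could you provide the Submission task descriptions for the three problems?
  2. For the first and second problems: seele can export tracing data via OpenTelemetry. Could you set up Grafana and Grafana Tempo to collect the tracing data seele produces while running judge tasks? That would let us see exactly how each submission (especially TLE submissions) executes and where its time goes.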
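A minimal single-binary Tempo configuration for such a setup might look like the following. It is a sketch, assuming local-disk storage and that traces arrive over OTLP/gRPC on the default port 4317; the storage paths are placeholders. Grafana would then add Tempo as a data source on port 3200.

```yaml
# tempo.yaml: minimal local Tempo sketch (paths are placeholders)
server:
  http_listen_port: 3200          # Tempo query API, used by Grafana's Tempo data source

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317  # listen on all interfaces, default OTLP/gRPC port

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal        # write-ahead log
    local:
      path: /var/tempo/blocks     # completed trace blocks
```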

from seele.

haswelliris commented on May 29, 2024

About tracing and metrics: there is only one config option, collector_url, and after I pointed it at Tempo I could no longer seem to find any metrics.
While collecting traces with Tempo, how can I also get metrics like those shown in https://github.com/darkyzhou/seele/blob/main/docs/public/grafana.png?


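haswelliris commented on May 29, 2024

Here are example submissions for the three problems, submitted as "plain". (Note: the base64 payload in the configs below decodes to a variant of this program that first runs a busy loop, while(i<100000000) { i++; }, before printing.) The original code is: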

#include<iostream>
using namespace std;
int main() {
    cout<<"hello,world2"<<endl;
    return 0;
}

First example

Three subtasks: upload the file, compile, run.

steps:
  prepare:
    action: "seele/add-file@1"
    files:
      - path: "solution.cpp"
        base64: "I2luY2x1ZGU8aW9zdHJlYW0+CnVzaW5nIG5hbWVzcGFjZSBzdGQ7CmludCBtYWluKCkgewogICAgaW50IGk9MDsKICAgIHdoaWxlKGk8MTAwMDAwMDAwKSB7CiAgICAgICAgaSsrOwogICAgfQogICAgY291dDw8ImhlbGxvLHdvcmxkMiI8PGVuZGw7CiAgICByZXR1cm4gMDsKfQ"
 
  compile:
    action: "seele/run-judge/compile@1"
    image: "gcc:11-bullseye"
    command: "g++ solution.cpp -o solution"
    container_uid: 65534
    container_gid: 65534
    sources: ["solution.cpp",]
    saves: ["solution.cpp","solution",]
    paths: []
    fd:
      stdout: "compile_stdout.txt"
      stderr: "compile_stderr.txt"
    report:
      embeds:
        - path: "compile_stdout.txt"
          field: compile_stdout
          truncate_kib: 16384
        - path: "compile_stderr.txt"
          field: compile_stderr
          truncate_kib: 16384
    cache:
      enabled: true
      extra: ["cache1",]

  run:
    action: "seele/run-judge/run@1"
    image: "gcc:11-bullseye"
    command: "./solution"
    container_uid: 65534
    container_gid: 65534
    paths: []
    files: ["solution.cpp","solution",]
    fd:
      stdout: "cis_stdout.txt"
      stderr: "cis_stderr.txt"
    report:
      embeds:
        - path: "cis_stdout.txt"
          field: cis_stdout
          truncate_kib: 4096
        - path: "cis_stderr.txt"
          field: cis_stderr
          truncate_kib: 4096
    limits:
      time_ms: 10000
      memory_kib: 262144
      pids_count: 32
      fsize_kib: 65536

Second and third examples

Four subtasks: upload the file, compile, run "ls -la" on the compile output, then run.

steps:
  prepare:
    action: "seele/add-file@1"
    files:
      - path: "solution.cpp"
        base64: "I2luY2x1ZGU8aW9zdHJlYW0+CnVzaW5nIG5hbWVzcGFjZSBzdGQ7CmludCBtYWluKCkgewogICAgaW50IGk9MDsKICAgIHdoaWxlKGk8MTAwMDAwMDAwKSB7CiAgICAgICAgaSsrOwogICAgfQogICAgY291dDw8ImhlbGxvLHdvcmxkMiI8PGVuZGw7CiAgICByZXR1cm4gMDsKfQ"
 
  compile:
    action: "seele/run-judge/compile@1"
    image: "gcc:11-bullseye"
    command: "g++ solution.cpp -o solution"
    container_uid: 65534
    container_gid: 65534
    sources: ["solution.cpp",]
    saves: ["solution.cpp","solution",]
    paths: []
    fd:
      stdout: "compile_stdout.txt"
      stderr: "compile_stderr.txt"
    report:
      embeds:
        - path: "compile_stdout.txt"
          field: compile_stdout
          truncate_kib: 16384
        - path: "compile_stderr.txt"
          field: compile_stderr
          truncate_kib: 16384
    cache:
      enabled: true
      extra: ["cache1",]

  compile2:
    action: "seele/run-judge/compile@1"
    image: "gcc:11-bullseye"
    command: "ls -la"
    container_uid: 65534
    container_gid: 65534
    sources: ["solution.cpp","solution",]
    saves: ["solution.cpp","solution",]
    paths: []
    fd:
      stdout: "compile_stdout.txt"
      stderr: "compile_stderr.txt"
    report:
      embeds:
        - path: "compile_stdout.txt"
          field: compile_stdout
          truncate_kib: 16384
        - path: "compile_stderr.txt"
          field: compile_stderr
          truncate_kib: 16384
    cache:
      enabled: true
      extra: ["cache2",]

  run:
    action: "seele/run-judge/run@1"
    image: "gcc:11-bullseye"
    command: "./solution"
    container_uid: 65534
    container_gid: 65534
    paths: []
    files: ["solution.cpp","solution",]
    fd:
      stdout: "cis_stdout.txt"
      stderr: "cis_stderr.txt"
    report:
      embeds:
        - path: "cis_stdout.txt"
          field: cis_stdout
          truncate_kib: 4096
        - path: "cis_stderr.txt"
          field: cis_stderr
          truncate_kib: 4096
    limits:
      time_ms: 10000
      memory_kib: 262144
      pids_count: 32
      fsize_kib: 65536


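haswelliris commented on May 29, 2024

Also, judging from the tracing results, the slow submissions (after excluding those whose code itself loops for a long time) spend most of their time at this event:
{
  "value": "Bound the runj container to cpu 338",
  "key": "message"
}
I'm not sure whether this includes waiting time, though. If concurrency is very high and the waiting time is counted here, the long duration would make sense.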


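darkyzhou commented on May 29, 2024
  1. Tempo is for querying tracing data; metrics need a different solution. For example, use opentelemetry-collector to collect seele's metrics and export them to Prometheus, then query them with Grafana. Grafana's own metrics stack also works. See: https://seele.darkyzhou.net/configurations/file#telemetry-%E9%85%8D%E7%BD%AE
  2. The third problem may be related to caching. Try disabling the cache in every step and running it again.
  3. In the trace, "Bound the runj container to cpu 338" occurs at 4m19s, meaning that only at that point did the submission get through the queue to an available CPU and actually start running runj. "Run container completed" occurs at 4m43s, meaning runj took about 24 seconds (4m43s - 4m19s) to finish running the container. So the problem appears to be that runj takes too long to run the container.
  • As I recall, the runc underneath runj can indeed run into performance problems under high concurrency; this may be related to opencontainers/runc#3181.
  • For the Linux kernel, creating the many namespaces that containers need (user namespaces in particular) at high concurrency is a heavy performance burden. You could instead try running 2 or 4 virtual machines on this machine and running seele in each of them separately (for fairness, pin each VM to its own CPUs).
  • I will also look into how runj performs under high concurrency when I have time.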
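For point 1, a sketch of an OpenTelemetry Collector configuration that accepts the single stream sent to Seele's collector_url and splits it: traces go on to Tempo, metrics are exposed for Prometheus to scrape. The hostname `tempo` and the ports are assumptions for a typical Docker Compose setup, and the `prometheus` exporter ships in the contrib distribution (`otelcol-contrib`), not the core one. Whether Seele sends OTLP over gRPC (4317) or HTTP (4318) should be checked against the Seele telemetry docs.

```yaml
# otel-collector-config.yaml: a sketch, not an official Seele config.
# Seele's collector_url would point at this collector instead of at Tempo.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # OTLP/gRPC default port
      http:
        endpoint: 0.0.0.0:4318   # OTLP/HTTP default port

processors:
  batch: {}

exporters:
  # Forward traces to Tempo's OTLP/gRPC receiver (hostname "tempo" is assumed).
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  # Expose metrics on :8889 for Prometheus to scrape (contrib distribution).
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Prometheus would then need a scrape job targeting the collector's port 8889, and Grafana would get two data sources: Tempo for traces and Prometheus for metrics.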
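For point 2, disabling the cache only means flipping the cache block that each compile step in the reproduction configs already has. A sketch for the second/third example, with every field not shown left exactly as it was (keeping `extra` avoids assuming it is optional):

```yaml
steps:
  # prepare: unchanged
  compile:
    action: "seele/run-judge/compile@1"
    # image, command, container_uid/gid, sources, saves, paths, fd, report: unchanged
    cache:
      enabled: false        # was: true
      extra: ["cache1",]
  compile2:
    action: "seele/run-judge/compile@1"
    # image, command, container_uid/gid, sources, saves, paths, fd, report: unchanged
    cache:
      enabled: false        # was: true
      extra: ["cache2",]
  # run: unchanged (the run steps in these configs have no cache block)
```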

You could also try seele 0.3.0, in which runj has been upgraded to the latest 1.1.12.


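haswelliris commented on May 29, 2024

Thanks a lot for the suggestions.
I'm trying k8s: first start several Kata containers, then run seele inside them. However, Kata (or VMs) requires virtualization to be enabled on the host, and scaling out elastically would mean enabling virtualization on every other machine too, which doesn't suit our current architecture well. Yet apart from VMs and Kata Containers, I can't think of a better way to isolate the kernel.
To take the discussion further: if we use Kata, could the underlying runj be replaced with Kata? Compared with the problems caused by setting up namespaces, Kata only has a fixed, seconds-level overhead at startup, and security is higher the rest of the time (an attack would have to break through both the low-privilege restrictions and the virtualization boundary). Starting a judge worker would then become a k8s API call that launches a Kata container, so seele itself would need fewer privileges. The catch is that the whole k8s cluster must support Kata: in the cloud we would most likely run into nested virtualization, and on bare metal we would still face the question of whether to enable virtualization.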


darkyzhou commented on May 29, 2024

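In fact, Seele can already create containers with runj on Kubernetes. You only need to translate the docker parameters mentioned in the documentation into the corresponding Pod settings, and configure the CPU and memory allocation of the Pod that Seele runs in.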
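A rough sketch of such a Pod, assuming the docs' docker invocation needs a privileged container; each flag and mount the docs actually list should be translated individually. The Pod name, image, and the 8 CPU / 16Gi sizes are placeholders. Equal integer CPU requests and limits put the Pod in the Guaranteed QoS class, and on nodes whose kubelet uses the static CPU Manager policy it also gets exclusive cores, similar in spirit to pinning each VM's CPUs.

```yaml
# seele-pod.yaml: hypothetical sketch, names, image and sizes are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: seele-judge
spec:
  containers:
    - name: seele
      image: "<seele-image>"        # placeholder: use the image from the Seele docs
      securityContext:
        privileged: true            # assumption: stands in for the docs' docker flags
      resources:
        # Equal integer CPU requests/limits give the Guaranteed QoS class;
        # with the kubelet's static CPU Manager policy this also means exclusive cores.
        requests:
          cpu: "8"
          memory: 16Gi
        limits:
          cpu: "8"
          memory: 16Gi
      # volumeMounts for Seele's config file and any host paths from the docs go here
```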

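Besides the cloud-native route, there is also a more traditional approach: manage multiple virtual machines with a virtualization platform and use Ansible to run Seele on each of them.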
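For the VM route, a minimal Ansible playbook sketch, assuming Docker and the community.docker collection are installed on each VM. The inventory group `seele_vms`, the image name, and the config mount are placeholders, and the container options should be copied from the Seele docs rather than from here.

```yaml
# deploy-seele.yml: hypothetical playbook, one Seele container per VM
- name: Run Seele on every judge VM
  hosts: seele_vms                    # hypothetical inventory group
  become: true
  tasks:
    - name: Start the Seele container
      community.docker.docker_container:
        name: seele
        image: "<seele-image>"        # placeholder: image from the Seele docs
        state: started
        restart_policy: unless-stopped
        privileged: true              # assumption: mirror the docs' docker flags
        volumes:
          - /etc/seele:/etc/seele:ro  # placeholder config mount
```

It would be run with something like `ansible-playbook -i inventory.ini deploy-seele.yml`; CPU pinning for fairness would still be configured on the virtualization platform itself.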
