wyntonquery's People

Contributors

henrikbengtsson

wyntonquery's Issues

NOTES: Parsing SGE accounting file

Some benchmarking of parsing the complete accounting file (5,846,603 lines, of which the first 4 are skipped, leaving 5,846,599 records; 1.9 GB => ~334 bytes/record):

$ ll .local/accounting
-rw-r--r--. 1 hb cbi 1953419761 Nov 20 15:22 .local/accounting
$ wc -l .local/accounting
5846603 .local/accounting

Read raw file

> library(wyntonquery)
> file.size(".local/accounting")
[1] 1953419761
> dt <- system.time(raw <- read_raw_sge_accounting(".local/accounting", skip = 4L))
|=================================================================| 100% 1862 MB
> dt
   user  system elapsed
 58.798   6.859  66.909
> file.size(".local/accounting") / nrow(raw)
[1] 334.1121

Save as RDS file (binary R data file):

> dts <- system.time(saveRDS(raw, ".local/accounting.rds"))
> dts
   user  system elapsed
101.262   0.382 102.990

Read from RDS file:

> file.size(".local/accounting.rds")  ## 303.8 MB => ~52 bytes/record
[1] 303811781
> dtr <- system.time(raw2 <- readRDS(".local/accounting.rds"))
> dtr
   user  system elapsed 
 34.704   1.477  36.201 
> file.size(".local/accounting.rds") / nrow(raw2)
[1] 51.96385
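
The timings above can be condensed into throughput figures. The following is a minimal sketch that only redoes the arithmetic on the numbers reported in the session; the record count of 5,846,603 - 4 is an assumption (line count minus the skipped lines) that happens to reproduce the reported bytes/record values, and all variable names are made up for illustration.

```r
## Figures copied from the benchmark session above
n_records  <- 5846603 - 4  # assumption: 'wc -l' count minus the 4 skipped lines
raw_bytes  <- 1953419761   # size of the plain-text accounting file
rds_bytes  <- 303811781    # size of the RDS file
t_read_raw <- 66.909       # elapsed seconds for read_raw_sge_accounting()
t_save_rds <- 102.990      # elapsed seconds for saveRDS()
t_read_rds <- 36.201       # elapsed seconds for readRDS()

## Records processed per second of elapsed time, per step
timings <- data.frame(
  step          = c("read raw", "save RDS", "read RDS"),
  elapsed_s     = c(t_read_raw, t_save_rds, t_read_rds),
  records_per_s = n_records / c(t_read_raw, t_save_rds, t_read_rds)
)
print(timings)

## How much smaller the RDS file is, and how much faster it reads
compression_ratio <- raw_bytes / rds_bytes
read_speedup      <- t_read_raw / t_read_rds
cat(sprintf("RDS is %.1fx smaller and reads %.1fx faster\n",
            compression_ratio, read_speedup))
```

With these figures, the RDS file comes out roughly 6.4 times smaller than the text file, and reading it back is roughly 1.8 times faster than parsing the text file, at the one-off cost of the saveRDS() call.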

FACT: readr::read_tsv() is faster than vroom::vroom() with VROOM_THREADS=1

> Sys.setenv(VROOM_THREADS=1)
> library(wyntonquery); library(readr); library(vroom); source("R/sge_accounting.R")                                                     
> stats <- bench::mark(read_raw_sge_accounting(file), read_raw_sge_accounting_vroom(file), max_iterations=1L, check=FALSE)
> stats
# A tibble: 2 x 13
  expression                            min median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>                          <bch> <bch:>     <dbl> <bch:byt>    <dbl>
1 read_raw_sge_accounting(file)       2.62m  2.62m   0.00637    10.3GB  0.0127 
2 read_raw_sge_accounting_vroom(file) 3.45m  3.45m   0.00483     5.5GB  0.00483
# … with 7 more variables: n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
#   result <list>, memory <list>, time <list>, gc <list>

Gather GPU info

Background

Information on GPUs can be obtained from qconf, e.g.

$ qconf -se msg-iogpu3
hostname              msg-iogpu3
load_scaling          NONE
complex_values        mem_free=128000M
load_values           arch=lx-amd64,num_proc=32,mem_total=128739.226562M, \
[...]
                      np_load_medium=0.156875,np_load_long=0.159688, \
                      gpu.ncuda=2,gpu.ndev=2,gpu.cuda.0.mem_free=758054912, \
                      gpu.cuda.0.procs=1,gpu.cuda.0.clock=2025, \
                      gpu.cuda.0.util=57,gpu.cuda.1.mem_free=758054912, \
                      gpu.cuda.1.procs=1,gpu.cuda.1.clock=2025, \
                      gpu.cuda.1.util=54,gpu.names=GeForce GTX 1080;GeForce \
                      GTX 1080;
processors            32
[...]

Issue

The qconf command works only on the login nodes (as documented at https://ucsf-hpc.github.io/wynton/scheduler/gpu.html). This prevents us from calling qconf from R, and therefore from the wyntonquery package, on any other node, which makes it much more tedious to automate the gathering of GPU info.

EDIT 2019-04-12: Just checked, qconf -se msg-iogpu3 now works on development nodes.
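
Where qconf is callable, the gathering could be automated along the following lines. This is a minimal sketch, not code from wyntonquery: parse_qconf_gpu() is a hypothetical helper, and it assumes that wrapped lines end in a backslash, that a wrap after a comma adds no characters, and that any other wrap stands for a single space (as in the gpu.names value above). It also assumes that no value contains a comma.

```r
## Hypothetical helper: extract the gpu.* entries from 'qconf -se <host>' output
parse_qconf_gpu <- function(lines) {
  txt <- paste(lines, collapse = "\n")
  ## Undo line wrapping: a wrap after a comma joins directly ...
  txt <- gsub(",[[:space:]]*\\\\\n[[:space:]]*", ",", txt)
  ## ... any other wrap is assumed to replace a single space
  txt <- gsub("[[:space:]]*\\\\\n[[:space:]]*", " ", txt)
  recs <- strsplit(txt, "\n", fixed = TRUE)[[1]]

  ## Keep only the 'load_values' record and drop its label
  lv <- grep("^load_values[[:space:]]", recs, value = TRUE)
  if (length(lv) != 1L) stop("Expected exactly one 'load_values' record")
  lv <- sub("^load_values[[:space:]]+", "", lv)

  ## Split into key=value fields and keep the GPU-related ones
  fields <- trimws(strsplit(lv, ",", fixed = TRUE)[[1]])
  fields <- grep("^gpu[.]", fields, value = TRUE)
  values <- sub("^[^=]*=", "", fields)
  names(values) <- sub("=.*$", "", fields)
  values
}

## Shortened sample modelled on the output shown above
sample_lines <- c(
  "hostname              msg-iogpu3",
  "load_values           arch=lx-amd64,num_proc=32, \\",
  "                      gpu.ncuda=2,gpu.ndev=2,gpu.cuda.0.util=57, \\",
  "                      gpu.names=GeForce GTX 1080;GeForce \\",
  "                      GTX 1080;",
  "processors            32"
)
gpu <- parse_qconf_gpu(sample_lines)
gpu_names <- strsplit(gpu[["gpu.names"]], ";", fixed = TRUE)[[1]]
print(gpu)
print(gpu_names)
```

On a node where qconf works, the input could be obtained with something like system2("qconf", c("-se", "msg-iogpu3"), stdout = TRUE), which returns the command output as a character vector of lines.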

Add make_file_index(..., index = <existing index>)

Instead of having to re-index a file from scratch every time it's updated:

index <- make_file_index(pathname, skip = 4L)

it would be useful if one could start indexing based on what we already know, e.g.

old_index <- read_file_index("accounting.index_by_row")
index <- make_file_index(pathname, skip = 4L, index = old_index)
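
One way such resumable indexing could work is sketched below. This is not the make_file_index() implementation: extend_line_index() and its chunk_size argument are hypothetical, the index is assumed to be a numeric vector holding the 0-based byte offset at which each line starts, and the skip argument is left out for brevity. The idea is to seek to the start of the last indexed line and scan only the bytes from there on.

```r
## Hypothetical sketch: 'index' holds the 0-based byte offset of each line start
extend_line_index <- function(pathname, index = numeric(0), chunk_size = 1e6) {
  size <- file.size(pathname)
  ## Resume at the start of the last indexed line (it may have been
  ## incomplete when last indexed); start at byte 0 if there is no index
  start <- if (length(index) > 0L) index[length(index)] else 0
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  seek(con, where = start, origin = "start")

  offsets <- list()
  pos <- start
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_size)
    if (length(bytes) == 0L) break
    ## A new line starts on the byte right after each newline
    nl <- which(bytes == as.raw(0x0a))
    offsets[[length(offsets) + 1L]] <- pos + nl
    pos <- pos + length(bytes)
  }
  new_starts <- unlist(offsets)
  ## A newline at the very end of the file does not start another line
  new_starts <- new_starts[new_starts < size]

  if (length(index) == 0L && size > 0) index <- 0
  c(index, new_starts)
}

## Example: index a small file, append to it, then extend the old index
pathname <- tempfile()
writeBin(charToRaw("a\nbb\nccc\n"), pathname)
old_index <- extend_line_index(pathname)

con <- file(pathname, open = "ab")
writeBin(charToRaw("dddd\n"), con)
close(con)
new_index <- extend_line_index(pathname, index = old_index)
full_index <- extend_line_index(pathname)
print(new_index)
```

Because the scan restarts at the last known line start rather than at the old end of file, a final line that was still being written during the previous indexing run is picked up correctly.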

NOTES: Wynton SGE load usage over time

(pasting some old notes of mine here, so they don't get lost)

library(ggplot2)
library(dplyr)
library(tidyr)

## Read SGE job data
jobs <- rrd::read_rrd("data/jobs.rrd")
slots <- rrd::read_rrd("data/slots.rrd")

set <- "AVERAGE86400"
jobs <- jobs[[set]]
slots <- slots[[set]]

jobs <- left_join(slots, jobs, by = "timestamp")
jobs <- rename(jobs, date = timestamp)
jobs <- select(jobs, -total)
print(jobs)

## Express running and queued counts as fractions of the available slots
load <- mutate(jobs, running = running / avail, queued = queued / avail)
load <- select(load, date, running, queued)

## Wide-to-tall
jobs <- gather(jobs, key = "status", value = "count", -date, factor_key = TRUE)
load <- gather(load, key = "status", value = "count", -date, factor_key = TRUE)


## Plot available, running, and queued slots
gg <- ggplot(jobs, aes(x = date, y=count, color = status))
gg <- gg + geom_line(size = 1)
gg <- gg + xlab("") + ylab("Number of jobs")
#gg <- gg + stat_smooth(method = "loess", formula = y ~ x, adjust=0.1, se = FALSE, size = 2)
gg <- gg + theme(text = element_text(size = 20))
ggsave("WyntonHPC_20210505-jobs-last-year.png", gg, width = 10, height = 6)


## Plot load
gg <- ggplot(load, aes(x = date, y=count, color = status))
gg <- gg + geom_line(size = 1)
gg <- gg + xlab("") + ylab("Load (fraction of slots occupied)")
gg <- gg + scale_y_continuous(labels = scales::percent)
gg <- gg + theme(text = element_text(size = 20))
ggsave("WyntonHPC_20210505-load-last-year.png", gg, width = 10, height = 6)
