andrie / rrd

R package for working with .rrd files

Home Page: https://andrie.github.io/rrd/

License: Other

R 53.35% C 46.47% Shell 0.18%

r r-package rstats rrd rrdtool

rrd's Introduction

rrd

The rrd package allows you to read data from an RRD (Round Robin Database) file.

Installation

System requirements

In order to build the package from source you need librrd. Installing RRDtool from your package manager will usually also install the library.

Platform	Installation
Debian / Ubuntu	`apt-get install librrd-dev`
RHEL / CentOS	`yum install rrdtool-devel`
Fedora	`yum install rrdtool-devel`
Solaris / CSW	Install `rrdtool`
OSX	`brew install rrdtool`
Windows	Not available

Note: on OSX you may have to update the Xcode command line tools, using xcode-select --install.

Package installation

You can install the stable version of the package from CRAN:

install.packages("rrd")

And the development version from GitHub:

# install.packages("remotes")
remotes::install_github("andrie/rrd")

About RRD and RRDtool

The rrd package is a wrapper around RRDtool. Internally it uses librrd to import the binary data directly into R without exporting it to an intermediate format first.

For an introduction to RRD databases, see https://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html

Example

The package contains some example RRD files that originated in an instance of RStudio Connect. In this example, you analyze CPU data in the file cpu-0.rrd.

Load the package and assign the location of the cpu-0.rrd file to a variable:

library(rrd)
rrd_cpu_0 <- system.file("extdata/cpu-0.rrd", package = "rrd")

To describe the contents of an RRD file, use describe_rrd():

describe_rrd(rrd_cpu_0)
#> An RRD file with 10 RRA arrays and step size 60
#> [1] AVERAGE_60 (43200 rows)
#> [2] AVERAGE_300 (25920 rows)
#> [3] MIN_300 (25920 rows)
#> [4] MAX_300 (25920 rows)
#> [5] AVERAGE_3600 (8760 rows)
#> [6] MIN_3600 (8760 rows)
#> [7] MAX_3600 (8760 rows)
#> [8] AVERAGE_86400 (1825 rows)
#> [9] MIN_86400 (1825 rows)
#> [10] MAX_86400 (1825 rows)
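The row counts become easier to interpret once they are converted to a time span. The following is a minimal sketch in base R, assuming (as the names suggest, but the output does not state) that the number in each RRA name is that archive's step in seconds. The row counts are copied by hand from the describe_rrd() output above, and the data frame rra is only an illustration, not something the package returns.

```r
# Row counts and step sizes copied from the describe_rrd() output above.
# Assumption: the suffix of each RRA name is its step size in seconds.
rra <- data.frame(
  name = c("AVERAGE_60", "AVERAGE_300", "AVERAGE_3600", "AVERAGE_86400"),
  step = c(60, 300, 3600, 86400),     # seconds per row
  rows = c(43200, 25920, 8760, 1825)  # rows kept in the archive
)

# Time span covered by each archive, in days (86400 seconds per day)
rra$days <- rra$rows * rra$step / 86400

rra
```

Under that assumption the four AVERAGE archives cover 30, 90, 365 and 1825 days respectively, so the coarser archives reach further back in time.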

To read an entire RRD file, i.e. all of the RRA archives, use read_rrd(). This returns a list of tibble objects:

cpu <- read_rrd(rrd_cpu_0)

str(cpu, max.level = 1)
#> List of 10
#>  $ AVERAGE60   : tibble [43,199 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ AVERAGE300  : tibble [25,919 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MIN300      : tibble [25,919 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MAX300      : tibble [25,919 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ AVERAGE3600 : tibble [8,759 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MIN3600     : tibble [8,759 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MAX3600     : tibble [8,759 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ AVERAGE86400: tibble [1,824 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MIN86400    : tibble [1,824 × 9] (S3: tbl_df/tbl/data.frame)
#>  $ MAX86400    : tibble [1,824 × 9] (S3: tbl_df/tbl/data.frame)

Since the resulting object is a list of tibbles, you can easily work with individual data frames:

names(cpu)
#>  [1] "AVERAGE60"    "AVERAGE300"   "MIN300"       "MAX300"       "AVERAGE3600" 
#>  [6] "MIN3600"      "MAX3600"      "AVERAGE86400" "MIN86400"     "MAX86400"

cpu[[1]]
#> # A tibble: 43,199 × 9
#>    timestamp              user     sys  nice  idle  wait   irq softirq   stolen
#>    <dttm>                <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
#>  1 2018-04-02 12:24:00 0.0104  0.00811     0 0.981     0     0       0 0.000137
#>  2 2018-04-02 12:25:00 0.0126  0.00630     0 0.979     0     0       0 0.00192 
#>  3 2018-04-02 12:26:00 0.0159  0.00808     0 0.976     0     0       0 0       
#>  4 2018-04-02 12:27:00 0.00853 0.00647     0 0.985     0     0       0 0       
#>  5 2018-04-02 12:28:00 0.0122  0.00999     0 0.978     0     0       0 0       
#>  6 2018-04-02 12:29:00 0.0106  0.00604     0 0.983     0     0       0 0       
#>  7 2018-04-02 12:30:00 0.0147  0.00427     0 0.981     0     0       0 0.000137
#>  8 2018-04-02 12:31:00 0.0193  0.00767     0 0.971     0     0       0 0.00191 
#>  9 2018-04-02 12:32:00 0.0300  0.0274      0 0.943     0     0       0 0       
#> 10 2018-04-02 12:33:00 0.0162  0.00617     0 0.978     0     0       0 0.000137
#> # … with 43,189 more rows
#> # ℹ Use `print(n = ...)` to see more rows

tail(cpu$AVERAGE60$sys)
#> [1] 0.0014390667 0.0020080000 0.0005689333 0.0000000000 0.0014390667
#> [6] 0.0005689333

To read a single RRA archive from an RRD file, use read_rra(). To use this function, you must specify several arguments that define the data to retrieve: the consolidation function (e.g. “AVERAGE”), the time step (e.g. 60), and the end time. You must also specify either the start time or the number of steps, n_steps.

In this example, you extract the average for 1 minute periods (step = 60), for one entire day (n_steps = 24 * 60):

end_time <- as.POSIXct("2018-05-02") # timestamp with data in example
avg_60 <- read_rra(rrd_cpu_0, cf = "AVERAGE", step = 60, n_steps = 24 * 60,
                     end = end_time)

avg_60
#> # A tibble: 1,440 × 9
#>    timestamp              user     sys  nice  idle    wait   irq softirq  stolen
#>    <dttm>                <dbl>   <dbl> <dbl> <dbl>   <dbl> <dbl>   <dbl>   <dbl>
#>  1 2018-05-01 00:01:00 0.00458 2.01e-3     0 0.992 0           0       0 1.44e-3
#>  2 2018-05-01 00:02:00 0.00258 5.70e-4     0 0.996 0           0       0 5.70e-4
#>  3 2018-05-01 00:03:00 0.00633 1.44e-3     0 0.992 0           0       0 0      
#>  4 2018-05-01 00:04:00 0.00515 2.01e-3     0 0.991 0           0       0 1.44e-3
#>  5 2018-05-01 00:05:00 0.00402 5.69e-4     0 0.995 0           0       0 5.69e-4
#>  6 2018-05-01 00:06:00 0.00689 1.44e-3     0 0.992 0           0       0 0      
#>  7 2018-05-01 00:07:00 0.00371 2.01e-3     0 0.993 1.44e-3     0       0 0      
#>  8 2018-05-01 00:08:00 0.00488 2.01e-3     0 0.993 5.69e-4     0       0 0      
#>  9 2018-05-01 00:09:00 0.00748 5.68e-4     0 0.992 0           0       0 0      
#> 10 2018-05-01 00:10:00 0.00516 0           0 0.995 0           0       0 0      
#> # … with 1,430 more rows
#> # ℹ Use `print(n = ...)` to see more rows
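The same window can also be described by its start time instead of n_steps. The sketch below assumes that read_rra() accepts the start time in an argument named start; check ?read_rra before relying on that name. The names start_time and avg_60_by_start are made up for this illustration, and the call is only attempted if the rrd package is installed.

```r
# End of the window: the same timestamp as in the example above
end_time <- as.POSIXct("2018-05-02")

# Start of the window: one day (24 * 60 * 60 seconds) before the end
start_time <- end_time - 24 * 60 * 60

# Assumption: read_rra() takes the start time in an argument named `start`
if (requireNamespace("rrd", quietly = TRUE)) {
  rrd_cpu_0 <- system.file("extdata/cpu-0.rrd", package = "rrd")
  avg_60_by_start <- rrd::read_rra(rrd_cpu_0, cf = "AVERAGE", step = 60,
                                   start = start_time, end = end_time)
  # Number of rows returned for the one-day window
  print(nrow(avg_60_by_start))
}
```

Specifying start is convenient when you already know the period of interest; n_steps is convenient when you just want the most recent observations before end.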

And you can easily plot using your favourite packages:

library(ggplot2)
ggplot(avg_60, aes(x = timestamp, y = user)) + 
  geom_line() +
  stat_smooth(method = "loess", span = 0.125, se = FALSE) +
  ggtitle("CPU0 usage, data read from RRD file")
#> `geom_smooth()` using formula 'y ~ x'

More information

For more information on rrdtool and the rrd format please refer to the official rrdtool documentation and tutorials.

You can also read a more in-depth description of the package in the R Views blog post "Reading and analyzing log files in the RRD database format".

rrd's Issues

Patch for corrupt data and segfaults

Patch provided by a user:

I ran into a problem which, to the extent I could determine, was due to malloc returning non-zeroed memory, leading to corruption of the rra_info structures, and ultimately either nonsense data or a segv in R. This seemed to happen only with some large rrd files (which I can't share), and not with the examples included in the package.

Ultimately, I'm not sure why this is the case, but it was observed with both MRO R 3.5.1 (Linux, librrd 1.7.2) and Core R 3.6.0 (Mac, librrd 1.7.0). It may well be something to do with the underlying rrd package, but the command line tools had no issue handling these files, so it seems something particular to the R interface. As to whether my patch ultimately "fixes" the problem, or just pushes it further into the weeds, I do not know, but it has been working reliably so far in my testing.

Valgrind still complains about a leak, but I have not attempted to chase that down. It seems inconsequential in the use cases I have need of, which are either notebooks, or standalone scripts, to process a single file. Snippet of valgrind output below, if you are curious.

There is one minor change in addition to the allocator and its use, which I don't think should be necessary (my C is rusty, and things have changed over the decades :-), but it made valgrind a bit happier, so I left that in.

Apologies for the manual patch, but I'm not on github, and you wouldn't be able to reach into our site to do a pull, anyhow. Hopefully it is useful. I made it against the latest master, as cloned from github.

commit 0aafa105c714a453c5e3a73b918bd7fa3196a78d
Author: Steven Rezsutek <[email protected]>
Date:   Wed Jul 3 11:16:13 2019 -0700

    add allocator for rra_info to avoid apparent non-zeroed memory being returned by malloc; minor tweak to appease valgrind (uninitialized variable)

diff --git a/DESCRIPTION b/DESCRIPTION
index 4cc4914..50db44d 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: rrd
 Type: Package
 Title: Import Data from a RRD (Round Robin Database) File
-Version: 0.2.1.9000
-Date: 2018-07-06
+Version: 0.2.1.9000-rvbd
+Date: 2019-07-03
 Authors@R: c(person("Andrie", "de Vries", email = "[email protected]", 
     role = c("cre", "cph")), person("Plamen", "Dimitrov", 
     email = "[email protected]", role = c("aut", "cph")))
diff --git a/src/rrd.c b/src/rrd.c
index 8294518..628f1a3 100644
--- a/src/rrd.c
+++ b/src/rrd.c
@@ -15,6 +15,23 @@ typedef struct _rra_info {
 } rra_info; 
 
 
+/* SMR: guard against uninitialized memory */
+rra_info* alloc_rra_info(void) {
+  rra_info* new_rra_info;
+  
+  new_rra_info = malloc(sizeof(rra_info));
+  
+  if (new_rra_info != NULL) {
+    new_rra_info->next = NULL;
+    new_rra_info->cf[0] = '\0';
+    new_rra_info->rows = 0L;
+    new_rra_info->pdp_per_row = 0L;
+    
+  }
+  return(new_rra_info);
+}
+
+
 void free_rra_info(rra_info* rraInfoOut) {
   while (rraInfoOut) {
     rra_info* tmp = rraInfoOut;
@@ -29,10 +46,11 @@ void print_rra_info(rra_info* rraInfoIn, int count, long unsigned int step) {
   long unsigned int pdp_per_row;
   int i;
   long unsigned int rra_step;
+  rra_info* rra_info_tmp;
   
-  Rprintf("A RRD file with %d RRA arrays and step size %ld\n", count, step);
+  Rprintf("An RRD file with %d RRA arrays and step size %ld\n", count, step);
   
-  rra_info* rra_info_tmp = rraInfoIn;
+  rra_info_tmp = rraInfoIn;
   i = 1;
   while (rra_info_tmp) {
     pdp_per_row = rra_info_tmp->pdp_per_row;
@@ -73,7 +91,7 @@ rra_info* get_rra_info(rrd_info_t* rrdInfoIn, int *rraCntOut, unsigned long *ste
   rra_info* rraInfoOut;
   rra_info* rra_info_tmp;
   
-  rraInfoOut = malloc(sizeof(rra_info)); 
+  rraInfoOut = alloc_rra_info(); 
   if (rraInfoOut == NULL) {
     Rprintf("error allocating memory\n");
     return NULL;
@@ -91,7 +109,7 @@ rra_info* get_rra_info(rrd_info_t* rrdInfoIn, int *rraCntOut, unsigned long *ste
     
     if (!strcmp(rrdInfoIn->key, cfKey)){
       if (rraCnt > 0) {
-        rra_info_tmp->next = malloc(sizeof(rra_info));
+        rra_info_tmp->next = alloc_rra_info();
         if (rra_info_tmp->next == NULL) {
           free_rra_info(rraInfoOut);
           return NULL;

Add pkgdown documentation

Support for MacOS?

I was playing around with Smokeping, which continuously monitors my home internet speed by sending pings from a Raspberry Pi. It stores its data in an RRD database, a format I’m not familiar with. I would like to use your package to import these data into R. However, I was not able to install rrd. I get the following error:

* installing *source* package ‘rrd’ ...
** package ‘rrd’ successfully unpacked and MD5 sums checked
** libs
clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c registerDynamicSymbol.c -o registerDynamicSymbol.o
clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c rrd.c -o rrd.o
rrd.c:1:10: fatal error: 'rrd.h' file not found
#include <rrd.h>
        ^~~~~~~
1 error generated.
make: *** [rrd.o] Error 1

It’s not stated in the vignette if I need other stuff installed for rrd to work. Do I need to install rrdlib and other stuff? How do I get them for MacOS?

* installing *source* package ‘rrd’ ...
** package ‘rrd’ successfully unpacked and MD5 sums checked
** using staged installation
Package librrd was not found in the pkg-config search path.
Perhaps you should add the directory containing `librrd.pc'
to the PKG_CONFIG_PATH environment variable
No package 'librrd' found

To fix this, @jeroen recommends:

If you want to fix that, the official way is to open an issue at https://github.com/r-macos/recipes
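Both error logs above point at a missing librrd development install. Before retrying the installation, you can check from R whether pkg-config can locate the library. This is a minimal sketch, assuming pkg-config is the lookup mechanism used by the build (as the second log suggests) and that the module is registered under the name librrd, the name shown in that error message. has_librrd() is a hypothetical helper, not a function of the rrd package.

```r
# Hypothetical helper: TRUE if pkg-config can locate librrd, FALSE otherwise
has_librrd <- function() {
  # pkg-config itself may be missing, e.g. on a fresh macOS install
  if (!nzchar(Sys.which("pkg-config"))) return(FALSE)
  # `pkg-config --exists librrd` exits with status 0 when the module is found
  status <- system2("pkg-config", c("--exists", "librrd"),
                    stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}

if (has_librrd()) {
  message("librrd found by pkg-config")
} else {
  message("librrd not found; install it first (e.g. `brew install rrdtool` on OSX)")
}
```

If the check returns FALSE even though RRDtool is installed, the directory containing librrd.pc may need to be added to PKG_CONFIG_PATH, as the error message suggests.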

Add support for Windows

@jeroen, as discussed at eRum 2018, I would appreciate your help to create the Windows DLLs so we can build this package for Windows.