clueFS — a tool for tracing I/O activity at the file system level

Overview

cluefs is a lightweight utility to collect data on the I/O events induced by an application when interacting with a file system. It emits detailed, machine-parseable data on every file system-level operation.

The trace information emitted by this utility is meant to be analysed using tools not included in this package. You can find a collection of such tools in a separate project.

Motivation

The main goal of developing this utility is to observe and quantify the file I/O load induced by the software system being developed by the LSST data management team to process the data to be collected by the Large Synoptic Survey Telescope (LSST).

However, cluefs does not depend on the LSST software system and can be used in unrelated contexts. For example, it can help you understand how file systems work, or observe the (usually hidden and unexpected) operations performed when you mount a file system on your computer.

Although there are several tools for tracing system activity such as strace, DTrace, SystemTap or sysdig, for different reasons none of them was considered suitable for our particular use case.

How to use

Let's suppose you want to observe what file operations the command cat $HOME/data/hello.txt induces on the file system where the file hello.txt is actually located. You can use cluefs to expose the contents under the directory $HOME/data (the shadow directory) through a synthesized file system mounted on /tmp/trace. To mount the file system use the command:

$ cluefs --shadow=$HOME/data  --mount=/tmp/trace &

Once the file system is successfully mounted, when an application accesses a file or directory under /tmp/trace, cluefs emits an event for every call to the file system (e.g. access, open, read, close, etc.). For instance, the command:

$ cat /tmp/trace/hello.txt

will make cluefs emit the events below (one event per line):

...
2015-07-10T13:14:13.066799456Z,2015-07-10T13:14:13.066854171Z,54715,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,open,O_RDONLY,0000,14,4096,58
2015-07-10T13:14:13.067274118Z,2015-07-10T13:14:13.067287085Z,12967,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,read,14,0,4096,14,58
2015-07-10T13:14:13.067602625Z,2015-07-10T13:14:13.069215159Z,1612534,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,flush,O_RDONLY,14,58
2015-07-10T13:14:13.069899802Z,2015-07-10T13:14:13.0699212Z,21398,root,0,root,0,,0,/home/fabio/data/hello.txt,file,release,58
...
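
As a rough illustration of how such records could be consumed, the Python sketch below parses the four sample events shown above and sums the elapsed time per operation. The field names in COMMON_FIELDS are assumptions inferred from the sample, not the documented record format; the only relation checked against the sample is that the third field equals the difference between the two timestamps in nanoseconds. Fields after the operation name vary by operation and are kept as an uninterpreted list.

```python
import csv
import io
from collections import defaultdict

# The four sample records shown above, copied verbatim.
SAMPLE = """\
2015-07-10T13:14:13.066799456Z,2015-07-10T13:14:13.066854171Z,54715,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,open,O_RDONLY,0000,14,4096,58
2015-07-10T13:14:13.067274118Z,2015-07-10T13:14:13.067287085Z,12967,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,read,14,0,4096,14,58
2015-07-10T13:14:13.067602625Z,2015-07-10T13:14:13.069215159Z,1612534,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,flush,O_RDONLY,14,58
2015-07-10T13:14:13.069899802Z,2015-07-10T13:14:13.0699212Z,21398,root,0,root,0,,0,/home/fabio/data/hello.txt,file,release,58
"""

# ASSUMPTION: names guessed from the sample for the leading fields that
# every record seems to share; the official format may name them differently.
COMMON_FIELDS = [
    "start", "end", "elapsed_ns", "user", "uid", "group", "gid",
    "process", "pid", "path", "node_type", "operation",
]


def parse_events(text):
    """Turn CSV trace text into a list of dicts, skipping short lines."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < len(COMMON_FIELDS):
            continue  # e.g. blank lines or a "..." placeholder
        event = dict(zip(COMMON_FIELDS, row))
        event["elapsed_ns"] = int(event["elapsed_ns"])
        # Operation-specific fields are left uninterpreted.
        event["extra"] = row[len(COMMON_FIELDS):]
        events.append(event)
    return events


def elapsed_by_operation(events):
    """Sum the elapsed nanoseconds recorded for each operation name."""
    totals = defaultdict(int)
    for event in events:
        totals[event["operation"]] += event["elapsed_ns"]
    return dict(totals)


events = parse_events(SAMPLE)
for operation, nanoseconds in elapsed_by_operation(events).items():
    print(f"{operation:8s} {nanoseconds / 1000:10.1f} us")
```

Reading the elapsed time from the record rather than recomputing it from the timestamps avoids parsing nanosecond-precision timestamps, which Python's datetime type cannot represent exactly.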

To get detailed help on how to use this utility, including examples of usage, do:

$ cluefs

USAGE:
   cluefs --mount=<directory>  --shadow=<directory>  [--out=<file>]
           [(--csv | --json)]  [--ro]
   cluefs --help
   cluefs --version

Use 'cluefs --help' to get detailed information about options and
examples of usage.
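
If you drive cluefs from a script, the synopsis above maps directly onto an argument list. The Python sketch below only assembles that list from the flags shown in the synopsis; the helper name build_cluefs_command and the output path /tmp/trace.csv are hypothetical, and the flags are used by name only, without assuming anything about their behaviour beyond what 'cluefs --help' documents.

```python
import shlex


def build_cluefs_command(shadow, mount, out=None, fmt=None, read_only=False):
    """Assemble a cluefs argument list from the flags in the usage synopsis.

    fmt may be None (no format flag), "csv" or "json".
    """
    if fmt not in (None, "csv", "json"):
        raise ValueError("fmt must be None, 'csv' or 'json'")
    argv = ["cluefs", f"--mount={mount}", f"--shadow={shadow}"]
    if out is not None:
        argv.append(f"--out={out}")
    if fmt is not None:
        argv.append(f"--{fmt}")
    if read_only:
        argv.append("--ro")
    return argv


# Hypothetical paths, for illustration only.
argv = build_cluefs_command(
    "/home/fabio/data", "/tmp/trace",
    out="/tmp/trace.csv", fmt="csv", read_only=True,
)
print(shlex.join(argv))
```

The resulting list can be passed as-is to subprocess.Popen, which avoids shell quoting problems when directory names contain spaces.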

When you are done collecting the trace information you want, you can unmount the file system created by cluefs with the command:

$ sudo umount /tmp/trace

Event formats

cluefs emits event records formatted in CSV or JSON. The format of each record is documented here.

How to install

Operating environment

This utility is tested on Scientific Linux v6 and v7, Ubuntu v14.04, CentOS v7 and MacOS X v10.9. cluefs may also work on other systems, or on other versions of these operating systems, provided its dependencies are satisfied (see below).

Dependencies

To use cluefs you need Filesystem in Userspace (FUSE) installed on your system. To do that, please follow the installation instructions for your operating system in the table below:

To install FUSE on ...      ... follow the instructions below
Ubuntu                      $ sudo apt-get --yes install fuse
Scientific Linux, CentOS    $ sudo yum install --assumeyes fuse
MacOS X                     install the latest stable version of FUSE for OS X

In addition, if you intend to build this software from sources you need both the Go tool chain and a C compiler.

To install the Go tool chain please follow these detailed instructions. To install a C compiler please refer to the table below:

To install a C compiler on ...    ... follow the instructions below
Ubuntu                            $ sudo apt-get --yes install gcc
Scientific Linux, CentOS          $ sudo yum install --assumeyes gcc
MacOS X                           download and install Xcode, including its command line tools

Installation

The recommended way to install this tool is to download one of the ready-to-use binary files available for your target execution platform. These are self-contained executables, so you only need to download and unpack one to start using the tool.

Download binary releases here.

Alternatively, to build from sources do:

go get -u github.com/airnandez/cluefs

How this utility works

cluefs implements a synthesized file system which exposes all the files and directories existing on the underlying shadow file system. It intercepts each system call (e.g. open, read, etc.), emits a trace event about the call and forwards the operation to the appropriate file system for execution. cluefs collects the result of the operation and returns it to the calling application.
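
The intercept, emit and forward cycle described above can be sketched in a few lines of Python. This is not how cluefs is implemented (it is a FUSE file system built with the Go tool chain); the class TracingShadow and its methods are hypothetical names used only to show the pattern: translate the requested path into the shadow directory, time the forwarded call, record an event, and hand the result back to the caller.

```python
import os
import tempfile
import time


class TracingShadow:
    """Forward file operations to a shadow directory, one event per call."""

    def __init__(self, shadow_dir):
        self.shadow_dir = shadow_dir
        self.events = []  # tuples of (operation, shadow path, elapsed ns)

    def _traced(self, operation, path, func, *args):
        """Run func(*args), record how long it took, return its result."""
        start = time.perf_counter_ns()
        try:
            return func(*args)  # forward to the underlying file system
        finally:
            elapsed = time.perf_counter_ns() - start
            self.events.append((operation, path, elapsed))

    def read_file(self, relative_path):
        """Open, read (up to 4096 bytes) and close a file under the shadow."""
        path = os.path.join(self.shadow_dir, relative_path)
        fd = self._traced("open", path, os.open, path, os.O_RDONLY)
        try:
            return self._traced("read", path, os.read, fd, 4096)
        finally:
            self._traced("release", path, os.close, fd)


# Demonstration on a temporary directory standing in for the shadow.
with tempfile.TemporaryDirectory() as shadow:
    with open(os.path.join(shadow, "hello.txt"), "wb") as f:
        f.write(b"Hello, world!\n")
    fs = TracingShadow(shadow)
    data = fs.read_file("hello.txt")
    operations = [operation for operation, _, _ in fs.events]
    print(data, operations)
```

Recording the event in a finally clause means a failed operation is traced as well, which matters when the goal is to see everything an application asks of the file system.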

Although special attention has been given to make this utility as lightweight as possible, it is not intended to be permanently run in heavy-load I/O environments as there is an intrinsic non-zero performance penalty.

Known limitations

Currently, cluefs does not support lock-related file system operations: it does not emit traces for them and makes them appear as unsupported by the file system. These are the operations induced by calling the fcntl(2) system call with F_GETLK, F_SETLK or F_SETLKW as its second argument.
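
To find out whether an application would be affected, you can probe a file for lock support before tracing. The Python sketch below is one way to do it, using fcntl.lockf from the standard library, which wraps the fcntl() locking calls. The function name supports_posix_locks is hypothetical, and because the exact error code returned on a cluefs mount is not stated here, the sketch treats any error other than lock contention as "not supported".

```python
import errno
import fcntl
import os
import tempfile


def supports_posix_locks(path):
    """Return True if an advisory fcntl lock can be requested on path."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Non-blocking exclusive lock request, then release it again.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError as err:
        # Contention means another process holds the lock, so locking
        # itself is supported by the file system.
        if err.errno in (errno.EACCES, errno.EAGAIN):
            return True
        return False
    finally:
        os.close(fd)


# Probe a file on the local temporary directory as a baseline.
with tempfile.NamedTemporaryFile() as tmp:
    local_ok = supports_posix_locks(tmp.name)
    print(local_ok)
```

Running the same probe against a file under the mount point and comparing the two results shows whether the application's locking calls will reach the shadow file system.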

You can contribute

Your contribution is more than welcome. There are several ways you can help:

  • Test this software on your particular environment and let us know how it works. If it does not work for you and you think it should, please provide all the relevant details when opening a new issue.
  • If you find a bug, please report it by opening an issue.
  • If you spot a defect in this documentation or in the source code documentation, we consider it a bug, so please let us know.
  • Provide feedback on how to improve this software by opening an issue.

Roadmap

The items in our to-do list are documented separately.

Disclaimer

Although we have paid a lot of attention to making this utility as reliable as possible, it is still experimental and surely contains undiscovered bugs that may adversely affect your data.

In particular, please note that cluefs does not protect you against any destructive operation you can normally perform on your data. Use it at your own risk.

Credits

Author

This software was developed and is maintained by Fabio Hernandez at IN2P3 / CNRS computing center (Lyon, France).

Acknowledgements

This work is based on other people's work, including:

License

Copyright 2015 Fabio Hernandez

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

cluefs's Issues

Doesn't support Fsync?

When using VIM on CentOS 7.5, whenever I save a file, I see this error:
"/tmp/trace/nanofile.txt"
"/tmp/trace/nanofile.txt" E667: Fsync failed
WARNING: Original file may be lost or damaged
don't quit the editor until the file is successfully written!
Press ENTER or type command to continue

Does cluefs not support fsync?
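
A small probe can help narrow this down without involving an editor. The Python sketch below writes a scratch file in a given directory and reports the error code, if any, that os.fsync returns; the function name fsync_errno and the file name fsync-probe.txt are hypothetical. It says nothing about what cluefs actually does with fsync requests, it only shows what the calling application observes.

```python
import os
import tempfile


def fsync_errno(directory):
    """Write a scratch file in directory and fsync it.

    Returns None if fsync succeeded, otherwise the errno it failed with.
    """
    path = os.path.join(directory, "fsync-probe.txt")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"probe\n")
        try:
            os.fsync(fd)
            return None
        except OSError as err:
            return err.errno
    finally:
        os.close(fd)
        os.unlink(path)


# Baseline on a local temporary directory; pass the mount point
# (for example /tmp/trace) to test the traced file system instead.
with tempfile.TemporaryDirectory() as scratch:
    baseline = fsync_errno(scratch)
    print(baseline)
```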

Rename does not work

Renaming a file leaves the renamed file in a broken state that is only fixed by unmounting and mounting again:

$ echo foo > foo.txt
$ ls -l
-rw-rw-r--. 1 jakob jakob 4  7. Sep 20:58 foo.txt
$ mv foo.txt bar.txt
$ ls -l
----------. 0 root root 0  1. Jan 1970  bar.txt
$ rm bar.txt
rm: cannot remove ‘bar.txt’: Input/output error
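
The steps above can be scripted so that the behaviour is easy to compare between the mount point and an ordinary directory. The Python sketch below repeats them and returns what it observed; the function name check_rename is hypothetical. On a file system where rename works, the size is unchanged after the rename and the file can be removed.

```python
import os
import tempfile


def check_rename(directory):
    """Repeat the reported steps in directory.

    Returns (size before rename, size after rename, whether removal worked).
    """
    old = os.path.join(directory, "foo.txt")
    new = os.path.join(directory, "bar.txt")
    with open(old, "w") as f:
        f.write("foo\n")                 # echo foo > foo.txt
    size_before = os.stat(old).st_size
    os.rename(old, new)                  # mv foo.txt bar.txt
    size_after = os.stat(new).st_size    # the report shows size 0 here
    try:
        os.remove(new)                   # rm bar.txt
        removed = True
    except OSError:
        removed = False
    return size_before, size_after, removed


# Baseline on a local temporary directory; pass a directory under the
# cluefs mount point to check the traced file system instead.
with tempfile.TemporaryDirectory() as scratch:
    result = check_rename(scratch)
    print(result)
```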

Building with recent tools

  1. The uploaded binaries are not useful to me, I need linux-arm64
  2. The build instructions no longer work with current Go

I've tried this

FROM golang:bookworm AS base
RUN go install github.com/airnandez/cluefs@latest

but that fails with

 > [build-install 1/1] RUN go install github.com/airnandez/cluefs@latest:
#9 0.464 go: downloading github.com/airnandez/cluefs v0.0.0-20150923144044-af401baff04b
#9 0.569 go: finding module for package golang.org/x/net/context
#9 0.569 go: finding module for package bazil.org/fuse/fs
#9 0.569 go: finding module for package bazil.org/fuse
#9 0.570 go: finding module for package bazil.org/fuse/syscallx
#9 0.651 go: downloading golang.org/x/net v0.22.0
#9 0.877 go: downloading bazil.org/fuse v0.0.0-20230120002735-62a210ff1fd5
#9 1.202 go: found bazil.org/fuse in bazil.org/fuse v0.0.0-20230120002735-62a210ff1fd5
#9 1.202 go: found bazil.org/fuse/fs in bazil.org/fuse v0.0.0-20230120002735-62a210ff1fd5
#9 1.202 go: found bazil.org/fuse/syscallx in bazil.org/fuse v0.0.0-20230120002735-62a210ff1fd5
#9 1.202 go: found golang.org/x/net/context in golang.org/x/net v0.22.0
#9 1.204 go: downloading golang.org/x/sys v0.18.0
#9 12.21 # github.com/airnandez/cluefs
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/fs.go:48:8: undefined: fuse.VolumeName
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/fs.go:49:8: undefined: fuse.LocalVolume
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/fs.go:66:9: conn.Ready undefined (type *fuse.Conn has no field or method Ready)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/fs.go:67:16: conn.MountError undefined (type *fuse.Conn has no field or method MountError)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/node.go:82:22: req.Valid.Bkuptime undefined (type fuse.SetattrValid has no field or method Bkuptime)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/node.go:84:22: req.Valid.Chgtime undefined (type fuse.SetattrValid has no field or method Chgtime)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/node.go:86:22: req.Valid.Crtime undefined (type fuse.SetattrValid has no field or method Crtime)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/node.go:88:22: req.Valid.Flags undefined (type fuse.SetattrValid has no field or method Flags)
#9 12.21 pkg/mod/github.com/airnandez/cluefs@v0.0.0-20150923144044-af401baff04b/stat_linux.go:46:3: unknown field Crtime in struct literal of type fuse.Attr
