
DataFileUtil

Utility functions for operating on files and data objects in KBase data stores like Shock and the Workspace.


Details

DataFileUtil is the lowest-level wrapper around the KBase data stores, including the workspace, Shock, and handle services. It therefore handles operations on files, including file transfer and compression, as well as operations on KBase data objects.

When DataFileUtil is called from within a set of local SDK modules calling one another, all file operations occur in the KBase job execution environment.

For any given type of data, developers should use the appropriate existing SDK module for that data type, regardless of whether the data is an object or a file. That module should also contain the logic for deciding how the data is stored, including which KIDL type specification and which data store are used.

Some examples of methods available in DataFileUtil:

  • Download from an external URL
  • File un/compression (gzip, tar, zip)
  • Upload to Shock
  • Download from Shock
  • Download from the staging area to scratch space in the SDK Docker container
  • Upload to / download from the workspace (WS)
  • Copy Shock nodes and gain permissions on them, including files and their metadata

All methods can be browsed in the DataFileUtil KIDL type spec: https://github.com/kbaseapps/DataFileUtil/blob/master/DataFileUtil.spec
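DataFileUtil performs un/compression inside its own container, so callers normally never write this code themselves. As a rough illustration of what unpacking gzip, tar, and zip files involves, here is a minimal local sketch using only the Python standard library. The function name unpack_local is hypothetical, and this is not DataFileUtil's implementation.

```python
import gzip
import os
import shutil
import tarfile
import zipfile
from typing import List


def unpack_local(path: str, dest_dir: str) -> List[str]:
    """Hypothetical helper: extract a tar, zip, or gzip file into dest_dir.

    Returns the paths of the extracted entries.
    """
    os.makedirs(dest_dir, exist_ok=True)
    if tarfile.is_tarfile(path):  # also recognizes compressed tarballs such as .tar.gz
        with tarfile.open(path) as tf:
            if hasattr(tarfile, 'data_filter'):
                # where supported, the 'data' filter blocks entries that would land outside dest_dir
                tf.extractall(dest_dir, filter='data')
            else:
                tf.extractall(dest_dir)
            names = tf.getnames()
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest_dir)
            names = zf.namelist()
    elif path.endswith('.gz'):
        # a single gzipped file: strip the '.gz' suffix for the output name
        names = [os.path.basename(path)[:-len('.gz')]]
        with gzip.open(path, 'rb') as src, \
                open(os.path.join(dest_dir, names[0]), 'wb') as dst:
            shutil.copyfileobj(src, dst)
    else:
        raise ValueError('unrecognized archive format: ' + path)
    return [os.path.join(dest_dir, n) for n in names]
```

Archive type is detected from the file contents for tar and zip, and only a plain gzip file falls back to the '.gz' suffix, since a gzipped tarball is also a valid gzip stream.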

KBase module development design pattern notes

Data that will be used frequently should be stored in a workspace object if it can be retrieved and parsed faster that way than from the entire raw data file. Even better is a standard file format for the data type, which can serve as the de facto storage structure: the file is stored in Shock and referenced from an appropriate workspace object. If the data is too large, you may need to split it into multiple workspace objects.

Note that there may be limits on data size, number of files, and transfer time, as well as requirements for network speed.
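One way to split a large dataset across several workspace objects is to chunk its records by serialized size. The sketch below assumes the data is a list of JSON-serializable records and takes the size limit as a parameter (max_bytes); the function name split_for_workspace is hypothetical, and no particular workspace limit is implied.

```python
import json
from typing import Any, Dict, Iterator, List


def split_for_workspace(records: List[Dict[str, Any]],
                        max_bytes: int) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical helper: yield chunks of records whose compact JSON fits in max_bytes."""
    chunk: List[Dict[str, Any]] = []
    size = 2  # the enclosing '[' and ']'
    for rec in records:
        # compact encoding; +1 counts a ',' separator (a slight overestimate for the first item)
        rec_size = len(json.dumps(rec, separators=(',', ':')).encode('utf-8')) + 1
        if rec_size + 2 > max_bytes:
            raise ValueError('a single record is larger than max_bytes')
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 2
        chunk.append(rec)
        size += rec_size
    if chunk:
        yield chunk
```

Each chunk could then be saved as its own workspace object (for example with DataFileUtil's save_objects method), with a parent object holding references to the pieces.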

Code examples

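Translate a workspace name to an ID:

from installed_clients.DataFileUtilClient import DataFileUtil

# inside an SDK module method; self.callback_url is typically read from SDK_CALLBACK_URL
dfu = DataFileUtil(self.callback_url)
if wsname:
    self.log('Translating workspace name to ID')
    if not isinstance(wsname, str):
        raise ValueError('wsname must be a string')
    wsid = dfu.ws_name_to_id(wsname)  # returns the integer workspace ID
    self.log('Translation done')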

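Download a file from Shock:

import os
from installed_clients.DataFileUtilClient import DataFileUtil

params = {'shock_id': handle['id'],
          'unpack': 'uncompress',  # decompress the downloaded file if it is compressed
          'file_path': os.path.join(self.scratch, handle['id'])}
# download
dfu = DataFileUtil(self.callback_url)
ret = dfu.shock_to_file(params)  # ret['file_path'] is the local path of the downloaded file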

Download files from an external URL:

dfu = DataFileUtil(self.callback_url)
# 'copy_file_path' is the local path of the downloaded copy
fwdpath = dfu.download_web_file(
    {'file_url': fwd, 'download_type': download_type}).get('copy_file_path')
# the second file is optional
revpath = (dfu.download_web_file(
    {'file_url': rev, 'download_type': download_type}).get('copy_file_path')
    if rev else None)
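Because download_web_file needs a running callback server, one way to unit test code like the download snippet above is to replace the client with a unittest.mock.Mock. In this sketch the helper fetch_files, the URL, and the returned path are all hypothetical, and 'Direct Download' is an assumed download_type value; check DataFileUtil.spec for the accepted types.

```python
from typing import Any, Optional, Tuple
from unittest import mock


def fetch_files(dfu: Any, fwd: str, rev: Optional[str],
                download_type: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper wrapping the download pattern shown above."""
    fwdpath = dfu.download_web_file(
        {'file_url': fwd, 'download_type': download_type}).get('copy_file_path')
    revpath = (dfu.download_web_file(
        {'file_url': rev, 'download_type': download_type}).get('copy_file_path')
        if rev else None)
    return fwdpath, revpath


# Stand-in for a DataFileUtil client: no callback server or network is used.
fake_dfu = mock.Mock()
fake_dfu.download_web_file.return_value = {'copy_file_path': '/tmp/example_fwd.fq'}

paths = fetch_files(fake_dfu, 'https://example.org/fwd.fq', None, 'Direct Download')
```

The mock records every call, so a test can check both the returned paths and the exact parameters sent to download_web_file.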

Retrieval of the remote filename:

  • DFU will use the Content-Disposition header from the remote URL to set the filename, if present
  • Without the above header, we will fall back to using the URL path for the filename, including extension (but minus any query parameters or hash value).
  • If the filename is longer than 255 characters, then we will truncate it to that maximum length (without losing the extension).
  • If we are unable to retrieve a filename from a header or the URL, then the file will be assigned a UUID without any extension.
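The filename rules listed above can be sketched in a few lines of standard-library Python. This is a hypothetical re-creation for illustration, not DFU's code; it assumes the Content-Disposition header value (if any) has already been read from the response, and it uses email.message.Message.get_filename to parse the header's filename parameter.

```python
import os
import uuid
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlparse

MAX_FILENAME_LEN = 255


def resolve_filename(url: str, content_disposition: Optional[str] = None) -> str:
    """Hypothetical re-creation of the filename rules listed above."""
    name = ''
    # 1. prefer the filename parameter of a Content-Disposition header;
    #    basename() drops any directory components in the header value
    if content_disposition:
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = os.path.basename(msg.get_filename() or '')
    # 2. otherwise use the last URL path segment; .path excludes '?query' and '#fragment'
    if not name:
        name = os.path.basename(unquote(urlparse(url).path))
    # 3. no usable name: fall back to a bare UUID with no extension
    if not name:
        return str(uuid.uuid4())
    # truncate long names to the limit while keeping the (last) extension
    if len(name) > MAX_FILENAME_LEN:
        root, ext = os.path.splitext(name)
        name = root[:MAX_FILENAME_LEN - len(ext)] + ext
    return name
```

For example, resolve_filename('https://example.org/data/reads.fastq.gz?token=abc') yields 'reads.fastq.gz'. Note that os.path.splitext treats only the last suffix as the extension, so a multi-part extension such as '.fastq.gz' keeps just '.gz' intact when truncation occurs.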

Documentation dependencies

SDK, SDK module, docker container, execution environment, narrative, code cell, shock, shock node, shock service, app chain, workspace, workspace object, workspace service, reference, handle service, scratch space.

Contributors

mrcreosote, tianhao-gu, jayrbolton, jsfillman, sychan, jamesjeffryes, scanon, ugswork, realmarcin, qzzhang, briehl
