paws_gasp_2019's Introduction

Using Amazon Web Services with R

This presentation shows how you can use Amazon Web Services (AWS) in R with the Paws package. The Paws package provides access to 150+ services on AWS.

One use case is to run large, complex analyses on dedicated servers. The example code here runs an R script on a large server which starts on command and stops when done, using AWS Batch.

The example is based on the Creating a Simple "Fetch & Run" AWS Batch Job blog post written by Amazon.

This presentation was prepared for the Government Advances in Statistical Programming (GASP!) conference, held on September 23, 2019 in Washington, DC.

Summary

  1. Install Paws
  2. Make a Docker container (optional)
  3. Set up AWS Batch
  4. Run an R job on Batch

1. Install Paws

Run install.packages("paws") to install from CRAN.

If you are using Linux, you'll need to install development packages for cURL, OpenSSL, and libxml2. In Debian/Ubuntu, install libcurl4-openssl-dev, libssl-dev, and libxml2-dev.

The example also assumes that you have AWS credentials saved in OS environment variables or in a shared credentials file. See this document for more info on authenticating with AWS.
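Before running the example, it can help to check which of those two credential sources is present. The following is a minimal sketch in base R, not part of this repo: the function name `find_aws_credentials` is hypothetical, and the default file path assumes the conventional `~/.aws/credentials` location. It does not cover other credential sources, and it does not verify that the credentials are valid.

```r
# Hypothetical helper: report where AWS credentials appear to be available.
# Checks only the two sources mentioned above (environment variables and a
# shared credentials file); it does not validate the credentials themselves.
find_aws_credentials <- function(
  credentials_file = file.path(path.expand("~"), ".aws", "credentials")
) {
  has_env <- nzchar(Sys.getenv("AWS_ACCESS_KEY_ID")) &&
    nzchar(Sys.getenv("AWS_SECRET_ACCESS_KEY"))
  if (has_env) {
    return("environment")
  }
  if (file.exists(credentials_file)) {
    return("shared credentials file")
  }
  "none"
}

message("AWS credentials found in: ", find_aws_credentials())
```

If the result is "none", set the environment variables or create the credentials file before continuing.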

2. Make a Docker container

Your batch job runs in a Docker container, which is a self-contained environment with an OS and other software, such as R. The example in this repo uses a pre-built Docker container with R installed, which is hosted on Docker Hub.

You can make your own Docker container using the Dockerfile in the docker folder. The Creating a Simple "Fetch & Run" AWS Batch Job blog post shows how to do that.

3. Set up AWS Batch

To use Batch, you must set up three things: a compute environment (which sets, for example, the maximum number of CPUs), a job queue, and a job definition (which sets, for example, which container to use). See the Batch user guide for more info about what each of these is for.

The script 1_aws_batch_setup.R will create these for you.

You can also follow the instructions in Creating a Simple "Fetch & Run" AWS Batch Job.
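To give a feel for what creating the three Batch resources involves, here is a rough sketch using a Paws Batch client. It is not the contents of 1_aws_batch_setup.R, which may do this differently. The function name `setup_batch` and all of its arguments (names, roles, subnets, security groups, CPU and memory sizes) are placeholders chosen for illustration; the Batch operations themselves are the standard ones exposed by `paws::batch()`.

```r
# Sketch only: create a compute environment, job queue, and job definition.
# `batch` is a Paws Batch client, e.g. paws::batch(). Every name, role,
# subnet, and security group is a placeholder the caller must supply.
setup_batch <- function(batch, name, image, subnets, security_group_ids,
                        instance_role, service_role, max_vcpus = 4) {
  # 1. Compute environment: what kind of servers to use, and how many CPUs.
  env <- batch$create_compute_environment(
    computeEnvironmentName = name,
    type = "MANAGED",
    state = "ENABLED",
    computeResources = list(
      type = "EC2",
      minvCpus = 0, # allow scaling down to zero servers when idle
      maxvCpus = max_vcpus,
      instanceTypes = list("optimal"),
      subnets = as.list(subnets),
      securityGroupIds = as.list(security_group_ids),
      instanceRole = instance_role
    ),
    serviceRole = service_role
  )

  # Wait until the compute environment is usable before attaching a queue.
  repeat {
    found <- batch$describe_compute_environments(
      computeEnvironments = list(name)
    )
    status <- found$computeEnvironments[[1]]$status
    if (identical(status, "VALID")) break
    if (identical(status, "INVALID")) stop("Compute environment is invalid")
    Sys.sleep(5)
  }

  # 2. Job queue: where submitted jobs wait for the compute environment.
  batch$create_job_queue(
    jobQueueName = name,
    state = "ENABLED",
    priority = 1,
    computeEnvironmentOrder = list(
      list(order = 1, computeEnvironment = env$computeEnvironmentArn)
    )
  )

  # 3. Job definition: which container image to run, and its default size.
  batch$register_job_definition(
    jobDefinitionName = name,
    type = "container",
    containerProperties = list(
      image = image,
      vcpus = 1,
      memory = 1024 # MiB
    )
  )

  invisible(name)
}

# Example call (commented out because it creates real AWS resources):
# setup_batch(paws::batch(), "my-batch-demo", "<image>", "<subnet-id>",
#             "<security-group-id>", "<instance-role>", "<service-role>")
```

Passing the client in as an argument keeps the sketch easy to try against a stand-in object before pointing it at a real AWS account.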

4. Run an R job on Batch

The example in 2_run_batch_job.R copies an R script to an S3 file storage bucket, then runs an AWS Batch job which fetches the R script and runs it.
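The two steps described above (upload the script, then submit the job) could look roughly like the sketch below. This is an illustration, not the contents of 2_run_batch_job.R. The function name `run_script_on_batch` and its arguments are hypothetical. The environment variable names `BATCH_FILE_TYPE` and `BATCH_FILE_S3_URL` follow the convention in Amazon's "Fetch & Run" blog post; that the container used in this repo reads exactly these variables, and takes the script name as its command, is an assumption.

```r
# Sketch only: upload an R script to S3, then submit a Batch job that
# fetches and runs it. `s3` and `batch` are Paws clients, e.g. paws::s3()
# and paws::batch(). Bucket, queue, and definition names are placeholders.
run_script_on_batch <- function(s3, batch, script_path, bucket,
                                job_queue, job_definition,
                                job_name = "r-job") {
  key <- basename(script_path)

  # Step 1: copy the script to the S3 bucket as raw bytes.
  s3$put_object(
    Bucket = bucket,
    Key = key,
    Body = readBin(script_path, what = "raw", n = file.size(script_path))
  )

  # Step 2: submit the job, telling the container where to fetch the script.
  # Assumption: the container follows the "Fetch & Run" convention.
  job <- batch$submit_job(
    jobName = job_name,
    jobQueue = job_queue,
    jobDefinition = job_definition,
    containerOverrides = list(
      command = list(key),
      environment = list(
        list(name = "BATCH_FILE_TYPE", value = "script"),
        list(
          name = "BATCH_FILE_S3_URL",
          value = sprintf("s3://%s/%s", bucket, key)
        )
      )
    )
  )

  # Return the job ID so the caller can check on the job later.
  job$jobId
}

# Example call (commented out because it uses real AWS resources):
# run_script_on_batch(paws::s3(), paws::batch(), "my_script.R",
#                     "<bucket>", "<job-queue>", "<job-definition>")
```

The returned job ID can be passed to the Batch client's `describe_jobs` operation to see whether the job has finished.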

Other resources

install.packages("paws") - install Paws from CRAN.

Paws home page - See online documentation.

GitHub - See getting started guide and examples; submit issues.
