
dataproc-java-dependencies

This repository contains a simple demo Spark application that translates words using Google's Cloud Translation API and runs on Cloud Dataproc.

  1. Record the project ID in an environment variable for later use:

    export PROJECT=$(gcloud info --format='value(config.project)')
    
  2. Enable the Cloud Translation and Cloud Dataproc APIs:

    gcloud services enable translate.googleapis.com dataproc.googleapis.com
    
  3. Compile the JAR (this may take a few minutes):

  • Option 1: with Maven
    cd maven
    mvn package
    
  • Option 2: with SBT
    cd sbt
    sbt assembly
    mv target/scala-2.11/translate-example-assembly-1.0.jar target/translate-example-1.0.jar
    
  4. Create a bucket:

    gsutil mb gs://$PROJECT-bucket
    
  5. Upload words.txt to the bucket (the ../ path assumes you are still in the maven or sbt directory):

    gsutil cp ../words.txt gs://$PROJECT-bucket
    

    The file words.txt contains the following:

    cat
    dog
    fish
    
  6. Create a Cloud Dataproc cluster:

    gcloud dataproc clusters create demo-cluster \
    --zone=us-central1-a \
    --scopes=cloud-platform \
    --image-version=1.3
    
  7. Submit the Spark job to translate the words to French:

    gcloud dataproc jobs submit spark \
    --cluster demo-cluster \
    --jar target/translate-example-1.0.jar \
    -- fr gs://$PROJECT-bucket words.txt translated-fr
    
  8. Verify that the words have been translated:

    gsutil cat gs://$PROJECT-bucket/translated-fr/part-*
    

    The output is:

    chat
    chien
    poisson
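
The job receives the four values after the -- separator in the gcloud dataproc jobs submit spark command as program arguments. Judging from where words.txt is uploaded and where the translated output is read back, they appear to be the target language, the bucket, the input file name and the output directory name. The following is a minimal sketch, under that assumption, of how a main class could turn them into full paths. The JobArgs class and its methods are hypothetical and are not taken from the demo's source.

```java
import java.util.Arrays;

// Hypothetical argument holder; NOT the demo's actual class.
// Assumes the positional order: <language> <bucket> <input-file> <output-dir>.
final class JobArgs {
    final String targetLanguage; // e.g. "fr"
    final String bucket;         // e.g. "gs://my-project-bucket"
    final String inputFile;      // e.g. "words.txt"
    final String outputDir;      // e.g. "translated-fr"

    JobArgs(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "Usage: <language> <bucket> <input-file> <output-dir>, got "
                    + Arrays.toString(args));
        }
        targetLanguage = args[0];
        bucket = args[1];
        inputFile = args[2];
        outputDir = args[3];
    }

    // Full path of the file to read, e.g. gs://my-project-bucket/words.txt
    String inputPath() {
        return join(bucket, inputFile);
    }

    // Directory that would receive the part-* output files
    String outputPath() {
        return join(bucket, outputDir);
    }

    // Joins with exactly one slash, tolerating a trailing slash on the bucket.
    private static String join(String base, String name) {
        return base.endsWith("/") ? base + name : base + "/" + name;
    }

    public static void main(String[] args) {
        JobArgs a = new JobArgs(
            new String[] {"fr", "gs://my-project-bucket", "words.txt", "translated-fr"});
        System.out.println(a.inputPath());  // gs://my-project-bucket/words.txt
        System.out.println(a.outputPath()); // gs://my-project-bucket/translated-fr
    }
}
```

Keeping the bucket and the relative names separate, rather than passing two full gs:// paths, matches the shape of the submit command and lets one bucket argument serve both input and output.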
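
The verification step reads translated-fr/part-* with a wildcard because Spark's text output is a directory holding one part-NNNNN file per partition (typically alongside a _SUCCESS marker), not a single file. Assuming the job writes its result that way, and that you have copied the directory to your machine (for example with gsutil cp -r), the hypothetical PartFiles helper below reads it back the same way. It keeps only the part- files, sorts them by name (the numbers are zero-padded, so name order is partition order) and concatenates their lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: concatenates a locally copied Spark output directory,
// similar to `gsutil cat .../part-*`.
final class PartFiles {
    static List<String> readAll(Path outputDir) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(outputDir)) {
            parts = entries
                // Skip marker files such as _SUCCESS; keep only data files.
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                // Zero-padded names (part-00000, part-00001, ...) sort in partition order.
                .sorted()
                .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part)); // decodes as UTF-8
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java PartFiles <output-dir>");
            System.exit(1);
        }
        for (String line : readAll(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Compile it with javac PartFiles.java and run java PartFiles translated-fr against the copied directory; the lines are printed in the same order gsutil cat would show them.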
    

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.