aws-step-functions-rpa's Introduction

Getting started with RPA using AWS Step Functions and Amazon Textract

AWS Step Functions is a serverless function orchestrator and workflow automation tool. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. Combining these services, you can create an RPA bot to automate the processing of documents.

See the blog post Getting started with RPA using AWS Step Functions and Amazon Textract.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Prerequisites

Before you get started with deploying the solution, you must install the following prerequisites:

  1. Python

  2. AWS Command Line Interface (AWS CLI) -- for instructions, see Installing the AWS CLI

  3. AWS Serverless Application Model Command Line Interface (AWS SAM CLI) -- for instructions, see Installing the AWS SAM CLI

Deploying the solution

The solution will create the following three Amazon Simple Storage Service (S3) buckets with names suffixed by your AWS Account ID to prevent a global namespace collision of your S3 bucket names:

  • scanned-invoices-<YOUR AWS ACCOUNT ID>

  • invoice-analyses-<YOUR AWS ACCOUNT ID>

  • processed-invoices-<YOUR AWS ACCOUNT ID>
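The naming rule above can be sketched in Python. This is a minimal illustration and not part of the solution itself: the function name `solution_bucket_names` is hypothetical, and the 12-digit check reflects the standard AWS account ID format.

```python
def solution_bucket_names(account_id: str) -> list[str]:
    """Return the three bucket names listed above for a given account ID."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError("expected a 12-digit AWS account ID")
    prefixes = ("scanned-invoices", "invoice-analyses", "processed-invoices")
    return [f"{prefix}-{account_id}" for prefix in prefixes]
```

Because S3 bucket names are global, the account ID suffix is what keeps two deployments in different accounts from colliding.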

The steps below deploy the reference implementation in your AWS account. The solution deploys several components, including an AWS Step Functions state machine, AWS Lambda functions, Amazon Simple Storage Service (S3) buckets, an Amazon DynamoDB table for payment information, and Amazon Simple Notification Service (SNS) topics. You will need an Amazon S3 bucket for AWS CloudFormation to use when deploying the solution, and a stack name, e.g., Getting-Started-with-RPA. To deploy, run the following commands from a terminal session:

  1. Download the code from the GitHub repository (https://github.com/aws-samples/aws-step-functions-rpa).

  2. Run the following command to build the artifacts locally on your workstation:

    sam build
    
  3. Run the following command to create a CloudFormation stack and deploy your resources:

    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM
    

Monitor the progress and wait for the completion of the stack creation process from the AWS CloudFormation console before proceeding.
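If you prefer to wait from a script instead of watching the console, the following is a minimal sketch. It assumes the boto3 library, which is not among the prerequisites above, and the helper name `wait_for_stack_creation` is hypothetical. The client is passed in as a parameter so the function itself has no hard dependency on boto3.

```python
def wait_for_stack_creation(cfn_client, stack_name: str) -> str:
    """Block until the stack is created, then return its final status.

    cfn_client is assumed to be a boto3 CloudFormation client, e.g.
    boto3.client("cloudformation").
    """
    # The built-in waiter polls the stack and raises if creation fails.
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    response = cfn_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   print(wait_for_stack_creation(boto3.client("cloudformation"),
#                                 "Getting-Started-with-RPA"))
```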

Testing the solution

To test the solution, upload the PDF test invoices from the invoices folder of the downloaded solution to the S3 bucket named scanned-invoices-<YOUR AWS ACCOUNT ID> created during deployment.
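The upload can also be scripted. The sketch below assumes boto3 (not listed in the prerequisites) and uses a hypothetical helper name, `upload_test_invoices`. It uploads each PDF under its file name as the object key, which is an assumption about the key layout and not something the solution documents.

```python
from pathlib import Path


def upload_test_invoices(s3_client, invoices_dir: str, account_id: str) -> list[str]:
    """Upload every PDF in invoices_dir to the scanned-invoices bucket.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the object keys that were uploaded, in sorted order.
    """
    bucket = f"scanned-invoices-{account_id}"
    uploaded = []
    for path in sorted(Path(invoices_dir).iterdir()):
        # Match the extension case-insensitively (.pdf or .PDF).
        if path.is_file() and path.suffix.lower() == ".pdf":
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   upload_test_invoices(boto3.client("s3"), "invoices", "123456789012")
```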

An AWS Step Functions state machine with the name <YOUR STACK NAME>-ProcessedScannedInvoiceWorkflow will execute the workflow. Amazon Textract document analyses will be stored in the S3 bucket named invoice-analyses-<YOUR AWS ACCOUNT ID>, and processed invoices will be stored in the S3 bucket named processed-invoices-<YOUR AWS ACCOUNT ID>. Processed payments will be found in the DynamoDB table named <YOUR STACK NAME>-invoices.

You can monitor the execution status of the workflows from the AWS Step Functions console.
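As an alternative to the console, execution statuses can be summarized from a script. This is a sketch under stated assumptions: it relies on boto3 (not a listed prerequisite), the helper name `execution_status_counts` is hypothetical, and for brevity it reads only the first page of results from each API call.

```python
from collections import Counter


def execution_status_counts(sfn_client, stack_name: str) -> Counter:
    """Count workflow executions by status (RUNNING, SUCCEEDED, FAILED, ...).

    sfn_client is assumed to be a boto3 Step Functions client, e.g.
    boto3.client("stepfunctions"). Only the first result page is read.
    """
    target = f"{stack_name}-ProcessedScannedInvoiceWorkflow"
    machines = sfn_client.list_state_machines()["stateMachines"]
    arn = next((m["stateMachineArn"] for m in machines if m["name"] == target), None)
    if arn is None:
        raise LookupError(f"state machine {target!r} not found")
    executions = sfn_client.list_executions(stateMachineArn=arn)["executions"]
    return Counter(e["status"] for e in executions)
```

For accounts with many state machines or executions, the calls would need to follow the `nextToken` value returned by each response.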

Upon completion of the workflow executions, review the items added to DynamoDB from the Amazon DynamoDB console.
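The same review can be done from a script. The following sketch assumes boto3 (not a listed prerequisite) and a hypothetical helper name, `scan_all_items`. It makes no assumption about the table's attributes and simply returns the raw items, following pagination until the scan is complete.

```python
def scan_all_items(dynamodb_client, table_name: str) -> list[dict]:
    """Return every item in the table, following scan pagination.

    dynamodb_client is assumed to be a boto3 DynamoDB client, e.g.
    boto3.client("dynamodb"). Items come back in DynamoDB's typed
    attribute format.
    """
    items = []
    kwargs = {"TableName": table_name}
    while True:
        page = dynamodb_client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   items = scan_all_items(boto3.client("dynamodb"),
#                          "Getting-Started-with-RPA-invoices")
```

A full scan is reasonable for the handful of test invoices here; it would be costly on a large table.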

Cleanup

To avoid ongoing charges for the resources you created, follow the steps below to delete the deployed stack and its resources:

  1. Empty the three S3 buckets created during deployment using the Amazon S3 Console:

    • scanned-invoices-<YOUR AWS ACCOUNT ID>
    • invoice-analyses-<YOUR AWS ACCOUNT ID>
    • processed-invoices-<YOUR AWS ACCOUNT ID>
  2. Delete the CloudFormation stack created during deployment using the AWS CloudFormation console.
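The two cleanup steps above can be sketched as a script. This assumes boto3 (not a listed prerequisite), uses hypothetical helper names (`empty_bucket`, `clean_up`), and assumes the buckets are not versioned; a versioned bucket would also need its object versions deleted.

```python
def empty_bucket(s3_client, bucket: str) -> int:
    """Delete all objects in a non-versioned bucket; return how many were deleted.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    deleted = 0
    while True:
        # Each listing returns at most 1,000 keys, the limit for one delete call.
        page = s3_client.list_objects_v2(Bucket=bucket)
        contents = page.get("Contents", [])
        if not contents:
            return deleted
        result = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
        # Stop instead of looping forever if some objects cannot be deleted.
        if result.get("Errors"):
            raise RuntimeError(f"could not delete some objects from {bucket}")
        deleted += len(contents)


def clean_up(s3_client, cfn_client, account_id: str, stack_name: str) -> None:
    """Empty the three solution buckets, then request stack deletion."""
    for prefix in ("scanned-invoices", "invoice-analyses", "processed-invoices"):
        empty_bucket(s3_client, f"{prefix}-{account_id}")
    cfn_client.delete_stack(StackName=stack_name)
```

The buckets are emptied first because CloudFormation cannot delete an S3 bucket that still contains objects.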

aws-step-functions-rpa's People

Contributors

amazon-auto, tringj

aws-step-functions-rpa's Issues

How is get_document_analysis_status used?

The status used in the Step Functions flow is set by start_process_scanned_invoice_workflow.

get_document_analysis_status is not referenced in template.yaml, so I think it is not used.
