translate-a2i-human-workflow
A workflow driven by Amazon Translate and Amazon Augmented AI (Amazon A2I) for post-editing machine-translated documents.
Solution Architecture
Prerequisites
-
Download and install the latest version of Python for your OS. This solution requires Python 3.8 or above.
-
You will also need AWS CLI version 2. If you already have the AWS CLI installed, upgrade to at least version 2.0.5 by following the AWS CLI installation instructions.
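The two version requirements above can be checked with a short script. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes `aws --version` prints a string that begins with `aws-cli/<major>.<minor>.<patch>`.

```python
import re
import subprocess
import sys

MIN_PYTHON = (3, 8)       # minimum Python version stated in the prerequisites
MIN_AWS_CLI = (2, 0, 5)   # minimum AWS CLI version stated in the prerequisites


def parse_aws_cli_version(version_output):
    """Extract (major, minor, patch) from text such as 'aws-cli/2.0.5 Python/3.7.4 ...'."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"Unrecognized AWS CLI version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def aws_cli_ok(version_output):
    """Return True if the reported AWS CLI version is at least MIN_AWS_CLI."""
    return parse_aws_cli_version(version_output) >= MIN_AWS_CLI


def installed_aws_cli_version_output():
    """Run 'aws --version' and return its text (requires the AWS CLI on PATH)."""
    result = subprocess.run(["aws", "--version"], capture_output=True, text=True)
    # Read both streams, since older CLI releases wrote the version to stderr.
    return (result.stdout + result.stderr).strip()
```

To use it, pass the result of `installed_aws_cli_version_output()` to `aws_cli_ok()` and call `python_ok()` with no arguments.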
Deployment Instructions
-
Download the contents of this repository to your local machine (for example, into project-directory).
-
The solution is implemented in Python, so make sure you have a working Python environment on your local machine.
-
Create a private workforce from the Amazon SageMaker console.
-
Create a Custom Worker Template from the Amazon Augmented AI console. Use the file /code/ui/translate_template.html as the custom template.
-
Create a Flow Definition from the Amazon Augmented AI console.
- Use s3://ta2i-demo/tms for S3 Bucket.
- For IAM role, choose 'Create a new role'.
- For Task Type, choose 'Custom'.
- For Template, choose the custom worker template you created in step 4 above.
- For Worker Types, choose 'Private'.
- For Private Teams, choose the team you created in step 3 above.
- After creation, copy the Flow Definition ARN. We will use it later.
-
Create an S3 bucket for deployment. Use the same region throughout the following steps; the examples use us-east-1, which you can replace with the region of your choice. Refer to the region table for service availability.
-
aws s3 mb s3://ta2i-cf-2020-us-east-1 --region us-east-1
-
-
Create another S3 bucket - this will be used by the solution to accept inputs and provide outputs.
-
aws s3 mb s3://ta2i-working-2020-us-east-1 --region us-east-1
-
-
For the rest of the setup, we will use the CloudFormation template. Navigate to the /code/source subdirectory, then package the contents and prepare the deployment package using the following command:
-
aws cloudformation package --template-file translate-a2i-setup.yaml --output-template-file translate-a2i-setup-output.yaml --s3-bucket ta2i-cf-2020-us-east-1 --region us-east-1
-
-
Deploy the SAM package using the command below. Replace the 'flowdefarn' placeholder with the Flow Definition ARN you copied earlier, then run the command:
-
aws cloudformation deploy --template-file translate-a2i-setup-output.yaml --capabilities CAPABILITY_IAM --region us-east-1 --parameter-overrides FlowDefinitionARNParameter=flowdefarn S3BucketNameParameter=ta2i-working-2020-us-east-1 --stack-name Translate-A2I
-
-
If you want to make changes to the Lambda functions, you can do so on your local machine and redeploy them by rerunning the package and deploy commands above. These commands take care of zipping up the new Lambda files (along with their dependencies) and uploading them to AWS for execution.
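Because the bucket, package, and deploy commands all share the same region, bucket names, and Flow Definition ARN, it can help to generate them from one place. The following is a minimal sketch and not part of the repository: the function names are hypothetical, and the ARN check is a loose sanity test that assumes the usual `arn:aws:sagemaker:<region>:<account>:flow-definition/<name>` shape. It exists mainly to catch a deploy run with the 'flowdefarn' placeholder left in.

```python
import re
import shlex

# Loose shape check for a flow definition ARN (an assumption, not an official validator).
FLOW_DEFINITION_ARN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:flow-definition/[a-z0-9-]+$"
)


def is_flow_definition_arn(value):
    """Return True if value looks like a flow definition ARN."""
    return FLOW_DEFINITION_ARN.match(value) is not None


def build_commands(region, deploy_bucket, working_bucket, flow_definition_arn,
                   stack_name="Translate-A2I"):
    """Return the four AWS CLI commands from the deployment steps as argument lists."""
    if not is_flow_definition_arn(flow_definition_arn):
        raise ValueError(f"Not a flow definition ARN: {flow_definition_arn!r}")
    return [
        ["aws", "s3", "mb", f"s3://{deploy_bucket}", "--region", region],
        ["aws", "s3", "mb", f"s3://{working_bucket}", "--region", region],
        ["aws", "cloudformation", "package",
         "--template-file", "translate-a2i-setup.yaml",
         "--output-template-file", "translate-a2i-setup-output.yaml",
         "--s3-bucket", deploy_bucket, "--region", region],
        ["aws", "cloudformation", "deploy",
         "--template-file", "translate-a2i-setup-output.yaml",
         "--capabilities", "CAPABILITY_IAM", "--region", region,
         "--parameter-overrides",
         f"FlowDefinitionARNParameter={flow_definition_arn}",
         f"S3BucketNameParameter={working_bucket}",
         "--stack-name", stack_name],
    ]


if __name__ == "__main__":
    # Hypothetical ARN for illustration only; substitute the one you copied.
    example_arn = "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/example-flow"
    for command in build_commands("us-east-1", "ta2i-cf-2020-us-east-1",
                                  "ta2i-working-2020-us-east-1", example_arn):
        print(shlex.join(command))
```

The script only prints the commands. Run the package and deploy commands from the /code/source subdirectory, as in the steps above.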
Execution Instructions
- Navigate to the S3 console and upload a text file (English) to the ta2i-working-2020-us-east-1/source folder.
- Navigate to the Amazon SageMaker console. Under Labeling workforces -> Private, click on the URL under Labeling portal sign-in URL.
- Log in with the appropriate credentials.
- You should see a task assigned to you. Choose the task and click on "Start Working".
- You can 'post edit' the machine output on the right-hand side of the UI.
- Click Submit when done.
- The solution translates the text to Spanish and creates a .txt file under ta2i-working-2020-us-east-1/machine_output. This is the direct output from Amazon Translate.
- You can find the post-edited file in the ta2i-working-2020-us-east-1/post_edits folder.
- The translation memory (created by the human workflow) can be found under ta2i-working-2020-us-east-1/tms.
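If you prefer the command line to the S3 console, the upload and the later inspection of the output folders can be done with `aws s3 cp` and `aws s3 ls`. The sketch below only builds those commands. The helper names are hypothetical, and it assumes the four folder names listed above (source, machine_output, post_edits, tms) sit directly under the working bucket.

```python
from pathlib import Path

# Folder names taken from the execution steps above.
SOURCE_FOLDER = "source"
OUTPUT_FOLDERS = ("machine_output", "post_edits", "tms")


def folder_uri(bucket, folder):
    """Return the S3 URI of a folder in the working bucket."""
    return f"s3://{bucket}/{folder}/"


def upload_command(local_file, bucket):
    """Build an 'aws s3 cp' command that copies local_file into the source folder."""
    file_name = Path(local_file).name
    destination = folder_uri(bucket, SOURCE_FOLDER) + file_name
    return ["aws", "s3", "cp", str(local_file), destination]


def list_commands(bucket):
    """Build one 'aws s3 ls' command per output folder."""
    return [["aws", "s3", "ls", folder_uri(bucket, folder)] for folder in OUTPUT_FOLDERS]
```

Each returned list can be passed to `subprocess.run` or joined into a shell command. The exact file names the solution writes to the output folders are not specified here, so the sketch lists folder contents and does not guess at keys.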
Further Reading:
License
This library is licensed under the MIT License.