Comments (10)
@sachalau - Glad you were able to resolve this on your own.
Note, the README instructions are for deploying the launch assets into your own privately hosted buckets. This is for customizing the solution end-to-end - e.g. if you want to add additional resources to the "Zone" stack beyond what is provided by the publicly available solution.
For the purposes of customizing any of the workflow execution resources - e.g. batch compute environments, launch templates, etc. - you can do the following:
- launch the solution from its landing page
- clone the CodeCommit repo created by the "Pipe" stack - this has all the source code for all the underlying AWS resources for running workflows
- make edits, commit, and push the changes up to the repo
The latter will trigger a CodePipeline pipeline to re-deploy the workflow execution resources with any updates you've made.
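The steps above can be sketched as follows (the repo name and region here are placeholders, not from the solution — check the "Pipe" stack outputs for the actual CodeCommit repository created in your account):

```shell
# Sketch of the customize-and-redeploy loop described above.
# REGION and REPO are placeholders; look up the real CodeCommit
# repository name in the "Pipe" stack's resources/outputs.
REGION=us-east-1
REPO=genomics-workflows-code

# Clone the CodeCommit repo created by the "Pipe" stack
git clone "https://git-codecommit.${REGION}.amazonaws.com/v1/repos/${REPO}"
cd "${REPO}"

# ...edit Batch compute environments, launch templates, etc...

git add -A
git commit -m "Customize workflow execution resources"
git push   # triggers the CodePipeline re-deployment
```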
from genomics-secondary-analysis-using-aws-step-functions-and-aws-batch.
Actually, I would like to ask an additional question regarding the architecture of this solution, and whether you have any advice regarding implementation.
During the setup of the stack, I'm downloading some specific references to the S3 zone bucket. Those references will then be used by the workflow defined for each of the samples I would like to process. At the moment I'm downloading all the references I need from NCBI in the setup/setup.sh file, with an additional Python script for instance.
However, before these files can be used, they need additional transformation using some of the tools for which Docker images are constructed during the build. It could be something really simple, like indexing the FASTA references with samtools or bwa, or something more complex, like building a kraken database from multiple references.
At the moment, after the CloudFormation deployment is complete, I can manually submit a Batch job using the job definition that I want and write the outputs into the S3 results bucket. I can then use these files as inputs for all my workflows. However, I think these submissions could be automated during the CloudFormation deployment.
My idea was to submit Batch jobs directly at the end of the build using the awscli. However, to do so I need to access the name of the S3 bucket that was just created, and I'm not sure I can do that in the setup/setup.sh file. Another possibility would be to define a separate workflow for each of these tasks that has to be run only once initially, and then trigger each of them a single time. However, each of those workflows would consist of one step run only once, so I'm not sure this solution actually makes sense. Do you have any opinion on that?
To access the S3 bucket name, could I do something like this in the setup?
S3_RESULT_BUCKET=$(aws cloudformation describe-stacks --stack-name $STACKNAME_CODE --query 'Stacks[].Outputs[?OutputKey==`JobResultsBucket`].OutputValue' --output text)
And then feed the S3_RESULT_BUCKET variable to
aws batch submit-job ...
Do you think that is the proper way to proceed? Would it be more appropriate to put all the one-time batch jobs in a separate file from setup.sh (like test.sh or something else)?
Thanks a lot!
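Something along those lines could work; here is a sketch combining the two commands (the stack name, job name, queue, and job definition below are placeholders, not names defined by the solution):

```shell
# Sketch: resolve the results bucket from the stack outputs, then
# submit a one-time reference-preparation job against it.
# STACKNAME_CODE and the --job-* values are placeholders.
STACKNAME_CODE=my-code-stack

S3_RESULT_BUCKET=$(aws cloudformation describe-stacks \
  --stack-name "$STACKNAME_CODE" \
  --query 'Stacks[].Outputs[?OutputKey==`JobResultsBucket`].OutputValue' \
  --output text)

aws batch submit-job \
  --job-name build-bwa-index \
  --job-queue default-queue \
  --job-definition bwa \
  --container-overrides \
    "environment=[{name=JOB_OUTPUT_PREFIX,value=s3://${S3_RESULT_BUCKET}/references}]"
```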
Thanks for getting back to me @rulaszek. It's helped me better understand how to use the solution, and I'll try to stick to the intended use.
I've moved my instructions to buildspec.yml as you advised. For the one-time jobs I think I'll define workflows anyway; do you have any advice on where it would make the most sense to define these jobs?
I have a couple of questions.
- What is the preferred way of designing new workflows? At the moment I'm doing so directly in the Step Functions web interface, but I'm not sure that's the proper way. However, formatting YAML and JSON at the same time, as in workflow-variantcalling-simple.cfn.yaml, is not ideal...
- Why aren't you resolving JOB_OUTPUTS in stage_out with envsubst in entrypoint.aws.sh, the way JOB_INPUTS is resolved in stage_in, when building the Docker images? I just had a problem with this and realized it was because my JOB_OUTPUTS were not resolved with the environment variables.
Thanks a lot for the call proposal. I'm still wrapping my head around things, so I'm not sure now is the best time for a call; maybe wait until I'm more comfortable with every part.
@sachalau Developing the workflow in the Step Functions console or using the new Visual Studio Code plugin is probably ideal. After the workflow is working, you can create a new state machine resource in workflow-variantcalling-simple.cfn.yaml and paste that workflow in. Also, make sure to substitute in the variables, i.e., ${SOMETHING}, while ignoring ${!SOMETHING}. Finally, commit and push your changes.
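To illustrate the substitution rule above, here is a hypothetical template fragment (resource and variable names are made up, not taken from the solution): inside Fn::Sub, `${BatchJobQueue}` is replaced by CloudFormation at deploy time, while `${!JOB_OUTPUT_PREFIX}` is escaped and emitted literally as `${JOB_OUTPUT_PREFIX}` for runtime use.

```yaml
# Hypothetical state machine resource; names are illustrative only.
WorkflowStateMachine:
  Type: AWS::StepFunctions::StateMachine
  Properties:
    RoleArn: !GetAtt StatesExecutionRole.Arn
    DefinitionString: !Sub |-
      {
        "StartAt": "Align",
        "States": {
          "Align": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
              "JobQueue": "${BatchJobQueue}",
              "ContainerOverrides": {
                "Environment": [
                  {"Name": "JOB_OUTPUT_PREFIX",
                   "Value": "${!JOB_OUTPUT_PREFIX}"}
                ]
              }
            },
            "End": true
          }
        }
      }
```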
The second issue sounds like a bug. Let me look into this more and get back to you.
https://aws.amazon.com/blogs/compute/aws-step-functions-support-in-visual-studio-code/
For the second issue, you also need to resolve JOB_OUTPUT_PREFIX with envsubst, as is done for JOB_INPUT_PREFIX, I think.
Thanks for the advice on the state machines.
@sachalau - can you provide more details on how you are defining your job outputs?
You can add envsubst evaluation as needed to the entrypoint script, push the code, and the containers will rebuild.
Also, I've updated the README to clarify the customized deployment instructions.