cunningham-lab / neurocaas
IaC codebase for the NeuroCAAS Platform
Home Page: http://www.neurocaas.org
License: GNU General Public License v3.0
We prototyped github actions in the neurocaas_contrib repository when we switched over our CI from Travis to Github actions. It would be feasible for us to design custom actions that are triggered on pull requests, providing us a concrete path forwards to do things like spin up localstack and check that the proposed workflow is functional (#31), and then to build the resulting stack in our neurocaas account.
For the future, we should make pull requests via GitHub Actions the ONLY way to build on our neurocaas account, easing the burden of managing different stack versions that generate conflicting resources under the same name in CloudFormation.
Developer usage has now switched to a tag-based workflow. Here is the current layout:
"Soft cap" protections:
"Hard cap" functions on total usage.
These functions provide a nice layer of security against unexpected usage in all cases except an SSM job that continues running unnecessarily.
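As a rough illustration (not the actual NeuroCAAS implementation), the two layers can be thought of as a single check, where `soft_cap` and `hard_cap` are hypothetical per-group thresholds:

```python
def check_caps(incurred, soft_cap, hard_cap):
    """Sketch of layered usage protection.

    "Soft cap": warn the group but allow the job to launch.
    "Hard cap": block launches entirely once total usage exceeds it.
    Note: neither layer catches an SSM job that keeps running after launch.
    """
    if incurred >= hard_cap:
        return "block"
    if incurred >= soft_cap:
        return "warn"
    return "allow"
```

The one gap noted above (a runaway SSM job) is invisible to this kind of pre-launch check, which is why it needs separate monitoring.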
Here are the next steps:
The neurocaas_remote repo should be an independent Python package for developers, integrated with proper certificate-generation routines.
We need additional CLI tools to:
CORS configuration settings are required to make buckets function through the neurocaas website. We need to add these manually right now, which is a bottleneck.
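One way to remove this bottleneck would be to script the CORS step with boto3's `put_bucket_cors`. The specific origins, methods, and `MaxAgeSeconds` below are assumptions for illustration, not the exact rules the website requires:

```python
def neurocaas_cors_config(origin="http://www.neurocaas.org"):
    """Build a CORS configuration dict for an analysis bucket.

    The allowed origin/methods here are placeholder assumptions.
    """
    return {
        "CORSRules": [{
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST"],
            "AllowedOrigins": [origin],
            "MaxAgeSeconds": 3000,
        }]
    }

def apply_cors(bucket_name):
    """Apply the CORS rules to a bucket (requires AWS credentials)."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_cors(Bucket=bucket_name,
                       CORSConfiguration=neurocaas_cors_config())
```

Wiring a call like this into the deployment scripts would replace the current manual step.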
My UXData in stack_config_template.json looks like this:
"UXData": {
"Affiliates": [
{
"AffiliateName": "debuggers",
"UserNames": [
"tacosyne"
],
"UserInput": true,
"ContactEmail": "NOTE: KEEP THIS AFFILIATE TO ENABLE EASY TESTING"
}
]
}
(I added the tacosyne user as a test user), but when I ran fulldeploy.sh (step 6 in the documentation), it gave this error:
$ bash ./iac_utils/fulldeploy.sh "bardensr"
webdev
webdev mode
webdev mode
{'AffiliateName': 'debuggers', 'UserNames': ['tacosyne'], 'UserInput': True, 'ContactEmail': 'NOTE: KEEP THIS AFFILIATE TO ENABLE EASY TESTING'} affiliatedict
Error adding User tacosyne, please evaluate An error occurred (AccessDenied) when calling the GetUser operation: User: arn:aws:iam::739988523141:user/cunninghamlab/tacosyneus-east-1 is not authorized to perform: iam:GetUser on resource: user tacosyneus-east-1
Traceback (most recent call last):
File "dev_builder.py", line 1200, in <module>
temp =WebDevTemplate(filename)
File "dev_builder.py", line 597, in __init__
self.add_affiliate(affdict)
File "dev_builder.py", line 134, in add_affiliate
self.add_affiliate_usernet(affiliatedict)
File "dev_builder.py", line 192, in add_affiliate_usernet
users,usernames = self.attach_users(affiliatedict)
File "dev_builder.py", line 156, in attach_users
self.iam_resource.User(user_local).create_date # this line gives an error.
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/factory.py", line 339, in property_loader
self.load()
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/factory.py", line 505, in do_action
response = action(self, *args, **kwargs)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(**params)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetUser operation: User: arn:aws:iam::739988523141:user/cunninghamlab/tacosyneus-east-1 is not authorized to perform: iam:GetUser on resource: user tacosyneus-east-1
Any suggestions? Thank you!
Revise Figure 2 to include prominently displayed developer workflow. This issue overlaps significantly with issue #20.
Develop a locally hosted user client for NeuroCAAS file transfer in large batches. Talk to Joao about adopting his GUI for these purposes.
Right now, cost monitoring is dictated by the blueprints of analysis algorithms, as opposed to being declared per user. We should refactor this code at some point to centralize budgets to be per user, not per analysis.
Currently the blueprint contains a lot of AWS-specific material that is useful for us to manipulate but not useful to a developer (SSM role, security group, etc.). Hide these in a configuration file, and instead include:
Include additional measures to parse these parameters (see #31 for relevant details on 2).
Occasionally, running a job on the NeuroCAAS website gives the following error:
AWS ERROR. Transient AWS Communication Error. Please Try Again
The full message in certificate.txt is the following:
REQUEST START TIME: 2020-08-17 16:14:14.26 (GMT) [+0:00:00.00]
ANALYSIS VERSION ID: 72def07263245ec7c47ba6c157230a7715f5c225 [+0:00:00.12]
JOB ID: 1597680851 [+0:00:00.17]
[Job Manager] Detected new job: starting up. [+0:00:00.28]
[Internal (init)] Initializing job manager. [+0:00:00.33]
[Internal (init)] Using default instance type p2.xlarge from config file. [+0:00:00.39]
[Internal (init)] Analysis request with dataset(s): ['bardensr/inputs/20xdata_300_300.hdf5'], config file bardensr/configs/config_test.yaml [+0:00:00.45]
[Job Manager] STEP 1/4 (Initialization): DONE [+0:00:00.54]
[Internal (get_costmonitoring)] Incurred cost so far: $2.5527633333333335. Remaining budget: $297.44723666666664 [+0:00:04.55]
[Job Manager] STEP 2/4 (Validation): DONE [+0:00:04.63]
[Internal (parse_config)] parameter __duration__ not given, proceeding with standard compute launch. [+0:00:04.79]
[Internal (parse_config)] parameter __dataset_size__ is not given, proceeding with standard storage. [+0:00:04.85]
[Job Manager] STEP 3/4 (Environment Setup): DONE [+0:00:05.13]
[Utils] Acquiring new p2.xlarge instances from ami-003157f05ec0f37eb ... [+0:00:05.58]
[Utils] save not available (duration not given or greater than 6 hours). Launching standard instance. [+0:00:05.58]
[Utils] New instance ec2.Instance(id='i-02a211e6f418b3ea6') created! [+0:00:07.31]
[Internal (put_instance_monitor_rule)] Setting up monitoring on all instances... [+0:00:07.86]
[Utils] Instance i-02a211e6f418b3ea6 State: pending... [+0:00:08.21]
[Utils] Starting Instance... [+0:00:08.21]
[Utils] Instance started! [+0:00:23.93]
[Utils] Initializing instances. This could take a moment... [+0:00:24.16]
[Utils] All Instances Initialized. [+0:01:24.27]
[Internal (start_instance)] Created 1 immutable analysis environments. [+0:01:24.46]
[Job Manager] STEP 4/4 (Initialize Processing): AWS ERROR. Transient AWS Communication Error. Please Try Again
[Job Manager] Shutting down job. [+0:01:24.83]
When I submit jobs using devami.submit_job_log(), they seem to run successfully, but when I check devami.job_status() it always says 'InProgress'.
In htop, it seems nothing is running on the instance.
s3://"$bucketname"/"$groupdir"/"$resultdir"/logs/DATASET_NAME:"$dataname"_STATUS.txt says "status": "SUCCESS" (see below).
devami.job_output() shows:
debug_direct/57b47d5b-87e1-44e7-8ff6-e246c901b0a0/i-01998e55d084227b0/awsrunShellScript/0.awsrunShellScript/stderr not found. may not be updated yet.
The full information from s3://"$bucketname"/"$groupdir"/"$resultdir"/logs/DATASET_NAME:"$dataname"_STATUS.txt:
{
"status": "SUCCESS",
"cpu_usage": [
"0"
],
"stdout": {
"0": "/home/ubuntu/neurocaas_remote/bardensr/run_bardensr.sh workflow dirname\n",
"1": "this is the remote directory. \n",
"2": "aws s3 cp s3://bardensr/debuggers/inputs/human_cs_57.p input\n",
"3": "aws s3 cp s3://bardensr/debuggers/configs/config.yaml config\n",
    "4": "Completed 256.0 KiB/45.8 MiB (2.6 MiB/s) with 1 file(s) remaining\r[... progress output truncated ...]\rCompleted 45.8 MiB/45.8 MiB (109.3 MiB/s) with 1 file(s) remaining\rdownload: s3://bardensr/debuggers/inputs/human_cs_57.p to tmp/input/human_cs_57.p\n",
"5": "Completed 42 Bytes/42 Bytes (321 Bytes/s) with 1 file(s) remaining\rdownload: s3://bardensr/debuggers/configs/config.yaml to tmp/input/config.yaml\n",
"6": "/home/ubuntu/neurocaas_remote/bardensr/sync.sh /home/ubuntu/tmp/log s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs &\n",
"7": "** running the python script now **\n",
"8": "Completed 29 Bytes/29 Bytes (694 Bytes/s) with 1 file(s) remaining\rupload: tmp/log/bardensr_out.txt to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs/bardensr_out.txt\n",
"9": "** finish python script! ** \n",
"10": "***\n",
"11": "s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/\n",
"12": "****\n",
"13": "copy the results and logs to s3..... \n",
"14": "Completed 58.8 KiB/2.1 MiB (602.4 KiB/s) with 57 file(s) remaining\rupload: tmp/output/gene_15.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_15.png\n",
"15": "Completed 58.8 KiB/2.1 MiB (602.4 KiB/s) with 56 file(s) remaining\rCompleted 73.9 KiB/2.1 MiB (733.1 KiB/s) with 56 file(s) remaining\rupload: tmp/output/gene_1.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_1.png\n",
........
"70": "Completed 2.1 MiB/2.1 MiB (3.7 MiB/s) with 1 file(s) remaining\rCompleted 2.1 MiB/2.1 MiB (2.1 MiB/s) with 1 file(s) remaining\rupload: tmp/output/gene_9.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_9.png\n",
"71": "Completed 188 Bytes/188 Bytes (5.4 KiB/s) with 1 file(s) remaining\rupload: tmp/log/bardensr_out.txt to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs/bardensr_out.txt\n",
"72": "** DONE **\n"
},
"instance": "i-01998e55d084227b0",
"reason": [
"aws s3 sync $LOGDIR s3://$1/\"$6\"/logs/"
],
"command": "57b47d5b-87e1-44e7-8ff6-e246c901b0a0",
"stderr": {
"0": "2020-07-13 20:26:58.092287: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F\n",
"1": "2020-07-13 20:26:58.133278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz\n",
"2": "2020-07-13 20:26:58.134926: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574a9bf3da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
"3": "2020-07-13 20:26:58.134957: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n",
"4": "2020-07-13 20:26:58.139440: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.\n",
"5": "/home/ubuntu/neurocaas_remote/bardensr/test_image_new.py:110: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).\n",
"6": " plt.figure(figsize = (10, 10))\n"
},
"input": "debuggers/inputs/human_cs_57.p"
}
The cleanup command in the config JSON file is NOT included. Is this what I should expect to get without cleanup?
Any suggestions would be much appreciated!
Currently, due to the AWS CloudFormation limit on resources per stack, each analysis can support (200-14)/2 = 93 concurrent users (14 non-user-specific resources, plus 2 resources per user). Since we have already factored user resources into substacks, this should be fairly easy to extend further.
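The capacity arithmetic above, as a quick sanity check:

```python
def max_concurrent_users(stack_limit=200, fixed=14, per_user=2):
    """Users per analysis stack: (CloudFormation resource limit
    minus 14 non-user-specific resources) / 2 resources per user."""
    return (stack_limit - fixed) // per_user

print(max_concurrent_users())  # → 93
```

Raising the effective limit via substacks amounts to increasing `stack_limit` in this formula for each nested stack added.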
Improve behavior of the NeuroCAAS Instance monitor.
If it's a NeuroCAAS Deploy Instance, update the relevant certificate files so it's clear that they were terminated by external factors. Also useful for user-submitted job cancellation.
Incorporate into a testing framework.
Parametrize limits.
Quantify installation difficulties for local comparisons by using Docker to simulate different operating systems, language versions, and hardware (local vs. AWS instances [and the Columbia cluster?]). Run an install script per analysis tool derived from the installation instructions on its repo, and see where you get failures. Quantify failures across different image OS, host OS, language version, and hardware conditions, and derive a total robustness measure for each analysis.
These changes can go into a revision of Figure 1D or 3A/B
Application 2 is about the widefield imaging protocol, so it should be done with input of Ian, Shreya and Joao.
In order to protect user privacy and keep down costs, we will add a platform-wide parameter to delete data after successful analysis, with the default being TRUE.
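A minimal sketch of what such a cleanup hook might look like; the `delete_after_success` key and the function name are hypothetical, not the eventual parameter name:

```python
def maybe_delete_inputs(s3_client, bucket, keys, config):
    """Delete input data after a successful analysis.

    config: platform-wide settings dict; the "delete_after_success"
    key is a placeholder name, defaulting to True as proposed.
    Returns the list of keys actually deleted.
    """
    if not config.get("delete_after_success", True):
        return []
    deleted = []
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
        deleted.append(key)
    return deleted
```

The important design point is that deletion runs only after the job reports SUCCESS, so failed jobs keep their inputs available for debugging.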
It's currently unclear whether the installation process works with all combinations of AWS Regions (as given through the CLI) and CLI versions when working with the CTN AWS account. We should test these extensively, either through Travis or by recruiting remote developers.
Build a prototype for streaming mode. It looks like AWS Kinesis Video Streams are a good candidate for this IF we can figure out how not to write the producer API ourselves (is the producer API built into anything?). The Consumer looks okay, as it's just REST api calls. Thanks to Ryan Glassman for writing this up:
https://github.com/cunningham-lab/neurocaas/wiki/streaming_research
We need access to the payer account in order to set up cost monitoring for data storage through buckets.
Hi,
I am reviewing the AMI creation process in the dev doc file, which says to use the command devami.create_image(name) to finalize your AMI.
However, looking at the Python code, I can see that the function in NeuroCAASAMI is called devami.create_devami(name).
Could there be a typo?
Add more language around developer workflow to the paper. In particular:
AWS IAM has a non-negotiable maximum user limit of 5000. While we don't expect to need this capacity any time in the near future, we should consider switching user accounts to federated users.
We have material for grants like:
Add in images to show what success and failure cases look like in the dev guide.
Currently, due to the limitation that a single IAM user can only belong to 10 IAM groups, each user has access to only 10 analyses. Although this is not yet a limiting factor, we'd like to refactor the code so that each user only belongs to a single IAM group (corresponding to their affiliate), allowing them to use more than 10 analyses with a single account.
For cost efficiency, we should reduce our usage of EBS resources. We can do this as follows:
I think sometimes AMIs don't actually have a RootDeviceName; according to https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RegisterImage.html it is not required. That then throws an error when computing volume size.
We need to figure out how to compute volume size without using RootDeviceName from the AMI.
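A defensive lookup could fall back to the block device mappings when RootDeviceName is absent. This is a sketch against the `describe_images` response shape, not the existing dev_builder code:

```python
def get_volume_size(image):
    """Get root volume size from an AMI description dict.

    image: one entry of ec2.describe_images()["Images"].
    If RootDeviceName is present, match it against the block device
    mappings; otherwise fall back to the first EBS-backed mapping.
    Returns None if no EBS mapping is found.
    """
    mappings = image.get("BlockDeviceMappings", [])
    root = image.get("RootDeviceName")
    for mapping in mappings:
        if root is None or mapping.get("DeviceName") == root:
            if "Ebs" in mapping:
                return mapping["Ebs"].get("VolumeSize")
    return None
```

This avoids the KeyError entirely: a missing RootDeviceName degrades to "first EBS volume" rather than crashing the build.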
When I submitted a job, the output (devami.job_output()) shows the following:
neurocaas_remote/run_main_bardensr.sh workflow dirname
this is the remote directory.
download: s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt to neurocaas_remote/ncap_utils/statusdict.json
upload: neurocaas_remote/ncap_utils/statusdict.json to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt
2014, is the pid of the background process
Encountered AWS Error: NoSuchKey
upload: neurocaas_remote/ncap_utils/statusdict.json to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt
copy: s3://bardensr/debuggers/configs/config.yaml to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/process_results/config.yaml
upload: neurocaas_remote/update.txt to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/process_results/update.txt
/var/lib/amazon/ssm/i-09e57ebffe3557d8b/document/orchestration/6916a62b-af6e-4a56-b539-18e64bbf5c21/awsrunShellScript/0.awsrunShellScript/_script.sh workflow dirname
Traceback (most recent call last):
File "/home/ubuntu/neurocaas_remote/ncap_utils/finalcert.py", line 41, in <module>
raise Exception("error getting certificate, not formatted for per-job logging. Message: {}".format(e))
Exception: error getting certificate, not formatted for per-job logging. Message: Traceback (most recent call last):
File "/home/ubuntu/neurocaas_remote/ncap_utils/finalcert.py", line 30, in <module>
c = load_cert(bucketname,certpath)
File "/home/ubuntu/neurocaas_remote/ncap_utils/updatecert.py", line 44, in load_cert
raise ValueError
ValueError
Any suggestions? Thank you!
If it is, the job fails silently.
When Lambda timeouts are set too short, instances can be created and abandoned without being sent a command. This is an issue that has to be caught by account-wide protections, which is highly suboptimal. Fix these timeouts.
Add a method to assign a running instance to a NeuroCaaSAMI object. Useful for longer development cycles where the instance stays up longer than the IPython console.
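A minimal stand-in sketch of such a method; the real version would live on NeuroCaaSAMI, wrap boto3's `ec2.Instance(instance_id)`, and verify the instance state before attaching:

```python
class InstanceHandle:
    """Toy stand-in for the NeuroCaaSAMI attach workflow (hypothetical)."""

    def __init__(self):
        self.instance = None

    def assign_instance(self, instance_id):
        """Point this object at an already-running instance.

        Lets a fresh IPython session resume a long-lived development
        instance instead of launching a new one.
        """
        if not instance_id.startswith("i-"):
            raise ValueError("not an EC2 instance id: " + instance_id)
        # Real implementation (assumption):
        #   self.instance = boto3.resource("ec2").Instance(instance_id)
        #   self.instance.load()  # raises if the id does not exist
        self.instance = instance_id
```

The `load()` call is the natural place to fail fast if the id is stale or the instance was already terminated.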
Developer guide should be integrated with the current neurocaas_contrib readme. Dependent upon finishing (#31).
Hi,
This is Jerome from the Allen Institute.
I appreciate this is coming a little unexpectedly, and I know you are working hard to release NeuroCAAS.
I am very interested in this effort, so I was going through your PDF installation guide to understand how it all works to deploy an algorithm (I have a very good candidate for this). Most of it went smoothly until I reached
"4 Initializing a blueprint"
The script bash iac_utils/configure.sh expects that the environment "source activate sam" was already set up. Unless I missed it, I could not find this in the instructions.
You might be aware of this already but in case you are not I thought this could be helpful.
I thought that replacing "source activate sam" with "source activate neurocaas" would work, given the previous steps.
Happy to help more if I can.
Jerome
We need additional CLI tools to:
Following step 5.5 in the developer guide, I am trying to run devami.submit_job_log().
In demo_submit.json in pmd_web_stack, it specifies the S3 bucket:
{
"dataname": "debuggers/inputs/demoMovie.npy",
"configname":"debuggers/configs/config.yaml",
"instance_type": "m5.16xlarge",
"timestamp": "05_05_20_1_23"
}
Where is debuggers/inputs/? Or how do I know where to put my data and the config.yaml file?
Thank you!
Make the module script_doc_utils a true python package so you can use it across your different projects. Add in functionality to automatically write files to the wiki of a github repo as well as the local script_docs directory.
Revise the Figure 1 pyramid to reflect examples of each infrastructure layer that discriminate better between different analyses. For example, have layer 2 include the language version (i.e., Python 3.8) instead of job monitor or resource usage. Each layer should be something you can change in order to create a potentially breaking change.
For example:
Virtualized OS -> OS Version
Job Monitor -> Language Version
Resource Usage -> simultaneous processes
One of the main bottlenecks in developer workflow at the moment is bash scripting on the remote EC2 instance. I will add a set of scripts to automate this process from a template (given a desired set of inputs and outputs, automatically write the script to transfer data and otherwise set up the local environment).
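A sketch of what template-driven generation could look like; the template contents, directory layout, and function name are assumptions about the eventual design, loosely modeled on the s3 cp/sync pattern visible in the job logs above:

```python
# Hypothetical boilerplate for a remote analysis script: pull inputs
# down, run the analysis, sync outputs back up.
TEMPLATE = """#!/bin/bash
set -e
aws s3 cp "s3://{bucket}/{inputpath}" tmp/input/
aws s3 cp "s3://{bucket}/{configpath}" tmp/config/
bash "{analysis_script}" tmp/input tmp/config tmp/output
aws s3 sync tmp/output "s3://{bucket}/{resultpath}/process_results/"
"""

def write_remote_script(bucket, inputpath, configpath, resultpath,
                        analysis_script="run_analysis.sh"):
    """Render the remote script from the template (sketch only)."""
    return TEMPLATE.format(bucket=bucket, inputpath=inputpath,
                           configpath=configpath, resultpath=resultpath,
                           analysis_script=analysis_script)
```

A developer would then only supply the analysis command itself, with all data-transfer boilerplate generated for them.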
Have a blog page tied to this repository that tracks the development process and provides a user-friendly getting started manual.
In the future, it would be great to add additional features:
Create a module in neurocaas_contrib to read in a blueprint that you own. From this blueprint, retrieve statistics on:
Collect these statistics, and aggregate them across all analyses to add usage quantifications to the paper.
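A minimal sketch of such an aggregation, operating on blueprint dicts shaped like stack_config_template.json. Only the UXData/Affiliates structure comes from the repo; the statistics chosen and all example data are illustrative:

```python
# Aggregate affiliate/user counts across a list of blueprint dicts.
def aggregate_users(blueprints):
    """Count affiliate groups and registered users across all blueprints."""
    totals = {"affiliates": 0, "users": 0}
    for bp in blueprints:
        for aff in bp.get("UXData", {}).get("Affiliates", []):
            totals["affiliates"] += 1
            totals["users"] += len(aff.get("UserNames", []))
    return totals

# Illustrative blueprints; "examplelab" and its users are made up.
blueprints = [
    {"UXData": {"Affiliates": [
        {"AffiliateName": "debuggers", "UserNames": ["tacosyne"]}]}},
    {"UXData": {"Affiliates": [
        {"AffiliateName": "examplelab", "UserNames": ["usera", "userb"]}]}},
]
print(aggregate_users(blueprints))  # {'affiliates': 2, 'users': 3}
```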
To allow developers to develop more functionality locally, I will create a docker image that has the latest version of the neurocaas_contrib package installed, and test functionality of this package with existing workflow.
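One way such a local workflow could be wrapped is sketched below. The image tag "neurocaas/contrib:latest" and the CLI entry point are assumptions, not published names:

```python
import subprocess

# Assumed image tag for the planned neurocaas_contrib docker image.
IMAGE = "neurocaas/contrib:latest"

def contrib_command(args, image=IMAGE, workdir="/home/contrib"):
    """Build the docker invocation for a neurocaas_contrib CLI call."""
    return ["docker", "run", "--rm", "-w", workdir, image] + list(args)

def run_contrib(args):
    """Execute the command; requires docker and the image to be available."""
    return subprocess.run(contrib_command(args), capture_output=True, text=True)

print(contrib_command(["neurocaas-contrib", "--help"]))
```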
Occasionally, NeuroCAAS will complain with a KeyError that there is no RootDeviceName when creating instances from existing AMIs. This is most likely because the AMI is still being registered when a new instance creation request is submitted; investigate this further.
See the following certificate from Shuonan running bardensr:
REQUEST START TIME: 2020-07-21 00:11:20.91 (GMT) [+0:00:00.00]
ANALYSIS VERSION ID: 7444a38627c660bd6e93d5ab0a45a73c563eddb4 [+0:00:00.12]
JOB ID: 07_20_test_4 [+0:00:00.17]
[Job Manager] Detected new job: starting up. [+0:00:00.25]
[Internal (init)] Initializing job manager. [+0:00:00.29]
[Internal (init)] Analysis request with dataset(s): debuggers/inputs/data_2.hdf5, config file debuggers/configs/config_2.yaml [+0:00:00.36]
[Job Manager] STEP 1/4 (Initialization): DONE [+0:00:00.41]
[Internal (get_costmonitoring)] Incurred cost so far: $0.6018133333333333. Remaining budget: $299.3981866666667 [+0:00:01.19]
[Job Manager] STEP 2/4 (Validation): DONE [+0:00:01.27]
[Internal (parse_config)] parameter duration not given, proceeding with standard compute launch. [+0:00:01.43]
[Internal (parse_config)] parameter dataset_size is not given, proceeding with standard storage. [+0:00:01.49]
[Job Manager] STEP 3/4 (Environment Setup): INTERNAL ERROR. Traceback (most recent call last):
File "/var/task/submit_start.py", line 717, in process_upload_dev
submission.compute_volumesize()
File "/var/task/submit_start.py", line 386, in compute_volumesize
default_size = utilsparamec2.get_volumesize(os.environ["AMI"])
File "/var/task/utilsparam/ec2.py", line 266, in get_volumesize
root = response["Images"][0]["RootDeviceName"]
KeyError: 'RootDeviceName'
[Job Manager] Shutting down job. [+0:00:01.78]
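A defensive version of the lookup in utilsparam/ec2.py could check for the missing key before indexing and report the image state instead of raising a bare KeyError. A sketch operating on the dict shape returned by boto3's describe_images; the helper name and error messages are mine, not the repo's:

```python
def get_root_volumesize(image):
    """Return the root EBS volume size (GiB) from a describe_images entry.

    Raises a descriptive error instead of KeyError: 'RootDeviceName'
    when the AMI is still registering and has no root device yet.
    """
    if "RootDeviceName" not in image:
        raise RuntimeError(
            "AMI {} has no RootDeviceName (State={}); it is likely still "
            "registering. Retry once the image is 'available'.".format(
                image.get("ImageId"), image.get("State")))
    root = image["RootDeviceName"]
    for mapping in image.get("BlockDeviceMappings", []):
        if mapping.get("DeviceName") == root:
            return mapping["Ebs"]["VolumeSize"]
    raise RuntimeError("No block device mapping for root device " + root)

# Fake describe_images entries for illustration:
available = {"ImageId": "ami-0123", "State": "available",
             "RootDeviceName": "/dev/xvda",
             "BlockDeviceMappings": [
                 {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 200}}]}
pending = {"ImageId": "ami-0456", "State": "pending"}
print(get_root_volumesize(available))  # 200
```

In production code the caller could also use boto3's `image_available` waiter before the lookup, which would address the suspected registration race directly.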
This is the main issue that collects together all of the individual todo points we would like to accomplish. This issue and the Developer Package + Paper Revision Milestone go together, and this issue should be treated as a long form (markdown enabled) description of that milestone.
I have grouped together Paper Revisions and the Dev Package because the Dev Package is a crucial step to increase the accessibility of the NeuroCAAS Development workflow, and I'd like to have it streamlined before drawing lots of attention to our project again.
I have broken down the work items we will address into several topics:
Developer Package 1: Dockerization: In the time between submitting the paper and now, it's become clear that Docker would make the developer process a lot smoother. In particular, we can go from developers spending most of their time configuring and saving an AWS instance that we host, to having them develop and test a docker image locally that is compatible with NeuroCAAS, and notifying us when it's ready to be deployed. This process has the following workflow:
With dockerized analyses, it becomes more feasible to address additional reviewer comments, like:
Developer Package 2: Analysis Monitoring/Update: We currently have data about usage of each analysis, the number of users/ the number of active jobs, and total per-user usage that we are using to monitor costs and restrict usage as necessary. It would be good to process this information and make it available to developers through some simple interface. Likewise, as users develop their analyses, we want to make it easy for them to update. Most of the time this should be possible by simply updating the docker image their analysis lives in, and submitting a pull request again.
Paper Revisions 1: Developer Perspective: One comment we heard from both reviewers was that we did not focus on the developer's perspective. To this end:
Paper Revisions 2: Usage Metrics: We are now collecting usage metrics for NeuroCAAS via Google Analytics and through the records generated when users run analyses. These could be added to or replace the schematic currently included as Figure 3.
Paper Revisions 3: Grant Materials: Having put together material for grants gives us a lot of extra figures to work with.
Paper Revisions 4: Miscellaneous: We got a few comments on informal style and the balance of content to motivation. It doesn't seem like reviewers liked our 3 part exposition of contribution (which does read more like motivation to be fair). When addressing topics 1 and 2 here let's focus on decreasing motivation and increasing our own quantifications or concrete examples.