cunningham-lab / neurocaas
IaC codebase for the NeuroCAAS Platform
Home Page: http://www.neurocaas.org
License: GNU General Public License v3.0
We prototyped github actions in the neurocaas_contrib repository when we switched over our CI from Travis to Github actions. It would be feasible for us to design custom actions that are triggered on pull requests, providing us a concrete path forwards to do things like spin up localstack and check that the proposed workflow is functional (#31), and then to build the resulting stack in our neurocaas account.
For the future, we should make pull requests via GitHub Actions the ONLY way to build on our neurocaas account, easing the burden of managing different stack versions that generate conflicting resources under the same name in CloudFormation.
Developer usage has now switched to a tag-based workflow. Here is the current layout:
"Soft cap" protections:
"Hard cap" functions on total usage.
These functions provide a nice layer of security against unexpected usage in all cases except an SSM job that continues running unnecessarily.
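As a rough illustration (not the actual NeuroCAAS implementation), the two layers can be thought of as a single check, where `soft_cap` and `hard_cap` are hypothetical per-group thresholds:

```python
def check_caps(incurred, soft_cap, hard_cap):
    """Sketch of layered usage protection.

    "Soft cap": warn the group but allow the job to launch.
    "Hard cap": block launches entirely once total usage exceeds it.
    Note: neither layer catches an SSM job that keeps running after launch.
    """
    if incurred >= hard_cap:
        return "block"
    if incurred >= soft_cap:
        return "warn"
    return "allow"
```

The one gap noted above (a runaway SSM job) is invisible to this kind of pre-launch check, which is why it needs separate monitoring.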
Here are the next steps:
The neurocaas_remote repo should be an independent Python package for developers, integrated with proper certificate-generation routines.
We need additional CLI tools to:
CORS configuration settings are required to make buckets function through the neurocaas website. We need to add these manually right now, which is a bottleneck.
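One way to remove this bottleneck would be to script the CORS step with boto3's `put_bucket_cors`. The specific origins, methods, and `MaxAgeSeconds` below are assumptions for illustration, not the exact rules the website requires:

```python
def neurocaas_cors_config(origin="http://www.neurocaas.org"):
    """Build a CORS configuration dict for an analysis bucket.

    The allowed origin/methods here are placeholder assumptions.
    """
    return {
        "CORSRules": [{
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST"],
            "AllowedOrigins": [origin],
            "MaxAgeSeconds": 3000,
        }]
    }

def apply_cors(bucket_name):
    """Apply the CORS rules to a bucket (requires AWS credentials)."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_cors(Bucket=bucket_name,
                       CORSConfiguration=neurocaas_cors_config())
```

Wiring a call like this into the deployment scripts would replace the current manual step.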
My UXData in stack_config_template.json looks like this:
"UXData": {
"Affiliates": [
{
"AffiliateName": "debuggers",
"UserNames": [
"tacosyne"
],
"UserInput": true,
"ContactEmail": "NOTE: KEEP THIS AFFILIATE TO ENABLE EASY TESTING"
}
]
}
(I added the tacosyne user as a test user), but when I ran fulldeploy.sh (step 6 in the documentation), it gave this error:
$ bash ./iac_utils/fulldeploy.sh "bardensr"
webdev
webdev mode
webdev mode
{'AffiliateName': 'debuggers', 'UserNames': ['tacosyne'], 'UserInput': True, 'ContactEmail': 'NOTE: KEEP THIS AFFILIATE TO ENABLE EASY TESTING'} affiliatedict
Error adding User tacosyne, please evaluate An error occurred (AccessDenied) when calling the GetUser operation: User: arn:aws:iam::739988523141:user/cunninghamlab/tacosyneus-east-1 is not authorized to perform: iam:GetUser on resource: user tacosyneus-east-1
Traceback (most recent call last):
File "dev_builder.py", line 1200, in <module>
temp =WebDevTemplate(filename)
File "dev_builder.py", line 597, in __init__
self.add_affiliate(affdict)
File "dev_builder.py", line 134, in add_affiliate
self.add_affiliate_usernet(affiliatedict)
File "dev_builder.py", line 192, in add_affiliate_usernet
users,usernames = self.attach_users(affiliatedict)
File "dev_builder.py", line 156, in attach_users
self.iam_resource.User(user_local).create_date # this line gives an error.
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/factory.py", line 339, in property_loader
self.load()
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/factory.py", line 505, in do_action
response = action(self, *args, **kwargs)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(**params)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/Shuonan/opt/anaconda3/envs/neurocaas/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetUser operation: User: arn:aws:iam::739988523141:user/cunninghamlab/tacosyneus-east-1 is not authorized to perform: iam:GetUser on resource: user tacosyneus-east-1
Any suggestions? Thank you!
Revise Figure 2 to include prominently displayed developer workflow. This issue overlaps significantly with issue #20.
Develop a locally hosted user client for NeuroCAAS file transfer in large batches. Talk to Joao about adopting his GUI for these purposes.
Right now, cost monitoring is dictated by the blueprints of analysis algorithms, as opposed to being declared per user. We should refactor this code at some point to centralize budgets to be per user, not per analysis.
Currently the blueprint contains a lot of AWS-specific material that is useful for us to manipulate but not useful to a developer (SSM role, security group, etc.). Hide these in a configuration file, and instead include:
Include additional measures to parse these parameters (see #31 for relevant details on 2).
Occasionally, running a job on the NeuroCAAS website gives the following error:
AWS ERROR. Transient AWS Communication Error. Please Try Again
The full message in certificate.txt is the following:
REQUEST START TIME: 2020-08-17 16:14:14.26 (GMT) [+0:00:00.00]
ANALYSIS VERSION ID: 72def07263245ec7c47ba6c157230a7715f5c225 [+0:00:00.12]
JOB ID: 1597680851 [+0:00:00.17]
[Job Manager] Detected new job: starting up. [+0:00:00.28]
[Internal (init)] Initializing job manager. [+0:00:00.33]
[Internal (init)] Using default instance type p2.xlarge from config file. [+0:00:00.39]
[Internal (init)] Analysis request with dataset(s): ['bardensr/inputs/20xdata_300_300.hdf5'], config file bardensr/configs/config_test.yaml [+0:00:00.45]
[Job Manager] STEP 1/4 (Initialization): DONE [+0:00:00.54]
[Internal (get_costmonitoring)] Incurred cost so far: $2.5527633333333335. Remaining budget: $297.44723666666664 [+0:00:04.55]
[Job Manager] STEP 2/4 (Validation): DONE [+0:00:04.63]
[Internal (parse_config)] parameter __duration__ not given, proceeding with standard compute launch. [+0:00:04.79]
[Internal (parse_config)] parameter __dataset_size__ is not given, proceeding with standard storage. [+0:00:04.85]
[Job Manager] STEP 3/4 (Environment Setup): DONE [+0:00:05.13]
[Utils] Acquiring new p2.xlarge instances from ami-003157f05ec0f37eb ... [+0:00:05.58]
[Utils] save not available (duration not given or greater than 6 hours). Launching standard instance. [+0:00:05.58]
[Utils] New instance ec2.Instance(id='i-02a211e6f418b3ea6') created! [+0:00:07.31]
[Internal (put_instance_monitor_rule)] Setting up monitoring on all instances... [+0:00:07.86]
[Utils] Instance i-02a211e6f418b3ea6 State: pending... [+0:00:08.21]
[Utils] Starting Instance... [+0:00:08.21]
[Utils] Instance started! [+0:00:23.93]
[Utils] Initializing instances. This could take a moment... [+0:00:24.16]
[Utils] All Instances Initialized. [+0:01:24.27]
[Internal (start_instance)] Created 1 immutable analysis environments. [+0:01:24.46]
[Job Manager] STEP 4/4 (Initialize Processing): AWS ERROR. Transient AWS Communication Error. Please Try Again
[Job Manager] Shutting down job. [+0:01:24.83]
When I submit jobs using devami.submit_job_log(), they seem to run successfully, but when I check devami.job_status() it always says 'InProgress'.
In htop, it seems nothing is running on the instance.
s3://"$bucketname"/"$groupdir"/"$resultdir"/logs/DATASET_NAME:"$dataname"_STATUS.txt says "status": "SUCCESS" (see below).
devami.job_output() shows:
debug_direct/57b47d5b-87e1-44e7-8ff6-e246c901b0a0/i-01998e55d084227b0/awsrunShellScript/0.awsrunShellScript/stderr not found. may not be updated yet.
The full information from s3://"$bucketname"/"$groupdir"/"$resultdir"/logs/DATASET_NAME:"$dataname"_STATUS.txt:
{
"status": "SUCCESS",
"cpu_usage": [
"0"
],
"stdout": {
"0": "/home/ubuntu/neurocaas_remote/bardensr/run_bardensr.sh workflow dirname\n",
"1": "this is the remote directory. \n",
"2": "aws s3 cp s3://bardensr/debuggers/inputs/human_cs_57.p input\n",
"3": "aws s3 cp s3://bardensr/debuggers/configs/config.yaml config\n",
    "4": "Completed 256.0 KiB/45.8 MiB (2.6 MiB/s) with 1 file(s) remaining\r[... progress output truncated ...]\rCompleted 45.8 MiB/45.8 MiB (109.3 MiB/s) with 1 file(s) remaining\rdownload: s3://bardensr/debuggers/inputs/human_cs_57.p to tmp/input/human_cs_57.p\n",
"5": "Completed 42 Bytes/42 Bytes (321 Bytes/s) with 1 file(s) remaining\rdownload: s3://bardensr/debuggers/configs/config.yaml to tmp/input/config.yaml\n",
"6": "/home/ubuntu/neurocaas_remote/bardensr/sync.sh /home/ubuntu/tmp/log s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs &\n",
"7": "** running the python script now **\n",
"8": "Completed 29 Bytes/29 Bytes (694 Bytes/s) with 1 file(s) remaining\rupload: tmp/log/bardensr_out.txt to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs/bardensr_out.txt\n",
"9": "** finish python script! ** \n",
"10": "***\n",
"11": "s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/\n",
"12": "****\n",
"13": "copy the results and logs to s3..... \n",
"14": "Completed 58.8 KiB/2.1 MiB (602.4 KiB/s) with 57 file(s) remaining\rupload: tmp/output/gene_15.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_15.png\n",
"15": "Completed 58.8 KiB/2.1 MiB (602.4 KiB/s) with 56 file(s) remaining\rCompleted 73.9 KiB/2.1 MiB (733.1 KiB/s) with 56 file(s) remaining\rupload: tmp/output/gene_1.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_1.png\n",
........
"70": "Completed 2.1 MiB/2.1 MiB (3.7 MiB/s) with 1 file(s) remaining\rCompleted 2.1 MiB/2.1 MiB (2.1 MiB/s) with 1 file(s) remaining\rupload: tmp/output/gene_9.png to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/process_results/gene_9.png\n",
"71": "Completed 188 Bytes/188 Bytes (5.4 KiB/s) with 1 file(s) remaining\rupload: tmp/log/bardensr_out.txt to s3://bardensr/debuggers/results/debugjob2020-07-13 16:26:45.941730/logs/bardensr_out.txt\n",
"72": "** DONE **\n"
},
"instance": "i-01998e55d084227b0",
"reason": [
"aws s3 sync $LOGDIR s3://$1/\"$6\"/logs/"
],
"command": "57b47d5b-87e1-44e7-8ff6-e246c901b0a0",
"stderr": {
"0": "2020-07-13 20:26:58.092287: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F\n",
"1": "2020-07-13 20:26:58.133278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz\n",
"2": "2020-07-13 20:26:58.134926: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5574a9bf3da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
"3": "2020-07-13 20:26:58.134957: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n",
"4": "2020-07-13 20:26:58.139440: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.\n",
"5": "/home/ubuntu/neurocaas_remote/bardensr/test_image_new.py:110: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).\n",
"6": " plt.figure(figsize = (10, 10))\n"
},
"input": "debuggers/inputs/human_cs_57.p"
}
The cleanup command in the config JSON file is NOT included. Is this what I should expect to get without cleanup?
Any suggestions would be much appreciated!
Currently, due to the AWS CloudFormation limit on resources per stack, each analysis can support (200-14)/2 = 93 concurrent users (14 non-user-specific resources, plus 2 resources per user). Since we have already factored user resources into substacks, this should be fairly easy to extend further.
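The capacity arithmetic above, as a quick sanity check:

```python
def max_concurrent_users(stack_limit=200, fixed=14, per_user=2):
    """Users per analysis stack: (CloudFormation resource limit
    minus 14 non-user-specific resources) / 2 resources per user."""
    return (stack_limit - fixed) // per_user

print(max_concurrent_users())  # → 93
```

Raising the effective limit via substacks amounts to increasing `stack_limit` in this formula for each nested stack added.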
Improve behavior of the NeuroCAAS Instance monitor.
If it's a NeuroCAAS Deploy Instance, update the relevant certificate files so it's clear that they were terminated by external factors. Also useful for user-submitted job cancellation.
Incorporate into a testing framework.
Parametrize limits.
Quantify installation difficulties for local comparisons by using Docker to simulate different operating systems, language versions, and hardware (local vs. AWS instances [and the Columbia cluster?]). Run an install script per analysis tool derived from the installation instructions on its repo, and see where you get failures. Quantify failures across different image OS, host OS, language version, and hardware conditions, and derive a total robustness measure for each analysis.
These changes can go into a revision of Figure 1D or 3A/B
Application 2 is about the widefield imaging protocol, so it should be done with input of Ian, Shreya and Joao.
In order to protect user privacy and keep down costs, we will add a platform-wide parameter to delete data after successful analysis, with the default being TRUE.
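A minimal sketch of what such a cleanup hook might look like; the `delete_after_success` key and the function name are hypothetical, not the eventual parameter name:

```python
def maybe_delete_inputs(s3_client, bucket, keys, config):
    """Delete input data after a successful analysis.

    config: platform-wide settings dict; the "delete_after_success"
    key is a placeholder name, defaulting to True as proposed.
    Returns the list of keys actually deleted.
    """
    if not config.get("delete_after_success", True):
        return []
    deleted = []
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
        deleted.append(key)
    return deleted
```

The important design point is that deletion runs only after the job reports SUCCESS, so failed jobs keep their inputs available for debugging.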
It's currently unclear whether the installation process works with all combinations of AWS Regions (as given through the CLI) and CLI versions when working with the CTN AWS account. We should test these extensively, either through Travis or by recruiting remote developers.
Build a prototype for streaming mode. It looks like AWS Kinesis Video Streams are a good candidate for this IF we can figure out how not to write the producer API ourselves (is the producer API built into anything?). The Consumer looks okay, as it's just REST api calls. Thanks to Ryan Glassman for writing this up:
https://github.com/cunningham-lab/neurocaas/wiki/streaming_research
We need access to the payer account in order to set up cost monitoring for data storage through buckets.
Hi,
I am reviewing the AMI creation process in the dev doc file, which says to use the command devami.create_image(name) to finalize your AMI.
However, looking at the Python code, I can see that the function in NeuroCAASAMI is called devami.create_devami(name).
Could there be a typo?
Add more language around developer workflow to the paper. In particular:
AWS IAM has a non-negotiable maximum user limit of 5000. While we don't expect to need this capacity any time in the near future, we should consider switching user accounts to federated users.
We have material for grants like:
Add in images to show what success and failure cases look like in the dev guide.
Currently, due to the limitation that a single IAM user can only belong to 10 IAM groups, each user has access to only 10 analyses. Although this is not yet a limiting factor, we'd like to refactor the code so that each user only belongs to a single IAM group (corresponding to their affiliate), allowing them to use more than 10 analyses with a single account.
For cost efficiency, we should reduce our usage of EBS resources. We can do this as follows:
I think sometimes AMIs don't actually have a RootDeviceName; according to https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RegisterImage.html it is not required. That then throws an error when computing volume size.
We need to figure out how to compute volume size without using RootDeviceName from the AMI.
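A defensive lookup could fall back to the block device mappings when RootDeviceName is absent. This is a sketch against the `describe_images` response shape, not the existing dev_builder code:

```python
def get_volume_size(image):
    """Get root volume size from an AMI description dict.

    image: one entry of ec2.describe_images()["Images"].
    If RootDeviceName is present, match it against the block device
    mappings; otherwise fall back to the first EBS-backed mapping.
    Returns None if no EBS mapping is found.
    """
    mappings = image.get("BlockDeviceMappings", [])
    root = image.get("RootDeviceName")
    for mapping in mappings:
        if root is None or mapping.get("DeviceName") == root:
            if "Ebs" in mapping:
                return mapping["Ebs"].get("VolumeSize")
    return None
```

This avoids the KeyError entirely: a missing RootDeviceName degrades to "first EBS volume" rather than crashing the build.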
When I submitted a job, the output (devami.job_output()) shows the following:
neurocaas_remote/run_main_bardensr.sh workflow dirname
this is the remote directory.
download: s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt to neurocaas_remote/ncap_utils/statusdict.json
upload: neurocaas_remote/ncap_utils/statusdict.json to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt
2014, is the pid of the background process
Encountered AWS Error: NoSuchKey
upload: neurocaas_remote/ncap_utils/statusdict.json to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/logs/DATASET_NAME:human_cs_57.p_STATUS.txt
copy: s3://bardensr/debuggers/configs/config.yaml to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/process_results/config.yaml
upload: neurocaas_remote/update.txt to s3://bardensr/debuggers/results/debugjob2020-07-08 15:20:41.978100/process_results/update.txt
/var/lib/amazon/ssm/i-09e57ebffe3557d8b/document/orchestration/6916a62b-af6e-4a56-b539-18e64bbf5c21/awsrunShellScript/0.awsrunShellScript/_script.sh workflow dirname
Traceback (most recent call last):
File "/home/ubuntu/neurocaas_remote/ncap_utils/finalcert.py", line 41, in <module>
raise Exception("error getting certificate, not formatted for per-job logging. Message: {}".format(e))
Exception: error getting certificate, not formatted for per-job logging. Message: Traceback (most recent call last):
File "/home/ubuntu/neurocaas_remote/ncap_utils/finalcert.py", line 30, in <module>
c = load_cert(bucketname,certpath)
File "/home/ubuntu/neurocaas_remote/ncap_utils/updatecert.py", line 44, in load_cert
raise ValueError
ValueError
Any suggestions? Thank you!
If it is, the job fails silently.
When Lambda timeouts are set too short, instances can be created and abandoned without being sent a command. This is an issue that has to be caught by account-wide protections, which is highly suboptimal. Fix these timeouts.
Add a method to assign a running instance to a NeuroCaaSAMI object. Useful for longer development cycles where the instance stays up longer than the IPython console.
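A minimal stand-in sketch of such a method; the real version would live on NeuroCaaSAMI, wrap boto3's `ec2.Instance(instance_id)`, and verify the instance state before attaching:

```python
class InstanceHandle:
    """Toy stand-in for the NeuroCaaSAMI attach workflow (hypothetical)."""

    def __init__(self):
        self.instance = None

    def assign_instance(self, instance_id):
        """Point this object at an already-running instance.

        Lets a fresh IPython session resume a long-lived development
        instance instead of launching a new one.
        """
        if not instance_id.startswith("i-"):
            raise ValueError("not an EC2 instance id: " + instance_id)
        # Real implementation (assumption):
        #   self.instance = boto3.resource("ec2").Instance(instance_id)
        #   self.instance.load()  # raises if the id does not exist
        self.instance = instance_id
```

The `load()` call is the natural place to fail fast if the id is stale or the instance was already terminated.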
Developer guide should be integrated with the current neurocaas_contrib readme. Dependent upon finishing (#31).
Hi,
This is Jerome from the Allen Institute.
I appreciate this is coming a little unexpectedly, and I know you are working hard to release NeuroCAAS.
I am very interested in this effort, so I was going through your PDF installation guide to understand how it all works to deploy an algorithm (I have a very good candidate for this). Most of it went smoothly until I reached
"4 Initializing a blueprint"
The script bash iac_utils/configure.sh expects that the environment "source activate sam" was already set up. Unless I missed it, I could not find this in the instructions.
You might be aware of this already but in case you are not I thought this could be helpful.
I thought that replacing "source activate sam" with "source activate neurocaas" would work, given the previous steps.
Happy to help more if I can.
Jerome
We need additional CLI tools to:
Following step 5.5 in the developer guide, I am trying to run devami.submit_job_log().
In demo_submit.json in pmd_web_stack, it specifies the S3 bucket:
{
"dataname": "debuggers/inputs/demoMovie.npy",
"configname":"debuggers/configs/config.yaml",
"instance_type": "m5.16xlarge",
"timestamp": "05_05_20_1_23"
}
Where is debuggers/inputs/? Or how do I know where to put my data and the config.yaml file?
Thank you!
Make the module script_doc_utils a true python package so you can use it across your different projects. Add in functionality to automatically write files to the wiki of a github repo as well as the local script_docs directory.
Revise the Figure 1 pyramid to reflect examples of each infrastructure layer that discriminate better between different analyses. For example, have layer 2 include the language version (i.e., Python 3.8) instead of job monitor or resource usage. Each layer should be something you can change in order to create a potentially breaking change.
For example:
Virtualized OS -> OS Version
Job Monitor -> Language Version
Resource Usage -> simultaneous processes
One of the main bottlenecks in developer workflow at the moment is bash scripting on the remote EC2 instance. I will add a set of scripts to automate this process from a template (given a desired set of inputs and outputs, automatically write the script to transfer data and otherwise set up the local environment).
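A sketch of what template-driven generation could look like; the template contents, directory layout, and function name are assumptions about the eventual design, loosely modeled on the s3 cp/sync pattern visible in the job logs above:

```python
# Hypothetical boilerplate for a remote analysis script: pull inputs
# down, run the analysis, sync outputs back up.
TEMPLATE = """#!/bin/bash
set -e
aws s3 cp "s3://{bucket}/{inputpath}" tmp/input/
aws s3 cp "s3://{bucket}/{configpath}" tmp/config/
bash "{analysis_script}" tmp/input tmp/config tmp/output
aws s3 sync tmp/output "s3://{bucket}/{resultpath}/process_results/"
"""

def write_remote_script(bucket, inputpath, configpath, resultpath,
                        analysis_script="run_analysis.sh"):
    """Render the remote script from the template (sketch only)."""
    return TEMPLATE.format(bucket=bucket, inputpath=inputpath,
                           configpath=configpath, resultpath=resultpath,
                           analysis_script=analysis_script)
```

A developer would then only supply the analysis command itself, with all data-transfer boilerplate generated for them.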
Have a blog page tied to this repository that tracks the development process and provides a user-friendly getting started manual.
In the future, it would be great to add additional features:
Create a module in neurocaas_contrib to read in a blueprint that you own. From this blueprint, retrieve statistics on:
Collect these statistics, and aggregate them across all analyses to add usage quantifications to the paper.
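A minimal sketch of such an aggregation, operating on blueprint dicts shaped like stack_config_template.json. Only the UXData/Affiliates structure comes from the repo; the statistics chosen and all example data are illustrative:

```python
# Aggregate affiliate/user counts across a list of blueprint dicts.
def aggregate_users(blueprints):
    """Count affiliate groups and registered users across all blueprints."""
    totals = {"affiliates": 0, "users": 0}
    for bp in blueprints:
        for aff in bp.get("UXData", {}).get("Affiliates", []):
            totals["affiliates"] += 1
            totals["users"] += len(aff.get("UserNames", []))
    return totals

# Illustrative blueprints; "examplelab" and its users are made up.
blueprints = [
    {"UXData": {"Affiliates": [
        {"AffiliateName": "debuggers", "UserNames": ["tacosyne"]}]}},
    {"UXData": {"Affiliates": [
        {"AffiliateName": "examplelab", "UserNames": ["usera", "userb"]}]}},
]
print(aggregate_users(blueprints))  # {'affiliates': 2, 'users': 3}
```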
To allow developers to develop more functionality locally, I will create a docker image that has the latest version of the neurocaas_contrib package installed, and test functionality of this package with existing workflow.
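One way such a local workflow could be wrapped is sketched below. The image tag "neurocaas/contrib:latest" and the CLI entry point are assumptions, not published names:

```python
import subprocess

# Assumed image tag for the planned neurocaas_contrib docker image.
IMAGE = "neurocaas/contrib:latest"

def contrib_command(args, image=IMAGE, workdir="/home/contrib"):
    """Build the docker invocation for a neurocaas_contrib CLI call."""
    return ["docker", "run", "--rm", "-w", workdir, image] + list(args)

def run_contrib(args):
    """Execute the command; requires docker and the image to be available."""
    return subprocess.run(contrib_command(args), capture_output=True, text=True)

print(contrib_command(["neurocaas-contrib", "--help"]))
```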
Occasionally, NeuroCAAS will complain with a KeyError that there is no RootDeviceName when creating instances from existing AMIs. This is most likely because the AMI is still being registered when a new instance creation request is submitted; investigate this further.
See the following certificate from Shuonan running bardensr:
REQUEST START TIME: 2020-07-21 00:11:20.91 (GMT) [+0:00:00.00]
ANALYSIS VERSION ID: 7444a38627c660bd6e93d5ab0a45a73c563eddb4 [+0:00:00.12]
JOB ID: 07_20_test_4 [+0:00:00.17]
[Job Manager] Detected new job: starting up. [+0:00:00.25]
[Internal (init)] Initializing job manager. [+0:00:00.29]
[Internal (init)] Analysis request with dataset(s): debuggers/inputs/data_2.hdf5, config file debuggers/configs/config_2.yaml [+0:00:00.36]
[Job Manager] STEP 1/4 (Initialization): DONE [+0:00:00.41]
[Internal (get_costmonitoring)] Incurred cost so far: $0.6018133333333333. Remaining budget: $299.3981866666667 [+0:00:01.19]
[Job Manager] STEP 2/4 (Validation): DONE [+0:00:01.27]
[Internal (parse_config)] parameter duration not given, proceeding with standard compute launch. [+0:00:01.43]
[Internal (parse_config)] parameter dataset_size is not given, proceeding with standard storage. [+0:00:01.49]
[Job Manager] STEP 3/4 (Environment Setup): INTERNAL ERROR. Traceback (most recent call last):
File "/var/task/submit_start.py", line 717, in process_upload_dev
submission.compute_volumesize()
File "/var/task/submit_start.py", line 386, in compute_volumesize
default_size = utilsparamec2.get_volumesize(os.environ["AMI"])
File "/var/task/utilsparam/ec2.py", line 266, in get_volumesize
root = response["Images"][0]["RootDeviceName"]
KeyError: 'RootDeviceName'
[Job Manager] Shutting down job. [+0:00:01.78]
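A defensive version of the lookup in utilsparam/ec2.py could check for the missing key before indexing and report the image state instead of raising a bare KeyError. A sketch operating on the dict shape returned by boto3's describe_images; the helper name and error messages are mine, not the repo's:

```python
def get_root_volumesize(image):
    """Return the root EBS volume size (GiB) from a describe_images entry.

    Raises a descriptive error instead of KeyError: 'RootDeviceName'
    when the AMI is still registering and has no root device yet.
    """
    if "RootDeviceName" not in image:
        raise RuntimeError(
            "AMI {} has no RootDeviceName (State={}); it is likely still "
            "registering. Retry once the image is 'available'.".format(
                image.get("ImageId"), image.get("State")))
    root = image["RootDeviceName"]
    for mapping in image.get("BlockDeviceMappings", []):
        if mapping.get("DeviceName") == root:
            return mapping["Ebs"]["VolumeSize"]
    raise RuntimeError("No block device mapping for root device " + root)

# Fake describe_images entries for illustration:
available = {"ImageId": "ami-0123", "State": "available",
             "RootDeviceName": "/dev/xvda",
             "BlockDeviceMappings": [
                 {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 200}}]}
pending = {"ImageId": "ami-0456", "State": "pending"}
print(get_root_volumesize(available))  # 200
```

In production code the caller could also use boto3's `image_available` waiter before the lookup, which would address the suspected registration race directly.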
This is the main issue that collects together all of the individual todo points we would like to accomplish. This issue and the Developer Package + Paper Revision Milestone go together, and this issue should be treated as a long form (markdown enabled) description of that milestone.
I have grouped together Paper Revisions and the Dev Package because the Dev Package is a crucial step to increase the accessibility of the NeuroCAAS Development workflow, and I'd like to have it streamlined before drawing lots of attention to our project again.
I have broken down the work items we will address into several topics:
Developer Package 1: Dockerization: In the time between submitting the paper and now, it's become clear that Docker would make the developer process a lot smoother. In particular, we can go from developers spending most of their time configuring and saving an AWS instance that we host, to having them develop and test a docker image locally that is compatible with NeuroCAAS, and notifying us when it's ready to be deployed. This process has the following workflow:
With dockerized analyses, it becomes more feasible to address additional reviewer comments, like:
Developer Package 2: Analysis Monitoring/Update: We currently have data about usage of each analysis, the number of users/ the number of active jobs, and total per-user usage that we are using to monitor costs and restrict usage as necessary. It would be good to process this information and make it available to developers through some simple interface. Likewise, as users develop their analyses, we want to make it easy for them to update. Most of the time this should be possible by simply updating the docker image their analysis lives in, and submitting a pull request again.
Paper Revisions 1: Developer Perspective: One comment we heard from both reviewers was that we did not focus on the developer's perspective. To this end:
Paper Revisions 2: Usage Metrics: We are now collecting usage metrics for NeuroCAAS via Google Analytics and through the records generated when users run analyses. These could be added to or replace the schematic currently included as Figure 3.
Paper Revisions 3: Grant Materials: Having put together material for grants gives us a lot of extra figures to work with.
Paper Revisions 4: Miscellaneous: We got a few comments on informal style and the balance of content to motivation. It doesn't seem like reviewers liked our 3 part exposition of contribution (which does read more like motivation to be fair). When addressing topics 1 and 2 here let's focus on decreasing motivation and increasing our own quantifications or concrete examples.