aws-solutions-library-samples / guidance-for-training-an-aws-deepracer-model-using-amazon-sagemaker
DeepRacer workshop content
License: MIT No Attribution
I have exhausted my troubleshooting options for the AWS DeepRacer: the Wi-Fi light neither turns red nor blue, even though I followed the instructions meticulously.
I went to the DeepRacer workshop and got a temporary AWS account to play around with DeepRacer while at re:Invent. I'm trying to download the log files to follow along with the notebook example, but I don't seem to have permissions to generate an Access Key to configure the aws cli. What should I do?
I went here: https://console.aws.amazon.com/iam/home?region=us-east-1#/users/aws-deepracer-workshop?section=security_credentials and "Create access key" is disabled.
How can I get hold of the .npy file for the Championship Cup Warm-up track?
I'm happy to open a pull request to add it to the tracks folder. I don't see it in there at the moment.
I searched the DeepRacer forum and the internet, and tried checking my S3 logs for the training and evaluation that I did on this new track, but I couldn't find any mention of its .npy file. 🤷
This section describes features and content that are no longer accessible in the console.
The Japanese translation of the lab removed that section.
Can you please address or clarify?
I am trying to download the logs with cw_utils, but I keep getting NoRegionError. Does anyone know how I can fix this?
stream_name = 'XXXX'
fname = 'logs/deepracer-%s.log' %stream_name
cw_utils.download_log(fname, stream_prefix=stream_name)
---------------------------------------------------------------------------
NoRegionError Traceback (most recent call last)
<ipython-input-48-da5f8f73ddad> in <module>
----> 1 cw_utils.download_log(fname, stream_prefix=stream_name)
~/projects/deepracer-models/aws-deepracer-workshops/log-analysis/cw_utils.py in download_log(fname, stream_name, stream_prefix, log_group, start_time, end_time)
59 end_time=end_time
60 )
---> 61 for event in logs:
62 f.write(event['message'].rstrip())
63 f.write("\n")
~/projects/deepracer-models/aws-deepracer-workshops/log-analysis/cw_utils.py in get_log_events(log_group, stream_name, stream_prefix, start_time, end_time)
12
13 def get_log_events(log_group, stream_name=None, stream_prefix=None, start_time=None, end_time=None):
---> 14 client = boto3.client('logs')
15 if stream_name is None and stream_prefix is None:
16 print("both stream name and prefix can't be None")
/anaconda3/lib/python3.7/site-packages/boto3/__init__.py in client(*args, **kwargs)
89 See :py:meth:`boto3.session.Session.client`.
90 """
---> 91 return _get_default_session().client(*args, **kwargs)
92
93
/anaconda3/lib/python3.7/site-packages/boto3/session.py in client(self, service_name, region_name, api_version, use_ssl, verify, endpoint_url, aws_access_key_id, aws_secret_access_key, aws_session_token, config)
261 aws_access_key_id=aws_access_key_id,
262 aws_secret_access_key=aws_secret_access_key,
--> 263 aws_session_token=aws_session_token, config=config)
264
265 def resource(self, service_name, region_name=None, api_version=None,
/anaconda3/lib/python3.7/site-packages/botocore/session.py in create_client(self, service_name, region_name, api_version, use_ssl, verify, endpoint_url, aws_access_key_id, aws_secret_access_key, aws_session_token, config)
837 is_secure=use_ssl, endpoint_url=endpoint_url, verify=verify,
838 credentials=credentials, scoped_config=self.get_scoped_config(),
--> 839 client_config=config, api_version=api_version)
840 monitor = self._get_internal_component('monitor')
841 if monitor is not None:
/anaconda3/lib/python3.7/site-packages/botocore/client.py in create_client(self, service_name, region_name, is_secure, endpoint_url, verify, credentials, scoped_config, api_version, client_config)
84 client_args = self._get_client_args(
85 service_model, region_name, is_secure, endpoint_url,
---> 86 verify, credentials, scoped_config, client_config, endpoint_bridge)
87 service_client = cls(**client_args)
88 self._register_retries(service_client)
/anaconda3/lib/python3.7/site-packages/botocore/client.py in _get_client_args(self, service_model, region_name, is_secure, endpoint_url, verify, credentials, scoped_config, client_config, endpoint_bridge)
326 return args_creator.get_client_args(
327 service_model, region_name, is_secure, endpoint_url,
--> 328 verify, credentials, scoped_config, client_config, endpoint_bridge)
329
330 def _create_methods(self, service_model):
/anaconda3/lib/python3.7/site-packages/botocore/args.py in get_client_args(self, service_model, region_name, is_secure, endpoint_url, verify, credentials, scoped_config, client_config, endpoint_bridge)
45 final_args = self.compute_client_args(
46 service_model, client_config, endpoint_bridge, region_name,
---> 47 endpoint_url, is_secure, scoped_config)
48
49 service_name = final_args['service_name']
/anaconda3/lib/python3.7/site-packages/botocore/args.py in compute_client_args(self, service_model, client_config, endpoint_bridge, region_name, endpoint_url, is_secure, scoped_config)
115
116 endpoint_config = endpoint_bridge.resolve(
--> 117 service_name, region_name, endpoint_url, is_secure)
118
119 # Override the user agent if specified in the client config.
/anaconda3/lib/python3.7/site-packages/botocore/client.py in resolve(self, service_name, region_name, endpoint_url, is_secure)
400 region_name = self._check_default_region(service_name, region_name)
401 resolved = self.endpoint_resolver.construct_endpoint(
--> 402 service_name, region_name)
403 if resolved:
404 return self._create_endpoint(
/anaconda3/lib/python3.7/site-packages/botocore/regions.py in construct_endpoint(self, service_name, region_name)
120 for partition in self._endpoint_data['partitions']:
121 result = self._endpoint_for_partition(
--> 122 partition, service_name, region_name)
123 if result:
124 return result
/anaconda3/lib/python3.7/site-packages/botocore/regions.py in _endpoint_for_partition(self, partition, service_name, region_name)
133 region_name = service_data['partitionEndpoint']
134 else:
--> 135 raise NoRegionError()
136 # Attempt to resolve the exact region for this partition.
137 if region_name in service_data['endpoints']:
NoRegionError: You must specify a region.
Hello,
The link in the readme should point at http://join.deepracing.io as the Slack is only available through an invitation link.
Thank you,
Tomasz Ptak
AWS ML Community
In the following code from the DeepRacer log analysis, the job_desc returned by robomaker.describe_simulation_job did not contain the key "outputLocation" (KeyError: 'outputLocation'). As a result, we could not determine the s3_bucket and s3_prefix for the simtrace log and retrieve the corresponding CSV file. Below is the code as well as a screenshot of the error.
job_desc = robomaker.describe_simulation_job(job=robomaker_job_arn)
is_training = job_desc['simulationApplications'][0]['launchConfig']['launchFile'] == "distributed_training.launch"
s3_bucket = job_desc['outputLocation']['s3Bucket']
s3_prefix = job_desc['outputLocation']['s3Prefix']
job_type = "training" if is_training else "evaluation"
simtrace_path = "iteration-data/{}/".format(job_type)
!aws s3 sync s3://{s3_bucket}/{s3_prefix}/{simtrace_path} ./tmp --exclude "*" --include "*-{job_type}-simtrace.csv"
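As a defensive sketch (an assumption on my part, since describe_simulation_job only returns outputLocation when the job was created with one), the lookup could guard against the missing key instead of raising a KeyError:

```python
# Sketch: guard against a missing 'outputLocation' key instead of raising.
# job_desc below is a stand-in for the dict returned by
# robomaker.describe_simulation_job for a job created without an output location.
job_desc = {
    "simulationApplications": [
        {"launchConfig": {"launchFile": "distributed_training.launch"}}
    ]
}

output_location = job_desc.get("outputLocation", {})
s3_bucket = output_location.get("s3Bucket")
s3_prefix = output_location.get("s3Prefix")

if s3_bucket is None or s3_prefix is None:
    print("Job has no outputLocation; it may have been created without one, "
          "so the simtrace files must be located another way.")
```

This at least turns the hard crash into a diagnosable message.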
"The latest workshop lab is run as part of AWS DeepRacer events conducted in 2022."
Workshop Lab Link in Readme does not work.
I'm trying to run the DeepRacer log analysis tool from https://github.com/aws-samples/aws-deepracer-workshops/blob/master/log-analysis/DeepRacer%20Log%20Analysis.ipynb on my local laptop. However, I get the error below while trying to run step [5] "Create an IAM role".
try:
    sagemaker_role = sagemaker.get_execution_role()
except:
    sagemaker_role = get_execution_role('sagemaker')

print("Using Sagemaker IAM role arn: \n{}".format(sagemaker_role))
Couldn't call 'get_role' to get Role ARN from role name arn:aws:iam::26********:root to get Role path.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-3bea8175b8c7> in <module>
1 try:
----> 2 sagemaker_role = sagemaker.get_execution_role()
3 except:
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in get_execution_role(sagemaker_session)
3302 )
-> 3303 raise ValueError(message.format(arn))
3304
ValueError: The current AWS identity is not a role: arn:aws:iam::26********:root, therefore it cannot be used as a SageMaker execution role
During handling of the above exception, another exception occurred:
NameError Traceback (most recent call last)
<ipython-input-5-3bea8175b8c7> in <module>
2 sagemaker_role = sagemaker.get_execution_role()
3 except:
----> 4 sagemaker_role = get_execution_role('sagemaker')
5
6 print("Using Sagemaker IAM role arn: \n{}".format(sagemaker_role))
NameError: name 'get_execution_role' is not defined
Does anybody know what needs to be done to execute above code without errors?
Can you add the current AWS Summit Raceway track .npy file?
I am unable to load the notebook above locally. Would someone re-commit a cleaned-up version of it?
Just wanted to use the console but I don't have access.
I'm not sure where else to ask this now that the workshops are closed, but my burner account has yet to successfully run a training simulation (or at least it doesn't produce any useful data or video). Evaluations also fail after the "training run" completes, and they never register in the console or become reviewable in any way.
Is it possible anyone has an extra they could send me? This seems to be a problem that only my account is experiencing, unfortunately. I'm aware that scale is still very low and timeouts are common, but my issue seems to prevent any useful functionality.
Thanks for any help.
At the DeepRacer workshop we were told that the RoboMaker simulation of the track, as well as the SageMaker notebook, would be available to download so we could continue training DeepRacer before the official preview opens. Where can I get all of this?
I've been looking at the Championship Cup 2019 Track.
In this repo, there's a .npy file of waypoints.
The track there is approximately 1.066 m wide. I'm assuming the units of that file are meters (since the overall width of the map is about 8, and the official width is 34 feet).
However, all the tracks on this documentation page are 24 inches wide, which is 0.6096 m.
Even accounting for the ambiguous 3 cm grey curb, that would still only get us to 0.7602 m.
Where did these numpy files come from?
I haven't seen a physical track myself, but in all the videos they do not look like they're a whole meter wide.
I would love to see a README at /log-analysis/tracks which explains what a .npy file is, which column is which, that the units are in meters, and where the data came from.
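For what it's worth, the layout can be inferred from how the log-analysis notebook slices the array: each row appears to hold six floats — center (x, y), inner border (x, y), outer border (x, y) — which it reads as columns 0:2, 2:4, and 4:6. Under that assumption, per-waypoint width is just the distance between the borders; the array below is a synthetic stand-in for np.load on a real track file.

```python
import numpy as np

# Assumption: each row of a track .npy holds six floats in meters —
# center (x, y), inner border (x, y), outer border (x, y) — matching the
# notebook's column slices 0:2, 2:4, 4:6. Synthetic two-waypoint array
# standing in for np.load("tracks/<track-name>.npy"):
waypoints = np.array([
    [0.0, 0.0, 0.0, -0.5, 0.0, 0.5],
    [1.0, 0.0, 1.0, -0.5, 1.0, 0.5],
])
center = waypoints[:, 0:2]
inner = waypoints[:, 2:4]
outer = waypoints[:, 4:6]
width = np.linalg.norm(outer - inner, axis=1)  # per-waypoint track width
print(width)
```

Running this kind of check against the Championship Cup file is how one would confirm (or refute) the ~1.066 m figure above.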
Hi,
I am currently trying to upload my simulated RL model, which was trained on the AWS DeepRacer Console, onto my deepracer. I noticed that there is no longer an upload button on the AWS DeepRacer console.
After looking through the AWS DeepRacer training course, I found that the tar.gz file of the model can be uploaded to the DeepRacer through a thumbdrive. From here, I uploaded the tar.gz file into a newly created models folder in my thumbdrive and connected it to my DeepRacer.
Once I log into the DeepRacer console through the IP Address, I am able to see that my model is an option that I can upload onto the DeepRacer. However, once I try to upload the model onto the DeepRacer, I run into the following issue:
"Model load failed! Please check the ROS logs"
From here, I go into logs and open the ROS Display Logs for the message. According to my ROS logs, it says "File Not Found!"
I do not understand why I am getting this error, as I followed the steps and everything worked before this point. My RL model simulated completely fine and was evaluated correctly, and I created the right hierarchy on my thumb drive by creating a folder called "models" and placing the tar.gz file inside it.
Does anyone know why I am getting this error? How do I resolve this?
Thank you
Analyze the reward distribution for your reward function
Because Kumo Torakku has negative y values, I shamelessly took RichardFan's modification of plot_track and refactored it to offer an x_shift and y_shift.
These values may not apply to other tracks, so you may need to change them in the future. Simply add the parameters:
track_size=(700,1000), y_shift=300
track = la.plot_track(df, l_center_line, l_inner_border, l_outer_border)
plt.title("Reward distribution for all actions ")
im = plt.imshow(track, cmap='hot', interpolation='bilinear', origin="lower")
I found a problem when uncommenting the track_size line.
It pops up an error:
File "<ipython-input-36-3595092372fb>", line 4
track_size=(700,1000), y_shift=300
^
SyntaxError: can't assign to literal
Hopefully someone can help to fix it. Thanks!
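A likely cause (my reading, not a confirmed fix): executed as a standalone statement, `track_size=(700,1000), y_shift=300` is parsed by Python as an assignment whose target contains the tuple literal (700, 1000) — hence "can't assign to literal". Keyword arguments have to go inside the plot_track call itself. The function below is a hypothetical stand-in for la.plot_track, only to illustrate the call shape:

```python
# Standalone, this line is a SyntaxError ("can't assign to literal"):
#     track_size=(700,1000), y_shift=300
# Keyword arguments belong inside the call. Hypothetical stand-in for
# la.plot_track, illustrating the call shape:
def plot_track(df, center, inner, outer, track_size=(500, 800), y_shift=0):
    return track_size, y_shift

size, shift = plot_track(None, None, None, None,
                         track_size=(700, 1000), y_shift=300)
```

So the fix would be appending the two keywords to the existing `la.plot_track(df, l_center_line, l_inner_border, l_outer_border)` call rather than putting them on their own line.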
Hi,
Please could you bundle in the .npy for the new Kumo Torakku training track, and also the leaderboard evaluation track so we can analyse our training?
Thanks!
Lyndon
In the documentation of the reward_function methods that are given as samples, the parameters to the function are explained. Waypoints are described as follows:
@waypoints (ordered list) :: list of waypoint in order; each waypoint is a set of coordinates
(x,y,yaw) that define a turning point
This is actually not true, because the simulator only provides x and y coordinates for the waypoints, not the yaw.
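If heading information is needed anyway, a common workaround (a sketch, under the assumption that waypoints are ordered (x, y) pairs as the simulator actually provides) is to derive a yaw per waypoint from the direction to the next waypoint:

```python
import math

# Assumption: waypoints is an ordered list of (x, y) pairs with no yaw,
# matching what the simulator actually provides. Derive a heading per
# waypoint from the vector to the next waypoint.
def headings(waypoints):
    out = []
    n = len(waypoints)
    for i in range(n):
        x0, y0 = waypoints[i]
        x1, y1 = waypoints[(i + 1) % n]  # wrap around for a closed track
        out.append(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    return out

print(headings([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

This is an approximation (the derived yaw is the segment direction, not a smoothed tangent), but it fills the gap the docstring implies exists.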
When running the Jupyter notebook 'DeepRacer Log Analysis.ipynb', I get the error below in a Windows environment (Python 3.8.1).
NameError Traceback (most recent call last)
<ipython-input-2-fd8b8819ef67> in <module>
1 # Plot the results
----> 2 fig, ax = plt.subplots(figsize=(20,10))
3 plot_points(ax, waypoints[:-1,0:2])
4 plot_points(ax, waypoints[:-1,2:4])
5 plot_points(ax, waypoints[:-1,4:6])
NameError: name 'plt' is not defined
See full screen print at https://www.screencast.com/t/kvOtMEJf
The same works fine on MacOS (Python 3.7.3).
Has anybody experienced the same problem and found a work around?
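One likely explanation (a guess, since the macOS run works): `NameError: name 'plt' is not defined` usually means the cell that imports matplotlib was never executed in the current kernel session, rather than anything Windows-specific. Re-running an import cell like this before the plotting cell should fix it:

```python
# Run this before the plotting cell; 'plt' only exists once the import
# has executed in the current kernel session.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; optional, helps headless runs
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 10))
```

If the import itself fails on Windows, that would instead point to a broken matplotlib install in that environment.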
Hi there,
hope you are doing well. I have had a horrible week using AWS DeepRacer to train my model. My issue is that the simulator always starts at the (0,0) coordinate and not on the track where I am supposed to train. See the images below
and
After about 30 minutes, the car was still seeing only grass around it. Is there something wrong with the simulator???
If there's a bug, please stop this gimmick, because we have to pay money to use AWS!!!
Thanks,
Bill
Boto3: 1.15.143
With the code below, Boto3 raises an UnknownServiceError.
import os
import boto3

envroot = os.getcwd()
aws_data_path = set(os.environ.get('AWS_DATA_PATH', '').split(os.pathsep))
aws_data_path.add(os.path.join(envroot, 'models'))
os.environ.update({'AWS_DATA_PATH': os.pathsep.join(aws_data_path)})

region = "us-east-1"
dr_client = boto3.client('deepracer', region_name=region,
                         endpoint_url="https://deepracer-prod.{}.amazonaws.com".format(region))
models = dr_client.list_models(ModelType="REINFORCEMENT_LEARNING", MaxResults=100)["Models"]
for model in models:
    if model["ModelName"] == model_name:
        break
These are the error logs:
UnknownServiceError Traceback (most recent call last)
in
6 region = "us-east-1"
7 dr_client = boto3.client('deepracer', region_name=region,
----> 8 endpoint_url="https://deepracer-prod.{}.amazonaws.com".format(region))
9 models = dr_client.list_models(ModelType="REINFORCEMENT_LEARNING",MaxResults=100)["Models"]
10 for model in models:
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/boto3/init.py in client(*args, **kwargs)
89 See :py:meth:boto3.session.Session.client
.
90 """
---> 91 return _get_default_session().client(*args, **kwargs)
92
93
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/boto3/session.py in client(self, service_name, region_name, api_version, use_ssl, verify, endpoint_url, aws_access_key_id, aws_secret_access_key, aws_session_token, config)
261 aws_access_key_id=aws_access_key_id,
262 aws_secret_access_key=aws_secret_access_key,
--> 263 aws_session_token=aws_session_token, config=config)
264
265 def resource(self, service_name, region_name=None, api_version=None,
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/session.py in create_client(self, service_name, region_name, api_version, use_ssl, verify, endpoint_url, aws_access_key_id, aws_secret_access_key, aws_session_token, config)
836 is_secure=use_ssl, endpoint_url=endpoint_url, verify=verify,
837 credentials=credentials, scoped_config=self.get_scoped_config(),
--> 838 client_config=config, api_version=api_version)
839 monitor = self._get_internal_component('monitor')
840 if monitor is not None:
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in create_client(self, service_name, region_name, is_secure, endpoint_url, verify, credentials, scoped_config, api_version, client_config)
78 'choose-service-name', service_name=service_name)
79 service_name = first_non_none_response(responses, default=service_name)
---> 80 service_model = self._load_service_model(service_name, api_version)
81 cls = self._create_client_class(service_name, service_model)
82 endpoint_bridge = ClientEndpointBridge(
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _load_service_model(self, service_name, api_version)
119 def _load_service_model(self, service_name, api_version=None):
120 json_model = self._loader.load_service_model(service_name, 'service-2',
--> 121 api_version=api_version)
122 service_model = ServiceModel(json_model, service_name=service_name)
123 return service_model
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/loaders.py in _wrapper(self, *args, **kwargs)
130 if key in self._cache:
131 return self._cache[key]
--> 132 data = func(self, *args, **kwargs)
133 self._cache[key] = data
134 return data
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/loaders.py in load_service_model(self, service_name, type_name, api_version)
376 raise UnknownServiceError(
377 service_name=service_name,
--> 378 known_service_names=', '.join(sorted(known_services)))
379 if api_version is None:
380 api_version = self.determine_latest_version(
UnknownServiceError: Unknown service: 'deepracer'. Valid service names are: accessanalyzer, acm, acm-pca, alexaforbusiness, amplify, apigateway, apigatewaymanagementapi, apigatewayv2, appconfig, appflow, application-autoscaling, application-insights, appmesh, appstream, appsync, athena, autoscaling, autoscaling-plans, backup, batch, braket, budgets, ce, chime, cloud9, clouddirectory, cloudformation, cloudfront, cloudhsm, cloudhsmv2, cloudsearch, cloudsearchdomain, cloudtrail, cloudwatch, codeartifact, codebuild, codecommit, codedeploy, codeguru-reviewer, codeguruprofiler, codepipeline, codestar, codestar-connections, codestar-notifications, cognito-identity, cognito-idp, cognito-sync, comprehend, comprehendmedical, compute-optimizer, config, connect, connectparticipant, cur, dataexchange, datapipeline, datasync, dax, detective, devicefarm, directconnect, discovery, dlm, dms, docdb, ds, dynamodb, dynamodbstreams, ebs, ec2, ec2-instance-connect, ecr, ecs, efs, eks, elastic-inference, elasticache, elasticbeanstalk, elastictranscoder, elb, elbv2, emr, es, events, firehose, fms, forecast, forecastquery, frauddetector, fsx, gamelift, glacier, globalaccelerator, glue, greengrass, groundstation, guardduty, health, honeycode, iam, identitystore, imagebuilder, importexport, inspector, iot, iot-data, iot-jobs-data, iot1click-devices, iot1click-projects, iotanalytics, iotevents, iotevents-data, iotsecuretunneling, iotsitewise, iotthingsgraph, ivs, kafka, kendra, kinesis, kinesis-video-archived-media, kinesis-video-media, kinesis-video-signaling, kinesisanalytics, kinesisanalyticsv2, kinesisvideo, kms, lakeformation, lambda, lex-models, lex-runtime, license-manager, lightsail, logs, machinelearning, macie, macie2, managedblockchain, marketplace-catalog, marketplace-entitlement, marketplacecommerceanalytics, mediaconnect, mediaconvert, medialive, mediapackage, mediapackage-vod, mediastore, mediastore-data, mediatailor, meteringmarketplace, mgh, migrationhub-config, mobile, mq, 
mturk, neptune, networkmanager, opsworks, opsworkscm, organizations, outposts, personalize, personalize-events, personalize-runtime, pi, pinpoint, pinpoint-email, pinpoint-sms-voice, polly, pricing, qldb, qldb-session, quicksight, ram, rds, rds-data, redshift, redshift-data, rekognition, resource-groups, resourcegroupstaggingapi, robomaker, route53, route53domains, route53resolver, s3, s3control, s3outposts, sagemaker, sagemaker-a2i-runtime, sagemaker-runtime, savingsplans, schemas, sdb, secretsmanager, securityhub, serverlessrepo, service-quotas, servicecatalog, servicediscovery, ses, sesv2, shield, signer, sms, sms-voice, snowball, sns, sqs, ssm, sso, sso-admin, sso-oidc, stepfunctions, storagegateway, sts, support, swf, synthetics, textract, timestream-query, timestream-write, transcribe, transfer, translate, waf, waf-regional, wafv2, workdocs, worklink, workmail, workmailmessageflow, workspaces, xray