InvokeEndpoint response times out.
Reproduction Steps
{
  "trainingJob": {
    "hyperparameters": {
      "n-hidden": "2",
      "n-epochs": "100",
      "lr": "1e-2"
    },
    "instanceType": "ml.c5.9xlarge",
    "timeoutInSeconds": 10800
  }
}
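For context, the inference Lambda reaches the endpoint roughly as in the sketch below; the endpoint name, content type, and payload key are placeholders, not taken from this report. Because the model worker dies during load (see the log below), invoke_endpoint never gets a response and the Lambda hits its own 120-second limit.

import json
import boto3
from botocore.config import Config

# Keep the SDK read timeout below the Lambda timeout so a dead endpoint
# fails fast instead of burning the whole 120 s (botocore's default read
# timeout is 60 s).
runtime = boto3.client(
    "sagemaker-runtime",
    config=Config(read_timeout=60, retries={"max_attempts": 0}),
)

def lambda_handler(event, context):
    # "fraud-detection-endpoint" and the CSV payload are hypothetical.
    response = runtime.invoke_endpoint(
        EndpointName="fraud-detection-endpoint",
        ContentType="text/csv",
        Body=event["body"],
    )
    return json.loads(response["Body"].read())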
Error Log
In the inference Lambda's CloudWatch logs:
Task timed out after 120.10 seconds
In the SageMaker endpoint's CloudWatch logs (TorchServe):
2021-04-09 04:53:46,902 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: model.mar
2021-04-09 04:53:49,837 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 8ff2b3de4bed4fb1bc7fe969652117ff
2021-04-09 04:53:49,847 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.
2021-04-09 04:53:49,865 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-04-09 04:53:49,930 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2021-04-09 04:53:49,930 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-04-09 04:53:49,931 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2021-04-09 04:53:49,957 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.ts.sock.9000
2021-04-09 04:53:49,959 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]55
2021-04-09 04:53:49,959 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-04-09 04:53:49,959 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.13
2021-04-09 04:53:49,963 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2021-04-09 04:53:49,972 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-04-09 04:53:50,017 [INFO ] pool-2-thread-1 TS_METRICS - CPUUtilization.Percent:33.3|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,017 [INFO ] pool-2-thread-1 TS_METRICS - DiskAvailable.Gigabytes:19.622234344482422|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,017 [INFO ] pool-2-thread-1 TS_METRICS - DiskUsage.Gigabytes:4.731609344482422|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,017 [INFO ] pool-2-thread-1 TS_METRICS - DiskUtilization.Percent:19.4|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,018 [INFO ] pool-2-thread-1 TS_METRICS - MemoryAvailable.Megabytes:30089.12109375|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,018 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUsed.Megabytes:902.6953125|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:50,018 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUtilization.Percent:4.1|#Level:Host|#hostname:model.aws.local,timestamp:1617944030
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - ------------------ Loading model -------------------
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 176, in <module>
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 148, in run_server
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 112, in handle_connection
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2021-04-09 04:53:51,250 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 85, in load_model
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_loader.py", line 117, in load
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model_service.initialize(service.context)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/model-server/tmp/models/8ff2b3de4bed4fb1bc7fe969652117ff/handler_service.py", line 51, in initialize
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - super().initialize(context)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self._service.validate_and_initialize(model_dir=model_dir)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 158, in validate_and_initialize
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self._model = self._model_fn(model_dir)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/ml/model/code/fd_sl_deployment_entry_point.py", line 149, in model_fn
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - rgcn_model.load_state_dict(stat_dict)
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.__class__.__name__, "\n\t".join(error_msgs)))
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: Error(s) in loading state_dict for HeteroRGCN:
2021-04-09 04:53:51,251 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.DeviceInfo<>target.weight: copying a param with shape torch.Size([2, 390]) from checkpoint, the shape in current model is torch.Size([16, 390]).
2021-04-09 04:53:51,252 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.DeviceInfo<>target.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([16]).
2021-04-09 04:53:51,252 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.DeviceType<>target.weight: copying a param with shape torch.Size([2, 390]) from checkpoint, the shape in current model is torch.Size([16, 390]).
2021-04-09 04:53:51,252 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.DeviceType<>target.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([16]).
2021-04-09 04:53:51,252 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.P_emaildomain<>target.weight: copying a param with shape torch.Size([2, 390]) from checkpoint, the shape in current model is torch.Size([16, 390]).
2021-04-09 04:53:51,252 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     size mismatch for layers.0.weight.P_emaildomain<>target.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([16]).
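The failure above can be reproduced in isolation. The sketch below uses a plain nn.Linear as a stand-in for the HeteroRGCN layer, since only the shapes matter: a checkpoint saved with an output dimension of 2 cannot be loaded into a module built with an output dimension of 16.

import torch.nn as nn

saved = nn.Linear(390, 2)      # shapes the checkpoint was trained with (n-hidden = 2)
current = nn.Linear(390, 16)   # shapes the deployment code constructs (hidden_size = 16)

try:
    current.load_state_dict(saved.state_dict())
except RuntimeError as err:
    # "size mismatch for weight: copying a param with shape torch.Size([2, 390])
    #  from checkpoint, the shape in current model is torch.Size([16, 390])."
    print(err)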
Environment
- CDK CLI Version:
- Framework Version:
- Node.js Version:
- OS:
Other
Cause of this bug:
The TorchServe backend worker dies while loading the model ("Backend worker process died." in the log above), so the endpoint never responds and the inference Lambda times out.
The SageMaker endpoint deployment code and the model training code disagree on the hidden-layer size: training ran with n-hidden = 2 (see the hyperparameters above), while the deployment entry point rebuilds HeteroRGCN with hidden_size = 16, so load_state_dict fails with the size mismatches shown in the log (torch.Size([2, 390]) vs. torch.Size([16, 390])). A sketch of one possible fix follows.
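A minimal sketch of one way to keep the two sides consistent, assuming the training script can write a JSON file next to the weights and model_fn can read it back. HeteroRGCN here is a stand-in class, not the repository's implementation, and the file names are illustrative.

import json
import os
import torch
import torch.nn as nn

class HeteroRGCN(nn.Module):
    # Stand-in for the real model; only the hidden_size plumbing matters here.
    def __init__(self, hidden_size):
        super().__init__()
        self.layer = nn.Linear(390, hidden_size)

# Training side: persist the hyperparameters alongside model.pth.
def save_artifacts(model, hparams, model_dir):
    torch.save(model.state_dict(), os.path.join(model_dir, "model.pth"))
    with open(os.path.join(model_dir, "hyperparameters.json"), "w") as f:
        json.dump(hparams, f)  # e.g. {"n-hidden": "2"}

# Serving side: rebuild the model with the saved size instead of a hard-coded one.
def model_fn(model_dir):
    with open(os.path.join(model_dir, "hyperparameters.json")) as f:
        hparams = json.load(f)
    model = HeteroRGCN(hidden_size=int(hparams["n-hidden"]))
    state_dict = torch.load(os.path.join(model_dir, "model.pth"), map_location="cpu")
    model.load_state_dict(state_dict)
    return model.eval()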
This is a 🐛 Bug Report.