aws-samples / amazon-neptune-samples
Samples and documentation for using the Amazon Neptune graph database service
License: MIT No Attribution
Hey, I'm working on enabling streams in my Neptune template and checked here for an example of how to implement it. I noticed that the JSON contains two Parameters keys (lines 110 and 115), which looks like a mistake.
Reference:
General Information
A CDK example of how to set up Neptune with the Notebook workbench would be useful for anyone working with Amazon Neptune.
Proposed Solution
The CDK example would set up the following components to achieve this solution:
Acknowledge
Hi, is there an ETA on the SPARQL examples?
The collaborative filtering example is good, but for a real app we would have to make millions of requests, spending a lot of time going back and forth between the app and Neptune. How can we change this example to handle many users in a single query?
g.V().has('GamerAlias','skywalker123')
How can we write a query that gets the top N for many users, not only for skywalker123?
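One possible shape for a batched version, sketched in Gremlin: select all the users at once with within(), then key a group() by alias so each user's top-N is computed in a single round trip. The 'likes' edge label, the extra aliases, and the cut-off of 10 are placeholders, not names from the sample; adapt them to the example's actual schema, and re-add its self-filtering steps as needed.

```
g.V().has('GamerAlias', within('skywalker123', 'gamer42', 'gamer99')).
  group().
    by('GamerAlias').
    by(out('likes').in('likes').out('likes').
       groupCount().
       order(local).by(values, desc).
       limit(local, 10))
```

The result is one map per user, from alias to that user's ranked recommendations, instead of one request per user.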
I am not able to do a Neptune bulk load from an AWS S3 bucket. The load fails when I load data from the S3 bucket into Neptune.
Command:
awscurl -X POST --service neptune-db -H 'Content-Type: application/json' --region us-east-2 \
https://:8182/loader -d'
{
    "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
    "format" : "csv",
    "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
    "region" : "us-east-2",
    "failOnError" : "FALSE"
}'
Output:
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "37aeb194-677a-4cdf-a577-dae8684a6681"
    }
}
Command: awscurl --service neptune-db 'https://:8182/loader/37aeb194-677a-4cdf-a577-dae8684a6681' --region us-east-2
Output:
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            { "LOAD_COMPLETED" : 1 },
            { "LOAD_FAILED" : 1 }
        ],
        "overallStatus" : {
            "fullUri" : "s3://**/Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
            "runNumber" : 1,
            "retryNumber" : 12,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 3,
            "startTime" : 1672193300,
            "totalRecords" : 1,
            "totalDuplicates" : 0,
            "parsingErrors" : 1,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}
Can anyone help me with this?
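The status above shows retryNumber 12 and parsingErrors 1, so the loader is rejecting a record rather than failing to reach S3. A next step (a sketch, with the endpoint elided as in the commands above) is to ask the loader Get-Status API for per-record errors via its details, errors, page, and errorsPerPage parameters:

```
awscurl --service neptune-db --region us-east-2 \
  'https://:8182/loader/37aeb194-677a-4cdf-a577-dae8684a6681?details=true&errors=true&page=1&errorsPerPage=3'
```

The errorLogs section of the response identifies the record that failed to parse; with "format" : "csv", a common cause is a missing Neptune Gremlin CSV header row (~id, ~label, ...).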
Should be changed to nodejs10.x
I am not able to do a Neptune bulk load from AWS S3 using the curl Bulk Load API call.
Command:
curl -X POST \
-H 'Content-Type: application/json' \
https://*.cluster-c4brigvg3m9m.us-east-2.neptune.amazonaws.com:8182/loader -d'
{
    "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
    "format" : "csv",
    "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
    "region" : "us-east-2",
    "failOnError" : "FALSE"
}'
This is the error I am getting:
{"code":"AccessDeniedException","requestId":"f6243cd3-2a4f-48a2-9d91-13803c199ef1","detailedMessage":"Missing Authentication Token"}
Can you please explain why I am getting this error and how I can resolve it?
Hi team, I have followed this link https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html to create a sample application, and it works fine. But when I add a traversal that adds a vertex, it fails with {"requestId":"xyz","code":"ConstraintViolationException","detailedMessage":"Vertex with id already exists: "}
even though the id has not been ingested. The traversal that I added is:
g.inject(1).union(__.addV('label').property(T.id, 'uniqueId1').property('prop1','val1')).valueMap(true)
What is surprising is that the same query runs from the Gremlin console without complaining that the vertex was already ingested.
Is there something in the query that must be changed for it to run from the code?
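If the exception comes from client retries or concurrent writers racing to create the same id, the usual Gremlin workaround is the fold()/coalesce() upsert idiom, which only creates the vertex when it does not already exist. A sketch using the same ids as above:

```
g.V('uniqueId1').
  fold().
  coalesce(
    unfold(),
    addV('label').property(T.id, 'uniqueId1')).
  property('prop1', 'val1').
  valueMap(true)
```

This makes the write idempotent, so a retried request no longer trips ConstraintViolationException.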
I believe that this line should be:
var PROXY_API_URL = API_GATEWAY_ENDPOINT;
Otherwise when trying to do Step 6, it cannot find anything to replace.
The react-force component used for the graph visualization depends on the "debug" library, which has a known low-severity issue, detailed here: https://github.com/aws-samples/amazon-neptune-samples/network/alert/gremlin/chatbot-full-stack-application/code/web-ui/neptune-chatbot/yarn.lock/debug/closed
Fixing this requires upgrading or replacing the component.
In the following CloudFormation JSON template, the default instance size "db.r4.xlarge" is no longer supported and causes the stack to fail. Updating it to "db.r5.xlarge" allows the stack to be created successfully.
The submit API call initiates successfully with status 200 OK.
But when I monitor the loading process using this command, I get this error:
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://*****:8182/loader/33aaf4f9-54b5-4f7b-8b39-b00cfb379397
Is there a way to take the results from running a SPARQL query in a Neptune workbench notebook and save that to a variable which can be further processed?
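One option, assuming the workbench runs a reasonably recent version of the graph-notebook package behind it: the cell magics accept a --store-to argument that saves the response into a Python variable in the kernel (the variable name here is arbitrary):

```
%%sparql --store-to results
SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
```

In the next cell, results holds the SPARQL JSON response and can be processed with ordinary Python.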
Hi Brad, I'm trying to load the sample data in your Readme, but I'm getting an "AllAccessDisabled" error when trying to load the data from your S3 bucket:
https://s3.us-east-1.amazonaws.com/recommendation/vertex.txt
Has the data moved or are you able to check in the files to this github repo so we can use them?
Thanks
Connections through an NLB with iam-auth and enable-ssl no longer work; the TLS handshake times out.
Hello Team,
Is there a way I can use PySpark to extract data from a Neptune database and write it into S3? I know we can write into S3, but my problem is with PySpark. Assistance on this is much appreciated.
Thanks
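One hedged sketch, assuming the result set is modest enough to fit on the Spark driver: pull rows out of Neptune with gremlin_python, hand them to Spark as a DataFrame, and let Spark write to S3. The endpoint, label, and bucket below are placeholders; for large graphs, exporting with the neptune-export utility to S3 and then reading the files with Spark is likely a better fit.

```python
from pyspark.sql import SparkSession
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

spark = SparkSession.builder.getOrCreate()

# Query Neptune on the driver (placeholder endpoint and label)
conn = DriverRemoteConnection('wss://your-cluster:8182/gremlin', 'g')
g = traversal().withRemote(conn)
rows = [{str(k): str(v) for k, v in r.items()}
        for r in g.V().hasLabel('User').valueMap(True).toList()]
conn.close()

# Hand the rows to Spark and write them out to S3 as Parquet
df = spark.createDataFrame(rows)
df.write.mode('overwrite').parquet('s3://your-bucket/neptune-export/')
```

The Spark cluster's instance role needs write access to the target bucket.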
Hi Team,
We have an already-working AWS account where we don't have the option to create a new VPC and public subnet. How can we customize this repo for our environment?
Customizations like:
1) Use an existing VPC and its respective prerequisites.
2) How to do all configuration in a public subnet.
3) ECS cluster creation and using ECR for the Docker image, etc.
We should remove the instructions that grant public S3 read access to the S3 bucket.
--create Amazon S3 bucket with public read access
aws s3api create-bucket --bucket --acl public-read --region --create-bucket-configuration LocationConstraint=
--
This should instead be done with a CloudFront distribution and a bucket policy that restricts access to the CloudFrontOriginAccessIdentity.
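A sketch of the private-bucket alternative with boto3; the bucket name and OAI id are placeholders, and the Origin Access Identity itself must already exist in CloudFront:

```python
import json

import boto3

s3 = boto3.client('s3', region_name='us-east-1')
bucket = 'my-private-site-bucket'   # placeholder
oai_id = 'E2EXAMPLEOAI'             # placeholder OAI id

# Create the bucket without any public-read ACL
s3.create_bucket(Bucket=bucket)

# Allow reads only through the CloudFront Origin Access Identity
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'AWS': 'arn:aws:iam::cloudfront:user/'
                             f'CloudFront Origin Access Identity {oai_id}'},
        'Action': 's3:GetObject',
        'Resource': f'arn:aws:s3:::{bucket}/*',
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

With this in place, the distribution serves the content while direct S3 URLs return AccessDenied.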
Hello,
I was trying to connect a notebook to an existing Neptune instance, following the "Analyze Amazon Neptune Graphs using Amazon SageMaker Jupyter Notebooks" tutorial (the part "What if I want to reuse an existing Neptune cluster with SageMaker?").
The CloudFormation template references an Eclipse RDF4J SDK version that is no longer available, so the stack fails when trying to download it.
I used version 2.4.6 instead and everything went fine, but you may want to consider the 2.5.x or 3.x versions.
Thanks for the great tutorial.
I had to build neptune_python_utils on Python 3.6.
Running this on a SageMaker Jupyter notebook:
neptune_endpoint = 'neptunecluster.cluster-cghdntee9kjh.us-east-1.neptune.amazonaws.com'
neptune_port = 8182
neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)
RuntimeError Traceback (most recent call last)
in ()
----> 1 neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)
~/SageMaker/util/neptune.py in clear(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
60 def clear(self, neptune_endpoint=None, neptune_port=None, batch_size=200, edge_batch_size=None, vertex_batch_size=None):
61 print('clearing data...')
---> 62 self.clearGremlin(neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
63 self.clearSparql(neptune_endpoint, neptune_port)
64 print('done')
~/SageMaker/util/neptune.py in clearGremlin(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
77 else:
78 print('clearing property graph data [edge_batch_size={}, edge_count={}]...'.format(edge_batch_size, edge_count))
---> 79 g.E().limit(edge_batch_size).drop().toList()
80 edge_count = g.E().count().next()
81 has_edges = (edge_count > 0)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in toList(self)
56
57 def toList(self):
---> 58 return list(iter(self))
59
60 def toSet(self):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in next(self)
46 def next(self):
47 if self.traversers is None:
---> 48 self.traversal_strategies.apply_strategies(self)
49 if self.last_traverser is None:
50 self.last_traverser = next(self.traversers)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in apply_strategies(self, traversal)
571 def apply_strategies(self, traversal):
572 for traversal_strategy in self.traversal_strategies:
--> 573 traversal_strategy.apply(traversal)
574
575 def apply_async_strategies(self, traversal):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/remote_connection.py in apply(self, traversal)
147 def apply(self, traversal):
148 if traversal.traversers is None:
--> 149 remote_traversal = self.remote_connection.submit(traversal.bytecode)
150 traversal.remote_results = remote_traversal
151 traversal.side_effects = remote_traversal.side_effects
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py in submit(self, bytecode)
53
54 def submit(self, bytecode):
---> 55 result_set = self._client.submit(bytecode)
56 results = result_set.all().result()
57 side_effects = RemoteTraversalSideEffects(result_set.request_id, self._client,
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submit(self, message, bindings)
109
110 def submit(self, message, bindings=None):
--> 111 return self.submitAsync(message, bindings=bindings).result()
112
113 def submitAsync(self, message, bindings=None):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submitAsync(self, message, bindings)
125 message.args.update({'bindings': bindings})
126 conn = self._pool.get(True)
--> 127 return conn.write(message)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in write(self, request_message)
53 def write(self, request_message):
54 if not self._inited:
---> 55 self.connect()
56 request_id = str(uuid.uuid4())
57 result_set = resultset.ResultSet(queue.Queue(), request_id)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in connect(self)
43 self._transport.close()
44 self._transport = self._transport_factory()
---> 45 self._transport.connect(self._url, self._headers)
46 self._protocol.connection_made(self._transport)
47 self._inited = True
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/tornado/transport.py in connect(self, url, headers)
34 url = httpclient.HTTPRequest(url, headers=headers)
35 self._ws = self._loop.run_sync(
---> 36 lambda: websocket.websocket_connect(url))
37
38 def write(self, message):
~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/ioloop.py in run_sync(self, func, timeout)
569 self.stop()
570 timeout_handle = self.add_timeout(self.time() + timeout, timeout_callback)
--> 571 self.start()
572 if timeout is not None:
573 self.remove_timeout(timeout_handle)
~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/platform/asyncio.py in start(self)
130 self._setup_logging()
131 asyncio.set_event_loop(self.asyncio_loop)
--> 132 self.asyncio_loop.run_forever()
133 finally:
134 asyncio.set_event_loop(old_loop)
~/anaconda3/envs/python3/lib/python3.6/asyncio/base_events.py in run_forever(self)
410 if events._get_running_loop() is not None:
411 raise RuntimeError(
--> 412 'Cannot run the event loop while another loop is running')
413 self._set_coroutine_wrapper(self._debug)
414 self._thread_id = threading.get_ident()
RuntimeError: Cannot run the event loop while another loop is running
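The traceback shows Tornado trying to start its own event loop inside the loop the Jupyter kernel is already running. One stdlib-only workaround is to make the blocking gremlin_python call from a worker thread, which has no running loop; installing nest_asyncio and calling nest_asyncio.apply() at the top of the notebook is another common fix. A sketch (the commented-out neptune.clear call is the one from this issue):

```python
import concurrent.futures


def run_off_loop(fn, *args, **kwargs):
    """Run fn in a fresh worker thread, away from the notebook's running
    event loop, and return its result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args, **kwargs).result()


# In the notebook this would look like:
# run_off_loop(neptune.clear,
#              neptune_endpoint=neptune_endpoint,
#              neptune_port=neptune_port)

print(run_off_loop(lambda x: x * 2, 21))  # 42
```

The worker thread blocks until the call completes, so the cell still behaves synchronously.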
The Gremlin connection string uses the ws:// prefix, which caused a "socket hang up" error for me using Lambda + Node.js 8.10 + Gremlin 3.2.9.
It works if I use the wss:// prefix instead.
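The same applies from gremlin_python: Neptune clusters generally require TLS, so the connection string should use wss://. A minimal sketch with a placeholder endpoint:

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# wss:// (TLS), not ws:// — plain-text WebSocket connections are hung up
conn = DriverRemoteConnection('wss://your-cluster:8182/gremlin', 'g')
g = traversal().withRemote(conn)
```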
BlogParserResource lambda function fails with:
Received response status [FAILED] from custom resource. Message returned: 'Comment' object has no attribute 'findChildren' (RequestId: 131e0b8d-1fa7-49df-b82a-ec82ca4ea8a5)
I'm testing the integration of our application with Amazon Neptune and I would like to load this dump onto my cluster.
My Neptune cluster is in eu-west-1.
NOTE: I'm not an AWS expert.
I've got an EC2 instance from which I'm trying to load the dump into my cluster.
I ran this command:
curl -X POST \
-H 'Content-Type: application/json' \
http://mycluster.end.point:8182/loader -d '
{
    "source" : "s3://neptune-data-ml/recommendation/",
    "accessKey" : "my arn",
    "secretKey" : "my secret key",
    "format" : "csv",
    "region" : "us-east-1",
    "failOnError" : "FALSE"
}'
It fails.
Is the dump available in eu-west-1 too?
Is the dump still available?
Thank you in advance.
A script with " " (a single space) works fine, but "" (an empty string) crashes:
gremlin> g.addV("Test").property("title", "Test node 1").property("a", "")
{"requestId":"111xxxx-xxx-xxx-xxx-xxx","code":"MalformedQueryException","detailedMessage":"Query parsing failed at line 1, character position at 62, error message : no viable alternative at input 'g.addV(\"Test\").property(\"title\",\"Test node 1\").property(\"a\",\"\"'"}
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.addV("Test").property("title", "Test node 1").property("a", " ")
==>v[98b22f0f-6be0-fb11-38cc-066bf7e17051]
This works fine with NEO4J Gremlin, so I doubt this is a Gremlin issue. Is this a Neptune bug or feature?