aws-samples / amazon-neptune-samples
Samples and documentation for using the Amazon Neptune graph database service
License: MIT No Attribution
Hey, I'm working on enabling streams in my Neptune template and checked here for an example of how to implement it. I noticed that the JSON contains two Parameters keys (lines 110 and 115), which looks like a mistake.
Reference:
General Information
A CDK example of how to set up Neptune with the Notebook workbench would be useful for anyone working with Amazon Neptune.
Proposed Solution
The CDK example would set up the following components to achieve this solution:
Acknowledge
Hi, is there an ETA on the SPARQL examples?
The collaborative filtering example is good, but for a real app we would have to make millions of requests, spending a lot of time going back and forth between the app and Neptune. How can we change this example to handle many users in a single query?
g.V().has('GamerAlias','skywalker123')
How can we write a query that gets the top N for many users, not only for skywalker123?
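One possible shape for a batched version, sketched in Gremlin: select all the users at once with within(), then key a group() by alias so each user's top-N is computed in a single round trip. The 'likes' edge label, the extra aliases, and the cut-off of 10 are placeholders, not names from the sample; adapt them to the example's actual schema, and re-add its self-filtering steps as needed.

```
g.V().has('GamerAlias', within('skywalker123', 'gamer42', 'gamer99')).
  group().
    by('GamerAlias').
    by(out('likes').in('likes').out('likes').
       groupCount().
       order(local).by(values, desc).
       limit(local, 10))
```

The result is one map per user, from alias to that user's ranked recommendations, instead of one request per user.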
I am not able to do a Neptune bulk load from an AWS S3 bucket. The load fails when I load data from the S3 bucket into Neptune.
Command:
awscurl -X POST --service neptune-db -H 'Content-Type: application/json' --region us-east-2 \
https://:8182/loader -d'
{
    "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
    "format" : "csv",
    "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
    "region" : "us-east-2",
    "failOnError" : "FALSE"
}'
Output:
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "37aeb194-677a-4cdf-a577-dae8684a6681"
    }
}
Command: awscurl --service neptune-db 'https://:8182/loader/37aeb194-677a-4cdf-a577-dae8684a6681' --region us-east-2
Output:
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            { "LOAD_COMPLETED" : 1 },
            { "LOAD_FAILED" : 1 }
        ],
        "overallStatus" : {
            "fullUri" : "s3://**/Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
            "runNumber" : 1,
            "retryNumber" : 12,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 3,
            "startTime" : 1672193300,
            "totalRecords" : 1,
            "totalDuplicates" : 0,
            "parsingErrors" : 1,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}
Can anyone help me with this?
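The status above shows retryNumber 12 and parsingErrors 1, so the loader is rejecting a record rather than failing to reach S3. A next step (a sketch, with the endpoint elided as in the commands above) is to ask the loader Get-Status API for per-record errors via its details, errors, page, and errorsPerPage parameters:

```
awscurl --service neptune-db --region us-east-2 \
  'https://:8182/loader/37aeb194-677a-4cdf-a577-dae8684a6681?details=true&errors=true&page=1&errorsPerPage=3'
```

The errorLogs section of the response identifies the record that failed to parse; with "format" : "csv", a common cause is a missing Neptune Gremlin CSV header row (~id, ~label, ...).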
Should be changed to nodejs10.x
I am not able to do a Neptune bulk load from AWS S3 using the curl Bulk Load API call.
Command:
curl -X POST \
-H 'Content-Type: application/json' \
https://*.cluster-c4brigvg3m9m.us-east-2.neptune.amazonaws.com:8182/loader -d'
{
    "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
    "format" : "csv",
    "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
    "region" : "us-east-2",
    "failOnError" : "FALSE"
}'
This is the error I am getting:
{"code":"AccessDeniedException","requestId":"f6243cd3-2a4f-48a2-9d91-13803c199ef1","detailedMessage":"Missing Authentication Token"}
Can you please explain why I am getting this error and how I can resolve it?
Hi team, I have followed this link https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html to create a sample application, and it works fine. But when I add a traversal that adds a vertex, it fails with {"requestId":"xyz","code":"ConstraintViolationException","detailedMessage":"Vertex with id already exists: "}
even though the id has not been ingested. The traversal that I added is:
g.inject(1).union(__.addV('label').property(T.id, 'uniqueId1').property('prop1','val1')).valueMap(true)
What is surprising is that the same query runs from the Gremlin console without complaining that the vertex was already ingested.
Is there something in the query that must be changed for it to run from the code?
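If the exception comes from client retries or concurrent writers racing to create the same id, the usual Gremlin workaround is the fold()/coalesce() upsert idiom, which only creates the vertex when it does not already exist. A sketch using the same ids as above:

```
g.V('uniqueId1').
  fold().
  coalesce(
    unfold(),
    addV('label').property(T.id, 'uniqueId1')).
  property('prop1', 'val1').
  valueMap(true)
```

This makes the write idempotent, so a retried request no longer trips ConstraintViolationException.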
I believe that this line should be:
var PROXY_API_URL = API_GATEWAY_ENDPOINT;
Otherwise when trying to do Step 6, it cannot find anything to replace.
The react-force component used for the graph visualization depends on the "debug" library, which has a known low-severity issue, detailed here: https://github.com/aws-samples/amazon-neptune-samples/network/alert/gremlin/chatbot-full-stack-application/code/web-ui/neptune-chatbot/yarn.lock/debug/closed
Fixing this requires upgrading or replacing the component.
In the following CloudFormation JSON template, the default instance size "db.r4.xlarge" is no longer supported and causes the stack to fail. Updating it to "db.r5.xlarge" allows the stack to be created successfully.
The submit API call initiates successfully with status 200 OK.
But when I monitor the loading process using this command, I get this error:
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://*****:8182/loader/33aaf4f9-54b5-4f7b-8b39-b00cfb379397
Is there a way to take the results from running a SPARQL query in a Neptune workbench notebook and save that to a variable which can be further processed?
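One option, assuming the workbench runs a reasonably recent version of the graph-notebook package behind it: the cell magics accept a --store-to argument that saves the response into a Python variable in the kernel (the variable name here is arbitrary):

```
%%sparql --store-to results
SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
```

In the next cell, results holds the SPARQL JSON response and can be processed with ordinary Python.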
Hi Brad, I'm trying to load the sample data in your Readme, but I'm getting an "AllAccessDisabled" error when trying to load the data from your S3 bucket:
https://s3.us-east-1.amazonaws.com/recommendation/vertex.txt
Has the data moved or are you able to check in the files to this github repo so we can use them?
Thanks
Connections through an NLB with iam-auth and enable-ssl no longer work; the TLS handshake times out.
Hello Team,
Is there a way I can use PySpark to extract data from a Neptune database and write it into S3? I know we can write into S3, but my problem is with PySpark. Assistance on this is much appreciated.
Thanks
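One hedged sketch, assuming the result set is modest enough to fit on the Spark driver: pull rows out of Neptune with gremlin_python, hand them to Spark as a DataFrame, and let Spark write to S3. The endpoint, label, and bucket below are placeholders; for large graphs, exporting with the neptune-export utility to S3 and then reading the files with Spark is likely a better fit.

```python
from pyspark.sql import SparkSession
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

spark = SparkSession.builder.getOrCreate()

# Query Neptune on the driver (placeholder endpoint and label)
conn = DriverRemoteConnection('wss://your-cluster:8182/gremlin', 'g')
g = traversal().withRemote(conn)
rows = [{str(k): str(v) for k, v in r.items()}
        for r in g.V().hasLabel('User').valueMap(True).toList()]
conn.close()

# Hand the rows to Spark and write them out to S3 as Parquet
df = spark.createDataFrame(rows)
df.write.mode('overwrite').parquet('s3://your-bucket/neptune-export/')
```

The Spark cluster's instance role needs write access to the target bucket.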
Hi Team,
We have an already-working AWS account where we don't have the option to create a new VPC and public subnet. How can we customize this repo for our environment?
Customizations like:
1) Use an existing VPC and its respective prerequisites.
2) How to do all configuration in a public subnet.
3) ECS cluster creation and using ECR for the Docker image, etc.
We should remove the instructions that grant public S3 read access to the S3 bucket.
--create Amazon S3 bucket with public read access
aws s3api create-bucket --bucket --acl public-read --region --create-bucket-configuration LocationConstraint=
--
This should instead be done with a CloudFront distribution and a bucket policy that restricts access to the CloudFrontOriginAccessIdentity.
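A sketch of the private-bucket alternative with boto3; the bucket name and OAI id are placeholders, and the Origin Access Identity itself must already exist in CloudFront:

```python
import json

import boto3

s3 = boto3.client('s3', region_name='us-east-1')
bucket = 'my-private-site-bucket'   # placeholder
oai_id = 'E2EXAMPLEOAI'             # placeholder OAI id

# Create the bucket without any public-read ACL
s3.create_bucket(Bucket=bucket)

# Allow reads only through the CloudFront Origin Access Identity
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'AWS': 'arn:aws:iam::cloudfront:user/'
                             f'CloudFront Origin Access Identity {oai_id}'},
        'Action': 's3:GetObject',
        'Resource': f'arn:aws:s3:::{bucket}/*',
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

With this in place, the distribution serves the content while direct S3 URLs return AccessDenied.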
Hello,
I was trying to connect a notebook to an existing Neptune instance, following the "Analyze Amazon Neptune Graphs using Amazon SageMaker Jupyter Notebooks" tutorial (the part "What if I want to reuse an existing Neptune cluster with SageMaker?").
The CloudFormation template references an Eclipse RDF4J SDK version that is no longer available, so the stack fails when trying to download it.
I used version 2.4.6 instead and everything went fine, but you may want to consider the 2.5.x or 3.x versions.
Thanks for the great tutorial.
I had to build neptune_python_utils on Python 3.6.
Running this on a SageMaker Jupyter notebook:
neptune_endpoint = 'neptunecluster.cluster-cghdntee9kjh.us-east-1.neptune.amazonaws.com'
neptune_port = 8182
neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)
RuntimeError Traceback (most recent call last)
in ()
----> 1 neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)
~/SageMaker/util/neptune.py in clear(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
60 def clear(self, neptune_endpoint=None, neptune_port=None, batch_size=200, edge_batch_size=None, vertex_batch_size=None):
61 print('clearing data...')
---> 62 self.clearGremlin(neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
63 self.clearSparql(neptune_endpoint, neptune_port)
64 print('done')
~/SageMaker/util/neptune.py in clearGremlin(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
77 else:
78 print('clearing property graph data [edge_batch_size={}, edge_count={}]...'.format(edge_batch_size, edge_count))
---> 79 g.E().limit(edge_batch_size).drop().toList()
80 edge_count = g.E().count().next()
81 has_edges = (edge_count > 0)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in toList(self)
56
57 def toList(self):
---> 58 return list(iter(self))
59
60 def toSet(self):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in next(self)
46 def next(self):
47 if self.traversers is None:
---> 48 self.traversal_strategies.apply_strategies(self)
49 if self.last_traverser is None:
50 self.last_traverser = next(self.traversers)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in apply_strategies(self, traversal)
571 def apply_strategies(self, traversal):
572 for traversal_strategy in self.traversal_strategies:
--> 573 traversal_strategy.apply(traversal)
574
575 def apply_async_strategies(self, traversal):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/remote_connection.py in apply(self, traversal)
147 def apply(self, traversal):
148 if traversal.traversers is None:
--> 149 remote_traversal = self.remote_connection.submit(traversal.bytecode)
150 traversal.remote_results = remote_traversal
151 traversal.side_effects = remote_traversal.side_effects
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py in submit(self, bytecode)
53
54 def submit(self, bytecode):
---> 55 result_set = self._client.submit(bytecode)
56 results = result_set.all().result()
57 side_effects = RemoteTraversalSideEffects(result_set.request_id, self._client,
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submit(self, message, bindings)
109
110 def submit(self, message, bindings=None):
--> 111 return self.submitAsync(message, bindings=bindings).result()
112
113 def submitAsync(self, message, bindings=None):
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submitAsync(self, message, bindings)
125 message.args.update({'bindings': bindings})
126 conn = self._pool.get(True)
--> 127 return conn.write(message)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in write(self, request_message)
53 def write(self, request_message):
54 if not self._inited:
---> 55 self.connect()
56 request_id = str(uuid.uuid4())
57 result_set = resultset.ResultSet(queue.Queue(), request_id)
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in connect(self)
43 self._transport.close()
44 self._transport = self._transport_factory()
---> 45 self._transport.connect(self._url, self._headers)
46 self._protocol.connection_made(self._transport)
47 self._inited = True
~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/tornado/transport.py in connect(self, url, headers)
34 url = httpclient.HTTPRequest(url, headers=headers)
35 self._ws = self._loop.run_sync(
---> 36 lambda: websocket.websocket_connect(url))
37
38 def write(self, message):
~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/ioloop.py in run_sync(self, func, timeout)
569 self.stop()
570 timeout_handle = self.add_timeout(self.time() + timeout, timeout_callback)
--> 571 self.start()
572 if timeout is not None:
573 self.remove_timeout(timeout_handle)
~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/platform/asyncio.py in start(self)
130 self._setup_logging()
131 asyncio.set_event_loop(self.asyncio_loop)
--> 132 self.asyncio_loop.run_forever()
133 finally:
134 asyncio.set_event_loop(old_loop)
~/anaconda3/envs/python3/lib/python3.6/asyncio/base_events.py in run_forever(self)
410 if events._get_running_loop() is not None:
411 raise RuntimeError(
--> 412 'Cannot run the event loop while another loop is running')
413 self._set_coroutine_wrapper(self._debug)
414 self._thread_id = threading.get_ident()
RuntimeError: Cannot run the event loop while another loop is running
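The traceback shows Tornado trying to start its own event loop inside the loop the Jupyter kernel is already running. One stdlib-only workaround is to make the blocking gremlin_python call from a worker thread, which has no running loop; installing nest_asyncio and calling nest_asyncio.apply() at the top of the notebook is another common fix. A sketch (the commented-out neptune.clear call is the one from this issue):

```python
import concurrent.futures


def run_off_loop(fn, *args, **kwargs):
    """Run fn in a fresh worker thread, away from the notebook's running
    event loop, and return its result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args, **kwargs).result()


# In the notebook this would look like:
# run_off_loop(neptune.clear,
#              neptune_endpoint=neptune_endpoint,
#              neptune_port=neptune_port)

print(run_off_loop(lambda x: x * 2, 21))  # 42
```

The worker thread blocks until the call completes, so the cell still behaves synchronously.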
The Gremlin connection string uses the ws:// prefix, which caused a "socket hang up" error for me using Lambda + Node.js 8.10 + Gremlin 3.2.9.
It works if I use the wss:// prefix instead.
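The same applies from gremlin_python: Neptune clusters generally require TLS, so the connection string should use wss://. A minimal sketch with a placeholder endpoint:

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# wss:// (TLS), not ws:// — plain-text WebSocket connections are hung up
conn = DriverRemoteConnection('wss://your-cluster:8182/gremlin', 'g')
g = traversal().withRemote(conn)
```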
BlogParserResource lambda function fails with:
Received response status [FAILED] from custom resource. Message returned: 'Comment' object has no attribute 'findChildren' (RequestId: 131e0b8d-1fa7-49df-b82a-ec82ca4ea8a5)
I'm testing the integration of our application with Amazon Neptune and I would like to load this dump onto my cluster.
My Neptune cluster is in eu-west-1.
NOTE: I'm not an AWS expert.
I've got an EC2 instance from which I'm trying to load the dump into my cluster.
I ran this command:
curl -X POST \
-H 'Content-Type: application/json' \
http://mycluster.end.point:8182/loader -d '
{
    "source" : "s3://neptune-data-ml/recommendation/",
    "accessKey" : "my arn",
    "secretKey" : "my secret key",
    "format" : "csv",
    "region" : "us-east-1",
    "failOnError" : "FALSE"
}'
It fails.
Is the dump available in eu-west-1 too?
Is the dump still available?
Thank you in advance.
A script with " " (a single space) works fine, but "" (an empty string) crashes:
gremlin> g.addV("Test").property("title", "Test node 1").property("a", "")
{"requestId":"111xxxx-xxx-xxx-xxx-xxx","code":"MalformedQueryException","detailedMessage":"Query parsing failed at line 1, character position at 62, error message : no viable alternative at input 'g.addV(\"Test\").property(\"title\",\"Test node 1\").property(\"a\",\"\"'"}
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.addV("Test").property("title", "Test node 1").property("a", " ")
==>v[98b22f0f-6be0-fb11-38cc-066bf7e17051]
This works fine with NEO4J Gremlin, so I doubt this is a Gremlin issue. Is this a Neptune bug or feature?