
Comments (4)

woolfel avatar woolfel commented on June 24, 2024

A bit more feedback,
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg-upload-data.html
On this page you say "Create a file called bulk_movies.json. Copy and paste the following content into it, and add a trailing newline: "
It would be good to mention the error you'll get if the end of the file doesn't have a line break.

It would also be good to have these examples in a GitHub repo somewhere. This is common practice for many open source projects, and it makes it easier for people to fork the examples and submit bug fixes when features change.

from amazon-opensearch-service-developer-guide.

aetter avatar aetter commented on June 24, 2024

Hi @woolfel, thanks again for the feedback. To run through things point by point:

  1. When to sign requests

Sure, easy addition. In this case, it's just when you're using IAM users or roles in your domain access policy. I'll add a sentence to the top of Signing HTTP Requests.

  2. GitHub repo with sample code

I'll give this issue some thought. With the import statements at the top of the code sample and the link to the signing interceptor, it should be pretty easy for people to figure out which libraries I'm using, but I get that having a pom.xml ready to go is nice. With Python, Ruby, and Node, it's just "run a couple commands and you're ready to go," so I think the repo is less helpful here.

  3. curl commands from Cloud9

Can you clarify the situation and/or how Cloud9 helped in this instance? Were you testing a VPC domain that had no access policy attached and using Cloud9 to tunnel in, a la Testing VPC Domains?

  4. Kibana

Can you clarify the request here? The top of that section includes the Kibana URL format, so I assume I'm missing something.

  5. Best practices

This is a tough one. Popular AWS-based use cases are listed here, and if you're doing your own indexing, the only real options are POST index (if ID doesn't matter), PUT index (if ID matters), and bulk (better performance, more work to queue things up on the client side). Side note: do you mean SQS rather than SNS? I'm not aware of anyone using either one for indexing purposes, but I'd have to do some digging.

  6. Java example

Sure, I don't know about "sexy," but Python feels like an appropriate choice for a short script that relays a request from API Gateway to Amazon ES and back. Level of effort for adding a Java-based example shouldn't be too bad, so I'll look into it. I'll reply back here when I have something, hopefully within a week or two.

  7. Bulk file

I think the error you get from Elasticsearch is pretty self-explanatory here ("The bulk request must be terminated by a newline [\n]"), and I have a bit about adding a newline in the documentation. If I add a code sample directory to this repository (number 2 in this list), I'll add sample data, as well.
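
The single-document options in the best-practices item above (POST when the ID doesn't matter, PUT when it does) can be sketched roughly as follows. This is a minimal illustration, not code from the documentation: the helper name `index_request` is hypothetical, and the `/_doc` path assumes an Elasticsearch version that uses the `_doc` type endpoint. Older versions use a custom type name in that position.

```python
import json


def index_request(index, doc, doc_id=None):
    """Return (method, path, body) for indexing one document.

    Hypothetical helper for illustration; the "/_doc" path segment is an
    assumption about the Elasticsearch version in use.
    """
    body = json.dumps(doc)
    if doc_id is None:
        # POST: Elasticsearch generates the document ID.
        return "POST", f"/{index}/_doc", body
    # PUT: the caller chooses the ID; an existing document is overwritten.
    return "PUT", f"/{index}/_doc/{doc_id}", body


print(index_request("movies", {"title": "Example"}))
print(index_request("movies", {"title": "Example"}, doc_id=1))
```

The returned tuple still has to be sent with whatever HTTP client and request signing the domain's access policy requires.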


woolfel avatar woolfel commented on June 24, 2024

Clarification on curl commands from a Cloud9 environment: I have a VPC set up and locked down, and the ES domain is in the same VPC. Because the VPC is locked down, I can't open a local terminal and use curl to debug requests to my ES domain. For anyone with this setup, which would be most enterprise developers, an easy way to debug is to start a Cloud9 environment in that same VPC and issue curl commands from it to see what is going on and what errors are returned. It's much faster than trying to debug a Lambda function by looking at CloudWatch logs. Does that explain the use case?

When a person creates a new ES domain with the console, it automatically creates the Kibana URL. Humans, being human, make mistakes. If someone accidentally copies the Kibana URL and pastes it into their Lambda configuration, it will probably result in errors. A succinct explanation of the differences between the endpoint and the Kibana URL would help someone trying to debug from the console. Something like "if you're having odd connection/host errors, double check your URL and make sure you aren't using the Kibana URL by accident. You can tell it's Kibana by looking to see if the URL has this format".
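
A check along these lines could catch that mistake early. It is only a sketch: the function name is hypothetical, the example hostnames are made up, and it assumes the Kibana URL is the domain endpoint followed by a `/_plugin/kibana` path.

```python
from urllib.parse import urlparse


def looks_like_kibana_url(url):
    """Return True if the URL path points at the Kibana plugin.

    Assumes the Kibana URL has the form <domain endpoint>/_plugin/kibana/.
    """
    return "/_plugin/kibana" in urlparse(url).path


# Hypothetical hostnames, for illustration only.
print(looks_like_kibana_url("https://search-example.us-east-1.es.amazonaws.com/_plugin/kibana/"))
print(looks_like_kibana_url("https://search-example.us-east-1.es.amazonaws.com"))
```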

In the case of the bulk file, if a person is using Java + Lambda, more likely than not the error will be swallowed and the developer won't see it. Rather than suffer the pain that is "debugging Lambda functions", I used my Cloud9 environment. Basically, I created a test batch file and used curl to submit the request. In my case, the ES domain is protected and my API gateway doesn't expose it. My Lambda function queues up the documents and issues the batch, and the batch was missing the trailing line break. What makes this tricky to debug is that when you view the logs in CloudWatch, it's tough to see whether there's really a final line break, because of how the console formats the logs. In contrast, when I tested manually from Cloud9 with curl, it was immediately clear that the file was missing the trailing line break.
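
One way to avoid the missing line break when a function queues documents itself is to build the bulk body in one place and always terminate it. The sketch below is an assumption about how such a helper might look, not the code from my Lambda function; `to_bulk_body` and the sample documents are hypothetical.

```python
import json


def to_bulk_body(index, docs):
    """Build a newline-delimited bulk body from (doc_id, doc) pairs.

    Each document becomes an action line followed by a source line, and
    the body always ends with a newline, which the bulk API requires.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = to_bulk_body("movies", [("1", {"title": "A"}), ("2", {"title": "B"})])

# repr() shows the final "\n" as an escape sequence, so the trailing
# newline stays visible even when a log viewer reflows the text.
print(repr(body[-20:]))
```

Logging the `repr` of the tail of the body, rather than the body itself, makes the presence or absence of the final newline explicit in the log output.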

On a more general note: when I write docs for Apache and other open source projects, I always ask myself, "Will someone new to this know what I mean, or am I assuming they know because everything is obvious to an experienced user?" That assumption is one reason why so many open source docs are bad. A lot of my colleagues forget that other people might not have the same level of experience, and what's obvious to committers usually isn't obvious to users.


aetter avatar aetter commented on June 24, 2024

Hi @woolfel, I've added a ZIP file to this repository and am going to close this issue out.

  1. Added sentence.
  2. Added sample ZIP here.
  3. I think that the SSH tunnel method here is the better solution, but I also noted at the bottom that you can use Cloud9 for the same purpose.
  4. I'm happy with the level of detail here.
  5. Chalking this up as a longer term goal. It's completely valid, and I'll keep it in mind as I add content in the future.
  6. I investigated adding a Java-based Lambda function, and even if I'd been able to figure it out within an acceptable span of time, I wouldn't recommend it. The lack of a Cloud9 editor in the console means that you're stuck in this loop of editing locally, repackaging, uploading, etc. Unless I'm missing something, the Node, Ruby, and Python experiences are just nicer in Lambda.
  7. I'm happy with the level of detail here, as well.

Thanks a bunch for all the feedback!

