Comments (20)

Lsquared13 commented on June 14, 2024

Merged in #29

ExcChua commented on June 14, 2024

Just suggesting something here. Instead of fixed IPs, the nodes could use dynamic DNS: each node registers its DNS name with a dynamic DNS server, so even if their IPs change, they can still be contacted by their DNS names. Would that help?

Lsquared13 commented on June 14, 2024

Thanks for the input! Our original issue with that is that geth can't handle enode addresses involving DNS names. However, we talked about it today and @john-osullivan thinks it will be reasonable to extend geth to handle DNS bootnodes.

A dynamic DNS server might be something to think about; it sounds like it could be cheaper. John and I were just thinking of throwing an ELB in front of each bootnode. I'm generally in favor of pushing as much load onto AWS as possible, even if it means throwing money at the problem. With that in mind, if dynamic DNS necessarily means maintaining our own DNS server, I might prefer just going with the ELB solution.

john-osullivan commented on June 14, 2024

For the sake of documentation, here's a thorough write-up of what I've learned about the problem and how to tackle it, based on my conversations with Louis & Juan yesterday. I'm still cleaning up the plaintext passwords thing, but I think I've got a handle on this.

There seems to be one key unresolved question here: do running nodes need an updated list of bootnodes? Juan thought they could discover all new peers from each other, so the list only needs to be valid at start time, but Louis thought they always need a list of active bootnodes. The latter requirement makes the problem a good bit harder, as we need to signal active nodes that they need to update the bootnodes in their supervisor config (see bottom). Assuming the list only needs to be accurate at the start, here are a couple of approaches for ensuring that the list is updated each time a new bootnode is added:

  • Rebuild an enode address in the init-quorum recovery case using the new IP, then save it to Vault/GitHub for other nodes to use later on.
  • Maintain a fixed DNS address for each bootnode, ensure that they end up behind the right one, and have each node's geth resolve those addresses.

The former approach would mostly involve modifying that recovery case of the init-quorum script, although we would need the script to get write auth for the right Vault/GitHub endpoints. The latter approach means the addresses stored in Vault/GitHub should never have to change -- this option seems more promising.

The required geth modifications look pretty straightforward. Those enode URLs (enode://[hexUsername]@[IP address]:[TCP port]?discport=[UDP port]) strictly use IP addresses, but like Louis said, it should just be a couple lines in the CLI parser to resolve DNS addresses down to IPs.
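
To illustrate, the change amounts to doing something like this before the URL reaches the rest of geth (the hostname, node ID, and ports below are made up):

  # Made-up hostname, node ID, and ports -- purely to show the substitution geth would do internally.
  BOOTNODE_HOST="bootnode-0.example.com"
  BOOTNODE_IP=$(dig +short "$BOOTNODE_HOST" | head -n1)
  BOOTNODE_ENODE="enode://<hex-node-id>@${BOOTNODE_IP}:21000?discport=21001"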

That reduces the problem to making sure we recover the node to the same DNS address each time. There are a few ways to do that; it sounds like an ELB would be one easy option. I still haven't done much research on that side of it, so I'm not sure what the proper solution is. I heard something about Auto Scaling Groups, which sounds like fun.

Appendix re: Live Updates
Louis and I explored how we could tell live nodes that they need to update -- one interesting solution is creating a FRESH_BOOTNODES Ethereum event, then having these nodes run a Python script which listens for it (like we have for block metrics). Receiving this event would trigger a failover process that fetches the new bootnode addresses and pauses geth (like here) while we rewrite the supervisor config. If we did want to do that, we'd want to consider building the event straight into the governance -- we don't want random people claiming there are new bootnodes, so we could ensure that only one of our validator nodes is allowed to emit the event.

ExcChua commented on June 14, 2024

How many bootnodes are there?

  • Is there a mechanism to
    • tell other bootnodes a particular bootnode is going down?
    • tell all bootnodes a particular quorum node is going down?

Asking this because my software upgrade mechanism plans to bring down nodes in order to upgrade their software, and if a particular node is down, bootnodes shouldn't tell other nodes that the downed node is available.

john-osullivan commented on June 14, 2024

I believe there are 3 bootnodes for each of the 14 supported regions, so 42.

As far as I know, we don't have an easy way to detect when a node is going down. The failure event might be sudden, so we might not be able to call some graceful exit procedure.

Louis' advice to me was to focus on when nodes are being turned on, and then detect whether they're new nodes or replacements. We've already got a running process to hook into on bootup, so it saves us the headache of determining whether a network participant is really down or not.

EDIT: @Lsquared13 & @eximchain137 (Juan?) can comment more on this, but I believe bootnodes just advertise the list of peers which they're currently connected to. If you kill a node and it stops being connected anywhere, that might automatically solve the advertising problem.

Lsquared13 commented on June 14, 2024

@EximChua - You don't need to worry about bootnodes pointing to dead instances. Nodes may do a best-effort shutdown, but in general they can die without telling the bootnode. The technology this is built on was designed for a public network, so it has to handle such things gracefully.

ExcChua commented on June 14, 2024

@Lsquared13 Thanks for the clarification, Louis!

john-osullivan commented on June 14, 2024

Update based on Yesterday's Conversation

The solution is starting to get clearer here. We need to replace each of our 42 bootnode instances with a Load Balancer + AutoScaling Group. The ASG will let us say, "Make sure there's always an instance here", without having to worry about actually replacing it ourselves. The LB will give us a static IP address which is attached to the ASG, so all of the failover work happens automagically. One happy consequence of this strategy is that we don't need to break the enode protocol, as we'll have a static IP dedicated to the bootnode (or really to the current instance acting as a bootnode).

There are 3 types of load balancer (application, network, & classic); we want to use the faster, lower-level network load balancer, which gives us the static IP. Terraform's documentation for the network load balancer and the autoscaling group is pretty good; I'm still getting acquainted with how we specify everything.

One wrinkle is that the IP is now known by the LB, rather than by the instance. The LB needs to somehow tell the instance what address it's sitting behind. One good thing is that the bootnode doesn't need to know that IP value until it wants to start advertising its enode address, so we might be able to turn on the bootnode and have it fetch that value before actually initializing geth.

One way to potentially make this design a little cheaper is to use one LB per region that has at least three availability zones, then tie each AZ to a bootnode. Technically, the network LB gives you one static IP per AZ. If we wanted to save on LBs, we could do some fancy work that only initializes a second one if there aren't three AZs in the given region. That said, it does introduce some complication to the process, having to track which ASG is tied to which AZ and preserving all of those hookups within one LB -- we should quantify the cost of the LB+ASG-per-bootnode strategy and see how much we'd really save by reducing our LB count.

  • This commit covers how Louis converted non-bootnode instances to use ASGs.
  • This line shows how to tell the terraform module which ASG it needs to connect to.

All that aside, I'm still cleaning up my big update PR -- just wanted to document our conversation from yesterday somewhere better than a Sublime note.

john-osullivan commented on June 14, 2024

That PR got merged in on Tuesday, so resolving this issue is now my top priority. Working on it in the lb-asg-bootnodes branch of my fork.

ExcChua commented on June 14, 2024

John, if the IP is that of the LB and not the instance, my upgrader would have trouble upgrading the software. Would you be able to provide the IP or DNS name of the instance?

john-osullivan commented on June 14, 2024

ExcChua commented on June 14, 2024

The upgrader uses either IP address or DNS to locate & update the software on the machines.

It uses SSH to connect to the target machines, and then secure copy (scp) to transfer files.
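
Roughly, each upgrade step looks something like this (the key file, user, paths, and supervisor program name below are placeholders):

  # Placeholder key, user, paths, and supervisor program name -- just to show the mechanism.
  scp -i upgrader-key.pem ./geth ubuntu@$NODE_ADDR:/tmp/geth
  ssh -i upgrader-key.pem ubuntu@$NODE_ADDR \
    "sudo mv /tmp/geth /usr/local/bin/geth && sudo supervisorctl restart quorum"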

If there's an LB in front of 2 or more nodes, then the upgrader can only update the node the LB is currently routing to. The other nodes wouldn't be updated until the LB routes to them.

Lsquared13 commented on June 14, 2024

I'm not actually convinced there's a problem here. Is there a reason we can't use the LB DNS/IP within the network for making connections and the direct DNS privately for doing updates?

Lsquared13 commented on June 14, 2024

Also @john-osullivan can we just have terraform + user-data fill the Load Balancer IP or DNS into a data file? I don't think that would force a circular dependency...

john-osullivan commented on June 14, 2024

@Lsquared13 Yup, I'm getting the load balancer's DNS name to the instance by writing it to a data file. The DNS name is available as an attribute of aws_lb, but the IP isn't -- that's why I'm writing an additional script in init-bootnode.sh to resolve the DNS name to an IP.
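
That resolution step could be as simple as something like this (the data file path and variable names are just illustrative):

  # Hypothetical path: wherever terraform + user_data wrote the aws_lb DNS name.
  LB_DNS=$(cat /opt/quorum/info/lb-dns.txt)
  # Resolve the load balancer's DNS name to an IP for the enode address.
  LB_IP=$(dig +short "$LB_DNS" | head -n1)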

Also, @EximChua, we might be alright here because we're designing the system such that each load balancer gets its own node. Every bootnode will get one load balancer which points at one autoscaling group, and the autoscaling group has size one. We aren't actually trying to balance load across many machines; we just want to ensure that we have a static IP which will always be pointing to some machine.

If you get the guarantee that each LB DNS only points to one machine, does that solve your problem? Note that the specific machine might change over time as dead instances are replaced, so we definitely need that verification code which checks whether a machine has gotten an upgrade.
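
(For what it's worth, that verification could be as simple as comparing versions over SSH -- the key, user, and variables below are placeholders:)

  # Hypothetical check: compare the version the machine reports against the expected one.
  INSTALLED=$(ssh -i upgrader-key.pem ubuntu@$LB_DNS "geth version" | awk '/^Version:/{print $2}')
  [ "$INSTALLED" = "$EXPECTED_VERSION" ] || echo "needs upgrade: $LB_DNS"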

Lsquared13 commented on June 14, 2024

I'm totally okay with guaranteeing that we have at most one machine per load balancer.

@john-osullivan I'm wondering if maybe there's an AWS CLI call you can make that will get the IPs for a load balancer. We can grant permission to call it to the IAM role the instances use.
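
Something along these lines might work, assuming the NLB's network interfaces carry an "ELB net/<lb-name>/..." description (I haven't verified that, and the LB name below is a placeholder):

  # Look up the NLB's network interfaces and print their public IPs.
  aws ec2 describe-network-interfaces \
    --filters Name=description,Values="ELB net/my-bootnode-lb/*" \
    --query 'NetworkInterfaces[*].Association.PublicIp' \
    --output text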

We can also try a workaround like this issue suggests.

And regarding the update mechanism, I expect this to reduce to the general problem of making sure replaced instances run the right versions. We definitely do need that, but I also think that will cover us.

ExcChua commented on June 14, 2024

@john-osullivan If there is a guarantee that each LB DNS points to one machine (and there are no other machines that need to be updated), then there's no problem.

john-osullivan commented on June 14, 2024

Based on some further research that happened yesterday, I'm now swapping out the LB in this solution for an elastic IP address.

It turns out that none of the load balancer options support UDP, which is required for communication between nodes. That's a hard blocker, and this six-year-old issue has a direct response from an Amazon rep saying that ELB does not support UDP. It seems like there's probably a technical reason under the hood, rather than just time constraints, as people have left comments requesting it as recently as October 2017 to no avail.

Did some research, and the happy outcome is that using elastic IPs ends up being a cleaner solution. We don't have to spin up as many resources, and the security group rules don't need to be duplicated as LB listeners. Here's the rundown:

  1. Terraform creates a number of elastic IPs equal to the number of bootnode ASGs required in every region
  2. Each bootnode ASG gets its own user_data script which includes the public IP and its allocation ID.
  3. When the ASG spins up a node and runs init-bootnode.sh, the following line (taken from this StackOverflow question) will connect the new node to its EIP. The --allow-reassociation option ensures that when a new node gets spun up later and runs the same command, it is allowed to claim the EIP.

aws ec2 associate-address --instance-id $INSTANCE_ID --allocation-id $EIP_ID --allow-reassociation
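
In context, that step would look roughly like the following in init-bootnode.sh (variable names are illustrative; the instance ID comes from the EC2 instance metadata service, and EIP_ID would be injected through user_data):

  # Illustrative expansion of the line above.
  INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  aws ec2 associate-address --instance-id "$INSTANCE_ID" \
    --allocation-id "$EIP_ID" --allow-reassociation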

One constraint on this solution is that by default, AWS only lets you reserve 5 elastic IPs per region. This doesn't get in our way, as we only want 3 bootnodes in each region, but if somebody configures a network with >5 bootnodes in a region, they'll run into issues and have to directly request more from Amazon.
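
(If anyone wants to check their current limit, something like this should show it, assuming the vpc-max-elastic-ips account attribute applies to your account:)

  # Query the account's Elastic IP limit for the current region.
  aws ec2 describe-account-attributes --attribute-names vpc-max-elastic-ips \
    --query 'AccountAttributes[0].AttributeValues[0].AttributeValue' --output text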

To remedy this issue, I'm gating the elastic IP functionality behind a boolean variable which defaults to false. If end users don't use EIPs, then they'll need to figure out their own strategies for updating bootnode addresses, but that's acceptable. I'll make sure to describe the behavior in the documentation somewhere.

This was covered on the morning call; I just want to document the strategy here for future reference.

john-osullivan commented on June 14, 2024

This issue is being wrapped up now over in #29
