
cloudping.co's Introduction

CloudPing

Records inter-region latency over a TCP connection between all AWS regions.

About this Project

Over time, as I've worked on global AWS deployments, I have often faced the question of which inter-region transactions will see the most latency. I have found plenty of static examples from previous testing, and anecdotal guesses based on a region's location, but no dynamic, consistently updated latency monitoring. The goal here is to provide a single source of truth for latency between AWS regions.

Architecture

[Architecture diagram]

Parts of Application

Front End

The front end of CloudPing is a Python Flask web server running in a Fargate container. The web server pulls its data from DynamoDB and uses it to populate the latency table shown on the site.

Region-to-Region Pings

Each active AWS region has a Lambda function that runs every 6 hours. This function pings the public DynamoDB endpoint of each region (dynamodb.<region>.amazonaws.com) and stores the RTT for the ping in a DynamoDB table.
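
As an illustration of what such a ping can look like, here is a minimal sketch that times one TCP handshake to a regional endpoint. The function names, the choice of port 443, and the timeout are assumptions made for this example; the project's actual Lambda code may measure the RTT differently.

```python
import socket
import time


def endpoint_for(region: str) -> str:
    """Public DynamoDB endpoint name for a region, as described above."""
    return f"dynamodb.{region}.amazonaws.com"


def tcp_rtt_ms(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time one TCP handshake to host:port and return it in milliseconds.

    Port 443 and the 5 second timeout are assumptions for this sketch.
    """
    # Resolve the name first so DNS lookup time is not part of the measurement.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.connect(sockaddr)  # returns once the handshake completes
        return (time.perf_counter() - start) * 1000.0
```

Calling tcp_rtt_ms(endpoint_for("eu-west-1")) from a Lambda function in another region would give one RTT sample for that region pair.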

Averages and Percentile Calculations

Every 6 hours, after the region-to-region pings complete, the data is read from the raw results DynamoDB table and used to calculate daily, weekly, monthly, and annual averages and percentiles between all of the active regions. The results are stored in a summary DynamoDB table, which provides the data for the front end.
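
The calculation for a single region pair can be sketched as follows. This is an illustration only: the window lengths in days, the choice of p50 and p90, and the shape of the input (a list of timestamp and RTT pairs) are assumptions, not the project's actual schema or code.

```python
import statistics
from datetime import datetime, timedelta, timezone

# Window lengths in days; these exact values are assumptions for the sketch.
WINDOWS = {"daily": 1, "weekly": 7, "monthly": 30, "annual": 365}


def summarize(samples, now):
    """Summarize (timestamp, rtt_ms) samples for one region pair.

    Returns a dict keyed by window name with the average, p50, and p90.
    """
    summary = {}
    for name, days in WINDOWS.items():
        cutoff = now - timedelta(days=days)
        rtts = [rtt for ts, rtt in samples if ts >= cutoff]
        if len(rtts) < 2:
            continue  # statistics.quantiles needs at least two data points
        # n=100 yields 99 cut points; index k-1 is the k-th percentile.
        cuts = statistics.quantiles(rtts, n=100, method="inclusive")
        summary[name] = {
            "avg": statistics.fmean(rtts),
            "p50": cuts[49],
            "p90": cuts[89],
        }
    return summary
```

With one sample every 6 hours, the daily window holds only four or five values, so high percentiles are more meaningful for the longer windows.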

DynamoDB Tables

  • PingTest - this table stores the raw data for round trip ping times to and from each region. The data in this table goes back to CloudPing's launch in 2017.
  • cloudping_regions - this table lists the AWS regions which are enabled in CloudPing and drives the regions that are shown on the front end.
  • cloudping_stored_avgs - this table contains the summarized averages and percentiles and is the table used to populate the front-end data.

Deployment Instructions

The Lambda functions are deployed with AWS Chalice. The front-end web site is deployed as a Docker image, stored in ECR, and served by a Fargate service which exists behind an ALB.

TODO

  • API access - both to the raw data stored in DynamoDB and to the specific queries used primarily by the web front end.
  • Graph showing latency over time for user selected parameters (between regions, specific timeframes, etc.)
  • GovCloud and China regions (if anyone is able to help make this happen, please reach out!)

Additional Notes

This project is in no way associated with Amazon or AWS. If you wish to report any issues with the project, please use the "Issues" feature within GitHub.

cloudping.co's People

Contributors

mda590


cloudping.co's Issues

grouping by continent

I would love to see a grouping mechanism of some sort. For example, I would really like to be able to hit a "US" button and only see the latency in the us-based regions. It would make it a lot easier to digest.

Awesome tool, btw. Thanks!

CERT_DATE_INVALID for www.cloudping.co

Certificate for *.cloudping.co expired on 13 Oct 2022

NET::ERR_CERT_DATE_INVALID
Subject: *.cloudping.co

Issuer: Amazon

Expires on: 13 Oct 2022

Current date: 27 Oct 2022

Transit Gateway Peering

I would suggest #13 is updated to make use of Transit Gateway peering in a full mesh configuration, and use AWS backbone/internal routing to provide inter-region latency. This is most likely what large-scale deployments in AWS would make use of and is the more modern way to measure inter-region latency (as opposed to using public endpoints and the public internet).

How to contribute and use

Cool application, but there is no clear documentation on contribution guidelines or on how to run the front-end application. I am interested in using this app and possibly contributing.

Can we get some distribution data?

In my experience, averages are a small part of the problem. The latency distribution, or at least the 99th percentile, would be very helpful. Information about packet loss would also be great, as it really impacts distributed systems.

Creating a lookup table

Hey there,

Some background...
A while ago I created a lookup table manually using the data from the cloudping website.

The lookup table is used to determine which region is nearest to a particular region.

The table looks like this:

ap-southeast-2:
    ap-southeast-2: '0'
    us-west-2: '1'
    us-west-1: '2'
    ap-southeast-1: '3'
    us-east-1: '4'
    us-east-2: '5'
    eu-west-1: '6'
    eu-west-2: '7'
    eu-central-1: '8'
ap-southeast-1:
    ap-southeast-1: '0'
    us-west-2: '1'
    eu-central-1: '2'
    eu-west-2: '3'
    us-west-1: '4'
    eu-west-1: '5'
    ap-southeast-2: '6'
    us-east-1: '7'
    us-east-2: '8'
eu-central-1:
    eu-central-1: '0'
    eu-west-2: '1'
    eu-west-1: '2'
    us-east-1: '3'
    us-east-2: '4'
    us-west-1: '5'
    us-west-2: '6'
    ap-southeast-2: '7'
    ap-southeast-1: '8'
    # and so on... and so forth...

and is used, in Python for example, like this:

priority = table[source_region][destination_region]

I started out with only a few regions, but now I need to do many more. This is pretty tedious to do by hand.

I was about to build something that would scrape the website and calculate this data instead. However, I thought I'd raise this as an issue to see if:

  1. There is an easier way to do this that is closer to the data and doesn't require HTML scraping
  2. This would be useful for anyone else

Thoughts?
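
For illustration, a table of this shape can be derived from any nested mapping of average latencies. The sketch below is hypothetical: build_priority_table is not part of CloudPing, the input format is an assumption, and ranks are emitted as integers rather than the quoted strings used in the table above.

```python
def build_priority_table(latency_ms):
    """Rank destinations per source region, nearest first.

    latency_ms maps source region -> {destination region: latency in ms}.
    The source region itself always gets rank 0, matching the table above.
    """
    table = {}
    for source, row in latency_ms.items():
        # False sorts before True, so the source region comes first;
        # every other destination is ordered by ascending latency.
        ranked = sorted(row, key=lambda dest: (dest != source, row[dest]))
        table[source] = {dest: rank for rank, dest in enumerate(ranked)}
    return table
```

The lookup then works exactly as before: priority = table[source_region][destination_region].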

Dark Mode

Would a maintainer mind adapting their GitHub README.md to support dark mode?

If a browser like Firefox is in dark mode, the default DOM background switches to a dark color (e.g., black), and because the architectural images have a transparent background with black text and figures (e.g., arrows), the images are not easy to read and view. I attached an image illustrating what I mean.

[Attached image: cloudping_dark-mode_github]

How to run this myself in AWS?

I cloned the repo and tried to build it in my own AWS account.
I split it into three parts: frontend, ping_from_region, and scheduled_functions.

I got stuck on ping_from_region.
I built a venv (../venv) with Python 3.10.13 and tried the simple example [1] to build a "hello world" web app.
It worked. But when I ran "chalice deploy" in ./ping_from_region, it showed "Resources deployed:" and nothing after that.

What should I do to fix this problem?

Ref:
[1] https://github.com/aws/chalice

Inter-AZs

Hi Matt, thanks for the table.
I was wondering if it would be possible to publish one for inter-AZ latency, since in many OLTP active-passive solutions organizations would use AZs rather than regions.
Thanks!

O Canada

Wouldn't it make more sense to move Canada near the US regions? At present it is between Asia and Europe, which is a bit odd, unless you go across the Northwest passage...?

Public endpoints or VPC peering?

Are the latency tests performed using DynamoDB public endpoints? So the packets are going across public internet? Or do you have VPC peering set up between regions?

Can you test bufferbloat?

Hi,

Could you test bufferbloat, or the variance of latency, and ECN capability?

Thanks. Here in Brazil, with only the SP region, some friends in Recife see a minimum latency of 60 ms to reach AWS in SP. I don't know whether, in this case, following the submarine cable map helps in determining a closer datacenter, but another suggestion is to use these maps to estimate latency.

Show City Name

Hi

It would be great to add the name of city to the page. Thanks.

Different time from my ping statistics

The statistics shown in the table at https://www.cloudping.co/ are not the same as the statistics I get from running the ICMP ping command myself. The values shown on the page also seem to be consistently higher than the ones I measure.

Did anyone else find the same issue?

API access

Will API access to the raw data in the database be available anytime soon?

Show max and 95%

I don't trust averages, as they hide problems and often don't even represent the vast majority of values. I'm not saying we should ignore them, but I would really appreciate seeing the max and the 95th percentile.

Website not working

Hi!

The cloudping website seems to be down- I'm getting:

This site can’t be reached

www.cloudping.co’s server IP address could not be found.

I've tried this on multiple browsers and also confirmed that others are experiencing the same issue. Are there plans to fix this?

Thanks!

Add GovCloud to Cloudping.co

Matt, sometime in the future can you add an entry for the GovCloud? It would be very useful for some of us. thanks.

Directionality not indicated in chart

Ping results for us-east-1 -> ca-central-1 seem to be different from ca-central-1 -> us-east-1 in the chart, but the chart gives no indication of directionality.

It might be better to average the two directions so that the boxes are equal however the user looks up values (more typical on charts of this type), although that's more work than setting labels.
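
Averaging the two directions could be sketched like this. The helper is hypothetical and assumes the chart data is available as a nested mapping from source region to destination region; it is not taken from the CloudPing code.

```python
def symmetrize(latency_ms):
    """Average A -> B and B -> A so both cells of the chart show one value.

    latency_ms maps source region -> {destination region: latency in ms}.
    If the reverse direction is missing, the one known value is kept.
    """
    result = {}
    for src, row in latency_ms.items():
        for dst, forward in row.items():
            reverse = latency_ms.get(dst, {}).get(src)
            value = forward if reverse is None else (forward + reverse) / 2
            result.setdefault(src, {})[dst] = value
    return result
```

The input mapping is left unchanged, so the directional values remain available if labels are added later.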

List Certain Regions

It would be great to be able to select a subset of regions to display, so the table is easier to read at a glance.

For example, if I'm running an app in the US and South America, it's a lot of extra clutter to view the other 9 regions in Asia and Europe.

Slower than I measured

Hi there,
This is a great project, thank you!
I have been measuring latency between some of the regions using EC2 instances, and somehow your numbers are higher than mine (see below). Any idea why?

us-east-2 <--> ca-central-1, mine: 24 ms, yours 42 ms.
us-east-2 <--> eu-central-1, mine: 96 ms, yours 115 ms.
similarly for us-east-1, mine is 20 ms smaller than yours.
I get similar numbers for the past month.

[feature request] test my latency

The cross-referencing is very helpful when estimating cross-datacentre and infrastructure latencies, or latencies for customers in different geographies - but it would be very helpful to also be able to check my latency too!

(Like http://www.cloudping.info/, both under one roof?)

us-east-1 => us-east-1 latency

Hi,

I noticed that the latency from us-east-1 to itself is more than 40 ms, whereas the latency from other regions to themselves is less than 10 ms.

Do you know why there is such a difference?

Regards,

Etienne

reorder regions to align with proximity? inter/intra az info?

i'm a fan. i've looked at github and it's not clear how to deploy this distro, would be great if we could experiment ourselves. :^)

some ui comments.

would be great if the geo regions were a little better aligned. maybe put ca-central-1 with the us-east/west regions?

africa and middle east in between eu and ap?

south america somewhere between us eu and ap. or variable.

there are clear blocks of low latency, it would be nice if those were highlighted somehow.

would be super cool if there was a way to incorporate AZs into the view.

Ping: Latency vs Round-Trip

Ping (ICMP) is usually measured on a host as the time each request packet takes to reach the destination plus the time the reply takes to come back to the host (round trip). The ping application on the host therefore calculates the time the packet took to go and come back.

Latency is usually defined as the time it takes for a packet coming from a server to be received by a host (latency is therefore usually approx. half the time a ping takes).
Latency is usually important for real-time traffic such as video and online gaming, where it matters that packets are received quickly and matters less whether the packets sent are delayed.
NASA says: "Data latency is the total time elapsed between when data are acquired by a sensor and when these data are made available to the public." This might include the presentation layer (showing the public); thus, in its holistic form, it can be the sum of the times: acquisition + receive + present data. We don't see any "send request" in these times. Some might argue that the send request may be part of this equation, because the "public" will send a request for data in order to receive it, but if this is a video stream, you only request once and then have a steady stream of receives, so we might ignore those times (even though they might occur). Still, if we are strictly interested in the latency in this equation, that is, how long the public waits for this sensor data, we don't include any send-request times.

Thus, generally speaking, under its standard definition we can't accurately measure latency with pings: a ping gives the round-trip time in ms (request + reply), while latency is specifically only the reply time. Dividing the ping time by 2 is an attempt to obtain the reply time, but it's not accurate.

It is common in many portals and graphical representations to mix these definitions as if they were the same, but they are not; the difference is subtle.

But it is very important to understand that applications that really measure latency (VoIP, audio, video, database, streaming, P2P gaming) will in fact show approx. half the ping round-trip times.

Therefore the header on the page should say Round-Trip and not latency times, just to set the record straight and not confuse teams that rely on these values and need to understand them technically, since it may matter to them whether the values are half or double the time.

: E=mc2 - Note for the curious minds :
These times are not elastic: they are hard limits set by current technical knowledge and science, and relate to the time light needs to get from A to B. Since light travels in approximately a straight line, we can use the formula D = T * V, where D is the distance the fiber covers, T is the time it takes to travel, and V is the velocity of light. The speed of light inside fiber is close to 2*10^8 m/s. As a rule of thumb, round-trip time is approx. 1 millisecond per 100 km.
Therefore it is linked to how many kilometers light has to travel. These times cannot be reduced with current knowledge, since fiber optics is already the best way we have to communicate data around the world.
So the only way to reduce round-trip times from A to B is to get closer to the destination, or to do what many have done: use CDNs to replicate data across regions and make it available closer and closer to the end users.
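
The rule of thumb above can be checked with a short calculation. This is a sketch using only the figures quoted in the note; the function name is made up for the example, and real paths add routing and equipment delay on top of this lower bound.

```python
# Speed of light in fiber from the note above: 2*10^8 m/s = 200 km per ms.
FIBER_SPEED_KM_PER_MS = 200.0


def min_round_trip_ms(fiber_km: float) -> float:
    """Lower bound on round-trip time over fiber_km of fiber.

    From D = T * V the one-way time is T = D / V; a round trip covers
    the distance twice.
    """
    return 2 * fiber_km / FIBER_SPEED_KM_PER_MS
```

For 100 km of fiber this gives 2 * 100 / 200 = 1 ms, which matches the rule of thumb of 1 millisecond per 100 km.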

Hope I helped to clarify this topic, since it came to my attention.

Keep up the good work, Matt.
Cheers.

China (Beijing) region not present

Although the China (Beijing) region is quite different from the others in a number of ways, it would be a very useful addition to the list.

Colorblind friendly colors

The "< 100ms" and the "100-180ms" colors are too close together for colorblind people to see. I humbly request that an accommodation is made to (1) make the colors different enough or (2) add another UX enhancement such as an icon to distinguish them better.

Highlight currently selected row and column

Would be nice to add highlighting for currently selected row and column to make it easier to find row/column pair.

There are numerous examples on how it can be done, e.g. here

PS: is this project still alive?
