
Comments (8)

segiddins commented on August 17, 2024

See also https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/ for how PyPI handles this.

from rubygems.org.

simi commented on August 17, 2024

> @simi raised the point of making it an isolated service versus running it on the actual infrastructure, and I'd love if we could

This was actually raised by @colby-swandale. We need to ensure that the health of the Timescale service does not affect the health of the rest of the service. I thought we did something special for OpenSearch, but it seems we don't. 🤔 @colby-swandale, would you mind deciding whether it is OK to start with a built-in API with some reasonable timeouts, or whether we should start with an isolated service?
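A minimal sketch of the "built-in API with reasonable timeouts" option, in plain Ruby. The helper name `with_stats_timeout`, the 0.5 second budget, and the fallback value are all hypothetical and not taken from the rubygems.org codebase. It only illustrates the idea that a slow or failing stats store should degrade to a fallback instead of stalling the request.

```ruby
require "timeout"

# Hypothetical helper: run a stats lookup with a hard time budget.
# If the block is too slow or raises, return the fallback so the rest
# of the page can still render without download statistics.
def with_stats_timeout(seconds: 0.5, fallback: nil)
  Timeout.timeout(seconds) { yield }
rescue StandardError # Timeout::Error is a StandardError subclass
  fallback
end

# Usage: the block stands in for whatever would query Timescale.
stats = with_stats_timeout(fallback: :unavailable) { { downloads: 42 } }
puts stats.inspect
```

`Timeout.timeout` is used here only to keep the sketch self-contained. In a real deployment a driver-level limit, such as PostgreSQL's `statement_timeout` setting, is usually the safer way to bound query time.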


colby-swandale commented on August 17, 2024

👋🏻 Heyo, I'm Colby. I maintain the infrastructure for rubygems.org and wanted to jump in to help get this done. I wanted to ask some questions to better understand what changes introducing TimescaleDB will bring.

I appreciate Timescale putting their hand up to help us here; it's super appreciated by everyone. My big takeaway from this proposal is that it introduces a runtime dependency to rubygems.org. We already have some (e.g. Fastly), but we look to limit them where possible. What is the benefit of running a Timescale Cloud instance, versus our use case being simple enough that the Timescale Postgres extension could handle it relatively easily? I also heard that a potential TimescaleDB instance inside AWS is in active development. Is this far away?

Our download logs only go as far back as 2015, when we moved to Fastly, so you'll probably need to add a step to backfill gem versions created before this date. You can probably backfill only as far back as 365.days.ago to reduce the number of logs that need to be parsed and inserted.
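As a rough illustration of limiting the backfill window, here is a plain-Ruby sketch. The `logs` structure (hashes with `:key` and `:logged_on`) and the helper name are assumptions made for the example; how log files are actually listed and dated is not specified in this thread.

```ruby
require "date"

BACKFILL_DAYS = 365 # plain-Ruby stand-in for Rails' 365.days.ago

# Keep only the log entries that fall inside the backfill window.
# `logs` is a hypothetical list of { key:, logged_on: } hashes.
def logs_to_backfill(logs, today: Date.today)
  cutoff = today - BACKFILL_DAYS
  logs.select { |log| log[:logged_on] >= cutoff }
end

logs = [
  { key: "old.log", logged_on: Date.today - 2000 },
  { key: "recent.log", logged_on: Date.today - 10 },
]
puts logs_to_backfill(logs).map { |log| log[:key] }.inspect
```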


simi commented on August 17, 2024

> Our download logs only go as far back as 2015, when we moved to Fastly, so you'll probably need to add a step to backfill gem versions created before this date. You can probably backfill only as far back as 365.days.ago to reduce the number of logs that need to be parsed and inserted.

@colby-swandale What data could be used to backfill pre-Fastly gems? If there is none, we can just mark those versions as having incomplete statistics.


jonatas commented on August 17, 2024

Hello Colby! Thanks for reaching out!

> What is the benefit of running a Timescale Cloud instance, versus our use case being simple enough that the Timescale Postgres extension could handle it relatively easily?

The cloud offering provides elastic compute and storage, high availability, replicas, etc. It would also be great marketing for our product, but the open source version just works.

> I also heard that a potential TimescaleDB instance inside AWS is in active development. Is this far away?

I don't have enough details to share any estimates, but I will try to check with the team.

> My big takeaway from this proposal is that it introduces a runtime dependency to rubygems.org. We already have some (e.g. Fastly), but we look to limit them where possible.

I totally agree, and I was even thinking about how these statistics could be a separate service, like rubygems-analytics. The only things we need are the same files from S3 and maybe some rubygems metadata such as rubygem_id and version_id; the rest would be totally isolated.

So I'm also happy to move it to an independent process to isolate the entire scenario. If you agree, I can first bring a POC that runs totally independently.


simi commented on August 17, 2024

> I totally agree, and I was even thinking about how these statistics could be a separate service, like rubygems-analytics. The only things we need are the same files from S3 and maybe some rubygems metadata such as rubygem_id and version_id; the rest would be totally isolated.

@colby-swandale on the other hand, a new isolated app will add maintenance burden. 🤔 @jonatas do you have any idea or estimate of what kind of response time we can get for the most complex queries planned?


jonatas commented on August 17, 2024

I don't think we'll have anything over a second. Everything will be pre-processed, so I imagine the average query will take under 300 ms.


jonatas commented on August 17, 2024

Hi folks, I just created this POC with the basic code to allow us to collect hourly statistics from the raw data.

We can process all available logs and pre-load the data into some instance, but I don't yet have access to run it.
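To make "hourly statistics from the raw data" concrete, here is a small plain-Ruby sketch of the bucketing idea. It is not the POC's code. The log line format (`<ISO 8601 timestamp> <gem name> <version>`) and the helper name are assumptions for illustration only, and the real Fastly log format will differ.

```ruby
require "time"

# Count downloads per (hour, gem, version) bucket.
# Assumed line format: "<ISO 8601 timestamp> <gem name> <version>".
def hourly_downloads(lines)
  counts = Hash.new(0)
  lines.each do |line|
    timestamp, gem_name, version = line.split
    next unless timestamp && gem_name && version # skip blank or short lines

    time = Time.iso8601(timestamp).utc
    hour = Time.utc(time.year, time.month, time.day, time.hour) # truncate to the hour
    counts[[hour, gem_name, version]] += 1
  end
  counts
end

sample = [
  "2024-08-17T10:05:00Z rails 7.1.3",
  "2024-08-17T10:55:00Z rails 7.1.3",
  "2024-08-17T11:01:00Z rails 7.1.3",
]
hourly_downloads(sample).each do |(hour, gem_name, version), count|
  puts "#{hour.iso8601} #{gem_name} #{version}: #{count}"
end
```

In a database-backed version, the same grouping would typically be expressed as an aggregate query over a time bucket rather than in application code.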

@simi raised the point of making it an isolated service versus running it on the actual infrastructure, and I'd love if we could

I see a lot of positive impact in building an isolated server that just tracks downloads. I don't think this type of feature needs to be part of the server, and having the extra database would add a new layer of complexity over ActiveRecord, as it uses a different connection.

On an isolated server, we'd need to mimic LogTickets or just have access to the S3 API to list and consume all the files:

  • We'll need a listener that subscribes to messages about newly generated logs to process.
  • Create an endpoint for statistics that can be consumed by the official website.
  • Drop the old counters from rubygems and replace the source with service calls.

I'm very open to going either way. I can also integrate it with the earlier work by @segiddins. I just explored this as a POC and am looking for more feedback before we proceed to the production implementation. I think that with an isolated server we have more room to develop other types of analysis and even detect patterns.

