data-network-lab / indicatore_zona_gialla



Data Network x ALTEMS (Alta Scuola di Economia e Management Sistemi Sanitari)

License: MIT License

R 100.00%

covid-19 etl-pipeline datawrapper

indicatore_zona_gialla's Issues

[BUG] Different updating pace within data sources

Expected Behavior

Expected tabella_semplice.csv (and each of the other files in \graph-data\*) to contain 22 rows: one for each of the 21 regional entities (19 regions plus the autonomous provinces P.A. Trento and P.A. Bolzano) and one for the national total, Italia.

Current Behavior

The region count is lower than expected: either 20 or 18. This happens for tabella_semplice.csv and for all the other outputs in \graph-data\* because url_vaccini publishes regional data with a 1-day lag, and a few regions, such as P.A. Bolzano and Valle d'Aosta, lag by 2 days. Complete data for all regions is therefore only available after 2 days. Since the algorithm takes the last 22 rows, at certain times of day some regions are missing while others appear twice, with rows recycled from the previous day.
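To see which regions are behind at a given moment, a small diagnostic can compute each region's delay relative to the most recent date in the table. This is a base R sketch, assuming the long format of indicatore_stress.csv (one row per region and day, with columns `data` and `denominazione_regione`); `region_lag()` and its output columns `ultima_data` and `giorni_ritardo` are hypothetical names, not part of the repository.

```r
# Hypothetical diagnostic: how many days each region lags behind the most
# recent date in the table (0 = up to date). Assumes columns `data` (Date)
# and `denominazione_regione` (character), as in indicatore_stress.csv.
region_lag <- function(df) {
  # Most recent date reported by each region, as days since 1970-01-01
  last_by_region <- vapply(
    split(df$data, df$denominazione_regione),
    function(d) as.numeric(max(d)),
    numeric(1)
  )
  newest <- max(last_by_region)
  out <- data.frame(
    denominazione_regione = names(last_by_region),
    ultima_data = as.Date(unname(last_by_region), origin = "1970-01-01"),
    giorni_ritardo = unname(newest - last_by_region),
    stringsAsFactors = FALSE
  )
  # Most delayed regions first, ties broken alphabetically
  out <- out[order(-out$giorni_ritardo, out$denominazione_regione), ]
  rownames(out) <- NULL
  out
}
```

Applied to `output` from the reprex below, it lists the most delayed regions first, which makes the 1-day and 2-day lags visible without comparing `tail(22)` by hand.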

Possible Solution

Two possible solutions:

Set a daily release time for the data (e.g. 8 pm), after the last region has updated. Data would then be published only once it is complete, but this limits usability, since nothing new is available before 8 pm.
Recycle each region's vaccini data from its last available date, since vaccination figures are not expected to change drastically from one day to the next.
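Both options can be sketched in base R, assuming the same long format as indicatore_stress.csv. For option 1, instead of a fixed release hour, `latest_complete_date()` gates on completeness: it returns the most recent date on which all expected rows are present (`n_expected = 22` follows the expected count above). For option 2, `latest_per_region()` keeps each region's most recent row, so lagging regions are recycled from their last available date. Both helper names are hypothetical, not functions from the repository.

```r
# Option 1 (sketch): most recent date on which every expected region is
# present; NA if no date is complete yet.
latest_complete_date <- function(df, n_expected = 22) {
  n_regions <- vapply(
    split(df$denominazione_regione, df$data),  # names are "YYYY-MM-DD"
    function(r) length(unique(r)),
    integer(1)
  )
  complete <- names(n_regions)[n_regions >= n_expected]
  if (length(complete) == 0) return(as.Date(NA))
  max(as.Date(complete))
}

# Option 2 (sketch): most recent row for each region, so regions that have
# not updated yet are recycled from their last available date.
latest_per_region <- function(df) {
  df <- df[order(df$denominazione_regione, df$data), ]
  # After sorting, the last occurrence of each region is its newest row
  keep <- !duplicated(df$denominazione_regione, fromLast = TRUE)
  out <- df[keep, ]
  rownames(out) <- NULL
  out
}
```

With dplyr, as used in the reprex, roughly equivalent forms are `output %>% filter(data == latest_complete_date(output))` for option 1 and `output %>% group_by(denominazione_regione) %>% slice_max(data, n = 1) %>% ungroup()` for option 2. Either replaces the fragile `tail(22)`, which assumes every region has updated.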

Steps to Reproduce

library(reprex)
#> Warning: package 'reprex' was built under R version 4.0.5
library(readr)
#> Warning: package 'readr' was built under R version 4.0.5
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.0.5
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

output = read_csv("https://raw.githubusercontent.com/Data-Network-Lab/indicatore_zona_gialla/main/data/indicatore_stress.csv")
#> Rows: 5645 Columns: 31
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr   (1): denominazione_regione
#> dbl  (29): totale_casi, terapia_intensiva, ricoverati_con_sintomi, totale_ca...
#> date  (1): data
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

output %>% tail(22) %>%  count(denominazione_regione) 
#> # A tibble: 20 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    2
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> 11 Marche                    1
#> 12 Molise                    1
#> 13 P.A. Trento               1
#> 14 Piemonte                  1
#> 15 Puglia                    1
#> 16 Sardegna                  1
#> 17 Sicilia                   1
#> 18 Toscana                   1
#> 19 Umbria                    1
#> 20 Veneto                    2

output %>% filter(data== today()-2) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows

output %>% filter(data== today()-3) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows


anti_join(output %>% filter(data== today()-2) %>%  count(denominazione_regione),
          output %>% tail(22) %>%  count(denominazione_regione), 
          by = "denominazione_regione")
#> # A tibble: 2 x 2
#>   denominazione_regione     n
#>   <chr>                 <int>
#> 1 P.A. Bolzano              1
#> 2 Valle d'Aosta             1

Created on 2021-09-13 by the reprex package (v2.0.1)

[enhancement] add renv

Add renv to the project

Contributors: @NiccoloSalvini

Problem statement

The project's R dependencies are not reproducible across different machines and environments, which matters for an open-source project with outside contributors.

Description of proposed solution

The {renv} package provides project-local R dependency management. According to its authors, it aims to be a robust, stable replacement for Packrat, with fewer surprises and better default behaviors.

Existing workflows should keep working as before: renv manages library paths (and other project-specific state) to isolate the project's R dependencies.

Detailed description of design and implementation of proposed solution

The general workflow when working with renv is:

Call renv::init() to initialize a new project-local environment with a private R library.

Work in the project as normal, installing and removing R packages as needed.

Call renv::snapshot() to save the state of the project library to the lockfile (renv.lock).

Continue working on the project, installing and updating R packages as needed.

If the updates worked, call renv::snapshot() again to save the new state of the project library; if they introduced problems, call renv::restore() to revert to the previous state recorded in the lockfile.
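The last step (snapshot on success, restore on failure) can be wrapped in a small helper. `try_update()` is a hypothetical name, not part of renv; it assumes the project was already initialized with renv::init() and calls renv::snapshot() and renv::restore() with `prompt = FALSE` so it can run non-interactively. The snapshot and restore steps are arguments, so they can be replaced, e.g. in tests.

```r
# Hypothetical helper (not part of renv): run one package update, then either
# record the new library state in renv.lock or roll back to the lockfile.
try_update <- function(update,
                       snapshot = function() renv::snapshot(prompt = FALSE),
                       restore  = function() renv::restore(prompt = FALSE)) {
  ok <- tryCatch({
    update()
    TRUE
  }, error = function(e) {
    message("Update failed, restoring renv.lock: ", conditionMessage(e))
    FALSE
  })
  if (ok) snapshot() else restore()
  invisible(ok)
}

# Usage, inside a project initialized with renv::init():
# try_update(function() renv::install("httr2"))
```

Only errors raised during the update itself trigger a rollback; problems discovered later, when the project code runs against the new packages, still call for a manual renv::restore().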

[enhancement] introduce httr2

Introduce `httr2`

Replace httr with httr2.

Motivation

You can now create and modify a request without performing it. This means there is now a single function to perform the request and fetch the result: req_perform(). (To handle the response as it streams in, use req_stream() instead.) req_perform() replaces httr::GET(), httr::POST(), httr::DELETE(), and more.
HTTP errors are automatically converted into R errors. Use `req_error()` to override the defaults (which turn all 4xx and 5xx responses into errors) or to add additional details to the error message.
You can automatically retry if the request fails or encounters a transient HTTP error (e.g. a 429 rate-limit response). req_retry() defines the maximum number of retries, which errors are transient, and how long to wait between tries.
OAuth support has been totally overhauled to directly support many more flows and to make it much easier to both customise the built-in flows and to create your own.
You can manage secrets (often needed for testing) with secret_encrypt() and friends. You can obfuscate mildly confidential data with obfuscate(), preventing it from being scraped from published code.
You can automatically cache all cacheable results with req_cache(). Relatively few API responses are cacheable, but when they are it typically makes a big difference.
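A minimal sketch of the request-building style described above, assuming httr2 is installed. It reuses the CSV URL from the reprex as an example target; the user-agent string and the cache directory are illustrative choices, and `fetch_csv()` is a hypothetical helper, not existing project code.

```r
library(httr2)

# Example target: the CSV read in the reprex of the first issue
url_dati <- "https://raw.githubusercontent.com/Data-Network-Lab/indicatore_zona_gialla/main/data/indicatore_stress.csv"

# Build the request step by step; nothing is sent over the network yet
req <- request(url_dati)
req <- req_user_agent(req, "indicatore_zona_gialla ETL")  # illustrative UA string
req <- req_retry(req, max_tries = 3)  # retry transient errors (by default 429 and 503)
req <- req_cache(req, path = file.path(tempdir(), "httr2-cache"))

# Hypothetical helper: only req_perform() actually sends the request;
# 4xx/5xx responses are raised as R errors before the body is read.
fetch_csv <- function(req) {
  resp <- req_perform(req)
  resp_body_string(resp)
}
```

Calling `read.csv(text = fetch_csv(req))` would then download and parse the file. Keeping request construction separate from execution makes it easy to add retry or caching policies in one place for all the pipeline's data sources.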
