datazoompuc / datazoom.amazonia
Simplify access to data from the Brazilian Amazon
License: Other
Currently the raw_data option makes no difference, and there are no municipality codes, only a column with municipality names.
library(datazoom.amazonia)
load_iema()
#> File downloaded:
#> • 'AMAZONIA_nao_atendidos.xlsx' <id: 10JMRtzu3k95vl8cQmHkVMQ9nJovvIeNl>
#> Saved locally as:
#> • 'C:\Users\igorr\AppData\Local\Temp\RtmpY1ELBW\filec4820212c3.xlsx'
#> # A tibble: 335 × 3
#> municipio populacao_nao_atendida uf
#> <chr> <dbl> <chr>
#> 1 sena madureira 26894 AC
#> 2 xapuri 13420 AC
#> 3 porto acre 7398 AC
#> 4 mancio lima 6003 AC
#> 5 brasileia 5644 AC
#> 6 cruzeiro do sul 4948 AC
#> 7 rio branco 4933 AC
#> 8 bujari 4047 AC
#> 9 feijo 2314 AC
#> 10 rodrigues alves 2052 AC
#> # … with 325 more rows
Created on 2022-07-07 by the reprex package (v2.0.1)
As an improvement, we should try to match the cities to their municipality codes when raw_data = FALSE (which might be tricky due to spelling inconsistencies). When raw_data = TRUE, we should also return the data before accents and capital letters are stripped from the city names.
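A sketch of that matching step, using accent and case normalization before joining. The `muni_codes` lookup table below is made up for illustration; in practice it could come from an IBGE municipality list (e.g. via the geobr package).

```r
library(dplyr)
library(stringi)

# Normalize a name: transliterate accents to ASCII, lower-case, trim
normalize_name <- function(x) {
  trimws(tolower(stringi::stri_trans_general(x, "Latin-ASCII")))
}

# Hypothetical lookup table of IBGE municipality codes
muni_codes <- tibble::tibble(
  code_muni = c(1200401, 1200708),
  name_muni = c("Rio Branco", "Xapuri"),
  uf        = c("AC", "AC")
)

# IEMA-style data: names already stripped of accents and capitals
iema <- tibble::tibble(
  municipio = c("rio branco", "xapuri"),
  uf        = c("AC", "AC")
)

# Join on the normalized name plus the state, to disambiguate
# homonymous municipalities in different states
matched <- iema %>%
  mutate(key = normalize_name(municipio)) %>%
  left_join(
    muni_codes %>% mutate(key = normalize_name(name_muni)),
    by = c("key", "uf")
  )

stopifnot(!any(is.na(matched$code_muni)))
```

Joining on the state column as well as the name guards against two municipalities with the same name in different states.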
Hi
In the 'sigmine_active' database I used the command below to add a characteristic, but an error appears that I've never seen before. Do you know what it means? It seems to point to an error in the dataframe.
library(tidyverse)
library(datazoom.amazonia)
minera <- load_sigmine(dataset = 'sigmine_active', raw_data = TRUE, language = "pt")
# In the "CONCESSÃO DE LAVRA" phase, which are the 20 companies that have the most records
filter(minera, fase=="CONCESSÃO DE LAVRA") %>%
group_by(nome) %>%
summarize(total=n()) %>%
arrange(desc(total)) %>%
head(20)
Error:
st_as_s2(): dropping Z and/or M coordinate
st_as_s2(): dropping Z and/or M coordinate
Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
Evaluation error: Found 1 feature with invalid spherical geometry.
[1] Loop 0 is not valid: Edge 39 has duplicate vertex with edge 51.
The problems occurred when running the example on the README. The package was downloaded from GitHub via devtools.
IMAZON
data <- load_imazon(dataset = "imazon_shp", raw_data = TRUE,
geo_level = "municipality", language = "pt")
SEEG
data <- load_seeg(dataset = "seeg_industry",
raw_data = FALSE,
geo_level = "state")
data <- load_seeg(dataset = "seeg",
raw_data = TRUE,
geo_level = "municipality")
IEMA
data <- load_iema(dataset = "iema", raw_data = FALSE,
geo_level = "municipality", language = "pt")
data <- load_iema(dataset = "iema", raw_data = FALSE,
geo_level = "municipality", language = "eng")
"Please follow the steps from googledrive
package to download the data. This may take a while"
The aim was to download the data, which was not possible due to this requirement of the googledrive package.
One suggestion, brought by Fernanda, is to add instructions to the displayed message to help the user meet the necessary requirements. It is also important to state that these authorizations are not harmful and won't alter the user's personal Drive, so people feel more comfortable granting the necessary permissions and are able to download and use the data.
Hello! I was trying to download the package from GitHub with the path specified in the README file. But every time I try, a message appears saying that 2 packages have more recent versions (purrr and curl). After I choose whether or not to update them, the installation of the desired package starts, but it never works (I've tried several times already), while also showing this message:
When trying to use the function load_cipo with the argument search = "" (the default), I get an error.
This happens with all datasets in the function, whether the package is downloaded from GitHub or CRAN.
brazilian_actors <- load_cipo(dataset = "brazilian_actors")
# also happens with "international_cooperation" and "forest_governance"
Error in `dplyr::filter()`:
ℹ In argument: `stringr::str_detect(aux, param$search)`.
Caused by error in `stringr::str_detect()`:
! `pattern` can't be the empty string (`""`).
Remove the filter in the case of search = ""
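A minimal sketch of that fix, assuming the internals match the names in the error message (`aux`, `param$search`): only apply the filter when `search` is a non-empty string, so the default search = "" returns all rows.

```r
# Skip the filter entirely when search is NULL or the empty string;
# nzchar("") is FALSE, so the default passes everything through.
if (!is.null(param$search) && nzchar(param$search)) {
  df <- dplyr::filter(df, stringr::str_detect(aux, param$search))
}
```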
As I tried to run the available code, I encountered the following error message:
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 3865) to match `..2` (size 773).
Run `rlang::last_error()` to see where the error occurred.
rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 3865) to match `..2` (size 773).
Backtrace:
▆
└─vctrs:::stop_vctrs(...)
  └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
I tried running the code with multiple time periods and the issue only happened when trying to cover three or more years.
Data Zoom now has a citation template.
Please include it in the package as a message. It should show up when the user loads the package. If you think of any other situations where it would be interesting for the message to pop up, feel free to include the template there too.
Written format:
Data Zoom (2023). Data Zoom: Simplifying Access To Brazilian Microdata.
https://www.econ.puc-rio.br/datazoom/english/index.html
BibTex format:
https://drive.google.com/file/d/1Nyuw9LANR78not9O3Ssh-OEP-b5y7RN1/view?usp=share_link
Please copy and paste this link into your browser's address bar. I couldn't upload the .tex file to GitHub because it doesn't support this format.
Hi
Looking at the DETER data, I get more rows in the output with raw_data = FALSE than with raw_data = TRUE:
with raw_data = TRUE, in load_deter(dataset = 'deter_amz', raw_data = TRUE)
with raw_data = FALSE, in load_deter(dataset = 'deter_amz', raw_data = FALSE)
Why is this? I looked at the documentation but could find no information. I assume this is for polygons that cross multiple municipalities? But then, if an alert is split in two, isn't the information about which polygons belong to the same alert lost?
Thanks!
Since the original data source was changed, our data cleaning must be updated.
Data restricted to the Legal Amazon is only available at the "municipality" level (geo_level = "municipality"). However, at that geo_level the database becomes too big to run. R warned that this could happen due to the choice of municipality level.
Also, while running the code with geo_level = "municipality", the following warning was shown:
Of the 6 databases suggested by CPI for the energy visualizations, 4 are already in datazoom.amazonia.
load_aneel loads the datasets "energy_development_budget" and "energy_generation".
load_epe loads the datasets "energy_consumption_per_class" and "national_energy_balance".
The two databases not yet included are
Following the logic used in the functions, the first database would be added as a new dataset in load_epe and the second as a new dataset in load_aneel.
More information about these databases can be found in our Drive, under the paths:
Data Zoom > Sites > Site - Data Zoom Amazônia > Posts > Energia - Data Zoom e CPI - AMZ exporta energia
Data Zoom > Sites > Site - Data Zoom Amazônia > Posts > Energia - Data Zoom e CPI - GeraçãoDistribuída
The datazoom.amazonia package extracts the MAPBIOMAS data from collection number 5.
See lines 405-424 of the R\download.R code.
However, MAPBIOMAS has released more recent data in collection number 6. See https://mapbiomas.org/estatisticas.
The update needs to be made.
Hello,
The same error occurred while downloading both of the following datasets:
data <- load_iema(dataset = "iema", raw_data = FALSE,
geo_level = "municipality", language = "pt")
data <- load_iema(dataset = "iema", raw_data = FALSE,
geo_level = "municipality", language = "eng")
First, I was requested to install the "googledrive" package and then I had to authorise some permissions. When the authentication was complete, this error appeared:
Error in gargle::response_process()
:
! Client error: (403) Forbidden
Insufficient Permission: Request had insufficient authentication scopes.
Problem in SidraR: fix the download function so it throws an error when the returned database has 0 observations at the state level.
Hi
I believe the package is assuming a Windows path (i.e. using \ separators)? This would be a problem for Linux/Mac users.
Indeed, running load_deter(dataset = 'deter_amz', raw_data = TRUE), I get an error message (note the mixed separators):
Error: Cannot open "/tmp/RtmpOQISI6\deter_public.shp"; The file doesn't seem to exist.
But the file definitely exists: file.exists("/tmp/RtmpOQISI6/deter_public.shp") returns TRUE.
Looking at the code, I think you might just want to replace paste(dir, "deter_public.shp", sep = "\\") in external_download with file.path(), which makes sure to use / or \ depending on Windows versus Linux/Mac.
Thanks!
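The suggested change could look like this sketch (`dir` stands in for the temporary directory used internally). file.path() joins path components with "/", which every platform, including Windows, accepts, so the mixed-separator path never arises.

```r
dir <- tempdir()

# before (hard-coded Windows separator):
# path <- paste(dir, "deter_public.shp", sep = "\\")

# after (portable):
path <- file.path(dir, "deter_public.shp")
```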
Thanks!
If we run the following code:
load_ibama(
download_data = FALSE,
load_from_where = "./Desktop/data.xls",
time_aggregation = year,
space_aggregation = municipality
)
In the resulting data.xls, the municipality with cod_municipio == 1100130 appears twice for the year 2019. This probably happens because in one observation the municipality name is municipio == Machadinho d'Oeste and in the other it is municipio == Machadinho D'Oeste.
The file R\ibama.R needs to be revised to fix this problem.
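One possible fix, sketched below with made-up values: normalize the municipality names to a single case before aggregating, so the two spellings collapse into one observation per municipality-year.

```r
library(dplyr)

# Toy reproduction of the duplicate (values are made up)
df <- tibble::tibble(
  cod_municipio = c(1100130, 1100130),
  municipio     = c("Machadinho d'Oeste", "Machadinho D'Oeste"),
  ano           = c(2019, 2019),
  valor         = c(10, 5)
)

# Lower-case the name, then aggregate by code, name, and year, so both
# spellings become a single row
fixed <- df %>%
  mutate(municipio = tolower(municipio)) %>%
  group_by(cod_municipio, municipio, ano) %>%
  summarise(valor = sum(valor), .groups = "drop")

stopifnot(nrow(fixed) == 1)
```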
In the right panel, GitHub indicates that the license for this package is "Unknown, MIT licenses found". I think we should define the license we are using clearly and explicitly.
I tried to fix that by excluding one of the license files, but it didn't pass the package check (5217412).
Hello,
An error occurred while downloading the following data:
data <- load_ibama(dataset = "collected_fines", raw_data = FALSE,
states = "BA", language = "pt")
Error in `dplyr::mutate()`:
! Problem while computing `municipio = dplyr::case_when(...)`.
Caused by error in `value[[i]]`:
! índice fora de limites
When I want to use some function (I'll use load_aneel as a running example), I find it extremely satisfying to just type
df <- load_aneel()
and instantly get a neat result. Because we have default options that we assume most users already want (e.g. raw_data = FALSE and language = "eng"), this is in fact possible for many functions. But when functions have many equally important datasets, the user must choose one. So my code above is met with
Error in load_aneel() : argument "dataset" is missing, with no default
To use the function, one is forced to copy/paste or, god forbid, manually type some long string of characters, such as "energy_development_budget" or "energy_generation".
In the README section for ANEEL, we already have this ready:
Options:
- "energy_development_budget": government spending towards energy sources
- "energy_generation": energy generation by entity/corporation
So what if I typed
df <- load_aneel()
and got something like this in the console:
No dataset selected. Type a number to pick one
1 - "energy_development_budget": government spending towards energy sources
2 - "energy_generation": energy generation by entity/corporation
I'm not sure how to implement this in a concise way that works for all functions, but it would be pretty cool.
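A rough sketch of the idea using base R's interactive menu; `choose_dataset()` and the options table are hypothetical helpers, not part of the package.

```r
# Prompt the user to pick a dataset when none was supplied. Falls back
# to the usual missing-argument error in non-interactive sessions.
choose_dataset <- function(options) {
  if (!interactive()) {
    stop('argument "dataset" is missing, with no default')
  }
  labels <- paste0('"', options$name, '": ', options$description)
  pick <- utils::menu(labels, title = "No dataset selected. Type a number to pick one")
  options$name[pick]
}

aneel_options <- data.frame(
  name = c("energy_development_budget", "energy_generation"),
  description = c(
    "government spending towards energy sources",
    "energy generation by entity/corporation"
  )
)

# Inside load_aneel(), something like:
# if (missing(dataset)) dataset <- choose_dataset(aneel_options)
```

Each load_* function would only need to supply its own options table, so the prompt logic lives in one place.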
Hello, the IEMA function of the package is throwing an error.
After this command for the dataset "areas_embargadas", the generated dataframe does not have the column names:
ibama <- load_ibama(dataset = "areas_embargadas", raw_data = TRUE,
language = "pt", legal_amazon_only = FALSE)
When running the example line:
data = load_mapbiomas(dataset = "mapbiomas_cover",
raw_data = FALSE,
language = "eng",
cover_level = 0)
The message below kept on for hours
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Hello,
An error occurred while downloading the first example of the SEEG data:
data <- load_seeg(dataset = "seeg",
raw_data = TRUE,
geo_level = "municipality")
First, I was requested to install the "googledrive" package and then I had to authorise some permissions. However, I didn't authorise editing, creating and deleting my files in Google Drive, because I thought it was a little invasive. When the authentication was complete, this error appeared:
Error in gargle::response_process()
:
! Client error: (403) Forbidden
Insufficient Permission: Request had insufficient authentication scopes.
When a user wants to generate information on municipal GDP across several years, the command can fail because there is a limit on the number of values that can be requested from SIDRA: 50,000.
That is, if the user runs the following command:
data <- load_gdp(c(2000:2021))
The result will be an error, because the number of municipality-years is greater than 50,000 (94,690 in this case, to be exact).
It is therefore worth informing users of this limit in the examples and vignettes.
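One way to stay under the limit, sketched under the assumption that load_gdp() accepts a single year: request the data one year at a time and bind the results.

```r
library(purrr)
library(dplyr)

# Each request covers one year, so it stays well below SIDRA's
# 50,000-value ceiling; bind_rows() stacks the yearly results.
years <- 2000:2021

gdp_all <- purrr::map(years, function(yr) {
  load_gdp(yr)
}) %>%
  dplyr::bind_rows()
```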
The datasets used are listed in the issue name.
PEVS
data <- load_pevs(dataset = 'pevs_forest_crops',
raw_data = TRUE,
geo_level = "municipality",
time_period = 2012:2013,
language = "eng")
COMEX
load_br_trade(dataset = "comex_import_prod",
raw_data = FALSE,
time_period = 1997:2021)
TerraClimate
max_temp <- load_climate(dataset = "max_temperature", time_period = 2000:2020)
BACI
clean_baci <- load_baci(dataset = "HS92", raw_data = FALSE, time_period = 2016,
language = "pt")
When running these examples, the user expects to receive a dataframe containing treated data or a list containing raw data. When this kind of problem occurs, there is no output, as the download isn't finished.
All the suggestions made consist of restricting the time range or filtering for a less specific geographic level. They do not solve the problem, but are consistent and clever ways of working around the inconvenience. For the BACI example, though, this is not achievable, as the data is already restricted to a single year and there is no geographic level option.
PEVS (Laura)
geo_level = "municipality" makes database too big to run. I found that setting geo_level to "region" made a good example
COMEX (Victor)
Year span reduced to 2 years in the example.
load_br_trade(dataset = "comex_import_prod", raw_data = FALSE, time_period = 2020:2021)
TerraClimate (Arthur)
max_temp <- load_climate(dataset = "max_temperature", time_period = 2000:2002)
The example is:
max_temp <- load_climate(dataset = "max_temperature", time_period = 2000:2020)
The result would be a vector of 785.2 MB, or approximately 488 million observations. It is therefore not a good example, since no machine I tested could handle a database this big.
The example is way too heavy to introduce the function, although it worked when reducing the interval to fewer years, such as 2000:2002 or 2000:2001.
Please follow the steps from googledrive
package to download the data. This may take a while.
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Downloading data from IMAZON requires installing the "googledrive" package. By itself, this is not really a problem; however, when following the instructions given by the package, a lot of invasive permissions must be granted, such as allowing your Google Drive files to be moved, altered and deleted.
The time_period argument in the example is wrong. It should be 2010:2012 but is 2010:2010.
Hi
Thanks for this great package! I am getting an error when downloading, not sure whether it is R- or package-specific...
Workaround I found, from here, was to use: options(download.file.method="curl", download.file.extra="-k -L")
Problem (using R 4.2)
library(datazoom.amazonia)
data <- load_ibama(dataset = "areas_embargadas", raw_data = FALSE,
language = "eng", legal_amazon_only = FALSE)
#> Warning in utils::download.file(url = path, destfile = temp, mode =
#> "wb"): URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/
#> downloadListaAreasEmbargadas.php': status was 'SSL peer certificate or SSH
#> remote key was not OK'
#> Error in utils::download.file(url = path, destfile = temp, mode = "wb"): cannot open URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php'
Created on 2022-06-28 by the reprex package (v2.0.1)
I was looking at the SEEG data to build a visualization and noticed something odd:
I ran the code below and 24 states + NA appear
seeg_en <- load_seeg("seeg_industry", raw_data = FALSE, geo_level = "state")
seeg_en$state %>% unique()
Running this other code, in Portuguese, only 7 states + NA appear
seeg_pt <- load_seeg("seeg_industry", raw_data = FALSE, geo_level = "state", language = "pt")
seeg_pt$state %>% unique()
The error seems to be in the case_when on line 228 of the file "seeg.R".
Create a new wiki page for Data Zoom telling the history of the repository. This history would have more depth than NEWS.md, for example, which already recounts a bit of what was done in the datazoom.amazonia package.
The package was downloaded from GitHub via devtools.
data <- load_datasus(dataset = "datasus_sim_do",
time_period = c(2020),
states = "RJ",
raw_data = FALSE)
I was expecting to download mortality data exclusively from Rio; however, it loaded data from cities outside the state, such as Manaus and Belém.
The dataset containing information on Teaching Establishments returns an empty dataframe with eight columns, no matter the state selected.
I tried downloading DATASUS mortality data from 1990, but it would not load.
The first year that loaded was 1996. The years between 1990 and 1996 returned the same error message as 1990.
base <- datazoom.amazonia::load_datasus(dataset = "datasus_sim_do", states = "RJ", time_period = 1990)
The following error message was returned:
Error in `dplyr::mutate()`:
! Problem while computing `idade_anos = dplyr::case_when(...)`.
Caused by error in `substr()`:
! object 'idade' not found
Run `rlang::last_error()` to see where the error occurred.
I was expecting to download DATASUS mortality data from 1990.
If the problem is that there is no data available for years before 1996, I think an error message such as "Year not available" should be returned.
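A sketch of the suggested guard, assuming mortality data starts in 1996 (the first year observed to load); the function name is hypothetical.

```r
# Reject requested years before the first available one, with a clear
# "Year not available" message instead of an internal mutate() error.
validate_years <- function(time_period, first_year = 1996) {
  bad <- time_period[time_period < first_year]
  if (length(bad) > 0) {
    stop(
      "Year not available: ", paste(bad, collapse = ", "),
      ". Data is only available from ", first_year, " onwards."
    )
  }
  invisible(time_period)
}

validate_years(2000:2005)   # passes silently
try(validate_years(1990))   # errors with "Year not available: 1990. ..."
```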
The site for the original source was discontinued
data <- load_prodes("prodes", raw_data = TRUE, time_period = 2020)
The IPS amazônia website is currently offline, thus the IPS function (datazoom.amazonia::load_ips) is not working right now.
Also, because of this, the 2021 IPS data could not be added to the package.
IPS has changed the xlsx table containing the research data (the 2014, 2018 and 2021 IPS are now in the same table), so the function is not currently working. The path used for the download must be changed.
When using the example
data <- load_prodes(dataset = "prodes",
raw_data = TRUE,
time_period = 2008:2010,
language = 'en')
The variables' names seem to be in Portuguese. The same happens when the language parameter is not used.
When running the code:
load_mapbiomas(dataset = "mapbiomas_mining",
raw_data = FALSE,
geo_level = "indigenous_land",
language = "eng")
The following error message was returned:
Error in stringr::str_sub(path, -4) : object 'path' not found
The download can take some time (~10-30 min):
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
52 2159M 52 1126M 0 0 772k 0 0:47:43 0:24:52 0:22:51 619k
A 2159 MB download is too much for a regular computer to handle.
The size is way too big to work as a good example for people using the package.
I noticed that some functions in the package, such as load_iema and load_baci, have "pt" as the default for the language argument, while the vast majority default to "eng". It would be good to standardize this across all functions.
Also check that the two options are always "pt" and "eng" (and not "en", for example).
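One way to standardize this, sketched with a toy function: validate `language` with base R's match.arg(), so every function defaults to "eng" and only accepts the agreed options.

```r
# The first element of the choices vector is the default; match.arg()
# validates the supplied value against the choices.
load_example <- function(language = c("eng", "pt")) {
  language <- match.arg(language)
  language
}

stopifnot(load_example() == "eng")      # default is "eng"
stopifnot(load_example("pt") == "pt")
stopifnot(load_example("en") == "eng")  # partial matches resolve to "eng"
try(load_example("english"))            # anything else errors
```

Note that match.arg() does partial matching, so "en" silently resolves to "eng"; if that is undesirable, an exact check against c("eng", "pt") would be needed instead.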
Examples such as
load_br_trade(dataset = "comex_import_prod",
raw_data = FALSE,
time_period = 1997:2021)
are too big to run because of the aggregate information contained in the year span.
I tried to download Ibama's embargo data and got this error:
ibama <- load_ibama(dataset = "areas_embargadas", raw_data = TRUE,
legal_amazon_only = TRUE)
Error:
! Problem with `filter()` input `..1`.
i Input `..1` is `codigo_ibge_municipio_embargo %in% ...`.
x Input `..1` must be of size 77006 or 1, not size 0.
Run `rlang::last_error()` to see where the error occurred.
Also, I'd like to interview someone from the project. I sent an email to [email protected] but haven't received an answer.
Good afternoon.
While testing the examples shown in the README of the TerraClimate part of the package, in the second example, stated below:
amz_precipitation <- datazoom.amazonia::load_climate(dataset = "precipitation",
time_period = 2010,
legal_amazon_only = TRUE)
I received the following error message, which prevented the creation of the object "amz_precipitation":
Error in dplyr::filter(., AMZ_LEGAL == 1) :
object 'legal_amazon' not found
geo_level = "municipality" makes database too big to run. I found that setting geo_level to "region" made a good example.
When trying to use load_mapbiomas' dataset mapbiomas_transition, the data was not downloaded and, instead, it made my computer slow for about a minute and, after that, it returned an error message.
dat <- load_mapbiomas(dataset = "mapbiomas_transition", geo_level = "municipality")
Please follow the steps from googledrive
package to download the data. This may take a while.
In case of authentication errors, run vignette("GOOGLEDRIVE").
The googledrive package is requesting access to your Google account.
Select a pre-authorised account or enter '0' to obtain a new token.
Press Esc/Ctrl + C to cancel.
Selection: 1
Auto-refreshing stale OAuth token.
File downloaded:
• 1-ESTATISTICAS_MapBiomas_COL6.0_UF-MUNICIPIOS_v12_SITE.xlsx.zip <id: 1RT7J2jS6LKyISM49ctfRO31ynJZXX_TY>
Saved locally as:
• C:\Users\vhste\AppData\Local\Temp\RtmpmQ2G3g\file1fc045f95746.zip
Error: std::bad_alloc
I was expecting to get what was once obtained by the following line of code, before the load_mapbiomas function was updated.
dat <- load_mapbiomas_transition(space_aggregation = "municipality", transition_interval = 5)
When running this line of code:
load_mapbiomas(dataset = "mapbiomas_transition",
raw_data = FALSE,
language = "pt")
the following error message was returned
trying URL 'https://storage.googleapis.com/mapbiomas-public/COLECAO/5/DOWNLOADS/ESTATISTICAS/Dados_Transicao_MapBiomas_5.0_UF-MUN_SITE_v2.xlsx'
Content type 'application/octet-stream' length 353792661 bytes (337.4 MB)
downloaded 246.1 MB
Error in utils::download.file(url = path, destfile = temp, mode = "wb") :
download from 'https://storage.googleapis.com/mapbiomas-public/COLECAO/5/DOWNLOADS/ESTATISTICAS/Dados_Transicao_MapBiomas_5.0_UF-MUN_SITE_v2.xlsx' failed
In addition: Warning messages:
1: In utils::download.file(url = path, destfile = temp, mode = "wb") :
downloaded length 258079567 != reported length 353792661
2: In utils::download.file(url = path, destfile = temp, mode = "wb") :
URL 'https://storage.googleapis.com/mapbiomas-public/COLECAO/5/DOWNLOADS/ESTATISTICAS/Dados_Transicao_MapBiomas_5.0_UF-MUN_SITE_v2.xlsx': Timeout of 60 seconds was reached
Please help: I have R version 4.1.2 (2021-11-01).
I tried installing datazoom.amazonia:
install.packages("datazoom.amazonia")
Warning in install.packages :
package ‘datazoom.amazonia’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Do I need to update my R?
The dataset was pibmunic and the package had recently been downloaded from GitHub.
data <- load_pibmunic(raw_data = FALSE,
geo_level = "state",
time_period = 2019)
The README file on GitHub states that data is only available until 2018. However, it would be useful to have more recent data.