Comments (4)
Hello, I hope it is OK for another user to reply instead of the package creator?
In the vignette there is an example which uses has_substring() to filter. I also think it is better to filter on a dimension column and not a measure column. I think this works:
cbs_get_data(id = "84583NED", WijkenEnBuurten = has_substring("GM"))
from cbsodatar.
Code in Python:
import pandas
import cbsodata
# Note: data.to_excel() in the usage below needs the openpyxl package installed, but no import is required here.

# Filter using the cbsodata API
def cbs_filter(ned, soort_regio=None, gemeente=None):
    # ned is the table id, for example 84583NED; the table is downloaded and converted to a pandas DataFrame (https://pandas.pydata.org/docs/reference/frame.html)
    try:
        data = pandas.DataFrame(cbsodata.get_data(ned))
    except Exception as e:
        print(e)
        return 'Hmm, something went wrong; check your logs'
    # The filtering itself is shared with cbs_filter_dt, defined below
    return cbs_filter_dt(data, soort_regio, gemeente)

# Filter without the cbsodata API (on a DataFrame that was already downloaded)
def cbs_filter_dt(data, soort_regio=None, gemeente=None):
    # If soort_regio is given, filter on the region type (first letter upper case, rest lower case)
    if soort_regio:
        data = data[data.SoortRegio_2.str.contains(soort_regio.capitalize(), na=False)]
    # If gemeente is given, filter on the municipality name
    if gemeente:
        data = data[data.Gemeentenaam_1.str.contains(gemeente.capitalize(), na=False)]
    # More filters can be added in the same way, based on the available columns
    return data
Usage:
>>> from cbs_data import *
>>> data = cbs_filter('84583NED', 'wijk', 'amsterdam')
>>> data
        ID          WijkenEnBuurten  ...  MateVanStedelijkheid_115  Omgevingsadressendichtheid_116
921    921    Burgwallen-Oude Zijde  ...                       1.0                          7622.0
927    927  Burgwallen-Nieuwe Zijde  ...                       1.0                          9222.0
936    936      Grachtengordel-West  ...                       1.0                         10638.0
941    941      Grachtengordel-Zuid  ...                       1.0                          9212.0
949    949       Nieuwmarkt/Lastage  ...                       1.0                          6995.0
...    ...                      ...  ...                       ...                             ...
1464  1464     Bijlmer Oost (E,G,K)  ...                       1.0                          3329.0
1479  1479               Nellestein  ...                       2.0                          2021.0
1483  1483   Holendrecht/Reigersbos  ...                       1.0                          2539.0
1491  1491                     Gein  ...                       2.0                          2081.0
1496  1496                 Driemond  ...                       5.0                           453.0

[99 rows x 118 columns]
>>> data.to_excel("amsterdam_wijk.xlsx")
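The two functions above download the whole table before filtering. For Python users, a minimal sketch of a server-side filter, similar in spirit to `has_substring()` in R, could look like the following. It assumes that the `filters` argument of `cbsodata.get_data` accepts a raw OData v3 filter string and that the CBS service understands `substringof(...)`; the helper names `substring_filter` and `get_filtered` are made up for this illustration, and the download call has not been verified against the live API.

```python
def substring_filter(column, substring):
    """Build an OData v3 'substringof' filter expression (assumed syntax)."""
    # OData escapes a single quote inside a string literal by doubling it.
    escaped = substring.replace("'", "''")
    return f"substringof('{escaped}', {column})"


def get_filtered(table_id, column, substring):
    """Download only the rows whose `column` contains `substring` (hypothetical helper)."""
    # Imported here so that substring_filter can be used without the package installed.
    import cbsodata
    return cbsodata.get_data(table_id, filters=substring_filter(column, substring))


if __name__ == "__main__":
    # Only builds and prints the filter string; no network access is needed.
    print(substring_filter("WijkenEnBuurten", "GM"))
    # rows = get_filtered("84583NED", "WijkenEnBuurten", "GM")  # needs network access
```

If the assumption holds, only the matching rows are transferred, which avoids holding the full table in memory.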
@datadwerg thanks for answering the question; that does indeed seem to be what was being asked.
Another option, of course, is post-filtering: download the full table and then filter on this column (but that will take a lot more memory):
data_all <- cbs_get_data(id = "84583NED")
data_gm <- subset(data_all, grepl("GM", WijkenEnBuurten))
# or using tidyverse
library(tidyverse)
data_gm <- data_all %>% filter(str_detect(WijkenEnBuurten, "GM"))
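The same post-filtering idea can be sketched in Python without pandas, since `cbsodata.get_data` returns a list of dicts (one per row). The sample rows below are made up for illustration and assume the `WijkenEnBuurten` column carries region codes, as in the R example; `filter_rows` is a hypothetical helper, not part of any package.

```python
def filter_rows(rows, column, substring):
    """Keep the rows whose value in `column` contains `substring`."""
    # `or ""` turns a missing or None value into an empty string, so it never matches.
    return [row for row in rows if substring in str(row.get(column) or "")]


# Made-up sample rows standing in for a downloaded table.
sample_rows = [
    {"WijkenEnBuurten": "NL00"},
    {"WijkenEnBuurten": "GM0363"},
    {"WijkenEnBuurten": "WK036300"},
    {"WijkenEnBuurten": "BU03630000"},
    {"WijkenEnBuurten": None},
]

gemeenten = filter_rows(sample_rows, "WijkenEnBuurten", "GM")
print(len(gemeenten))
```

As with the R version, every row is downloaded and held in memory before the filter runs.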
Thanks for the responses. The solution by @datadwerg answers my question and works.
Kerncijfers wijken en buurten 2020 contains ~17,000 rows, and the municipalities (gemeenten) account for only 355 of those. So although post-filtering works, it is much slower and requires more resources.