Code Monkey home page Code Monkey logo

Comments (4)

datadwerg avatar datadwerg commented on May 19, 2024

Hello, I hope it is OK for another user to reply instead of the package creator?

In the vignette there is an example which uses has_substring() to filter. I also think it is better to filter on a dimension column and not a measure column. I think this works:

cbs_get_data(id = "84583NED", WijkenEnBuurten = has_substring("GM"))

from cbsodatar.

ayoub-abdessadak avatar ayoub-abdessadak commented on May 19, 2024

CODE in python:

from logging import exception
import pandas 
import cbsodata
from openpyxl import Workbook


# Filter met cbsodata api
def cbs_filter(ned, soort_regio=None, gemeente=None):
    
    # NED = bijvoorbeeld 84583NED, table wordt gedownload en omgezet naar pandas tabel (https://pandas.pydata.org/docs/reference/frame.html)
    try:
        data = pandas.DataFrame(cbsodata.get_data(ned))
    except exception as E:
        print(E)
        return 'Hmmm, er is iets fout gegaan bekijk je logs'

    #Indien er een soort_regio is doorgegeven dan wordt er gefilterd op soort_regio
    if soort_regio:
       data = data[data.SoortRegio_2.str.contains(soort_regio[0].upper() + soort_regio[1::].lower())]

    #Indien gemeente is doorgegeven dan wordt er gefilterd op gemeente
    if gemeente:
        data = data[data.Gemeentenaam_1.str.contains(gemeente[0].upper() + gemeente[1::].lower())]
    

    #Verder kunnen er nog meerdere filters worden toegevoegd aan de hand van de beschikbare kolommen

    return data


# Filter zonder cbsodata api
def cbs_filter_dt(data, soort_regio=None, gemeente=None):

    #Indien er een soort_regio is doorgegeven dan wordt er gefilterd op soort_regio
    if soort_regio:
       data = data[data.SoortRegio_2.str.contains(soort_regio[0].upper() + soort_regio[1::].lower())]

    #Indien gemeente is doorgegeven dan wordt er gefilterd op gemeente
    if gemeente:
        data = data[data.Gemeentenaam_1.str.contains(gemeente[0].upper() + gemeente[1::].lower())]

    #Verder kunnen er nog meerdere filters worden toegevoegd aan de hand van de beschikbare kolommen

    return data

Uitvoering:

>>> from cbs_data import *

>>> data = cbs_filter('84583NED', 'wijk', 'amsterdam')

>>> data
        ID          WijkenEnBuurten  ... MateVanStedelijkheid_115 Omgevingsadressendichtheid_116
921    921    Burgwallen-Oude Zijde  ...                      1.0                         7622.0
927    927  Burgwallen-Nieuwe Zijde  ...                      1.0                         9222.0
936    936      Grachtengordel-West  ...                      1.0                        10638.0
941    941      Grachtengordel-Zuid  ...                      1.0                         9212.0
949    949       Nieuwmarkt/Lastage  ...                      1.0                         6995.0
...    ...                      ...  ...                      ...                            ...
1464  1464     Bijlmer Oost (E,G,K)  ...                      1.0                         3329.0
1479  1479               Nellestein  ...                      2.0                         2021.0
1483  1483   Holendrecht/Reigersbos  ...                      1.0                         2539.0
1491  1491                     Gein  ...                      2.0                         2081.0
1496  1496                 Driemond  ...                      5.0                          453.0


[99 rows x 118 columns]

>>> data.to_excel("amsterdam_wijk.xlsx")

amsterdam_wijk.xlsx

from cbsodatar.

edwindj avatar edwindj commented on May 19, 2024

@datadwerg thx for answering the question: seems indeed the intention of the question.
Another option of course is to do post-filtering: download the data and then filter on this column (but that will take a lot more memory)

data_all <- cbs_get_data(id = "84583NED")
data_gm <- subset(data_all, grepl("GM", WijkenEnBuurten))
# or using tidyverse
library(tidyverse)
data_gm <- data_all %>% filter(str_detect(WijkenEnBuurten, "GM"))

from cbsodatar.

victorhartman avatar victorhartman commented on May 19, 2024

Thanks for the responses. The solution by @datadwerg answers my question and works.

Kerncijfers wijken en buurten 2020 contains ~17.000 rows and the gemeentes only take 355 of those. So although post-filtering works, it is much slower/requires more resources.

from cbsodatar.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.