
Comments (7)

djay commented on May 27, 2024

I've only really used it on my one use case so far.

For this, iterate and flatten work together to reduce the code to:

        dates = reversed(pd.date_range("2021-02-01", today() - relativedelta(hours=7)).to_pydatetime())
        for get_wb, idx_value in workbook_iterate(url, param_date=dates, D2_Province="province"):
            date, province = idx_value
            if province is None:
                continue
            province = get_province(province)
            if skip_valid(df, (date, province), allow_na):
                continue
            if (wb := get_wb()) is None:
                continue
            row = workbook_flatten(
                wb,
                date,
                D2_Vac_Stack={
                    "DAY(txn_date)-value": "Date",
                    "vaccine_plan_group-alias": {
                        "1": "1 Cum",
                        "2": "2 Cum",
                        "3": "3 Cum",
                    },
                    "SUM(vaccine_total_acm)-value": "Vac Given",
                },
                D2_Walkin="Cases Walkin",
                D2_Proact="Cases Proactive",
                D2_Prison="Cases Area Prison",
                D2_NonThai="Cases Imported",
                D2_New="Cases",
                D2_NewTL={
                    "AGG(stat_count)-alias": "Cases",
                    "DAY(txn_date)-value": "Date"
                },
                D2_Lab2={
                    "AGG(% ติดเฉลี่ย)-value": "Positive Rate Dash",
                    "DAY(txn_date)-value": "Date"
                },
                D2_Lab={
                    "AGG(% ติดเฉลี่ย)-alias": "Positive Rate Dash",
                    "ATTR(txn_date)-alias": "Date",
                },
                D2_Death="Deaths",
                D2_DeathTL={
                    "AGG(num_death)-value": "Deaths",
                    "DAY(txn_date)-value": "Date"
                },
            )

results in

Date,Province,Cases,Cases Area Prison,Cases Imported,Cases Proactive,Cases Walkin,Deaths,Hospitalized Severe,Positive Rate Dash,Tests,Vac Given 1 Cum,Vac Given 2 Cum,Vac Given 3 Cum
...
2021-09-30,Trang,85.0,0.0,0.0,0.0,85.0,0.0,0.0,,,674308.0,433842.0,26718.0
2021-09-30,Trat,76.0,2.0,0.0,0.0,74.0,0.0,91.0,,,268796.0,193600.0,8403.0
2021-09-30,Ubon Ratchathani,178.0,0.0,0.0,0.0,178.0,1.0,0.0,,,512693.0,332288.0,24226.0
2021-09-30,Udon Thani,123.0,0.0,0.0,0.0,123.0,2.0,0.0,,,496948.0,286579.0,16244.0
2021-09-30,Uthai Thani,25.0,0.0,0.0,1.0,24.0,0.0,1.0,,,124028.0,74468.0,5276.0
2021-09-30,Uttaradit,10.0,0.0,0.0,0.0,10.0,0.0,0.0,,,178581.0,111562.0,5184.0
2021-09-30,Yala,30.0,0.0,0.0,2.0,28.0,2.0,0.0,,,374607.0,227236.0,9466.0
2021-09-30,Yasothon,563.0,2.0,0.0,0.0,561.0,1.0,0.0,,,254452.0,150356.0,5982.0

Maybe you could point me to a more permanent and simpler workbook I could scrape and I can use that as an example instead?
I think one thing I might have to do is generalise flatten to work for non-timeseries data. It currently assumes a time series, which makes it less useful. Do you know of a good example that is indexed by something other than date?
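
For anyone reading along, the loop at the top of this comment depends on workbook_iterate handing back a lazy get_wb, so that combinations skipped by skip_valid never trigger a request. Below is a minimal sketch of that pattern only, not the actual implementation. The names iterate_lazily and fake_fetch are hypothetical, and the sketch takes the province values as given, whereas the real function looks them up from the named filter.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def iterate_lazily(
    fetch: Callable[[Dict[str, Any]], Any],
    **choices: Iterable[Any],
) -> Iterator[Tuple[Callable[[], Any], Tuple[Any, ...]]]:
    """Yield (getter, values) for every combination of the given choices.

    `fetch` is only called when the caller invokes the getter, so a
    combination that the caller skips costs no request.
    """
    names = list(choices)
    for values in product(*(choices[name] for name in names)):
        # Bind `values` as a default argument so each getter keeps its own copy.
        def getter(values: Tuple[Any, ...] = values) -> Any:
            return fetch(dict(zip(names, values)))

        yield getter, values


# Hypothetical stand-in for a real Tableau request; it just records its calls.
calls = []


def fake_fetch(selection: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(selection)
    return {"Cases": 1, **selection}


already_scraped = {("2021-09-30", "Trang")}
rows = []
for get_wb, idx_value in iterate_lazily(
    fake_fetch, date=["2021-09-30", "2021-09-29"], province=["Trang", "Trat"]
):
    if idx_value in already_scraped:
        continue  # skipped before get_wb() is called, so nothing is fetched
    rows.append(get_wb())
```

Four combinations are generated but fake_fetch only runs for the three that are not skipped, which is where the saving comes from when most dates are already scraped.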

from tableau-scraping.

bertrandmartel commented on May 27, 2024

@djay Thank you, that's great work!

I'm very interested in the workbook_iterate function, since there are many use cases where we need to iterate over parameters/filters (server-side rendering, getting all region/county/province data, etc.). This addition would greatly reduce boilerplate.

workbook_flatten seems quite advanced, maybe too advanced for most people that will use this library but I may be wrong.

Do you think you could provide a PR with a sample usage for one or both of these features?


bertrandmartel commented on May 27, 2024

@djay For something not indexed by date, maybe:


djay commented on May 27, 2024

yeah maybe the top5leagues one with a row per player.

It doesn't show off the merging of an embedded graph inside the workbook, but chances are the use case for that is only going to be time series.


djay commented on May 27, 2024

@bertrandmartel maybe workbook_flatten would be more useful if it worked more automatically to try to return a single dataframe from one workbook. Then you can rename columns yourself after to clean it up.
So in the example above it would be

df = wb.flatten(datetime.now())
df = df.rename({"D2_Vac_Stack: vaccine_plan_group-alias: 1": "Vac Given 1",...

However, for that to work it needs to know what the index is for every table inside a workbook, and to assume they all share the same index. It also needs the index value to be passed in for single-value tables. I'm not yet sure where the information about the index of an internal plot is stored.


djay commented on May 27, 2024

@bertrandmartel actually, in that example it would never work for pivoting an internal table/plot, because it wouldn't know which column to pivot on. Maybe the simpler way would be to have an exclude param on flatten and let the user do that pivot manually and combine the results themselves.

In addition, the code I had dealt with combining internal plots and single values that represented the same data (but potentially for different dates). That would also have to be done manually.

So the example would be

df = wb.flatten(datetime.now(), exclude=["D2_Vac_Stack"])
df = df.rename(columns=dict(D2_New="Cases", D2_NewTL="Cases2", D2_Death="Deaths",...))
df = df.combine_first(df["Cases2"].to_frame("Cases")).drop(columns="Cases2")
...
vac = wb.getWorksheet("D2_Vac_Stack").pivot_table(....
df = df.combine_first(vac)

I'm not sure if the end result saves more work or not...
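
To make the manual route concrete, here is a rough, self-contained sketch of the combine step on toy data, using only pandas and no Tableau calls. The frame and column names (single, timeline, vac_long, dose, given) are made up for illustration, and the 2021-09-29 and timeline values are invented. The 2021-09-30 case count and vaccine figures are borrowed from the Trang row above.

```python
import pandas as pd

day = pd.Timestamp("2021-09-30")

# Single-value tile: one number for the requested date (like D2_New).
single = pd.DataFrame({"Cases": [85.0]}, index=pd.DatetimeIndex([day], name="Date"))

# Embedded timeline plot covering the same measure (like D2_NewTL).
# The values here are invented for the example.
timeline = pd.DataFrame(
    {"Cases": [70.0, 80.0]},
    index=pd.DatetimeIndex([day - pd.Timedelta(days=1), day], name="Date"),
)

# combine_first keeps values from `single` and fills gaps from `timeline`.
combined = single.combine_first(timeline)

# Long-format table that has to be pivoted by hand (like D2_Vac_Stack).
vac_long = pd.DataFrame(
    {
        "Date": [day, day, day],
        "dose": ["1", "2", "3"],
        "given": [674308.0, 433842.0, 26718.0],
    }
)
vac = vac_long.pivot_table(index="Date", columns="dose", values="given", aggfunc="sum")
vac.columns = [f"Vac Given {dose} Cum" for dose in vac.columns]

# Merge the pivoted vaccine columns into the same date-indexed frame.
combined = combined.combine_first(vac)
print(combined)
```

Where the single value and the timeline disagree for the same date, the frame on the left of combine_first wins, so the order of the calls decides which source is trusted.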


 avatar commented on May 27, 2024

@bertrandmartel including workbook_iterate would be very helpful, especially if my understanding is correct that it would accept a parameter column and then iterate over its values. Though I suppose doing this by hand is fairly easy, it would save a lot of time.

I am presently unable to use Python (due to system restrictions), but it does seem like your library would be very helpful for getting scraped data into Microsoft Power BI, as it supports running Python scripts where a dataframe is the result. For the time being, many of your posts on Stack Overflow have helped me get a solution working in M (Power Query). Though it is much less advanced, it does the job for now.

Thanks for your work.

