I wrote two higher-level functions that could be useful to others if included in your

enhancement: workbook_iterate and workbook_flatten about tableau-scraping (open)

bertrandmartel commented on May 27, 2024

from tableau-scraping.

Comments (7)

djay commented on May 27, 2024

I've only really used it on my one use case so far.

For that case, iterate and flatten work together to reduce the code to the following:

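    import pandas as pd
    from dateutil.relativedelta import relativedelta

    # workbook_iterate / workbook_flatten are the two proposed helpers (not yet in
    # tableau-scraping). url, df, allow_na, today(), get_province() and skip_valid()
    # come from the surrounding project code.

    # Dates to scrape, newest first, from 2021-02-01 up to 7 hours before now
    dates = reversed(pd.date_range("2021-02-01", today() - relativedelta(hours=7)).to_pydatetime())
    # One (workbook getter, (date, province)) pair per parameter/filter combination
    for get_wb, idx_value in workbook_iterate(url, param_date=dates, D2_Province="province"):
        date, province = idx_value
        if province is None:
            continue
        province = get_province(province)
        if skip_valid(df, (date, province), allow_na):
            continue  # presumably: df already holds valid data for this row
        # get_wb is a callable, so the request can wait until the checks above pass
        if (wb := get_wb()) is None:
            continue
        # Map each worksheet (and its columns) to output column names
        row = workbook_flatten(
            wb,
            date,
            D2_Vac_Stack={
                "DAY(txn_date)-value": "Date",
                # Pivot on dose group: gives "Vac Given 1 Cum", "Vac Given 2 Cum", ...
                "vaccine_plan_group-alias": {
                    "1": "1 Cum",
                    "2": "2 Cum",
                    "3": "3 Cum",
                },
                "SUM(vaccine_total_acm)-value": "Vac Given",
            },
            # Single-value worksheets map straight to one column
            D2_Walkin="Cases Walkin",
            D2_Proact="Cases Proactive",
            D2_Prison="Cases Area Prison",
            D2_NonThai="Cases Imported",
            D2_New="Cases",
            # Timeline worksheets: rename their columns and match on Date
            D2_NewTL={
                "AGG(stat_count)-alias": "Cases",
                "DAY(txn_date)-value": "Date",
            },
            # "% ติดเฉลี่ย" is a Thai column name (roughly "average % infected")
            D2_Lab2={
                "AGG(% ติดเฉลี่ย)-value": "Positive Rate Dash",
                "DAY(txn_date)-value": "Date",
            },
            D2_Lab={
                "AGG(% ติดเฉลี่ย)-alias": "Positive Rate Dash",
                "ATTR(txn_date)-alias": "Date",
            },
            D2_Death="Deaths",
            D2_DeathTL={
                "AGG(num_death)-value": "Deaths",
                "DAY(txn_date)-value": "Date",
            },
        )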

This produces:

Date,Province,Cases,Cases Area Prison,Cases Imported,Cases Proactive,Cases Walkin,Deaths,Hospitalized Severe,Positive Rate Dash,Tests,Vac Given 1 Cum,Vac Given 2 Cum,Vac Given 3 Cum
...
2021-09-30,Trang,85.0,0.0,0.0,0.0,85.0,0.0,0.0,,,674308.0,433842.0,26718.0
2021-09-30,Trat,76.0,2.0,0.0,0.0,74.0,0.0,91.0,,,268796.0,193600.0,8403.0
2021-09-30,Ubon Ratchathani,178.0,0.0,0.0,0.0,178.0,1.0,0.0,,,512693.0,332288.0,24226.0
2021-09-30,Udon Thani,123.0,0.0,0.0,0.0,123.0,2.0,0.0,,,496948.0,286579.0,16244.0
2021-09-30,Uthai Thani,25.0,0.0,0.0,1.0,24.0,0.0,1.0,,,124028.0,74468.0,5276.0
2021-09-30,Uttaradit,10.0,0.0,0.0,0.0,10.0,0.0,0.0,,,178581.0,111562.0,5184.0
2021-09-30,Yala,30.0,0.0,0.0,2.0,28.0,2.0,0.0,,,374607.0,227236.0,9466.0
2021-09-30,Yasothon,563.0,2.0,0.0,0.0,561.0,1.0,0.0,,,254452.0,150356.0,5982.0

Could you point me to a simpler, more permanent workbook I could scrape? I could use that as the example instead.
One thing I might need to do is generalise flatten to work with non-time-series data. It currently assumes a time series, which makes it less useful. Do you know of a good example that is indexed by something other than date?
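For anyone wondering what the mapping arguments in the call above boil down to, here is a minimal sketch of the flattening idea over plain pandas DataFrames instead of a live workbook. `flatten_sheets` and the `sheets` dict are hypothetical, not part of tableau-scraping or of the proposed code, and it assumes each worksheet's data has already been pulled into a DataFrame. A string maps a single-value worksheet to one column, a dict renames columns and keeps the row whose index column equals `key`, and `index_col` can name something other than "Date" (a player, say), which is one possible way to handle the non-time-series case. The nested pivot mapping used for `D2_Vac_Stack` is deliberately left out.

```python
from typing import Any, Dict, Mapping, Union

import pandas as pd

# Hypothetical spec format, modelled on the workbook_flatten call above:
#   Sheet="Column"                -> single-value sheet, stored under "Column"
#   Sheet={"src col": "out col"}  -> multi-row sheet: rename columns, keep the row for `key`
Spec = Union[str, Mapping[str, str]]


def flatten_sheets(
    sheets: Mapping[str, pd.DataFrame],
    key: Any,
    index_col: str = "Date",
    **spec: Spec,
) -> Dict[str, Any]:
    """Collapse several worksheet tables into one flat row for `key`.

    When two sheets map to the same output column, the first one listed wins.
    """
    row: Dict[str, Any] = {}
    for name, mapping in spec.items():
        df = sheets.get(name)
        if df is None or df.empty:
            continue  # worksheet absent or empty for this combination
        if isinstance(mapping, str):
            # Single-value worksheet: its first cell is the value
            row.setdefault(mapping, df.iloc[0, 0])
            continue
        renamed = df.rename(columns=dict(mapping))
        if index_col not in renamed.columns:
            continue  # no index column mapped, so no way to pick a row
        matches = renamed[renamed[index_col] == key]
        if matches.empty:
            continue  # this sheet has nothing for `key`
        for col in mapping.values():
            if col != index_col and col in matches.columns:
                row.setdefault(col, matches[col].iloc[-1])
    return row


# Usage on toy data indexed by player rather than date (values are made up)
stats = {"Players": pd.DataFrame({"Player": ["A", "B"], "Goals": [3, 5]})}
player_row = flatten_sheets(stats, "B", index_col="Player", Players={"Player": "Player", "Goals": "Goals"})
```

Using `setdefault` means the first worksheet listed wins when two map to the same column; whether that matches the original precedence is an assumption.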

bertrandmartel commented on May 27, 2024

@djay Thank you, that's great work!

I'm very interested in the workbook_iterate function, since there are many use cases where we need to iterate over parameters/filters (server-side rendering, getting all region/county/province data, etc.). This addition would greatly reduce boilerplate.
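Here is a minimal sketch of the iteration idea, written independently of tableau-scraping so it runs without a network. `iterate_combinations` and the `fetch` callback are hypothetical names; in a real version `fetch` would presumably apply each value with the library's `setParameter`/`setFilter` calls and return the resulting workbook. Unlike the call in djay's example, which appears to discover the province values from the `D2_Province` worksheet itself, this sketch assumes every value list is supplied up front. It yields a getter instead of a workbook so the caller can skip a combination before any request is made.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def iterate_combinations(
    fetch: Callable[[Dict[str, Any]], Any],
    **choices: Iterable[Any],
) -> Iterator[Tuple[Callable[[], Any], Tuple[Any, ...]]]:
    """Yield (lazy getter, combination) for every combination of parameter/filter values.

    `fetch` receives a {name: value} dict and returns whatever the caller needs
    (e.g. a workbook with those parameters/filters applied). Nothing is fetched
    until the caller invokes the getter.
    """
    names = list(choices)
    # Materialise each iterable once so one-shot iterators (e.g. reversed(...)) can be reused
    pools = [list(values) for values in choices.values()]
    for combo in itertools.product(*pools):
        selection = dict(zip(names, combo))
        # The default argument binds this combination now, not the loop's last value
        yield (lambda sel=selection: fetch(sel)), combo
```

A loop such as `for get_wb, (date, province) in iterate_combinations(fetch, date=dates, province=provinces):` then has the same shape as the one in djay's example.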

workbook_flatten seems quite advanced, perhaps too advanced for most people who will use this library, but I may be wrong.

Could you provide a PR with sample usage for one or both of these features?

bertrandmartel commented on May 27, 2024

@djay For something not indexed by date, maybe:

djay commented on May 27, 2024

Yeah, maybe the top5leagues one, with a row per player.

It doesn't show off merging an embedded graph inside the workbook, but chances are the use case for that is only ever going to be time series.

djay commented on May 27, 2024

@bertrandmartel maybe workbook_flatten would be more useful if it worked more automatically, returning a single dataframe from one workbook. You could then rename the columns yourself afterwards to clean it up.
So the example above would become:

df = wb.flatten(datetime.now())
df = df.rename(columns={"D2_Vac_Stack: vaccine_plan_group-alias: 1": "Vac Given 1",...

However, for that to work it needs to know the index of every table inside a workbook and assume they all share the same index. It also needs single-value tables to have the index value passed in. I'm not yet sure where the information about an internal plot's index lives.
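A minimal sketch of that automatic variant, again over plain DataFrames. `auto_flatten` is a hypothetical name, and it sidesteps the open question of where a plot's index lives by simply assuming the index column has the same name (`index_col`) in every table that has one, and that each index value appears at most once per table. Single-value tables get the `index_value` passed in, any other table without the index column is skipped, and columns are prefixed with the worksheet name so they can be renamed afterwards.

```python
from typing import Any, Mapping

import pandas as pd


def auto_flatten(
    sheets: Mapping[str, pd.DataFrame],
    index_value: Any,
    index_col: str = "Date",
) -> pd.DataFrame:
    """Combine every worksheet table into one DataFrame indexed by `index_col`.

    Columns are prefixed with "<sheet>: " so nothing collides; rename them afterwards.
    Assumes each index value appears at most once per table.
    """
    frames = []
    for name, df in sheets.items():
        if df.empty:
            continue
        if index_col in df.columns:
            # Table that carries its own index column (e.g. a timeline plot)
            part = df.set_index(index_col)
        elif len(df) == 1:
            # Single-value table: attach the index value passed in by the caller
            part = df.copy()
            part.index = pd.Index([index_value], name=index_col)
        else:
            # Multi-row table with no recognisable index: cannot be aligned, so skip it
            continue
        frames.append(part.add_prefix(f"{name}: "))
    if not frames:
        return pd.DataFrame()
    # Outer join on the shared index; cells a table does not cover become NaN
    return pd.concat(frames, axis=1)
```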

djay commented on May 27, 2024

@bertrandmartel actually, in that example it would never work for pivoting an internal table/plot, because it wouldn't know which column to pivot on. A simpler approach might be an exclude param on flatten, with users doing that pivot manually and combining the results themselves.

In addition, my code handled combining internal plots and single values that represent the same data (but potentially for different dates), so that would also have to be done manually.

So the example would be:

df = wb.flatten(datetime.now(), exclude=["D2_Vac_Stack"])
df = df.rename(columns=dict(D2_New="Cases", D2_NewTL="Cases2", D2_Death="Deaths",...))
# Fill gaps in Cases from the timeline values, then drop the helper column
df = df.combine_first(df["Cases2"].to_frame("Cases")).drop(columns="Cases2")
...
vac = wb.getWorksheet("D2_Vac_Stack").data.pivot_table(....
df = df.combine_first(vac)

I'm not sure if the end result saves more work or not...
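The combine and pivot steps are plain pandas, so they can be tried on toy data. Below, `df` stands in for the output of the hypothetical `wb.flatten` after renaming, and `vac_long` stands in for the `D2_Vac_Stack` worksheet data with its columns assumed to be renamed to `Date`, `Dose` and `Vac Given`; all values are made up.

```python
import pandas as pd

# Toy stand-in for the flattened workbook: "Cases" came from a single-value sheet
# (today only), "Cases2" from the timeline sheet (several dates)
df = pd.DataFrame(
    {"Cases": [None, 85.0], "Cases2": [70.0, 84.0]},
    index=pd.Index(["2021-09-29", "2021-09-30"], name="Date"),
)
# Fill gaps in Cases from Cases2 (existing Cases values are kept), then drop the helper
df = df.combine_first(df["Cases2"].to_frame("Cases")).drop(columns="Cases2")

# Toy stand-in for the D2_Vac_Stack data in long format: one row per date and dose group
vac_long = pd.DataFrame({
    "Date": ["2021-09-29"] * 3 + ["2021-09-30"] * 3,
    "Dose": ["1", "2", "3"] * 2,
    "Vac Given": [10, 5, 1, 12, 6, 2],
})
# Pivot to one column per dose group, named like the output columns above
vac = vac_long.pivot_table(index="Date", columns="Dose", values="Vac Given", aggfunc="sum")
vac.columns = [f"Vac Given {dose} Cum" for dose in vac.columns]
df = df.combine_first(vac)
```

`combine_first` keeps existing values and only fills gaps, so a value from the single-value sheet takes precedence over the timeline value for the same date.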

commented on May 27, 2024

@bertrandmartel including workbook_iterate would be very helpful, especially if I understand correctly that it would accept a parameter column and then iterate over its values. Though I suppose doing this by hand is fairly easy, it would save a lot of time.

I am presently unable to use Python (due to system restrictions), but your library does seem like it would be very helpful for getting scraped data into Microsoft Power BI, since Power BI can run Python scripts that produce a dataframe. For the time being, many of your Stack Overflow posts have helped me get a solution working in M (Power Query). Though it is much less advanced, it does the job for now.

Thanks for your work.
