Comments (7)
I've only really used it on my one use case for now.
For this, iterate and flatten work together to reduce the code to:
dates = reversed(pd.date_range("2021-02-01", today() - relativedelta(hours=7)).to_pydatetime())
for get_wb, idx_value in workbook_iterate(url, param_date=dates, D2_Province="province"):
    date, province = idx_value
    if province is None:
        continue
    province = get_province(province)
    if skip_valid(df, (date, province), allow_na):
        continue
    if (wb := get_wb()) is None:
        continue
    row = workbook_flatten(
        wb,
        date,
        D2_Vac_Stack={
            "DAY(txn_date)-value": "Date",
            "vaccine_plan_group-alias": {
                "1": "1 Cum",
                "2": "2 Cum",
                "3": "3 Cum",
            },
            "SUM(vaccine_total_acm)-value": "Vac Given",
        },
        D2_Walkin="Cases Walkin",
        D2_Proact="Cases Proactive",
        D2_Prison="Cases Area Prison",
        D2_NonThai="Cases Imported",
        D2_New="Cases",
        D2_NewTL={
            "AGG(stat_count)-alias": "Cases",
            "DAY(txn_date)-value": "Date",
        },
        D2_Lab2={
            "AGG(% ติดเฉลี่ย)-value": "Positive Rate Dash",
            "DAY(txn_date)-value": "Date",
        },
        D2_Lab={
            "AGG(% ติดเฉลี่ย)-alias": "Positive Rate Dash",
            "ATTR(txn_date)-alias": "Date",
        },
        D2_Death="Deaths",
        D2_DeathTL={
            "AGG(num_death)-value": "Deaths",
            "DAY(txn_date)-value": "Date",
        },
    )
This results in:
Date,Province,Cases,Cases Area Prison,Cases Imported,Cases Proactive,Cases Walkin,Deaths,Hospitalized Severe,Positive Rate Dash,Tests,Vac Given 1 Cum,Vac Given 2 Cum,Vac Given 3 Cum
...
2021-09-30,Trang,85.0,0.0,0.0,0.0,85.0,0.0,0.0,,,674308.0,433842.0,26718.0
2021-09-30,Trat,76.0,2.0,0.0,0.0,74.0,0.0,91.0,,,268796.0,193600.0,8403.0
2021-09-30,Ubon Ratchathani,178.0,0.0,0.0,0.0,178.0,1.0,0.0,,,512693.0,332288.0,24226.0
2021-09-30,Udon Thani,123.0,0.0,0.0,0.0,123.0,2.0,0.0,,,496948.0,286579.0,16244.0
2021-09-30,Uthai Thani,25.0,0.0,0.0,1.0,24.0,0.0,1.0,,,124028.0,74468.0,5276.0
2021-09-30,Uttaradit,10.0,0.0,0.0,0.0,10.0,0.0,0.0,,,178581.0,111562.0,5184.0
2021-09-30,Yala,30.0,0.0,0.0,2.0,28.0,2.0,0.0,,,374607.0,227236.0,9466.0
2021-09-30,Yasothon,563.0,2.0,0.0,0.0,561.0,1.0,0.0,,,254452.0,150356.0,5982.0
Maybe you could point me to a more permanent and simpler workbook that I could scrape and use as an example instead?
I think one thing I might have to do is generalise flatten to work for non-time-series data. It currently assumes a time series, which makes it less useful. Do you know of a good example that is indexed by something other than date?
from tableau-scraping.
@djay Thank you, that's great work!
I'm very interested in the workbook_iterate function, since there are many use cases where we need to iterate over parameters/filters (server-side rendering, getting the data for every region/county/province, etc.). This addition would greatly reduce boilerplate.
workbook_flatten seems quite advanced, maybe too advanced for most people who will use this library, but I may be wrong.
Do you think you can provide a PR with a sample usage for one or both of these features?
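To make the iteration idea concrete, here is a minimal sketch of the pattern in plain Python. It is not the actual workbook_iterate implementation: `iterate_combinations` and `fetch` are hypothetical names, the values to iterate are passed in explicitly rather than discovered from the workbook, and `fetch` stands in for whatever code sets the parameters/filters and returns the updated workbook. The point it illustrates is that each combination comes with a lazy getter, so combinations that are already scraped can be skipped without making a request.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, Tuple


def iterate_combinations(
    fetch: Callable[[Dict[str, object]], object],
    **choices: Iterable[object],
) -> Iterator[Tuple[Callable[[], object], Tuple[object, ...]]]:
    """Yield (getter, values) for every combination of the given choices.

    `fetch` is a hypothetical callable that applies one combination of
    parameter/filter values and returns the result. It is only invoked
    when the getter is called, never during iteration itself.
    """
    names = list(choices)
    for values in product(*(list(choices[name]) for name in names)):
        selection = dict(zip(names, values))
        # Bind `selection` as a default argument so each getter keeps its own combination.
        yield (lambda sel=selection: fetch(sel)), values


if __name__ == "__main__":
    def fake_fetch(selection):
        # Stand-in for "set parameters/filters, then return the workbook".
        return f"workbook for {selection}"

    already_scraped = {("2021-09-30", "Trang")}
    for get_wb, idx_value in iterate_combinations(
        fake_fetch,
        date=["2021-09-30", "2021-09-29"],
        province=["Trang", "Trat"],
    ):
        if idx_value in already_scraped:
            continue  # skipped without ever calling fake_fetch
        print(idx_value, "->", get_wb())
```

Returning a getter instead of the workbook itself is what lets the caller decide, per combination, whether the request is worth making.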
from tableau-scraping.
@djay For something not indexed by date, maybe:
- https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary
- https://public.tableau.com/views/NewspapersByCountyCalifornia/Newspaperbycounty
- https://public.tableau.com/views/COVIDVaccineDashboard/RECIPIENTVIEW
from tableau-scraping.
Yeah, maybe the top5leagues one, with a row per player.
It doesn't show off the merging of an embedded graph inside the workbook, but chances are the use case for that is only going to be time series.
from tableau-scraping.
@bertrandmartel maybe workbook_flatten would be more useful if it worked more automatically and tried to return a single dataframe from one workbook. Then you can rename the columns yourself afterwards to clean it up.
So in the example above it would be
df = wb.flatten(datetime.now())
df = df.rename(columns={"D2_Vac_Stack: vaccine_plan_group-alias: 1": "Vac Given 1", ...})
However, for that to work it needs to know what the index is for every table inside a workbook, and to assume they all share the same index. Single-value tables would also need the index value passed in. I'm not yet sure where the information about the index of an internal plot is stored.
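A rough sketch of what such an automatic flatten could look like with pandas, assuming the index column is already known and passed in by the caller (which is exactly the part that is unresolved above). `flatten_tables` is a hypothetical helper, not part of the library, and the numbers are made up for illustration.

```python
import pandas as pd


def flatten_tables(tables: dict, index: str, index_value=None) -> pd.DataFrame:
    """Join several worksheet tables into one frame on a shared index column.

    Tables without the index column are treated as single-value tables and
    are stamped with `index_value`. Every other column is prefixed with the
    worksheet name so that column names cannot collide.
    """
    parts = []
    for name, table in tables.items():
        table = table.copy()
        if index not in table.columns:
            table[index] = index_value
        table = table.set_index(index)
        table.columns = [f"{name}: {col}" for col in table.columns]
        parts.append(table)
    # Outer join on the index, so dates present in only one table are kept.
    return pd.concat(parts, axis=1).sort_index()


# Made-up data: one single-value table and one timeline table.
tables = {
    "D2_New": pd.DataFrame({"value": [85]}),
    "D2_NewTL": pd.DataFrame({"Date": ["2021-09-29", "2021-09-30"], "value": [90, 85]}),
}
df = flatten_tables(tables, index="Date", index_value="2021-09-30")
df = df.rename(columns={"D2_New: value": "Cases", "D2_NewTL: value": "Cases2"})
print(df)
```

This only works when every table has unique values in the index column, so a stacked plot such as D2_Vac_Stack would still need to be pivoted first.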
from tableau-scraping.
@bertrandmartel actually, in that example it would never work for pivoting an internal table/plot, because it wouldn't know which column to pivot on. Maybe the simpler way would be to have an exclude param on flatten, and let the user do that pivot manually and combine the results themselves.
In addition, the code I had dealt with combining internal plots and single values that represented the same data (but potentially for different dates). That would also have to be done manually.
So the example would be
df = wb.flatten(datetime.now(), exclude=["D2_Vac_Stack"])
df = df.rename(columns=dict(D2_New="Cases", D2_NewTL="Cases2", D2_Death="Deaths",...))
df = df.combine_first(df["Cases2"].to_frame("Cases")).drop(columns="Cases2")
...
vac = wb.getWorksheet("D2_Vac_Stack").pivot_table(....
df = df.combine_first(vac)
I'm not sure if the end result saves more work or not...
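For reference, the manual steps can be tried out on their own with plain pandas. This is a sketch on made-up numbers: `df` stands in for what a flatten call might return, and `vac_long` for a long-format table that an internal plot might return (the column names `dose` and `given` are assumptions for illustration).

```python
import pandas as pd

# Made-up numbers, purely for illustration.
df = pd.DataFrame(
    {"Cases": [None, 85.0], "Cases2": [90.0, 80.0]},
    index=pd.Index(["2021-09-29", "2021-09-30"], name="Date"),
)

# Prefer the single-value "Cases"; fall back to the timeline "Cases2" where it is missing.
df = df.combine_first(df["Cases2"].to_frame("Cases")).drop(columns="Cases2")

# Long-format table with one row per (date, dose).
vac_long = pd.DataFrame(
    {
        "Date": ["2021-09-30", "2021-09-30", "2021-09-29"],
        "dose": ["1", "2", "1"],
        "given": [100.0, 60.0, 95.0],
    }
)

# Pivot so that each dose becomes its own column, indexed by date.
vac = vac_long.pivot_table(index="Date", columns="dose", values="given", aggfunc="sum")
vac.columns = [f"Vac Given {dose} Cum" for dose in vac.columns]

# Merge the pivoted table into the main frame on the shared Date index.
combined = df.combine_first(vac)
print(combined)
```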
from tableau-scraping.
@bertrandmartel inclusion of workbook_iterate would be very helpful, especially if my understanding is correct that it would accept a parameter column and then iterate over its values. Though I suppose doing this by hand is fairly easy, it would save a lot of time.
I am presently unable to use Python (due to system restrictions), but it does seem like your library would be very helpful for getting scraped data into Microsoft Power BI, as it supports running Python scripts where a dataframe is the result. For the time being, many of your posts on Stack Overflow have helped me get a solution working in M (Power Query). Though it is much less advanced, it does the job for now.
Thanks for your work.
from tableau-scraping.