Comments (11)
Thank you for your swift reply!
That has actually worked, and it seems I have managed to partially resolve the issue. At this stage, it boils down to specific filter values. I am running Python 3.9.7 on Win 10.
from tableauscraper import TableauScraper as TS
import tableauscraper
url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet("Prosjektoversikt")
filters = ws.getFilters()
print(filters)
filters = [20200525, 20210210]
wb = ws.setFilter('MDY(Søknadsfrist dato)', 20200525)
And I am getting an error: 2021-10-03 13:44:03,321 - tableauScraper - WARNING - no data dictionary present in response
With the second filter value, 20210210, I am not getting any data either, but strangely enough, I am not getting the same error message. This behaviour is the same no matter whether I pass the filter in int form or string form using the force_setFilter function.
Do you have any idea why these two particular filter values won't work?
Thank you!
from tableau-scraping.
@janPesl I fail to understand how the tooltip values are retrieved.
It doesn't seem to be an image but actual data, since I can copy/paste the data. In the tooltip above, 24114 is present somewhere in the API result for 25 May. It may be a lead. The table tooltips have similar data.
My solution is not elegant, but it gets the job done without having to scrape the tooltip.
There were two tables that were rendered on the server side: the applications from 25 May 2020 and 10 Feb 2021. This solution has worked for both.
from tableauscraper import TableauScraper as TS
import pandas as pd
import tableauscraper
import json
import time
import requests
# filter function from issue 26
def force_setFilter(wb, ws_name, columnName, values):
    """setFilter but ignore the listed filter options; also gets around a wrong ordinal value which makes the index value incorrect"""
    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    ws = next(ws for ws in wb.worksheets if ws.name == ws_name)
    filter = next(
        {
            "globalFieldName": t["globalFieldName"],
        }
        for t in ws.getFilters()
        if t["column"] == columnName
    )
    payload = (
        ("dashboard", scraper.dashboard),
        ("globalFieldName", (None, filter["globalFieldName"])),
        ("qualifiedFieldCaption", (None, columnName)),
        ("membershipTarget", (None, "filter")),
        ("exclude", (None, "false")),
        ("filterValues", (None, json.dumps(values))),
        ("filterUpdateType", (None, "filter-replace")),
    )
    try:
        r = scraper.session.post(
            f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter',
            files=payload,
            verify=scraper.verify
        )
        scraper.lastActionTime = time.time()
        if r.status_code >= 400:
            raise requests.exceptions.RequestException(r.content)
        resp = r.json()
        errors = [
            res['commandReturn']['commandValidationPresModel']['errorMessage']
            for res in resp['vqlCmdResponse']['cmdResultList']
            if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True)
        ]
        if errors:
            wb._scraper.logger.error(str(", ".join(errors)))
            raise tableauscraper.api.APIResponseException(", ".join(errors))
        wb.updateFullData(resp)
        return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
    except ValueError as e:
        scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
    except tableauscraper.api.APIResponseException as e:
        wb._scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
# since most of the data entries are concentrated in a single sector, I start by scraping all of the other sectors
pdList = []
url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet("Prosjektoversikt")
wb = ws.setFilter('Sektor', 'Annet')  # I choose this sector as an init
wb = ws.setFilter('MDY(Søknadsfrist dato)', 20210210)  # then I select the relevant date
# set the filter to all the sectors of interest
filtered = force_setFilter(workbook, "Prosjektoversikt", 'Sektor', ['Annet', 'Helseforetak', 'Instituttsektor', 'Næringsliv', 'Ukjent'])
df = filtered.getWorksheet("Prosjektoversikt").data
pdList.append(df)
# next I focus only on the remaining items in the UoH-sektor
# I create a list of lists with names of applicants; the goal is to keep the chunks under 800 items
filters = [
    ['AHO', 'BI', 'Centre for the Science of Learning and Technology', 'CK', 'DMMH', 'FHS', 'Forskningsavdelingen PHS', 'HIB', 'HIHM', 'HIL', 'HIM', 'HINN', 'HIØ', 'HVL', 'HVO', 'MF', 'NHH', 'NIH'],
    ['Njord - Senter for studier av jordens fysikk', 'NMBU', 'NORD', 'NTNU', 'OSLOMET', 'PHS', 'SAMISKHS', 'Senter for bioinformatikk'],
    ['UIA', 'UIB', 'UIN', 'UIS', 'UIT', 'UNIS', 'USN', 'VID']
]
for filter in filters:
    url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
    print(filter)
    ts = TS()
    ts.loads(url)
    workbook = ts.getWorkbook()
    ws = ts.getWorksheet("Prosjektoversikt")
    wb = ws.setFilter('Prosjektansvarlig', 'UIO')  # I use UIO as an init here
    wb = ws.setFilter('MDY(Søknadsfrist dato)', 20210210)
    filtered = force_setFilter(workbook, "Prosjektoversikt", 'Prosjektansvarlig', filter)
    df = pd.DataFrame(filtered.getWorksheet("Prosjektoversikt").data)
    pdList.append(df)
    time.sleep(5)  # not sure if this is necessary, but I have had more reliable results with the delay
# lastly, even the init value must be included
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet("Prosjektoversikt")
wb = ws.setFilter('Prosjektansvarlig', 'UIO')
wb = ws.setFilter('MDY(Søknadsfrist dato)', 20210210)
df = wb.getWorksheet("Prosjektoversikt").data
pdList.append(df)
result = pd.concat(pdList, ignore_index=True)
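The manual split into sub-800-item groups above can be sketched generically. The helper below is hypothetical (not part of tableauscraper), and the per-value row counts are invented for illustration; in practice they would have to be estimated from the dashboard:

```python
# Hypothetical helper: group (value, count) pairs so that each group's
# combined row count stays under a limit (here, Tableau's ~800-row cap).
def chunk_by_budget(counts, limit=800):
    chunks, current, total = [], [], 0
    for value, count in counts:
        # start a new chunk when adding this value would exceed the limit
        if current and total + count > limit:
            chunks.append(current)
            current, total = [], 0
        current.append(value)
        total += count
    if current:
        chunks.append(current)
    return chunks

# invented counts, for illustration only
groups = chunk_by_budget([("UIO", 700), ("NTNU", 500), ("UIB", 250), ("AHO", 40)])
print(groups)  # [['UIO'], ['NTNU', 'UIB', 'AHO']]
```

Each resulting group could then be passed as the `values` argument of one filter call.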
@janPesl I'm glad you've found out how to get all the data!
If you're using the latest release, there is no need for the force_setFilter function. You just need to get the worksheet from the previous workbook before running setFilter, because the filter list changes when you apply other filters.
Also, you can configure the delay in milliseconds between calls using ts = TS(delayMs=1000) # 1 second delay between calls
from tableauscraper import TableauScraper as TS
import pandas as pd
# since most of the data entries are concentrated in a single sector, I start by scraping all of the other sectors
pdList = []
url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
ts = TS()
ts.loads(url)
rootWs = ts.getWorksheet("Prosjektoversikt")
wb = rootWs.setFilter('Sektor', 'Annet') # I choose this sector as an init
# then I select the relevant date
wb = rootWs.setFilter('MDY(Søknadsfrist dato)', 20210210)
# get the worksheet again since the filter list has changed
ws = wb.getWorksheet("Prosjektoversikt")
print(ws.getFilters())
# set the filter to all the sectors of interest
filtered = ws.setFilter('Sektor', [
'Annet', 'Helseforetak', 'Instituttsektor', 'Næringsliv', 'Ukjent'])
df = filtered.getWorksheet("Prosjektoversikt").data
pdList.append(df)
# next I focus only on the remaining items in the UoH-sektor
# I create a list of lists with names of applicants; the goal is to keep the chunks under 800 items
filters = [
    ['AHO', 'BI', 'Centre for the Science of Learning and Technology', 'CK', 'DMMH', 'FHS', 'Forskningsavdelingen PHS', 'HIB', 'HIHM', 'HIL', 'HIM', 'HINN', 'HIØ', 'HVL', 'HVO', 'MF', 'NHH', 'NIH'],
    ['Njord - Senter for studier av jordens fysikk', 'NMBU', 'NORD', 'NTNU', 'OSLOMET', 'PHS', 'SAMISKHS', 'Senter for bioinformatikk'],
    ['UIA', 'UIB', 'UIN', 'UIS', 'UIT', 'UNIS', 'USN', 'VID']
]
for filter in filters:
    url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
    print(filter)
    ts = TS(delayMs=1000)  # 1 second delay between calls
    ts.loads(url)
    ws = ts.getWorksheet("Prosjektoversikt")
    wb = ws.setFilter('Prosjektansvarlig', 'UIO')  # I use UIO as an init here
    wb = ws.setFilter('MDY(Søknadsfrist dato)', 20210210)
    # get the worksheet again since the filter list has changed
    ws = wb.getWorksheet("Prosjektoversikt")
    filtered = ws.setFilter('Prosjektansvarlig', filter)
    df = pd.DataFrame(filtered.getWorksheet("Prosjektoversikt").data)
    pdList.append(df)
# Lastly, even the init value must be included
ts = TS()
ts.loads(url)
ws = ts.getWorksheet("Prosjektoversikt")
wb = ws.setFilter('Prosjektansvarlig', 'UIO')
wb = ws.setFilter('MDY(Søknadsfrist dato)', 20210210)
df = wb.getWorksheet("Prosjektoversikt").data
pdList.append(df)
result = pd.concat(pdList, ignore_index=True)
print(result)
repl.it: https://replit.com/@bertrandmartel/TableauSoknaderServerSideRendering
@janPesl Hello!
That's interesting. What is your Python version? OS? It seems to fail at getting the index of the integer 20200212 in a list.
Can you try this in a Python interpreter:
>>> test=[20200212, 20200331]
>>> test.index(20200212)
Does it work?
There are cases for filters where the value is passed as JSON in the API, so I must keep the original format (int) if possible. But I could add another property that holds the string conversion and get the index from it.
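The idea of keeping the original int values while falling back to a string comparison could be sketched like this. This is a hypothetical helper, not the library's actual implementation, and `options` simply mimics a filter's "values" list:

```python
# Look up a filter value's index while tolerating an int/str mismatch,
# e.g. the user passing "20200212" when the filter stores 20200212.
def tolerant_index(options, value):
    try:
        return options.index(value)  # exact match first, preserving types
    except ValueError:
        # fall back to comparing string representations
        return [str(o) for o in options].index(str(value))

options = [20200212, 20200331]
print(tolerant_index(options, "20200212"))  # 0
print(tolerant_index(options, 20200331))    # 1
```

The original int is still what gets sent to the API; only the index lookup is relaxed.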
@janPesl I see it now, the data is rendered on server side: https://github.com/bertrandmartel/tableau-scraping#server-side-rendering
Server-side rendering means that only images are shown on the screen, not the actual data, which means we can't get the data unless we find a specific filter that works with client-side rendering only (for each value, since dates use filter-delta in your dashboard). In your case, some filters are client-side rendered and others are server-side rendered, which makes it difficult to use that filter (the date) to download all the data.
Getting the data for server-side rendering is always a challenge; check out similar issues: https://github.com/bertrandmartel/tableau-scraping/issues?q=is%3Aissue+label%3Aserver-side-rendering+
Good catch!
It seems that once the number of items exceeds some threshold, the worksheet is rendered on the server side.
Is there a way to apply multiple filters at the same time, not step-wise?
If I start with wb = ws.setFilter('MDY(Søknadsfrist dato)', 20200525), it will be rendered on the server side and I won't get any data for further filtering. If I start with wb = ws.setFilter('Utlysning', 'ABALONYX AS'), I will get an error message because 'ABALONYX AS' is not in the list of the default 'MDY(Søknadsfrist dato)' filter.
@janPesl yes, it's possible using filterDelta=True, but I've tried this and it still returns server-side data when some dates are included:
wb = ws.setFilter('MDY(Søknadsfrist dato)', [20200525, 20210210], filterDelta=True)
I can retrieve chunks of the data this way:
wb = ws.setFilter('Sektor', 'Annet')
wb = ws.setFilter('MDY(Søknadsfrist dato)', 20200525)
df = wb.getWorksheet("Prosjektoversikt").data
The obvious limitation is that I cannot select any values for the Sektor filter other than ['Annet', 'Helseforetak', 'Instituttsektor', 'Næringsliv', 'Øvrige', 'UoH-sektor']. As a result, I won't get the data for Ukjent or Utlandet.
@janPesl I didn't find a way to get those data. The only thing I could achieve was to implement the render-tooltip server API to get the tooltip HTML output for the sheet Antall prosjekter og søknader:
from tableauscraper import TableauScraper as TS
import pandas as pd
url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
rootWs = ts.getWorksheet("Prosjektoversikt")
dates = [
t
for t in rootWs.getFilters()
if t["column"] == "MDY(Søknadsfrist dato)"
][0]["values"]
print(dates)
for date in dates:
    print(f"get date {date}")
    wb = rootWs.setFilter('MDY(Søknadsfrist dato)', date)
    ws = wb.getWorksheet("Antall prosjekter og søknader")
    tooltipHtml = ws.renderTooltip(x=143, y=3)
    df = pd.read_html(tooltipHtml)[0]
    print(df)
    tooltipHtml = ws.renderTooltip(x=342, y=11)
    df = pd.read_html(tooltipHtml)[0]
    print(df)
repl.it: https://replit.com/@bertrandmartel/TableauForskningsr
It's possible to do the same with the tooltip in the table, but the tooltip doesn't mention the value from the column named Søkt beløp, mill kroner, only values from the other columns.
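As a fallback when pandas.read_html's parser dependencies (lxml/html5lib) are unavailable, the rows of a tooltip's HTML table can also be pulled out with the standard library alone. The sample HTML below is invented and far simpler than a real tooltip payload:

```python
from html.parser import HTMLParser

# Minimal parser that collects the cell text of each <tr> in an HTML table.
class TableRows(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# invented sample markup, standing in for ws.renderTooltip(...) output
sample = "<table><tr><th>Antall</th></tr><tr><td>24114</td></tr></table>"
p = TableRows()
p.feed(sample)
print(p.rows)  # [['Antall'], ['24114']]
```

Each inner list is one table row, so the result maps directly onto a DataFrame constructor if pandas is available.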
Data image URLs are listed in the response. I tried to scrape all the images but failed. The image request URL has a geometry field, and it's too difficult to understand its logic.
By the way, are there any APIs or methods to scrape all the images or calculate their geometry?