
I am trying to extract the "Company & Tests" values from this Web page: https://public.tableau.com/views/v_7_14_2020/COVID-19TestingCommons

The preferred output would be an array with each company and the number of tests for that company.

There is another thread (How can I scrape tooltips value from a Tableau graph embedded in a webpage) with a similar question.

I tried to adapt the approach from that thread, but it didn't work in my case.

Thank You.

import json
import re

import requests
from bs4 import BeautifulSoup

data_host = "https://public.tableau.com"

r = requests.get(
    f"{data_host}/views/v_7_14_2020/COVID-19TestingCommons",
    params={":showVizHome": "no"},
)
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'


r = requests.post(dataUrl, data={"sheet_id": tableauData["sheetId"]})

# The pattern expects two JSON blobs, each preceded by a number and a semicolon.
dataReg = re.search(r"\d+;({.*})\d+;({.*})", r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])
Richárd Baldauf

1 Answer


In this case, your Tableau URL uses server-side rendering. This means that by default no data is sent to the browser: the server renders images from the data (tables, maps, etc.), and selection events are handled by JavaScript sending mouse coordinates to the server.

However, in your case some data can be rendered client-side by using the filters. When you select a filter value, for example in the "specimen" filter, the data is rendered on the client (the actual data is sent to the browser).
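
To see which worksheets came back without data, you can check their row counts. This is only a minimal sketch: it assumes worksheet objects with a name attribute and a data attribute that supports len() (the tableauscraper worksheets used below should fit, via workbook.worksheets). The function name and the demo objects are hypothetical stand-ins, not real Tableau data.

```python
from types import SimpleNamespace


def empty_worksheet_names(worksheets):
    """Return the names of worksheets whose data has no rows."""
    return [ws.name for ws in worksheets if len(ws.data) == 0]


# Hypothetical stand-ins for worksheet objects (made-up names and rows).
# With tableauscraper you would pass workbook.worksheets instead.
demo = [
    SimpleNamespace(name="Sheet A", data=[]),
    SimpleNamespace(name="Sheet B", data=[{"col": "value"}]),
]

print(empty_worksheet_names(demo))
```

A worksheet listed here got no client-side data and would need a filter selection (or another interaction) before it can be read.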

I've written a Tableau scraper library (tableauscraper) to extract data from Tableau worksheets. The following code loads the Tableau data (the worksheets are initially empty), gets the filter named Specimen Collected in the worksheet Diagnostic Target, iterates over each value of this filter, and collects the worksheet data for each one:

from tableauscraper import TableauScraper as TS
import pandas as pd

url = "https://public.tableau.com/views/v_7_14_2020/COVID-19TestingCommons"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

ws = workbook.getWorksheet("Diagnostic Target")

specimens = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "Specimen Collected"
][0]

pdList = []
for specimen in specimens:
    print(f"specimen: {specimen}")
    specResultWb = ws.setFilter("Specimen Collected", specimen)
    df = specResultWb.getWorksheet("Company and Tests").data
    pdList.append(df)

result = pd.concat(pdList, ignore_index=True)
print(result)

repl.it: https://replit.com/@bertrandmartel/TableauCovid19TestingCommonsASU
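
Since the same row can come back under more than one specimen value, you may want to reduce the concatenated result to unique pairs before using it as an array. This is a rough sketch, not part of the library: unique_pairs is a hypothetical helper, and the column names "company" and "tests" and the sample rows are made-up placeholders. Check result.columns for the real names.

```python
def unique_pairs(records, key_col, value_col):
    """Return unique (key, value) pairs from row dicts, keeping first-seen order."""
    seen = set()
    pairs = []
    for row in records:
        pair = (row[key_col], row[value_col])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Made-up rows standing in for result.to_dict(orient="records").
sample_rows = [
    {"company": "A", "tests": 2},
    {"company": "B", "tests": 1},
    {"company": "A", "tests": 2},  # duplicate, e.g. seen under a second specimen
]

pairs = unique_pairs(sample_rows, "company", "tests")
print(pairs)
```

Whether dropping duplicates is the right reduction depends on what the worksheet columns actually contain, so inspect the DataFrame first.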

Bertrand Martel