1

I am trying to extract the "mobility index" values for each state and county from this webpage: https://www.cuebiq.com/visitation-insights-mobility-index/

The preferred output would be a panel data of place (state/county) by date for all available places and dates.

There is another thread (How can I scrape tooltips value from a Tableau graph embedded in a webpage) with a similar question. I tried to follow the solution there but it doesn't seem to work for my case.

Thanks a lot in advance.

(A way that I have tried is to download PDF files generated from Tableau, which would contain all counties' value on a specific date. However, I still need to find a way to make request for each date in the data. Anyway, let me know if you have a better idea than this route).

HaJa
  • 13
  • 2

1 Answers1

0

This tableau data url doesn't return any data. In fact, it only render images of the values (canvas probably) and I'm guessing it detects click based on coordinate. Probably, it's made this way to cache the value and render quickly.

But when you click on a state, it actually returns data but it seems it doesn't always returns the result for the state (but works the individual county).

The solution I've found is to use the tooltip to get the data for the state. When you click the state, it generates a request like this :

POST https://public.tableau.com/{path}/{session_id}/commands/tabsrv/render-tooltip-server

with the following form param :

worksheet: US Map - State - CMI
dashboard: CMI
tupleIds: [18]
vizRegionRect: {"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":null}
allowHoverActions: false
allowPromptText: true
allowWork: false
useInlineImages: true

where tupleIds: [18] refers to the index of the state in a list of states in reverse alphabetical order like this :

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

It gives a json with the html of the tooltip which has the CMI and YoY values you want to extract :

{
    "vqlCmdResponse": {
        "cmdResultList": [{
            "commandName": "tabsrv:render-tooltip-server",
            "commandReturn": {
                "tooltipText": "{\"htmlTooltip\": \"<HTML HERE WITH THE VALUES>\"}]},\"overlayAnchors\":[]}"
            }
        }]
    }
}

The only caveat is that you'll hava to make one request per state :

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5) #for throttling
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": "US Map - State - CMI",
        "dashboard": "CMI",
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if (row[0].text == "Mobility Index:"):
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])
    print(entry)
    data.append(entry)

print(data)

Try this on repl.it

To get the county information it's the same as this post using the select endpoint which gives you the data with the same format as the post you've linked in your question

The following will extract data for all county and state :

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"
worksheet = "US Map - State - CMI"
dashboard = "CMI"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5) #for throttling
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if (row[0].text == "Mobility Index:"):
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])

    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "selection": json.dumps({
            "objectIds":[stateIndex+1],
            "selectionType":"tuples"
        }),
        "selectOptions": "select-options-simple"
    })
    entry["county_data"] = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
    print(entry)
    data.append(entry)


print(data)
Bertrand Martel
  • 32,363
  • 15
  • 95
  • 118