
Interesting problem. I'm scraping a betting site with selenium, then processing with bs4. Problem is, the way the site loads its odds information is different to how it loads the team names. For example:

London v Tokyo            2/1   4/1
Amsterdam v Helsinki      5/1   3/1

New York v California     7/1   10/1

When I pull this and iterate over it, it comes out like so:

Names = [London, Tokyo, Amsterdam, Helsinki, New York, California]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]

The odds load top to bottom, then left to right, in chunks of varying length, which means that when I try to splice the names and odds together, they won't match up.

My question is, how can I get around this? I want to eventually have the information come out so the team name is followed by its odds:

Games = [London, 2/1, Tokyo, 4/1, Amsterdam, 5/1, Helsinki, 3/1, New York, 7/1, California, 10/1]

** UPDATE ** The site is: https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/ If you get a landing page then just click through. Then "Esports" on the left panel, then "All Matches" from the midpage.

Code:

from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/"
driver = webdriver.Chrome()
driver.get(url)

# Then I'm navigating to the "All Matches" page

soup = BeautifulSoup(driver.page_source, 'html.parser')
teams = driver.find_elements_by_class_name("sl-CouponParticipantWithBookCloses_Name")
odds_raw = driver.find_elements_by_class_name("gl-ParticipantOddsOnly_Odds")

odds = []
teams_text = []
new_teams = []
new_odds = []

for name in teams:
    teams_text.append(name.text)

The team names come in as blocks, for example "London v Tokyo", so to separate them I iterate over the blocks and split each one:

for name in teams_text:
    first, second = name.split(" v ")
    new_teams.append(first)
    new_teams.append(second)

Then I convert the odds, which arrive as fractions, into decimals:

for odd in odds_raw:
    odds.append(odd.text)
for odd in odds:
    first, second = odd.split("/")
    # decimal odds = fractional odds + 1 (the returned stake)
    new_odd = (int(first) / int(second)) + 1
    new_odds.append(round(new_odd, 2))
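
The same conversion can be wrapped in a small helper so it is easy to check on its own. This is only a sketch: the name `fractional_to_decimal` is made up for illustration, and it assumes every odds string has the form "a/b" (anything else would raise an error).

```python
def fractional_to_decimal(odd_text):
    """Convert fractional odds such as '5/2' to decimal odds (3.5)."""
    numerator, denominator = odd_text.strip().split("/")
    # Decimal odds are the fractional profit plus the returned stake of 1.
    return round(int(numerator) / int(denominator) + 1, 2)


decimal_odds = [fractional_to_decimal(text) for text in ["2/1", "5/2", "10/1"]]
print(decimal_odds)  # [3.0, 3.5, 11.0]
```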

So now I have a list of all team names and a list of decimal odds values. This is where my problem is. bet365 lays out its odds for the matches in vertical blocks, and each game division has a block of a different length.

So if the odds look like this:

Division 1
London v Tokyo        1   2
Amsterdam v Helsinki  3   4
Division 2
New York v California 5   6
Division 3
Sydney v Brisbane     7   8
Bali v Singapore      9   10
Berlin v Paris        11  12

Then when I pull them, the odds will come out like:

[1, 3, 2, 4, 5, 6, 7, 9, 11, 8, 10, 12]

Because the divisions vary in length, I'm having a hard time figuring out how to approach it.

BASmith

3 Answers


You can use regexes to capture the elements.

import re
s = '''London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1'''
re.findall(r'(\w+)\s+v\s+(\w+)\s+(\d+/\d+)\s+(\d+/\d+)', s)

[('London', 'Tokyo', '2/1', '4/1'),
 ('Amsterdam', 'Helsinki', '5/1', '3/1'),
 ('York', 'California', '7/1', '10/1')]
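
Note that `\w+` captures only the last word of a multi-word name, which is why "New York" comes back as "York" above. One possible variant, as a sketch: let each name be one or more whitespace-separated tokens, matched lazily. It assumes no team name contains a standalone "v" token, and the variable name `pattern` is just illustrative.

```python
import re

s = "London v Tokyo 2/1 4/1 Amsterdam v Helsinki 5/1 3/1 New York v California 7/1 10/1"

# Each name is one token plus as few further tokens as possible (lazy),
# so it stops at " v " or at the first fractional odds value.
pattern = re.compile(
    r"(\S+(?:\s+\S+)*?)\s+v\s+"   # first team, may contain spaces
    r"(\S+(?:\s+\S+)*?)\s+"       # second team, may contain spaces
    r"(\d+/\d+)\s+(\d+/\d+)"      # the two fractional odds
)
matches = pattern.findall(s)
print(matches)
# [('London', 'Tokyo', '2/1', '4/1'),
#  ('Amsterdam', 'Helsinki', '5/1', '3/1'),
#  ('New York', 'California', '7/1', '10/1')]
```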
BallpointBen

You could achieve your desired output using a for loop like this:

Names = ["London", "Tokyo", "Amsterdam", "Helsinki","New York","California"]
Odds = [2/1, 5/1, 4/1, 3/1, 7/1, 10/1]
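start_nmb = 1

for odd in Odds:
    # Insert each odd after the next name; the index advances by two
    # because the list grows by one element per insertion.
    Names.insert(start_nmb, odd)
    start_nmb += 2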

output:

['London', 2.0, 'Tokyo', 5.0, 'Amsterdam', 4.0, 'Helsinki', 3.0, 'New York', 7.0, 'California', 10.0]

Hope this helps!

Nazim Kerimbekov
  • Tried this and it unfortunately didn't seem to work! I updated the question with a lot more information, if you care to read it. Thanks for the suggestion though – BASmith May 05 '19 at 09:26

Here is a long-winded way to try. The odd-numbered columns of odds (the first, third, and so on, as counted by the loop) go into team1, the left-hand side of "team1 v team2". The even-numbered columns go into team2. The lists of lists are then flattened, and the lists are combined, as shown in the answer here by @user942640, so that their members alternate.

Note: This relies on the lists being of equal length to return accurate results.

import itertools
from bs4 import BeautifulSoup as bs
#your existing code to get to page and wait for presence of all elements
soup = bs(driver.page_source, 'lxml')
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]

team1 = []
team2 = []

# Columns alternate: even index = left-hand team, odd index = right-hand team
for i, item in enumerate(soup.select('.sl-MarketCouponValuesExplicit2')):
    cells = [cell.text for cell in item.select('div:not(.gl-MarketColumnHeader )')]
    if i % 2 == 0:
        team1.append(cells)
    else:
        team2.append(cells)

team1 =  [item for sublist in team1 for item in sublist]
team2 =  [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]
team_odds =  [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1,team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)

So, something like (noting that odds keep updating):

from selenium import webdriver
import itertools
from bs4 import BeautifulSoup as bs
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.bet365.com/#/HO/')
driver.get('https://www.bet365.com/#/AC/B151/C1/D50/E2/F163/')
WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sl-MarketCouponValuesExplicit2")))
soup = bs(driver.page_source, 'lxml')
teams = [item.text.split(' v ') for item in soup.select('.sl-CouponParticipantWithBookCloses_NameContainer')]

team1 = []
team2 = []

# Columns alternate: even index = left-hand team, odd index = right-hand team
for i, item in enumerate(soup.select('.sl-MarketCouponValuesExplicit2')):
    cells = [cell.text for cell in item.select('div:not(.gl-MarketColumnHeader )')]
    if i % 2 == 0:
        team1.append(cells)
    else:
        team2.append(cells)

team1 =  [item for sublist in team1 for item in sublist]
team2 =  [item for sublist in team2 for item in sublist]
teams = [item for sublist in teams for item in sublist]

team_odds =  [x for x in itertools.chain.from_iterable(itertools.zip_longest(team1,team2)) if x]
final = [x for x in itertools.chain.from_iterable(itertools.zip_longest(teams, team_odds)) if x]
print(final)
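
The column logic can also be tried without a browser. Below is a minimal sketch using the numbers from the question. It assumes each division delivers its odds as one left-hand column followed by one right-hand column. The function name `pair_odds` and the hard-coded `blocks` and `teams` lists are made up for illustration and are not taken from the site.

```python
from itertools import chain


def pair_odds(column_blocks):
    """Rebuild per-match odds from alternating left/right column blocks."""
    left_columns = column_blocks[0::2]   # odds for the first-named teams
    right_columns = column_blocks[1::2]  # odds for the second-named teams
    left = list(chain.from_iterable(left_columns))
    right = list(chain.from_iterable(right_columns))
    if len(left) != len(right):
        raise ValueError("left and right columns differ in length")
    return list(zip(left, right))


# The question's three divisions, column by column, as the page delivers them.
blocks = [[1, 3], [2, 4], [5], [6], [7, 9, 11], [8, 10, 12]]
teams = [["London", "Tokyo"], ["Amsterdam", "Helsinki"],
         ["New York", "California"], ["Sydney", "Brisbane"],
         ["Bali", "Singapore"], ["Berlin", "Paris"]]

games = []
for (first, second), (first_odd, second_odd) in zip(teams, pair_odds(blocks)):
    games.extend([first, first_odd, second, second_odd])
print(games)
```

Raising an error on unequal columns makes the equal-length requirement explicit instead of silently misaligning names and odds.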
QHarr
  • Wow. This was really really great. Some nice elegant solutions to shorten up my code in here too which I will take note on. Fantastic! Learnt a lot from this, thanks so much :) – BASmith May 05 '19 at 11:19