How do I get the dynamically generated content of a ticker page on Yahoo Finance?

I am scraping Yahoo Finance with Selenium. While doing so, I noticed a discrepancy between the HTML I get from the driver after calling its get method and the HTML I see when I right-click and inspect the same page in a browser.

From what I could glean from reading online sources, this is because JavaScript and CSS are used to generate content on the page dynamically. According to related posts, the solution basically boils down to “wait until the content you are looking for has been loaded”, then get the HTML. However, despite following guides, tutorials and threads on Stack Overflow, none of the approaches I tried gets me the HTML of a Yahoo Finance ticker page as seen in a browser.
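To make sure I describe the “wait, then read” idea correctly, here is how I understand it as a minimal sketch in plain Python, with no Selenium involved. The name wait_until and its parameters are my own hypothetical choices, not a Selenium API; as far as I can tell, WebDriverWait follows the same general idea of polling a condition until it succeeds or a timeout expires.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

The point of the pattern is that the HTML should only be read after the condition has returned something truthy; reading it earlier gives whatever the page contained at that moment.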

To narrow down the scope of the question, let’s suppose that the HTML I want is the buy/sell/hold rating of a stock, and that the ticker I am interested in is Affirm Holdings Inc. (AFRM):

[Screenshot: the value I want to extract]

And the associated HTML is here:

[Screenshot: the HTML I am interested in retrieving]

At present, this is the code that has come closest to achieving what I want:


# Import the necessary Selenium components
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Make the driver run headless, i.e. do not actually open a browser window;
# run it entirely in the background
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

# Wait to ensure all the web elements are loaded
driver.implicitly_wait(10)
driver.maximize_window()

# Get the desired webpage
driver.get("https://finance.yahoo.com/quote/AFRM?p=AFRM&.tsrc=fin-srch")

# Handle the cookie consent popup that appears.
# Find the button to click by looking for an element with the name "reject"
button = driver.find_element(By.NAME, 'reject')

# Click the button by first scrolling to it and then clicking
ActionChains(driver).move_to_element(button).click().perform()

# Wait until the HTML element with id=mrt-node-Col2-10-QuoteModule is visible
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.ID, "mrt-node-Col2-10-QuoteModule")))

# Get the underlying HTML
html = driver.page_source

Why I am convinced this ought to work, and why I am confused that it doesn’t

When inspecting the HTML source as seen in the browser, you can see that the part of the HTML that defines the analyst rating graphs has the following attribute:


id=mrt-node-Col2-10-QuoteModule

This can be seen here when inspecting the HTML:

[Screenshot: the HTML in question]

And my code does the following:

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.ID, "mrt-node-Col2-10-QuoteModule")))

If I have not misunderstood things, this waits for up to 10 seconds, or until the HTML component with the id “mrt-node-Col2-10-QuoteModule” is visible.

However, when I inspect the HTML I get back, it still appears to lack said components, i.e. I am still getting the HTML without anything rendered by JavaScript. What am I doing wrong here?
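For reference, this is roughly how the returned HTML can be checked for the element programmatically instead of by eye. It is a minimal sketch using only the standard library; IdCollector and has_element_id are hypothetical helper names of my own, and the check only looks at id attributes of tags, not at text content.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute value seen in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if some tag in `html` carries id == element_id."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return element_id in collector.ids
```

With the script above, the call would be has_element_id(html, "mrt-node-Col2-10-QuoteModule"), where html is the string taken from driver.page_source.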

  • Did you try the Python module yfinance? It doesn’t need Selenium.

  • Some servers may send different HTML to different users and to different devices (desktop, tablet, phone). They can also use random values to stop scripts/bots/hackers/spammers.

  • Maybe test the code without "--headless" to see what you really get in the browser.
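Regarding the yfinance suggestion in the comments above, a minimal sketch might look like the following. It assumes that the dictionary returned by yfinance's Ticker.info contains the keys "recommendationKey" and "recommendationMean"; those key names are an assumption here and may differ between yfinance versions or be missing for some tickers. The function names extract_rating and fetch_rating are hypothetical.

```python
def extract_rating(info):
    """Pull analyst-rating fields out of a dict shaped like yfinance's Ticker.info.

    The key names are assumptions; missing keys yield None rather than an error.
    """
    return {
        "rating": info.get("recommendationKey"),
        "mean_score": info.get("recommendationMean"),
    }


def fetch_rating(symbol):
    """Fetch the info dict for `symbol` via yfinance and extract the rating."""
    # Imported lazily so extract_rating works without yfinance installed.
    import yfinance as yf  # third-party package: pip install yfinance

    return extract_rating(yf.Ticker(symbol).info)
```

Calling fetch_rating("AFRM") needs network access and the yfinance package; extract_rating can be tried on any plain dictionary first.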
