Impossible To Scrape Some Websites Using Python?

I am trying to scrape a particular website, https://birdeye.so/find-gems?chain=solana, but I am unable to load the data within the table. I can only get the table’s headers, such as Token, Trending, etc.

Are some pages just impossible to scrape? If so, why exactly?

Below is my code. I’ve attempted to scrape this page using Selenium, but I am unable to load all of the content. What am I doing wrong?

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://birdeye.so/find-gems?chain=solana/")
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(soup)  # only the table headers show up; no row data

  • Is this maybe a race condition? Looking at the page, it creates the table headers, sends an AJAX request to get the information in the table, then populates the table. I think your code might be getting the page source after the table is created but before it is populated. Does adding a wait help? (See the sketch after these comments.)

  • “Impossible” is a strong word, but some pages are extremely difficult to scrape. If the page is protected by Cloudflare, detects you as a bot, and serves up a CAPTCHA, that’s not easy to bypass, for obvious reasons. You might want to use a proxy if that’s the case (see the second sketch below), but Nick’s suggestion is the first thing to try.

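Following up on the first comment, a minimal sketch of what “adding a wait” could look like in the original Chrome setup is below; the "ant-table-cell" class name is taken from the answer further down, so treat it as an assumption about the page’s markup:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://birdeye.so/find-gems?chain=solana")

# Block for up to 20 seconds until at least one data cell exists in the DOM.
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, "ant-table-cell"))
)

soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.select_one("tbody"))  # the table body should now contain row markup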



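On the second comment’s proxy suggestion: if Cloudflare is blocking the requests, the browser can be routed through a proxy with a single Chrome argument. This is only a sketch; the address below is a placeholder, not a real endpoint:

from selenium import webdriver

options = webdriver.ChromeOptions()
# Placeholder proxy address; replace it with a working endpoint.
options.add_argument("--proxy-server=http://127.0.0.1:8080")

driver = webdriver.Chrome(options=options)
driver.get("https://birdeye.so/find-gems?chain=solana")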
The code below should work:


from selenium import webdriver
from selenium.webdriver.edge.service import Service
from selenium.webdriver import EdgeOptions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import pandas as pd

url = r"https://birdeye.so/find-gems?chain=solana/"

# Point the service at your local msedgedriver binary.
service = Service(executable_path=r'C:\Users\10696\Desktop\access\zhihu\msedgedriver\msedgedriver.exe')

# Options that make the automated browser look less like a bot.
edge_options = EdgeOptions()
edge_options.add_experimental_option('excludeSwitches', ['enable-automation'])
edge_options.add_experimental_option('useAutomationExtension', False)
edge_options.add_argument('lang=zh-CN,zh,zh-TW,en-US,en')
edge_options.add_argument("disable-blink-features=AutomationControlled")

driver = webdriver.Edge(options=edge_options, service=service)
driver.get(url)

# Wait until at least one table cell has been rendered before reading the page.
WebDriverWait(driver, timeout=10).until(lambda d: d.find_element(By.CLASS_NAME, "ant-table-cell"))

# Grab the populated table body and parse its HTML with BeautifulSoup.
pag = driver.find_element(By.CLASS_NAME, "ant-table-tbody")
pag = driver.execute_script("return arguments[0].innerHTML;", pag)

table = soup(pag, "html.parser")
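
From there the rows can be pulled out of the parsed table body. A rough sketch, continuing from the snippet above (the DataFrame is left without column labels here, since those depend on the headers the live page shows):

# Collect the text of every cell in every populated row of the table body.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows)
print(df.head())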
