RSelenium not moving to third page or crashes with errors No active session with ID or unknown server-side error

I am trying to get all links titled “Read More” from this page using RSelenium and rvest.

The code I’m using is the following:

igop_get_links <- function(url = "https://igop.uab.cat/category/publicacions/"){
  site <- rvest::read_html(url)
  taula <- rvest::html_elements(site, ".paginated_content")
  text <- rvest::html_text(rvest::html_elements(taula, "a"))
  links <- rvest::html_attr(rvest::html_elements(taula, "a"), "href")
  df <- data.frame(text = text,
                   url = links)
  df <- df[df$text == "Read More", ]
  return(df)
}

igop_get_pages <- function(url = "https://igop.uab.cat/category/publicacions"){
  links <- igop_get_links(url)
  # get max number of pages
  site <- rvest::read_html(url)
  max <- rvest::html_text(rvest::html_elements(site, ".pagination"))
  max <- strsplit(max, "\n\t\t\t\t")
  max <- sapply(max, function(x) gsub("\n|\t|\\.{3}", "", x), USE.NAMES = FALSE)
  max <- max(as.numeric(max[max != ""]))
  remDr <- RSelenium::rsDriver(
    remoteServerAddr = "localhost",
    port = 4445L,
    browser = "firefox",
    chromever = NULL,
    iedrver = NULL,
    phantomver = NULL
  )
  remDr <- remDr[["client"]]
  remDr$navigate(url)
  for(i in 1:(max-1)){
    webElem <- remDr$findElement(using = 'css selector',"a.next")
    webElem$clickElement()
    remDr$setTimeout(type = "page load", milliseconds = 10000)
    linkspage <- igop_get_links(remDr$getCurrentUrl()[[1]])
    links <- rbind(links, linkspage)
    # linkspage <- s |>
    #   rvest::session_follow_link(css = "a.next") |>
    #   igop_get_links()
    # links <- rbind(links, linkspage)
  }
  remDr$close()
  return(links)

}

However, when I run t3 <- igop_get_pages(), one of the following three things happens, without my changing any of the code.
It crashes and returns the following error:

Selenium message:No active session with ID 87c316d8-ded8-41e7-94d7-4a119e4006c1

Error:   Summary: NoSuchDriver
     Detail: A session is either terminated or not started
     Further Details: run errorDetails method

It crashes with the following message:

Could not open firefox browser.
Client error message:
     Summary: UnknownError
     Detail: An unknown server-side error occurred while processing the command.
     Further Details: run errorDetails method
Check server log for further details.
Error in checkError(res) : 
  Undefined error in httr call. httr output: length(url) == 1 is not TRUE

Or it doesn’t throw any error but cannot navigate past the second page: it reads the first page, clicks the “next” button, reads the second page, and then goes back to the first page and repeats the process. This should not happen, since the “previous” button has a different CSS selector (a.prev, predictably). I have also tried rvest::session_follow_link, but it does not work because the URL itself does not change (it is always https://igop.uab.cat/category/publicacions/# rather than something like https://igop.uab.cat/category/publicacions/2-3-whatever).
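
Because the URL stays the same after clicking “next”, re-reading it with rvest::read_html may simply fetch the first page again. One possible approach is to parse the HTML the browser has already rendered instead. The following is only a minimal sketch, not tested against the live site: the helper name igop_links_from_html and the sample HTML are hypothetical, and it assumes the rendered page keeps the same .paginated_content structure.

```r
library(rvest)

# Hypothetical helper: extract "Read More" links from an HTML string
# (for example the string returned by remDr$getPageSource()[[1]])
# rather than from a URL.
igop_links_from_html <- function(html) {
  site <- rvest::read_html(html)
  anchors <- rvest::html_elements(site, ".paginated_content a")
  df <- data.frame(
    text = rvest::html_text(anchors),
    url = rvest::html_attr(anchors, "href"),
    stringsAsFactors = FALSE
  )
  # Keep only the anchors whose visible text is exactly "Read More"
  df[df$text == "Read More", , drop = FALSE]
}

# Made-up HTML standing in for a rendered page (an assumption, not the real site)
sample_html <- '<html><body><div class="paginated_content">
  <a href="https://example.org/post-1">Some title</a>
  <a href="https://example.org/post-1">Read More</a>
  <a href="https://example.org/post-2">Read More</a>
</div></body></html>'

links <- igop_links_from_html(sample_html)
```

Inside the loop, this would mean calling igop_links_from_html(remDr$getPageSource()[[1]]) in place of igop_get_links(remDr$getCurrentUrl()[[1]]), possibly after a short pause such as Sys.sleep(2) so the new content has time to load; whether a pause is needed is an assumption.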

I am using firefox 118.0.2 on Windows.
