When performing web scraping or test automation with Selenium, a common requirement is to extract the href value of an anchor (<a>) element whose id follows a pattern, such as id="p1234".
In this guide, you'll learn how to extract href values from anchor elements by targeting IDs that match a pattern, such as p123, p45678, and so on, using Selenium WebDriver with Python and Java.
Problem Statement
You want to extract the href value from anchor tags like these:
<a href="/profile/user-profile.html" id="p12345">User Profile</a>
<a href="/profile/another-profile.html" id="p67890">Another</a>
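Here the id is the letter p followed by digits. You might try find_element(By.ID, ...) (or the older find_element_by_id(), which was deprecated in Selenium 4 and has since been removed). That lookup needs the exact ID, but the numeric part is dynamic and you don't know it in advance.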
Solution in Python with Selenium
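You can solve this with an XPath partial match. The starts-with() function selects anchors whose id begins with p. If you need a stricter match, a regular expression in Python can then narrow the results.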
Step-by-Step Python Code
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("file:///path/to/your/file.html")  # or your target URL
    # Find all <a> elements that have an href and an id starting with 'p'.
    # Note: starts-with() checks only the prefix, not the digits after it.
    elements = driver.find_elements(By.XPATH, "//a[starts-with(@id, 'p') and @href]")
    for element in elements:
        element_id = element.get_attribute("id")
        href = element.get_attribute("href")  # resolved (absolute) URL
        print(f"ID: {element_id}, HREF: {href}")
finally:
    driver.quit()  # always close the browser, even if a lookup fails
Output (assuming the page is served from https://example.com):
ID: p12345, HREF: https://example.com/profile/user-profile.html
ID: p67890, HREF: https://example.com/profile/another-profile.html
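The output shows absolute URLs, although the markup contains root-relative paths. This happens because get_attribute("href") returns the browser's resolved href property. The standard library's urljoin performs the same resolution for paths like these, which helps when you compare scraped values against the raw HTML. The page address below is an assumed example, not one taken from this guide.

```python
from urllib.parse import urljoin

# Assumed address of the page that contains the anchors (hypothetical).
PAGE_URL = "https://example.com/index.html"

# Raw href values exactly as written in the HTML markup.
raw_hrefs = ["/profile/user-profile.html", "/profile/another-profile.html"]

# Resolve each root-relative href against the page URL.
resolved = [urljoin(PAGE_URL, h) for h in raw_hrefs]

for url in resolved:
    print(url)
```

If you need the raw value instead, Selenium 4's element.get_dom_attribute("href") returns the attribute as written in the markup.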
Bonus Tip: starts-with(@id, 'p') also matches IDs such as profile-link. To keep only IDs made of p followed by digits, filter the results further with Python's re module.
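Below is a minimal sketch of that filtering step, assuming the elements come from the XPath query in the Python code above. The helper name hrefs_with_matching_id is hypothetical. FakeElement is a stand-in that only mimics WebElement.get_attribute(), so you can run the logic without a browser.

```python
import re

# IDs made of the letter "p" followed by one or more ASCII digits.
ID_PATTERN = re.compile(r"p[0-9]+")


def hrefs_with_matching_id(elements):
    """Return (id, href) pairs for elements whose id fully matches ID_PATTERN."""
    results = []
    for element in elements:
        element_id = element.get_attribute("id") or ""
        # fullmatch rejects IDs like "profile-link" that starts-with() lets through
        if ID_PATTERN.fullmatch(element_id):
            results.append((element_id, element.get_attribute("href")))
    return results


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, attrs):
        self.attrs = attrs

    def get_attribute(self, name):
        return self.attrs.get(name)


sample = [
    FakeElement({"id": "p12345", "href": "https://example.com/profile/user-profile.html"}),
    FakeElement({"id": "profile-link", "href": "https://example.com/profile/"}),
    FakeElement({"id": "p67890", "href": "https://example.com/profile/another-profile.html"}),
]

for element_id, href in hrefs_with_matching_id(sample):
    print(f"ID: {element_id}, HREF: {href}")
```

With a real driver, pass the list returned by driver.find_elements(...) instead of sample.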
Java Solution Using Selenium WebDriver
Selenium's Java bindings accept the same XPath expression unchanged.
Java Code
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import java.util.List;

public class ExtractHrefById {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("file:///path/to/your/file.html"); // or your target URL
            // Find <a> elements that have an href and an id starting with 'p'
            List<WebElement> elements = driver.findElements(By.xpath("//a[starts-with(@id, 'p') and @href]"));
            for (WebElement el : elements) {
                String id = el.getAttribute("id");
                String href = el.getAttribute("href"); // resolved (absolute) URL
                System.out.println("ID: " + id + ", HREF: " + href);
            }
        } finally {
            driver.quit(); // always close the browser, even if a lookup fails
        }
    }
}
XPath Breakdown
//a[starts-with(@id, 'p') and @href]
//a: selects all anchor elements.
starts-with(@id, 'p'): keeps only elements whose id starts with "p".
@href: ensures the element has an href attribute.
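Your id pattern may be stricter, such as p followed only by digits. XPath 2.0's matches() function would accept a full regular expression, but browsers implement only XPath 1.0, so Selenium's By.XPATH cannot use it. In Selenium, combine starts-with() with filtering in code.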
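Alternatively, XPath 1.0's translate() function can enforce "p followed only by digits" inside the locator itself. The sketch below assumes the browser's standard XPath 1.0 engine; the names STRICT_XPATH and matches_strict_xpath are illustrative. The Python function mirrors the XPath predicate so you can check the logic offline. translate() deletes every digit from the part after the "p", so an all-digit suffix leaves an empty string.

```python
# XPath 1.0: id starts with "p", has at least one more character, and every
# character after the "p" is an ASCII digit.
STRICT_XPATH = (
    "//a[@href and starts-with(@id, 'p') and string-length(@id) > 1 "
    "and translate(substring(@id, 2), '0123456789', '') = '']"
)


def matches_strict_xpath(element_id: str) -> bool:
    """Pure-Python mirror of the id predicate in STRICT_XPATH."""
    suffix = element_id[1:]  # substring(@id, 2): XPath strings are 1-indexed
    return (
        element_id.startswith("p")
        and len(element_id) > 1
        and suffix.translate(str.maketrans("", "", "0123456789")) == ""
    )


for candidate in ["p12345", "p67890", "profile-link", "p", "p12a"]:
    print(candidate, matches_strict_xpath(candidate))
```

With a driver, you would pass the expression directly: driver.find_elements(By.XPATH, STRICT_XPATH).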