html - Having trouble being specific enough to scrape what I want from a tag with Beautiful Soup and Python
Here's a sample of the HTML I want to scrape.
<a id="catalogentry_img3677183" href="http://www.academy.com/shop/pdp/under-armour%e2%84%a2-mens-tide-chaser-short-sleeve-shirt#repchildcatid=4099002" title="under armour men's tide chaser short sleeve shirt" onclick="javascript:dltrackproductgridclicks("109457178","under armour men's tide chaser short sleeve shirt","3677183");">
I want to retrieve the link inside the quotation marks of the href attribute. Here's the code I wrote.
a_ids = page_soup.findAll("a")
for a in range(len(a_ids)):
    output = a_ids[a]["href"]
    print(output)
However, the results of this code include a bunch of messy stuff from other tags, like the sample below.
<a href="http://www.academy.com/shop/pdp/bcg-mens-turbo-mesh-short-sleeve-t- shirt#repchildcatid=4190420" id="catalogentry_img4181006" onclick="javascript:dltrackproductgridclicks("109409336","bcg men's turbo mesh short sleeve t-shirt","4181006");" title="bcg men's turbo mesh short sleeve t-shirt"> <img alt="bcg men's turbo mesh short sleeve t-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming- soon.jpg';" src="//assets.academy.com/mgen/12/10740412.jpg?is=500,500"/> <div class="product-info-attributes"> <!-- begin ayrpricedisplay.jspf --> <div class="z-pricing" id="offerprice_4181006"> $9.99 </div>
I only want the link in the href attribute. How can I target the specific link I want? For reference, the URL I'm trying to scrape is here: http://www.academy.com/shop/browse/apparel/mens-apparel/mens-shirts--t-shirts
The len function is not needed, since find_all returns a list that you can iterate over directly.
Just do:
a_ids = soup.find_all("a")
for a in a_ids:
    output = a["href"]
    print(output)
Or shorter:
hrefs = [a['href'] for a in soup.find_all('a')]
for a in hrefs:
    print(a)
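If the goal is only the product links rather than every anchor on the page, one option is to filter on the id attribute, since both sample tags in the question have ids starting with catalogentry_img. The following is a minimal sketch under that assumption. The HTML string is a trimmed-down stand-in for the real page, with shortened placeholder URLs, and the id prefix may not hold for every product on the live site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the product grid (not the live page).
# The URLs are shortened placeholders; only the id pattern comes from the question.
html = """
<a href="/help">Help</a>
<a id="catalogentry_img3677183" href="http://www.academy.com/shop/pdp/tide-chaser">
  <img src="//assets.academy.com/mgen/12/10740412.jpg"/>
</a>
<a id="catalogentry_img4181006" href="http://www.academy.com/shop/pdp/turbo-mesh">
  <div class="z-pricing">$9.99</div>
</a>
<a name="no-href-here">anchor without an href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <a> tags whose id starts with "catalogentry_img" and that
# actually carry an href attribute (href=True skips anchors without one).
product_links = [
    a["href"]
    for a in soup.find_all("a", id=re.compile(r"^catalogentry_img"), href=True)
]

for link in product_links:
    print(link)
```

Passing href=True also avoids the KeyError that a["href"] raises when an anchor has no href attribute, which can happen when looping over every a tag on a page.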