html - Having trouble being specific enough to scrape what I want from a tag with Beautiful Soup and Python -


Here's a sample of the HTML I want to scrape.

<a id="catalogentry_img3677183" href="http://www.academy.com/shop/pdp/under-armour%e2%84%a2-mens-tide-chaser-short-sleeve-shirt#repchildcatid=4099002" title="under armour men's tide chaser short sleeve shirt" onclick="javascript:dltrackproductgridclicks(&quot;109457178&quot;,&quot;under armour men's tide chaser short sleeve shirt&quot;,&quot;3677183&quot;);"> 

I want to retrieve the link inside the quotation marks of the href attribute. Here's the code I wrote.

    a_ids = page_soup.findAll("a")
    for a in range(len(a_ids)):
        output = a_ids[a]["href"]
        print(output)

However, the results of this code include a bunch of messy stuff from other tags, like the output below.

<a href="http://www.academy.com/shop/pdp/bcg-mens-turbo-mesh-short-sleeve-t-shirt#repchildcatid=4190420" id="catalogentry_img4181006" onclick="javascript:dltrackproductgridclicks(&quot;109409336&quot;,&quot;bcg men's turbo mesh short sleeve t-shirt&quot;,&quot;4181006&quot;);" title="bcg men's turbo mesh short sleeve t-shirt"> <img alt="bcg men's turbo mesh short sleeve t-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/12/10740412.jpg?is=500,500"/> <div class="product-info-attributes"> <!-- begin ayrpricedisplay.jspf --> <div class="z-pricing" id="offerprice_4181006"> $9.99 </div>

I only want the link in the href attribute. How can I target the specific link I want? For reference, the URL I'm trying to scrape is here: http://www.academy.com/shop/browse/apparel/mens-apparel/mens-shirts--t-shirts

The len function is not needed, since find_all returns a list that you can iterate over directly.

Just do:

    a_ids = soup.find_all("a")
    for a in a_ids:
        output = a["href"]
        print(output)

Or, shorter:

    hrefs = [a['href'] for a in soup.find_all('a')]
    for a in hrefs:
        print(a)
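If the goal is only the product links, one option is to filter on the id attribute, since both anchors shown in the question have ids starting with "catalogentry_img". The sketch below is a minimal illustration under that assumption. It has not been checked against the live page. The product_links helper and the trimmed sample_html string are made up for the example. Passing href=True also skips anchors that have no href, which would otherwise raise a KeyError on a["href"].

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, plus two unrelated anchors
# (the last two anchors are invented here to show the filtering).
sample_html = """
<a id="catalogentry_img3677183" href="http://www.academy.com/shop/pdp/under-armour%e2%84%a2-mens-tide-chaser-short-sleeve-shirt#repchildcatid=4099002" title="under armour men's tide chaser short sleeve shirt"></a>
<a id="catalogentry_img4181006" href="http://www.academy.com/shop/pdp/bcg-mens-turbo-mesh-short-sleeve-t-shirt#repchildcatid=4190420"><img alt="bcg men's turbo mesh short sleeve t-shirt" src="//assets.academy.com/mgen/12/10740412.jpg?is=500,500"/></a>
<a href="/help">help</a>
<a name="top"></a>
"""


def product_links(html):
    """Return the href of every <a> whose id starts with 'catalogentry_img'."""
    soup = BeautifulSoup(html, "html.parser")
    # id=<regex> keeps only anchors whose id matches the pattern;
    # href=True keeps only anchors that actually have an href attribute.
    anchors = soup.find_all("a", id=re.compile(r"^catalogentry_img"), href=True)
    return [a["href"] for a in anchors]


for link in product_links(sample_html):
    print(link)
```

Only the href strings are printed, because a["href"] returns the attribute value rather than the whole tag. If the real page uses a different id pattern, the regular expression would need adjusting.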
