How to scrape multiple pages with an unchanging URL - Python & BeautifulSoup
I'm trying to scrape this website: https://www.99acres.com
So far I've used BeautifulSoup to run the code below and extract data from the website; however, the code currently only gets me the first page. I'm wondering if there's a way to access the other pages, because when I click on the next page the URL does not change, so I cannot simply iterate over a different URL each time.
Below is my code so far:
import csv
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.99acres.com/search/property/buy/residential-all/hyderabad?search_type=qs&search_location=cp1&lstacn=cp_r&lstacnid=1&src=cluster&preference=s&selected_tab=1&city=269&res_com=r&property_type=r&isvoicesearch=n&keyword_suggest=hyderabad%3b&bedroom_num=3&fullselectedsuggestions=hyderabad&strentitymap=w3sidhlwzsi6imnpdhkifsx7ijeiolsiahlkzxjhymfkiiwiq0luwv8ynjksifbsruzfukvoq0vfuywgukvtq09nx1iixx1d&texttypedtillsuggestion=hy&refine_results=y&refine_localities=refine%20localities&action=%2fdo%2fquicksearch%2fsearch&suggestion=city_269%2c%20preference_s%2c%20rescom_r&searchform=1&price_min=null&price_max=null')
html = response.text
soup = BeautifulSoup(html, 'html.parser')

rows = []
for item in soup.find_all('div', {'class': 'srpwrap'}):
    try:
        p = item.contents[1].find_all("div", {"class": "_srpttl srpttl fwn wdthfix480 lf"})[0].text
    except (AttributeError, IndexError):
        p = ''
    try:
        d = item.contents[1].find_all("div", {"class": "lf f13 hm10 mb5"})[0].text
    except (AttributeError, IndexError):
        d = ''
    rows.append([p, d])

with open('project.txt', 'w', encoding="utf-8", newline='') as file:
    writer = csv.writer(file)
    writer.writerows(rows)
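One subtle point in the CSV step above: `csv.writer.writerow` writes a single row, while `writerows` expects a whole list of rows (passing a single row to `writerows` would split each string into characters). A quick self-contained illustration:

```python
import csv
import io

# Write to an in-memory buffer instead of a file, just for demonstration.
buf = io.StringIO()
writer = csv.writer(buf)

writer.writerow(['title', 'detail'])            # one row
writer.writerows([['a', '1'], ['b', '2']])      # a list of rows

print(buf.getvalue())
```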
I have never worked with BeautifulSoup, but here is a general approach: the site loads further result pages via an AJAX request, so instead of scraping the HTML you can query that AJAX endpoint directly and index into its JSON-formatted response. Here is a sample using curl:
curl 'https://www.99acres.com/do/quicksearch/getresults_ajax' \
  -H 'pragma: no-cache' \
  -H 'origin: https://www.99acres.com' \
  -H 'accept-encoding: gzip, deflate, br' \
  -H 'accept-language: en-us,en;q=0.8,de;q=0.6,da;q=0.4' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -H 'accept: */*' \
  -H 'cache-control: no-cache' \
  -H 'authority: www.99acres.com' \
  -H 'cookie: 99_ab=37; new_visitor=1; 99_fp_visitor_offset=87; 99_suggestor=37; 99nri=2; prop_source=ip; src_city=-1; 99_citypage=-1; sl_prop=0; 99_defsrch=n; res_com=res; kwp_last_action_id_type=2784981911907674%2csearch%2c402278484965075610; 99_city=38; spd=%7b%22p%22%3a%7b%22a%22%3a%22r%22%2c%22b%22%3a%22s%22%2c%22c%22%3a%22r%22%2c%22d%22%3a%22269%22%2c%22j%22%3a%223%22%7d%7d; lsp=p; 99zedoparameters=%7b%22city%22%3a%22269%22%2c%22locality%22%3anull%2c%22budgetbucket%22%3anull%2c%22activity%22%3a%22srp%22%2c%22rescom%22%3a%22res%22%2c%22preference%22%3a%22buy%22%2c%22nri%22%3a%22yes%22%7d; google_search_id=402278484965075610; _sess_id=1oflv%2b%2fpandwweeziigqnutfrkarbutjkqqeyu%2fcv5wkmzcnyvpc89tievpnyate28ubwbcd0ptpvcp9k3o20w%3d%3d; newrequirementsbyuser=0' \
  -H 'referer: https://www.99acres.com/3-bhk-property-in-hyderabad-ffid?orig_property_type=r&search_type=qs&search_location=cp1&pageid=qs' \
  --data 'src=paging&static_search=1&nextbutton=next%20%bb&page=2&button_next=2&lstacnid=2784981911907674&encrypted_input=uib8ifftihwguyb8izcjicb8ienqmsb8izqjicb8idmgize1i3wgihwgmzexodqzmzmsmzexodm5ntugfcagfcaynjkgfcm1iyagfcbsicm0mcn8ica%3d&lstacn=search&sortby=&is_ajax=1' \
  --compressed
This way you can adjust the page parameter (page=2 in the --data payload above) to request any results page.
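The same idea translated to Python is sketched below. The endpoint and the form-field names are taken from the curl call above; this is an assumption-laden sketch, since the session-specific fields (the cookies, lstacnid, and encrypted_input values) are tied to one browser session and would have to be captured fresh from your own browser's developer tools. The helper name build_paging_payload is hypothetical:

```python
# Sketch only: endpoint and field names copied from the curl example;
# session-specific fields (cookies, lstacnid, encrypted_input) are omitted
# and must be supplied from your own browser session.
AJAX_URL = 'https://www.99acres.com/do/quicksearch/getresults_ajax'

def build_paging_payload(page):
    """Form data for one results page; only the 'page' fields change."""
    return {
        'src': 'paging',
        'static_search': '1',
        'page': str(page),         # the parameter that selects the page
        'button_next': str(page),
        'is_ajax': '1',
    }

# With the session fields filled in, each page could then be fetched with
# requests.post(AJAX_URL, data=build_paging_payload(page), headers=...).
payloads = [build_paging_payload(p) for p in range(1, 4)]
print([pl['page'] for pl in payloads])  # → ['1', '2', '3']
```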