How to scrape multiple pages with an unchanging URL - Python & BeautifulSoup


I'm trying to scrape this website: https://www.99acres.com

So far I've used BeautifulSoup to run some code and extract data from the website; however, my code right now only gets me the first page. I was wondering if there's a way to access the other pages, because when I click on the next page the URL does not change, so I cannot simply iterate over a different URL each time.

Below is my code so far:

import csv

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.99acres.com/search/property/buy/residential-all/hyderabad?search_type=qs&search_location=cp1&lstacn=cp_r&lstacnid=1&src=cluster&preference=s&selected_tab=1&city=269&res_com=r&property_type=r&isvoicesearch=n&keyword_suggest=hyderabad%3b&bedroom_num=3&fullselectedsuggestions=hyderabad&strentitymap=w3sidhlwzsi6imnpdhkifsx7ijeiolsiahlkzxjhymfkiiwiq0luwv8ynjksifbsruzfukvoq0vfuywgukvtq09nx1iixx1d&texttypedtillsuggestion=hy&refine_results=y&refine_localities=refine%20localities&action=%2fdo%2fquicksearch%2fsearch&suggestion=city_269%2c%20preference_s%2c%20rescom_r&searchform=1&price_min=null&price_max=null')
html = response.text
soup = BeautifulSoup(html, 'html.parser')
rows = []

# Collect the container div of every search result on the page.
dealer = soup.find_all('div', {'class': 'srpwrap'})

for item in dealer:
    # Fall back to an empty string when a result lacks the expected div.
    try:
        p = item.contents[1].find_all("div", {"class": "_srpttl srpttl fwn wdthfix480 lf"})[0].text
    except (IndexError, AttributeError):
        p = ''
    try:
        d = item.contents[1].find_all("div", {"class": "lf f13 hm10 mb5"})[0].text
    except (IndexError, AttributeError):
        d = ''

    rows.append([p, d])

# newline='' keeps the csv module from emitting blank lines on Windows;
# the with block closes the file, so no explicit close() is needed.
with open('project.txt', 'w', encoding="utf-8", newline='') as file:
    writer = csv.writer(file)
    # writerows expects the full list of rows; passing a single row to it
    # would split each string into one character per column.
    writer.writerows(rows)

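I have never worked with BeautifulSoup, but here is a general approach to how you can do this: instead of the search page itself, you should request the JSON-formatted response that the site's AJAX call returns when it loads a page of results. Here is a sample using curl: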

curl 'https://www.99acres.com/do/quicksearch/getresults_ajax' \
  -H 'Pragma: no-cache' \
  -H 'Origin: https://www.99acres.com' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Accept-Language: en-US,en;q=0.8,de;q=0.6,da;q=0.4' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Authority: www.99acres.com' \
  -H 'Cookie: 99_ab=37; new_visitor=1; 99_fp_visitor_offset=87; 99_suggestor=37; 99nri=2; prop_source=ip; src_city=-1; 99_citypage=-1; sl_prop=0; 99_defsrch=n; res_com=res; kwp_last_action_id_type=2784981911907674%2csearch%2c402278484965075610; 99_city=38; spd=%7b%22p%22%3a%7b%22a%22%3a%22r%22%2c%22b%22%3a%22s%22%2c%22c%22%3a%22r%22%2c%22d%22%3a%22269%22%2c%22j%22%3a%223%22%7d%7d; lsp=p; 99zedoparameters=%7b%22city%22%3a%22269%22%2c%22locality%22%3anull%2c%22budgetbucket%22%3anull%2c%22activity%22%3a%22srp%22%2c%22rescom%22%3a%22res%22%2c%22preference%22%3a%22buy%22%2c%22nri%22%3a%22yes%22%7d; google_search_id=402278484965075610; _sess_id=1oflv%2b%2fpandwweeziigqnutfrkarbutjkqqeyu%2fcv5wkmzcnyvpc89tievpnyate28ubwbcd0ptpvcp9k3o20w%3d%3d; newrequirementsbyuser=0' \
  -H 'Referer: https://www.99acres.com/3-bhk-property-in-hyderabad-ffid?orig_property_type=r&search_type=qs&search_location=cp1&pageid=qs' \
  --data 'src=paging&static_search=1&nextbutton=next%20%bb&page=2&button_next=2&lstacnid=2784981911907674&encrypted_input=uib8ifftihwguyb8izcjicb8ienqmsb8izqjicb8idmgize1i3wgihwgmzexodqzmzmsmzexodm5ntugfcagfcaynjkgfcm1iyagfcbsicm0mcn8ica%3d&lstacn=search&sortby=&is_ajax=1' \
  --compressed

This way you can adjust the page parameter (page=2 in the --data string above) to request each page in turn.
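The same idea can be sketched in Python with requests, which the question already uses. This is only a rough sketch under a few assumptions: the helper names (build_form, fetch_pages) are made up for illustration, the form fields are the ones visible in the curl sample, and the session-specific values (lstacnid, encrypted_input, cookies) have to be copied from the request your own browser sends. Whether the server accepts a request without them is not something I have checked.

```python
AJAX_URL = "https://www.99acres.com/do/quicksearch/getresults_ajax"

# Static form fields taken from the curl sample's --data string.
# Session-specific fields (lstacnid, encrypted_input) are left out on
# purpose; pass your own values through the `extra` argument.
BASE_FORM = {
    "src": "paging",
    "static_search": "1",
    "lstacn": "search",
    "sortby": "",
    "is_ajax": "1",
}


def build_form(page, extra=None):
    """Return the POST form for one results page (hypothetical helper)."""
    form = dict(BASE_FORM)
    # The curl sample sends the page number in both of these fields.
    form["page"] = str(page)
    form["button_next"] = str(page)
    if extra:
        form.update(extra)
    return form


def fetch_pages(first, last, extra=None, session=None, headers=None):
    """Yield (page, response_text) for every page from first to last."""
    if session is None:
        # Imported lazily so build_form can be used without requests installed.
        import requests
        session = requests.Session()
    for page in range(first, last + 1):
        resp = session.post(
            AJAX_URL,
            data=build_form(page, extra),
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
        yield page, resp.text


# Example use (makes real network requests, so it is left commented out):
# extra = {"lstacnid": "<from your browser>", "encrypted_input": "<from your browser>"}
# for page, body in fetch_pages(1, 3, extra=extra):
#     print(page, len(body))
```

Each response body can then go through the same BeautifulSoup extraction as the first page, or be parsed as JSON if that is what the endpoint returns for your request.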

