Web scraping - Loop through payload in Python
There is a site I connect to, but I need to log in 4 times with different usernames and passwords.
Is there any way I can loop through the usernames and passwords in the payload?
This is the first time I'm doing this, so I'm not sure how to go about it. The code works fine if I post just 1 username and password.
I'm using Python 2.7, BeautifulSoup and requests.
Here is my code:
    import requests
    import zipfile, StringIO
    from bs4 import BeautifulSoup

    # Here we add our login details to be submitted with the login form.
    payload = [
        {'username': 'xxxxxx','password': 'xxxxxx','option': 'login'},
        {'username': 'xxxxxx','password': 'xxxxxxx','option': 'login'},
        {'username': 'xxxxx','password': 'xxxxx','option': 'login'},
        {'username': 'xxxxxx','password': 'xxxxxx','option': 'login'},
    ]

    # Possibly need headers later.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

    base_url = "https://service.rl360.com/scripts/customer.cgi/sc/servicing/"

    with requests.Session() as s:
        p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)

        # This is the download page we want to scrape.
        r = s.get('https://service.rl360.com/scripts/customer.cgi/sc/servicing/downloads.php?folder=datadownloads&sortfield=expirydays&sortorder=ascending', stream=True)
        content = r.text
        soup = BeautifulSoup(content, 'lxml')

        # Now get the most recent download URL (the last matching link).
        download_url = soup.find_all("a", {'class':'tabletd'})[-1]['href']

        # Now join the base URL and the download URL.
        download_docs = s.get(base_url + download_url, stream=True)

        print "Checking content"
        content_type = download_docs.headers['content-type']
        print content_type

        print "Checking filename"
        content_name = download_docs.headers['content-disposition']
        print content_name

        print "Checking download size"
        content_size = download_docs.headers['content-length']
        print content_size

        # This extracts and downloads the specified XML files from the ZIP.
        z = zipfile.ZipFile(StringIO.StringIO(download_docs.content))
        print "---------------------------------"
        print "Downloading........."

        # Now save the files to the specified location.
        # Raw string, otherwise '\t' in 'C:\temp' is read as a tab character.
        z.extractall(r'C:\temp')
        print "Download complete"
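Why one dict works and the list does not: the requests documentation describes `data` as a dictionary, a list of tuples, bytes or a file-like object, so each dict in `payload` is one complete login form and needs its own `s.post(...)`. A stdlib-only sketch (placeholder credentials, no network access) of the form-encoded body each separate POST would roughly carry:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Placeholder credentials, same shape as the question's payload list.
payload = [
    {'username': 'user1', 'password': 'pass1', 'option': 'login'},
    {'username': 'user2', 'password': 'pass2', 'option': 'login'},
]

# One form-encoded body per dict: roughly what requests sends for data=form.
bodies = [urlencode(form) for form in payload]
for body in bodies:
    print(body)
```

Each string is a separate login, which is why the fix is to post the dicts one at a time rather than the whole list.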
Just use a loop. You may need to adjust the download directory if the files get overwritten.
    payloads = [
        {'username': 'xxxxxx1','password': 'xxxxxx','option': 'login'},
        {'username': 'xxxxxx2','password': 'xxxxxxx','option': 'login'},
        {'username': 'xxxxx3','password': 'xxxxx','option': 'login'},
        {'username': 'xxxxxx4','password': 'xxxxxx','option': 'login'},
    ]

    ....

    for payload in payloads:
        with requests.Session() as s:
            p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)
            ...
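A filled-out version of that loop might look like the sketch below. It is only a sketch under some assumptions: the URLs and the `tabletd` link lookup are copied from the question; `download_all`, `latest_href`, `make_session`, `find_href` and the `root/<username>` folder layout are hypothetical names chosen here so each user's files land in their own directory instead of overwriting each other; and the login response is not checked for success. It uses `io.BytesIO`, which works on both Python 2.7 and 3.

```python
import io
import os
import zipfile

# URLs copied from the question.
LOGIN_URL = 'https://service.rl360.com/scripts/customer.cgi?option=login'
LIST_URL = ('https://service.rl360.com/scripts/customer.cgi/sc/servicing/'
            'downloads.php?folder=datadownloads&sortfield=expirydays&sortorder=ascending')
BASE_URL = 'https://service.rl360.com/scripts/customer.cgi/sc/servicing/'


def latest_href(html):
    """The question's lookup: href of the last <a class="tabletd"> on the page."""
    from bs4 import BeautifulSoup  # imported lazily; needs bs4 and lxml installed
    soup = BeautifulSoup(html, 'lxml')
    return soup.find_all('a', {'class': 'tabletd'})[-1]['href']


def download_all(payloads, make_session, root, find_href=latest_href):
    """Log in once per payload and extract that user's ZIP into root/<username>.

    Hypothetical helper: make_session is any callable returning a session usable
    in a with-statement (e.g. requests.Session); find_href picks the download link.
    Returns {username: [names of extracted files]}.
    """
    saved = {}
    for payload in payloads:
        # A fresh session per user, so one user's login cookies are never reused.
        with make_session() as s:
            s.post(LOGIN_URL, data=payload)
            page = s.get(LIST_URL)
            docs = s.get(BASE_URL + find_href(page.text))

            # Separate folder per user so same-named files don't overwrite each other.
            dest = os.path.join(root, payload['username'])
            if not os.path.isdir(dest):
                os.makedirs(dest)

            with zipfile.ZipFile(io.BytesIO(docs.content)) as z:
                z.extractall(dest)
                saved[payload['username']] = z.namelist()
    return saved
```

With requests installed it would be called as `download_all(payloads, requests.Session, r'C:\temp')`. Passing the session factory in, rather than creating `requests.Session()` inside, is just a design choice that makes the loop testable without hitting the real site.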