Appending Scraped Data to Dataframe - Python, Selenium


I'm learning web scraping and working on Eat24 (Yelp's website). I'm able to scrape basic data from Yelp, but I'm unable to do something that should be pretty simple: append that data to a DataFrame. Here is my code; I've annotated it, so it should be simple to follow along.

from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

# Go to Eat24, type in zip code 10007, choose pickup, and click search
driver.get("https://new-york.eat24hours.com/restaurants/index.php")
search_area = driver.find_element_by_name("address_auto_complete")
search_area.send_keys("10007")
pickup_element = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[2]")
pickup_element.click()
search_button = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[3]/button")
search_button.click()

# Scroll up and down on the page to load more of the 'infinity' list
for i in range(0, 3):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.execute_script("window.scrollTo(0,0);")
    time.sleep(1)

# Find the menu URLs
menu_urls = [page.get_attribute('href') for page in driver.find_elements_by_xpath('//*[@title="view menu"]')]

df = pd.DataFrame(columns=['name', 'menuitems'])

# Collect menu items/prices/name for each URL
for url in menu_urls:
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    name = driver.find_element_by_id('restaurant_name')
    menuitems = dict(zip(menu_items, menu_prices))
    # This is the part that does not work: it assigns whole columns, not a new row
    df['name'] = name
    df['menuitems'] = menuitems

df.to_csv('test.csv', index=False)

The problem is at the end. It isn't adding the menuitems + name to successive rows in the DataFrame. I have tried using .loc and other functions, but it got messy, so I removed my attempts. Any help would be appreciated!!

Edit: The error is "ValueError: Length of values does not match length of index" when the for loop attempts to add a second set of menuitems/restaurant name to the DataFrame.
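For illustration, here is a minimal sketch of how this kind of ValueError comes up when a whole column is assigned values whose length differs from the number of rows. The restaurant and item names are made up, and a plain list is used as a stand-in; the exact way the loop above triggers the error may depend on the pandas version.

```python
import pandas as pd

# A frame that already has two rows (hypothetical data).
df = pd.DataFrame({"name": ["Restaurant A", "Restaurant B"]})

error_message = None
try:
    # Assigning to df["menuitems"] sets the WHOLE column, so three values
    # cannot be lined up against a two-row index.
    df["menuitems"] = ["Soup", "Salad", "Burger"]
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The point is that `df['col'] = value` replaces a column rather than appending a row, so each pass of the loop has to target one specific row instead.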

I figured out a simple solution; not sure why I didn't think of it before. I added a "row" count that goes up by 1 on each iteration, and used .loc to place the data in the "row"th row.

row = 0
for url in menu_urls:
    row += 1
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    # .text extracts the restaurant name as a string instead of a WebElement
    name = driver.find_element_by_id('restaurant_name').text
    menuitems = [dict(zip(menu_items, menu_prices))]
    # Write each restaurant into its own row
    df.loc[row, 'name'] = name
    df.loc[row, 'menuitems'] = menuitems
    print(df)
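As an alternative to the row counter (not what I used above), the rows can be collected in a plain list and turned into a DataFrame once at the end. This is only a sketch: the `scraped` list is a made-up stand-in for what the Selenium loop would return for each URL, so it runs without a browser.

```python
import pandas as pd

# Hypothetical stand-in for the per-restaurant scrape results:
# (restaurant name, item names, item prices). Not real Eat24 data.
scraped = [
    ("Restaurant A", ["Soup", "Salad"], ["$4.00", "$6.50"]),
    ("Restaurant B", ["Burger"], ["$9.00"]),
]

rows = []
for name, menu_items, menu_prices in scraped:
    # One dict per restaurant; the menu itself is stored as a nested dict.
    rows.append({"name": name, "menuitems": dict(zip(menu_items, menu_prices))})

# Build the DataFrame in one step instead of growing it row by row.
df = pd.DataFrame(rows, columns=["name", "menuitems"])
print(df)
```

Building the frame once avoids having to track a row index by hand, and it tends to be faster than repeated .loc assignments when there are many rows.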
