javascript - Load a SPA webpage via AJAX
I'm trying to fetch an entire webpage using JavaScript by plugging in the URL. However, the website is built as a Single Page Application (SPA) that uses JavaScript / Backbone.js to dynamically load most of its contents after rendering the initial response.
So, for example, when I route to the following address:
https://connect.garmin.com/modern/activity/1915361012
and enter this in the console (after the page has loaded):

    var $page = $("html");
    console.log("%c✔: ", "color:green;", $page.find(".inline-edit-target.page-title-overflow").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());

then I get both the dynamically loaded activity title and the statically loaded page footer.
However, when I try to load the webpage via an AJAX call with either $.get() or .load(), I only get back the initial response (the same content you see when viewing the page source):
view-source:https://connect.garmin.com/modern/activity/1915361012
So if I use either of the following AJAX calls:

    // jQuery.get()
    var url = "https://connect.garmin.com/modern/activity/1915361012";
    jQuery.get(url, function(data) {
      var $page = $("<div>").html(data);
      console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
      console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
    });

    // jQuery.load()
    var url = "https://connect.garmin.com/modern/activity/1915361012";
    var $page = $("<div>");
    $page.load(url, function(data) {
      console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
      console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
    });

I still get the initial footer, but none of the other page contents.
I've tried the solution here, which calls eval() on the contents of every script tag, but that doesn't appear robust enough to actually load the page:

    jQuery.get(url, function(data) {
      var $page = $("<div>").html(data);
      $page.find("script").each(function() {
        var scriptContent = $(this).html(); // grab the content of the tag
        eval(scriptContent);                // execute the content
      });
      console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
      console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
    });
Q: Are there any options to fully load a webpage so that it is scrapable via JavaScript?
You will never be able to fully replicate what an arbitrary (SPA) page does.
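To illustrate why, here is a minimal sketch. The markup is hypothetical (it is not Garmin's actual response), and `textOfClass` and `listScripts` are helper names made up for this example. It shows that the fetched HTML is just inert text: the footer is already in it, the title container is still empty, and an external script tag has no inline content for eval() to run.

```javascript
// Hypothetical markup standing in for the initial response of an SPA:
// the title container stays empty until the page's own scripts fill it in.
const initialResponse = `
<html>
  <body>
    <h1 class="page-title"></h1>
    <footer><div class="details">Static footer text</div></footer>
    <script src="/app.js"></script>
    <script>document.querySelector('.page-title').textContent = 'Loaded later';</script>
  </body>
</html>`;

// Naive lookup of the text inside the first element carrying the given class.
// A regex is enough for this fixed sample; it is not a general HTML parser.
function textOfClass(html, className) {
  const pattern = new RegExp(
    `<(\\w+)[^>]*class="[^"]*\\b${className}\\b[^"]*"[^>]*>([^<]*)</\\1>`
  );
  const match = html.match(pattern);
  return match ? match[2].trim() : null;
}

// List every script tag with its src (if any) and the length of its inline code.
// External scripts only carry a src, so their inline content is empty.
function listScripts(html) {
  const scripts = [];
  const pattern = /<script([^>]*)>([\s\S]*?)<\/script>/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const src = /src="([^"]*)"/.exec(match[1]);
    scripts.push({
      src: src ? src[1] : null,
      inlineLength: match[2].trim().length,
    });
  }
  return scripts;
}

console.log(JSON.stringify(textOfClass(initialResponse, 'page-title'))); // ""
console.log(JSON.stringify(textOfClass(initialResponse, 'details')));    // "Static footer text"
console.log(listScripts(initialResponse));
```

Even if you fetched and evaluated the external scripts yourself, they would run against your own document, origin and cookies rather than the ones the page was written for, which is why this approach tends to break on real sites.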
The only way I see is using a headless browser such as PhantomJS, headless Chrome, or headless Firefox.
I wanted to try headless Chrome, so let's see what it can do with your page:
Quick check using the internal REPL
Load the page with headless Chrome (you'll need Chrome 59 on Mac/Linux, Chrome 60 on Windows), and find the page title with JavaScript from the REPL:

    % chrome --headless --disable-gpu --repl https://connect.garmin.com/modern/activity/1915361012
    [0830/171405.025582:INFO:headless_shell.cc(303)] Type a Javascript expression to evaluate or "quit" to exit.
    >>> $('body').find('.page-title').text().trim()
    {"result":{"type":"string","value":"daily mile - round 2 - day 27"}}
NB: the chrome command works on my Mac because I set up this alias beforehand:

    alias chrome="'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'"
Using it programmatically with Node & Puppeteer
Puppeteer is a Node library (by the Google Chrome developers) which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.
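As a rough sketch of that last point, switching to a visible browser is a matter of launch options. `headless` and `executablePath` are real `puppeteer.launch()` options; `buildLaunchOptions`, `openBrowser`, `headful` and `chromePath` are names I made up for this example.

```javascript
// Build the options object passed to puppeteer.launch().
// `headless: false` opens a visible browser window; `executablePath` points
// Puppeteer at a specific Chrome binary instead of the one it downloads.
function buildLaunchOptions({ headful = false, chromePath = null } = {}) {
  const options = { headless: !headful };
  if (chromePath) {
    options.executablePath = chromePath;
  }
  return options;
}

// Needs the puppeteer package installed; it is required lazily so the
// helper above can be used (and inspected) on its own.
async function openBrowser(settings) {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(buildLaunchOptions(settings));
}

// Example (not run here): watch the page load in a real window.
// openBrowser({ headful: true }).then(browser => browser.close());
```

Running headful is mostly useful while debugging selectors; for unattended scraping the default headless mode is usually what you want.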
(Step 0: install Node & Yarn if you don't have them)
In a new directory:

    yarn init
    yarn add puppeteer

Create index.js with this:

    const puppeteer = require('puppeteer');

    (async () => {
      const url = 'https://connect.garmin.com/modern/activity/1915361012';
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // Go to the URL and wait for the page to load.
      // (Older Puppeteer releases spelled this option 'networkidle'.)
      await page.goto(url, {waitUntil: 'networkidle2'});

      // Wait for the results to show up
      await page.waitForSelector('.page-title');

      // Extract the results from the page
      const text = await page.evaluate(() => {
        const title = document.querySelector('.page-title');
        return title.innerText.trim();
      });
      console.log(`found: ${text}`);

      await browser.close();
    })();
Result:

    $ node index.js
    found: daily mile - round 2 - day 27
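Since the question reads both the title and the footer, the same idea can be generalized to several selectors. This is only a sketch: `scrapeText` is a hypothetical helper name, and it assumes you pass it a Puppeteer page that has already navigated to the URL. `page.waitForSelector()` and `page.$eval()` are Puppeteer calls.

```javascript
// Collect the trimmed text of several selectors from an already-loaded page.
// `page` is a Puppeteer Page; `selectors` maps a label to a CSS selector.
async function scrapeText(page, selectors) {
  const results = {};
  for (const [label, selector] of Object.entries(selectors)) {
    // Wait until the SPA has rendered the element before reading it.
    await page.waitForSelector(selector);
    // $eval runs the callback inside the page on the first matching element.
    results[label] = await page.$eval(selector, el => el.textContent.trim());
  }
  return results;
}

// Example (not run here), after `await page.goto(url, ...)`:
// const data = await scrapeText(page, {
//   title: '.page-title',
//   footer: 'footer .details',
// });
// console.log(data);
```

Waiting per selector keeps the helper independent of whichever load event was used in `page.goto()`, at the cost of a timeout error if a selector never appears.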