javascript - Load a SPA webpage via AJAX


I'm trying to fetch an entire webpage using JavaScript by plugging in its URL. However, the website is built as a Single Page Application (SPA) that uses JavaScript / Backbone.js to dynamically load most of its contents after rendering the initial response.

So, for example, when I route to the following address:

https://connect.garmin.com/modern/activity/1915361012 

and enter this in the console (after the page has loaded):

var $page = $("html");
console.log("%c✔: ", "color:green;", $page.find(".inline-edit-target.page-title-overflow").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());

then I get both the dynamically loaded activity title and the statically loaded page footer:

(screenshot: working)


However, when I try to load the webpage via an AJAX call with either $.get() or .load(), I only get the initial response (the same content as when viewing the page source):

view-source:https://connect.garmin.com/modern/activity/1915361012 

So if I use either of the following AJAX calls:

// jQuery.get()
var url = "https://connect.garmin.com/modern/activity/1915361012";
jQuery.get(url, function(data) {
    var $page = $("<div>").html(data);
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

// jQuery.load()
var url = "https://connect.garmin.com/modern/activity/1915361012";
var $page = $("<div>");
$page.load(url, function(data) {
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

I still get the initial footer, but none of the other page contents:

(screenshot: broken)


I've tried the solution here, which is to eval() the contents of every script tag, but that doesn't appear robust enough to actually load the page:

jQuery.get(url, function(data) {
    var $page = $("<div>").html(data);
    $page.find("script").each(function() {
        var scriptContent = $(this).html(); // grab the content of the tag
        eval(scriptContent);                // execute the content
    });
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

Q: Are there any options to fully load a webpage so that it is scrapable with JavaScript?

You will never be able to fully replicate by yourself what an arbitrary (SPA) page does.
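To see why the raw response falls short, here is a minimal sketch. The markup in `initialHtml` is a made-up stand-in (not Garmin's actual HTML), and `textOfClass` is a hypothetical helper written only for this example. The point is that the response contains the script as text, so anything the script would have written into the page is simply not there yet.

```javascript
// Hypothetical initial response of a SPA (not the real Garmin markup):
// the footer is static HTML, while the title is filled in later by a script.
const initialHtml = `
<html><body>
  <h1 class="page-title"></h1>
  <footer><div class="details">Static footer text</div></footer>
  <script>
    document.querySelector('.page-title').textContent = 'Loaded by JS';
  </script>
</body></html>`;

// Naive helper: text inside the first element carrying the given class.
// Good enough for this flat sample, not a general HTML parser.
function textOfClass(html, className) {
  const pattern = new RegExp(`class="${className}"[^>]*>([^<]*)<`);
  const match = html.match(pattern);
  return match ? match[1].trim() : null;
}

// The script never ran, so the title is still empty; the footer is present.
console.log(JSON.stringify(textOfClass(initialHtml, 'page-title'))); // ""
console.log(JSON.stringify(textOfClass(initialHtml, 'details')));    // "Static footer text"
```

A real page goes further than this sample: its scripts typically fetch more scripts and data over the network, which is why running them by hand does not scale.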

The only way I see is using a headless browser such as PhantomJS, headless Chrome, or headless Firefox.

I wanted to try headless Chrome, so let's see what it can do with your page:

Quick check using the internal REPL

Load the page with headless Chrome (you'll need Chrome 59 on Mac/Linux, Chrome 60 on Windows), and find the page title with JavaScript from the REPL:

% chrome --headless --disable-gpu --repl https://connect.garmin.com/modern/activity/1915361012
[0830/171405.025582:INFO:headless_shell.cc(303)] Type a Javascript expression to evaluate or "quit" to exit.
>>> $('body').find('.page-title').text().trim()
{"result":{"type":"string","value":"daily mile - round 2 - day 27"}}

NB: To get the chrome command line working on a Mac, I did this beforehand:

alias chrome="'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'"

Using it programmatically with Node & Puppeteer

Puppeteer is a Node library (by the Google Chrome developers) which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.
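As a rough sketch of the non-headless configuration: `headless` and `slowMo` are Puppeteer launch options, but the values here are only illustrative. `launchVisible` is a hypothetical helper, and the `puppeteer` module is passed in as an argument so the snippet does not assume the package is already installed.

```javascript
// Launch options for a visible (non-headless) browser window.
// headless: false opens a real window; slowMo delays each Puppeteer
// operation by the given number of milliseconds so you can watch it.
const visibleLaunchOptions = { headless: false, slowMo: 50 };

// Hypothetical helper: `puppeteer` is injected by the caller,
// e.g. launchVisible(require('puppeteer')) in a real script.
async function launchVisible(puppeteer) {
  return puppeteer.launch(visibleLaunchOptions);
}
```

Watching the browser window is mostly useful while debugging selectors; for unattended scraping the default headless mode is usually the better fit.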

(Step 0: install Node & Yarn if you don't have them)

In a new directory:

yarn init
yarn add puppeteer

Create index.js with this:

const puppeteer = require('puppeteer');

(async () => {
    const url = 'https://connect.garmin.com/modern/activity/1915361012';
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Go to the URL and wait for the page to load.
    // Early Puppeteer releases accepted 'networkidle'; current ones use 'networkidle0' or 'networkidle2'.
    await page.goto(url, {waitUntil: 'networkidle0'});
    // Wait for the results to show up
    await page.waitForSelector('.page-title');
    // Extract the results from the page
    const text = await page.evaluate(() => {
        const title = document.querySelector('.page-title');
        return title.innerText.trim();
    });
    console.log(`found: ${text}`);
    await browser.close();
})();

Result:

$ node index.js
found: daily mile - round 2 - day 27
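If you want to reuse this for other pages, one possible variation is to wrap the steps in a function and close the browser in a `finally` block, so that a missing selector or a timeout does not leave a Chrome process running. `scrapeText` and its parameters are names made up for this sketch. The Puppeteer calls it relies on (`launch`, `newPage`, `goto`, `waitForSelector`, `$eval`, `close`) are the library's, and the module is passed in as an argument rather than required.

```javascript
// Hypothetical helper: returns the trimmed text of the first element
// matching `selector` once the SPA has rendered it.
// `puppeteer` is injected by the caller, e.g. require('puppeteer').
async function scrapeText(puppeteer, url, selector, timeoutMs = 30000) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network has gone quiet after navigation.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Rejects if the element does not appear within timeoutMs.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    // Runs the callback inside the page against the matched element.
    return await page.$eval(selector, (el) => el.innerText.trim());
  } finally {
    // Runs on success and on failure, so the browser is always closed.
    await browser.close();
  }
}
```

Usage would look like `scrapeText(require('puppeteer'), url, '.page-title').then(console.log)`.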
