python - Scraping a table with <p style="position -


I am scraping a page that has several paragraph tags like this:

 <p style="position:absolute;top:110px;left:65px"> 

and it has 4 elements in each line, as follows:

<p style="position:absolute;top:110px;left:65px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:0.00000px;stringwidth:86px;">information a_110</span></p>
<p style="position:absolute;top:110px;left:173px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.64571px;stringwidth:115px;">information b_110</span></p>
<p style="position:absolute;top:110px;left:403px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.55520px;stringwidth:194px;">information c_110</span></p>
<p style="position:absolute;top:110px;left:814px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.59158px;stringwidth:151px;"> information d_110</span></p>
<p style="position:absolute;top:110px;left:1080px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:0.00000px;stringwidth:36px;"> information e_110</span></p>

I would like to stack information a_110, b_110, c_110, d_110 into a table, with one row for each line.

What I have done so far: I record the exact positions on the page (left:1080px, etc.), then use XPath on each position to extract the information. The problem is that I don't detect the positions automatically (top:110px;left:1080px, for instance); I need to enter them manually.
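One possible way to avoid typing the positions by hand is to read `top` and `left` straight out of each `style` attribute. The following is only a minimal sketch using the standard library; the names `PositionedTextParser` and `extract_cells` are made up for illustration, and it assumes every cell is a `<p>` whose style contains both values in `px`, as in the sample above.

```python
import re
from html.parser import HTMLParser

# Matches "top:110px" or "left:65px" in a style string, but not "margin-top:5px".
POSITION_RE = re.compile(r"(?<![\w-])(top|left)\s*:\s*(-?\d+(?:\.\d+)?)px")


class PositionedTextParser(HTMLParser):
    """Collect (top, left, text) for every absolutely positioned <p> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []        # finished (top, left, text) tuples
        self._position = None  # (top, left) of the <p> currently being read
        self._chunks = []      # text pieces seen inside that <p>

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        style = dict(attrs).get("style") or ""
        found = dict(POSITION_RE.findall(style))
        if "top" in found and "left" in found:
            self._position = (float(found["top"]), float(found["left"]))
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that sits inside a positioned <p>.
        if self._position is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._position is not None:
            text = "".join(self._chunks).strip()
            self.cells.append((*self._position, text))
            self._position = None


def extract_cells(html):
    """Return a list of (top, left, text) tuples found in the HTML string."""
    parser = PositionedTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Calling `extract_cells(page_source)` on the downloaded page would then give every cell together with its coordinates, so no position needs to be entered manually.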

The drawback of this approach is that it can omit data points (for instance, if the position becomes top:111px;left:1080px instead of top:110px;left:1080px).
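To cope with a `top` value that drifts by a pixel, the cells could be grouped into rows with a small tolerance instead of an exact match. This is a rough sketch, not a tested solution for the real page: `group_into_rows` is a hypothetical helper, it expects `(top, left, text)` tuples, and the default tolerance of 2 pixels is an assumption that would need tuning.

```python
def group_into_rows(cells, tolerance=2.0):
    """Group (top, left, text) tuples into rows of text, ordered left to right.

    A cell joins the current row when its `top` is within `tolerance`
    pixels of the first cell of that row (tolerance value is an assumption).
    """
    rows = []
    row_top = None
    for top, left, text in sorted(cells):
        # Start a new row when this cell is clearly lower than the current row.
        if row_top is None or top - row_top > tolerance:
            rows.append([])
            row_top = top
        rows[-1].append((left, text))
    # Order each row by its left offset and keep only the text.
    return [[text for _, text in sorted(row)] for row in rows]


cells = [
    (110, 65, "information a_110"),
    (111, 1080, "information e_110"),  # one pixel lower, still the same row
    (110, 173, "information b_110"),
    (140, 65, "information a_140"),
]
table = group_into_rows(cells)
```

Each inner list of `table` is one line of the page, so it can be written out with `csv.writer` or loaded into a data frame.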

