python - Scraping a table with <p style="position -
I am scraping a page that has several paragraph tags like this:
<p style="position:absolute;top:110px;left:65px">
and there are 4 elements in each line, as follows:
<p style="position:absolute;top:110px;left:65px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:0.00000px;stringwidth:86px;">information a_110</span></p>
<p style="position:absolute;top:110px;left:173px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.64571px;stringwidth:115px;">information b_110</span></p>
<p style="position:absolute;top:110px;left:403px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.55520px;stringwidth:194px;">information c_110</span></p>
<p style="position:absolute;top:110px;left:814px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:1.59158px;stringwidth:151px;"> information d_110</span></p>
<p style="position:absolute;top:110px;left:1080px"><span style="virtical-align:top;font-size:9px;font-family:gothictext;color:#000000;letter-spacing:0.00000px;stringwidth:36px;"> information e_110</span></p>
I would like to stack information a_110, b_110, c_110, d_110 into a table, with one row for each line.
What I have done so far: I record the exact positions used on the pages (left:1080px, etc.), then parse the page for each position to extract the information. The problem is that I don't detect the positions automatically (top:110px;left:1080px, for instance); I need to enter them manually.
The drawback of this approach is that I can omit data points (for instance, if a position becomes top:111px;left:1080px instead of top:110px;left:1080px).
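One possible way to avoid entering the positions by hand is to read the top and left values out of each style attribute and group the paragraphs by their top value, allowing a small tolerance so that top:111px still lands in the same row as top:110px. The following is only a minimal sketch under assumptions: it uses Python's standard html.parser and re modules rather than whatever scraping library is actually in use, the names PositionedTextParser and build_rows are hypothetical, and the 2px tolerance is an arbitrary guess that would need tuning for the real pages.

```python
import re
from html.parser import HTMLParser

# Matches the top/left pixel offsets inside a style attribute.
POSITION_RE = re.compile(r"top:\s*(\d+)px;\s*left:\s*(\d+)px")


class PositionedTextParser(HTMLParser):
    """Hypothetical helper: collect (top, left, text) for each positioned <p>."""

    def __init__(self):
        super().__init__()
        self.cells = []       # list of (top, left, text)
        self._current = None  # (top, left) of the <p> being read, if any
        self._chunks = []     # text fragments seen inside that <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            match = POSITION_RE.search(dict(attrs).get("style") or "")
            if match:
                self._current = (int(match.group(1)), int(match.group(2)))
                self._chunks = []

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            top, left = self._current
            self.cells.append((top, left, "".join(self._chunks).strip()))
            self._current = None


def build_rows(html, tolerance=2):
    """Group cells into rows; a cell joins the current row when its top is
    within `tolerance` px (an assumed value) of that row's first top."""
    parser = PositionedTextParser()
    parser.feed(html)
    parser.close()
    rows = []  # each entry: [reference_top, [(left, text), ...]]
    for top, left, text in sorted(parser.cells):
        if rows and top - rows[-1][0] <= tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([top, [(left, text)]])
    # Order the cells of each row from left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


if __name__ == "__main__":
    sample = (
        '<p style="position:absolute;top:110px;left:65px">'
        "<span>information a_110</span></p>"
        '<p style="position:absolute;top:111px;left:173px">'
        "<span>information b_110</span></p>"
        '<p style="position:absolute;top:125px;left:65px">'
        "<span>information a_125</span></p>"
    )
    for row in build_rows(sample):
        print(row)
```

In the sample, the second paragraph sits at top:111px but is still grouped with the first one, while the paragraph at top:125px starts a new row. Each row is a plain list of strings ordered by left offset, so it can be written out with the csv module or loaded into a table afterwards.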