python - Get All Strings Between Two Different Tags
I'm trying to put together a chat log of conversations I've had with people. I want to be able to break out the name, time, and text. Because the conversations I'm pulling aren't in a nice, neat CSV file, I need to scrape the page source. A sample of it is below. Is there a way to pull the strings between <div class='message'> and </p> so I can put each individual chat message with its respective sender and the time it was sent? Thanks!
<div class="message"><div class="message_header"><span class="user">first lastname</span><span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span></div></div><p>text here</p>
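One way to pull the three fields out without matching raw text is the standard library's html.parser module, which reports each start tag, end tag and text chunk in order. This is a minimal sketch, assuming every message looks like the sample above (a span with class "user", then a span with class "meta", then a <p> holding the text); the class name ChatParser is hypothetical.

```python
from html.parser import HTMLParser


class ChatParser(HTMLParser):
    """Collect (name, time, text) tuples from markup shaped like the sample.

    Assumes each message is a span.user, then a span.meta, then a <p>
    with the message text, in that order.
    """

    def __init__(self):
        super().__init__()
        self.messages = []   # finished (name, time, text) tuples
        self._current = {}   # fields gathered for the message in progress
        self._field = None   # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "user" in classes:
            self._field = "name"
        elif tag == "span" and "meta" in classes:
            self._field = "time"
        elif tag == "p":
            self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None
        if tag == "p":
            # A closing </p> ends a message; keep it only if a sender was seen
            if "name" in self._current:
                self.messages.append((
                    self._current.get("name", ""),
                    self._current.get("time", ""),
                    self._current.get("text", ""),
                ))
            self._current = {}

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data


if __name__ == "__main__":
    sample = (
        '<div class="message"><div class="message_header">'
        '<span class="user">first lastname</span>'
        '<span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span>'
        '</div></div><p>text here</p>'
    )
    parser = ChatParser()
    parser.feed(sample)
    parser.close()
    print(parser.messages)
    # [('first lastname', 'tuesday, january 1, 2000 @ 5:00pm est', 'text here')]
```

Because the parser tracks tags rather than exact text, it tolerates extra attributes, different quoting and line breaks inside the markup.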
You can do this with regular expressions. Here's what I came up with. Note that the regex is tested, but the Python code is only a sketch; you should be able to figure out what I'm doing. If you need more explanation of the regex or of how to implement it, let me know and I'll adjust the answer.
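import re

# Loop over the lines of the file; "chat.html" is a placeholder file name
with open("chat.html", encoding="utf-8") as f:
    for line in f:
        m = re.match(r'<div class="message">.*<span class="user">(.*)</span><span class="meta">(.*)</span>.*<p>(.*)</p>', line)
        if m:  # re.match returns None for lines that don't hold a message
            name = m.group(1)  # name
            time = m.group(2)  # time
            message = m.group(3)  # message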
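A variation on the same regex, sketched under the assumption that the whole file is read into one string: non-greedy (.*?) groups plus re.finditer pick out every message, even when several share one line, and re.DOTALL lets a message wrap across lines. MESSAGE_RE and extract_messages are hypothetical names.

```python
import re

# Non-greedy groups stop at the first matching closing tag, so consecutive
# messages are matched separately; DOTALL lets "." also match newlines.
MESSAGE_RE = re.compile(
    r'<div class="message">.*?<span class="user">(.*?)</span>'
    r'<span class="meta">(.*?)</span>.*?<p>(.*?)</p>',
    re.DOTALL,
)


def extract_messages(html):
    """Return a list of (name, time, text) tuples found in html."""
    return [m.groups() for m in MESSAGE_RE.finditer(html)]


if __name__ == "__main__":
    sample = (
        '<div class="message"><div class="message_header">'
        '<span class="user">first lastname</span>'
        '<span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span>'
        '</div></div><p>text here</p>'
    )
    for name, time, text in extract_messages(sample):
        print(name, "|", time, "|", text)
    # first lastname | tuesday, january 1, 2000 @ 5:00pm est | text here
```

For a real log, pass the file contents in, e.g. extract_messages(f.read()). This still depends on the exact attribute order and quoting in the page; if the markup varies, an HTML parser is the safer choice.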