python - Get All Strings Between Two Different Tags


I'm trying to put together a chat log of conversations I've had with people. I want to be able to break out the name, time, and text. Because the conversations I'm pulling aren't in a nice, neat CSV file, I need to scrape the source code, which looks like the sample below. Is there a way to pull the strings between <div class='message'> and </p> so that I can put each individual chat message with its respective sender and time sent? Thanks!

<div class="message"><div class="message_header"><span class="user">first lastname</span><span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span></div></div><p>text here</p>  

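You can do this using regular expressions. Here is what I came up with. Note that the regex is tested, but the Python code is not complete: the loop over your files is left to you. You should be able to figure out what I'm doing. If you need more explanation of the regex or of how to implement it, let me know and I'll adjust my answer.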

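import re

# Put this in a loop over the lines of your file(s).
# The sample line from the question stands in for a line read from a file.
line = '<div class="message"><div class="message_header"><span class="user">first lastname</span><span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span></div></div><p>text here</p>'

# Non-greedy groups stop at the first closing tag instead of the last one on the line.
pattern = r'<div class="message">.*?<span class="user">(.*?)</span><span class="meta">(.*?)</span>.*?<p>(.*?)</p>'
m = re.match(pattern, line)
if m:  # re.match returns None when the line does not match
    name = m.group(1)     # name
    time = m.group(2)     # time
    message = m.group(3)  # message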
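If the page source holds many messages, possibly several on one line, matching line by line may miss some. Below is a minimal sketch that scans the whole source with re.finditer and returns one (name, time, text) tuple per message. The names MESSAGE_RE and extract_messages are made up for illustration, and the second message in the sample is invented. It assumes every message follows exactly the markup shown in the question.

```python
import re

# Non-greedy groups so each match stops at the first closing tag it meets.
# re.DOTALL lets '.' cross newlines in case a message spans several lines.
MESSAGE_RE = re.compile(
    r'<div class="message">.*?'
    r'<span class="user">(.*?)</span>'
    r'<span class="meta">(.*?)</span>'
    r'.*?<p>(.*?)</p>',
    re.DOTALL,
)


def extract_messages(html):
    """Return a list of (name, time, text) tuples, one per message found."""
    return [m.groups() for m in MESSAGE_RE.finditer(html)]


# First message is the sample from the question; the second is invented.
sample = (
    '<div class="message"><div class="message_header">'
    '<span class="user">first lastname</span>'
    '<span class="meta">tuesday, january 1, 2000 @ 5:00pm est</span>'
    '</div></div><p>text here</p>'
    '<div class="message"><div class="message_header">'
    '<span class="user">second person</span>'
    '<span class="meta">tuesday, january 1, 2000 @ 5:01pm est</span>'
    '</div></div><p>reply here</p>'
)

for name, time, text in extract_messages(sample):
    print(name, "|", time, "|", text)
```

To use it on a real file, read the whole file into a string and pass that to extract_messages. If the markup turns out to be less regular than the sample, the pattern will need adjusting.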
