Parse a text file without fixed structure using a Python dictionary and pandas


I have a .txt file without specific separators. To parse it, I need to count character by character to know where each column starts and ends. So I constructed a Python dictionary whose keys are the column names and whose values are the number of characters each column takes:

    headers = {'first_col': 3, 'second_col': 5, 'third_col': 2, ... 'nth_col': n_chars}

With this in mind, I know the first three columns of the following line in the .txt file:

abc123-3yn0000000001203abc123*testingline 

    first_col: abc
    second_col: 123-3
    third_col: yn

I want to know if there is a pandas function that helps me parse this .txt file, taking this particular condition into account and (if possible) using the headers dictionary.

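Using a dictionary is dangerous because order is not guaranteed (plain dicts only preserve insertion order as a language guarantee from Python 3.7 onward). Meaning, if it picked third_col first, you've thrown off the entire scheme. You can fix that by using lists. From there, you can use pd.read_fwf to read a fixed-width formatted text file.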
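To see why the order of the widths matters, here is a minimal sketch in plain Python (no pandas) that turns a list of widths into start and end offsets and slices a line with them. The helper name `split_fixed_width` is hypothetical and only for illustration; the widths and the sample line are the ones from the question.

```python
from itertools import accumulate


def split_fixed_width(line, widths):
    """Slice `line` into consecutive fields of the given widths (hypothetical helper)."""
    ends = list(accumulate(widths))   # running totals, e.g. [3, 8, 10]
    starts = [0] + ends[:-1]          # each field starts where the previous one ended
    return [line[s:e] for s, e in zip(starts, ends)]


line = "abc123-3yn0000000001203abc123*testingline"

# Widths in the intended order
print(split_fixed_width(line, [3, 5, 2]))  # ['abc', '123-3', 'yn']

# Same widths in a different order: every field is cut in the wrong place
print(split_fixed_width(line, [2, 3, 5]))  # ['ab', 'c12', '3-3yn']
```

Every offset depends on all the widths before it, so a single width out of place shifts all the fields that follow.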

Solution

    import pandas as pd

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    pd.read_fwf(
        'myfile.txt',
        widths=widths,
        names=names
    )

      first_col second_col third_col
    0       abc      123-3        yn

You can also use OrderedDict from the collections library and make sure to keep the order you want by passing an iterator that produces tuples in the correct order.

    from collections import OrderedDict

    import pandas as pd

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    header = OrderedDict(zip(names, widths))

    pd.read_fwf(
        'myfile.txt',
        widths=list(header.values()),  # materialize the views as lists
        names=list(header.keys())
    )

      first_col second_col third_col
    0       abc      123-3        yn

Demonstration

    from collections import OrderedDict
    from io import StringIO

    import pandas as pd

    txt = """abc123-3yn0000000001203abc123*testingline"""

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    header = OrderedDict(zip(names, widths))

    # Read from the in-memory string instead of a file on disk
    pd.read_fwf(
        StringIO(txt),
        widths=list(header.values()),
        names=list(header.keys())
    )

      first_col second_col third_col
    0       abc      123-3        yn
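On Python 3.7 or later, plain dicts preserve insertion order, so the headers dictionary from the question can probably be used more or less as is. The following is only a sketch under that assumption: it keeps just the three columns whose widths are given in the question and reads from an in-memory string rather than a file.

```python
from io import StringIO

import pandas as pd

# Widths from the question; assumes Python 3.7+, where a plain dict
# keeps its keys in insertion order.
headers = {'first_col': 3, 'second_col': 5, 'third_col': 2}

txt = "abc123-3yn0000000001203abc123*testingline"

df = pd.read_fwf(
    StringIO(txt),
    widths=list(headers.values()),  # [3, 5, 2]
    names=list(headers),            # column names in the same order
)
print(df)
```

If the code has to run on older interpreters, the list or OrderedDict variants above remain the safer choice.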
