Parse a text file without a fixed structure using a Python dictionary and pandas
I have a .txt file without specific separators. To parse it, I need to count characters to know where each column starts and ends. So I constructed a Python dictionary whose keys are the column names and whose values are the number of characters each column takes:

    headers = {first_col: 3, second_col: 5, third_col: 2, ... nth_col: n_chars}

With this in mind, I know the first three columns of the following line in the .txt file:

    abc123-3yn0000000001203abc123*testingline

    first_col:  abc
    second_col: 123-3
    third_col:  yn

I want to know if there is a pandas function that helps me parse the .txt file taking this particular condition into account and, if possible, using the headers dictionary.
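Using a dictionary is dangerous because its order is not guaranteed (this holds for Python versions before 3.7; from 3.7 onward, a plain dict preserves insertion order). Meaning, if you picked third_col first, you've thrown off the entire scheme. You can fix this by using lists. From there, you can use pd.read_fwf to read a fixed-width formatted text file.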
Solution

    import pandas as pd

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    # Passing names means the first line is treated as data, not as a header
    pd.read_fwf(
        'myfile.txt',
        widths=widths,
        names=names
    )

      first_col second_col third_col
    0       abc      123-3        yn
You can also use OrderedDict from the collections library. Make sure you keep the order you want by passing an iterator that produces tuples in the correct order.
    from collections import OrderedDict
    import pandas as pd

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    # zip yields (name, width) tuples in list order, which OrderedDict keeps
    header = OrderedDict(zip(names, widths))

    pd.read_fwf(
        'myfile.txt',
        widths=list(header.values()),
        names=list(header.keys())
    )

      first_col second_col third_col
    0       abc      123-3        yn
Demonstration

    from collections import OrderedDict
    from io import StringIO
    import pandas as pd

    txt = """abc123-3yn0000000001203abc123*testingline"""

    names = ['first_col', 'second_col', 'third_col']
    widths = [3, 5, 2]

    header = OrderedDict(zip(names, widths))

    # StringIO lets read_fwf consume the sample string as if it were a file
    pd.read_fwf(
        StringIO(txt),
        widths=list(header.values()),
        names=list(header.keys())
    )

      first_col second_col third_col
    0       abc      123-3        yn
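To make the width bookkeeping concrete, here is a minimal pure-Python sketch of how widths become start and end offsets. It is not part of the original answer: the helper name split_fixed_width is hypothetical, and the sketch assumes Python 3.7 or later, where a plain dict keeps insertion order.

```python
from itertools import accumulate

# Layout from the question. Assumes Python 3.7+, where a plain dict
# preserves insertion order; on older versions use an OrderedDict.
headers = {"first_col": 3, "second_col": 5, "third_col": 2}


def split_fixed_width(line, headers):
    """Slice one line into named fields using cumulative column widths.

    Hypothetical helper for illustration only.
    """
    ends = list(accumulate(headers.values()))  # running total of the widths
    starts = [0] + ends[:-1]                   # each field starts where the previous ended
    return {name: line[s:e] for name, s, e in zip(headers, starts, ends)}


line = "abc123-3yn0000000001203abc123*testingline"
record = split_fixed_width(line, headers)
print(record)
```

The (start, end) pairs computed here are half-open intervals, which is also the form pd.read_fwf accepts through its colspecs argument as an alternative to widths.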