bash - Using sed to remove the period at the end of a string (zip code)
I have a file of addresses that I am attempting to scrub, using sed to get rid of unwanted characters and formatting. In this case, I have zip codes followed by a period:
mr. john doe exclusively stuff, 186 caravelle drive, ponte vedra fl 33487.
(For the time being, ignore the newlines; I'm focusing on the zip code and the period for now.)
I want to remove the period (.) after the zip code as a first step in cleaning this up. I tried to use substrings in sed as follows (using "|" as the delimiter, which is easier for me to see):
sed 's|\([0-9]{4}\)\.|\1|g' test.txt
Unfortunately, it doesn't remove the period. I built the substring part based on this post: replace period surrounded by characters with sed
A point in the right direction would be appreciated.
You specified 4 digits ({4}), but you have 5, and you also have to escape the { and }. For example:
sed 's|^\([0-9]\{5\}\).*|\1|' test.txt
Notice that you have a space after the dot, so you might want to trim everything following the 5 digits. To be safe, you might also want to specify that the digits must be at the start of the line (^).
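A minimal sketch, assuming GNU sed (or another POSIX sed), that contrasts the question's pattern with the escaped-brace form. The function names are hypothetical. The last function is an extra variant, not part of the answer, for the case where the zip code ends a full address line (as in the sample) instead of starting its own line; note that it would also match the last five digits of a longer number.

```shell
#!/bin/sh
# sed uses basic regular expressions (BRE) by default.

# The question's pattern: in a BRE an unescaped {4} is literal text, so this
# only matches a digit followed by "{4}." and leaves "33487." alone.
broken_strip() { sed 's|\([0-9]{4}\)\.|\1|g'; }

# Zip code at the start of a line: keep the first five digits and drop
# everything after them (the period and any trailing space). The ^ is placed
# before \( because POSIX only guarantees it is an anchor there.
strip_line_start_zip() { sed 's|^\([0-9]\{5\}\).*|\1|'; }

# Hypothetical variant for a whole address on one line: remove a period that
# follows five digits at the end of the line (trailing whitespace allowed).
strip_trailing_zip_period() { sed 's|\([0-9]\{5\}\)\.[[:space:]]*$|\1|'; }

printf '33487. \n' | broken_strip          # prints "33487. " unchanged
printf '33487. \n' | strip_line_start_zip  # prints "33487"
printf 'mr. john doe exclusively stuff, 186 caravelle drive, ponte vedra fl 33487.\n' \
  | strip_trailing_zip_period              # the "mr." period is left alone
```

Keeping the five-digit interval directly before `\.` is what stops the end-of-line variant from touching other periods such as the one in "mr.".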
In any case, if you type info sed (which is more complete than man sed), you will find this:
'-r' '--regexp-extended' Use extended regular expressions rather than basic regular expressions. Extended regexps are those that 'egrep' accepts; they can be clearer because they usually have less backslashes, but are a GNU extension and hence scripts that use them are not portable. *Note Extended regular expressions: Extended regexps.
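As a sketch of what extended mode changes for this problem, here is the same cleanup written as an extended regular expression. It assumes GNU sed, which spells the option -r as quoted above and also accepts -E (the spelling BSD/macOS sed uses); the function name is hypothetical.

```shell
#!/bin/sh
# With ERE, the group ( ) and the interval {5} need no backslashes,
# while the literal period still has to be escaped as \.
strip_zip_period_ere() { sed -E 's|([0-9]{5})\.[[:space:]]*$|\1|'; }

printf 'ponte vedra fl 33487.\n' | strip_zip_period_ere   # prints "ponte vedra fl 33487"
```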
And in the appendix on extended regular expressions, you can read:
The difference between basic and extended regular expressions is in the behavior of a few characters: '?', '+', parentheses, braces ('{}'), and '|'. While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them _to match a literal character_. '|' is special here because '\|' is a GNU extension; standard basic regular expressions do not provide its functionality.
Examples:
- 'abc?' becomes 'abc\?' when using extended regular expressions. It matches the literal string 'abc?'.
- 'c\+' becomes 'c+' when using extended regular expressions. It matches one or more 'c's.
- 'a\{3,\}' becomes 'a{3,}' when using extended regular expressions. It matches three or more 'a's.
- '\(abc\)\{2,3\}' becomes '(abc){2,3}' when using extended regular expressions. It matches either 'abcabc' or 'abcabcabc'.
- '\(abc*\)\1' becomes '(abc*)\1' when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions.
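To see the correspondence the manual describes, the sketch below runs a few of the quoted examples in both BRE and ERE form against a small made-up list of sample lines; each pair should print the same lines. It assumes a sed that supports -E (GNU sed does), and the helper names are hypothetical.

```shell
#!/bin/sh
# Made-up sample lines for illustration.
sample() { printf '%s\n' aa aaa abcabc abcabcabc 'abc?'; }

# Print the sample lines matching a BRE ($1) or an ERE ($1).
bre_match() { sample | sed -n "/$1/p"; }
ere_match() { sample | sed -E -n "/$1/p"; }

bre_match '^a\{3,\}$'          # BRE: three or more a's -> aaa
ere_match '^a{3,}$'            # ERE: same, braces unescaped -> aaa
bre_match '^\(abc\)\{2,3\}$'   # BRE group + interval -> abcabc, abcabcabc
ere_match '^(abc){2,3}$'       # ERE equivalent -> abcabc, abcabcabc
ere_match '^abc\?$'            # ERE: \? is a literal ? -> abc?
```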