Using bash grep -Po regex fails if string has an underscore
I have searched and, gasp, read the man pages, and I still can't figure out what's wrong or how to fix it... I admit to being a regex newb, no shame! (Ubuntu 12.04, bash 4.2.25, GNU grep 2.10)
As part of a script that does a bunch of other interesting things (which seem to work), I'm attempting to extract data from file names... where the expected patterns exist. For example, the file names have a date in the format "yyyy-mm-dd", so handily I can grep out the whole thing and break it down later by grepping for
'\b[0-9]{4}.{1}[0-9]{2}.{1}[0-9]{2}\b'
(in fact I can safely target the year directly with
'\b[0-9]{4}\b'
). This works fine if the input string looks like either of these:
"something 1989-07-23 something.jpg" or "foo-2013-01-10-bar.csv"
But if it looks like
wordsidon'tcareabout_2004-09-14_otherthings.tif
or
foofoobarbar_2010-07-16.gif
then grep finds no matches.
What gives with underscores? Why do they cause the regex to fail? And is there a better way to go about this that I may be ignorant of? I have ultra-minimal Perl and Java skills, but I know my way around bash pretty well... or thought I did...
I suppose I could rename the files, but that seems inelegant.
Your regexp uses \b, which matches the boundary between a word character and a non-word character. The problem is that _ is a word character, and so are digits, so there's no boundary between _ and 2.
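To see the effect in isolation, here is a small sketch using the year-only pattern from the question. The helper name year_with_b is made up for illustration, and it uses grep -Eo rather than -Po on the assumption that GNU grep is in use, where \b treats letters, digits and _ as word characters in both modes.

```shell
# year_with_b is a hypothetical helper, not part of the original script.
# \b only matches where a word character meets a non-word character.
# "-" and " " are non-word characters, but "_" is a word character.
year_with_b() {
    printf '%s\n' "$1" | grep -Eo '\b[0-9]{4}\b'
}

# "-2013-": boundaries on both sides of the digits, so this prints 2013
year_with_b 'foo-2013-01-10-bar.csv'

# "_2010": no boundary between "_" and "2", so grep prints nothing
year_with_b 'foofoobarbar_2010-07-16.gif' || echo 'no match'
```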
You can use
[^0-9][0-9]{4}.{1}[0-9]{2}.{1}[0-9]{2}[^0-9]
instead. If the date can be at the beginning or end of the filename, use:
([^0-9]|^)[0-9]{4}.{1}[0-9]{2}.{1}[0-9]{2}([^0-9]|$)
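One thing to watch for: with grep -o, the non-digit characters on either side are part of the match, so the output would be something like _2004-09-14_ rather than the bare date. A rough sketch of one way around that is to pipe the result through a second grep that keeps only the date. The function name extract_date is hypothetical, .{1} is shortened to the equivalent ., and -E is used on the assumption that nothing Perl-specific is needed here.

```shell
# extract_date is a hypothetical helper name, not from the original post.
# Step 1: find the date with a non-digit (or start/end of string) on each side.
# Step 2: strip the surrounding characters by matching only the date itself.
extract_date() {
    printf '%s\n' "$1" |
        grep -Eo '(^|[^0-9])[0-9]{4}.[0-9]{2}.[0-9]{2}([^0-9]|$)' |
        grep -Eo '[0-9]{4}.[0-9]{2}.[0-9]{2}'
}

extract_date "wordsidon'tcareabout_2004-09-14_otherthings.tif"   # prints 2004-09-14
extract_date 'foofoobarbar_2010-07-16.gif'                       # prints 2010-07-16
extract_date 'something 1989-07-23 something.jpg'                # prints 1989-07-23
```

Since the question already uses -P, a lookaround version such as (?<![0-9])[0-9]{4}.[0-9]{2}.[0-9]{2}(?![0-9]) should give the bare date in a single grep -Po call, provided your grep was built with PCRE support.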