regex - Regular expression to match a line that doesn't contain a word?
I know it's possible to match a word and then reverse the matches using other tools (e.g. grep -v). However, I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression.
Input:
hoho
hihi
haha
hede
Code:
grep "<regex for 'doesn't contain hede'>" input
Desired output:
hoho
hihi
haha
The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:
^((?!hede).)*$
The regex above will match any string, or any line without a line break, that does not contain the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
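As a rough illustration of the pattern above, here is a minimal Python sketch that filters the question's sample lines. Python's re module is an assumption made for the example (the question itself uses grep), and the helper name lines_without_hede is hypothetical.

```python
import re

# Pattern from the answer: before each character is consumed, assert that
# "hede" does not start at that position.
NO_HEDE = re.compile(r"^((?!hede).)*$")

def lines_without_hede(lines):
    """Return only the lines that do not contain the substring 'hede'."""
    return [line for line in lines if NO_HEDE.match(line)]

sample = ["hoho", "hihi", "haha", "hede"]
print(lines_without_hede(sample))  # ['hoho', 'hihi', 'haha']
```

Note that an empty line also matches, since the group may repeat zero times.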
And if you need to match line break chars as well, use the dot-all modifier (the trailing s in the following pattern):
/^((?!hede).)*$/s
or use it inline:
/(?s)^((?!hede).)*$/
(where the /.../ are the regex delimiters, i.e., not part of the pattern)
If the dot-all modifier is not available, you can mimic the same behavior with the character class [\s\S]:
/^((?!hede)[\s\S])*$/
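The three multi-line variants can be compared side by side. The sketch below assumes Python's re module, where the dot-all modifier is spelled re.DOTALL (or (?s) inline) and no /.../ delimiters are used. The sample strings are made up for the illustration.

```python
import re

text = "hoho\nhihi"   # contains a line break, but no "hede"
bad = "hoho\nhede"    # contains "hede" on the second line

plain = re.compile(r"^((?!hede).)*$")                   # dot stops at "\n"
dotall_flag = re.compile(r"^((?!hede).)*$", re.DOTALL)  # trailing-s equivalent
dotall_inline = re.compile(r"(?s)^((?!hede).)*$")       # inline modifier
char_class = re.compile(r"^((?!hede)[\s\S])*$")         # no modifier needed

# Without dot-all, the dot cannot cross the line break, so the match fails.
print(plain.match(text) is None)  # True

# The three variants accept the multi-line text and reject the one with "hede".
for pattern in (dotall_flag, dotall_inline, char_class):
    print(pattern.match(text) is not None, pattern.match(bad) is None)
```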
Explanation
A string is just a list of n characters. Before and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string "abhedecd":
    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
s = │e1│ a │e2│ b │e3│ h │e4│ e │e5│ d │e6│ e │e7│ c │e8│ d │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘
index    0      1      2      3      4      5      6      7
where the e's are the empty strings. The regex (?!hede). looks ahead to see if there's no substring "hede" to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don't consume any characters. They only assert/validate something.
So, in this example, every empty string is first validated to see if there's no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group and repeated zero or more times: ((?!hede).)*. Finally, the start and end of the input are anchored to make sure the entire input is consumed: ^((?!hede).)*$
As you can see, the input "abhedecd" will fail because at e3, the regex (?!hede) fails (there is "hede" up ahead!).
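To make the walk-through concrete, the following Python sketch checks the bare lookahead (?!hede) at the empty string before each character of "abhedecd". The function name lookahead_trace is hypothetical, and the e1, e2, ... labels simply follow the numbering used in the explanation above. The empty string after the last character is left out for brevity.

```python
import re

def lookahead_trace(s, word="hede"):
    """For each index i, report whether (?!word) succeeds just before s[i]."""
    check = re.compile(r"(?!" + re.escape(word) + r")")
    # pattern.match(s, i) tries the zero-width assertion exactly at position i.
    return [(i, s[i], check.match(s, i) is not None) for i in range(len(s))]

for i, ch, ok in lookahead_trace("abhedecd"):
    status = "passes" if ok else "fails"
    print(f"e{i + 1} (before {ch!r}): lookahead {status}")
```

Only the check at e3 fails, which is enough to make the anchored pattern ^((?!hede).)*$ reject the whole input.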