From: Karl Berry Date: Wed, 27 Jul 2005 00:16:01 +0000 (+0000) Subject: regexprops, generated by findutils helper X-Git-Tag: cvs-readonly~3178 X-Git-Url: http://erislabs.net/gitweb/?a=commitdiff_plain;h=f2e592bfe3e001764b5031e38e102dd914c84b5f;p=gnulib.git regexprops, generated by findutils helper --- diff --git a/doc/README b/doc/README index a50ddda19..551f5390f 100644 --- a/doc/README +++ b/doc/README @@ -1,16 +1 @@ -Sketchy start at information on using files in gnulib. - -for mktemp, need these files (including adding to EXTRA_DIST): -$GNULIBSRC/m4/check-decl.m4 m4 -$GNULIBSRC/m4/mkstemp.m4 m4 -$GNULIBSRC/m4/prereq.m4 m4 -$GNULIBSRC/lib/tempname.c lib -$GNULIBSRC/lib/mkstemp.c lib - -then in configure.ac: -AC_DEFUN([texinfo_MACROS], -[ - AC_REQUIRE([gl_PREREQ_TEMPNAME])dnl for mkstemp - AC_REQUIRE([UTILS_FUNC_MKSTEMP]) -])dnl -texinfo_MACROS +regexprops-generic.texi is generated via a utility in findutils. diff --git a/doc/gnulib.texi b/doc/gnulib.texi index 8e434a24f..cc826994d 100644 --- a/doc/gnulib.texi +++ b/doc/gnulib.texi @@ -1,5 +1,5 @@ \input texinfo @c -*-texinfo-*- -@comment $Id: gnulib.texi,v 1.13 2005-07-16 19:41:33 jas Exp $ +@comment $Id: gnulib.texi,v 1.14 2005-07-27 00:16:01 karl Exp $ @comment %**start of header @setfilename gnulib.info @settitle GNU Gnulib @@ -7,7 +7,7 @@ @syncodeindex pg cp @comment %**end of header -@set UPDATED $Date: 2005-07-16 19:41:33 $ +@set UPDATED $Date: 2005-07-27 00:16:01 $ @copying This manual is for GNU Gnulib (updated @value{UPDATED}), @@ -87,6 +87,7 @@ Getting started: * inet_ntoa:: * Out of memory handling:: * Library version handling:: +* Regular expressions:: @end menu @@ -346,6 +347,17 @@ Typical uses look like: @end example +@node Regular expressions +@section Regular expressions + +Gnulib supports many different types of regular expressions; although +the underlying features are similar or identical, the syntax used +varies. 
The descriptions given here for the different types are +generated automatically. + +@include regexprops-generic.texi + + @node Invoking gnulib-tool @chapter Invoking gnulib-tool diff --git a/doc/regexprops-generic.texi b/doc/regexprops-generic.texi new file mode 100644 index 000000000..cad909b05 --- /dev/null +++ b/doc/regexprops-generic.texi @@ -0,0 +1,703 @@ +@menu +* awk regular expression syntax:: +* egrep regular expression syntax:: +* ed regular expression syntax:: +* emacs regular expression syntax:: +* gnu-awk regular expression syntax:: +* grep regular expression syntax:: +* posix-awk regular expression syntax:: +* posix-basic regular expression syntax:: +* posix-egrep regular expression syntax:: +* posix-extended regular expression syntax:: +* posix-minimal-basic regular expression syntax:: +* sed regular expression syntax:: +@end menu + +@node awk regular expression syntax +@subsection @samp{awk} regular expression syntax + + +The character @samp{.} matches any single character except the null character. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. + +GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + +Grouping is performed with parentheses @samp{()}. 
An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{(} +@item After the alternation operator @samp{|} + +@end enumerate + + + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node egrep regular expression syntax +@subsection @samp{egrep} regular expression syntax + + +The character @samp{.} matches any single character except newline. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} do not ever match newline. 
+ +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with parentheses @samp{()}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + +The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. + + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node ed regular expression syntax +@subsection @samp{ed} regular expression syntax + + +The character @samp{.} matches any single character except the null character. + + +@table @samp + +@item \+ +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item \? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. +@end table + + +Bracket expressions are used to match ranges of characters. 
Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. + +The alternation operator is @samp{\|}. 
+ +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate + +@item +At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by +@samp{\)} +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} +@item After the alternation operator @samp{\|} + +@end enumerate + + +Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node emacs regular expression syntax +@subsection @samp{emacs} regular expression syntax + + +The character @samp{.} matches any single character except newline. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+}. +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. 
+ +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. + +The alternation operator is @samp{\|}. + +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate + +@item +At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by +@samp{\)} +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} +@item After the alternation operator @samp{\|} + +@end enumerate + + + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. 
+ + +@node gnu-awk regular expression syntax +@subsection @samp{gnu-awk} regular expression syntax + + +The character @samp{.} matches any single character. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. 
+ +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{(} +@item After the alternation operator @samp{|} + +@end enumerate + + + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node grep regular expression syntax +@subsection @samp{grep} regular expression syntax + + +The character @samp{.} matches any single character except newline. + + +@table @samp + +@item \+ +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item \? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} do not ever match newline. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. 
A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. + +The alternation operator is @samp{\|}. + +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate + +@item +At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} + +@item After a newline + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by +@samp{\)} +@item Before a newline + +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} +@item After a newline + +@item After the alternation operator @samp{\|} + +@end enumerate + + +Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node posix-awk regular expression syntax +@subsection @samp{posix-awk} regular expression syntax + + +The character @samp{.} matches any single character except the null character. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+}. +@item \? 
+matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + +GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are illegal: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{(} +@item After the alternation operator @samp{|} + +@end enumerate + + +Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. 
+ + +@node posix-basic regular expression syntax +@subsection @samp{posix-basic} regular expression syntax +This is a synonym for ed. +@node posix-egrep regular expression syntax +@subsection @samp{posix-egrep} regular expression syntax + + +The character @samp{.} matches any single character except newline. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. Non-matching lists @samp{[^@dots{}]} do not ever match newline. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with parentheses @samp{()}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. + +The alternation operator is @samp{|}. 
+ +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + +The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. + +Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node posix-extended regular expression syntax +@subsection @samp{posix-extended} regular expression syntax + + +The character @samp{.} matches any single character except the null character. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+}. +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. 
+ +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are illegal: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by +@samp{(} +@item After the alternation operator @samp{|} + +@end enumerate + + +Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node posix-minimal-basic regular expression syntax +@subsection @samp{posix-minimal-basic} regular expression syntax + + +The character @samp{.} matches any single character except the null character. 
+ + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + + +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. + + + +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate + +@item +At the beginning of a regular expression + +@item After an open-group, signified by +@samp{\(} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by +@samp{\)} +@end enumerate + + + + +Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. 
+ + +@node sed regular expression syntax +@subsection @samp{sed} regular expression syntax +This is a synonym for ed. \ No newline at end of file
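
Editorial illustration (not part of the commit above): the descriptions in regexprops-generic.texi differ mainly in whether grouping, alternation and repetition operators carry a backslash. The Python sketch below transcribes those spellings for three of the documented syntaxes and builds the same logical pattern in each. The table name SPELLINGS, its field names and the helper one_or_more_of are hypothetical names chosen for this example; they are not part of gnulib or findutils, and the code only assembles pattern strings without running any regex engine.

```python
# Operator spellings transcribed from the syntax descriptions in the patch.
# The dictionary keys and field names are illustrative only; they are not
# an API of gnulib or findutils.
SPELLINGS = {
    "posix-extended": {"open": "(", "close": ")", "alt": "|", "plus": "+"},
    "emacs": {"open": r"\(", "close": r"\)", "alt": r"\|", "plus": "+"},
    "ed": {"open": r"\(", "close": r"\)", "alt": r"\|", "plus": r"\+"},
}


def one_or_more_of(syntax, words):
    """Spell "one or more occurrences of any of WORDS" in the given syntax.

    Assumes every word is purely alphanumeric, so no quoting is needed.
    """
    if not words or not all(w.isalnum() for w in words):
        raise ValueError("words must be non-empty alphanumeric strings")
    s = SPELLINGS[syntax]
    # group-open, alternatives joined by the alternation operator,
    # group-close, then the one-or-more operator
    return s["open"] + s["alt"].join(words) + s["close"] + s["plus"]


for name in SPELLINGS:
    print(name, one_or_more_of(name, ["ab", "cd"]))
```

For the words ab and cd this yields (ab|cd)+ for posix-extended, \(ab\|cd\)+ for emacs and \(ab\|cd\)\+ for ed. Since the patch documents posix-basic and sed as synonyms for ed, the ed spelling would apply to them as well.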