mandoc_escape - parse roff escape sequences
mandoc_escape(const char **end, const char **start, int *sz);
This function scans a roff(7) escape sequence.
An escape sequence consists of
- an initial backslash character,
- a single ASCII character called the escape sequence identifier,
- and, with only a few exceptions, an argument.
Arguments can be given in the following forms; some escape sequence identifiers
only accept some of these forms as specified below. The first three forms are
called the standard forms.
- In brackets:
  - The argument starts after the initial ‘[’, ends before the final ‘]’,
    and the escape sequence ends with the final ‘]’.
- Two-character argument short form:
  - This form can only be used for arguments consisting of exactly two
    characters. It has the same effect as the bracketed form with the same
    argument.
- One-character argument short form:
  - This form can only be used for arguments consisting of exactly one
    character. It has the same effect as the bracketed form with the same
    argument.
- Delimited form:
  - The argument starts after the initial delimiter character C, ends before
    the next occurrence of the delimiter character C, and the escape sequence
    ends with that second C. Some escape sequences allow arbitrary characters
    C as quoting characters, some restrict the range of characters that can
    be used as quoting characters.
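The argument forms listed above can be sketched in C as follows. This is an illustration only, not the code in mandoc.c: the helper names parse_standard_arg() and parse_delimited_arg() are hypothetical, the first ‘]’ found is treated as the closing bracket, and no restriction on delimiter characters is applied.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: locate an argument given in
 * one of the three standard forms.  p points just past the escape
 * sequence identifier.  On success, *arg and *len describe the argument
 * and the return value points past the end of the escape sequence.
 * Returns NULL if the input ends before the argument is complete.
 */
static const char *
parse_standard_arg(const char *p, const char **arg, size_t *len)
{
	const char *close;

	switch (*p) {
	case '\0':
		return NULL;
	case '[':			/* bracketed form */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*arg = p + 1;
		*len = (size_t)(close - (p + 1));
		return close + 1;
	case '(':			/* two-character short form */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*arg = p + 1;
		*len = 2;
		return p + 3;
	default:			/* one-character short form */
		*arg = p;
		*len = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper for the delimited form: the character at p is the
 * delimiter C, and the argument runs up to the next occurrence of C.
 */
static const char *
parse_delimited_arg(const char *p, const char **arg, size_t *len)
{
	const char *close;

	if (*p == '\0')
		return NULL;
	close = strchr(p + 1, *p);
	if (close == NULL)
		return NULL;
	*arg = p + 1;
	*len = (size_t)(close - (p + 1));
	return close + 1;
}
```

In this sketch, both helpers return a pointer to the first character after the escape sequence, so a caller can resume scanning from there.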
Upon function entry, end is expected to point to the escape sequence
identifier. The values passed in as start and sz are ignored and overwritten.
By design, this function cannot handle those escape sequences that require
in-place expansion, in particular user-defined strings, number registers (\n),
and numerical expression control. These are handled by roff_res(), a private
preprocessor function called from roff_parseln(); see the file roff.c.
The function mandoc_escape() is used
- recursively by itself, because some escape sequence arguments can in turn
  contain other escape sequences,
- for error detection internally by the roff(7) parser part of the mandoc(3)
  library,
- above all externally by the mandoc formatting modules, in particular -Tascii
  and -Thtml, for formatting purposes; see, among others, the file term.c,
- and rarely externally by high-level utilities using the mandoc library, for
  example makewhatis(8), to purge escape sequences from text.
Upon function return, the pointer end is set to the character after the end of
the escape sequence, such that the calling higher-level parser can easily
continue.
For escape sequences taking an argument, the pointer start is set to the
beginning of the argument and sz is set to the length of the argument. For
escape sequences not taking an argument, start is set to the character after
the end of the sequence and sz is set to 0. Both start and sz may be NULL; in
that case, the argument and the length are not returned.
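The pointer contract described above can be illustrated with a small caller that purges escape sequences from a string. Since the real function needs the mandoc library, the sketch uses a hypothetical stand-in, toy_escape(), that follows the same end, start, and sz convention but only understands standard-form arguments, treats every identifier as taking one, and returns 0 or -1 rather than the values documented below. The name strip_escapes() is hypothetical as well.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for illustration only.  On entry, *end points to
 * the escape sequence identifier.  On success, *end points past the
 * sequence and *start and *sz describe the argument.  Returns 0 on
 * success and -1 if the line ends inside the sequence, in which case
 * *end is left unchanged.
 */
static int
toy_escape(const char **end, const char **start, int *sz)
{
	const char *id = *end;
	const char *p, *close;

	if (*id == '\0')
		return -1;
	p = id + 1;			/* first character after the identifier */
	switch (*p) {
	case '\0':
		return -1;
	case '[':			/* bracketed form */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(close - *start);
		*end = close + 1;
		break;
	case '(':			/* two-character short form */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		break;
	default:			/* one-character short form */
		*start = p;
		*sz = 1;
		*end = p + 1;
		break;
	}
	return 0;
}

/*
 * Copy src to dst while dropping escape sequences.  dst must provide
 * room for at least strlen(src) + 1 bytes.
 */
static void
strip_escapes(char *dst, const char *src)
{
	const char *start;
	int sz;

	while (*src != '\0') {
		if (*src != '\\') {
			*dst++ = *src++;
			continue;
		}
		src++;			/* now at the identifier, as required */
		if (toy_escape(&src, &start, &sz) == -1)
			break;		/* incomplete sequence: drop the rest */
	}
	*dst = '\0';
}
```

Because the scanner advances the caller's pointer past the sequence, the loop needs no bookkeeping of its own beyond skipping the backslash.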
For sequences taking an argument, the function mandoc_escape()
returns one of the following values:
- The escape sequence \f taking an argument
in standard form: \f[, \f(,
\fa. Two-character arguments
starting with the character ‘C’ are reduced to one-character
arguments by skipping the ‘C’. More specific values are
returned for the most commonly used arguments:
- The escape sequence \C taking an argument
delimited with the single quote character and, as a special exception, the
escape sequences not having an identifier, that is,
those where the argument, in standard form, directly follows the initial
backslash: \C', \[,
\(, \a. Note
that the one-character argument short form can only be used for argument
characters that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under
ESCAPE_UNICODE, that value is returned instead.
Such ESCAPE_SPECIAL special character escape sequences can be rendered using
the functions mchars_spec2cp() and mchars_spec2str().
- Escape sequences of the same format as described above under
ESCAPE_SPECIAL, but with an argument that consists of the letter u followed by
a pattern of hexadecimal digits X and Y in which Y is not zero: \C'u,
\[u. As a special exception, start
is set to the character after the u, and the
sz return value does not include the u.
Such Unicode character escape sequences can be rendered using the function
mchars_num2uc().
- The escape sequence \N followed by a
delimited argument. The delimiter character is arbitrary except that
digits cannot be used. If a digit is encountered instead of the opening
delimiter, that digit is considered to be the argument and the end of the
sequence, and ESCAPE_IGNORE is returned.
Such ASCII character escape sequences can be rendered using the function
mchars_num2char().
- The escape sequence \o followed by an
argument delimited by an arbitrary character.
- The escape sequence \s followed by
an argument in standard form or by an argument delimited by the single
quote character: \s', \s[,
\s(, \sa. As
a special exception, an optional ‘+’ or ‘-’
character is allowed after the ‘s’ for all forms.
- The escape sequences \F,
\g, \k, \M,
\m, \n, \V, and
\Y followed by an argument in standard form.
- The escape sequences \A,
\b, \D, \R,
\X, and \Z followed by an argument
delimited by an arbitrary character.
- The escape sequences \H,
\h, \L, \l,
\S, \v, and \x
followed by an argument delimited by a character that cannot occur in
numerical expressions. However, if any character that can occur in
numerical expressions is found instead of a delimiter, the sequence is
considered to end with that character, and
ESCAPE_ERROR is returned.
- Escape sequences taking an argument but not matching any of
the above patterns. In particular, that happens if the end of the logical
input line is reached before the end of the argument.
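The three groups of ignored sequences listed above differ only in how their argument is delimited. A minimal sketch of that grouping follows; the enum arg_style and the function classify_identifier() are hypothetical names chosen for illustration and are not mandoc's own, and identifiers outside those three groups are deliberately left unclassified.

```c
/* Hypothetical names, not mandoc's: how an identifier's argument is delimited. */
enum arg_style {
	ARG_STANDARD,		/* bracketed, two-character, or one-character form */
	ARG_DELIM_ANY,		/* delimited by an arbitrary character */
	ARG_DELIM_NONNUM,	/* delimiter cannot occur in numerical expressions */
	ARG_OTHER		/* not covered by this sketch */
};

/* Map an escape sequence identifier to the argument style listed above. */
static enum arg_style
classify_identifier(char id)
{
	switch (id) {
	case 'F': case 'g': case 'k': case 'M':
	case 'm': case 'n': case 'V': case 'Y':
		return ARG_STANDARD;
	case 'A': case 'b': case 'D': case 'R':
	case 'X': case 'Z':
		return ARG_DELIM_ANY;
	case 'H': case 'h': case 'L': case 'l':
	case 'S': case 'v': case 'x':
		return ARG_DELIM_NONNUM;
	default:
		return ARG_OTHER;
	}
}
```

A table-driven switch like this keeps the per-identifier rules in one place, which matters because upper- and lower-case identifiers such as \M and \m or \H and \h are distinct sequences.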
For sequences that do not take an argument, the function
mandoc_escape() returns one of the following values:
- The escape sequence “\z”.
- The escape sequence “\c”.
- The escape sequences “\d” and
This function is implemented in mandoc.c.
This function has been available since mandoc 1.11.2.
The function doesn't cleanly distinguish between sequences that are valid and
supported, valid and ignored, valid and unsupported, syntactically invalid, or
undefined. For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems and/or
loss of document content. The function is already rather complicated and still
parses some sequences incorrectly.