wisi

Homepage: https://stephe-leake.org/ada/wisitoken.html

Author: Stephen Leake

Updated:

Summary

Utilities for implementing an indentation/navigation engine using a generalized LR parser

Commentary

History: see NEWS

Design:

'wisi' was originally short for "wisent indentation engine", but
now is just a name. wisi was developed to support Emacs ada-mode
5.0 indentation, font-lock, and navigation, which are parser based.

The approach to indenting a given token is to parse the buffer,
computing a delta indent for each token in a grammar production in
a post-parse action.

Other post-parse actions cache face and navigation information as
text properties on tokens.

The three reasons to run the parser (indent, face, navigate) occur
at different times (user indent, font-lock, user navigate), so only
the relevant parser actions are run.

Parsing can be noticeably slow in large files, so sometimes we do a
partial or incremental parse. We keep a list of regions where the
post-parse actions have been run and the results are valid, to
determine what needs to be parsed or updated.

Since we have a cache (the text properties), we need to consider
when to invalidate it.  Ideally, we invalidate only when a change
to the buffer would change the result of a parse or post-parse
action that crosses that change, or starts after that change.
Changes in whitespace (indentation and newlines) do not affect an
Ada parse.  Other languages are sensitive to newlines (Bash for
example) or indentation (Python).  Adding comments does not change
a parse, unless code is commented out.

For navigate, we expect fully accurate results, and can tolerate
one initial delay, so we always parse the entire file, and then use
incremental parse for updates.

For font-lock, we only parse the portion of the file requested by
font-lock, using the list of parsed regions, and edit that list
when the buffer is changed.

For indenting, we expect fast results, and can tolerate some
inaccuracy until the editing is done, so we allow partial parse. We
cache the indent for each line in a text property on the newline
char preceding the line. `wisi-indent-region' sets the cache on all
the lines computed (part of the buffer in large files), but
performs the indent only on the lines in the indent
region. Subsequent calls to `wisi-indent-region' apply the cached
indents. Non-whitespace edits to the buffer invalidate the indent
caches in the edited region and after. Since we can do partial
parse, we keep a list of parsed regions.

See `wisi--post-change' for the details of what we check for
invalidating.

Choice of grammar compiler and parser

There are two other parsing engines available in Emacs:

- SMIE

  We don't use this because it is designed to parse small snippets
  of code. For Ada (and some other languages) indentation, we
  always need to parse the entire buffer.

- semantic

  The Ada grammar as given in the Ada language reference manual is
  not LALR(1). So we use a generalized parser. In addition, the
  semantic lexer is more complex, and gives different information
  than we need. Finally, the semantic parser does not support error
  correction, and thus fails in most editing situations.

We use the WisiToken tool wisi-bnf-generate to compile BNF or EBNF
to Ada source, WisiToken provides a generalized LR parser, with
robust error correction and incremental or partial parse. See
ada-mode.info and wisi.info for more information on the developer
tools used for wisi.

Dependencies

Reverse dependencies