Homepage: http://stephe-leake.org/ada/wisitoken.html
Author: Stephen Leake
Updated:
Utilities for implementing an indentation/navigation engine using a generalized LR parser
History: see NEWS-wisi.text Design: 'wisi' was originally short for "wisent indentation engine", but now is just a name. wisi was developed to support Emacs ada-mode 5.0 indentation, font-lock, and navigation, which are parser based. The approach to indenting a given token is to parse the buffer, computing a delta indent for each token in a grammar production in a post-parse action. Other post-parse actions cache face and navigation information as text properties on tokens. The three reasons to run the parser (indent, face, navigate) occur at different times (user indent, font-lock, user navigate), so only the relevant parser actions are run. Parsing can be noticeably slow in large files, so sometimes we do a partial or incremental parse. We keep a list of regions where the post-parse actions have been run and the results are valid, to determine what needs to be parsed or updated. Since we have a cache (the text properties), we need to consider when to invalidate it. Ideally, we invalidate only when a change to the buffer would change the result of a parse or post-parse action that crosses that change, or starts after that change. Changes in whitespace (indentation and newlines) do not affect an Ada parse. Other languages are sensitive to newlines (Bash for example) or indentation (Python). Adding comments does not change a parse, unless code is commented out. For navigate, we expect fully accurate results, and can tolerate one initial delay, so we always parse the entire file, and then use incremental parse for updates. For font-lock, we only parse the portion of the file requested by font-lock, using the list of parsed regions, and edit that list when the buffer is changed. For indenting, we expect fast results, and can tolerate some inaccuracy until the editing is done, so we allow partial parse. We cache the indent for each line in a text property on the newline char preceding the line. `wisi-indent-region' sets the cache on all the lines computed (part of the buffer in large files), but performs the indent only on the lines in the indent region. Subsequent calls to `wisi-indent-region' apply the cached indents. Non-whitespace edits to the buffer invalidate the indent caches in the edited region and after. Since we can do partial parse, we keep a list of parsed regions. See `wisi--post-change' for the details of what we check for invalidating. Choice of grammar compiler and parser There are two other parsing engines available in Emacs: - SMIE We don't use this because it is designed to parse small snippets of code. For Ada (and some other languages) indentation, we always need to parse the entire buffer. - semantic The Ada grammar as given in the Ada language reference manual is not LALR(1). So we use a generalized parser. In addition, the semantic lexer is more complex, and gives different information than we need. Finally, the semantic parser does not support error correction, and thus fails in most editing situations. We use the WisiToken tool wisi-bnf-generate to compile BNF or EBNF to Ada source, WisiToken provides a generalized LR parser, with robust error correction and incremental or partial parse. See ada-mode.info and wisi.info for more information on the developer tools used for wisi.