regexp-opt

Homepage: https://www.gnu.org/software/emacs

Author: Simon Marshall

Summary

Generate efficient regexps to match strings

Commentary

The "opt" in "regexp-opt" stands for "optim\\(al\\|i[sz]e\\)".

This package generates a regexp from a given list of strings (which matches
one of those strings) so that the regexp generated by:

(regexp-opt strings)

is equivalent to, but more efficient than, the regexp generated by:

(mapconcat 'regexp-quote strings "\\|")

For example:

(let ((strings '("cond" "if" "when" "unless" "while"
		    "let" "let*" "progn" "prog1" "prog2"
		    "save-restriction" "save-excursion" "save-window-excursion"
		    "save-current-buffer" "save-match-data"
		    "catch" "throw" "unwind-protect" "condition-case")))
  (concat "(" (regexp-opt strings t) "\\>"))
 => "(\\(c\\(atch\\|ond\\(ition-case\\)?\\)\\|if\\|let\\*?\\|prog[12n]\\|save-\\(current-buffer\\|excursion\\|match-data\\|restriction\\|window-excursion\\)\\|throw\\|un\\(less\\|wind-protect\\)\\|wh\\(en\\|ile\\)\\)\\>"

Searching using the above example `regexp-opt' regexp takes approximately
two-thirds of the time taken using the equivalent `mapconcat' regexp.

Since this package was written to produce efficient regexps, not regexps
efficiently, it is probably not a good idea to in-line too many calls in
your code, unless you use the following trick with `eval-when-compile':

(defvar definition-regexp
  (eval-when-compile
    (concat "^("
            (regexp-opt '("defun" "defsubst" "defmacro" "defalias"
                          "defvar" "defconst") t)
            "\\>")))

The `byte-compile' code will be as if you had defined the variable thus:

(defvar definition-regexp
  "^(\\(def\\(alias\\|const\\|macro\\|subst\\|un\\|var\\)\\)\\>")

Note that if you use this trick for all instances of `regexp-opt' and
`regexp-opt-depth' in your code, regexp-opt.el would only have to be loaded
at compile time.  But note also that using this trick means that should
regexp-opt.el be changed, perhaps to fix a bug or to add a feature to
improve the efficiency of `regexp-opt' regexps, you would have to recompile
your code for such changes to have effect in your code.

Originally written for font-lock.el, from an idea from Stig's hl319.el, with
thanks for ideas also to Michael Ernst, Bob Glickstein, Dan Nicolaescu and
Stefan Monnier.
No doubt `regexp-opt' doesn't always produce optimal regexps, so code, ideas
or any other information to improve things are welcome.

One possible improvement would be to compile '("aa" "ab" "ba" "bb")
into "[ab][ab]" rather than "a[ab]\\|b[ab]".  I'm not sure it's worth
it but if someone knows how to do it without going through too many
contortions, I'm all ears.

Reverse dependencies