Homepage: https://www.gnu.org/software/emacs
Author: Simon Marshall
Generate efficient regexps to match strings
The "opt" in "regexp-opt" stands for "optim\\(al\\|i[sz]e\\)". This package generates a regexp from a given list of strings (which matches one of those strings) so that the regexp generated by: (regexp-opt strings) is equivalent to, but more efficient than, the regexp generated by: (mapconcat 'regexp-quote strings "\\|") For example: (let ((strings '("cond" "if" "when" "unless" "while" "let" "let*" "progn" "prog1" "prog2" "save-restriction" "save-excursion" "save-window-excursion" "save-current-buffer" "save-match-data" "catch" "throw" "unwind-protect" "condition-case"))) (concat "(" (regexp-opt strings t) "\\>")) => "(\\(c\\(atch\\|ond\\(ition-case\\)?\\)\\|if\\|let\\*?\\|prog[12n]\\|save-\\(current-buffer\\|excursion\\|match-data\\|restriction\\|window-excursion\\)\\|throw\\|un\\(less\\|wind-protect\\)\\|wh\\(en\\|ile\\)\\)\\>" Searching using the above example `regexp-opt' regexp takes approximately two-thirds of the time taken using the equivalent `mapconcat' regexp. Since this package was written to produce efficient regexps, not regexps efficiently, it is probably not a good idea to in-line too many calls in your code, unless you use the following trick with `eval-when-compile': (defvar definition-regexp (eval-when-compile (concat "^(" (regexp-opt '("defun" "defsubst" "defmacro" "defalias" "defvar" "defconst") t) "\\>"))) The `byte-compile' code will be as if you had defined the variable thus: (defvar definition-regexp "^(\\(def\\(alias\\|const\\|macro\\|subst\\|un\\|var\\)\\)\\>") Note that if you use this trick for all instances of `regexp-opt' and `regexp-opt-depth' in your code, regexp-opt.el would only have to be loaded at compile time. But note also that using this trick means that should regexp-opt.el be changed, perhaps to fix a bug or to add a feature to improve the efficiency of `regexp-opt' regexps, you would have to recompile your code for such changes to have effect in your code. Originally written for font-lock.el, from an idea from Stig's hl319.el, with thanks for ideas also to Michael Ernst, Bob Glickstein, Dan Nicolaescu and Stefan Monnier. No doubt `regexp-opt' doesn't always produce optimal regexps, so code, ideas or any other information to improve things are welcome. One possible improvement would be to compile '("aa" "ab" "ba" "bb") into "[ab][ab]" rather than "a[ab]\\|b[ab]". I'm not sure it's worth it but if someone knows how to do it without going through too many contortions, I'm all ears.