Documentation

Commentary

Tree-sitter Language Versions

markdown-ts-mode has been tested with the following grammars and versions:
- tree-sitter-markdown: v0.4.1
- tree-sitter-markdown-inline: v0.4.1

We try our best to make built-in modes work with the latest grammar
versions, so a more recent grammar has a good chance of working.  Send
us a bug report if it doesn't.

Bidirectional Text Considerations

Text with markup may need an extra newline before bidirectional text
for it to show correctly.  This is a limitation in the Emacs display
engine.

Code Block Language Mode Considerations

Fenced code block language modes are derived from the table
`markdown-ts-code-block-modes' and heuristics adding "-ts-mode" and
"-mode" to the language name.  If your language's mode is not
properly recognized, add it to `markdown-ts-code-block-modes', which
see.
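
The lookup order described above can be sketched roughly as follows.
This is only an illustration: `my/md-code-block-modes' and
`my/md-resolve-mode' are hypothetical names, the alist format is an
assumption (see the docstring of `markdown-ts-code-block-modes' for
the real entry format), and trying "-ts-mode" before "-mode" simply
follows the order in which the text lists them.

```elisp
(require 'seq)

;; Hypothetical stand-in for the real table.  The actual entry format
;; of `markdown-ts-code-block-modes' may differ.
(defvar my/md-code-block-modes
  '(("sh" . bash-ts-mode)
    ("elisp" . emacs-lisp-mode))
  "Illustrative alist mapping fence language names to major modes.")

(defun my/md-resolve-mode (lang)
  "Return a major mode symbol for fence language LANG, or nil.
Consult the explicit table first, then fall back to the
LANG-ts-mode and LANG-mode naming heuristics."
  (or (cdr (assoc lang my/md-code-block-modes))
      ;; Keep the first candidate that names a defined function.
      (seq-find #'fboundp
                (list (intern (concat lang "-ts-mode"))
                      (intern (concat lang "-mode"))))))
```

Under these assumptions, (my/md-resolve-mode "sh") returns
`bash-ts-mode' from the table, while a language missing from the
table resolves through the name heuristics or yields nil.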

For each `markdown-ts-mode' buffer, each language's major mode is
enabled once, in a temporary buffer, to extract its default font-lock
and tree-sitter settings.  In conventional non-treesit code blocks,
the major mode is enabled each time the block is fontified.

NOTE: Major mode hooks are not run, so any font-lock, treesit, indent,
comment recognition, etc. customizations that you might have in your
hooks are not applied.  As a result, your code blocks might look
different in `markdown-ts-mode' than in native major mode buffers.
Not running hooks avoids the cost of each mode hook and avoids
potential recursive treesit callback issues.

Code Block Commands

Some common commands operate within a code block, in that block's
mode.  They run in an indirect buffer, so some commands may behave
slightly differently from the same commands invoked in a native mode
buffer.  `indent-for-tab-command' may invoke `completion-at-point'
under the covers and may return results from the surrounding Markdown
buffer instead of the code block's context.  You can invoke
`completion-at-point' directly by using its key binding, `C-M-i'.

Pipe Tables

These are a GitHub Flavored Markdown (GFM) extension and are a de
facto standard given their popularity and implementations across
products.  They might as well be folded into the CommonMark standard.

A pipe table is a rectangle of text bounded on the left and right by
pipe symbols, with pipes also separating the columns.  It has a header
row immediately followed by a header alignment row, both of which must
contain the same number of cells.  Body rows are optional.  A table
ends with a blank line or at the start of a new Markdown block
element.

Each cell in the header alignment row must contain at least three
hyphens, which by themselves indicate default alignment (i.e.,
whatever a renderer chooses).  To indicate left alignment, prefix the
hyphens with a colon, like this: :---.  For center alignment, prefix
and suffix with colons, like this: :---:.  For right alignment,
suffix with a colon, like this: ---:.
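
As a minimal sketch of the alignment markers described above, the
following helper classifies a single alignment-row cell.  The name
`my/md-cell-alignment' is hypothetical and is not part of
`markdown-ts-mode'; the three-hyphen minimum follows the text above.

```elisp
(require 'subr-x)

(defun my/md-cell-alignment (cell)
  "Return the alignment symbol for alignment-row CELL.
CELL is the text between two pipes, e.g. \" :---: \".
The result is one of `left', `center', `right' or `default'."
  (let* ((s (string-trim cell))
         (left (string-prefix-p ":" s))
         (right (string-suffix-p ":" s)))
    ;; Optional colon, three or more hyphens, optional colon.
    (unless (string-match-p "\\`:?-\\{3,\\}:?\\'" s)
      (error "Not a valid alignment cell: %S" cell))
    (cond ((and left right) 'center)
          (left 'left)
          (right 'right)
          (t 'default))))

;; (mapcar #'my/md-cell-alignment '(":---" ":---:" "---:" "---"))
;; => (left center right default)
```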

Each table header or body cell can contain the usual Markdown inline
style indicators.  A cell cannot contain a block element such as a
heading, thematic break, block quote, or fenced code block.

Table cells do not need to align visually; i.e., the pipe symbols do
not need to line up vertically.  The tree-sitter parser and renderers
detect cell boundaries using pipe symbols, not their relative alignment.

If you want to include what looks like code, you can use backticks to
wrap such text, as in `this is code`.  This mode will not fontify such
code.

If you want to include a pipe symbol in a cell, escape it with a
backslash: \|.

Technically, body rows do not need to contain the same number of
cells as the header has columns and rows do not need to share the
same number of cells among themselves.  Note: Many renderers get
confused if the table is "ragged" with an uneven number of columns
among rows.  Some renderers will insert empty cells on a row that
contains fewer cells than the header has columns.  Some elide cells
that exceed the number of columns in the header.

Pipe Table Recommendations

- Always use pipe symbols at the start and end of each table line.
- Use a uniform number of columns spanning the table.
- If you notice a parsing error, which shows up as unexpected
  fontification and is often caused by an empty first cell on the
  final row, try putting some characters in that cell.
- Renderers often exclude certain empty cells such as an "empty" final
  cell in a table.  Follow the next item to avoid this.
- If you need that cell to appear blank and are converting to HTML,
  try using a non-printing HTML entity, such as a non-breaking space
  "&nbsp;", which parses as concrete characters yet renders as blank.
- HTML comments <!-- --> can also be used as a non-blank character
  string that does not get rendered.  These are considered cell text
  and when placed at the end of a row, that row's number of columns is
  increased and might exceed the number of header columns.
- There are tree-sitter parser quirks.  Commands such as
  `markdown-ts-table-delete-column' and `markdown-ts-table-move-column'
  follow the parse tree and can therefore produce unexpected results.
  Following these recommendations should keep table operations
  behaving as expected.
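
To check a table against the first two recommendations, something
like the following sketch could be used.  Both function names are
hypothetical, and the splitter is a simplification that is
independent of the tree-sitter parser: it assumes every line starts
and ends with a pipe and treats a backslash-escaped pipe as cell
text.

```elisp
(require 'subr-x)

(defun my/md-row-cells (line)
  "Split pipe-table LINE into a list of trimmed cell strings.
A pipe preceded by a backslash is kept as cell text.  Assumes
LINE starts and ends with a pipe; text after the last unescaped
pipe is ignored."
  (let ((line (string-trim line))
        (cells nil)
        (start 1)                     ; skip the leading pipe
        (i 1))
    (while (< i (length line))
      (cond ((eq (aref line i) ?\\)   ; skip the escaped character
             (setq i (+ i 2)))
            ((eq (aref line i) ?|)    ; an unescaped pipe closes a cell
             (push (string-trim (substring line start i)) cells)
             (setq start (1+ i)
                   i (1+ i)))
            (t (setq i (1+ i)))))
    (nreverse cells)))

(defun my/md-ragged-rows (lines)
  "Return 1-based indexes of LINES whose cell count differs from the header.
LINES is a list of strings: header row, alignment row, body rows."
  (let ((width (length (my/md-row-cells (car lines))))
        (n 0)
        (bad nil))
    (dolist (line lines (nreverse bad))
      (setq n (1+ n))
      (unless (= (length (my/md-row-cells line)) width)
        (push n bad)))))

;; (my/md-ragged-rows '("|a|b|" "|---|---|" "|1|" "|1|2|3|"))
;; => (3 4)
```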

Tree-sitter Bugs

`markdown-ts-mode' relies on the underlying tree-sitter library in
Emacs (chosen at its build time), and language grammars you have
installed.  There are known and reported bugs which negatively affect
certain features.  This mode should benefit as these bugs are fixed
or worked around.

- The Markdown grammar inserts block_continuation nodes as children
  of code_fence_content, which confuses both the inspector and the
  embedded parser.  This affects code blocks inside block quotes.

- HTML block type 4 (`<!' followed by an uppercase letter, e.g.,
  `<!DOCTYPE html>') causes the parser to consume all subsequent
  content.  Lowercase `<!doctype html>' works as a workaround.
  See <https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/233>.

- Superscript (`^text^') and subscript (`~text~') syntax is not
  supported by the grammar.  No EXTENSION_ build flag exists for
  this.  This is Pandoc / PHP Markdown Extra syntax, not CommonMark
  or GFM.

- Ordered (numbered) lists do not nest by indentation.  Indenting
  a `1.' item under another ordered item does not produce a nested
  list node; the parser either treats it as a flat sibling or
  absorbs it into the parent item's paragraph as a
  block_continuation.  Unordered (`-', `*', `+') lists nest
  correctly.  Demote/promote of ordered list items is therefore
  disabled.

- Renumbering ordered lists (`markdown-ts-renumber-list') may only
  affect items from point downward if the parser splits a single
  list into separate `list' nodes, or may continue numbering across
  two separate lists if the parser merges them into one node.

- Empty lines following an `indented_code_block' may be claimed by
  the parser as continuation lines of that block, rather than being
  treated as blank separators.

- Pipe tables are inconsistently parsed.  Whitespace is correctly
  trimmed at the start of a cell's content but trailing whitespace is
  incorrectly included.  Empty cells and uneven ("ragged") column
  counts among rows can confuse the parser.

  Markdown pipe tables with parsing issues:

  |Column 1|Column 2|Column 3|
  |--------|--------|--------|
  |        |        |        |
  |        |        |        | <-- parse error

  |Column 1|Column 2|Column 3|
  |--------|--------|--------|
  |        |        |        |
  | xxx    |        |        | <-- parsed correctly

  |Column 1|Column 2| <-- 2 columns correct
  |--------|--------| <-- 2 columns correct
           |        | <-- 1 column incorrect
           |        | <-- parse error
  |        |        | <-- not a row after the error, above

  See <https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/241>
      <https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/242>

- The grammar's external scanner has a buffer overflow in its
  `serialize' function: when the parser state exceeds the
  serialization buffer provided by tree-sitter, `memcpy' writes past
  the end and can corrupt the stack.  This has been triggered in
  practice while parsing Markdown in Emacs.
  See <https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/243>.

Batch Fontification

Some downstream packages fontify multiple unrelated Markdown
fragments by joining them in a single buffer with a separator and
running `markdown-ts-mode' over the whole thing.  Choice of separator
matters: the tree-sitter-markdown grammar does not list the NUL byte
(`\0') as extra/whitespace, so a NUL separator yields an ERROR node
and the parser enters error recovery, which can leak inline faces
(e.g., strikethrough) across fragment boundaries.  Regex-based modes
tolerate NUL silently because they do not parse structure; this is a
behavioral difference, not a regression.

A form feed (`^L', `\f') flanked by blank lines works as a clean
drop-in separator: the parser treats it as a paragraph break and
inline state does not bleed across fragments.
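
A downstream package might apply the form-feed separator roughly as
sketched below.  The names `my/md-fragment-separator' and
`my/md-fontify-fragments' are hypothetical, and the sketch assumes
that `markdown-ts-mode' is loadable, that both Markdown grammars are
installed, and that no fragment contains the separator itself.

```elisp
;; Assumes `markdown-ts-mode' and its grammars are available.
(defconst my/md-fragment-separator "\n\n\f\n\n"
  "Form feed flanked by blank lines, used to join fragments.")

(defun my/md-fontify-fragments (fragments)
  "Return the strings in FRAGMENTS fontified as Markdown.
All fragments are joined in one temporary buffer, fontified in a
single pass, and split apart again.  Fragments must not contain
`my/md-fragment-separator'."
  (with-temp-buffer
    (insert (mapconcat #'identity fragments my/md-fragment-separator))
    (markdown-ts-mode)
    ;; Fontify the whole buffer now rather than lazily on display.
    (font-lock-ensure)
    ;; `split-string' keeps the text properties of each piece.
    (split-string (buffer-string)
                  (regexp-quote my/md-fragment-separator))))
```

Joining once keeps the cost of enabling the mode to a single buffer,
at the price of requiring that fragments never contain the separator.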
