Documentation ¶
Overview ¶
Package normalize sanitizes journal Markdown for static site rendering by fixing fences, headings, and list formatting.
Index ¶
- func CollectTurnNumbers(lines []string) []int
- func FindTurnBoundary(lines []string, mask []bool, startIdx int, turnSeq []int, turnNum int, ...) int
- func IsBoilerplateToolOutput(raw []string) bool
- func NextInSequence(sorted []int, n int) int
- func NormalizeContent(content string, fencesVerified bool) string
- func PreBlockMask(lines []string) []bool
- func ProcessTurns(content, roleKey string, ...) string
- func SplitTrailingFooter(body []string) ([]string, []string)
- func StripPreWrapper(body []string) []string
- func TrimBlankLines(lines []string) []string
- func WrapToolOutputs(content string) string
- func WrapUserTurns(content string) string
- type TurnMatch
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CollectTurnNumbers ¶
CollectTurnNumbers extracts all turn numbers from turn headers in the document, returning them sorted and deduplicated. Headers inside <pre> blocks are skipped: they are embedded content from tool outputs that read other journal files.
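The collection step can be sketched as follows. This is a minimal illustration, not the package's implementation: the lowercase collectTurnNumbers helper is hypothetical, the header pattern ("### 42. Assistant") is assumed from the examples elsewhere on this page, and the <pre> tracking is deliberately simplified.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// turnHeader is an ASSUMED header shape ("### 42. Assistant ...");
// the package's real pattern may differ.
var turnHeader = regexp.MustCompile(`^###\s+(\d+)\.\s`)

// collectTurnNumbers is a hypothetical stand-in for CollectTurnNumbers.
// It returns the sorted, deduplicated turn numbers found outside <pre> blocks.
func collectTurnNumbers(lines []string) []int {
	seen := map[int]bool{}
	var nums []int
	inPre := false
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "<pre") {
			inPre = true
		}
		if !inPre {
			if m := turnHeader.FindStringSubmatch(trimmed); m != nil {
				n, err := strconv.Atoi(m[1])
				if err == nil && !seen[n] {
					seen[n] = true
					nums = append(nums, n)
				}
			}
		}
		if strings.Contains(trimmed, "</pre>") {
			inPre = false
		}
	}
	sort.Ints(nums)
	return nums
}

func main() {
	lines := []string{
		"### 2. Assistant",
		"<pre><code>",
		"### 802. Assistant", // embedded header, skipped
		"</code></pre>",
		"### 1. User",
		"### 2. Assistant", // duplicate, counted once
	}
	fmt.Println(collectTurnNumbers(lines)) // [1 2]
}
```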
func FindTurnBoundary ¶
func FindTurnBoundary( lines []string, mask []bool, startIdx int, turnSeq []int, turnNum int, turnTime string, ) int
FindTurnBoundary scans lines from startIdx to find where the current turn's body ends: the last header that carries the expected next turn number (the smallest entry in turnSeq greater than turnNum), is not inside a pre block, and has a timestamp >= turnTime.
Parameters:
- lines: All lines in the document
- mask: Pre-block mask (true = inside pre)
- startIdx: Index to start scanning from
- turnSeq: Sorted sequence of all turn numbers in the document
- turnNum: Current turn number
- turnTime: Current turn timestamp (for ordering)
Returns:
- int: Index of the boundary line (or len(lines) if EOF)
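A simplified sketch of the boundary scan, under stated assumptions: the findTurnBoundary helper below is hypothetical, it omits the turnTime comparison entirely (the timestamp format is not documented here), and the header pattern is assumed. It shows only the "next number in sequence, last match wins" part of the search.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// ASSUMED header shape; the real pattern may differ.
var turnHeader = regexp.MustCompile(`^###\s+(\d+)\.\s`)

// findTurnBoundary is a hypothetical, timestamp-free variant of
// FindTurnBoundary. It returns the index of the last unmasked header whose
// number is the smallest entry in turnSeq greater than turnNum, or
// len(lines) when no such header exists.
func findTurnBoundary(lines []string, mask []bool, startIdx int, turnSeq []int, turnNum int) int {
	i := sort.Search(len(turnSeq), func(i int) bool { return turnSeq[i] > turnNum })
	if i == len(turnSeq) {
		return len(lines) // current turn is the last one
	}
	want := turnSeq[i]
	boundary := len(lines)
	for idx := startIdx; idx < len(lines); idx++ {
		if mask[idx] {
			continue // inside a pre block
		}
		m := turnHeader.FindStringSubmatch(lines[idx])
		if m == nil {
			continue
		}
		if n, err := strconv.Atoi(m[1]); err == nil && n == want {
			boundary = idx // keep scanning: the last match wins
		}
	}
	return boundary
}

func main() {
	lines := []string{
		"### 41. Tool Output",
		"### 802. Assistant", // embedded header from another journal file
		"quoted text",
		"### 42. Assistant", // the real next turn
		"reply",
	}
	mask := make([]bool, len(lines))
	fmt.Println(findTurnBoundary(lines, mask, 1, []int{41, 42, 802}, 41)) // 3
}
```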
func IsBoilerplateToolOutput ¶
IsBoilerplateToolOutput returns true if the tool output body contains only empty lines or low-value confirmation messages that add no information to the rendered journal page. When it returns true, both the Tool Output header and the body are dropped.
Detected patterns:
- Empty body (no non-blank lines)
- "No matches found" (grep/glob with zero results)
- Edit confirmations ("The file ... has been updated successfully.")
- Hook denials ("Hook PreToolUse:... denied this tool")
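The detection rules above can be approximated with a few regular expressions. This is a sketch only: isBoilerplateToolOutput is a hypothetical helper, and the exact patterns are guesses based on the quoted messages, not the package's real matchers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ASSUMED patterns derived from the messages quoted in the list above.
var boilerplate = []*regexp.Regexp{
	regexp.MustCompile(`^No matches found$`),
	regexp.MustCompile(`^The file .* has been updated successfully\.$`),
	regexp.MustCompile(`^Hook PreToolUse:.* denied this tool`),
}

// isBoilerplateToolOutput reports whether every non-blank line of raw
// matches one of the boilerplate patterns. An empty body also counts.
func isBoilerplateToolOutput(raw []string) bool {
	for _, line := range raw {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			continue
		}
		matched := false
		for _, re := range boilerplate {
			if re.MatchString(trimmed) {
				matched = true
				break
			}
		}
		if !matched {
			return false // at least one informative line
		}
	}
	return true
}

func main() {
	fmt.Println(isBoilerplateToolOutput([]string{"", "No matches found"})) // true
	fmt.Println(isBoilerplateToolOutput([]string{"package main"}))         // false
}
```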
func NextInSequence ¶
NextInSequence returns the smallest number in the sorted slice that is strictly greater than n. Returns -1 if no such number exists.
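One way to get this behavior is a binary search over the sorted slice. The lowercase nextInSequence below is a hypothetical sketch of the documented contract, not necessarily how the package implements it.

```go
package main

import (
	"fmt"
	"sort"
)

// nextInSequence returns the smallest value in sorted that is strictly
// greater than n, or -1 if there is none. sorted must be in ascending order.
func nextInSequence(sorted []int, n int) int {
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i] > n })
	if i == len(sorted) {
		return -1
	}
	return sorted[i]
}

func main() {
	seq := []int{1, 2, 42, 802}
	fmt.Println(nextInSequence(seq, 2))   // 42
	fmt.Println(nextInSequence(seq, 802)) // -1
}
```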
func NormalizeContent ¶
NormalizeContent sanitizes journal Markdown for static site rendering:
- Strips code fence markers (eliminates nesting conflicts)
- Wraps Tool Output and User sections in <pre><code> with HTML-escaped content
- Sanitizes H1 headings (strips Claude tags, truncates to 75 chars)
- Demotes non-turn-header headings to bold (prevents broken page structure)
- Inserts blank lines before list items when missing (Python-Markdown requires them)
- Strips bold markers from tool-use lines (**Glob: *.md** -> Glob: *.md)
- Escapes glob-like * characters outside code blocks
- Replaces inline code spans containing angle brackets with quoted entities
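As an illustration of one of these passes, the blank-line insertion before list items might look like the sketch below. The insertBlankBeforeLists helper and its list-item pattern are assumptions for illustration; the real pass may recognize list items differently and handle continuation lines with more care.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ASSUMED list-item shape: "-", "*", "+", or "N." followed by whitespace.
var listItem = regexp.MustCompile(`^\s*([-*+]|\d+\.)\s+`)

// insertBlankBeforeLists adds an empty line before a list item whenever the
// preceding line is non-blank and is not itself a list item.
func insertBlankBeforeLists(lines []string) []string {
	var out []string
	for i, line := range lines {
		if i > 0 && listItem.MatchString(line) {
			prev := lines[i-1]
			if strings.TrimSpace(prev) != "" && !listItem.MatchString(prev) {
				out = append(out, "")
			}
		}
		out = append(out, line)
	}
	return out
}

func main() {
	in := []string{"Changes:", "- fixed fences", "- demoted headings"}
	fmt.Printf("%q\n", insertBlankBeforeLists(in))
	// ["Changes:" "" "- fixed fences" "- demoted headings"]
}
```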
Heavy formatting (metadata tables, proper fence reconstruction) is left to the ctx-journal-normalize skill, which uses AI for context-aware cleanup.
Parameters:
- content: Raw Markdown content of a journal entry
- fencesVerified: Whether the file's fences have been verified via state
Returns:
- string: Sanitized content ready for static site rendering
func PreBlockMask ¶
PreBlockMask returns a boolean slice where mask[i] is true if line i is inside a <pre> block (between <pre>/<pre><code> and </pre>/</code></pre>). This allows turn-header scanning to skip embedded headers from tool outputs that quote other journal files.
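A minimal sketch of such a mask, assuming the opening and closing tag lines themselves count as "inside" (the documentation does not say either way). The preBlockMask helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// preBlockMask marks every line from an opening <pre...> line through the
// line containing </pre>. ASSUMPTION: delimiter lines are marked as inside.
func preBlockMask(lines []string) []bool {
	mask := make([]bool, len(lines))
	inPre := false
	for i, line := range lines {
		trimmed := strings.TrimSpace(line)
		if !inPre && strings.HasPrefix(trimmed, "<pre") {
			inPre = true
		}
		mask[i] = inPre
		if inPre && strings.Contains(trimmed, "</pre>") {
			inPre = false // also closes one-line <pre>...</pre> blocks
		}
	}
	return mask
}

func main() {
	lines := []string{
		"### 1. User",
		"<pre><code>",
		"### 802. Assistant",
		"</code></pre>",
		"### 2. Assistant",
	}
	fmt.Println(preBlockMask(lines)) // [false true true true false]
}
```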
func ProcessTurns ¶
func ProcessTurns( content, roleKey string, processFn func(out, body []string, atEOF bool) []string, ) string
ProcessTurns iterates lines, matching turn headers with the given role, and delegates body processing to the provided callback. Non-matching lines are passed through unchanged.
Parameters:
- content: Full document content
- roleKey: YAML DescKey for the role to match (e.g., DescKeyLabelToolOutput)
- processFn: Called with (out, body, atEOF) for each matched turn; returns updated out slice
Returns:
- string: Processed content
func SplitTrailingFooter ¶
SplitTrailingFooter splits a multipart navigation footer from the end of tool output body lines. The footer pattern is: a "---" separator followed (possibly across multiple lines) by a "**Part N of M**" label with navigation links. Returns (body without footer, footer lines). If no footer is found, returns the original body and nil.
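The split can be sketched by scanning backward for the last "---" line and checking that a part label follows it. The splitTrailingFooter helper and the label regex below are assumptions for illustration, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ASSUMED label shape, taken from the "**Part N of M**" description.
var partLabel = regexp.MustCompile(`\*\*Part \d+ of \d+\*\*`)

// splitTrailingFooter returns (body without footer, footer lines), or
// (body, nil) when the last "---" separator is not followed by a part label.
func splitTrailingFooter(body []string) ([]string, []string) {
	for i := len(body) - 1; i >= 0; i-- {
		if strings.TrimSpace(body[i]) != "---" {
			continue
		}
		for _, line := range body[i+1:] {
			if partLabel.MatchString(line) {
				return body[:i], body[i:]
			}
		}
		return body, nil // separator found, but no label after it
	}
	return body, nil
}

func main() {
	body := []string{
		"tool output line",
		"---",
		"",
		"**Part 2 of 3** | [Previous](part1.md) | [Next](part3.md)",
	}
	rest, footer := splitTrailingFooter(body)
	fmt.Println(len(rest), len(footer)) // 1 3
}
```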
func StripPreWrapper ¶
StripPreWrapper removes the <details>, <summary>, <pre>, </pre>, and </details> wrapper lines from a tool output body. When <pre> tags are found (the old export format, which HTML-escapes content), entities are unescaped. When only <details>/<summary> are found (the CollapseToolOutputs format), the inner content is returned as-is, since it was never HTML-escaped.
Returns raw content lines ready for wrapping.
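The two cases can be sketched with the standard html package. This is a hypothetical stripPreWrapper that assumes each wrapper tag sits on its own line; the real function may accept more variants.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// stripPreWrapper drops wrapper lines and unescapes entities only when a
// <pre> wrapper was seen. ASSUMPTION: wrapper tags occupy whole lines.
func stripPreWrapper(body []string) []string {
	var inner []string
	sawPre := false
	for _, line := range body {
		trimmed := strings.TrimSpace(line)
		switch {
		case trimmed == "<pre>" || trimmed == "</pre>":
			sawPre = true // old export format: content is HTML-escaped
		case trimmed == "<details>" || trimmed == "</details>":
			// wrapper line, dropped
		case strings.HasPrefix(trimmed, "<summary>"):
			// summary line, dropped
		default:
			inner = append(inner, line)
		}
	}
	if sawPre {
		for i := range inner {
			inner[i] = html.UnescapeString(inner[i])
		}
	}
	return inner
}

func main() {
	old := []string{"<details>", "<summary>Output</summary>", "<pre>", "&lt;b&gt;", "</pre>", "</details>"}
	fmt.Printf("%q\n", stripPreWrapper(old)) // ["<b>"]
}
```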
func TrimBlankLines ¶
TrimBlankLines removes leading and trailing blank lines from a slice.
Parameters:
- lines: Input lines
Returns:
- []string: Trimmed lines (may be empty)
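A sketch of the trimming behavior, assuming that whitespace-only lines count as blank (the documentation does not specify this). The trimBlankLines helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// trimBlankLines returns the sub-slice of lines without leading and trailing
// blank lines. ASSUMPTION: whitespace-only lines are treated as blank.
func trimBlankLines(lines []string) []string {
	start, end := 0, len(lines)
	for start < end && strings.TrimSpace(lines[start]) == "" {
		start++
	}
	for end > start && strings.TrimSpace(lines[end-1]) == "" {
		end--
	}
	return lines[start:end]
}

func main() {
	fmt.Printf("%q\n", trimBlankLines([]string{"", "  ", "a", "", "b", ""}))
	// ["a" "" "b"]
}
```

Interior blank lines are preserved; only the edges are trimmed.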
func WrapToolOutputs ¶
WrapToolOutputs finds Tool Output sections and wraps their body in <pre><code> with HTML-escaped content. This prevents all markdown interpretation: headings, separators, HTML tags, fence markers all become inert entities.
Requires pymdownx.highlight with use_pygments=false in the zensical config (set in TplZensicalTheme) to prevent the highlight extension from hijacking <pre><code> blocks.
Tool outputs already wrapped in <details><pre> by the export pipeline are unwrapped and unescaped before re-escaping uniformly.
Boundary detection: all turn numbers are pre-scanned and sorted. For turn N, the boundary target is the minimum turn number > N across the entire document. This correctly skips embedded turn headers from other journal files (e.g., ### 802. Assistant inside a tool output that read another session's file) because the real next turn (### 42.) is always the smallest number > N.
func WrapUserTurns ¶
WrapUserTurns finds User turn bodies and wraps them in <pre><code> with HTML-escaped content. This is the "defencify" strategy: user input is treated as plain preformatted text, which eliminates an entire class of rendering bugs caused by stray/unclosed fence markers in user messages.
Requires pymdownx.highlight with use_pygments=false in the zensical config (set in TplZensicalTheme). With Pygments enabled, the highlight extension hijacks <pre><code> and transforms block boundaries.
In CommonMark terms, <pre> opens a type 1 HTML block, which ends at </pre> rather than at a blank line, so blank lines inside the wrapped body do not terminate it. HTML escaping prevents all inner content conflicts: fence markers, headings, HTML tags, and similar constructs become inert entities.
Trade-off: markdown formatting in user messages (bold, links, lists) is flattened to plain text. This is acceptable: preserving user input verbatim is more valuable than rendering decorative formatting.
Boundary detection reuses the same pre-scan + last-match-wins approach as WrapToolOutputs.
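The wrapping itself can be sketched with html.EscapeString from the standard library. The wrapPre helper below is hypothetical, and the exact placement of the tags relative to the first and last body lines is an assumption.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// wrapPre HTML-escapes each body line and wraps the result in
// <pre><code>...</code></pre>. ASSUMPTION: tags share a line with the content.
func wrapPre(body []string) string {
	escaped := make([]string, len(body))
	for i, line := range body {
		escaped[i] = html.EscapeString(line)
	}
	return "<pre><code>" + strings.Join(escaped, "\n") + "</code></pre>"
}

func main() {
	// Headings and tags in the body become inert text once escaped.
	fmt.Println(wrapPre([]string{"# not a heading", "<b>bold?</b>"}))
}
```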
Types ¶
type TurnMatch ¶
TurnMatch holds the result of matching a turn header line.
func MatchTurnHeader ¶
MatchTurnHeader attempts to parse a turn header from a line.
Parameters:
- line: Raw line to match (will be trimmed)
- mask: Whether this line is inside a pre block
Returns:
- *TurnMatch: Parsed turn data, or nil if not a turn header