package normalize
v0.8.0
Warning

This package is not in the latest version of its module.

Published: Mar 24, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Overview

Package normalize sanitizes journal Markdown for static site rendering by fixing fences, headings, and list formatting.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CollectTurnNumbers

func CollectTurnNumbers(lines []string) []int

CollectTurnNumbers extracts all turn numbers from turn headers in the document, returning them sorted and deduplicated. Headers inside <pre> blocks are skipped: they are embedded content from tool outputs that read other journal files.
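The skip-then-sort behavior can be sketched as follows. This is a minimal illustration, not the package's implementation. The header pattern (### N. Role), the inline <pre> tracking, and the name collectTurnNumbers are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// turnHeaderRe is a HYPOTHETICAL header pattern such as "### 42. Assistant";
// the package's real pattern is not documented here.
var turnHeaderRe = regexp.MustCompile(`^###\s+(\d+)\.\s+\S`)

// collectTurnNumbers returns the sorted, deduplicated turn numbers found in
// headers that sit outside <pre> ... </pre> blocks.
func collectTurnNumbers(lines []string) []int {
	seen := map[int]bool{}
	inPre := false
	for _, line := range lines {
		t := strings.TrimSpace(line)
		if strings.HasPrefix(t, "<pre>") {
			inPre = true
		}
		if !inPre {
			if m := turnHeaderRe.FindStringSubmatch(t); m != nil {
				if n, err := strconv.Atoi(m[1]); err == nil {
					seen[n] = true
				}
			}
		}
		if strings.Contains(t, "</pre>") {
			inPre = false
		}
	}
	nums := make([]int, 0, len(seen))
	for n := range seen {
		nums = append(nums, n)
	}
	sort.Ints(nums)
	return nums
}

func main() {
	doc := []string{
		"### 2. Assistant",
		"<pre><code>",
		"### 802. Assistant", // embedded header from another journal: skipped
		"</code></pre>",
		"### 1. User",
		"### 2. Assistant", // duplicate: deduplicated
	}
	fmt.Println(collectTurnNumbers(doc)) // [1 2]
}
```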

func FindTurnBoundary

func FindTurnBoundary(
	lines []string, mask []bool, startIdx int,
	turnSeq []int, turnNum int, turnTime string,
) int

FindTurnBoundary scans lines from startIdx to find the end of the current turn body: the last occurrence of the next expected turn header (the smallest number in turnSeq greater than turnNum) that is not inside a pre block and has a timestamp >= turnTime.

Parameters:

  • lines: All lines in the document
  • mask: Pre-block mask (true = inside pre)
  • startIdx: Index to start scanning from
  • turnSeq: Sorted sequence of all turn numbers in the document
  • turnNum: Current turn number
  • turnTime: Current turn timestamp (for ordering)

Returns:

  • int: Index of the boundary line (or len(lines) if EOF)
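A rough sketch of the scan described above, assuming a HYPOTHETICAL header format "### 42. Assistant (14:03:22)" with zero-padded HH:MM:SS timestamps (so string comparison orders them correctly). The lowercase names are illustrative stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// HYPOTHETICAL header format: "### 42. Assistant (14:03:22)".
var headerRe = regexp.MustCompile(`^###\s+(\d+)\.\s+\S+\s+\((\d{2}:\d{2}:\d{2})\)`)

// nextInSequence returns the smallest value in sorted that is > n, or -1.
func nextInSequence(sorted []int, n int) int {
	i := sort.SearchInts(sorted, n+1)
	if i < len(sorted) {
		return sorted[i]
	}
	return -1
}

// findTurnBoundary returns the index of the last unmasked header for the next
// expected turn whose timestamp is >= turnTime, or len(lines) if none exists.
func findTurnBoundary(lines []string, mask []bool, startIdx int,
	turnSeq []int, turnNum int, turnTime string) int {
	next := nextInSequence(turnSeq, turnNum)
	if next < 0 {
		return len(lines)
	}
	boundary := len(lines)
	for i := startIdx; i < len(lines); i++ {
		if mask[i] {
			continue // header quoted inside a <pre> block
		}
		m := headerRe.FindStringSubmatch(strings.TrimSpace(lines[i]))
		if m == nil {
			continue
		}
		n, err := strconv.Atoi(m[1])
		// Zero-padded HH:MM:SS strings compare correctly as text.
		if err == nil && n == next && m[2] >= turnTime {
			boundary = i // last match wins
		}
	}
	return boundary
}

func main() {
	lines := []string{
		"### 41. User (10:00:00)",
		"body",
		"<pre>",
		"### 42. Assistant (10:00:05)", // masked: ignored
		"</pre>",
		"### 42. Assistant (10:00:09)",
		"reply",
	}
	mask := []bool{false, false, true, true, true, false, false}
	fmt.Println(findTurnBoundary(lines, mask, 1, []int{41, 42}, 41, "10:00:00")) // 5
}
```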

func IsBoilerplateToolOutput

func IsBoilerplateToolOutput(raw []string) bool

IsBoilerplateToolOutput returns true if the tool output body contains only empty lines or low-value confirmation messages that add no information to the rendered journal page. When it returns true, both the Tool Output header and its body are dropped.

Detected patterns:

  • Empty body (no non-blank lines)
  • "No matches found" (grep/glob with zero results)
  • Edit confirmations ("The file ... has been updated successfully.")
  • Hook denials ("Hook PreToolUse:... denied this tool")

func NextInSequence

func NextInSequence(sorted []int, n int) int

NextInSequence returns the smallest number in the sorted slice that is strictly greater than n. Returns -1 if no such number exists.
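Because the input is sorted, this lookup reduces to a binary search. A minimal sketch using the standard library's sort.SearchInts (the lowercase name is illustrative, not the package's code):

```go
package main

import (
	"fmt"
	"sort"
)

// nextInSequence returns the smallest element of the ascending slice that is
// strictly greater than n, or -1 if there is none.
func nextInSequence(sorted []int, n int) int {
	// SearchInts finds the first index whose value is >= n+1, i.e. > n.
	i := sort.SearchInts(sorted, n+1)
	if i < len(sorted) {
		return sorted[i]
	}
	return -1
}

func main() {
	seq := []int{1, 2, 5, 802}
	fmt.Println(nextInSequence(seq, 2))   // 5
	fmt.Println(nextInSequence(seq, 802)) // -1
}
```

The search is O(log n) per call, which matters little for a single journal but keeps the per-turn boundary lookup cheap.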

func NormalizeContent

func NormalizeContent(content string, fencesVerified bool) string

NormalizeContent sanitizes journal Markdown for static site rendering:

  • Strips code fence markers (eliminates nesting conflicts)
  • Wraps Tool Output and User sections in <pre><code> with HTML-escaped content
  • Sanitizes H1 headings (strips Claude tags, truncates to 75 chars)
  • Demotes non-turn-header headings to bold (prevents broken page structure)
  • Inserts blank lines before list items when missing (Python-Markdown requires them)
  • Strips bold markers from tool-use lines (**Glob: *.md** -> Glob: *.md)
  • Escapes glob-like * characters outside code blocks
  • Replaces inline code spans containing angle brackets with quoted entities

Heavy formatting (metadata tables, proper fence reconstruction) is left to the ctx-journal-normalize skill, which uses AI for context-aware cleanup.

Parameters:

  • content: Raw Markdown content of a journal entry
  • fencesVerified: Whether the file's fences have been verified via state

Returns:

  • string: Sanitized content ready for static site rendering
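One of the listed steps, inserting blank lines before list items, might look like the sketch below. The helper name insertListBlankLines and the set of recognized list markers are HYPOTHETICAL; the package's actual rules are not documented here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// listItemRe matches an ASSUMED set of list markers: "- ", "* ", "+ ", "1. ".
var listItemRe = regexp.MustCompile(`^\s*(?:[-*+]|\d+\.)\s+`)

// insertListBlankLines adds a blank line before a list item whose preceding
// line is neither blank nor another list item, so Python-Markdown sees a list.
func insertListBlankLines(lines []string) []string {
	out := make([]string, 0, len(lines))
	for i, line := range lines {
		if i > 0 && listItemRe.MatchString(line) {
			prev := lines[i-1]
			if strings.TrimSpace(prev) != "" && !listItemRe.MatchString(prev) {
				out = append(out, "")
			}
		}
		out = append(out, line)
	}
	return out
}

func main() {
	in := []string{"Steps:", "- build", "- test"}
	fmt.Printf("%q\n", insertListBlankLines(in)) // ["Steps:" "" "- build" "- test"]
}
```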

func PreBlockMask

func PreBlockMask(lines []string) []bool

PreBlockMask returns a boolean slice where mask[i] is true if line i is inside a <pre> block (between <pre>/<pre><code> and </pre>/</code></pre>). This allows turn-header scanning to skip embedded headers from tool outputs that quote other journal files.
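A minimal sketch of such a mask, assuming the opening and closing tag lines are themselves marked as inside the block and that <pre> blocks do not nest (both are assumptions, not documented behavior):

```go
package main

import (
	"fmt"
	"strings"
)

// preBlockMask marks lines from an opening <pre> (or <pre><code>) through the
// closing </pre> (or </code></pre>), inclusive. Nested <pre> is NOT handled.
func preBlockMask(lines []string) []bool {
	mask := make([]bool, len(lines))
	inPre := false
	for i, line := range lines {
		t := strings.TrimSpace(line)
		if !inPre && strings.HasPrefix(t, "<pre>") {
			inPre = true
		}
		mask[i] = inPre
		if inPre && strings.HasSuffix(t, "</pre>") {
			inPre = false
		}
	}
	return mask
}

func main() {
	lines := []string{
		"### 1. User",
		"<pre><code>",
		"### 7. Assistant", // quoted from another journal file
		"</code></pre>",
		"### 2. Assistant",
	}
	fmt.Println(preBlockMask(lines)) // [false true true true false]
}
```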

func ProcessTurns

func ProcessTurns(
	content, roleKey string,
	processFn func(out, body []string, atEOF bool) []string,
) string

ProcessTurns iterates over lines, matching turn headers with the given role, and delegates body processing to the provided callback. Non-matching lines are passed through unchanged.

Parameters:

  • content: Full document content
  • roleKey: YAML DescKey for the role to match (e.g., DescKeyLabelToolOutput)
  • processFn: Called with (out, body, atEOF) for each matched turn; returns updated out slice

Returns:

  • string: Processed content

func SplitTrailingFooter

func SplitTrailingFooter(body []string) ([]string, []string)

SplitTrailingFooter splits a multipart navigation footer from the end of tool output body lines. The footer pattern is: a "---" separator followed (possibly across multiple lines) by a "**Part N of M**" label with navigation links. Returns (body without footer, footer lines). If no footer is found, returns the original body and nil.
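A minimal sketch of the split, assuming the footer starts at the last line consisting only of "---" and that a "**Part N of M**" label must appear somewhere after it (the lowercase name and matching rules are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// partLabelRe matches a "**Part N of M**" label anywhere on a line.
var partLabelRe = regexp.MustCompile(`\*\*Part \d+ of \d+\*\*`)

// splitTrailingFooter finds the last "---" separator and, if a Part label
// follows it, returns (lines before the separator, separator onward).
func splitTrailingFooter(body []string) ([]string, []string) {
	for i := len(body) - 1; i >= 0; i-- {
		if strings.TrimSpace(body[i]) != "---" {
			continue
		}
		for _, line := range body[i+1:] {
			if partLabelRe.MatchString(line) {
				return body[:i], body[i:]
			}
		}
		break // last separator has no Part label after it: no footer
	}
	return body, nil
}

func main() {
	body := []string{"output line", "---", "", "**Part 1 of 3** | [Next](p2.md)"}
	b, f := splitTrailingFooter(body)
	fmt.Println(len(b), len(f)) // 1 3
}
```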

func StripPreWrapper

func StripPreWrapper(body []string) []string

StripPreWrapper removes <details>, <summary>, <pre>, </pre>, and </details> wrapper lines from a tool output body. When <pre> tags are found (the old export format, which HTML-escapes content), entities are unescaped. When only <details>/<summary> are found (the CollapseToolOutputs format), the inner content is returned as-is, since it was never HTML-escaped.

Returns raw content lines ready for wrapping.
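The conditional unescape can be sketched with the standard library's html.UnescapeString. Assumptions: wrapper tags sit on their own lines, and any line starting with <summary> is a wrapper line; the lowercase name is illustrative.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// stripPreWrapper drops wrapper-only lines and unescapes entities only when a
// <pre> wrapper was present (old export format). ASSUMES one tag per line.
func stripPreWrapper(body []string) []string {
	sawPre := false
	var inner []string
	for _, line := range body {
		t := strings.TrimSpace(line)
		switch {
		case t == "<pre>":
			sawPre = true
		case t == "</pre>", t == "<details>", t == "</details>",
			strings.HasPrefix(t, "<summary>"):
			// wrapper line: drop it
		default:
			inner = append(inner, line)
		}
	}
	if sawPre {
		for i, line := range inner {
			inner[i] = html.UnescapeString(line)
		}
	}
	return inner
}

func main() {
	old := []string{"<details>", "<summary>Output</summary>", "<pre>", "a &lt; b &amp;&amp; c", "</pre>", "</details>"}
	fmt.Println(stripPreWrapper(old)) // [a < b && c]
	collapsed := []string{"<details>", "<summary>Output</summary>", "a &lt; b", "</details>"}
	fmt.Println(stripPreWrapper(collapsed)) // [a &lt; b]
}
```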

func TrimBlankLines

func TrimBlankLines(lines []string) []string

TrimBlankLines removes leading and trailing blank lines from a slice.

Parameters:

  • lines: Input lines

Returns:

  • []string: Trimmed lines (may be empty)
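A minimal two-pointer sketch, assuming whitespace-only lines count as blank (an assumption; the documentation only says "blank"). The result is a subslice sharing the input's backing array.

```go
package main

import (
	"fmt"
	"strings"
)

// trimBlankLines drops leading and trailing lines that are empty or
// whitespace-only, returning a subslice of the input.
func trimBlankLines(lines []string) []string {
	start, end := 0, len(lines)
	for start < end && strings.TrimSpace(lines[start]) == "" {
		start++
	}
	for end > start && strings.TrimSpace(lines[end-1]) == "" {
		end--
	}
	return lines[start:end]
}

func main() {
	fmt.Printf("%q\n", trimBlankLines([]string{"", "  ", "a", "", "b", ""})) // ["a" "" "b"]
}
```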

func WrapToolOutputs

func WrapToolOutputs(content string) string

WrapToolOutputs finds Tool Output sections and wraps their body in <pre><code> with HTML-escaped content. This prevents all markdown interpretation: headings, separators, HTML tags, fence markers all become inert entities.
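The core wrapping step might look like the following sketch. Note that html.EscapeString rewrites only <, >, &, ' and "; headings and fence markers stay literally as text, and they are inert because the whole body sits inside a raw HTML <pre> block. Placing the opening and closing tags on the first and last body lines is an assumption, as is the name wrapPre.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// wrapPre HTML-escapes each body line and encloses the result in <pre><code>.
// Tags share lines with the first/last body line (an ASSUMED layout) so no
// extra leading or trailing newline appears inside the block.
func wrapPre(body []string) string {
	escaped := make([]string, len(body))
	for i, line := range body {
		escaped[i] = html.EscapeString(line)
	}
	return "<pre><code>" + strings.Join(escaped, "\n") + "</code></pre>"
}

func main() {
	fmt.Println(wrapPre([]string{"## Not a heading", "~~~python", "<b>x</b>"}))
}
```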

Requires pymdownx.highlight with use_pygments=false in the zensical config (set in TplZensicalTheme) to prevent the highlight extension from hijacking <pre><code> blocks.

Tool outputs already wrapped in <details><pre> by the export pipeline are unwrapped and unescaped before re-escaping uniformly.

Boundary detection: all turn numbers are pre-scanned and sorted. For turn N, the boundary target is the smallest turn number greater than N anywhere in the document. This correctly skips embedded turn headers from other journal files (e.g., ### 802. Assistant inside a tool output that read another session's file): for turn 41, the real next turn (### 42.) is always the smallest number greater than 41, so the embedded 802 is never chosen as the boundary.

func WrapUserTurns

func WrapUserTurns(content string) string

WrapUserTurns finds User turn bodies and wraps them in <pre><code> with HTML-escaped content. This is the "defencify" strategy: user input is treated as plain preformatted text, which eliminates an entire class of rendering bugs caused by stray/unclosed fence markers in user messages.

Requires pymdownx.highlight with use_pygments=false in the zensical config (set in TplZensicalTheme). With Pygments enabled, the highlight extension hijacks <pre><code> and transforms block boundaries.

A <pre> element opens a type 1 HTML block, which survives blank lines: it ends at the line containing </pre>, not at the first blank line. HTML escaping prevents all inner content conflicts: fence markers, headings, HTML tags, and similar syntax all become inert entities.

Trade-off: markdown formatting in user messages (bold, links, lists) is flattened to plain text. This is acceptable: preserving user input verbatim is more valuable than rendering decorative formatting.

Boundary detection reuses the same pre-scan + last-match-wins approach as WrapToolOutputs.

Types

type TurnMatch

type TurnMatch struct {
	Num  int
	Role string
	Time string
}

TurnMatch holds the result of matching a turn header line.

func MatchTurnHeader

func MatchTurnHeader(line string, masked bool) *TurnMatch

MatchTurnHeader attempts to parse a turn header from a line.

Parameters:

  • line: Raw line to match (will be trimmed)
  • masked: Whether this line is inside a pre block

Returns:

  • *TurnMatch: Parsed turn data, or nil if not a turn header
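A sketch of the parse, assuming a HYPOTHETICAL header format "### 42. Tool Output (14:03:22)" and ASSUMING that masked lines are rejected outright (the documentation only says the flag marks lines inside a pre block). The struct is a local copy of the documented TurnMatch.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// TurnMatch mirrors the documented struct.
type TurnMatch struct {
	Num  int
	Role string
	Time string
}

// HYPOTHETICAL header format: "### 42. Assistant (14:03:22)".
var turnRe = regexp.MustCompile(`^###\s+(\d+)\.\s+(.+?)\s+\((\d{2}:\d{2}:\d{2})\)$`)

// matchTurnHeader returns nil for masked lines (an ASSUMPTION) and for lines
// that do not match the header pattern.
func matchTurnHeader(line string, masked bool) *TurnMatch {
	if masked {
		return nil
	}
	m := turnRe.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return nil
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return nil
	}
	return &TurnMatch{Num: n, Role: m[2], Time: m[3]}
}

func main() {
	fmt.Printf("%+v\n", *matchTurnHeader("  ### 42. Tool Output (14:03:22)", false))
	// {Num:42 Role:Tool Output Time:14:03:22}
	fmt.Println(matchTurnHeader("### 42. Assistant (14:03:22)", true) == nil) // true
}
```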
