Documentation
Overview
Package lex provides support for a *nix (f)lex-like tool operating on .l sources. The syntax is similar to a subset of (f)lex; see also: http://flex.sourceforge.net/manual/Format.html#Format
Changelog
2014-11-18: Add option for marking an accepting state. Required to support POSIX longest match.
Some feature examples:
/* Unindented multiline Go comments in the definitions section */
    Any indented text in the definitions section
%{
Any text in the definitions section within %{ and %}
%}
D [0-9]
%s non-exclusive-start-condition s2 s3
%x exclusive-start-condition e2
%yyt getTopState() // not required when only INITIAL start condition exists
%yyb last == '\n' || last == '\0'
%yyc getCurrentChar()
%yyn move() // get next character
%yym mark() // now in accepting state
%%
    Indented text before the first rule is presumably treated specially (renderer specific)
{D}+ return(INT)
{D}+\.{D}+
    return(FLOAT)
[a-z][a-z0-9]+
    /* identifier found */
    return(IDENT)
A"[foo]\"bar"Z println(`A[foo]"barZ`)
^bol|eol$
<non-exclusive-start-condition>foo
%{
println("foo found")
%}
<s2,s3>bar
<INITIAL,e2>abc
<*>"always" println("active in all start conditions")
%%
The optional user code section. Possibly the place where a lexeme recognition failure will
be handled (renderer specific).
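The start condition prefixes in the example above decide which rules are active at a given moment. The following is a minimal sketch of that decision, assuming this package follows flex's documented semantics for %s (inclusive) and %x (exclusive) conditions. The function ruleActive, the exclusive map, and the use of the string "*" to stand for the <*> prefix are all hypothetical and are not part of this package.

```go
package main

import "fmt"

// ruleActive reports whether a rule is active in the current start condition.
// It follows flex's semantics, which this package is only ASSUMED to share.
// conds is the rule's <...> prefix list (nil when the rule has no prefix);
// exclusive holds the names declared with %x.
func ruleActive(conds []string, current string, exclusive map[string]bool) bool {
	if len(conds) == 0 {
		// A rule without a prefix is active in INITIAL and in every
		// inclusive (%s) start condition, but in no exclusive (%x) one.
		return !exclusive[current]
	}
	for _, c := range conds {
		// "*" is this sketch's stand-in for the <*> prefix.
		if c == "*" || c == current {
			return true
		}
	}
	return false
}

func main() {
	// Start conditions declared with %x in the example above.
	exclusive := map[string]bool{"exclusive-start-condition": true, "e2": true}

	fmt.Println(ruleActive(nil, "INITIAL", exclusive))                    // true
	fmt.Println(ruleActive(nil, "s2", exclusive))                         // true
	fmt.Println(ruleActive(nil, "e2", exclusive))                         // false
	fmt.Println(ruleActive([]string{"s2", "s3"}, "INITIAL", exclusive))   // false
	fmt.Println(ruleActive([]string{"INITIAL", "e2"}, "e2", exclusive))   // true
	fmt.Println(ruleActive([]string{"*"}, "e2", exclusive))               // true
}
```

A renderer would typically not evaluate this at run time; the per-condition start states recorded in L serve the same purpose. The sketch only spells out the rule a reader needs in order to interpret the example.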
Missing/differing functionality of the .l parser/FSM generator (compared to flex):
- Trailing context (re1/re2).
- No requirement of an action to start on the same line as the pattern.
- Processing of actions enclosed in braces. This package mostly treats any non-blank text following a pattern, up to the next pattern, as the action's source code.
- All flex % prefixed options except %s and %x.
- Flex-incompatible %yy* options.
- No cclasses ([[:digit:]]).
- Anything special after '(?'.
- Matching <<EOF>>. Still \0 is OK in a pattern.
- And probably more.
Types
type L
type L struct {
	// Source code lines for rendering from the definitions section
	DefCode []string
	// Names of declared start conditions with their respective numeric
	// identifiers
	StartConditions map[string]int
	// Start condition numeric identifiers with their respective DFA
	// start state
	StartConditionsStates map[int]*lexer.NfaState
	// Beginning of line start condition numeric identifiers with their
	// respective DFA start state
	StartConditionsBolStates map[int]*lexer.NfaState
	// Rule[0] is a pseudo rule. Its action contains the source code for
	// rendering from the rules section before the first rule
	Rules []Rule
	// The generated FSM
	Dfa lexer.Nfa
	// Accept states with their respective rule index
	Accepts map[*lexer.NfaState]int
	// Source code for rendering from the user code section
	UserCode string
	// Source code for rendering of get_current_start_condition. Set by
	// %yyt.
	YYT string
	// Source code for rendering of get_bol, i.e. whether we are at the
	// beginning of a line right now. Set by %yyb.
	YYB string
	// Source code for rendering of get_peek_char, i.e. the char the lexer
	// will now consider in making a decision. Set by %yyc.
	YYC string
	// Source code for rendering of move_to_next_char, i.e. "consume" the
	// current peek char and go to the next one. Set by %yyn.
	YYN string
	// Source code for rendering of mark_accepting, which supports accepting
	// the longest match while reusing the "overflowed" input. Set by %yym.
	YYM string
}
L represents selected data structures found in / generated from a .l source. A [command line] tool using this package may then render L to some programming language source code and/or data table(s).
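The roles of the %yyc, %yyn and %yym hooks are easiest to see in a hand-written scanner. The following is a minimal sketch, assuming a renderer that emits a loop of this shape; it is not output of this package. The scanner type and its peek, next and mark methods are hypothetical stand-ins for the code a .l source would supply via %yyc, %yyn and %yym. As a simplification, mark takes the token kind as an argument, whereas a real renderer would presumably derive the rule from the accepting DFA state.

```go
package main

import "fmt"

// scanner is a hypothetical renderer-side runtime, not part of package lex.
type scanner struct {
	src    []byte
	pos    int    // index of the current (peek) character
	start  int    // start of the token being scanned
	accPos int    // input position recorded by the last mark
	accTok string // token kind recorded by the last mark
}

// peek plays the role of %yyc: it returns the current character, or 0 at
// the end of input.
func (s *scanner) peek() byte {
	if s.pos < len(s.src) {
		return s.src[s.pos]
	}
	return 0
}

// next plays the role of %yyn: it consumes the current character.
func (s *scanner) next() { s.pos++ }

// mark plays the role of %yym: it records that everything consumed so far
// forms a complete match.
func (s *scanner) mark(tok string) { s.accPos, s.accTok = s.pos, tok }

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scan recognizes {D}+ as INT and {D}+\.{D}+ as FLOAT, preferring the
// longest match. It returns "" as kind when nothing matched.
func (s *scanner) scan() (kind, text string) {
	s.start, s.accPos, s.accTok = s.pos, s.pos, ""
	for isDigit(s.peek()) {
		s.next()
		s.mark("INT") // after each digit the automaton is in an accepting state
	}
	if s.accTok != "" && s.peek() == '.' {
		s.next() // tentatively consume '.'; this state is not accepting
		for isDigit(s.peek()) {
			s.next()
			s.mark("FLOAT")
		}
	}
	// Back up to the last accepting position so that any "overflowed"
	// input is scanned again as part of the next token.
	s.pos = s.accPos
	return s.accTok, string(s.src[s.start:s.pos])
}

func main() {
	s := &scanner{src: []byte("12.x")}
	kind, text := s.scan()
	// The '.' was consumed while trying for FLOAT, then given back.
	fmt.Println(kind, text, string(s.peek())) // prints: INT 12 .
}
```

Without the mark step the scanner would either stop at the first accepting state (yielding shortest match) or lose the '.' it consumed while attempting the longer FLOAT pattern.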
func NewL
NewL parses a .l source named fname from src and returns an L or an error, if any. Currently it is not reentrant and cannot be invoked more than once in an application (which is assumed tolerable for a "lex" tool). The unoptdfa argument allows disabling optimization of the produced DFA. The mode32 parameter is not yet supported and must be false.
type Rule
type Rule struct {
	Conds   []string // Start conditions of the rule
	Pattern string   // Original rule's pattern
	BOL     bool     // Pattern starts with beginning of line assertion (^)
	EOL     bool     // Pattern ends with end of line ($) assertion
	RE      string   // Pattern translated to a regular expression
	Action  string   // Rule's associated action source code
}
Rule represents the data for a pattern/action pair.
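A renderer consumes the rules by walking the slice and emitting target-language code for each action. The following is a minimal sketch of such a walk. To stay self-contained it declares a local Rule type that mirrors the fields listed above instead of importing the package. The renderCases helper and the sample rule values are hypothetical, hand-written illustrations and not output of NewL; the RE, BOL, EOL and Conds fields are left unset because their exact contents are not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a local mirror of the documented Rule type so that the sketch
// compiles on its own.
type Rule struct {
	Conds   []string
	Pattern string
	BOL     bool
	EOL     bool
	RE      string
	Action  string
}

// renderCases is a hypothetical renderer helper. It emits one Go case clause
// per rule and skips rules[0], the pseudo rule whose action holds the code
// found before the first rule.
func renderCases(rules []Rule) string {
	var b strings.Builder
	for i, r := range rules {
		if i == 0 {
			continue
		}
		fmt.Fprintf(&b, "case %d: // %s\n", i, r.Pattern)
		if r.Action != "" {
			fmt.Fprintf(&b, "\t%s\n", strings.TrimSpace(r.Action))
		}
	}
	return b.String()
}

func main() {
	// Hand-written sample values, assumed for illustration only.
	rules := []Rule{
		{Action: "// code before the first rule"},
		{Pattern: `{D}+`, Action: "return(INT)"},
		{Pattern: `{D}+\.{D}+`, Action: "return(FLOAT)"},
	}
	fmt.Print(renderCases(rules))
}
```

The case labels reuse the rule index, which is also the value stored per accept state in the Accepts map of L, so a generated scanner could switch on that index once a match has been accepted.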