colfer

v0.0.0-...-24fa777
Published: Dec 30, 2016 License: CC0-1.0 Imports: 13 Imported by: 0

README

Colfer

Colfer is a schema-based binary serialization format optimized for speed and size.

The project's compiler colf(1) generates source code from schema definitions to marshal and unmarshal data structures.

This is free and unencumbered software released into the public domain. The format is inspired by Protocol Buffers.

Language Support

  • C, C99 compliant, C++13 compliant, WIP, API might change
  • Go, a.k.a. golang
  • Java, Android compatible
  • JavaScript, a.k.a. ECMAScript, NodeJS compatible

Features

  • Simple and straightforward in use
  • No dependencies other than the core library
  • Both faster and smaller than: Protocol Buffers, FlatBuffers and MessagePack
  • Robust including size protection
  • Maximum of 127 fields per data structure
  • No support for enumerations
  • Framed; suitable for concatenation/streaming

TODO

  • RMI (WIP GoDoc)
  • List type support for integers and timestamps
  • Please share your experiences

Use

Download a prebuilt compiler or run go get -u github.com/pascaldekloe/colfer/cmd/colf to make one yourself. Without arguments the command prints its manual.

NAME
	colf — compile Colfer schemas

SYNOPSIS
	colf [ options ] language [ file ... ]

DESCRIPTION
	Generates source code for a language. The options are: C, Go,
	Java and JavaScript.
	The file operands specify the input. Directories are scanned for
	files with the colf extension. If file is absent, colf includes
	the working directory.
	A package can have multiple schema files.

OPTIONS
  -b directory
    	Use a specific destination base directory. (default ".")
  -f	Normalizes schemas on the fly.
  -l expression
    	Sets the default upper limit for the number of elements in a
    	list. The expression is applied to the target language under the
    	name ColferListMax. (default "64 * 1024")
  -p prefix
    	Adds a package prefix. Use slash as a separator when nesting.
  -s expression
    	Sets the default upper limit for serial byte sizes. The
    	expression is applied to the target language under the name
    	ColferSizeMax. (default "16 * 1024 * 1024")
  -v	Enables verbose reporting to the standard error.

EXIT STATUS
	The command exits 0 on success, 1 on compilation failure and 2
	when invoked without arguments.

EXAMPLES
	Compile ./api/*.colf into ./src/ as Java:

		colf -p com/example -b src java api

	Compile ./io.colf with compact limits as C:

		colf -s 2048 -l 96 c io.colf

BUGS
	Report bugs at https://github.com/pascaldekloe/colfer/issues

SEE ALSO
	protoc(1)

It is recommended to commit the generated source code to the respective version control. Maven users may disagree.

Schema

Data structures are defined in .colf files. The format is quite conventional.

// Package demo offers a demonstration.
// These comment lines will end up in the generated code.
package demo

// Course is the grounds where the game of golf is played.
type course struct {
	ID    uint64
	name  text
	holes []hole
	image binary
	tags  []text
}

type hole struct {
	// Lat is the latitude of the cup.
	lat float64
	// Lon is the longitude of the cup.
	lon float64
	// Par is the difficulty index.
	par uint8
	// Water marks the presence of water.
	water bool
	// Sand marks the presence of sand.
	sand bool
}

See what the generated code looks like.

The following table shows how Colfer data types are applied per language.

Colfer     C           Go        Java       JavaScript
bool       bool        bool      boolean    Boolean
uint8      uint8_t     uint8     byte †     Number
uint16     uint16_t    uint16    short †    Number
uint32     uint32_t    uint32    int †      Number
uint64     uint64_t    uint64    long †     Number ‡
int32      int32_t     int32     int        Number
int64      int64_t     int64     long       Number ‡
float32    float       float32   float      Number
float64    double      float64   double     Number
timestamp  2 × time_t  Time ††   Instant    Date + Number
text       char †‡     string    String ‡‡  String ‡‡
binary     uint8_t †‡  []byte    byte[]     Uint8Array
list       †‡          slice     array      Array

  • † signed representation of unsigned data, i.e. may overflow to negative.
  • ‡ range limited to (1 - 2⁵³, 2⁵³ - 1)
  • †† timezone not preserved
  • †‡ struct of pointer + size_t
  • ‡‡ characters limited by UTF-16 (U+0000, U+10FFFF)

Lists may contain floating points, text, binaries or data structures.
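
The † and ‡ footnotes are easy to misread, so here is a minimal Go sketch of both effects. The function names are hypothetical, and treating the ‡ bounds as inclusive is an assumption; the footnote itself does not say.

```go
package main

import "fmt"

// maxSafe is 2^53 - 1, the upper bound quoted in the ‡ footnote.
const maxSafe = 1<<53 - 1

// asJavaByte illustrates the † footnote: Java's byte is signed, so a
// Colfer uint8 above 127 reads back as a negative number with the same
// bit pattern.
func asJavaByte(v uint8) int8 { return int8(v) }

// fitsJSNumber reports whether v lies within the ‡ range. Both bounds
// are treated as inclusive, which is an assumption.
func fitsJSNumber(v int64) bool { return v >= -maxSafe && v <= maxSafe }

func main() {
	fmt.Println(asJavaByte(200))         // -56
	fmt.Println(fitsJSNumber(1<<53 - 1)) // true
	fmt.Println(fitsJSNumber(1 << 53))   // false
}
```

A schema shared with Java or JavaScript consumers may want to keep values inside these ranges, or document that they can be exceeded.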

Compatibility

Name changes do not affect the serialization format. Deprecated fields can be renamed to clearly discourage their use.

The following changes are backward compatible.

  • New fields at the end of Colfer structs
  • Change datatype int32 into int64
  • Change datatype text into binary

Performance

BenchmarkMarshal/colfer-8   	20000000	        65.9 ns/op	      48 B/op	       1 allocs/op
BenchmarkMarshal/protobuf-8 	20000000	        81.4 ns/op	      52 B/op	       1 allocs/op
BenchmarkMarshal/flatbuf-8  	 2000000	       701 ns/op	     472 B/op	      12 allocs/op
BenchmarkUnmarshal/colfer-8 	20000000	        94.3 ns/op	      84 B/op	       2 allocs/op
BenchmarkUnmarshal/protobuf-8         	10000000	       128 ns/op	      84 B/op	       2 allocs/op
BenchmarkUnmarshal/flatbuf-8          	10000000	       152 ns/op	      84 B/op	       2 allocs/op
BenchmarkMarshalReuse/colfer-8        	50000000	        36.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkMarshalReuse/protobuf-8      	30000000	        48.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkMarshalReuse/flatbuf-8       	 5000000	       294 ns/op	       0 B/op	       0 allocs/op
BenchmarkUnmarshalReuse/colfer-8      	20000000	        62.7 ns/op	      20 B/op	       1 allocs/op
BenchmarkUnmarshalReuse/protobuf-8    	20000000	        92.5 ns/op	      20 B/op	       1 allocs/op
BenchmarkUnmarshalReuse/flatbuf-8     	10000000	       119 ns/op	      20 B/op	       1 allocs/op

Format

Data structures consist of zero or more field value definitions followed by a termination byte 0x7f. Only those fields with a value other than the zero value may be serialized. Fields appear in order as stated by the schema.

The zero value for booleans is false, integers: 0, floating points: 0.0, timestamps: 1970-01-01T00:00:00.000000000Z, text & binary: the empty string, nested data structures: null and an empty list for data structure lists.

Data is represented in a big-endian manner. The format relies on varints, also known as variable-length quantities. Bits reserved for future use (RFU) must be set to 0.

Value Definition

Each definition starts with an 8-bit header. The 7 least significant bits identify the field by its (0-based position) index in the schema. The most significant bit is used as a flag.
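
The header layout can be sketched in a few lines of Go. The function and constant names here are hypothetical; only the bit positions come from the description above.

```go
package main

import "fmt"

const (
	indexMask = 0x7f // 7 least significant bits: field index
	flagMask  = 0x80 // most significant bit: flag
)

// splitHeader separates a header byte into its field index and flag.
func splitHeader(h byte) (index int, flag bool) {
	return int(h & indexMask), h&flagMask != 0
}

func main() {
	index, flag := splitHeader(0x85)
	fmt.Println(index, flag) // 5 true
}
```

The bit pattern for index 127 coincides with the termination byte 0x7f, which presumably explains the limit of 127 fields (indices 0 to 126) per data structure.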

Boolean occurrences set the value to true. The flag is RFU.
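
As a worked example, consider only the two boolean fields of the hole struct from the schema section, water at index 3 and sand at index 4. The sketch below is hand-written for illustration and is not the generated code; it ignores the other fields, which is equivalent to leaving them at their zero values.

```go
package main

import "fmt"

// hole holds only the two boolean fields of the schema's hole struct.
// Their 0-based schema indices are 3 (water) and 4 (sand).
type hole struct {
	water, sand bool
}

// marshal writes one header byte per true boolean, in schema order,
// followed by the termination byte 0x7f.
func (h hole) marshal() []byte {
	var buf []byte
	if h.water {
		buf = append(buf, 3)
	}
	if h.sand {
		buf = append(buf, 4)
	}
	return append(buf, 0x7f)
}

func main() {
	fmt.Printf("% x\n", hole{water: true}.marshal())             // 03 7f
	fmt.Printf("% x\n", hole{water: true, sand: true}.marshal()) // 03 04 7f
	fmt.Printf("% x\n", hole{}.marshal())                        // 7f
}
```

A struct with nothing but zero values thus serializes to the single termination byte.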

Unsigned 8-bit integer values just follow the header byte and the flag is RFU. Unsigned 16-bit integer values greater than 255 also follow the header byte. Smaller values are encoded in one byte with the header flag set. Unsigned 32-bit integer values less than 1<<21 are encoded as varints and larger values set the header flag for fixed length encoding. Unsigned 64-bit integer values less than 1<<49 are encoded as varints and larger values set the header flag for fixed length encoding.
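
The 16-bit and 32-bit rules might look as follows in Go. This is a sketch, not the generated code: the function names are hypothetical, and the varint group order (least significant group first, as in Protocol Buffers) is an assumption, since the text does not spell it out.

```go
package main

import "fmt"

// appendVarint appends v in 7-bit groups with the high bit marking
// "more bytes follow". Least significant group first is an assumption.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// appendUint16 encodes a uint16 field. Zero values are not serialized,
// so callers are expected to skip v == 0.
func appendUint16(buf []byte, index byte, v uint16) []byte {
	if v > 255 {
		// Plain header, then two bytes in big-endian order.
		return append(buf, index, byte(v>>8), byte(v))
	}
	// Header with the flag set, then a single byte.
	return append(buf, index|0x80, byte(v))
}

// appendUint32 encodes a uint32 field: varint below 1<<21, otherwise
// the flag is set and four big-endian bytes follow.
func appendUint32(buf []byte, index byte, v uint32) []byte {
	if v < 1<<21 {
		return appendVarint(append(buf, index), uint64(v))
	}
	return append(buf, index|0x80, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

func main() {
	fmt.Printf("% x\n", appendUint16(nil, 2, 7))      // 82 07
	fmt.Printf("% x\n", appendUint16(nil, 2, 0x1234)) // 02 12 34
	fmt.Printf("% x\n", appendUint32(nil, 1, 300))    // 01 ac 02
	fmt.Printf("% x\n", appendUint32(nil, 1, 1<<21))  // 81 00 20 00 00
}
```

Values below 1<<21 fit in three 7-bit groups, so the varint form is always shorter than four fixed bytes. The same arithmetic gives 1<<49 for 64-bit values: seven groups versus eight fixed bytes.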

Signed 32- and 64-bit integers are encoded as varints. The flag stands for negative. The tenth byte for 64-bit integers is skipped for encoding since its value is fixed to 0x01.
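
A minimal Go sketch of the signed layout follows. It assumes that the varint carries the absolute value and that groups are written least significant first; neither is stated explicitly above. The skipped tenth byte, which only matters for a magnitude of 1<<63, is not modeled.

```go
package main

import "fmt"

// appendVarint appends v in 7-bit groups with the high bit marking
// "more bytes follow". Least significant group first is an assumption.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// appendSigned encodes a signed integer field: the header flag marks a
// negative value and the magnitude follows as a varint (assumed).
func appendSigned(buf []byte, index byte, v int64) []byte {
	mag := uint64(v)
	if v < 0 {
		index |= 0x80
		mag = uint64(-v)
	}
	return appendVarint(append(buf, index), mag)
}

func main() {
	fmt.Printf("% x\n", appendSigned(nil, 0, 300))  // 00 ac 02
	fmt.Printf("% x\n", appendSigned(nil, 0, -300)) // 80 ac 02
}
```

Nothing in this layout depends on the declared width of the field, which is consistent with int32 to int64 being listed as a backward compatible change.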

Floating points are encoded in conformance with IEEE 754. The flag is RFU.

Timestamps are encoded as a 32-bit unsigned integer for the number of seconds that have elapsed since 00:00:00 UTC, Thursday, 1 January 1970, not counting leap seconds. When the header flag is set then the number of seconds is encoded as a 64-bit two's complement integer. In both cases the value is followed with 32 bits for the nanosecond fraction. Note that the first two bits are RFU.
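
The two timestamp layouts can be sketched in Go as follows. The function names are hypothetical, and choosing the short form whenever the seconds fit in 32 unsigned bits is an assumption about when the flag gets set.

```go
package main

import (
	"fmt"
	"time"
)

// appendTimestamp encodes seconds since the Unix epoch plus a nanosecond
// fraction. The short form is used when sec fits in 32 unsigned bits.
func appendTimestamp(buf []byte, index byte, sec int64, nsec uint32) []byte {
	if sec >= 0 && sec < 1<<32 {
		buf = append(buf, index, byte(sec>>24), byte(sec>>16), byte(sec>>8), byte(sec))
	} else {
		// Flag set: 64-bit two's complement seconds, big-endian.
		buf = append(buf, index|0x80,
			byte(sec>>56), byte(sec>>48), byte(sec>>40), byte(sec>>32),
			byte(sec>>24), byte(sec>>16), byte(sec>>8), byte(sec))
	}
	// 32 bits for the nanosecond fraction in both cases.
	return append(buf, byte(nsec>>24), byte(nsec>>16), byte(nsec>>8), byte(nsec))
}

// appendTime is a convenience wrapper for time.Time values.
func appendTime(buf []byte, index byte, t time.Time) []byte {
	return appendTimestamp(buf, index, t.Unix(), uint32(t.Nanosecond()))
}

func main() {
	fmt.Printf("% x\n", appendTime(nil, 2, time.Unix(1, 500)))
	// 02 00 00 00 01 00 00 01 f4
	fmt.Printf("% x\n", appendTime(nil, 2, time.Unix(-1, 0)))
	// 82 ff ff ff ff ff ff ff ff 00 00 00 00
}
```

A nanosecond fraction never exceeds 999999999, which needs 30 bits, so the two leading bits of the last 32 stay zero as the RFU note requires.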

The data for text and binaries is prefixed with a varint byte size declaration. Text is encoded as UTF-8. The flag is RFU.
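
For text, the size prefix counts UTF-8 bytes rather than characters. A minimal Go sketch, with hypothetical function names and the same assumed varint group order (least significant group first):

```go
package main

import "fmt"

// appendVarint appends v in 7-bit groups with the high bit marking
// "more bytes follow". Least significant group first is an assumption.
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// appendText encodes a text field: header, byte size as a varint,
// then the UTF-8 data. Go strings already hold UTF-8 bytes.
func appendText(buf []byte, index byte, s string) []byte {
	buf = append(buf, index)
	buf = appendVarint(buf, uint64(len(s)))
	return append(buf, s...)
}

func main() {
	fmt.Printf("% x\n", appendText(nil, 1, "golf")) // 01 04 67 6f 6c 66
	fmt.Printf("% x\n", appendText(nil, 1, "café")) // 01 05 63 61 66 c3 a9
}
```

The second call shows a four-character string with a declared size of five, because é takes two bytes in UTF-8. A binary field would follow the same shape with a []byte payload.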

Lists of floating points, text, binaries and data structures are prefixed with a varint element size declaration. The flag is RFU.

Documentation

Overview

Package colfer provides schema definitions.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Format

func Format(file string) (changed bool, err error)

Format normalizes the file's content. The content of file is expected to be syntactically correct.

func GenerateECMA

func GenerateECMA(basedir string, packages []*Package) error

GenerateECMA writes the code into file "Colfer.js".

func GenerateGo

func GenerateGo(basedir string, packages []*Package) error

GenerateGo writes the code into file "Colfer.go".

func GenerateJava

func GenerateJava(basedir string, packages []*Package) error

GenerateJava writes the code into the respective ".java" files.

func IsECMAKeyword

func IsECMAKeyword(s string) bool

IsECMAKeyword returns whether s is a reserved word in ECMAScript code.

func IsJavaKeyword

func IsJavaKeyword(s string) bool

IsJavaKeyword returns whether s is a reserved word in Java code.

Types

type Field

type Field struct {
	// Struct is the parent.
	Struct *Struct
	// Index is the Struct.Fields position.
	Index int
	// Name is the identification token.
	Name string
	// NameNative is the language specific identification token
	NameNative string
	// Docs are the documentation texts.
	Docs []string
	// Type is the datatype.
	Type string
	// TypeNative is the language specific datatype placeholder.
	TypeNative string
	// TypeRef is the Colfer data structure reference.
	TypeRef *Struct
	// TypeList flags whether the datatype is a list.
	TypeList bool
}

Field is a Struct member definition.

func (*Field) DocText

func (f *Field) DocText(indent string) string

DocText returns the documentation lines prefixed with indent.

func (*Field) NameTitle

func (f *Field) NameTitle() string

NameTitle returns the identification token in title case.

func (*Field) String

func (f *Field) String() string

String returns the qualified name.

type Package

type Package struct {
	// Name is the identification token.
	Name string
	// NameNative is the language specific identification token
	NameNative string
	// Docs are the documentation texts.
	Docs []string
	// Structs are the type definitions.
	Structs []*Struct
	// SchemaFiles are the source filenames.
	SchemaFiles []string
	// SizeMax is the upper limit expression.
	SizeMax string
	// ListMax is the upper limit expression.
	ListMax string
}

Package is a named definition bundle.

func ParseFiles

func ParseFiles(files []string) ([]*Package, error)

ParseFiles returns the schema definitions.

func (*Package) DocText

func (p *Package) DocText(ident string) string

DocText returns the documentation lines prefixed with ident.

func (*Package) HasFloat

func (p *Package) HasFloat() bool

HasFloat returns whether p has one or more floating point fields.

func (*Package) HasList

func (p *Package) HasList() bool

HasList returns whether p has one or more list fields.

func (*Package) HasTimestamp

func (p *Package) HasTimestamp() bool

HasTimestamp returns whether p has one or more timestamp fields.

func (*Package) Refs

func (p *Package) Refs() []*Package

Refs returns all direct references sorted by name.

func (*Package) SchemaFileList

func (p *Package) SchemaFileList() string

SchemaFileList returns a listing text.

type Struct

type Struct struct {
	Pkg *Package
	// Name is the identification token.
	Name string
	// Docs are the documentation texts.
	Docs []string
	// Fields are the elements in order of appearance.
	Fields []*Field
	// SchemaFile is the source filename.
	SchemaFile string
}

Struct is a data structure definition.

func (*Struct) DocText

func (s *Struct) DocText(indent string) string

DocText returns the documentation lines prefixed with indent.

func (*Struct) HasBinary

func (s *Struct) HasBinary() bool

HasBinary returns whether s has one or more binary fields.

func (*Struct) HasBinaryList

func (s *Struct) HasBinaryList() bool

HasBinaryList returns whether s has one or more binary list fields.

func (*Struct) HasFloat

func (s *Struct) HasFloat() bool

HasFloat returns whether s has one or more floating point fields.

func (*Struct) HasList

func (s *Struct) HasList() bool

HasList returns whether s has one or more list fields.

func (*Struct) HasText

func (s *Struct) HasText() bool

HasText returns whether s has one or more text fields.

func (*Struct) HasTimestamp

func (s *Struct) HasTimestamp() bool

HasTimestamp returns whether s has one or more timestamp fields.

func (*Struct) NameTitle

func (s *Struct) NameTitle() string

NameTitle returns the identification token in title case.

func (*Struct) String

func (s *Struct) String() string

String returns the qualified name.

Directories

Path Synopsis
cmd
colf command
go
gen
Package gen tests all field mapping options.
rpc
Package rpc implements the net/rpc codecs.
