GUT


GUT is a Generator of Useful Types 😉 You write down a description of structured types in a syntax inspired by different languages of the curly braces family and the ML family. GUT takes that description and generates type definitions with useful properties, such as comparison operators and certain reflection capabilities, in an output language of your choice.

Installation

The code implementing GUT is written in C++, because that language most dearly needs such a generator. Other output modules exist, however.

If you unpacked a binary distribution archive of GUT, you should be able to run the program gut (for unixoid systems) or gut.exe (for windows-like systems) directly. You can verify the authenticity of the distribution archives using Minisign:

minisign -Vm gut-... -P RWQqcIdQLuG/Iiw3rcwD9D3Wu8UojK49fBSLibuda44tqkEcZV+xTfYC

Alternatively, you can run GUT using Docker. Images are provided in the repository chust/gut. Where you would run gut, you can instead type docker run --rm chust/gut.

If you want to compile GUT from source, you will need a C++14 compiler, the CMake build tool, and the Conan package manager. The program depends on MMU and GMP.

Command Line Use

In the most basic scenario, the gut executable reads type declarations from standard input and writes generated definitions to standard output. Since there is no output filename to infer the format from, you have to supply the --lang option to tell it what kind of output to generate. For example, this will pretty-print the input back in the type declaration format:

$ echo 'type Foo = string' | gut --lang=gut
reading root module <stdin>...
writing <stdout>...
type Foo = string

Most of the time you will probably save your type declarations and the generated output in files. In this case, gut can infer the output format from the extension of the output filename. For example, this will read input from foo.gut and turn the declarations into a technical reference document in HTML format, named foo.html:

$ gut foo.gut -o foo.html
reading root module 'foo.gut'...
writing 'foo.html'...
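If you drive gut from a Python build script, a small helper can encode the rule above: the output format comes from --lang, or else from the extension of the -o file. This is a minimal sketch; the function name gut_command is hypothetical, it uses only the --lang and -o options shown in this section, and it assumes that passing no source files makes gut read standard input.

```python
from typing import List, Optional, Sequence


def gut_command(sources: Sequence[str], output: Optional[str] = None,
                lang: Optional[str] = None, executable: str = "gut") -> List[str]:
    """Build a gut argument list (hypothetical helper, not part of GUT)."""
    if output is None and lang is None:
        # Writing to standard output leaves no file extension to infer the
        # output format from, so --lang is mandatory in that case.
        raise ValueError("--lang is required when writing to standard output")
    cmd = [executable]
    if lang is not None:
        cmd.append("--lang=" + lang)
    cmd.extend(sources)  # no sources: assumed to mean "read standard input"
    if output is not None:
        cmd.extend(["-o", output])
    return cmd
```

For example, subprocess.run(gut_command(["foo.gut"], output="foo.html"), check=True) would run the second command shown above.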

For detailed information about the command line options, invoke gut with the --help option. If you want information about generator-specific options for your chosen output type, add the --lang option as well. For example, this will print information on how to use the C++14 output generator:

$ gut --lang=c++ --help
Usage: gut.exe [OPTIONS] SOURCES...
...

Type Declaration Language

The declaration language supported by GUT consists of three kinds of elements: Types (obviously), constants, and modules. The different declarations are described below in detail. Every declaration is conceptually part of a module body:

Decl = TypeDecl | ConstDecl | ModuleDecl | ";".
ModuleBody = Decl {Decl}.

Declarations may optionally be separated by any number of semicolons, but they are not required to be.

Comments following the C++ style can be inserted anywhere between the other tokens of the grammar: the character combinations /* and */ delimit block comments (which, unlike in C++, may be nested), and two slashes // start a line comment. A line comment starting with three slashes followed by a space (/// ) is considered documentation and will be associated with the next declaration encountered in the input.
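The comment rules can be illustrated with a small Python sketch that removes comments from source text while collecting documentation comments. It is not GUT's actual lexer: strip_comments is a hypothetical name, and the sketch assumes that string literals (described under Constant Declarations) must be skipped so that // inside a string is not mistaken for a comment.

```python
from typing import List, Tuple


def strip_comments(src: str) -> Tuple[str, List[str]]:
    """Remove comments; return the remaining text and the /// doc comments."""
    out: List[str] = []
    docs: List[str] = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*", i):
            depth = 1  # block comments nest, so count openers and closers
            i += 2
            while depth and i < n:
                if src.startswith("/*", i):
                    depth += 1
                    i += 2
                elif src.startswith("*/", i):
                    depth -= 1
                    i += 2
                else:
                    i += 1
            if depth:
                raise ValueError("unterminated block comment")
            out.append(" ")  # a comment still separates the tokens around it
        elif src.startswith("//", i):
            end = src.find("\n", i)
            if end == -1:
                end = n
            if src.startswith("/// ", i):
                docs.append(src[i + 4:end])  # documentation for the next declaration
            i = end  # keep the newline itself
        elif src[i] == '"':
            # copy a string literal verbatim, honouring backslash escapes
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out), docs
```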

Type Declarations

A type declaration is introduced by the type keyword, followed by an identifier for the type, an equals sign, some optional hints, and the actual definition:

TypeDecl = "type" Ident "=" {TypeHint} ...
TypeHint = "@struct" | "@flags".

A type hint is a keyword starting with an at-sign @. The meaning of the hints is described in the sections about the types that make use of them.

The type definition can be as simple as a keyword or identifier denoting a basic or already defined type, or as complex as a discriminated union with cases that have tuples and records as arguments.

Basic Types

The set of basic types consists of the following:

BasicType
= "void"
| "bool"
| "int8" | "uint8"
| "int16" | "uint16"
| "int32" | "uint32"
| "int64" | "uint64"
| "bigint"
| "float32"
| "float64"
| "string".

The keyword void denotes the empty tuple, conveying no information. bool is the logical data type consisting of the values true and false. bigint is an arbitrary precision integer. And string is a string of characters.

The int* and uint* keywords denote fixed-width signed and unsigned integers respectively. The float* keywords denote fixed-width binary floating point values.

For example, a simple type alias may be created with a declaration like this:

type Word = int32
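For reference, the value ranges of the fixed-width integer keywords can be computed as below. This sketch assumes the conventional two's complement ranges for the signed types, which the text above does not spell out; int_range is a hypothetical helper.

```python
from typing import Tuple


def int_range(keyword: str) -> Tuple[int, int]:
    """Return (minimum, maximum) for keywords such as "int8" or "uint64"."""
    if keyword.startswith("uint"):
        signed, bits = False, int(keyword[4:])
    elif keyword.startswith("int"):
        signed, bits = True, int(keyword[3:])
    else:
        raise ValueError("not a fixed-width integer keyword: " + keyword)
    if bits not in (8, 16, 32, 64):
        raise ValueError("unsupported width: " + keyword)
    if signed:
        # two's complement: one more negative value than positive values
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1
```

A check such as lo, hi = int_range("int32"); lo <= value <= hi is the kind of test a converted constant would have to pass.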

Tuple Types

A tuple is a structure with multiple anonymous members, possibly of different types. The members are usually accessed by position, or by fixed names indicating the position.

A tuple is written as a parenthesized list of other types, delimited by commas:

TupleType = "(" [TypeExpr {"," TypeExpr}] ")".

For example, a pair of a string and a number may be declared like this:

type NamedNumber = (string, float64)

The type expression consisting only of an empty pair of parentheses () is equivalent to the basic void type.
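In Python, the closest hand-written analog of a tuple type is a typing.NamedTuple, whose members can be accessed both by position and by a fixed name. This only illustrates the concept and is not the output of GUT's Python generator; the member names item1 and item2 are hypothetical.

```python
from typing import NamedTuple


class NamedNumber(NamedTuple):
    """Hand-written analog of: type NamedNumber = (string, float64)"""
    item1: str    # first member
    item2: float  # second member


pair = NamedNumber("pi", 3.14159)
assert pair[0] == pair.item1 == "pi"                  # by position or by fixed name
assert NamedNumber("a", 2.0) < NamedNumber("b", 1.0)  # lexicographic comparison
```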

Container Types

A container is a structure bundling many elements of the same type(s). GUT supports linear and associative containers, both denoted using a similar syntax:

SeqType = "[" [LengthConst | TypeExpr] "]" TypeExpr.

The type expression for a container starts with an opening bracket and ends with the type of contained values. If the opening bracket is immediately followed by a closing bracket, the container is a variable-size linear one. If the brackets contain a non-negative integer number, the container is a fixed-size linear one with the given number of elements. If the brackets contain a type, that is the key type for an associative container.

For example, an array of four integers may be declared like this:

type FourInts = [4]int32

A variable-size array of floating point numbers would be declared like this:

type SomeFloats = []float64

An associative array mapping strings to numbers could look like this:

type NamedNumbers = [string]bigint

And a set of two-integer pairs with no associated values would be written like this:

type UniquePoints = [(int32, int32)]void
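The rule for telling the three container kinds apart can be sketched as a function that inspects only the bracket prefix of a type expression. classify_container is a hypothetical helper for illustration; it accepts only decimal lengths and does not validate the key or element types.

```python
from typing import Optional, Tuple


def classify_container(expr: str) -> Tuple[str, Optional[str], str]:
    """Split a container type into (kind, size or key type, element type)."""
    if not expr.startswith("["):
        raise ValueError("not a container type: " + expr)
    depth = 0
    for i, ch in enumerate(expr):
        if ch in "[({":
            depth += 1
        elif ch in "])}":
            depth -= 1
            if depth == 0:
                break  # found the bracket matching the opening one
    else:
        raise ValueError("unbalanced brackets: " + expr)
    inside, element = expr[1:i].strip(), expr[i + 1:].strip()
    if not inside:
        return "variable-size", None, element
    if inside.isascii() and inside.isdigit():
        return "fixed-size", inside, element
    return "associative", inside, element  # the brackets hold the key type
```

With void as the element type, as in UniquePoints, the associative container effectively becomes a set of keys.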

Record Types

Like a tuple, a record consists of a fixed set of members, possibly of different types. Unlike a tuple, the members of a record have names that can be used to address them.

A record is written as a set of members enclosed within braces. Each member is a pair of an identifier and a type, separated by a colon and terminated by a semicolon. The semicolons are not optional:

RecordType = "{" {Ident ":" TypeExpr ";"} "}".

For example, a record representing a place marker may look like this:

type Place = { Position : (float32, float32); Description : string; }

By default, the output generators will try to represent record-typed members and elements in other types as references, while tuple-typed members may be inlined. You can request that a record be structurally inlined in other types by adding the @struct hint. For example, a record representing an integer pair of coordinates could be declared like this:

type Point = @struct { X : int32; Y : int32; }
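Python has no inline value types, so as a rough analog of an @struct record the sketch below uses a frozen dataclass: its instances are immutable, and order=True adds comparison operators that compare the members in declaration order. This is a hand-written illustration, not what GUT's Python generator emits; the lowercase member names are a Python convention chosen here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True, order=True)
class Point:
    """Hand-written analog of: type Point = @struct { X : int32; Y : int32; }"""
    x: int
    y: int


origin = Point(0, 0)
moved = replace(origin, x=3)  # "modifying" a frozen record yields a new value
```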

Union Types

A union is a structure with a number of mutually exclusive alternative representations. In GUT, you declare a name for each alternative, optionally assign it an identifying integer "tag", and optionally describe the members stored with it.

You write a union type as the list of alternatives, each introduced by a pipe symbol, followed by the symbolic identifier, then optionally followed by an equals sign and a numeric identifier, and then optionally followed by the keyword of and an argument type:

UnionCase = "|" Ident ["=" IntConst] ["of" TypeExpr].
UnionType = UnionCase {UnionCase}.

A classic use case of a union type is an option that either holds a value or signals the absence of any useful value (without abusing some potentially legal value for that purpose). Such an option holding a number may be declared like this:

type UIntOption =
| Some of uint32
| None

In fact, this type is so useful that a shorthand exists and an equivalent type can also be written as follows:

type UIntOption = ?uint32

Another common special case for union types is the declaration of an enumerated type with no argument type associated with any of the cases. Typically, output generators can represent such a type particularly efficiently. An example may look like this:

type MyEnum = | A | B = 42 | C

If you do not specify an explicit numeric tag for a case, its tag is the tag of the previous case plus one, and the first case defaults to zero. In the example above, A is 0, B is 42, and C is 43.

An enumeration may additionally be annotated with the @flags hint, which has two effects:

  1. Instead of starting from tag zero and incrementing by one for every implicitly tagged case, the start value becomes one and is multiplied by two for implicitly tagged cases.
  2. Output generators will try to represent the type in a way that allows the cases to be combined as some kind of bitfield or set.

This example declaration represents style flags for some text formatting application:

type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
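The implicit numbering rules for enumerations can be summarized in a few lines of Python. assign_tags is a hypothetical helper, and the assumption that an implicit tag following an explicit one in a @flags enumeration is twice the explicit value is read from the wording above rather than confirmed. The result is fed to Python's standard enum.IntFlag to show the bitfield behaviour described in the second point.

```python
from enum import IntFlag
from typing import Dict, List, Optional, Tuple


def assign_tags(cases: List[Tuple[str, Optional[int]]],
                flags: bool = False) -> Dict[str, int]:
    """Map each case name to its tag; None marks an implicitly tagged case."""
    tags: Dict[str, int] = {}
    previous: Optional[int] = None
    for name, explicit in cases:
        if explicit is not None:
            tag = explicit
        elif previous is None:
            tag = 1 if flags else 0  # implicit tag of the very first case
        else:
            tag = previous * 2 if flags else previous + 1
        tags[name] = tag
        previous = tag
    return tags


# type MyEnum = | A | B = 42 | C
my_enum = assign_tags([("A", None), ("B", 42), ("C", None)])

# type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
Styles = IntFlag("Styles", assign_tags(
    [("BOLD", None), ("ITALIC", None), ("UNDERLINE", None), ("STRIKE", None)],
    flags=True))
```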

For union types that do have arguments, note that you can also use tuples or records to associate multiple data members with a case, like this:

type MyAlternatives =
| Nothing
| Exact of bigint
| Inexact of { Value : float64; Tolerance : float64; }
| Error of (int32, string)
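A hand-written Python approximation of MyAlternatives can use one immutable class per case and a typing.Union alias for the whole type, with a function that dispatches on the case via isinstance. This illustrates the idea of a discriminated union; it is not the representation GUT's Python generator produces, and the class and member names are chosen here for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Nothing:
    pass  # case without an argument


@dataclass(frozen=True)
class Exact:
    value: int  # bigint maps naturally to Python's unbounded int


@dataclass(frozen=True)
class Inexact:
    value: float  # record argument: named members
    tolerance: float


@dataclass(frozen=True)
class Error:
    value: Tuple[int, str]  # tuple argument: one positional payload


MyAlternatives = Union[Nothing, Exact, Inexact, Error]


def describe(alt: MyAlternatives) -> str:
    # isinstance checks play the role of matching on the case name
    if isinstance(alt, Nothing):
        return "nothing"
    if isinstance(alt, Exact):
        return f"exactly {alt.value}"
    if isinstance(alt, Inexact):
        return f"{alt.value} +/- {alt.tolerance}"
    code, message = alt.value
    return f"error {code}: {message}"
```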

Opaque Types

The representation of an opaque type depends entirely on the output generator. It may be a partially defined type whose implementation details are not provided by GUT, or some polymorphic object or pointer type.

The keyword opaque denotes an opaque type in a type declaration:

type Unspecified = opaque

Constant Declarations

A constant declaration is introduced by the const keyword, followed by an identifier for the constant, an optional colon and an explicit type, an equals sign, and a value:

ConstDecl = "const" Ident [":" TypeExpr] "=" ConstExpr.
ConstExpr = "true" | "false" | IntConst | FloatConst | StringConst
          | "import" StringConst.

A subset of the GUT types can be represented as literals, namely bool, any integer type, any floating point type, and string.

Numbers can be written in binary, octal, decimal, or hexadecimal notation, indicated by the 0b, 0o, 0d, or 0x prefixes respectively. The 0d prefix is optional, as decimal numbers are the default. An optional + or - sign comes before the base prefix. Floating point numbers are written in the usual point notation with an optional exponent suffix; the exponent is introduced by e or p (case insensitive, and only p works for hexadecimal), followed by a signed decimal integer.

For example, a numeric constant may look like this:

const TheAnswer : bigint = 0x2A
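The integer literal syntax can be sketched with a regular expression plus Python's int() with an explicit base. parse_int_literal is a hypothetical helper; it handles only integers (no floating point literals) and assumes the base prefixes are lowercase, as written above.

```python
import re

_BASES = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}
_INT_LITERAL = re.compile(r"([+-]?)(0[bodx])?([0-9A-Fa-f]+)")


def parse_int_literal(text: str) -> int:
    """Parse a GUT-style integer literal such as "0x2A" or "-0b101"."""
    match = _INT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError("malformed integer literal: " + text)
    sign, prefix, digits = match.groups()
    # int() rejects digits that are invalid for the base, e.g. "0b12"
    value = int(digits, _BASES.get(prefix, 10))
    return -value if sign == "-" else value
```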

Strings are written in the usual double quotes notation with backslash escape sequences:

Escape              Translation
\t                  Horizontal tabulator character
\b                  Backspace character
\r                  Carriage return character
\n                  Newline character
\xNN                Unicode codepoint number 0xNN
\uNNNN              Unicode codepoint number 0xNNNN
\UNNNNNN            Unicode codepoint number 0xNNNNNN
\ + any character   The character following the backslash, copied verbatim
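The escape sequences can be decoded with a short loop. unescape is a hypothetical helper that takes the text between the double quotes; it assumes the hexadecimal escapes take exactly 2, 4, and 6 digits, as the table's NN, NNNN, and NNNNNN suggest.

```python
import string

_SIMPLE = {"t": "\t", "b": "\b", "r": "\r", "n": "\n"}
_HEX_DIGITS = {"x": 2, "u": 4, "U": 6}


def unescape(body: str) -> str:
    """Translate backslash escapes in the contents of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        elif esc in _HEX_DIGITS:
            count = _HEX_DIGITS[esc]
            digits = body[i + 2:i + 2 + count]
            if len(digits) != count or not all(d in string.hexdigits for d in digits):
                raise ValueError("bad escape: \\" + esc + digits)
            out.append(chr(int(digits, 16)))  # Unicode codepoint
            i += 2 + count
        else:
            out.append(esc)  # any other character is copied verbatim
            i += 2
    return "".join(out)
```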

In addition to literals, the contents of a file can be imported as a container of type [N]uint8, where N is the size of the file. The path after the import keyword is written as a string and interpreted relative to the input containing the import (or relative to the current working directory if the input does not come from a file).
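Resolving such a path can be sketched with pathlib. resolve_import and import_bytes are hypothetical helpers; the sketch follows the rule just stated and additionally assumes that absolute paths are used as given, which is what pathlib's / operator does.

```python
from pathlib import Path
from typing import Optional


def resolve_import(path: str, importing_file: Optional[str]) -> Path:
    """Locate an imported file relative to the file containing the import."""
    # Input that does not come from a file (e.g. standard input) falls back
    # to the current working directory.
    base = Path(importing_file).parent if importing_file else Path.cwd()
    return base / path  # an absolute path replaces base entirely


def import_bytes(path: str, importing_file: Optional[str]) -> bytes:
    """Read an imported file; its contents would form a [N]uint8 constant."""
    data = resolve_import(path, importing_file).read_bytes()
    return data  # N == len(data)
```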

If an explicit type is specified for the constant and the assigned value has a different type, a conversion is attempted.

Even when a constant is declared with a variable-size linear container type, the concrete size is always included in the final type.

Module Declarations

Conceptually, all declarations in GUT are part of a module scope, either the implicit scope introduced by an input stream, or an explicitly declared scope.

An explicit module scope is created with the module keyword followed by an identifier, and the module body enclosed in braces. A module may also be imported from a source file, located by a path relative to the input containing the import (or relative to the current working directory if the input does not come from a file):

ModuleDecl = "module" Ident "{" ModuleBody "}"
           | "module" Ident "=" "import" StringConst.

A declaration of two nested modules (not counting the implicit top-level scope) may look like this:

module Outer {
  module Inner {
    type MyInt = int32
  }

  const Stuff : Inner.MyInt = 42
}

Note how the type given to the constant Stuff uses a qualified identifier to refer to the type declared in the Inner scope. Identifiers are looked up relative to the scopes in which they occur, starting from the innermost surrounding scope, unless they start with a dot (.), in which case the lookup starts at the outermost surrounding scope associated with the current input.
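The lookup rule can be sketched as follows. Scope and lookup are hypothetical names, not GUT internals, and declarations are represented by plain strings. The sketch also assumes that once the first component of a qualified identifier is found, failing to resolve the remaining components is an error rather than a reason to keep searching outer scopes.

```python
from typing import Dict, List, Optional, Union


class Scope:
    """A module scope mapping names to declarations or nested scopes."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.names: Dict[str, Union["Scope", str]] = {}

    def module(self, name: str) -> "Scope":
        child = Scope(self)
        self.names[name] = child
        return child


def lookup(scope: Scope, ident: str) -> Union[Scope, str]:
    """Resolve a possibly qualified identifier such as "Inner.MyInt"."""
    if ident.startswith("."):
        # Leading dot: start at the outermost scope of the current input.
        while scope.parent is not None:
            scope = scope.parent
        candidates: List[Scope] = [scope]
        ident = ident[1:]
    else:
        # Otherwise: try the innermost scope first, then each enclosing one.
        candidates = []
        current: Optional[Scope] = scope
        while current is not None:
            candidates.append(current)
            current = current.parent
    head, *rest = ident.split(".")
    for candidate in candidates:
        if head in candidate.names:
            target = candidate.names[head]
            for part in rest:
                if not isinstance(target, Scope) or part not in target.names:
                    raise KeyError(ident)
                target = target.names[part]
            return target
    raise KeyError(ident)


# The example above: module Outer { module Inner { type MyInt = int32 } ... }
root = Scope()
outer = root.module("Outer")
inner = outer.module("Inner")
inner.names["MyInt"] = "int32"
```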

Language Mappings

C++

The C++14 language mapping results in rather verbose but hopefully easy-to-use code:

Comparison operators are generated for all types.

C#

The C# language mapping is mostly straightforward.

F#

The F# language mapping is very straightforward.

Kotlin

The Kotlin language mapping is mostly straightforward.

Python

The Python language mapping is mostly straightforward.