GUT is a Generator of Useful Types 😉 You write down a description of structured types in a syntax inspired by different languages of the curly braces family and the ML family. GUT takes that description and generates type definitions with useful properties, such as comparison operators and certain reflection capabilities, in an output language of your choice.
Installation
The code implementing GUT is written in C++, because that language most dearly needs such a generator. Other output modules exist, however.
If you unpacked a binary distribution
archive of GUT, you should be able to run the program gut (for unixoid
systems) or gut.exe (for Windows-like systems) directly.
You can verify the authenticity of the distribution archives using
Minisign or
GnuPG:
minisign -Vm gut-$VERSION.zip -P RWQqcIdQLuG/Iiw3rcwD9D3Wu8UojK49fBSLibuda44tqkEcZV+xTfYC
gpg --verify gut-$VERSION.zip.asc
Alternatively, you can run GUT using
Docker. Images are provided in the repository
chust/gut. Where you would run
gut, you can instead type docker run --rm chust/gut.
If you want to compile GUT from source, you will need a C++14 compiler, the CMake build tool, and the Conan package manager. The program depends on MMU and GMP.
Command Line Use
In the most basic scenario, the gut executable reads type declarations from
standard input and writes generated definitions to standard output, but in
order to know what kind of output it should generate, you have to supply the
--lang option. For example, this will pretty print the input back in the type
declaration format:
$ echo 'type Foo = string' | gut --lang=gut
reading root module <stdin>...
writing <stdout>...
type Foo = string
Most of the time you will probably save your type declarations and the
generated output in files. In this case, gut can infer the output format from
the extension of the output filename. For example, this will read input from
foo.gut and turn the declarations into a technical reference document in HTML
format, named foo.html:
$ gut foo.gut -o foo.html
reading root module 'foo.gut'...
writing 'foo.html'...
For detailed information about the command line options, invoke gut with the
--help option. If you want information about generator-specific options for
your chosen output type, add the --lang option as well. For example, this
will print information on how to use the C++14 output generator:
$ gut --lang=c++ --help
Usage: gut.exe [OPTIONS] SOURCES...
...
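If you drive gut from a build script, the invocations shown above can be assembled programmatically. The following is a minimal sketch in Python, not part of GUT itself: the helper names gut_command and run_gut are hypothetical, and only the --lang and -o options shown in this section are used.

```python
import shutil
import subprocess
from typing import List, Optional


def gut_command(sources: List[str], output: Optional[str] = None,
                lang: Optional[str] = None) -> List[str]:
    """Build an argument list shaped like the examples in this section."""
    args = ["gut"]
    if lang is not None:
        # Explicit output language, e.g. --lang=gut or --lang=c++.
        args.append(f"--lang={lang}")
    args += list(sources)
    if output is not None:
        # Output file; gut can infer the format from its extension.
        args += ["-o", output]
    return args


def run_gut(args: List[str], stdin_text: Optional[str] = None) -> str:
    """Run gut if it is on PATH and return what it wrote to standard output."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("gut executable not found on PATH")
    result = subprocess.run(args, input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(gut_command(["foo.gut"], output="foo.html"))
    print(gut_command([], lang="gut"))
```

With no sources given, the declarations would be passed as stdin_text, mirroring the echo example above.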
Type Declaration Language
The declaration language supported by GUT consists of three kinds of elements: types (obviously), constants, and modules. The different declarations are described below in detail. Every declaration is conceptually part of a module body:
Decl = TypeDecl | ConstDecl | ModuleDecl | ";".
ModuleBody = Decl {Decl}.
Declarations may optionally be separated by any number of semicolons, but they are not required to be.
Comments following the C++ style can be inserted anywhere between the other
tokens of the grammar: The character combinations /* and */ delimit block
comments (which may be nested), and two slashes // start a line comment. A
line comment starting with three slashes and a space /// is considered
documentation and will be associated with the next declaration encountered in
the input.
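The comment rules can be illustrated with a small scanner. This is only a sketch of the rules as described here, not GUT's actual lexer; the function name strip_comments is hypothetical, and the handling of string literals (copied verbatim so that slashes inside them are not treated as comments) is an assumption.

```python
from typing import List, Tuple


def strip_comments(text: str) -> Tuple[str, List[str]]:
    """Remove comments; return (remaining text, texts of '/// ' doc lines)."""
    out: List[str] = []
    docs: List[str] = []
    i, depth, n = 0, 0, len(text)
    while i < n:
        two = text[i:i + 2]
        if depth > 0:
            # Inside a block comment: track nesting, drop everything else.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "//":
            end = text.find("\n", i)
            end = n if end == -1 else end
            line = text[i:end]
            if line.startswith("/// "):
                # Three slashes and a space mark documentation.
                docs.append(line[4:])
            i = end  # the newline itself is kept
        elif text[i] == '"':
            # Assumption: string literals are copied verbatim.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out), docs
```

The depth counter is what makes nested block comments work: an inner */ only closes the innermost comment.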
Type Declarations
A type declaration is introduced by the type keyword, followed by an
identifier for the type, an equals sign, some optional hints, and the actual
definition:
TypeDecl = "type" Ident "=" {TypeHint} ...
TypeHint = "@struct" | "@flags".
A type hint is a keyword starting with an at-sign @. The meaning of the hints
is described in the sections about the types that make use of them.
The type definition can be as simple as a keyword or identifier denoting a basic or already defined type, or as complex as a discriminated union with cases that have tuples and records as arguments.
Basic Types
The set of basic types consists of the following:
BasicType
= "void"
| "bool"
| "int8" | "uint8"
| "int16" | "uint16"
| "int32" | "uint32"
| "int64" | "uint64"
| "bigint"
| "float32"
| "float64"
| "string".
The keyword void denotes the empty tuple conveying no contained information.
bool is the logical data type consisting of the values true and false.
bigint is an arbitrary precision integer. And string is a string of
characters.
The int* and uint* keywords denote fixed-width signed and unsigned integers
respectively. The float* keywords denote fixed-width binary floating point
values.
For example, a simple type alias may be created with a declaration like this:
type Word = int32
Tuple Types
A tuple is a structure with multiple anonymous members, possibly of different types. The members are usually accessed by position, or by fixed names indicating the position.
A tuple is written as a parenthesized list of other types, delimited by commas:
TupleType = "(" [TypeExpr {"," TypeExpr}] ")".
For example, a pair of a string and a number may be declared like this:
type NamedNumber = (string, float64)
The type expression consisting only of an empty pair of parentheses () is
equivalent to the basic void type.
Container Types
A container is a structure bundling many elements of the same type(s). GUT supports linear and associative containers, both denoted using a similar syntax:
SeqType = "[" [LengthConst | TypeExpr] "]" TypeExpr.
The type expression for a container starts with an opening bracket and ends with the type of contained values. If the opening bracket is immediately followed by a closing bracket, the container is a variable-size linear one. If the brackets contain a non-negative integer number, the container is a fixed-size linear one with the given number of elements. If the brackets contain a type, that is the key type for an associative container.
For example, an array of four integers may be declared like this:
type FourInts = [4]int32
A variable-size array of floating point numbers would be declared like this:
type SomeFloats = []float64
An associative array mapping strings to numbers could look like this:
type NamedNumbers = [string]bigint
And a set of two-integer pairs with no associated values would be written like this:
type UniquePoints = [(int32, int32)]void
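The three container forms differ only in what sits between the brackets. The following Python sketch classifies a container type expression accordingly; it is an illustration rather than GUT's parser, the name classify_container is hypothetical, and it assumes the length is written as a plain decimal number.

```python
def classify_container(expr: str) -> dict:
    """Split '[...]T' into its bracket part and the contained type."""
    expr = expr.strip()
    if not expr.startswith("["):
        raise ValueError("not a container type expression")
    depth = 0
    for pos, ch in enumerate(expr):
        # Find the bracket that closes the first one, skipping nested pairs.
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced brackets")
    inner = expr[1:pos].strip()
    element = expr[pos + 1:].strip()
    if inner == "":
        return {"kind": "variable", "element": element}
    if inner.isdecimal():
        return {"kind": "fixed", "length": int(inner), "element": element}
    # Anything else between the brackets is taken to be the key type.
    return {"kind": "associative", "key": inner, "value": element}


if __name__ == "__main__":
    for text in ("[4]int32", "[]float64", "[string]bigint", "[(int32, int32)]void"):
        print(text, "->", classify_container(text))
```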
Record Types
Like a tuple, a record consists of a fixed set of members, possibly of different types. Unlike a tuple, the members of a record have names that can be used to address them.
A record is written as a set of members enclosed within braces. Each member is a pair of an identifier and a type, separated by a colon and terminated by a semicolon. The semicolons are not optional:
RecordType = "{" {Ident ":" TypeExpr ";"} "}".
For example, a record representing a place marker may look like this:
type Place = { Position : (float32, float32); Description : string; }
By default, the output generators will try to represent record-typed members
and elements in other types as references, while tuple-typed members may be
inlined. You can request that a record be structurally inlined in other
types by adding the @struct hint. For example, a record representing an
integer pair of coordinates could be declared like this:
type Point = @struct { X : int32; Y : int32; }
Union Types
A union is a structure with a number of mutually exclusive alternative representations. In GUT, you declare a name for each alternative, optionally assign it an identifying integer "tag", and optionally describe the members stored with it.
You write a union type as the list of alternatives, each introduced by a pipe
symbol, followed by the symbolic identifier, then optionally followed by an
equals sign and a numeric identifier, and then optionally followed by the
keyword of and an argument type:
UnionCase = "|" Ident ["=" IntConst] ["of" TypeExpr].
UnionType = UnionCase {UnionCase}.
A classic use case of a union type is an option that either holds a value or signals the absence of any useful value (without abusing some potentially legal value for that purpose). Such an option holding a number may be declared like this:
type UIntOption =
| Some of uint32
| None
In fact, this type is so useful that a shorthand exists and an equivalent type can also be written as follows:
type UIntOption = ?uint32
Another common special case for union types is the declaration of an enumerated type with no argument type associated with any of the cases. Typically, output generators can represent such a type particularly efficiently. An example may look like this:
type MyEnum = | A | B = 42 | C
If a case has no explicit numeric tag, its tag is the tag of the previous case plus one. The first case starts at zero unless it is tagged explicitly.
An enumeration may additionally be annotated with the @flags hint, which has
two effects:
- Instead of starting from tag zero and incrementing by one for every implicitly tagged case, the start value becomes one and is multiplied by two for implicitly tagged cases.
- Output generators will try to represent the type in a way that allows the cases to be combined as some kind of bitfield or set.
This example declaration represents style flags for some text formatting application:
type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
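The tag numbering rules for plain and @flags enumerations can be sketched in Python as follows. This is an illustration of the rules as stated, not GUT's implementation; the name assign_tags is hypothetical, and the behaviour after an explicit tag in a @flags enumeration (the next implicit tag doubles the explicit one) is an assumption.

```python
from typing import List, Optional, Tuple

Case = Tuple[str, Optional[int]]  # (case name, explicit tag or None)


def assign_tags(cases: List[Case], flags: bool = False) -> List[Tuple[str, int]]:
    """Compute the numeric tag of every case in declaration order."""
    result: List[Tuple[str, int]] = []
    previous: Optional[int] = None
    for name, explicit in cases:
        if explicit is not None:
            tag = explicit
        elif previous is None:
            tag = 1 if flags else 0      # start value
        elif flags:
            tag = previous * 2           # assumption: doubles the previous tag
        else:
            tag = previous + 1
        result.append((name, tag))
        previous = tag
    return result


if __name__ == "__main__":
    # type MyEnum = | A | B = 42 | C
    print(assign_tags([("A", None), ("B", 42), ("C", None)]))
    # type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
    print(assign_tags([("BOLD", None), ("ITALIC", None),
                       ("UNDERLINE", None), ("STRIKE", None)], flags=True))
```

Under these rules MyEnum gets the tags 0, 42, 43 and Styles gets 1, 2, 4, 8.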
For union types that do have arguments, note that you can also use tuples or records to associate multiple data members with a case, like this:
type MyAlternatives =
| Nothing
| Exact of bigint
| Inexact of { Value : float64; Tolerance : float64; }
| Error of (int32, string)
Opaque Types
The representation of an opaque type depends entirely on the output generator. It may be a partially defined type whose implementation details are not provided by GUT, or some polymorphic object or pointer type.
The keyword opaque denotes an opaque type in a type declaration:
type Unspecified = opaque
Constant Declarations
A constant declaration is introduced by the const keyword, followed by an
identifier for the constant, an optional colon and an explicit type, an equals
sign and a value:
ConstDecl = "const" Ident [":" TypeExpr] "=" ConstExpr.
ConstExpr = "true" | "false" | IntConst | FloatConst | StringConst
| "import" StringConst.
A subset of the GUT types can be represented as literals, namely bool, any
integer type, any floating point type, and string.
Numbers can be written in binary, octal, decimal, or hexadecimal notation,
indicated by the 0b, 0o, 0d, or 0x prefixes respectively. The 0d
prefix is optional, as decimal numbers are the default. An optional + or -
sign comes before the base prefix. Floating point numbers are written in the
usual point notation with an optional exponent suffix; the exponent is
introduced by e or p (case insensitive, and only p works for
hexadecimal), followed by a signed decimal integer.
For example, a numeric constant may look like this:
const TheAnswer : bigint = 0x2A
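The integer notation can be modelled with a short Python function. This is a sketch under assumptions, not GUT's lexer: the name parse_int_literal is hypothetical, only the lowercase prefixes listed above are accepted, and floating point literals are left out.

```python
import re

_BASES = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}
# Optional sign, optional base prefix, then the digits.
_INT_LITERAL = re.compile(r"([+-]?)(0[bodx])?([0-9A-Fa-f]+)")


def parse_int_literal(text: str) -> int:
    """Parse an integer literal such as 42, -0b101, +0d17 or 0x2A."""
    match = _INT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, prefix, digits = match.groups()
    base = _BASES[prefix] if prefix else 10  # decimal is the default
    if any(int(ch, 16) >= base for ch in digits):
        raise ValueError(f"digit out of range for base {base}: {text!r}")
    value = int(digits, base)
    return -value if sign == "-" else value


if __name__ == "__main__":
    for literal in ("0x2A", "-0b101", "+0d17", "0o17", "42"):
        print(literal, "=", parse_int_literal(literal))
```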
Strings are written in the usual double quotes notation with backslash escape sequences:
| Escape | Translation |
|---|---|
| \t | Horizontal tabulator character |
| \b | Backspace character |
| \r | Carriage return character |
| \n | Newline character |
| \xNN | Unicode codepoint number 0xNN |
| \uNNNN | Unicode codepoint number 0xNNNN |
| \UNNNNNN | Unicode codepoint number 0xNNNNNN |
| \ + any character | The character following the backslash copied verbatim |
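As an illustration of the escape sequences, here is a small decoder written in Python. It is a sketch of the rules in the table, not the code GUT uses; the name decode_escapes is hypothetical, and rejecting truncated or non-hexadecimal codepoint escapes is an assumption. Note that \U takes six hexadecimal digits here, unlike the eight-digit form found in some other languages.

```python
import string

_SIMPLE = {"t": "\t", "b": "\b", "r": "\r", "n": "\n"}
_HEX_WIDTHS = {"x": 2, "u": 4, "U": 6}  # number of hex digits per escape kind


def decode_escapes(body: str) -> str:
    """Decode the text found between the double quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        i += 2
        if kind in _SIMPLE:
            out.append(_SIMPLE[kind])
        elif kind in _HEX_WIDTHS:
            width = _HEX_WIDTHS[kind]
            digits = body[i:i + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed codepoint escape")
            out.append(chr(int(digits, 16)))
            i += width
        else:
            out.append(kind)  # any other character is copied verbatim
    return "".join(out)
```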
In addition to literals, the contents of a file can be imported as a container
of type [N]uint8, where N is the size of the file. The path after the
import keyword is written as a string and interpreted relative to the input
containing the import (or relative to the current working directory if the
input does not come from a file).
If an explicit type is specified for the constant and the assigned value has a different type, a conversion is attempted:
- Booleans convert to the integers zero or one.
- Integers convert among each other as long as the concrete value is representable.
- Floating point numbers convert among each other.
- Strings convert to their UTF-8 representation as [N]uint8 and vice versa.
Even when a constant is declared with a variable-size linear container type, the concrete size is always included in the final type.
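Two of these conversions can be sketched in Python: the representability check for fixed-width integers, and the string to byte container conversion with its concrete size. The function names are hypothetical, and raising an error for an unrepresentable value is an assumption about how a failed conversion surfaces.

```python
from typing import List, Tuple

FIXED_INT_TYPES = ("int8", "uint8", "int16", "uint16",
                   "int32", "uint32", "int64", "uint64")


def int_range(type_name: str) -> Tuple[int, int]:
    """Return the inclusive value range of a fixed-width integer type."""
    if type_name not in FIXED_INT_TYPES:
        raise ValueError(f"not a fixed-width integer type: {type_name}")
    unsigned = type_name.startswith("u")
    bits = int(type_name[4:] if unsigned else type_name[3:])
    if unsigned:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


def convert_int(value: int, target: str) -> int:
    """Convert an integer (or a bool, as zero or one) to the target type."""
    value = int(value)
    if target == "bigint":
        return value  # arbitrary precision, always representable
    low, high = int_range(target)
    if not low <= value <= high:
        raise OverflowError(f"{value} is not representable as {target}")
    return value


def string_to_bytes(text: str) -> Tuple[str, List[int]]:
    """Return the concrete container type and the UTF-8 bytes of a string."""
    data = list(text.encode("utf-8"))
    return f"[{len(data)}]uint8", data
```

For instance, a five-character string containing one two-byte character converts to a [6]uint8 value, which is why the concrete size always ends up in the final type.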
Module Declarations
Conceptually, all declarations in GUT are part of a module scope, either the implicit scope introduced by an input stream, or an explicitly declared scope.
An explicit module scope is created with the module keyword followed by an
identifier, and the module body enclosed in braces. A module may also be
imported from a source file, located by a path relative to the input containing
the import (or relative to the current working directory if the input does not
come from a file):
ModuleDecl = "module" Ident "{" ModuleBody "}"
| "module" Ident "=" "import" StringConst.
A declaration of two nested modules (not counting the implicit top-level scope) may look like this:
module Outer {
module Inner {
type MyInt = int32
}
const Stuff : Inner.MyInt = 42
}
Note how the type given to the constant Stuff uses a qualified identifier to
refer to the type declared in the Inner scope. Identifiers are looked up
relative to the scopes in which they occur, starting from the innermost
surrounding scope, unless they start with a ., in which case the lookup
starts at the outermost surrounding scope associated with the current input.
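The lookup rule can be modelled with nested dictionaries standing in for module scopes. This Python sketch is an illustration only; the names lookup and _resolve_from are hypothetical, and it assumes that a qualified identifier which only partially resolves in an inner scope is retried as a whole in the enclosing scope.

```python
from typing import Any, Dict, List

Scope = Dict[str, Any]  # nested dicts are modules, other values are declarations


def _resolve_from(scope: Scope, parts: List[str]) -> Any:
    """Follow a chain of names downwards from the given scope."""
    current: Any = scope
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            raise KeyError(part)
        current = current[part]
    return current


def lookup(root: Scope, scope_path: List[str], ident: str) -> Any:
    """Resolve ident as seen from the module named by scope_path."""
    if ident.startswith("."):
        # A leading dot starts the lookup at the outermost scope.
        return _resolve_from(root, ident[1:].split("."))
    parts = ident.split(".")
    # Try the innermost scope first, then each enclosing scope up to the root.
    for depth in range(len(scope_path), -1, -1):
        try:
            scope = _resolve_from(root, scope_path[:depth])
            return _resolve_from(scope, parts)
        except KeyError:
            continue
    raise KeyError(ident)


if __name__ == "__main__":
    root = {"Outer": {"Inner": {"MyInt": "int32"}, "Stuff": 42}}
    print(lookup(root, ["Outer"], "Inner.MyInt"))
    print(lookup(root, ["Outer", "Inner"], ".Outer.Stuff"))
```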
Language Mappings
C++
The C++14 language mapping results in rather verbose, but hopefully easy to use code:
- The basic type bool also exists in C++.
- The fixed-width integer types are represented by the corresponding std::int*_t types.
- bigint can be configured to use Botan.BigInt or mpz_class, but by default only a decimal string representation is stored.
- float32 becomes float, float64 becomes double.
- string can be represented as std::wstring, std::string, const wchar_t *, or const char *. When using the raw pointer types, some features relying on automatic container memory management are not available.
- Non two-element tuples, including void if necessary, are represented using std::tuple.
- Two-element tuples are represented using std::pair.
- Fixed-size linear containers are represented as std::array.
- Variable-size linear containers are represented as std::vector.
- Associative containers are represented as std::set or std::map.
- Records in a type declaration or as a union case argument are represented as class or struct types. Record type expressions in other places are not supported. Unless a @struct hint is present, records are wrapped in std::shared_ptr (or a similar template class) when included in other types. If a @struct hint is present, records are structurally inlined in other types and use the struct keyword rather than the class keyword for their declaration.
- Enumerations are represented as enum struct types. If a @flags hint is present, bitwise operators are generated for the type. to_string and to_<ident> conversion functions between enumerations and strings are created.
- Options are represented as std::shared_ptr to the value type.
- Other union types are represented by a hierarchy of classes with a common base class that holds the tag value, a subclass for each union case, and a wrapper type declared using class or union in C++, which can hold a value of any of the leaf classes. Unless a @struct hint is present, unions are wrapped in std::shared_ptr (or a similar template class) when included in other types. If a @struct hint is present, unions are structurally inlined in other types and use the union keyword rather than the class keyword for their declaration. Note that unions require a fairly large amount of trivial generated code to implement safe construction, copying, and destruction.
- Opaque types are represented by forward-declared struct or class types, depending on the presence of a @struct hint.
- Constants are represented as const variables.
- Modules are represented as namespaces in C++.
Comparison operators are generated for all types.
C#
The C# language mapping is mostly straightforward:
- The basic types bool and string also exist in C#.
- The fixed-width integer types are represented by the corresponding basic types sbyte, byte, short, ushort, int, uint, long, or ulong.
- bigint is represented by System.Numerics.BigInteger.
- float32 becomes float, float64 becomes double.
- Tuples, including void if necessary, are represented using System.Tuple, or System.ValueTuple if a @struct hint is present.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as System.Collections.Generic.List.
- Associative containers are represented as System.Collections.Generic.SortedSet or System.Collections.Generic.SortedDictionary.
- Records in a type declaration or as a union case argument are represented as record class or record struct types. Record type expressions in other places are not supported. If a @struct hint is present, records are structurally inlined in other types and use the struct keyword rather than the class keyword for their declaration.
- Enumerations are represented as enum types. If a @flags hint is present, a [System.Flags] attribute is added to the type declaration, and an unsigned integer type is used as the base type.
- Options are represented as nullable types. Nesting of option types is not preserved.
- Other union types are represented by a hierarchy of record classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by the universal System.Object type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as static readonly variables.
- Modules are represented as static class declarations in C#.
F#
The F# language mapping is very straightforward:
- The basic types all exist in F# as well; only void is called unit in F#.
- Tuples are represented as struct tuples.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as ResizeArray.
- Associative containers are represented as Set or Map.
- Records are represented as records in a type declaration, or as anonymous records if they occur somewhere else. If a @struct hint is present for a record type declaration, the [<Struct>] attribute will be added to the corresponding declaration in F#.
- Enumerations are represented as enumerations. If a @flags hint is present for a union type declaration, the [<System.Flags>] attribute will be added to the corresponding declaration in F#.
- Any other union type is represented as a discriminated union type.
- Opaque types are represented by the universal obj type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as regular let bindings.
- Modules are represented as modules in F#.
Kotlin
The Kotlin language mapping is mostly straightforward:
- An empty type is represented by Unit.
- The basic type bool is represented by Boolean.
- The fixed-width integer types are represented by the corresponding types Byte, UByte, Short, UShort, Int, UInt, Long, or ULong.
- bigint is represented by java.math.BigInteger.
- float32 becomes Float, float64 becomes Double.
- string is represented by String.
- Two-element tuples are represented using Pair, other tuples are only supported as the right-hand side of a type declaration.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as List.
- Associative containers are represented as Set or Map.
- Records and tuples in a type declaration or as a union case argument are represented as data class types. Record type expressions in other places are not supported.
- Enumerations are represented as enum types. If a @flags hint is present, an unsigned integer type is used for the tag values, and references to the type wrap java.util.EnumSet around the declared type.
- Options are represented as nullable types. Nesting of option types is not preserved.
- Other union types are represented by a hierarchy of classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by the universal Any type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as val declarations.
- Modules are represented as object declarations in Kotlin.
Python
The Python language mapping is mostly straightforward:
- The basic type bool also exists in Python.
- void is represented as None.
- All integer types are represented as int.
- All floating-point types are represented as float.
- Tuples are represented as tuple.
- All linear containers are represented as list.
- Associative containers are represented as set or dict.
- Records are represented as typing.NamedTuple in a record type declaration, or with dataclasses.dataclass in a union type declaration. Record type expressions in other places are not supported.
- Enumerations are represented as subclasses of enum.Enum or enum.Flag, depending on whether a @flags hint is present.
- Options are represented as typing.Optional.
- Any other union type is represented as a hierarchy of classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by subtypes of typing.Any created using typing.NewType.
- Constants are represented as module variables.
- Modules are represented as name prefixes delimited by underscores.
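To give an impression of the mapping, the following hand-written Python shows roughly what declarations from the earlier sections could correspond to. It is not actual generator output: the layout, the comments, and details such as the member order are assumptions, and only the representations named in the list above are used.

```python
import enum
from typing import NamedTuple, Optional, Tuple


class Place(NamedTuple):
    # type Place = { Position : (float32, float32); Description : string; }
    Position: Tuple[float, float]
    Description: str


class Styles(enum.Flag):
    # type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4
    STRIKE = 8


# type UIntOption = ?uint32
UIntOption = Optional[int]

# module Outer { module Inner { type MyInt = int32 } }
# Module names become prefixes joined by underscores.
Outer_Inner_MyInt = int

# const TheAnswer : bigint = 0x2A
TheAnswer: int = 0x2A


if __name__ == "__main__":
    home = Place(Position=(48.2, 16.4), Description="home")
    print(home.Description, Styles.BOLD | Styles.ITALIC)
```

Because Styles derives from enum.Flag, its cases can be combined with the bitwise operators, which matches the intent of the @flags hint.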