GUT is a Generator of Useful Types 😉 You write down a description of structured types in a syntax inspired by languages of both the curly-brace family and the ML family. GUT takes that description and generates type definitions with useful properties, such as comparison operators and certain reflection capabilities, in an output language of your choice.
Installation
The code implementing GUT is written in C++, because that language most dearly needs such a generator. Output modules for other languages exist, however.
If you unpacked a binary distribution archive of GUT, you should be able to run the program gut (on Unix-like systems) or gut.exe (on Windows-like systems) directly.
You can verify the authenticity of the distribution archives using Minisign or GnuPG:
minisign -Vm gut-$VERSION.zip -P RWQqcIdQLuG/Iiw3rcwD9D3Wu8UojK49fBSLibuda44tqkEcZV+xTfYC
gpg --verify gut-$VERSION.zip.asc
Alternatively, you can run GUT using Docker. Images are provided in the repository chust/gut. Where you would run gut, you can instead type docker run --rm chust/gut.
If you want to compile GUT from source, you will need a C++14 compiler, the CMake build tool, and the Conan package manager. The program depends on MMU and GMP.
Command Line Use
In the most basic scenario, the gut executable reads type declarations from standard input and writes generated definitions to standard output. To know what kind of output it should generate, however, it needs the --lang option. For example, this will pretty-print the input back in the type declaration format:
$ echo 'type Foo = string' | gut --lang=gut
reading root module <stdin>...
writing <stdout>...
type Foo = string
Most of the time you will probably save your type declarations and the generated output in files. In this case, gut can infer the output format from the extension of the output filename. For example, this will read input from foo.gut and turn the declarations into a technical reference document in HTML format, named foo.html:
$ gut foo.gut -o foo.html
reading root module 'foo.gut'...
writing 'foo.html'...
For detailed information about the command line options, invoke gut with the --help option. If you want information about generator-specific options for your chosen output type, add the --lang option as well. For example, this will print information on how to use the C++14 output generator:
$ gut --lang=c++ --help
Usage: gut.exe [OPTIONS] SOURCES...
...
Type Declaration Language
The declaration language supported by GUT consists of three kinds of elements: Types (obviously), constants, and modules. The different declarations are described below in detail. Every declaration is conceptually part of a module body:
Decl = TypeDecl | ConstDecl | ModuleDecl | ";".
ModuleBody = Decl {Decl}.
Declarations may optionally be separated by any number of semicolons, but they are not required to be.
Comments following the C++ style can be inserted anywhere between the other tokens of the grammar: The character combinations /* and */ delimit block comments (which may be nested), and two slashes // start a line comment. A line comment starting with three slashes and a space (/// ) is considered documentation and will be associated with the next declaration encountered in the input.
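The comment rules above can be illustrated with a small scanner. This Python sketch is hypothetical and is not GUT's actual lexer. The function name strip_comments is made up. String literals are recognised only well enough that comment markers inside them are not mistaken for comments. Attaching each collected doc comment to the following declaration is not modelled.

```python
def strip_comments(source: str) -> tuple[str, list[str]]:
    """Remove GUT-style comments; return the remaining text and the doc comments."""
    out: list[str] = []
    docs: list[str] = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("/*", i):
            # Block comment: track the nesting depth until it drops back to zero.
            depth, i = 1, i + 2
            while i < n and depth > 0:
                if source.startswith("/*", i):
                    depth, i = depth + 1, i + 2
                elif source.startswith("*/", i):
                    depth, i = depth - 1, i + 2
                else:
                    i += 1
            if depth > 0:
                raise SyntaxError("unterminated block comment")
            out.append(" ")  # a comment still separates the tokens around it
        elif source.startswith("//", i):
            # Line comment: runs up to (not including) the next newline.
            end = source.find("\n", i)
            end = n if end == -1 else end
            if source.startswith("/// ", i):
                docs.append(source[i + 4:end])  # documentation comment text
            i = end
        elif source[i] == '"':
            # String literal: copy verbatim so "//" or "/*" inside it survive.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1  # skip escaped characters
            out.append(source[i:j + 1])
            i = j + 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out), docs
```

Because block comments nest, the scanner counts depth instead of stopping at the first */.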
Type Declarations
A type declaration is introduced by the type keyword, followed by an identifier for the type, an equals sign, some optional hints, and the actual definition:
TypeDecl = "type" Ident "=" {TypeHint} ...
TypeHint = "@struct" | "@flags".
A type hint is a keyword starting with an at-sign (@). The meaning of the hints is described in the sections about the types that make use of them.
The type definition can be as simple as a keyword or identifier denoting a basic or already defined type, or as complex as a discriminated union with cases that have tuples and records as arguments.
Basic Types
The set of basic types consists of the following:
BasicType
= "void"
| "bool"
| "int8" | "uint8"
| "int16" | "uint16"
| "int32" | "uint32"
| "int64" | "uint64"
| "bigint"
| "float32"
| "float64"
| "string".
The keyword void denotes the empty tuple conveying no contained information. bool is the logical data type consisting of the values true and false. bigint is an arbitrary precision integer. And string is a string of characters.
The int* and uint* keywords denote fixed-width signed and unsigned integers respectively. The float* keywords denote fixed-width binary floating point values.
For example, a simple type alias may be created with a declaration like this:
type Word = int32
Tuple Types
A tuple is a structure with multiple anonymous members, possibly of different types. The members are usually accessed by position, or by fixed names indicating the position.
A tuple is written as a parenthesized list of other types, delimited by commas:
TupleType = "(" [TypeExpr {"," TypeExpr}] ")".
For example, a pair of a string and a number may be declared like this:
type NamedNumber = (string, float64)
The type expression consisting only of an empty pair of parentheses () is equivalent to the basic void type.
Container Types
A container is a structure bundling many elements of the same type(s). GUT supports linear and associative containers, both denoted using a similar syntax:
SeqType = "[" [LengthConst | TypeExpr] "]" TypeExpr.
The type expression for a container starts with an opening bracket and ends with the type of contained values. If the opening bracket is immediately followed by a closing bracket, the container is a variable-size linear one. If the brackets contain a non-negative integer number, the container is a fixed-size linear one with the given number of elements. If the brackets contain a type, that is the key type for an associative container.
For example, an array of four integers may be declared like this:
type FourInts = [4]int32
A variable-size array of floating point numbers would be declared like this:
type SomeFloats = []float64
An associative array mapping strings to numbers could look like this:
type NamedNumbers = [string]bigint
And a set of two-integer pairs with no associated values would be written like this:
type UniquePoints = [(int32, int32)]void
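To make the three bracket forms concrete, here is a Python sketch that classifies a container type expression by what appears between its brackets. The helper parse_seq_type and the SeqType class are hypothetical and not part of GUT. The sketch assumes a length is written as a plain decimal literal. A real parser would also have to accept the other integer notations, and the LengthConst rule may admit more than literals.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeqType:
    kind: str                      # "variable", "fixed", or "associative"
    element: str                   # type expression after the closing bracket
    length: Optional[int] = None   # element count of a fixed-size container
    key: Optional[str] = None      # key type of an associative container


def parse_seq_type(expr: str) -> SeqType:
    text = expr.strip()
    if not text.startswith("["):
        raise ValueError("container type must start with '['")
    # Find the bracket matching the opening one (key types may contain brackets).
    depth = 0
    for pos, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced brackets")
    inside = text[1:pos].strip()
    element = text[pos + 1:].strip()
    if not element:
        raise ValueError("missing element type")
    if not inside:
        return SeqType("variable", element)
    if re.fullmatch(r"[0-9]+", inside):  # assumption: decimal length literal
        return SeqType("fixed", element, length=int(inside))
    return SeqType("associative", element, key=inside)
```

For example, [(int32, int32)]void yields an associative container keyed by the tuple type with void values, which is the set from the last declaration above.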
Record Types
Like a tuple, a record consists of a fixed set of members, possibly of different types. Unlike a tuple, the members of a record have names that can be used to address them.
A record is written as a set of members enclosed within braces. Each member is a pair of an identifier and a type, separated by a colon and terminated by a semicolon. The semicolons are not optional:
RecordType = "{" {Ident ":" TypeExpr ";"} "}".
For example, a record representing a place marker may look like this:
type Place = { Position : (float32, float32); Description : string; }
By default, the output generators will try to represent record-typed members and elements in other types as references, while tuple-typed members may be inlined. You can request that a record be structurally inlined in other types by adding the @struct hint. For example, a record representing an integer pair of coordinates could be declared like this:
type Point = @struct { X : int32; Y : int32; }
Union Types
A union is a structure with a number of mutually exclusive alternative representations. In GUT, you declare a name for each alternative, optionally assign it an identifying integer "tag", and optionally describe the members stored with it.
You write a union type as the list of alternatives, each introduced by a pipe symbol and followed by the symbolic identifier. The identifier may optionally be followed by an equals sign and a numeric tag, and then optionally by the keyword of and an argument type:
UnionCase = "|" Ident ["=" IntConst] ["of" TypeExpr].
UnionType = UnionCase {UnionCase}.
A classic use case of a union type is an option that either holds a value or signals the absence of any useful value (without abusing some potentially legal value for that purpose). Such an option holding a number may be declared like this:
type UIntOption =
| Some of uint32
| None
In fact, this type is so useful that a shorthand exists and an equivalent type can also be written as follows:
type UIntOption = ?uint32
Another common special case for union types is the declaration of an enumerated type with no argument type associated with any of the cases. Typically, output generators can represent such a type particularly efficiently. An example may look like this:
type MyEnum = | A | B = 42 | C
If you do not specify an explicit numeric tag for each of the cases, the tag value increments by one from the previous case.
An enumeration may additionally be annotated with the @flags hint, which has two effects:
- Instead of starting from tag zero and incrementing by one for every implicitly tagged case, the start value becomes one and is multiplied by two for implicitly tagged cases.
- Output generators will try to represent the type in a way that allows the cases to be combined as some kind of bitfield or set.
This example declaration represents style flags for some text formatting application:
type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
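The tag numbering rules can be expressed as a short Python function. The text leaves one point open, so this sketch makes an assumption: an implicitly tagged case that follows an explicitly tagged one continues from the explicit value, by adding one or, with @flags, by doubling. The function name assign_tags is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def assign_tags(cases: Iterable[Tuple[str, Optional[int]]], flags: bool = False) -> dict:
    """Map each union case name to its numeric tag.

    cases holds (name, explicit_tag) pairs, with explicit_tag None when the
    declaration gives no "= IntConst" for that case.
    """
    tags = {}
    previous = None
    for name, explicit in cases:
        if explicit is not None:
            tag = explicit                      # an explicit tag always wins
        elif previous is None:
            tag = 1 if flags else 0             # first implicit tag
        else:
            tag = previous * 2 if flags else previous + 1
        tags[name] = tag
        previous = tag
    return tags
```

Applied to MyEnum, this gives A = 0, B = 42, C = 43. With flags=True, the Styles cases become BOLD = 1, ITALIC = 2, UNDERLINE = 4, STRIKE = 8.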
For union types that do have arguments, note that you can also use tuples or records to associate multiple data members with a case, like this:
type MyAlternatives =
| Nothing
| Exact of bigint
| Inexact of { Value : float64; Tolerance : float64; }
| Error of (int32, string)
Opaque Types
The representation of an opaque type depends entirely on the output generator. It may be a partially defined type whose implementation details are not provided by GUT, or some polymorphic object or pointer type.
The keyword opaque denotes an opaque type in a type declaration:
type Unspecified = opaque
Constant Declarations
A constant declaration is introduced by the const keyword, followed by an identifier for the constant, an optional colon and an explicit type, an equals sign, and a value:
ConstDecl = "const" Ident [":" TypeExpr] "=" ConstExpr.
ConstExpr = "true" | "false" | IntConst | FloatConst | StringConst
| "import" StringConst.
A subset of the GUT types can be represented as literals, namely bool, any integer type, any floating point type, and string.
Numbers can be written in binary, octal, decimal, or hexadecimal notation, indicated by the 0b, 0o, 0d, or 0x prefixes respectively. The 0d prefix is optional, as decimal numbers are the default. An optional + or - sign comes before the base prefix. Floating point numbers are written in the usual point notation with an optional exponent suffix. The exponent is introduced by e or p (case insensitive, and only p works for hexadecimal), followed by a signed decimal integer.
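For integer literals, the prefix and sign rules above can be sketched in Python as follows. parse_int_literal is a hypothetical helper. It accepts only the lowercase prefixes the text lists, and it leaves floating point literals out.

```python
import re

# Optional sign, optional base prefix, then digits (validated by int() below).
_INT_LITERAL = re.compile(r"([+-]?)(0[bodx])?([0-9A-Fa-f]+)")
_BASES = {"b": 2, "o": 8, "d": 10, "x": 16}


def parse_int_literal(text: str) -> int:
    """Parse a GUT-style integer literal such as 0x2A, -0b101, or 0d42."""
    match = _INT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, prefix, digits = match.groups()
    base = _BASES[prefix[1]] if prefix else 10  # decimal is the default
    value = int(digits, base)  # raises ValueError for digits outside the base
    return -value if sign == "-" else value
```

The sign is applied after parsing the digits, which matches the rule that the sign precedes the base prefix.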
For example, a numeric constant may look like this:
const TheAnswer : bigint = 0x2A
Strings are written in the usual double quotes notation with backslash escape sequences:
Escape | Translation |
---|---|
\t | Horizontal tabulator character |
\b | Backspace character |
\r | Carriage return character |
\n | Newline character |
\xNN | Unicode codepoint number 0xNN |
\uNNNN | Unicode codepoint number 0xNNNN |
\UNNNNNN | Unicode codepoint number 0xNNNNNN |
\ + any character | The character following the backslash, copied verbatim |
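The escape table translates directly into a small decoder. This Python sketch is hypothetical and follows the table literally, including the six hex digits shown for \U. It expects the body of a string literal without the surrounding double quotes.

```python
_SIMPLE = {"t": "\t", "b": "\b", "r": "\r", "n": "\n"}
_HEX_WIDTH = {"x": 2, "u": 4, "U": 6}  # number of hex digits per escape kind
_HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(body: str) -> str:
    """Decode the backslash escapes of a string literal body (quotes removed)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in _SIMPLE:
            out.append(_SIMPLE[code])
            i += 2
        elif code in _HEX_WIDTH:
            width = _HEX_WIDTH[code]
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or not all(c in _HEX_DIGITS for c in digits):
                raise ValueError(f"malformed \\{code} escape")
            out.append(chr(int(digits, 16)))  # chr() rejects values above 0x10FFFF
            i += 2 + width
        else:
            out.append(code)  # any other character is copied verbatim
            i += 2
    return "".join(out)
```

The verbatim rule is what makes \" and \\ work without dedicated table entries.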
In addition to literals, the contents of a file can be imported as a container of type [N]uint8, where N is the size of the file. The path after the import keyword is written as a string and interpreted relative to the input containing the import (or relative to the current working directory if the input does not come from a file).
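A minimal Python sketch of this path rule follows. It assumes pathlib semantics for joining paths, so an absolute import path simply replaces the base directory; the text does not specify that case. resolve_import and import_bytes are hypothetical names. The length of the returned bytes corresponds to N in [N]uint8.

```python
from pathlib import Path
from typing import Optional


def resolve_import(path: str, importing_file: Optional[str]) -> Path:
    """Resolve an import path relative to the input that contains the import.

    importing_file is None when the input does not come from a file (for
    example standard input); the current working directory is used then.
    """
    base = Path(importing_file).parent if importing_file is not None else Path.cwd()
    return base / path


def import_bytes(path: str, importing_file: Optional[str]) -> bytes:
    """Read the imported file; its contents become the [N]uint8 value."""
    return resolve_import(path, importing_file).read_bytes()
```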
If an explicit type is specified for the constant and the assigned value has a different type, a conversion is attempted:
- Booleans convert to the integers zero or one.
- Integers convert among each other as long as the concrete value is representable.
- Floating point numbers convert among each other.
- Strings convert to their UTF-8 representation as [N]uint8 and vice versa.
Even when a constant is declared with a variable-size linear container type, the concrete size is always included in the final type.
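The conversion rules can be sketched in Python as follows. The function convert and the target names it accepts are hypothetical. Python's int and float stand in for GUT's numeric types, and float32 rounding is not modelled. Conversions not listed above, such as integer to floating point, are rejected here because the text does not mention them.

```python
INT_TYPES = {  # GUT integer type name -> (signed, bit width)
    "int8": (True, 8), "uint8": (False, 8),
    "int16": (True, 16), "uint16": (False, 16),
    "int32": (True, 32), "uint32": (False, 32),
    "int64": (True, 64), "uint64": (False, 64),
}


def convert(value, target: str):
    """Convert a constant value to an explicitly declared type.

    Hypothetical target names: an integer type, "bigint", "float32",
    "float64", "string", or "[N]uint8" for a byte container.
    """
    if target in INT_TYPES or target == "bigint":
        if isinstance(value, bool):
            value = int(value)  # false -> 0, true -> 1
        if not isinstance(value, int):
            raise TypeError(f"cannot convert {type(value).__name__} to {target}")
        if target in INT_TYPES:
            signed, bits = INT_TYPES[target]
            lo = -(1 << (bits - 1)) if signed else 0
            hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
            if not lo <= value <= hi:
                raise OverflowError(f"{value} is not representable as {target}")
        return value
    if target in ("float32", "float64") and isinstance(value, float):
        return value  # Python floats are 64-bit; no float32 rounding modelled
    if target == "[N]uint8":
        if isinstance(value, str):
            return value.encode("utf-8")  # N is the length of the encoding
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
    if target == "string":
        if isinstance(value, str):
            return value
        if isinstance(value, (bytes, bytearray)):
            return bytes(value).decode("utf-8")  # rejects invalid UTF-8
    raise TypeError(f"cannot convert {type(value).__name__} to {target}")
```

A string converted to bytes carries its concrete length, mirroring the rule that the final type always includes the size.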
Module Declarations
Conceptually, all declarations in GUT are part of a module scope, either the implicit scope introduced by an input stream, or an explicitly declared scope.
An explicit module scope is created with the module keyword followed by an identifier, and the module body enclosed in braces. A module may also be imported from a source file, located by a path relative to the input containing the import (or relative to the current working directory if the input does not come from a file):
ModuleDecl = "module" Ident "{" ModuleBody "}"
| "module" Ident "=" "import" StringConst.
A declaration of two nested modules (not counting the implicit top-level scope) may look like this:
module Outer {
module Inner {
type MyInt = int32
}
const Stuff : Inner.MyInt = 42
}
Note how the type given to the constant Stuff uses a qualified identifier to refer to the type declared in the Inner scope. Identifiers are looked up relative to the scopes in which they occur, starting from the innermost surrounding scope. If an identifier starts with a dot (.), the lookup instead starts at the outermost surrounding scope associated with the current input.
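The lookup rule can be modelled with nested dictionaries. This Python sketch is hypothetical and makes two assumptions. First, modules are plain dicts and any other value is a declaration. Second, once the first component of a qualified name is found in some scope, the remaining components must resolve from there, with no further outward search. The text does not say how GUT handles that case.

```python
from typing import Any, Dict, List

Scope = Dict[str, Any]  # declared name -> nested Scope (a module) or a declaration


def _enter(root: Scope, path: List[str]) -> Scope:
    """Walk from the top-level scope into the nested modules named in path."""
    scope = root
    for module in path:
        scope = scope[module]
    return scope


def resolve(root: Scope, current: List[str], name: str) -> Any:
    """Resolve a possibly qualified identifier that occurs in scope `current`."""
    head, *rest = name.split(".")
    if head == "":
        # Leading dot: start at the outermost scope of the current input.
        if not rest:
            raise LookupError(f"invalid identifier {name!r}")
        head, *rest = rest
        candidates = [root]
    else:
        # Otherwise try the innermost scope first, then each enclosing one.
        candidates = [_enter(root, current[:k]) for k in range(len(current), -1, -1)]
    for scope in candidates:
        if head in scope:
            target = scope[head]
            for part in rest:  # remaining components must name nested members
                if not isinstance(target, dict) or part not in target:
                    raise LookupError(f"cannot resolve {name!r}")
                target = target[part]
            return target
    raise LookupError(f"unknown identifier {name!r}")
```

For the Outer/Inner example, resolving Inner.MyInt from inside Outer finds Inner in Outer's scope. A leading dot, as in .Outer.Stuff, starts at the top level.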
Language Mappings
C++
The C++14 language mapping results in rather verbose, but hopefully easy to use code:
- The basic type bool also exists in C++.
- The fixed-width integer types are represented by the corresponding std::int*_t and std::uint*_t types. bigint can be configured to use Botan.BigInt or mpz_class, but by default only a decimal string representation is stored. float32 becomes float, float64 becomes double. string can be represented as std::wstring, std::string, const wchar_t *, or const char *. When using the raw pointer types, some features relying on automatic container memory management are not available.
- Tuples with a number of elements other than two, including void if necessary, are represented using std::tuple.
- Two-element tuples are represented using std::pair.
- Fixed-size linear containers are represented as std::array.
- Variable-size linear containers are represented as std::vector.
- Associative containers are represented as std::set or std::map.
- Records in a type declaration or as a union case argument are represented as class or struct types. Record type expressions in other places are not supported. Unless a @struct hint is present, records are wrapped in std::shared_ptr (or a similar template class) when included in other types. If a @struct hint is present, records are structurally inlined in other types and use the struct keyword rather than the class keyword for their declaration.
- Enumerations are represented as enum struct types. If a @flags hint is present, bitwise operators are generated for the type. to_string and to_<ident> conversion functions between enumerations and strings are created.
- Options are represented as std::shared_ptr to the value type.
- Other union types are represented by a hierarchy of classes with a common base class that holds the tag value, a subclass for each union case, and a wrapper type declared using class or union in C++, which can hold a value of any of the leaf classes. Unless a @struct hint is present, unions are wrapped in std::shared_ptr (or a similar template class) when included in other types. If a @struct hint is present, unions are structurally inlined in other types and use the union keyword rather than the class keyword for their declaration. Note that unions require a fairly large amount of trivial generated code to implement safe construction, copying, and destruction.
- Opaque types are represented by forward-declared struct or class types, depending on the presence of a @struct hint.
- Constants are represented as const variables.
- Modules are represented as namespaces in C++.
Comparison operators are generated for all types.
C#
The C# language mapping is mostly straightforward:
- The basic types bool and string also exist in C#.
- The fixed-width integer types are represented by the corresponding basic types sbyte, byte, short, ushort, int, uint, long, or ulong. bigint is represented by System.Numerics.BigInteger. float32 becomes float, float64 becomes double.
- Tuples, including void if necessary, are represented using System.Tuple, or System.ValueTuple if a @struct hint is present.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as System.Collections.Generic.List.
- Associative containers are represented as System.Collections.Generic.SortedSet or System.Collections.Generic.SortedDictionary.
- Records in a type declaration or as a union case argument are represented as record class or record struct types. Record type expressions in other places are not supported. If a @struct hint is present, records are structurally inlined in other types and use the struct keyword rather than the class keyword for their declaration.
- Enumerations are represented as enum types. If a @flags hint is present, a [System.Flags] attribute is added to the type declaration, and an unsigned integer type is used as the base type.
- Options are represented as nullable types. Nesting of option types is not preserved.
- Other union types are represented by a hierarchy of record classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by the universal System.Object type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as static readonly variables.
- Modules are represented as static class declarations in C#.
F#
The F# language mapping is very straightforward:
- The basic types all exist in F# as well; only void is called unit in F#.
- Tuples are represented as struct tuples.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as ResizeArray.
- Associative containers are represented as Set or Map.
- Records are represented as records in a type declaration, or as anonymous records if they occur somewhere else. If a @struct hint is present for a record type declaration, the [<Struct>] attribute will be added to the corresponding declaration in F#.
- Enumerations are represented as enumerations. If a @flags hint is present for a union type declaration, the [<System.Flags>] attribute will be added to the corresponding declaration in F#.
- Any other union type is represented as a discriminated union type.
- Opaque types are represented by the universal obj type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as regular let bindings.
- Modules are represented as modules in F#.
Kotlin
The Kotlin language mapping is mostly straightforward:
- An empty type is represented by Unit.
- The basic type bool is represented by Boolean.
- The fixed-width integer types are represented by the corresponding types Byte, UByte, Short, UShort, Int, UInt, Long, or ULong. bigint is represented by java.math.BigInteger. float32 becomes Float, float64 becomes Double. string is represented by String.
- Two-element tuples are represented using Pair; other tuples are only supported as the right-hand side of a type declaration.
- Fixed-size linear containers are represented as arrays.
- Variable-size linear containers are represented as List.
- Associative containers are represented as Set or Map.
- Records and tuples in a type declaration or as a union case argument are represented as data class types. Record type expressions in other places are not supported.
- Enumerations are represented as enum types. If a @flags hint is present, an unsigned integer type is used for the tag values, and references to the type wrap java.util.EnumSet around the declared type.
- Options are represented as nullable types. Nesting of option types is not preserved.
- Other union types are represented by a hierarchy of classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by the universal Any type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as val declarations.
- Modules are represented as object declarations in Kotlin.
Python
The Python language mapping is mostly straightforward:
- The basic type bool also exists in Python. void is represented as None.
- All integer types are represented as int.
- All floating-point types are represented as float.
- Tuples are represented as tuple.
- All linear containers are represented as list.
- Associative containers are represented as set or dict.
- Records are represented as typing.NamedTuple in a record type declaration, or with dataclasses.dataclass in a union type declaration. Record type expressions in other places are not supported.
- Enumerations are represented as subclasses of enum.Enum or enum.Flag, depending on whether a @flags hint is present.
- Options are represented as typing.Optional.
- Any other union type is represented as a hierarchy of classes with a common base class that holds the tag value, and a subclass for each union case.
- Opaque types are represented by subtypes of typing.Any created using typing.NewType.
- Constants are represented as module variables.
- Modules are represented as name prefixes delimited by underscores.
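To give an impression of this mapping, here is a hand-written Python approximation of a few declarations from earlier sections. It is not actual gut output. The case class names, the value field names, the way the tag is stored, and the exact spelling of module prefixes are assumptions chosen for illustration. Only the kinds of constructs used (typing.NamedTuple, enum.Flag, typing.Optional, dataclasses, underscore prefixes) come from the list above.

```python
import enum
import typing
from dataclasses import dataclass


# type Place = { Position : (float32, float32); Description : string; }
class Place(typing.NamedTuple):
    Position: typing.Tuple[float, float]
    Description: str


# type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
class Styles(enum.Flag):
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4
    STRIKE = 8


# type UIntOption = ?uint32
UIntOption = typing.Optional[int]


# type MyAlternatives = | Nothing | Exact of bigint
#   | Inexact of { Value : float64; Tolerance : float64; } | Error of (int32, string)
class MyAlternatives:
    tag: typing.ClassVar[int]  # common base: every case subclass sets its tag


@dataclass(frozen=True)
class Nothing(MyAlternatives):
    tag = 0  # unannotated, so a class attribute rather than a dataclass field


@dataclass(frozen=True)
class Exact(MyAlternatives):
    tag = 1
    value: int


@dataclass(frozen=True)
class Inexact(MyAlternatives):
    tag = 2
    Value: float
    Tolerance: float


@dataclass(frozen=True)
class Error(MyAlternatives):
    tag = 3
    value: typing.Tuple[int, str]


# module Outer { module Inner { type MyInt = int32 } const Stuff : Inner.MyInt = 42 }
Outer_Inner_MyInt = int
Outer_Stuff: Outer_Inner_MyInt = 42
```

With enum.Flag, the @flags cases combine as a set, so Styles.BOLD | Styles.ITALIC has the value 3.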