GUT is a Generator of Useful Types. You write down a description of structured types in a syntax inspired by different languages of the curly braces family and the ML family. GUT takes that description and generates type definitions with useful properties, such as comparison operators and certain reflection capabilities, in an output language of your choice.
The code implementing GUT is written in C++, because that language most dearly needs such a generator. Other output modules exist, however.
If you unpacked a binary distribution archive of GUT, you should be able to run the program `gut` (for Unix-like systems) or `gut.exe` (for Windows-like systems) directly. You can verify the authenticity of the distribution archives using Minisign or GnuPG:
minisign -Vm gut-$VERSION.zip -P RWQqcIdQLuG/Iiw3rcwD9D3Wu8UojK49fBSLibuda44tqkEcZV+xTfYC
gpg --verify gut-$VERSION.zip.asc
Alternatively, you can run GUT using Docker. Images are provided in the repository chust/gut. Where you would run `gut`, you can instead type `docker run --rm chust/gut`.
If you want to compile GUT from source, you will need a C++14 compiler, the CMake build tool, and the Conan package manager. The program depends on MMU and GMP.
In the most basic scenario, the `gut` executable reads type declarations from standard input and writes generated definitions to standard output, but in order to know what kind of output it should generate, you have to supply the `--lang` option. For example, this will pretty print the input back in the type declaration format:
$ echo 'type Foo = string' | gut --lang=gut
reading root module <stdin>...
writing <stdout>...
type Foo = string
Most of the time you will probably save your type declarations and the generated output in files. In this case, `gut` can infer the output format from the extension of the output filename. For example, this will read input from `foo.gut` and turn the declarations into a technical reference document in HTML format, named `foo.html`:
$ gut foo.gut -o foo.html
reading root module 'foo.gut'...
writing 'foo.html'...
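If you call GUT from a build script, the invocation shown above can be assembled programmatically. The following Python sketch only builds the argument list from the two options mentioned in this section (`-o` and `--lang`). The helper name `gut_command` is hypothetical, and combining both options in one call is an assumption:

```python
from typing import List, Optional


def gut_command(source: str, output: str, lang: Optional[str] = None) -> List[str]:
    """Build an argument list for the gut executable (hypothetical helper)."""
    cmd = ["gut"]
    if lang is not None:
        # Explicit output language, for when the file extension is not enough.
        # Assumption: --lang may be combined with -o in a single call.
        cmd.append("--lang=" + lang)
    cmd += [source, "-o", output]
    return cmd
```

The resulting list can be handed to `subprocess.run` on a system where `gut` is on the search path.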
For detailed information about the command line options, invoke `gut` with the `--help` option. If you want information about generator-specific options for your chosen output type, add the `--lang` option as well. For example, this will print information on how to use the C++14 output generator:
$ gut --lang=c++ --help
Usage: gut.exe [OPTIONS] SOURCES...
...
The declaration language supported by GUT consists of three kinds of elements: Types (obviously), constants, and modules. The different declarations are described below in detail. Every declaration is conceptually part of a module body:
Decl = TypeDecl | ConstDecl | ModuleDecl | ";".
ModuleBody = Decl {Decl}.
Declarations may optionally be separated by any number of semicolons, but they are not required to be.
Comments following the C++ style can be inserted anywhere between the other tokens of the grammar: The character combinations `/*` and `*/` delimit block comments (which may be nested), and two slashes `//` start a line comment. A line comment starting with three slashes and a space `/// ` is considered documentation and will be associated with the next declaration encountered in the input.
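Nested block comments are the one detail that differs from plain C++ comment handling, so a small illustration may help. This Python sketch is not GUT's lexer; it tracks the nesting depth with a counter and, as a simplification, does not recognize string literals. The function name `strip_comments` is hypothetical:

```python
def strip_comments(source: str) -> str:
    """Remove // line comments and nested /* */ block comments.

    Simplification: string literals are not recognized, so comment
    markers inside a quoted string would be stripped as well.
    Documentation comments (///) are removed like any line comment.
    """
    out = []
    depth = 0  # current nesting level of block comments
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: drop the character
        elif pair == "//":
            # Line comment: skip up to, but not including, the newline.
            while i < len(source) and source[i] != "\n":
                i += 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```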
A type declaration is introduced by the `type` keyword, followed by an identifier for the type, an equals sign, some optional hints, and the actual definition:
TypeDecl = "type" Ident "=" {TypeHint} ...
TypeHint = "@struct" | "@flags".
A type hint is a keyword starting with an at-sign `@`. The meaning of the hints is described in the sections about the types that make use of them.
The type definition can be as simple as a keyword or identifier denoting a basic or already defined type, or as complex as a discriminated union with cases that have tuples and records as arguments.
The set of basic types consists of the following:
BasicType
= "void"
| "bool"
| "int8" | "uint8"
| "int16" | "uint16"
| "int32" | "uint32"
| "int64" | "uint64"
| "bigint"
| "float32"
| "float64"
| "string".
The keyword `void` denotes the empty tuple conveying no contained information. `bool` is the logical data type consisting of the values `true` and `false`. `bigint` is an arbitrary precision integer. And `string` is a string of characters.
The `int*` and `uint*` keywords denote fixed-width signed and unsigned integers respectively. The `float*` keywords denote fixed-width binary floating point values.
For example, a simple type alias may be created with a declaration like this:
type Word = int32
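The value range of a fixed-width integer type follows from its keyword. The Python sketch below assumes the usual two's complement ranges offered by the fixed-width integer types of the output languages; `int_range` is a hypothetical helper and not part of GUT:

```python
def int_range(type_name: str):
    """Return (min, max) for a fixed-width integer keyword such as 'int16'."""
    if type_name.startswith("uint"):
        bits = int(type_name[4:])
        return 0, 2 ** bits - 1
    if type_name.startswith("int"):
        bits = int(type_name[3:])
        # Assumption: signed types use two's complement ranges.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError("not a fixed-width integer type: " + type_name)
```

For the `Word` alias above, `int_range("int32")` gives the bounds a value of that type has to respect.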
A tuple is a structure with multiple anonymous members, possibly of different types. The members are usually accessed by position, or by fixed names indicating the position.
A tuple is written as a parenthesized list of other types, delimited by commas:
TupleType = "(" [TypeExpr {"," TypeExpr}] ")".
For example, a pair of a string and a number may be declared like this:
type NamedNumber = (string, float64)
The type expression consisting only of an empty pair of parentheses `()` is equivalent to the basic `void` type.
A container is a structure bundling many elements of the same type(s). GUT supports linear and associative containers, both denoted using a similar syntax:
SeqType = "[" [LengthConst | TypeExpr] "]" TypeExpr.
The type expression for a container starts with an opening bracket and ends with the type of contained values. If the opening bracket is immediately followed by a closing bracket, the container is a variable-size linear one. If the brackets contain a non-negative integer number, the container is a fixed-size linear one with the given number of elements. If the brackets contain a type, that is the key type for an associative container.
For example, an array of four integers may be declared like this:
type FourInts = [4]int32
A variable-size array of floating point numbers would be declared like this:
type SomeFloats = []float64
An associative array mapping strings to numbers could look like this:
type NamedNumbers = [string]bigint
And a set of two-integer pairs with no associated values would be written like this:
type UniquePoints = [(int32, int32)]void
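The three container kinds are distinguished purely by what stands between the brackets. The following Python sketch classifies the four examples above accordingly. It is not GUT's parser: it only recognizes plain decimal lengths and does not handle brackets nested inside the key type, and the name `classify_container` is hypothetical:

```python
import re


def classify_container(type_expr: str):
    """Classify a container type expression by its bracket contents.

    Returns a tuple (kind, bracket_contents, element_type).
    Simplifications: only decimal lengths are recognized, and key
    types that themselves contain brackets are not handled.
    """
    match = re.match(r"\[([^\[\]]*)\](.+)$", type_expr.strip())
    if match is None:
        raise ValueError("not a container type: " + type_expr)
    inner, element = match.group(1).strip(), match.group(2).strip()
    if inner == "":
        return ("variable-size linear", None, element)
    if inner.isdigit():
        return ("fixed-size linear", int(inner), element)
    return ("associative", inner, element)
```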
Like a tuple, a record consists of a fixed set of members, possibly of different types. Unlike a tuple, the members of a record have names that can be used to address them.
A record is written as a set of members enclosed within braces. Each member is a pair of an identifier and a type, separated by a colon and terminated by a semicolon. The semicolons are not optional:
RecordType = "{" {Ident ":" TypeExpr ";"} "}".
For example, a record representing a place marker may look like this:
type Place = { Position : (float32, float32); Description : string; }
By default, the output generators will try to represent record-typed members and elements in other types as references, while tuple-typed members may be inlined. You can request that a record should be structurally inlined in other types by adding the `@struct` hint. For example, a record representing an integer pair of coordinates could be declared like this:
type Point = @struct { X : int32; Y : int32; }
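To make the difference between positional and named access concrete, here is a hand-written Python counterpart of the `Place` record from above. It is not actual GUT output. It follows the Python mapping described later in this document, which mentions `typing.NamedTuple` for record type declarations; the sample values are made up:

```python
from typing import NamedTuple, Tuple


class Place(NamedTuple):
    # Hand-written counterpart of:
    # type Place = { Position : (float32, float32); Description : string; }
    Position: Tuple[float, float]
    Description: str


summit = Place(Position=(46.5, 8.0), Description="Summit")
latitude = summit.Position[0]   # tuple member: addressed by position
label = summit.Description      # record member: addressed by name
```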
A union is a structure with a number of mutually exclusive alternative representations. In GUT, you declare a name for each alternative, optionally assign it an identifying integer "tag", and optionally describe the members stored with it.
You write a union type as the list of alternatives, each introduced by a pipe symbol, followed by the symbolic identifier, then optionally followed by an equals sign and a numeric identifier, and then optionally followed by the keyword `of` and an argument type:
UnionCase = "|" Ident ["=" IntConst] ["of" TypeExpr].
UnionType = UnionCase {UnionCase}.
A classic use case of a union type is an option that either holds a value or signals the absence of any useful value (without abusing some potentially legal value for that purpose). Such an option holding a number may be declared like this:
type UIntOption =
| Some of uint32
| None
In fact, this type is so useful that a shorthand exists and an equivalent type can also be written as follows:
type UIntOption = ?uint32
Another common special case for union types is the declaration of an enumerated type with no argument type associated with any of the cases. Typically, output generators can represent such a type particularly efficiently. An example may look like this:
type MyEnum = | A | B = 42 | C
If you do not specify an explicit numeric tag for each of the cases, the tag value increments by one from the previous case.
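The tag numbering rule can be sketched in a few lines of Python. The rule itself is taken from the text; the tag of a leading case without an explicit value is not stated there, so the default of 0 used below is an assumption, and `assign_tags` is a hypothetical helper:

```python
from typing import List, Optional, Tuple


def assign_tags(cases: List[Tuple[str, Optional[int]]], first: int = 0):
    """Assign a numeric tag to every case of an enumeration.

    `first` is the tag of a leading case without an explicit tag; the
    default of 0 is an assumption, the text does not state it.
    """
    tags = {}
    previous = first - 1
    for name, explicit in cases:
        # An explicit tag wins, otherwise continue from the previous case.
        tag = explicit if explicit is not None else previous + 1
        tags[name] = tag
        previous = tag
    return tags
```

Applied to `MyEnum` above, `B` keeps its explicit tag 42 and `C` continues with 43.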
An enumeration may additionally be annotated with the `@flags` hint, which has two effects:
This example declaration represents style flags for some text formatting application:
type Styles = @flags | BOLD | ITALIC | UNDERLINE | STRIKE
For union types that do have arguments, note that you can also use tuples or records to associate multiple data members with a case, like this:
type MyAlternatives =
| Nothing
| Exact of bigint
| Inexact of { Value : float64; Tolerance : float64; }
| Error of (int32, string)
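As an illustration of how such a union can be consumed, here is a hand-written Python counterpart of `MyAlternatives`. It is a sketch and not actual GUT output: it uses one `dataclasses.dataclass` per case, which the Python mapping described later mentions for union type declarations, but the field name `value` and the function `describe` are hypothetical:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Nothing:
    pass


@dataclass
class Exact:
    value: int  # bigint argument


@dataclass
class Inexact:
    Value: float
    Tolerance: float


@dataclass
class Error:
    value: Tuple[int, str]  # tuple argument


MyAlternatives = Union[Nothing, Exact, Inexact, Error]


def describe(alt: MyAlternatives) -> str:
    # Exactly one alternative is active, so dispatch on the case class.
    if isinstance(alt, Nothing):
        return "nothing"
    if isinstance(alt, Exact):
        return "exactly {}".format(alt.value)
    if isinstance(alt, Inexact):
        return "{} +/- {}".format(alt.Value, alt.Tolerance)
    code, message = alt.value
    return "error {}: {}".format(code, message)
```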
The representation of an opaque type depends entirely on the output generator. It may be a partially defined type whose implementation details are not provided by GUT, or some polymorphic object or pointer type.
The keyword `opaque` denotes an opaque type in a type declaration:
type Unspecified = opaque
A constant declaration is introduced by the `const` keyword, followed by an identifier for the constant, an optional colon and an explicit type, an equals sign and a value:
ConstDecl = "const" Ident [":" TypeExpr] "=" ConstExpr.
ConstExpr = "true" | "false" | IntConst | FloatConst | StringConst
| "import" StringConst.
A subset of the GUT types can be represented as literals, namely `bool`, any integer type, any floating point type, and `string`.
Numbers can be written in binary, octal, decimal, or hexadecimal notation, indicated by the `0b`, `0o`, `0d`, or `0x` prefixes respectively. The `0d` prefix is optional, as decimal numbers are the default. An optional `+` or `-` sign comes before the base prefix. Floating point numbers are written in the usual point notation with an optional exponent suffix; the exponent is introduced by `e` or `p` (case insensitive, and only `p` works for hexadecimal), followed by a signed decimal integer.
For example, a numeric constant may look like this:
const TheAnswer : bigint = 0x2A
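The integer notation described above (optional sign, then optional base prefix, then digits) can be sketched as a small Python function. This is not GUT's lexer: only the lowercase prefixes shown in the text are accepted, digit validation is delegated to Python's `int`, which is slightly more lenient than a real lexer would be, and the name `parse_int_literal` is hypothetical:

```python
_BASES = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}


def parse_int_literal(text: str) -> int:
    """Parse an integer literal with optional sign and base prefix."""
    sign = 1
    if text[:1] in ("+", "-"):
        # The sign comes before the base prefix.
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    prefix = text[:2]
    if prefix in _BASES:
        base, digits = _BASES[prefix], text[2:]
    else:
        base, digits = 10, text  # decimal is the default
    if not digits or digits[0] in "+-":
        raise ValueError("malformed integer literal")
    return sign * int(digits, base)
```

With this sketch, the value of `TheAnswer` above comes out as 42.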
Strings are written in the usual double quotes notation with backslash escape sequences:
| Escape | Translation |
|---|---|
| `\t` | Horizontal tabulator character |
| `\b` | Backspace character |
| `\r` | Carriage return character |
| `\n` | Newline character |
| `\xNN` | Unicode codepoint number 0xNN |
| `\uNNNN` | Unicode codepoint number 0xNNNN |
| `\UNNNNNN` | Unicode codepoint number 0xNNNNNN |
| `\` + any character | The character following the backslash copied verbatim |
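The escape rules in the table translate directly into a small decoder. The Python sketch below works on the text between the double quotes and uses the digit counts from the table (two, four, and six hexadecimal digits). It is an illustration, not GUT's implementation, and `unescape` is a hypothetical name:

```python
_SIMPLE = {"t": "\t", "b": "\b", "r": "\r", "n": "\n"}
_HEX_LENGTHS = {"x": 2, "u": 4, "U": 6}  # number of hex digits per escape


def unescape(body: str) -> str:
    """Translate backslash escapes in the text between the double quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        code = body[i + 1]
        if code in _SIMPLE:
            out.append(_SIMPLE[code])
            i += 2
        elif code in _HEX_LENGTHS:
            width = _HEX_LENGTHS[code]
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width:
                raise ValueError("truncated escape sequence")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            out.append(code)  # any other character is copied verbatim
            i += 2
    return "".join(out)
```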
In addition to literals, the contents of a file can be imported as a container of type `[N]uint8`, where `N` is the size of the file. The path after the `import` keyword is written as a string and interpreted relative to the input containing the import (or relative to the current working directory if the input does not come from a file).
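The path resolution rule can be sketched as follows. This Python function is an illustration of the rule as described, not GUT's code: it returns the file contents as a list of byte values standing in for `[N]uint8`, and both the function name `import_bytes` and its parameters are hypothetical:

```python
import os
from typing import List, Optional


def import_bytes(path: str, importing_file: Optional[str] = None) -> List[int]:
    """Load a file as a list of byte values (a stand-in for [N]uint8)."""
    if importing_file is not None:
        # Relative to the input that contains the import.
        base = os.path.dirname(os.path.abspath(importing_file))
    else:
        base = os.getcwd()  # input did not come from a file, e.g. stdin
    with open(os.path.join(base, path), "rb") as handle:
        return list(handle.read())
```

The length of the returned list corresponds to `N`, the size of the file.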
If an explicit type is specified for the constant and the assigned value has a different type, a conversion is attempted:
`[N]uint8` and vice versa.
Even when a constant is declared with a variable-size linear container type, the concrete size is always included in the final type.
Conceptually, all declarations in GUT are part of a module scope, either the implicit scope introduced by an input stream, or an explicitly declared scope.
An explicit module scope is created with the `module` keyword followed by an identifier, and the module body enclosed in braces. A module may also be imported from a source file, located by a path relative to the input containing the import (or relative to the current working directory if the input does not come from a file):
ModuleDecl = "module" Ident "{" ModuleBody "}"
| "module" Ident "=" "import" StringConst.
A declaration of two nested modules (not counting the implicit top-level scope) may look like this:
module Outer {
  module Inner {
    type MyInt = int32
  }
  const Stuff : Inner.MyInt = 42
}
Note how the type given to the constant `Stuff` uses a qualified identifier to refer to the type declared in the `Inner` scope. Identifiers are looked up relative to the scopes in which they occur, starting from the innermost surrounding scope, unless they start with a `.`, in which case the lookup starts at the outermost surrounding scope associated with the current input.
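The lookup rule can be sketched with nested dictionaries standing in for module scopes. This Python illustration is not GUT's resolver. In particular, it is an assumption that the search continues outwards when a qualified name only partially matches in an inner scope; the function name `lookup` and the data layout are hypothetical:

```python
def lookup(root: dict, scope_path: list, ident: str):
    """Resolve a possibly qualified identifier.

    root       -- outermost scope: nested dicts for modules, any other
                  value for a type or constant declaration
    scope_path -- module names leading from root to the current scope
    ident      -- e.g. "Inner.MyInt", or ".Outer.Stuff" for a lookup
                  starting at the outermost scope
    """
    def descend(scope, names):
        for name in names:
            if not isinstance(scope, dict) or name not in scope:
                return None
            scope = scope[name]
        return scope

    names = ident.lstrip(".").split(".")
    if ident.startswith("."):
        candidates = [[]]  # only the outermost scope
    else:
        # Innermost surrounding scope first, then outwards to the root.
        candidates = [scope_path[:n] for n in range(len(scope_path), -1, -1)]
    for prefix in candidates:
        found = descend(descend(root, prefix), names)
        if found is not None:
            return found
    raise KeyError(ident)
```

With the example above modelled as `{"Outer": {"Inner": {"MyInt": "int32"}, "Stuff": "const"}}`, resolving `Inner.MyInt` from within `Outer` finds the type declared in `Inner`.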
The C++14 language mapping results in rather verbose, but hopefully easy to use code:
- `bool` also exists in C++.
- Fixed-width integers are represented by the corresponding `std::int*_t` types.
- `bigint` can be configured to use Botan.BigInt or mpz_class, but by default only a decimal string representation is stored.
- `float32` becomes `float`, `float64` becomes `double`.
- `string` can be represented as `std::wstring`, `std::string`, `const wchar_t *`, or `const char *`. When using the raw pointer types, some features relying on automatic container memory management are not available.
- Tuples, including `void` if necessary, are represented using `std::tuple`.
- Two-element tuples can be represented using `std::pair`.
- Fixed-size linear containers are represented using `std::array`.
- Variable-size linear containers are represented using `std::vector`.
- Associative containers are represented using `std::set` or `std::map`.
- Record type declarations become `class` or `struct` types. Record type expressions in other places are not supported. Unless a `@struct` hint is present, records are wrapped in `std::shared_ptr` (or a similar template class) when included in other types. If a `@struct` hint is present, records are structurally inlined in other types and use the `struct` keyword rather than the `class` keyword for their declaration.
- Enumerations become `enum struct` types. If a `@flags` hint is present, bitwise operators are generated for the type. `to_string` and `to_<ident>` conversion functions between enumerations and strings are created.
- Options are represented as a `std::shared_ptr` to the value type.
- Other unions become a `class` or `union` in C++, which can hold a value of any of the leaf classes. Unless a `@struct` hint is present, unions are wrapped in `std::shared_ptr` (or a similar template class) when included in other types. If a `@struct` hint is present, unions are structurally inlined in other types and use the `union` keyword rather than the `class` keyword for their declaration. Note that unions require a fairly large amount of trivial generated code to implement safe construction, copying, and destruction.
- Opaque types are declared as `struct` or `class` types, depending on the presence of a `@struct` hint.
- Constants are represented as `const` variables.
Comparison operators are generated for all types.
The C# language mapping is mostly straightforward:
- `bool` and `string` also exist in C#.
- Fixed-width integers are represented by `sbyte`, `byte`, `short`, `ushort`, `int`, `uint`, `long`, or `ulong`.
- `bigint` is represented by `System.Numerics.BigInteger`.
- `float32` becomes `float`, `float64` becomes `double`.
- Tuples, including `void` if necessary, are represented using `System.Tuple`, or `System.ValueTuple` if a `@struct` hint is present.
- Linear containers are represented using `System.Collections.Generic.List`.
- Associative containers are represented using `System.Collections.Generic.SortedSet` or `System.Collections.Generic.SortedDictionary`.
- Record type declarations become `record class` or `record struct` types. Record type expressions in other places are not supported. If a `@struct` hint is present, records are structurally inlined in other types and use the `struct` keyword rather than the `class` keyword for their declaration.
- Enumerations become `enum` types. If a `@flags` hint is present, a `[System.Flags]` attribute is added to the type declaration, and an unsigned integer type is used as the base type.
- Opaque types are represented by the `System.Object` type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as `static readonly` variables.
- Modules are represented as `static class` declarations in C#.
The F# language mapping is very straightforward:
- `void` is called `unit` in F#.
- Tuples can be represented as `struct` tuples.
- Linear containers are represented using `ResizeArray`.
- Associative containers are represented using `Set` or `Map`.
- If a `@struct` hint is present for a record type declaration, the `[<Struct>]` attribute will be added to the corresponding declaration in F#.
- If a `@flags` hint is present for a union type declaration, the `[<System.Flags>]` attribute will be added to the corresponding declaration in F#.
- Opaque types are represented by the `obj` type, if enabled. Otherwise, opaque types have to be defined externally.
The Kotlin language mapping is mostly straightforward:
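- The empty tuple is represented by `Unit`.
- `bool` is represented by `Boolean`.
- Fixed-width integers are represented by `Byte`, `UByte`, `Short`, `UShort`, `Int`, `UInt`, `Long`, or `ULong`.
- `bigint` is represented by `java.math.BigInteger`.
- `float32` becomes `Float`, `float64` becomes `Double`.
- `string` is represented by `String`.
- Two-element tuples are represented by `Pair`, other tuples are only supported as the right-hand side of a type declaration.
- Linear containers are represented using `List`.
- Associative containers are represented using `Set` or `Map`.
- Record type declarations become `data class` types. Record type expressions in other places are not supported.
- Enumerations become `enum` types. If a `@flags` hint is present, an unsigned integer type is used for the tag values, and references to the type wrap `java.util.EnumSet` around the declared type.
- Opaque types are represented by the `Any` type, if enabled. Otherwise, opaque types have to be defined externally.
- Constants are represented as `val` declarations.
- Modules are represented as `object` declarations in Kotlin.
The Python language mapping is mostly straightforward: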
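- `bool` also exists in Python.
- `void` is represented as `None`.
- Integers are represented by `int`.
- Floating point numbers are represented by `float`.
- Tuples are represented by `tuple`.
- Linear containers are represented by `list`.
- Associative containers are represented by `set` or `dict`.
- Records are represented with `typing.NamedTuple` in a record type declaration, or with `dataclasses.dataclass` in a union type declaration. Record type expressions in other places are not supported.
- Enumerations are represented by `enum.Enum` or `enum.Flag`, depending on whether a `@flags` hint is present.
- Options are represented by `typing.Optional`.
- Opaque types are represented by `typing.Any` created using `typing.NewType`.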