Petal compiler, specification, and tools

typesystem.md 11KB

Petal’s type system

A type system must be unsound (it accepts incorrect programs), incomplete (it rejects correct programs), or undecidable (some programs can’t be analyzed). Petal would like a sound type system and accepts being incomplete.

The type system operates at compile-time insofar as possible. If something could just as easily be a compile-time error as a runtime error, it should be a compile-time error.

Basic types

Use of uninitialized variables

Using uninitialized variables is not allowed.

Special types

Petal has a few special types that are a bit abstract:

  • never
  • unit
  • raw
  • any

never

never is a type that has no values. The main effect of this is that a function declared as returning never can’t return a value. It must instead throw an exception, halt the program, something like that. This is useful for panic-type functions that log an error and then abort the program, or functions that throw an exception in every code path, or possibly for functions that loop infinitely.

Other than that, you can use this like a normal type. You can’t ever initialize a variable with a value of type never, and you can’t use uninitialized variables, so this doesn’t blow up in your face. This property propagates to any type that has a field of type never. The following struct can’t be built, for instance:

struct HasNever
{
    never a
}

never is not a bottom type from type theory. It does not implicitly convert to anything, and nothing implicitly converts to it. It is an empty type, and it’s sometimes called the diverging type.

The never type has size 0.

unit

unit is a type that only has one value named unit().

Every function returns a value. A function that would be declared returning void in Java or C instead returns unit.

The unit type has size 0.

raw

raw refers to raw memory. It’s only allowed in unsafe code. Otherwise, it represents an opaque byte (uint8 that supports no operations aside from equality). raw& is a reference to bytes that could represent anything; raw[] is some memory that might contain anything.

any

any is anything. Any object, any struct, anything. It’s the root of the type hierarchy.

Builtin types

Numeric types

  • int
  • int8
  • uint8
  • int16
  • uint16
  • int32
  • uint32
  • int64
  • uint64
  • int128
  • uint128
  • float32
  • float64

The trailing number is the number of bits in the type. The prefix indicates the general type: signed integer intX, unsigned integer uintX, IEEE floating point floatX.

int is an alias for int64. While most numbers are small, a 32-bit integer is a little small for a number of real-world contexts like file length.

Strings and characters

Petal supports UTF-8 / UTF-16 / UTF-32 code units:

  • char8
  • char16
  • char32

These are low-level tools for low-level code. char8 is 8 bits; char16 is 16 bits; char32 is 32 bits.

On the higher-level side, it has:

  • string
  • char

string is a UTF-8 encoded string. char represents a grapheme cluster. A grapheme cluster might be a single code unit, like a (U+0061); it might be a single character with multiple code units, like (U+2603); or it might be a character with combining marks, like é̄ (U+0065 U+0302 U+0304).

Internally, string is stored as char8[]. To access this underlying array, use the representation property.

Boolean

Petal supports the boolean type, bool. It has two possible values: true and false. It is a purely boolean type, not supporting arithmetic.

Derived types

There are a number of derived types that can be constructed of other types:

  • tuples
  • aliases
  • functions
  • references
  • arrays
  • dicts

Aliases

An alias is just another name for a type.

In fact, an alias can be another name for anything nameable. The compiler will try to report that alias instead of the thing’s true name when you access it through an alias.

As an example:

alias wchar char16.

Tuples

A tuple is an ordered collection of values of varying types:

(int32, uint8, string) tuple = (7451, 12, "hello world").

Tuples are implicitly flattened:

alias Record (int32, string).
alias NamedRecord (string, record).
# Both of these options work:
NamedRecord r1 = ("id1", (102, "value1")).
NamedRecord r2 = ("id2", 51, "value2").

A single-item tuple is the same as its content:

alias JustAString (string).
JustAString r3 = "hello".
string s = r3.

Tuples can be indexed like arrays, but the index must be a compile-time constant:

NamedRecord r4 = ("id4", 12, "value3").
println r4:1.  # prints "12"
println r4:2.  # prints "value3"

Tuples can be implicitly expanded or created when passed to a function:

unit show(NamedRecord r)
{
    printfln "id: {} index: {} value: {}" r:0 r:1 r:2.
}
unit show2(string id, int32 index, string value)
{
    show(id, index, value).
}

Functions

A function type holds a reference to a function. It can also include a reference to a context for that function.

For instance:

int32 asciiVowels(char8[] s)
{
    int32 count = 0.
    foreach c; s:byCodeUnit
        if c in "aeiou"
            count++.
    return count.
}
func: int32 fn(string) = &asciiVowels

Every function returns a value. A function’s arguments are effectively a tuple.

Arrays

An array is an ordered, indexable collection of items of a given type. It offers O(1) indexing, O(n) search, amortized O(log n) append, and O(n) splicing and concatenation. This is also a slice type; two arrays may refer to the same or overlapping data.

Arrays use 0-based indexing: the first item is list[0], the second is list[1], etc.

int32[] list!
foreach v; 1..10
    list ~= v * v.
frontHalf = list[0:to 5].
assert list[4] == 25.

Arrays can also be iterated, with or without an index:

int32[] list = [1, 4, 9, 16].
foreach i, v; list
    printfln "list[{}] = {}" i v.

This prints:

list[0] = 1
list[1] = 4
list[2] = 9
list[3] = 16

Implementation note: arrays are implemented as:

struct Array
{
    unsafe mut T* data.
    mut int64 length.
    TypeInfo type.
}

The type field may be omitted or dynamically added as appropriate.

Multidimensional arrays

A multidimensional array is an array with multiple dimensions. Petal’s multidimensional arrays are row-major; the first index is the row, the second is the column.

A two-dimensional array is sometimes called a rectangular array. It can be used to implement a matrix. It looks like:

int32[,] rect = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
].
# It's a two-dimensional array
assert rect.lengths.length == 2.
# The first length is 3, because the array is 3 high.
assert rect.lengths[0] == 3.
# The second length is 4, because the array is 4 wide.
assert rect.lengths[1] == 4.

int32 sum!
foreach y, x, val; rect
    sum += val.
assert sum == 270.

This works identically for more than two dimensions. The compiler is guaranteed to support up to five dimensions.

References

A reference is an alias to an existing value.

References are a boring example of the name-value distinction more interestingly demonstrated by Dr Charles Dodgson with a song. The song is A-sitting on a Gate, but it’s called Ways and Means. The song’s name is The Aged Aged Man, while the name is called Haddock’s Eyes.

Let’s look at this example similarly:

int32 x = 12.
int32& y = &x.
println y.  # 0x7FFDA7EAA968
println $y. # 12
$y = 15.
println x.  # 15
x = 18.
println $y. # 18

The reference type that refers to an int32 is named int32&. To refer to the value that reference y points to, you use $y. (This is instead of *, which is used in a number of languages, to reduce ambiguity. It’s obvious many readers that println *y should print the value that y points to, but it’s a bit harder for the compiler to figure that out.)

The value is 12. The value is called x. The address of the value is 0x7FFDA7EAA968. And that address is also called y.

Aggregate types

An aggregate type is a type with fields and methods. A field is just a variable that values of that type contain. A method is a function that implicitly takes the aggregate as its first argument.

Structs

A struct is a data type passed by value. It’s a series of fields that can be accessed together, and it acts as a namespace for functions that deal whith those fields.

A struct doesn’t need to define any fields; it’s valid to have a struct with no fields. This isn’t usually useful, though.

struct Password
{
    uint8[] salt.
    uint8[] hashed.
    this()
    {
        salt = randomSalt.
    }
    unit set(string v) = hashed = digest v salt.
    bool verify(string v) = hashed == digest v salt.
}
Password password!
password:set "its-a-secret".
println (toHex password:hash).

A password is little more than its fields bundled together.

Classes

A class is much like a struct with more power. However, it’s a little less efficient.

First, the inefficiency: structs are allocated inline in their context, while each class instance is a separate heap allocation. Structs are accessed like other variables in the same context, but class instances are always accessed with references.

(The compiler may, as an optimization, allocate some class instances inline if it detects it’s safe to do so. But this isn’t reliable.)

Classes can participate in inheritance. They can inherit one other class; if not otherwise specified, they inherit Object. They can inherit any number of interfaces.

class Person
{
    string name.
    this(this.name).
    virtual unit greet()
    {
        printfln "Hello {}!" name.
    }
}
class Employee from Person
{
    string id.
    override unit greet()
    {
        base:greet.
        printfln "Please remember to comply with all corporate policies, {}." id.
    }
    this(this.name, this.id).
}
let employee = Employee "Anne" "TK-421".
employee:greet

Classes can be abstract. An abstract class must be marked abstract. An abstract class can’t be instantiated directly but can be inherited from, and it can contain abstract methods. A non-abstract class that inherits from an abstract class must override all abstract methods from the base class.

An abstract method may have a body but doesn’t require one.

Interfaces

An interface is like an abstract class, but it cannot have any fields and all their functions are implicitly abstract. It can only define function signatures. They engage in inheritance, but an interface can only inherit from another interface.

Concepts

A concept is a bit like an interface. It defines a series of operations that a type must support.

An aggregate may declare that it adheres to a concept.

Operator overloading

User-defined types can overload operators. Please do not abuse this feature.

  • indexing: index function for retrieving a value, setIndex for changing a value, range for slicing, end for $.
  • math: add, subtract, multiply, divide, modulus, exponent
  • bitwise: bitAnd, bitOr, bitXor
  • bitwise shifts: rightShift, leftShift, rightRotate, leftRotate
  • iteration: range with no arguments to return a range.

Mutability

By default, local variables are mutable, and everything else is immutable.

Algebraic data types

Petal supports type algebra.

# s is either an int or a string
let int | string s = "";
# t is an int and a string: a tuple.
let int & string t = 0 & "";