
Working on docs and grammar for the new syntax

dhasenan 2 years ago
commit 311abe242a

.gitignore (+2 -0)

@@ -0,0 +1,2 @@
+/.metadata/
+*.swp

compiler/.gitignore (+14 -0)

@@ -0,0 +1,14 @@
+.dub
+docs.json
+__dummy.html
+docs/
+petalc.so
+petalc.dylib
+petalc.dll
+petalc.a
+petalc.lib
+petalc-test-*
+*.exe
+*.o
+*.obj
+*.lst

compiler/README.md (+8 -0)

@@ -0,0 +1,8 @@
+# petalc
+This is a hand-rolled compiler for Petal.
+
+## Why hand-rolled?
+The three options I looked into were ANTLR4, Pegged, and flex/bison.
+
+I couldn't get an IDE to work with a simple Gradle project using ANTLR4. Pegged uses a lot of memory.
+Flex/bison target C, and that's a nonstarter: C is an ugly language to work in.

compiler/source/petal/parse/lexer.d (+15 -0)

@@ -99,11 +99,21 @@ private:
         {
             return identOrKeyword;
         }
+        if (remaining[0].isDigit)
+        {
+            return number;
+        }
         if (remaining[0] == '\'') return charLiteral;
         if (remaining[0] == '"' || remaining[0] == '`') return stringLiteral;
         return maybeOperator;
     }
 
+    Token number()
+    {
+        // Always scan the digits as hex, then check that none exceeds the literal's base.
+        return Token(TokenType.invalid); // TODO: not implemented yet
+    }
+
     Token maybeOperator()
     {
         return Token(TokenType.invalid);
@@ -311,6 +321,11 @@ private immutable(Operator[]) buildOperators()
     return tmp.idup;
 }
 
+bool isDigit(char c)
+{
+    return '0' <= c && c <= '9';
+}
+
 string buildTokenTypeEnum()
 {
     // The tokenizer skips whitespace automatically.
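The comment in `number()` describes a two-step strategy: accept any hex digit first, then check each digit against the literal's base. Below is a minimal D sketch of that idea, assuming integer literals only (no fractional part, no `U`/`L`/`f` suffix). `NumberScan`, `scanInteger`, and `hexValue` are hypothetical names, not part of petalc, and the sketch folds both steps into a single loop.

```d
// Sketch of "scan as hex, then validate against the base" for integer literals.
// Assumptions: no fractional part and no U/L/f suffix; all names are hypothetical.
// Try it with: rdmd -unittest -main thisfile.d
import std.ascii : isHexDigit;

struct NumberScan
{
    size_t length;   // characters consumed, including any 0b/0o/0x prefix
    uint radix;      // 2, 8, 10, or 16
    ptrdiff_t badAt; // index of the first digit too large for radix, or -1
}

// Value of a hex digit; the caller guarantees isHexDigit(c).
uint hexValue(char c)
{
    if (c <= '9') return c - '0';
    return (c | 0x20) - 'a' + 10; // fold 'A'..'F' to lowercase
}

NumberScan scanInteger(string src)
{
    auto result = NumberScan(0, 10, -1);
    size_t i = 0;
    if (src.length >= 2 && src[0] == '0')
    {
        switch (src[1])
        {
            case 'b': result.radix = 2; i = 2; break;
            case 'o': result.radix = 8; i = 2; break;
            case 'x': result.radix = 16; i = 2; break;
            default: break;
        }
    }
    // Step 1: consume everything that could be a hex digit or a `_` separator.
    // Step 2 (same loop): remember the first digit outside the base.
    for (; i < src.length && (isHexDigit(src[i]) || src[i] == '_'); i++)
    {
        if (src[i] != '_' && result.badAt < 0 && hexValue(src[i]) >= result.radix)
            result.badAt = cast(ptrdiff_t) i;
    }
    result.length = i;
    return result;
}

unittest
{
    assert(scanInteger("0xF00D.").length == 6); // stops at the `.` terminator
    assert(scanInteger("0o744").badAt == -1);
    assert(scanInteger("0o748").badAt == 4);    // 8 is not an octal digit
}
```

One wrinkle this approach has to handle: `f` is both a hex digit and the float suffix, so `scanInteger("12f")` flags the `f` at index 2 as invalid for base 10. A full lexer would need to set aside a trailing suffix before validating.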

literate_docs/grammar.g4 (+274 -0)

@@ -0,0 +1,274 @@
+/*
+# Grammar
+
+## TODOs
+* attribute blocks
+
+## Basic concepts
+### Scope
+Petal uses lexical scoping. Functions, types, and flow control statements introduce new scopes.
+
+## Source files
+*/
+source_file: namespace_declaration using_or_decl* EOF;
+namespace_declaration: Namespace Identifiers EOS;
+using_or_decl: using | declaration;
+using
+    : Using Identifiers (As Identifier)? EOS
+    | Using Identifiers Arrow Identifier+ EOS
+    ;
+
+/*
+A source file starts with a namespace declaration and continues with using directives and
+declarations.
+
+A using directive comes in three forms:
+* `using flower:pistil.` imports all symbols found in namespace `flower:pistil` into the current
+  scope.
+* `using flower:pistil as p.` creates a symbol `p` in the current scope. This symbol contains all
+  symbols found in `flower:pistil`.
+* `using flower:pistil -> style stigma.` imports the symbols `style` and `stigma` into the current
+  scope.
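To make the three forms concrete, here is a small D sketch of their effect on a scope. It is a hypothetical model, not petalc's symbol table: a scope maps each visible name to the fully qualified symbol it refers to, and the `as` form is approximated by registering `p:sym` names. `UsingForm`, `UsingDirective`, and `applyUsing` are illustrative names.

```d
// Hypothetical model of the three `using` forms. A scope maps each visible
// name to the fully qualified symbol it refers to.
// Try it with: rdmd -unittest -main thisfile.d
import std.algorithm.searching : canFind;
import std.exception : enforce;

enum UsingForm { importAll, aliased, selective }

struct UsingDirective
{
    UsingForm form;
    string ns;        // e.g. "flower:pistil"
    string aliasName; // only for `using ns as p.`
    string[] names;   // only for `using ns -> a b.`
}

/// Adds the names introduced by `u` to `currentScope`.
/// `namespaces` maps each namespace to the symbols it declares.
void applyUsing(ref string[string] currentScope, UsingDirective u,
                string[][string] namespaces)
{
    auto symbols = u.ns in namespaces;
    enforce(symbols !is null, "unknown namespace " ~ u.ns);
    final switch (u.form)
    {
        case UsingForm.importAll:
            foreach (sym; *symbols)
                currentScope[sym] = u.ns ~ ":" ~ sym;
            break;
        case UsingForm.aliased:
            // `p` holds every symbol of the namespace, reached as `p:sym`.
            foreach (sym; *symbols)
                currentScope[u.aliasName ~ ":" ~ sym] = u.ns ~ ":" ~ sym;
            break;
        case UsingForm.selective:
            foreach (name; u.names)
            {
                enforce((*symbols).canFind(name), name ~ " not found in " ~ u.ns);
                currentScope[name] = u.ns ~ ":" ~ name;
            }
            break;
    }
}

unittest
{
    auto ns = ["flower:pistil": ["style", "stigma", "ovary"]];
    string[string] all;
    applyUsing(all, UsingDirective(UsingForm.importAll, "flower:pistil"), ns);
    assert(all.length == 3 && all["ovary"] == "flower:pistil:ovary");
}
```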
+
+## Attributes
+An attribute is a piece of metadata attached to a declaration.
+
+There are a few core attributes, which are referred to simply by name:
+
+* `abstract`: on classes only. This indicates that the class cannot be instantiated and may have
+  bodiless functions.
+* `static`: on declarations inside a type. Normally, declarations inside a type belong to instances
+  of that type. This indicates that the declaration belongs to the type as a whole.
+* `virtual`: on non-static functions inside a class. Indicates that the function can be overridden.
+* `override`: on non-static functions inside a class. Indicates that the function overrides a
+  matching function in the base type.
+* `public`: alias for `@[core:attr:visible]`
+* `private`: resets visibility
+* `protected`: alias for `@[core:attr:visible this:typeof, includeChildren=true] private`
+
+User-defined attributes are referred to as `@[attr args]`. For instance, to indicate that a field
+should be serialized to JSON with name `tulips`, you might write:
+
+```petal
+class Farm
+{
+    @[json name="tulips"]
+    let int tulipCount!
+}
+```
+
+An attribute can be any compile-time constant. By convention, attribute types use snake_case names.
+
+### Attribute blocks
+
+
+### Combining attributes
+
+### Visibility attributes
+
+The core attributes `public`, `private`, and `protected` are shortcuts to Petal's visibility system.
+
+The true attribute for visibility is defined as:
+```petal
+@[attr_combination overwrite=true, allowSameDecl=true]
+struct visible
+{
+    this Symbol to..., bool includeChildren = false.
+}
+```
+
+*/
+attribute_block: attribute+ Obrace declaration* Cbrace;
+attribute: uda | core_attr;
+uda: Attr Osquare call_expr Csquare;
+core_attr: Abstract | Override | Static | Virtual | Public | Private | Protected;
+
+/*
+## Declarations
+*/
+declaration
+    : attribute_block
+    | attribute+ declaration
+    | alias_decl
+    | variable_decl
+    | func_decl
+    | type_decl
+    ;
+
+/*
+### Aliases
+An alias creates a new name for a type.
+*/
+alias_decl: Using Identifier Assign type EOS;
+
+/*
+### Variables
+*/
+variable_decl
+    : Let type Identifier (Assign expression)? EOS
+    | Let type Identifier Bang
+    | Let Identifier Assign expression EOS
+    ;
+/*
+A variable declaration introduces a new variable into the current scope.
+
+Examples:
+
+```petal
+# Type inferred as boolean
+let retry = false.
+# Type explicitly given as string, initialized to the default value.
+let string name!
+# Variable is uint8 and uninitialized.
+let uint8 permissions.
+# Variable is explicitly typed and initialized.
+let float32 f = 12.
+```
+
+A variable cannot be used before it is initialized. It may be initialized by using "!" instead of
+"." to terminate the declaration (known as "let-bang") or by assigning to it, either within the
+declaration itself or with a separate statement.
+
+A let-bang initializes the variable to a default value that depends on its type:
+* If the type is a numeric type, it is initialized to `0`.
+* If the type is boolean, it is initialized to `false`.
+* If the type is an array type, it is initialized to the empty array.
+* If the type is `string`, it is initialized to the empty string.
+* If the type is an enum of a numeric type and one of its values is equal to `0`, it is initialized
+  to that value.
+* If the type has a default constructor, that constructor is called. A class or struct that defines
+  no constructors explicitly has a default constructor.
+* If the type has an explicitly defined constructor that can be called with zero arguments, that
+  constructor is called.
+* Otherwise, it is a compile-time error.
+
+Notably, interface variables cannot be initialized with a let-bang.
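The let-bang rules read naturally as a decision procedure. The D sketch below encodes them over a deliberately tiny, hypothetical type model (`Kind`, `PetalType`, `EnumMember`, and `letBangInit` are illustrative names); a real compiler would consult its own type representation, and a thrown exception stands in for the compile-time error.

```d
// Sketch of the let-bang default-initialization rules as a decision procedure.
// The type model is hypothetical and far simpler than a real compiler's.
// Try it with: rdmd -unittest -main thisfile.d
import std.exception : assertThrown;

enum Kind { numeric, boolean, array, string_, numericEnum, aggregate, interface_ }

struct EnumMember { string name; long value; }

struct PetalType
{
    Kind kind;
    string name;
    EnumMember[] members;       // numericEnum only
    bool definesConstructors;   // aggregate only
    bool hasZeroArgConstructor; // aggregate only: an explicit ctor callable with no args
}

/// Describes how `let T x!` initializes x, or throws for a compile-time error.
string letBangInit(PetalType t)
{
    final switch (t.kind)
    {
        case Kind.numeric: return "0";
        case Kind.boolean: return "false";
        case Kind.array: return "[]";
        case Kind.string_: return `""`;
        case Kind.numericEnum:
            foreach (m; t.members)
                if (m.value == 0)
                    return m.name;
            break;
        case Kind.aggregate:
            // No explicit constructors means an implicit default constructor exists.
            if (!t.definesConstructors || t.hasZeroArgConstructor)
                return t.name ~ "()";
            break;
        case Kind.interface_:
            break; // interfaces have no constructors, so this is always an error
    }
    throw new Exception("no default initialization for " ~ t.name);
}

unittest
{
    assert(letBangInit(PetalType(Kind.boolean, "bool")) == "false");
    // A struct whose only constructor needs arguments cannot be let-banged.
    assertThrown(letBangInit(PetalType(Kind.aggregate, "Pot", null, true, false)));
}
```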
+
+### Functions
+*/
+func_decl: func_proto | func_definition;
+func_proto: Identifier arg_list (Arrow type)? EOS;
+func_definition: Identifier arg_list (Arrow type)? block;
+arg_list
+    : // empty
+    | arg (Comma arg)*
+    ;
+arg: attribute* type Identifier (Assign expression)?;
+statement_or_decl: statement | declaration;
+
+/*
+In Petal, every function accepts a tuple of values and yields one value.
+
+A function prototype (`func_proto`) declares a function's name and type information without a body.
+
+### Types
+*/
+type_decl
+    : struct_decl
+    | interface_decl
+    | class_decl
+    | enum_decl
+    ;
+class_decl: Class Identifier inheritance? Obrace declaration* Cbrace;
+enum_decl: Enum Identifier (Base type)? Obrace enum_member* Cbrace;
+enum_member: Identifier (Assign expression)? Comma;
+interface_decl
+    : Interface Identifier inheritance? Obrace (attribute* (func_proto|type_decl))* Cbrace;
+inheritance: Base type (Comma type)*;
+struct_decl: Struct Identifier Obrace declaration* Cbrace;
+/*
+
+A type can be a class, enum, interface, or struct.
+
+A class is a reference type. It may inherit from one base class. If no base class is specified, its
+base class is `Core.Object`. It may inherit from multiple interfaces.
+
+An enum is a collection of constants of the same type. The default base type for an enum is
+`uint16`; an enum may have no more than 65535 members. However, you may override the base type with
+any compile-time-constructible type.
+
+An interface is a reference type. It cannot be instantiated and can contain only function
+prototypes and nested types, with no implementations and no fields. Interfaces are a way to opt into multiple hierarchies.
+
+Structs are value types. They do not participate in virtual inheritance and cannot have virtual
+methods.
+
+### Statements
+*/
+statement
+    : assign
+    | block
+    | breakstmt
+    | continuestmt
+    | dowhile
+    | expression EOS
+    | foreach
+    | ifstmt
+    | label
+    | match
+    | throwstmt
+    | trycatch
+    | whilestmt
+    ;
+
+assign: expression Assign expression EOS;
+block: Obrace statement_or_decl* Cbrace;
+breakstmt: Break Identifier? EOS;
+continuestmt: Continue Identifier? EOS;
+dowhile: Do block While expression (EOS | (Else block));
+foreach: Foreach type? Identifier Arrow expression block (Else block)?;
+ifstmt: If expression block (Else block)?;
+label: Label Identifier;
+match: Match expression Obrace match_case* Cbrace;
+match_case: match_criteria (Comma match_criteria)* block;
+match_criteria
+    : Case expression
+    | Case Let type Identifier
+    | Else
+    ;
+throwstmt: Throw expression EOS;
+trycatch: Try block catchblock* finallyblock?;
+catchblock: Catch type Identifier block;
+finallyblock: Finally block;
+whilestmt: While expression block (Else block)?;
+/*
+
+Most statements deal with flow control. Flow control deals with truthiness; see TODO LINKME.
+
+An `if` statement evaluates its expression. If the expression is truthy, it evaluates the first
+block; otherwise, it evaluates the `else` block, if there is one.
+
+```petal
+if true { print "good". } else { print "bad". }
+```
+
+A `while` statement evaluates its expression. If the expression is truthy, it evaluates its first
+block and repeats this process. Otherwise, it evaluates its `else` block, if there is one, and stops.
+
+```petal
+let i = 0.
+while i < 10
+{
+    i = i + 1.
+}
+```
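One way to pin down the `while ... else` semantics is to express them as a D helper that takes the condition and both blocks as delegates. This is a sketch under assumptions: `whileElse` is a hypothetical name, truthiness is reduced to `bool`, and `break`/`continue` are not modeled (the text does not yet say whether `break` skips the `else` block).

```d
// Sketch of the described `while ... else` semantics as a D helper.
// `whileElse` is hypothetical; break/continue are not modeled.
// Try it with: rdmd -unittest -main thisfile.d

/// Runs `body_` while `cond` is true; once `cond` is false, runs `elseBody` once.
void whileElse(scope bool delegate() cond, scope void delegate() body_,
               scope void delegate() elseBody)
{
    while (cond())
        body_();
    elseBody(); // reached only after the condition has evaluated to false
}

unittest
{
    int i = 0;
    bool ranElse = false;
    whileElse(() => i < 10, () { i = i + 1; }, () { ranElse = true; });
    assert(i == 10 && ranElse);
}
```

Note that the `else` block also runs when the condition is false on the very first check, since the loop body never executes in that case.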
+
+
+### Expressions
+*/
+expression
+    : call_expr
+    | atomic_expr
+    ;
+atomic_expr
+    : NumberLiteral
+    | String
+    | Oparen expression Cparen
+    | Identifiers
+    ;
+call_expr: atomic_expr parameters;
+parameters: param+ | Oparen Cparen;
+param: expression | Identifier Assign expression;
+
+
+type: Identifiers;

literate_docs/head.g4 (+2 -0)

@@ -0,0 +1,2 @@
+grammar petal;
+

literate_docs/lexical.g4 (+213 -0)

@@ -0,0 +1,213 @@
+/*
+# Petal specification: lexical
+A Petal program is a series of source files. A source file is a UTF-8 encoded text document.
+Typically, a source file will be a file in a file system, but that is not required; a conforming
+implementation might retrieve a file as an entry in a zip archive or load it over a network or use
+some other means to obtain it.
+
+The lexical grammar is concerned with the atoms with which one builds a valid source file.
+
+The version of the Unicode standard used in this grammar is 11.0.
+
+## End of file
+The token `EOF` represents the end of a source file.
+
+## Whitespace and comments
+Whitespace and comments serve to separate tokens. (Some tokens have natural separation from others;
+in other cases, whitespace or a comment is required.)
+
+A comment starts with the `#` character and continues to the end of the current line.
+
+The following characters are whitespace: `U+0009` (tab), `U+000A` (newline), `U+000D` (carriage
+return), and `U+0020` (space).
+
+A documentation comment is a specially formatted series of comments on successive lines. Each line
+must start with `##`, optionally preceded by whitespace.
+
+```petal
+# This is a normal comment.
+## This starts a documentation comment.
+    ## This continues the documentation comment.
+## This is its third line.
+```
+
+The content of a documentation comment is not defined, but Doxygen syntax is recommended.
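A minimal D sketch of how a tool might group successive `##` lines into documentation comments. It assumes doc comments occupy whole lines (leading whitespace allowed) and that any other line, including a plain `#` comment, ends the current group; `docComments` is a hypothetical name.

```d
// Sketch: group successive `##` lines into documentation comments.
// Assumption: doc comments occupy whole lines; `##` after code is not handled.
// Try it with: rdmd -unittest -main thisfile.d
import std.algorithm.searching : startsWith;
import std.string : lineSplitter, strip, stripLeft;

/// Returns one entry per documentation comment, each a list of its line texts.
string[][] docComments(string src)
{
    string[][] result;
    string[] current;
    foreach (line; src.lineSplitter)
    {
        auto trimmed = line.stripLeft;
        if (trimmed.startsWith("##"))
        {
            current ~= trimmed[2 .. $].strip; // drop the `##` marker
        }
        else if (current.length)
        {
            result ~= current; // any other line ends the current doc comment
            current = null;
        }
    }
    if (current.length)
        result ~= current;
    return result;
}

unittest
{
    assert(docComments("## a\nlet x = 1.\n## b\n") == [["a"], ["b"]]);
}
```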
+
+Whitespace and comments are generally omitted from grammar rules.
+*/
+
+//channels { DOC_COMMENT }
+
+Doccomment: '##' .*? '\n' -> channel(HIDDEN);
+Linecomment: '#' (~[#\n] ~[\n]*)? '\n' -> channel(HIDDEN);
+
+/*
+
+## Identifiers
+Identifiers are user-defined names of the form:
+*/
+Identifiers: Identifier (':' Identifier)*;
+Identifier: IdentifierStart IdentifierChar*;
+fragment IdentifierStart: Letter | Mark | '_';
+fragment IdentifierChar: IdentifierStart | Digit;
+fragment Letter: UnicodeClassL;
+fragment Mark: UnicodeClassM;
+fragment Digit: UnicodeClassN;
+
+/*
+The identifier `petal` is reserved and may only be defined in the language runtime. Keywords are
+reserved.
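The identifier rules can be checked with a few lines of D. This sketch assumes the ASCII placeholders at the end of this file (`UnicodeClassL`, `UnicodeClassM`, `UnicodeClassN`) rather than the full Unicode classes, and `isValidIdentifier`, `isValidIdentifiers`, and `userMayDefine` are hypothetical names. It separates syntactic validity from the extra restriction on defining `petal`.

```d
// Sketch: validate a possibly qualified identifier such as `flower:pistil`.
// Assumption: ASCII letters, `_`, and digits stand in for the Unicode classes.
// Try it with: rdmd -unittest -main thisfile.d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : all, canFind;
import std.ascii : isAlpha, isDigit;

immutable string[] keywords = [
    "abstract", "as", "base", "break", "case", "cast", "catch", "class", "const",
    "continue", "do", "else", "enum", "extern", "false", "finally", "for", "foreach",
    "func", "goto", "if", "interface", "is", "label", "let", "match", "mut",
    "namespace", "null", "out", "override", "private", "protected", "public", "ref",
    "return", "static", "struct", "switch", "this", "throw", "true", "try", "typeof",
    "using", "version", "virtual", "while",
];

/// One `Identifier`: a letter or `_`, then letters, `_`, or digits; not a keyword.
bool isValidIdentifier(string part)
{
    if (part.length == 0 || !(isAlpha(part[0]) || part[0] == '_'))
        return false;
    if (!part[1 .. $].all!(c => isAlpha(c) || isDigit(c) || c == '_'))
        return false;
    return !keywords.canFind(part);
}

/// `Identifiers`: one or more identifiers separated by `:`.
bool isValidIdentifiers(string name)
{
    return name.length > 0 && name.splitter(':').all!isValidIdentifier;
}

/// A name user code may declare; `petal` is reserved for the language runtime.
bool userMayDefine(string name)
{
    return isValidIdentifier(name) && name != "petal";
}

unittest
{
    assert(isValidIdentifiers("flower:pistil"));
    assert(!isValidIdentifiers("let"));
}
```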
+
+## Keywords
+The following keywords are reserved:
+*/
+
+Abstract: 'abstract';
+As: 'as';
+Base: 'base';
+Break: 'break';
+Case: 'case';
+Cast: 'cast';
+Catch: 'catch';
+Class: 'class';
+Const: 'const';
+Continue: 'continue';
+Do: 'do';
+Else: 'else';
+Enum: 'enum';
+Extern: 'extern';
+False: 'false';
+Finally: 'finally';
+For: 'for';
+Foreach: 'foreach';
+Func: 'func';
+Goto: 'goto';
+If: 'if';
+Interface: 'interface';
+Is: 'is';
+Label: 'label';
+Let: 'let';
+Match: 'match';
+Mut: 'mut';
+Namespace: 'namespace';
+Null: 'null';
+Out: 'out';
+Override: 'override';
+Private: 'private';
+Protected: 'protected';
+Public: 'public';
+Ref: 'ref';
+Return: 'return';
+Static: 'static';
+Struct: 'struct';
+Switch: 'switch';
+This: 'this';
+Throw: 'throw';
+True: 'true';
+Try: 'try';
+Typeof: 'typeof';
+Using: 'using';
+Version: 'version';
+Virtual: 'virtual';
+While: 'while';
+
+/*
+## Punctuators, operators, etc.
+The punctuation-style tokens that we accept:
+*/
+Osquare: '[';
+Csquare: ']';
+Oparen: '(';
+Cparen: ')';
+Obrace: '{';
+Cbrace: '}';
+Comma: ',';
+Bang: '!';
+Mod: '%';
+ModAssign: '%=';
+BitAnd: '&';
+BitAndAssign: '&=';
+LogicAnd: '&&';
+Times: '*';
+TimesAssign: '*=';
+Plus: '+';
+PlusAssign: '+=';
+Minus: '-';
+MinusAssign: '-=';
+Divide: '/';
+DivideAssign: '/=';
+Assign: '=';
+Eq: '==';
+Attr: '@';
+Xor: '^';
+XorAssign: '^=';
+BitOr: '|';
+BitOrAssign: '|=';
+LogicOr: '||';
+LogicOrAssign: '||=';
+BitNot: '~';
+BitNotAssign: '~=';
+Lambda: '=>';
+Arrow: '->';
+Child: ':';
+Dollar: '$';
+EOS: '.';
+
+/*
+
+## Literals
+### Numeric literals
+*/
+
+NumberLiteral: NumberLiteralDigits NumberTypeSuffix?;
+fragment NumberTypeSuffix: 'U' | 'L' | 'UL' | 'f';
+fragment NumberLiteralDigits: BinLiteral | OctLiteral | DecLiteral | HexLiteral;
+fragment BinLiteral: '0b' BinaryDigit (BinaryDigit|'_')* ('.' BinaryDigit (BinaryDigit|'_')*)?;
+fragment BinaryDigit: '0' | '1';
+fragment OctLiteral: '0o' OctDigit (OctDigit|'_')* ('.' OctDigit (OctDigit|'_')*)?;
+fragment OctDigit: [0-7];
+fragment DecLiteral: DecDigit (DecDigit|'_')* ('.' DecDigit (DecDigit|'_')*)?;
+fragment DecDigit: [0-9];
+fragment HexLiteral: '0x' HexDigit (HexDigit|'_')* ('.' HexDigit (HexDigit|'_')*)?;
+fragment HexDigit: [0-9a-fA-F];
+
+/*
+
+Petal supports literals in base 2, 8, 10, and 16. Integer examples:
+
+```petal
+let binary = 0b1111_0000.
+let octal = 0o744.
+let decimal = 90210.
+let hex = 0xF00D.
+```
+
+Underscores are visual separators only; they are ignored when the literal's value is computed.
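Once a literal has been validated, computing its value is straightforward: read the base from the prefix, skip underscores, and accumulate digits. A D sketch, assuming an already-validated integer literal with no fractional part or type suffix and ignoring overflow; `integerLiteralValue` is a hypothetical name.

```d
// Sketch: compute the value of a validated integer literal, ignoring `_`.
// Assumptions: no fractional part, no type suffix, no overflow checking.
// Try it with: rdmd -unittest -main thisfile.d
ulong integerLiteralValue(string lit)
{
    uint radix = 10;
    if (lit.length > 2 && lit[0] == '0')
    {
        if (lit[1] == 'b') radix = 2;
        else if (lit[1] == 'o') radix = 8;
        else if (lit[1] == 'x') radix = 16;
        if (radix != 10)
            lit = lit[2 .. $]; // drop the base prefix
    }
    ulong value = 0;
    foreach (char c; lit)
    {
        if (c == '_')
            continue; // separators carry no meaning
        uint digit = c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;
        value = value * radix + digit;
    }
    return value;
}

unittest
{
    assert(integerLiteralValue("90210") == 90_210);
}
```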
+
+### String literals
+*/
+String: TripleString | SingleString;
+TripleString: '"""' .*? '"""';
+SingleString: '"' (NormalChar | EscapeChar)* '"';
+fragment NormalChar: ~('\\' | '"');
+fragment EscapeChar
+    : '\\u' HexDigit HexDigit HexDigit HexDigit
+    | '\\0'
+    | '\\\\'
+    | '\\"'
+    | '\\r'
+    | '\\t'
+    | '\\n'
+    | '\\$'
+    ;
+
+/*
+
+String literals work much as they do in many other languages. As in C#, `SingleString` supports string
+interpolation, which embeds expressions in a string and inserts their values.
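The escape sequences in `EscapeChar` can be decoded in a single pass. This D sketch handles exactly those sequences; since the interpolation syntax is not specified yet, it treats `\$` as producing a literal `$` and performs no interpolation. `decodeEscapes` is a hypothetical name.

```d
// Sketch: decode the escape sequences listed in the EscapeChar rule.
// Interpolation is not modeled, so `\$` just yields a literal `$`.
// Try it with: rdmd -unittest -main thisfile.d
import std.ascii : isHexDigit;
import std.exception : enforce;
import std.utf : encode;

/// Decodes the text between the quotes of a SingleString.
string decodeEscapes(string src)
{
    char[] result;
    for (size_t i = 0; i < src.length; i++)
    {
        if (src[i] != '\\')
        {
            result ~= src[i]; // ordinary UTF-8 code unit, copied as-is
            continue;
        }
        enforce(i + 1 < src.length, "dangling backslash");
        immutable char e = src[++i];
        switch (e)
        {
            case '0': result ~= '\0'; break;
            case '\\': result ~= '\\'; break;
            case '"': result ~= '"'; break;
            case 'r': result ~= '\r'; break;
            case 't': result ~= '\t'; break;
            case 'n': result ~= '\n'; break;
            case '$': result ~= '$'; break;
            case 'u':
            {
                enforce(i + 4 < src.length, "truncated \\u escape");
                uint cp = 0;
                foreach (h; src[i + 1 .. i + 5])
                {
                    enforce(isHexDigit(h), "bad hex digit in \\u escape");
                    cp = cp * 16 + (h <= '9' ? h - '0' : (h | 0x20) - 'a' + 10);
                }
                encode(result, cast(dchar) cp); // append the code point as UTF-8
                i += 4;
                break;
            }
            default:
                throw new Exception("unknown escape sequence");
        }
    }
    return result.idup;
}

unittest
{
    assert(decodeEscapes(`line\nbreak`) == "line\nbreak");
}
```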
+
+*/
+
+fragment UnicodeClassL: [a-zA-Z];
+fragment UnicodeClassM: [_];
+fragment UnicodeClassN: [0-9];

openquestions.md (+110 -0)

@@ -0,0 +1,110 @@
+# Open questions
+* How can we manage a shared system that allows thread-local GC, or is that even a goal?
+* How do we want to manage foreign function interfaces?
+
+## Should we include builtin rational numbers?
+Rational numbers are really nice for some things. If you're using rational numbers for time, for
+instance, you can represent times from 1/2^63 seconds to 2^63 seconds exactly. You can represent a
+third of a second. Or a fifth.
+
+Unfortunately, you have two dimensions of overflow (numerator and denominator), not just one.
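To illustrate the two dimensions of overflow, here is a D sketch of 64-bit rational addition using the checked-arithmetic helpers in `core.checkedint`. `Rational`, `normalized`, and `add` are hypothetical names, and the `long.min` edge case is ignored. Either the numerator or the denominator can overflow independently, even when the mathematical value is small.

```d
// Sketch: 64-bit rationals can overflow in the numerator or the denominator.
// Rational, normalized and add are hypothetical; long.min is ignored.
// Try it with: rdmd -unittest -main thisfile.d
import core.checkedint : adds, muls;
import std.exception : assertThrown, enforce;
import std.numeric : gcd;

struct Rational
{
    long num;
    long den; // kept positive
}

Rational normalized(long num, long den)
{
    immutable long g = gcd(num < 0 ? -num : num, den);
    return Rational(num / g, den / g);
}

/// a/b + c/d = (a*d + c*b) / (b*d), with every step overflow-checked.
Rational add(Rational x, Rational y)
{
    bool overflow = false; // sticky: stays true once any step overflows
    immutable long num = adds(muls(x.num, y.den, overflow),
                              muls(y.num, x.den, overflow), overflow);
    immutable long den = muls(x.den, y.den, overflow);
    enforce(!overflow, "rational overflow");
    return normalized(num, den);
}

unittest
{
    assert(add(Rational(1, 3), Rational(1, 5)) == Rational(8, 15));
}
```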
+
+# Answered questions
+## Do we need a character type?
+No. A character type is misleading. An 8-bit one makes you think that 'é' should fit in 8 bits. A
+32-bit one makes you think that 'é' written as `e` plus U+0301 should be one character, just like the precomposed U+00E9 'é'.
+Either way, it makes you think you should be able to get the character at position *n* in constant
+time, and that's just not possible.
+
+Instead, `string` has explicit functions for each view, but mostly works on a code unit basis:
+
+* `string.bytes` is a `uint8[]` slice of the same memory.
+* `string[start -> end]` is a shorthand for `string(string.bytes[start -> end])`.
+* `string.byCluster` iterates through grapheme clusters with type `string`. This is the default
+  iteration method.
+* `string.byCodepoint` iterates through codepoints with type `uint32`.
+* `string.byCodeUnit` iterates through code units with type `uint8`.
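D, which petalc is written in, already offers close analogues of these views, so a short sketch can show how far apart they are for a single visible character. The mapping to Phobos (`representation`, `byCodeUnit`, `byDchar`, `byGrapheme`) is an analogy, not a statement of how Petal will implement them; `viewLengths` is a hypothetical helper.

```d
// Sketch: the proposed views of a Petal `string`, using their closest
// Phobos counterparts in D. The mapping is only an analogy.
// Try it with: rdmd -unittest -main thisfile.d
import std.range.primitives : walkLength;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

/// Length of `s` as code units, code points, and grapheme clusters.
size_t[3] viewLengths(string s)
{
    size_t[3] lengths;
    lengths[0] = s.byCodeUnit.walkLength; // Petal: string.byCodeUnit
    lengths[1] = s.byDchar.walkLength;    // Petal: string.byCodepoint
    lengths[2] = s.byGrapheme.walkLength; // Petal: string.byCluster
    return lengths;
}

unittest
{
    // 'é' as `e` followed by U+0301 COMBINING ACUTE ACCENT.
    string decomposed = "e\u0301";
    assert(decomposed.representation.length == 3); // Petal: string.bytes
    assert(viewLengths(decomposed)[2] == 1);
}
```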
+
+## Will we want multiple builtin string types?
+No. The runtime will pretend that the entire world uses UTF-8. The standard library will include an
+encoding module to support other encodings.
+
+## Will we have a separation between runtime and standard library, like D?
+No. We want to share code (such as Unicode data) between the standard library and the runtime.
+
+## Do we want unions?
+No. Unions are an advanced, unsafe feature. They let you do some useful
+things, but on the whole, we want to leave union-like behavior to a compiler-provided `any` type.
+
+## Do we need null?
+No. We will use a library-provided type. Something like:
+
+```petal
+struct Maybe(T)
+{
+  null { value = Unit(). }
+  set T v { value = v. }
+  get -> T { assert value !is Unit. return value as T. }
+  private: let any value = Unit().
+}
+```
+
+I suspect that, with a little practice, the need to use `Maybe` will be minimal in many
+applications.
+
+## Do we want builtin typedefs?
+We will have aliases. What about a typedef? A typedef is an exact duplicate of a type. It's a type
+that differs from the original type only in name. The two are typically mutually convertible, but
+only with an explicit cast.
+
+These are only rarely useful. D's std.typecons.Typedef handles this by making the typedef'd type
+entirely opaque; a `Typedef!int` can't be incremented, a `Typedef!MyType` has none of the members of
+`MyType` exposed; etc. This sucks.
+
+A more thorough approach would be to treat it as if you'd copied and pasted the source code and
+swapped the names around.
+
+```petal
+class Foo base Stream
+{
+    int i.
+    this this:i.
+    next -> Foo { return Foo(i + 1). }
+    same Foo f -> bool { return this is f. }
+}
+typedef Bar = Foo.
+# expands to:
+class Bar base Stream
+{
+    int i.
+    this this:i.
+    next -> Bar { return Bar(i + 1). }
+    same Bar f -> bool { return this is f. }
+}
+```
+
+With sufficient metaprogramming, we could make this happen:
+
+```petal
+# Alternate expansion
+struct Bar
+{
+    private let Foo _self.
+    private this this:_self.
+    this int i { _self = Foo(i). }
+    next -> Bar { return Bar(_self:next). }
+    same Bar f -> bool { return _self:same(f:_self). }
+}
+```
+
+I'm not that sold on typedefs, so we're going with "no" for now.
+
+## Do we need interfaces? Multiple interfaces per type?
+Yes.
+
+Let's say we want some objects to opt into advanced formatting. For example, a `Currency` type
+should be formattable with options similar to those for floating-point numbers, so it needs to inherit
+from `Formattable`. But that shouldn't constrain its choice of base class. The same goes for `Serializable`.
+
+And it's not hard to imagine something that's both formattable and serializable.
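D has the same model being argued for here: a class extends one base class and may implement any number of interfaces. A minimal sketch with hypothetical types (`Money`, `Currency`, `Formattable`, `Serializable`) shows `Currency` opting into both capabilities without giving up its base class.

```d
// Sketch in D: one base class, any number of interfaces.
// All type names are hypothetical.
// Try it with: rdmd -unittest -main thisfile.d
import std.format : format;

interface Formattable
{
    string formatted(int decimals);
}

interface Serializable
{
    string serialize();
}

class Money { } // stands in for whatever base class Currency actually needs

class Currency : Money, Formattable, Serializable
{
    private long cents;

    this(long cents) { this.cents = cents; }

    // Formatting options similar to those for floating-point numbers.
    string formatted(int decimals)
    {
        return format("%.*f", decimals, cents / 100.0);
    }

    string serialize()
    {
        return format(`{"cents":%d}`, cents);
    }
}

unittest
{
    Formattable f = new Currency(1234);
    assert(f.formatted(2) == "12.34");
}
```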
+