
Some random thoughts

dhasenan 2 years ago
commit
61b2a04a40
7 changed files with 636 additions and 0 deletions
  1. any.md (+7, -0)
  2. designing.md (+84, -0)
  3. ecosystem.md (+41, -0)
  4. organization.md (+2, -0)
  5. petal.md (+105, -0)
  6. ponderings.md (+4, -0)
  7. typesystem.md (+393, -0)

+ 7
- 0
any.md

@@ -0,0 +1,7 @@
1
+How do we implement the `any` type?
2
+
3
+I can trivially write `any` as a sort of Variant, a tuple of a runtime type info
4
+object and a `raw*`. But that doesn't mesh very well with `int[] ->
5
+const(any)[]` casts. We would have to make `any[]` into its own special type.
6
+
7
+What about `any[][]`?
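
The tension in this note can be sketched in Python rather than Petal. This is only an illustration under assumptions: `Variant` and `box_all` are hypothetical names, `type_info` stands in for the runtime type info object, and `payload` stands in for the `raw*`. The point it tries to show is that a packed integer array cannot simply be reinterpreted as an array of variants; each element has to be boxed, which is an O(n) copy.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for `any`: type info plus a pointer to the value."""
    type_info: type   # plays the role of the runtime type info object
    payload: object   # plays the role of the raw* pointing at the value


def box_all(values):
    """Build a new list with one Variant per element (an O(n) copy)."""
    return [Variant(type(v), v) for v in values]


ints = array("q", [1, 2, 3])   # packed 64-bit integers, 8 bytes per element
boxed = box_all(ints)          # a separate list of boxes, not a view of `ints`
```

Under this layout the element sizes of `int[]` and `any[]` differ, which is why a cast between them would need either a copy like `box_all` or a specially represented `any[]`.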

+ 84
- 0
designing.md

@@ -0,0 +1,84 @@
1
+How do you design a programming language?
2
+
3
+## Problem domains
4
+
5
+## Principles and theoretical backing
6
+
7
+## Programming paradigms
8
+
9
+## Inspiration languages
10
+
11
+## Look and feel
12
+* Curly brace: Java, C, Javascript
13
+    * Blocks are bookended by `{}`
14
+    * Semicolons as statement enders / delimiters common
15
+    * Also tends to be high on parentheses
16
+* Begin/end: Ruby and Lua
17
+    * Blocks start implicitly or with `begin`, end with `end`
18
+    * Statement delimiters rare
19
+* Indentation: Python, CoffeeScript
20
+    * Needs consistent indentation
21
+    * Sometimes starts blocks with `:`
22
+* Single expression: Elm, Haskell
23
+    * Usually function-oriented
24
+    * Preamble of helper declarations followed by one expression as the return
25
+      value
26
+    * Pattern matching as major flow control mechanism
27
+    * Mainly only functional languages
28
+* Parentheses: Lisp
29
+    * Everything uses parentheses.
30
+
31
+### Delimiters
32
+Delimiters make things easier on you.
33
+
34
+Let's say you're trying to make a low-delimiter language. You don't want parentheses for function
35
+calls. You don't want statement delimiters. You don't want commas separating function parameters.
36
+
37
+This means you write code like:
38
+
39
+    type1 var1
40
+    func1 var1
41
+    type2 var2
42
+
43
+The names make it clear what's happening. But without those names, you just have:
44
+
45
+    a b
46
+    c b
47
+    d e
48
+
49
+How do you parse that? Is it supposed to be one function call with five arguments? Three
50
+declarations? A mix of calls and declarations?
51
+
52
+If you add a statement delimiter, you might encounter:
53
+
54
+    a b;
55
+    c b
56
+    d e;
57
+
58
+So the first is either a function call or a declaration, and the second is a function call with
59
+three arguments.
60
+
61
+Or let's say you're using Dart-style lists. What does this mean?
62
+
63
+    print myList [12]
64
+
65
+Is it printing `myList` and then `[12]`? Or is it printing element 12 of `myList`?
66
+
67
+Adding a required delimiter disambiguates:
68
+
69
+    # prints two lists
70
+    print myList; [12]
71
+    # just prints one element of myList
72
+    print myList [12]
73
+
74
+Otherwise, you have to be very careful to make your language unambiguous. For instance, you might
75
+use a different syntax for indexing:
76
+
77
+    var myList = [1, 2, 3]
78
+    # Unambiguous indexing
79
+    print myList@1
80
+    # Weird spacing, but two lists
81
+    print myList[1]
82
+
83
+## Personal value features
84
+What bothers you about existing languages? Fix that.

+ 41
- 0
ecosystem.md

@@ -0,0 +1,41 @@
1
+# Petal ecosystem
2
+## Standard library
3
+This packages together the runtime and the core libraries. I want to be able to do the bulk of what
4
+I currently tend to do in D. That means:
5
+* collection types
6
+* collection-oriented things (map/reduce etc) (and string equivalents)
7
+* database interface
8
+* dates and times
9
+* files
10
+* filesystem
11
+* http client
12
+* json
13
+* logging
14
+* process handling
15
+* random numbers
16
+* regular expressions (just bind libpcre?)
17
+* sockets
18
+* string encoding
19
+* string formatting
20
+* threading
21
+* uuids
22
+* xml
23
+
24
+## Package management
25
+Use maven as the backend.
26
+
27
+You depend on specific versions of packages. You can override your dependencies' dependencies.
28
+
29
+So I have libawesome-2.0.5 that depends on libx11-7.8.14. You have libbetter-5.0.1 that depends on
30
+libawesome-2.0.5 and libx11-7.9.12. I have an app that depends on libbetter-5.0.1. It gets
31
+libx11-7.9.12.
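
One possible reading of the override rule is "the declaration nearest the application wins". The Python sketch below works through the libx11 example under that assumption. The `resolve` function, the `packages` table, and the application version `1.0.0` are hypothetical and only serve the illustration; this is not a description of how any existing backend behaves.

```python
from collections import deque


def resolve(root, packages):
    """Return {name: version}, letting declarations nearer the root win."""
    resolved = {}
    queue = deque([root])
    while queue:
        name, version = queue.popleft()
        if name in resolved:
            continue  # already pinned by a package nearer the root
        resolved[name] = version
        queue.extend(packages.get((name, version), ()))
    return resolved


# (package, version) -> direct dependencies, taken from the example above.
packages = {
    ("app", "1.0.0"): [("libbetter", "5.0.1")],
    ("libbetter", "5.0.1"): [("libawesome", "2.0.5"), ("libx11", "7.9.12")],
    ("libawesome", "2.0.5"): [("libx11", "7.8.14")],
}

versions = resolve(("app", "1.0.0"), packages)
```

Because the walk is breadth-first, libbetter's own request for libx11 is seen before libawesome's, so the application ends up with libx11 7.9.12.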
32
+
33
+## Build system
34
+I want to integrate with something standard and generic. Maybe bazel?
35
+
36
+## Editor support
37
+Start with a vim syntax file. Go for a language server as soon as reasonably practical. I think LSP
38
+supports syntax highlighting.
39
+
40
+## Binding generator
41
+We want a binding generator for native libraries. At the very least, support GIR.

+ 2
- 0
organization.md

@@ -0,0 +1,2 @@
1
+# Source code organization in Petal
2
+

+ 105
- 0
petal.md

@@ -0,0 +1,105 @@
1
+# Petal
2
+
3
+## Problem domains
4
+I want a language that's good for application development. GUI and webserver
5
+style. I want it to be good for writing efficient code easily. I want it to have
6
+a garbage collector, and I have absolutely no interest in disabling the garbage
7
+collector.
8
+
9
+On the other hand, we do have to interface with native code, and part of that is
10
+making manual allocations work.
11
+
12
+## Principles and theoretical backing
13
+I don't think Petal will be a particularly theoretically bound language. It's
14
+not an ideas language.
15
+
16
+## Programming paradigms
17
+Procedural, structured, OOP, DbI, moderate functional, contracts. More
18
+functional than D, far less than a typical ML.
19
+
20
+## Inspiration languages
21
+Obviously using D as an inspiration for DbI. C# is one of the main inspirations
22
+for OOP / structured / procedural code. I want to use Smalltalk-style message-passing, kind of, but
23
+it's probably more natural to use more traditional function calls.
24
+
25
+I want to see how far I can push purity without sacrificing performance.
26
+
27
+## Look and feel
28
+I'm going to start out with D-style syntax. It's worked out a number of kinks. Later I'll look at
29
+syntax again and see if I can make something I like more. Mainly I want syntax that I know basically
30
+works.
31
+
32
+I *want* something like:
33
+
34
+```
35
+# comment
36
+let int a!           # auto-initializes a
37
+writeln a.
38
+let b = a:to float.  # cast
39
+let Object c!        # calls default ctor
40
+
41
+class MyClass
42
+{
43
+    @Annotation
44
+    unit foo
45
+    {
46
+        writeln "this is the foo method".
47
+    }
48
+
49
+    int sum(int a, int b)
50
+    {
51
+        return a + b.
52
+    }
53
+}
54
+```
55
+
56
+## Personal value features
57
+* Local variables mutable by default
58
+* Function params const by default
59
+* Modulus should obey: `(a % b) == ((a % b) + b) % b`
60
+    * The sign of the result matches the sign of the divisor
61
+    * Like in Python
62
+* Attributes
63
+    * All attributes are *values*.
64
+
65
+## Metaprogramming
66
+I want easy metaprogramming. This means:
67
+* Analyzing types with code should be straightforward
68
+    * Like: `typeof(foo).meta.methods[0].name` -- object model, not weird pseudofunctions
69
+    * Should be able to use normal code to look at it
70
+    * Some convenience method for checking for a function with a given signature:
71
+      `type.hasFunction(int foo(string, string))`
72
+* Defining types with code should be straightforward
73
+    * String mixins are the most flexible way...
74
+    * This seems to be crying out for a template engine, but let's watch for patterns.
75
+* Inserting code should be straightforward
76
+* Most everything you do at runtime should be available at compiletime
77
+
78
+
79
+## Foreign function interface
80
+How easily do I want to be able to interface with C and C++? Are there other languages I want to
81
+interface with?
82
+
83
+Well, I'm just one person. So I want the interface to be pretty simple, at least for C. I think
84
+there's a lot less C++ code I want to interact with. I do want GObject to be straightforward to work
85
+with, but probably no special support for it.
86
+
87
+## Compiler
88
+Parser options:
89
+* pegged
90
+* antlr
91
+* flex / bison
92
+
93
+Pegged is slow and memory-hungry. Antlr has a terrible API for adding fields to your AST nodes, so
94
+I'd need to create a semantic tree from the parse tree.
95
+
96
+Backend options:
97
+* llvm
98
+* c
99
+* some pre-existing virtual machine / bytecode
100
+
101
+We'll probably need to use a bytecode thingy for running code at compile-time, sandboxed.
102
+
103
+Testing:
104
+* use fuzzer
105
+* hand-rolled test cases

+ 4
- 0
ponderings.md

@@ -0,0 +1,4 @@
1
+# Uniformity
2
+I want things to be relatively uniform. This means:
3
+* APIs for things should be as uniform as possible. Ranges are a good example of this.
4
+*

+ 393
- 0
typesystem.md

@@ -0,0 +1,393 @@
1
+# Petal's type system
2
+
3
+A type system must be unsound (it accepts incorrect programs), incomplete (it rejects correct
4
+programs), or undecidable (some programs can't be analyzed). Petal would like a sound type system
5
+and accepts being incomplete.
6
+
7
+The type system operates at compile-time insofar as possible. If something could just as easily be a
8
+compile-time error as a runtime error, it should be a compile-time error.
9
+
10
+## Use of uninitialized variables
11
+Using uninitialized variables is not allowed.
12
+
13
+## Special types
14
+Petal has a few special types that are a bit abstract:
15
+* `never`
16
+* `unit`
17
+* `raw`
18
+* `any`
19
+
20
+### never
21
+`never` is a type that has no values. The main effect of this is that a function declared as
22
+returning `never` can't return a value. It must instead throw an exception, halt the program,
23
+something like that. This is useful for `panic`-type functions that log an error and then abort the
24
+program, or functions that throw an exception in every code path, or possibly for functions that
25
+loop infinitely.
26
+
27
+Other than that, you can use this like a normal type. You can't ever initialize a variable with a
28
+value of type `never`, and you can't use uninitialized variables, so this doesn't blow up in your
29
+face. This property propagates to any type that has a field of type `never`. The following struct
30
+can't be built, for instance:
31
+
32
+```
33
+struct HasNever
34
+{
35
+    never a
36
+}
37
+```
38
+
39
+`never` is not a bottom type from type theory. It does not implicitly convert to anything, and
40
+nothing implicitly converts to it. It is an empty type, and it's sometimes called the diverging
41
+type.
42
+
43
+The `never` type has size 0.
44
+
45
+### unit
46
+`unit` is a type that has only one value, named `unit()`.
47
+
48
+Every function returns a value. A function that would be declared returning `void` in Java or C
49
+instead returns `unit`.
50
+
51
+The `unit` type has size 0.
52
+
53
+### raw
54
+`raw` refers to raw memory. It's only allowed in unsafe code. Apart from that, it represents an opaque
55
+byte (`uint8` that supports no operations aside from equality). `raw&` is a reference to bytes that
56
+could represent anything; `raw[]` is some memory that might contain anything.
57
+
58
+### any
59
+`any` is anything. Any object, any struct, anything. It's the root of the type hierarchy.
60
+
61
+## Builtin types
62
+### Numeric types
63
+* `int`
64
+* `int8`
65
+* `uint8`
66
+* `int16`
67
+* `uint16`
68
+* `int32`
69
+* `uint32`
70
+* `int64`
71
+* `uint64`
72
+* `int128`
73
+* `uint128`
74
+* `float32`
75
+* `float64`
76
+
77
+The trailing number is the number of bits in the type. The prefix indicates the general type: signed
78
+integer `intX`, unsigned integer `uintX`, IEEE floating point `floatX`.
79
+
80
+`int` is an alias for `int64`. While most numbers are small, a 32-bit integer is a little small for
81
+a number of real-world contexts like file length.
82
+
83
+### Strings and characters
84
+Petal supports UTF-8 / UTF-16 / UTF-32 code units:
85
+* `char8`
86
+* `char16`
87
+* `char32`
88
+
89
+These are low-level tools for low-level code. `char8` is 8 bits; `char16` is 16 bits; `char32` is 32
90
+bits.
91
+
92
+On the higher-level side, it has:
93
+* `string`
94
+* `char`
95
+
96
+`string` is a UTF-8 encoded string. `char` represents a **grapheme cluster**. A grapheme cluster
97
+might be a single code unit, like `a` (U+0061); it might be a single character with multiple code
98
+units, like `☃` (U+2603); or it might be a character with combining marks, like `é̄` (U+0065 U+0301
99
+U+0304).
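
The difference between code units, code points, and grapheme clusters can be made concrete with a small Python sketch. Python's standard library has no grapheme segmentation, so this only counts code points and code units; `describe` and `samples` are hypothetical names, and the third sample is written as `e` followed by two combining marks (here U+0301 and U+0304) as an illustrative choice.

```python
import unicodedata

samples = {
    "a": "a",
    "snowman": "\u2603",
    "e with two marks": "e\u0301\u0304",
}


def describe(text):
    """Return (code points, UTF-8 code units, UTF-16 code units) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")) // 2,
    )


def combining_marks(text):
    """Count the code points that are combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)
```

Each sample would be a single `char` in the sense described above, yet the three occupy 1, 3, and 5 UTF-8 code units respectively.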
100
+
101
+Internally, `string` is stored as `char8[]`. To access this underlying array, use the
102
+`representation` property.
103
+
104
+### Boolean
105
+Petal supports the boolean type, `bool`. It has two possible values: `true` and `false`. It is a
106
+purely boolean type, not supporting arithmetic.
107
+
108
+## Derived types
109
+There are a number of derived types that can be constructed of other types:
110
+
111
+* tuples
112
+* aliases
113
+* functions
114
+* references
115
+* arrays
116
+* dicts
117
+
118
+### Aliases
119
+An alias is just another name for a type.
120
+
121
+In fact, an alias can be another name for anything nameable. The compiler will try to report that
122
+alias instead of the thing's true name when you access it through an alias.
123
+
124
+As an example:
125
+
126
+```
127
+alias wchar char16.
128
+```
129
+
130
+### Tuples
131
+A tuple is an ordered collection of values of varying types:
132
+
133
+```
134
+(int32, uint8, string) tuple = (7451, 12, "hello world").
135
+```
136
+
137
+Tuples are implicitly flattened:
138
+
139
+```
140
+alias Record (int32, string).
141
+alias NamedRecord (string, Record).
142
+# Both of these options work:
143
+NamedRecord r1 = ("id1", (102, "value1")).
144
+NamedRecord r2 = ("id2", 51, "value2").
145
+```
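
Implicit flattening can be modelled in Python as a recursive function over nested tuples. This is a sketch of the idea only; `flatten` is a hypothetical helper, and Python itself does not flatten tuples.

```python
def flatten(value):
    """Flatten arbitrarily nested tuples into a single flat tuple."""
    if not isinstance(value, tuple):
        return (value,)
    flat = ()
    for item in value:
        flat += flatten(item)
    return flat


r1 = flatten(("id1", (102, "value1")))
r2 = flatten(("id2", 51, "value2"))
```

Both spellings end up with the same three-element shape, which is what lets the nested and flat initialisers be used interchangeably.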
146
+
147
+A single-item tuple is the same as its content:
148
+
149
+```
150
+alias JustAString (string).
151
+JustAString r3 = "hello".
152
+string s = r3.
153
+```
154
+
155
+Tuples can be indexed like arrays, but the index must be a compile-time constant:
156
+
157
+```
158
+NamedRecord r4 = ("id4", 12, "value3").
159
+println r4:1.  # prints "12"
160
+println r4:2.  # prints "value3"
161
+```
162
+
163
+Tuples can be implicitly expanded or created when passed to a function:
164
+
165
+```
166
+unit show(NamedRecord r)
167
+{
168
+    printfln "id: {} index: {} value: {}" r:0 r:1 r:2.
169
+}
170
+unit show2(string id, int32 index, string value)
171
+{
172
+    show(id, index, value).
173
+}
174
+```
175
+
176
+### Functions
177
+A function type holds a reference to a function. It can also include a reference to a context for
178
+that function.
179
+
180
+For instance:
181
+
182
+```
183
+int32 asciiVowels(char8[] s)
184
+{
185
+    int32 count = 0.
186
+    foreach c; s:byCodeUnit
187
+        if c in "aeiou"
188
+            count++.
189
+    return count.
190
+}
191
+func: int32 fn(char8[]) = &asciiVowels.
192
+```
193
+
194
+Every function returns a value. A function's arguments are effectively a tuple.
195
+
196
+### Arrays
197
+An array is an ordered, indexable collection of items of a given type. It offers O(1) indexing, O(n)
198
+search, amortized O(log n) append, and O(n) splicing and concatenation. This is also a slice type;
199
+two arrays may refer to the same or overlapping data.
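
The slice behaviour, where two arrays refer to overlapping data, can be imitated in Python with a `memoryview` over a packed array. This is an analogy rather than Petal's actual representation, and the variable names are illustrative.

```python
from array import array

squares = array("i", [v * v for v in range(1, 10)])  # 1, 4, 9, ..., 81
whole = memoryview(squares)   # a view over all nine elements
front = whole[0:5]            # a second view over the first five; no copy

before = front[4]             # 25, read through the slice
front[0] = 100                # a write through the slice...
after = squares[0]            # ...is visible through the original array
```

No elements are copied when `front` is created; both views index into the same buffer, which is the property the slice description relies on.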
200
+
201
+Arrays use 0-based indexing: the first item is `list[0]`, the second is `list[1]`, etc.
202
+
203
+```
204
+int32[] list!
205
+foreach v; 1..10
206
+    list ~= v * v.
207
+frontHalf = list[0:to 5].
208
+assert list[4] == 25.
209
+```
210
+
211
+Arrays can also be iterated, with or without an index:
212
+
213
+```
214
+int32[] list = [1, 4, 9, 16].
215
+foreach i, v; list
216
+    printfln "list[{}] = {}" i v.
217
+```
218
+
219
+This prints:
220
+
221
+```
222
+list[0] = 1
223
+list[1] = 4
224
+list[2] = 9
225
+list[3] = 16
226
+```
227
+
228
+Implementation note: arrays are implemented as:
229
+
230
+```
231
+struct Array
232
+{
233
+    unsafe mut T* data.
234
+    mut int64 length.
235
+    TypeInfo type.
236
+}
237
+```
238
+
239
+The `type` field may be omitted or dynamically added as appropriate.
240
+
241
+### Multidimensional arrays
242
+A multidimensional array is an array with multiple dimensions. Petal's multidimensional arrays are
243
+row-major; the first index is the row, the second is the column.
244
+
245
+A two-dimensional array is sometimes called a rectangular array. It can be used to implement a
246
+matrix. It looks like:
247
+
248
+```
249
+int32[,] rect = [
250
+    [11, 12, 13, 14],
251
+    [21, 22, 23, 24],
252
+    [31, 32, 33, 34],
253
+].
254
+# It's a two-dimensional array
255
+assert rect.lengths.length == 2.
256
+# The first length is 3, because the array is 3 high.
257
+assert rect.lengths[0] == 3.
258
+# The second length is 4, because the array is 4 wide.
259
+assert rect.lengths[1] == 4.
260
+
261
+int32 sum!
262
+foreach y, x, val; rect
263
+    sum += val.
264
+assert sum == 270.
265
+```
266
+
267
+This works identically for more than two dimensions. The compiler is guaranteed to support up to
268
+five dimensions.
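
Row-major layout means the last index varies fastest in memory. The Python sketch below shows how an index tuple would map to an offset in flat storage, assuming the elements are stored contiguously; `offset`, `flat`, and `lengths` are hypothetical names for this illustration.

```python
rect = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
]
lengths = (len(rect), len(rect[0]))           # (3, 4): rows, then columns
flat = [val for row in rect for val in row]   # row-major flat storage


def offset(index, lengths):
    """Map an N-dimensional index to a position in row-major flat storage."""
    position = 0
    for i, n in zip(index, lengths):
        position = position * n + i
    return position
```

For two dimensions this reduces to `y * width + x`, and the same loop extends to any number of dimensions without change.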
269
+
270
+
271
+### References
272
+A reference is an alias to an existing value.
273
+
274
+References are a boring example of the name-value distinction more interestingly demonstrated by Dr
275
+Charles Dodgson with a song. The song **is** _A-sitting on a Gate_, but it's called _Ways and
276
+Means_. The song's name is _The Aged Aged Man_, while the name is called _Haddocks' Eyes_.
277
+
278
+Let's look at this example similarly:
279
+
280
+```
281
+int32 x = 12.
282
+int32& y = &x.
283
+println y.  # 0x7FFDA7EAA968
284
+println $y. # 12
285
+$y = 15.
286
+println x.  # 15
287
+x = 18.
288
+println $y. # 18
289
+```
290
+
291
+The reference type that refers to an `int32` is named `int32&`. To refer to the value that reference
292
+`y` points to, you use `$y`. (This is instead of `*`, which is used in a number of languages, to
293
+reduce ambiguity. It's obvious to many readers that `println *y` should print the value that `y` points
294
+to, but it's a bit harder for the compiler to figure that out.)
295
+
296
+The value is 12. The value is called `x`. The address of the value is `0x7FFDA7EAA968`. And that
297
+address is also called `y`.
298
+
299
+
300
+## Aggregate types
301
+An aggregate type is a type with fields and methods. A field is just a variable that values of that
302
+type contain. A method is a function that implicitly takes the aggregate as its first argument.
303
+
304
+### Structs
305
+A struct is a data type passed by value. It's a series of fields that can be accessed together, and
306
+it acts as a namespace for functions that deal with those fields.
307
+
308
+A struct doesn't need to define any fields; it's valid to have a struct with no fields. This isn't
309
+usually useful, though.
310
+
311
+```
312
+struct Password
313
+{
314
+    uint8[] salt.
315
+    uint8[] hashed.
316
+    this()
317
+    {
318
+        salt = randomSalt.
319
+    }
320
+    unit set(string v) = hashed = digest v salt.
321
+    bool verify(string v) = hashed == digest v salt.
322
+}
323
+Password password!
324
+password:set "its-a-secret".
325
+println (toHex password:hashed).
326
+```
327
+
328
+A password is little more than its fields bundled together.
329
+
330
+### Classes
331
+A class is much like a struct with more power. However, it's a little less efficient.
332
+
333
+First, the inefficiency: structs are allocated inline in their context, while each class instance is
334
+a separate heap allocation. Structs are accessed like other variables in the same context, but class
335
+instances are always accessed with references.
336
+
337
+(The compiler may, as an optimization, allocate some class instances inline if it detects it's safe
338
+to do so. But this isn't reliable.)
339
+
340
+Classes can participate in inheritance. They can inherit one other class; if not otherwise
341
+specified, they inherit `Object`. They can inherit any number of interfaces.
342
+
343
+```
344
+class Person
345
+{
346
+    string name.
347
+    this(this.name).
348
+    virtual unit greet()
349
+    {
350
+        printfln "Hello {}!" name.
351
+    }
352
+}
353
+class Employee from Person
354
+{
355
+    string id.
356
+    override unit greet()
357
+    {
358
+        base:greet.
359
+        printfln "Please remember to comply with all corporate policies, {}." id.
360
+    }
361
+    this(this.name, this.id).
362
+}
363
+let employee = Employee "Anne" "TK-421".
364
+employee:greet.
365
+```
366
+
367
+Classes can be abstract. An abstract class must be marked `abstract`. An abstract class can't be
368
+instantiated directly but can be inherited from, and it can contain abstract methods. A non-abstract
369
+class that inherits from an abstract class must override all abstract methods from the base class.
370
+
371
+An abstract method may have a body but doesn't require one.
372
+
373
+### Interfaces
374
+An interface is like an abstract class, but it cannot have any fields and all its functions are
375
+implicitly abstract. It can only define function signatures. Interfaces participate in inheritance, but an
376
+interface can only inherit from another interface.
377
+
378
+### Concepts
379
+A concept is a bit like an interface. It defines a series of operations that a type must support.
380
+
381
+An aggregate may declare that it adheres to a concept.
382
+
383
+### Operator overloading
384
+User-defined types can overload operators. Please do not abuse this feature.
385
+* indexing: `index` function for retrieving a value, `setIndex` for changing a value, `range` for
386
+  slicing.
387
+* math: `add`, `subtract`, `multiply`, `divide`, `modulus`, `exponent`
388
+* bitwise: `bitAnd`, `bitOr`, `bitXor`
389
+* bitwise shifts: `rightShift`, `leftShift`, `rightRotate`, `leftRotate`
390
+* iteration: `range` with no arguments to return a range.
391
+
392
+## Mutability
393
+By default, local variables are mutable, and everything else is immutable.