You may choose to create a custom proposal if ALL of the following are true:
If all of the above are true, then congratulations, you've chosen the path of pain. A template is here to get you started.
Your language needs to have the following:
Static typing, so that code like the following is rejected at compile time:

```java
int x = 7;
String s = x; // int cannot convert to String
```

Static typing is likely the most controversial required feature, as it means you cannot implement a typical dynamically typed language (e.g., Python, Ruby, PHP). I'm intentionally requiring typechecking because it's useful to know how the compiler does this when debugging type errors in any language, and to give a sense of why certain limitations exist around types. That said, it is ok if some typechecking components are delayed until runtime; array bounds-checking is arguably one such delayed feature.
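To give a sense of what a typechecker actually does with the `int x = 7; String s = x;` example, here is a minimal sketch in Java. It assumes a hypothetical toy language with only two types and declarations whose initializer is a literal or a previously declared variable; the names `Ty`, `Decl`, `IntLit`, `StrLit`, `VarRef`, and `TypeChecker.check` are all made up for illustration. A real typechecker would also handle compound expressions, functions, and so on, but the core idea is the same: track each variable's declared type in an environment and compare it against the type of every value assigned to it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy language with just two types.
enum Ty { INT, STRING }

// Hypothetical AST: an initializer is an int literal, a string literal,
// or a reference to a previously declared variable.
interface Init {}
final class IntLit implements Init { final int value; IntLit(int value) { this.value = value; } }
final class StrLit implements Init { final String value; StrLit(String value) { this.value = value; } }
final class VarRef implements Init { final String name; VarRef(String name) { this.name = name; } }

// A declaration of the form `declared name = init;`, e.g. `int x = 7;`.
final class Decl {
    final Ty declared;
    final String name;
    final Init init;
    Decl(Ty declared, String name, Init init) {
        this.declared = declared;
        this.name = name;
        this.init = init;
    }
}

final class TypeChecker {
    // Checks declarations in order. Returns null if the program is well-typed,
    // otherwise a message describing the first type error found.
    static String check(List<Decl> program) {
        Map<String, Ty> env = new HashMap<>(); // variable name -> declared type
        for (Decl d : program) {
            Ty actual = typeOf(d.init, env);
            if (actual == null) {
                return "undeclared variable used to initialize " + d.name;
            }
            if (actual != d.declared) {
                return actual + " cannot convert to " + d.declared;
            }
            env.put(d.name, d.declared);
        }
        return null;
    }

    // Computes the type of an initializer; null means an undeclared variable.
    private static Ty typeOf(Init init, Map<String, Ty> env) {
        if (init instanceof IntLit) return Ty.INT;
        if (init instanceof StrLit) return Ty.STRING;
        return env.get(((VarRef) init).name);
    }
}
```

Checking the two declarations from the example reports `INT cannot convert to STRING` without ever running the program, while a program consisting only of `int x = 7;` checks cleanly.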
`1 + 2` is an expression that evaluates to `3`; `foo("bar") + "baz"` is also an expression.
Not everything in your language needs to be an expression (e.g., `int x = 7;` is not an expression in Java, but rather a statement), and it's ok if most things in your language are not expressions (e.g., Prolog's `is` built-in treats its second argument as an arithmetic expression, and, along with arithmetic comparisons such as `<` and `=:=`, it is one of the few constructs in the language that evaluates expressions).
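The usual way a compiler or interpreter represents expressions is as a tree that is evaluated recursively. The following is a minimal sketch in Java of such an evaluator for the two example expressions; the AST classes (`Num`, `Str`, `Add`, `Call`) and the `Evaluator` API are hypothetical, and `foo` is assumed, purely for illustration, to be a built-in that upper-cases its string argument.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical AST for a tiny expression language.
interface Expr {}

final class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
}

final class Str implements Expr {
    final String value;
    Str(String value) { this.value = value; }
}

// `left + right`: integer addition, or string concatenation otherwise.
final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

// A call to a named one-argument function, e.g. foo("bar").
final class Call implements Expr {
    final String fn;
    final Expr arg;
    Call(String fn, Expr arg) { this.fn = fn; this.arg = arg; }
}

final class Evaluator {
    private final Map<String, UnaryOperator<Object>> functions = new HashMap<>();

    // Registers a built-in function that Call nodes can refer to by name.
    void define(String name, UnaryOperator<Object> body) {
        functions.put(name, body);
    }

    // Recursively evaluates an expression to a value (an Integer or a String).
    Object eval(Expr e) {
        if (e instanceof Num) return ((Num) e).value;
        if (e instanceof Str) return ((Str) e).value;
        if (e instanceof Add) {
            Object l = eval(((Add) e).left);
            Object r = eval(((Add) e).right);
            if (l instanceof Integer && r instanceof Integer) {
                return (Integer) l + (Integer) r;
            }
            return String.valueOf(l) + r;
        }
        Call c = (Call) e;
        UnaryOperator<Object> f = functions.get(c.fn);
        if (f == null) throw new IllegalArgumentException("unknown function: " + c.fn);
        return f.apply(eval(c.arg));
    }
}
```

With `foo` defined as above, evaluating `new Add(new Num(1), new Num(2))` yields `3`, and evaluating the tree for `foo("bar") + "baz"` yields `"BARbaz"`. A compiler would walk the same tree, but emit target code for each node instead of computing a value.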
`if` and `while` are both common control structures.
If you want your language to be Turing-complete, you must permit either loops or recursion, if not both.
If you don't want your language to be Turing-complete, you must have a very good argument why, based on how you plan for the language to be used (e.g., “it's one less thing to implement” is not a valid reason, but “it's important that programs in this language always terminate” can be legitimate).
Essentially all languages have control structures; even Prolog has conditional execution, despite lacking the usual `if` and `while` statements.
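As a small illustration of the loops-or-recursion requirement, the following Java sketch (class and method names are hypothetical) computes the same sum once with a `while` loop and once with recursion. Note that the recursive version relies on an `if` to stop, which is one reason conditional execution matters even in a language without loops.

```java
// Two ways to express repetition: a `while` loop and recursion.
final class Repetition {
    // Sums 1..n with a while loop.
    static long sumLoop(int n) {
        long total = 0;
        int i = 1;
        while (i <= n) {
            total += i;
            i++;
        }
        return total;
    }

    // Sums 1..n with recursion; the `if` provides the base case.
    static long sumRec(int n) {
        if (n <= 0) {
            return 0;
        }
        return n + sumRec(n - 1);
    }
}
```

Both `Repetition.sumLoop(10)` and `Repetition.sumRec(10)` return `55`.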
`5` as the first parameter of the function), which is limiting.
```scala
def addThis(x: Int): Int => Int = (y: Int) => x + y
```

The `addThis` function takes a parameter `x` and returns a function. The returned function takes a parameter `y` and returns the value of `x + y`.
Internally, when `addThis` is called, it creates a closure, which holds a function that computes `x + y` along with the specific value of `x` passed to that call.
It's possible to represent something similar with objects and methods, for example, in Java:
```java
public class AddThisHelper {
    private final int x;

    public AddThisHelper(final int x) {
        this.x = x;
    }

    public int doCall(final int y) {
        return x + y;
    }

    // roughly equivalent to Scala's addThis
    public static AddThisHelper addThis(final int x) {
        return new AddThisHelper(x);
    }
}
```

However, this is clearly a lot bulkier. While higher-order functions are best known from their use in functional programming languages (e.g., Lisp, Scheme, Racket, Haskell, Scala), most modern languages support them (e.g., JavaScript, C++11, Java 8, Go).
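Since Java 8 is listed among the languages that support higher-order functions, here is a sketch of what Scala's `addThis` can look like with Java 8 lambdas and the standard `java.util.function` interfaces. The class name `AddThisLambda` and the field `ADD_THIS` are hypothetical; the point is that the helper class from the object-based version is no longer written by hand.

```java
import java.util.function.IntFunction;
import java.util.function.IntUnaryOperator;

final class AddThisLambda {
    // Returns a function that adds x to its argument. The lambda is a closure:
    // it captures x, which Java requires to be (effectively) final.
    static IntUnaryOperator addThis(final int x) {
        return y -> x + y;
    }

    // The same idea stored as a value: a function that returns a function.
    static final IntFunction<IntUnaryOperator> ADD_THIS = x -> y -> x + y;
}
```

For example, `AddThisLambda.addThis(3).applyAsInt(4)` returns `7`, just as `AddThisHelper.addThis(3).doCall(4)` does. Conceptually, the captured `x` plays the same role as the `x` field in `AddThisHelper`; the difference is that the compiler handles that plumbing for you.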
```haskell
{-# LANGUAGE FlexibleInstances #-}
-- String is a type synonym for [Char], so Haskell 2010 needs the
-- FlexibleInstances extension to accept `instance HasId String`.

class HasId a where
  getId :: a -> Int

instance HasId String where
  getId input_string = length input_string
```

The above code defines the `HasId` typeclass, which operates over some generic type `a`. This typeclass is associated with the `getId` function, which takes a value of type `a` and returns an `Int` representing its id. We then declare that the `String` type is an instance of the `HasId` typeclass, defining `getId` for `String` to return the length of the input string.
In Java parlance, the above code is very similar to the following:
```java
public interface HasId {
    public int getId();
}

public class String implements HasId {
    public int getId() {
        return length();
    }
}
```

However, the above Java code doesn't work, since `String` is already defined for you (as `java.lang.String`, which is also `final`). As such, you cannot tack on a method to it as we tried to here. Typeclasses, in contrast, effectively allow us to add methods to arbitrary types whenever we want.
Haskell is the best-known language with typeclasses, but Rust (via `trait` and `impl`) and Swift (via `protocol` and `extension`) also support them.
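Although Java cannot add `getId` to `String` directly, typeclass-like behavior can be approximated by keeping the "instance" in a separate object and passing it around explicitly. This technique, usually called dictionary passing, is roughly how typeclasses are commonly compiled. The sketch below is one possible encoding, not a standard Java API: `HasId<A>`, `STRING_HAS_ID`, and `sumIds` are hypothetical names, and `sumIds` stands in for a Haskell function with a `HasId a =>` constraint.

```java
import java.util.List;

// Hypothetical encoding of the HasId typeclass: an instance describes how to
// get an id for values of type A, without modifying A itself.
interface HasId<A> {
    int getId(A value);
}

final class Instances {
    // Analogous to `instance HasId String`; java.lang.String is left untouched.
    static final HasId<String> STRING_HAS_ID = s -> s.length();

    // Analogous to a function with a `HasId a =>` constraint: the instance
    // (the "dictionary") is passed explicitly as an extra argument.
    static <A> int sumIds(List<A> values, HasId<A> dict) {
        int total = 0;
        for (A v : values) {
            total += dict.getId(v);
        }
        return total;
    }
}
```

Here `Instances.STRING_HAS_ID.getId("hello")` returns `5`. In Haskell, the compiler finds and passes the right instance automatically; in this Java encoding, the caller has to supply it by hand.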
A non-exhaustive list of example features is provided here.
So far the discussion has focused on the input to your compiler, namely, programs written in a language of your design. In this section, we discuss the output of your compiler, namely, a (hopefully!) equivalent program written in a (possibly) different language. Details follow.
First of all, you have complete control over your target language. You can compile all the way to assembly if you'd like. You can compile to a low-level representation above assembly, such as LLVM bitcode (which itself compiles to assembly), or Java Virtual Machine (JVM) bytecode (which runs on the JVM, as compiled Java does). You can even compile to another high-level language, like Java, C, or JavaScript (a process often called transpilation).
The fact that I do not require you to compile to assembly may seem strange, given that much of the material out there is about compiling to assembly. The primary reason is that I want this class to focus on modern compilers, and compiling directly to assembly is less common nowadays. If assembly is ultimately desired, it's common practice to target LLVM instead, which can itself compile to assembly. Clang (a C compiler) and the standard Rust and Swift compilers all compile via LLVM. LLVM is generally a simpler target than assembly: it performs many optimizations for you, and it will almost certainly generate more efficient assembly than you would by hand. LLVM can also be customized to perform language-specific optimizations, and it can compile to a wide variety of assembly languages. As such, from a modern compiler standpoint, targeting LLVM is almost certainly better than a custom approach. The only downside to selecting LLVM, from an educational standpoint, is that you won't gain as good an understanding of how a specific target machine works (but you can still select assembly if that's important to you).
Similarly, the fact that I allow you to compile to a high-level language may seem strange. This is permitted because there has been an influx of languages and compilers that do exactly this. For example, Mercury compiles to C; TypeScript, CoffeeScript, and Dart all compile to JavaScript; and a variety of other languages can be compiled to JavaScript thanks to special compilers (e.g., Emscripten for C, Scala.js for Scala, and ClojureScript for Clojure). (JavaScript is a popular target given its ability to run natively in web browsers.) As such, compiling to a high-level language is not very strange at all. Moreover, it can make many basic features trivial to implement (e.g., memory management comes for free if the target language is garbage-collected), allowing you to experiment with more complex language features.
While you can choose any target language you want, your choice will have a dramatic impact on development, and will largely determine how difficult it is to implement different parts of your language. For example, if your target language is JavaScript, implementing features that JavaScript already has (e.g., expressions, higher-order functions) will likely be a relatively trivial, one-to-one mapping. Conversely, if your target language is assembly, implementing even basic expressions is non-trivial, since you must decide where every intermediate value lives (e.g., in which register or stack slot).
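To make the contrast concrete, here is a minimal Java sketch of two code generators for the same tiny expression tree. The AST classes `Lit` and `Plus` and the `Backends` methods are hypothetical. `toJs` targets JavaScript source and is almost a direct translation of the tree. `toStack` targets a made-up stack-machine instruction set, loosely resembling stack-based targets such as JVM bytecode and standing in for a low-level target. Even there, the compiler must explicitly sequence the operations and decide where intermediate values go; real assembly adds register allocation and calling conventions on top of that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST: integer literals and addition.
interface Node {}

final class Lit implements Node {
    final int value;
    Lit(int value) { this.value = value; }
}

final class Plus implements Node {
    final Node left, right;
    Plus(Node left, Node right) { this.left = left; this.right = right; }
}

final class Backends {
    // High-level target (JavaScript source): nearly one-to-one with the AST.
    static String toJs(Node n) {
        if (n instanceof Lit) return Integer.toString(((Lit) n).value);
        Plus p = (Plus) n;
        return "(" + toJs(p.left) + " + " + toJs(p.right) + ")";
    }

    // Low-level target (made-up stack machine): operands must be placed
    // on the stack in the right order before each operation.
    static List<String> toStack(Node n) {
        List<String> out = new ArrayList<>();
        emit(n, out);
        return out;
    }

    private static void emit(Node n, List<String> out) {
        if (n instanceof Lit) {
            out.add("push " + ((Lit) n).value);
            return;
        }
        Plus p = (Plus) n;
        emit(p.left, out);  // leaves the left operand on the stack
        emit(p.right, out); // leaves the right operand on the stack
        out.add("add");     // pops both operands and pushes their sum
    }
}
```

For `1 + 2`, `toJs` produces `(1 + 2)`, while `toStack` produces `push 1`, `push 2`, `add`. For `1 + (2 + 3)`, the stack version becomes `push 1`, `push 2`, `push 3`, `add`, `add`, which shows how the low-level output drifts further from the source structure as expressions grow.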