Custom Proposals

You may choose to create a custom proposal if ALL of the following are true:

You have prior experience working with compilers or language design.
You understand that this will be an incredible amount of labor.
You understand that you will receive no bonus for this extra labor. People who select one of the other proposals and slap their name on it will receive the same amount of points as you will.
You understand that you will still be expected to follow the same deadlines as the rest of the class. That is, the proposal will need to be developed quickly.
You understand that the vast majority of groups who attempted this before you were unable to deliver a complete compiler. In prior versions of the class, custom proposals were the default, so more slack was given then. However, since custom proposals are no longer the default, slack will be unlikely to be given.
You understand that each one of these proposals took the professor, on average, about 2 hours to create, including time to do some quick prototyping to see how complex certain features would be to implement in conjunction with each other. Given the professor's prior experience in this area, this will likely take you much, much longer, and it's more likely that you'll need to revise the proposal later.
You're craving a CS challenge which would frighten your peers.

If all of the above are true, then congratulations, you've elected the path of pain. A template is here to get you started.

Required Components

Your language needs to have the following:

Formally-defined Syntax: Your language's syntax must be defined in Backus-Naur Form (BNF).
Statically-Typed: Your language must be statically typed, and your compiler must reject ill-typed programs. For example, Java is a statically-typed programming language, and the following Java program is ill-typed:
```
int x = 7;
String s = x; // error: incompatible types: int cannot be converted to String
```
Static typing is likely the most controversial required feature, as it means you cannot implement a typical dynamically-typed language (e.g., Python, Ruby, PHP). I'm intentionally requiring typechecking because it's useful to know how the compiler does this when debugging type errors in any language, and to give a sense of why certain limitations exist around types. That said, it is ok if some typechecking components are delayed until runtime; array bounds-checking is arguably one such delayed feature.
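To make "reject ill-typed programs" a bit more concrete, here is a minimal sketch of a typechecker that accepts `int x = 7;` and rejects `String s = x;`. Every name in it (TypecheckSketch, VarDec, TypeErrorException, and the two-type toy language) is hypothetical and chosen only for illustration; a real typechecker would need a case for every construct in your grammar.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy language with two types; a sketch, not a prescribed design.
final class TypecheckSketch {
    enum Type { INT, STRING }

    // Expressions: integer literals, string literals, and variables.
    interface Exp {}
    static final class IntLit implements Exp {
        final int value;
        IntLit(int value) { this.value = value; }
    }
    static final class StrLit implements Exp {
        final String value;
        StrLit(String value) { this.value = value; }
    }
    static final class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
    }

    // A declaration statement of the form: <declared> <name> = <init>;
    static final class VarDec {
        final Type declared;
        final String name;
        final Exp init;
        VarDec(Type declared, String name, Exp init) {
            this.declared = declared;
            this.name = name;
            this.init = init;
        }
    }

    static final class TypeErrorException extends RuntimeException {
        TypeErrorException(String message) { super(message); }
    }

    // Computes the type of an expression, given the types of variables in scope.
    static Type typeOf(Exp e, Map<String, Type> env) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof StrLit) return Type.STRING;
        if (e instanceof Var) {
            String name = ((Var) e).name;
            Type t = env.get(name);
            if (t == null) throw new TypeErrorException("undeclared variable: " + name);
            return t;
        }
        throw new TypeErrorException("unknown kind of expression");
    }

    // Checks a declaration; on success, records the new variable in env.
    static void check(VarDec dec, Map<String, Type> env) {
        Type actual = typeOf(dec.init, env);
        if (actual != dec.declared) {
            throw new TypeErrorException(
                "expected " + dec.declared + " but found " + actual + " for " + dec.name);
        }
        env.put(dec.name, dec.declared);
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        check(new VarDec(Type.INT, "x", new IntLit(7)), env);        // int x = 7;    accepted
        try {
            check(new VarDec(Type.STRING, "s", new Var("x")), env);  // String s = x; rejected
        } catch (TypeErrorException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The key design point is that all of this happens before any code is generated: the compiler walks the program, computes a type for each expression, and refuses to continue on a mismatch.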
Expressions: Your language must contain subprograms that evaluate to values. For example, 1 + 2 is an expression that evaluates to 3, as is foo("bar") + "baz". Not everything in your language needs to be an expression (e.g., int x = 7 is not an expression in Java, but rather a statement), and it's ok if most things in your language are not expressions (e.g., Prolog's built-in "is" predicate evaluates its second argument as an arithmetic expression, and it's one of very few constructs in the language that evaluate expressions at all).
Subprograms: Your language must allow the definition of entire subprograms which can be executed at desired times. For example, subroutines, functions, and methods are all different kinds of subprograms. It's only required to implement one kind of subprogram.
Control structures: Your language must allow for code to be conditionally executed. For example, if and while are both common control structures. If you want your language to be Turing-complete, you must permit either loops or recursion, if not both. If you don't want your language to be Turing-complete, you must have a very good argument why, based on how you're planning the language to be used (e.g., “it's one less thing to implement” is not a valid reason, but “it's important that programs in this language always terminate” can be legitimate). Essentially all languages have control structures; even Prolog has conditional execution, even though it lacks the usual if and while.
Abstraction of Computations: It's not a hard requirement, but it's recommended that your language allow for entire computations to be abstracted over, in a manner that allows whole computations to be treated as parameters. This may sound strange at first, but most languages allow this in some way. A listing of common approaches follows:
- Function Pointers: the memory address of a function can be passed around. This is great for low-level languages like C, and maps well to what the underlying hardware does. However, function pointers do not allow you to automatically encapsulate any sort of state with the function (e.g., pass 5 as the first parameter of the function), which is limiting.
- Objects + Methods: programmer-defined state is associated with programmer-defined functions, and the functions have direct access to the state. Individual associations are represented with objects, which can be passed around normally as values. Most languages supporting this feature specifically use a class-based approach for defining objects, though alternatives exist (e.g., JavaScript uses a prototype-based approach).
- Higher-order Functions: effectively encapsulate a particular function (like a function pointer) along with state the function needs when called. The encapsulated unit is called a closure, and closures can be passed normally as values. A key feature of higher-order functions is that they can be easily defined on the fly. For example, consider the following Scala code:
```
def addThis(x: Int): Int => Int =
  (y: Int) => x + y
```
  The addThis function takes a parameter x, and returns a function. The returned function takes a parameter y, and returns the value of x + y. Internally, when addThis is called, it creates a closure which holds a function that does x + y, along with the specific value of x. It's possible to represent something similar with objects and methods, for example, in Java:
```
public class AddThisHelper {
  private final int x;

  public AddThisHelper(final int x) {
    this.x = x;
  }

  public int doCall(final int y) {
    return x + y;
  }

  // roughly equivalent to Scala's addThis
  public static AddThisHelper addThis(final int x) {
    return new AddThisHelper(x);
  }
}
```
  However, this is clearly a lot bulkier. While higher-order functions are best known from their use in functional programming languages (e.g., Lisp, Scheme, Racket, Haskell, Scala), most modern languages support them (e.g., JavaScript, C++11, Java 8, Go).
- Typeclasses, not to be confused with object-oriented classes, associate a series of functions with particular types. This is done by first defining a typeclass (not object-oriented class!) which defines operations over some generic type, and then later defining an implementation of these operations for a specific type. For example, consider the following Haskell code:
```
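{-# LANGUAGE FlexibleInstances #-}
-- String is a synonym for [Char], so this instance needs FlexibleInstances
class HasId a where
  getId :: a -> Int

instance HasId String where
  getId input_string = length input_string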
```
  The above code defines the HasId typeclass, which operates over some generic type a. This typeclass is associated with the getId function, which will take a value of type a and return an Int representing its id. We then say that the String type implements the HasId typeclass, and define getId for String to return the length of the input string. In Java parlance, the above code is very similar to the following:
```
public interface HasId {
  public int getId();
}

public class String implements HasId {
  public int getId() {
    return length();
  }
}
```
  However, the above Java code doesn't work, since String is already defined for you. As such, you cannot tack on a method to it like we have. Typeclasses, in contrast, allow us to effectively add methods to arbitrary types whenever we want. Haskell is generally the most well-known for having typeclasses, but Rust (via trait and impl) and Swift (via protocol and extension) support closely related mechanisms.
Possibly Something Extra: depending on your target language and previously-chosen features (more in the next section), you may need to implement additional features. These features can either be user-facing (e.g., an additional kind of construct that programmers can use), or internal (e.g., a special kind of optimization you perform). There are a ton of possible features out there; if you're looking for ideas, just ask!

A non-exhaustive list of example features is provided here.
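Returning to the higher-order functions item above: since Java 8 is listed among the languages that support them, it may help to see the Scala addThis example written with a Java lambda rather than the bulkier AddThisHelper class. This is a minimal sketch; the class name AddThisLambda is made up for illustration, and java.util.function.Function is the standard library interface used to stand in for Scala's Int => Int.

```java
import java.util.function.Function;

// Hypothetical class name; shows a Java 8+ closure equivalent to Scala's addThis.
final class AddThisLambda {
    // The returned lambda closes over x, so x travels along with the function.
    static Function<Integer, Integer> addThis(final int x) {
        return y -> x + y;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> addFive = addThis(5);
        System.out.println(addFive.apply(3)); // prints 8
    }
}
```

Compared with AddThisHelper, the captured state (x) and the call method (apply) are supplied by the language rather than written out by hand, which is roughly the work your compiler takes on if your language offers closures.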

Target Language

So far the discussion has focused on the input to your compiler, namely, programs written in a language of your design. In this section, we discuss the output of your compiler, namely, a (hopefully!) equivalent program written in a (possibly) different language. Details follow.

First of all, you have complete control over your target language. You can compile all the way to assembly if you'd like. You can compile to a low-level representation above assembly, such as LLVM bitcode (which itself compiles to assembly), or Java Virtual Machine (JVM) bytecode (which runs on the JVM, like Java). You can even compile to another high-level language, like Java, C, or JavaScript (often called transpilation).

The fact that I do not require you to compile to assembly may seem strange, given that much content out there is about compiling to assembly. The primary reason is that I want this class to focus on modern compilers, and compiling directly to assembly is nowadays somewhat rare. If assembly is ultimately desired, it's common practice to target LLVM instead, which itself can compile to assembly. Clang (a C compiler), Rust, and Swift all compile to LLVM bitcode. LLVM is generally a simpler target than assembly, it will automatically do certain optimizations for you, and it will almost certainly generate more efficient assembly than you would by hand. LLVM can be customized to do language-specific optimizations, and LLVM can compile to a wide variety of assembly languages. As such, from a modern compiler standpoint, LLVM is almost certainly better than a custom approach. The only downside to selecting LLVM, from an educational standpoint, is that you won't have as good an understanding of how a specific target machine works (but you can still select assembly if that's important to you).

Similarly, the fact that I allow you to compile to a high-level language may seem strange. This is permitted because there has been an influx of languages and compilers that do exactly this. For example, Mercury compiles to C; TypeScript, CoffeeScript, and Dart all compile to JavaScript; and a variety of other languages can be compiled to JavaScript thanks to special compilers (e.g., Emscripten for C, Scala.js for Scala, and ClojureScript for Clojure). (JavaScript is a popular target, given its ability to run natively in web browsers.) As such, compiling to a high-level language is not very strange at all. Moreover, this can simplify a number of basic things (many things become trivial), allowing you to experiment with more complex language features.

While you can choose any target language you want, your choice will have a dramatic impact on development, and will largely determine how difficult it will be to implement different parts of your language. For example, if your target language is JavaScript, implementing features that JavaScript already has (e.g., expressions, higher-order functions) will be a relatively trivial, likely one-to-one mapping. Conversely, if your target language is assembly, implementing even basic expressions is non-trivial.
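The difference can be sketched with a toy code generator. The example below compiles the same addition expression to two targets: JavaScript source, where each node maps directly onto JavaScript's own expression syntax, and a made-up stack-machine instruction list standing in for a low-level target. All names (CodegenSketch, Num, Add, and the "push"/"add" instructions) are hypothetical, and the stack machine is a deliberate simplification: real assembly would additionally involve registers, calling conventions, and memory layout, none of which is shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy compiler back ends; a sketch under simplifying assumptions.
final class CodegenSketch {
    // Expression AST: integer literals and binary addition.
    interface Exp {}
    static final class Num implements Exp {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class Add implements Exp {
        final Exp left;
        final Exp right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
    }

    // High-level target: an expression becomes a JavaScript expression, one-to-one.
    static String toJavaScript(Exp e) {
        if (e instanceof Num) return Integer.toString(((Num) e).value);
        Add a = (Add) e;
        return "(" + toJavaScript(a.left) + " + " + toJavaScript(a.right) + ")";
    }

    // Low-level target (made-up stack machine): the nested expression must be
    // flattened into an ordered sequence of instructions.
    static void toStackCode(Exp e, List<String> out) {
        if (e instanceof Num) {
            out.add("push " + ((Num) e).value);
            return;
        }
        Add a = (Add) e;
        toStackCode(a.left, out);   // operands are evaluated first...
        toStackCode(a.right, out);
        out.add("add");             // ...then the operator consumes them
    }

    public static void main(String[] args) {
        Exp e = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(toJavaScript(e)); // prints (1 + (2 + 3))

        List<String> code = new ArrayList<>();
        toStackCode(e, code);
        System.out.println(code); // prints [push 1, push 2, push 3, add, add]
    }
}
```

Even in this tiny case, the low-level back end has to decide on evaluation order and where intermediate values live, whereas the JavaScript back end delegates both decisions to the target language.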