Project Features
A survey of possible features is below, separated into likely major and minor features.
It's difficult to say what counts as a major or minor feature, as this depends heavily on:
-
The other features chosen in your language.
For example, type inference generally becomes much more complex if subtyping is also in your language.
Similarly, the choice of some features implies the choice of others; for example, inheritance is nearly useless without method overriding and subtyping.
-
The target language of your language.
If your target language already supports a given feature, then it will likely be easier to implement, at least for your code generator.
For example, inheritance, virtual dispatch, classes, and objects are pretty complex, but if your target language already supports them, then one-to-one translations are possible.
This might be true even if you have a low-level machine target; JVM bytecode already supports these features, so most of the work is done for you.
-
Specifically how you compile your code.
For example, let's say you have higher-order functions in your language, and you're translating to JavaScript.
JavaScript already supports higher-order functions, so a simple translation is possible.
However, you might instead choose to translate to JavaScript which does not use higher-order functions, making the translation more complex.
-
How close your language is to one of the example languages.
If all your chosen features come from example languages we use, especially if they are all from the same language, this is more likely to make a feature be minor.
For example, theoretically one could propose a language that has the same syntax and features as one of the example languages.
You wouldn't be able to count such features as contributions, because it'd be possible to literally copy/paste the entire example as your own.
Generally, the more you distance yourself from an example language, the less you'll need to have in your language.
When you submit your language design proposal, I'll evaluate the features you selected, and identify which I think are the major features, and which I think are the minor features.
If you select too many, I'll suggest certain features to remove.
If you select too few, I'll suggest ways to make a feature more complex than described, or will suggest features to add.
The initial proposal is a best guess, and we'll refine from there.
You can submit the initial proposal as many times as necessary to eventually get full credit on it.
I have some features described below, organized by whether or not it's likely a major or a minor feature.
This is not an exhaustive list by far; if you want to do something that's not listed here, just chat with me (or put it on the proposal), and I'll provide feedback on it.
Likely Minor Features
-
Built-in datatypes, and operations on those data types (e.g.,
int
, double
, bool
, etc.).
These will be trivial (but likely practically necessary) for most target languages.
In most cases, these are so trivial that they won't count towards the number of features you have.
-
Mutable state; that is, variables can be updated to later hold different values.
Common in many languages, but strongly discouraged or even completely absent in functional and logical languages.
In most cases, this is so trivial that they won't count towards the number of features you have.
-
Loops (e.g.,
for
, while
, etc.).
Like mutable state, common in many languages, except for functional and logical languages (which instead use recursion for iteration).
To be useful, loops require mutable state to be present.
Loop-related features like break
and continue
may or may not be present.
In most cases, this is so trivial that they won't count towards the number of features you have, unless you're targeting a low-level language.
-
Conditionals (e.g.,
if
).
Seen in most langauages, though often merely implied in logical languages (e.g., Prolog).
The most common way to perform conditional execution.
In most cases, this is so trivial that they won't count towards the number of features you have, unless you're targeting a low-level language.
-
Tuples; that is, a data structure that holds other pieces of data.
Most modern languages have direct support for these.
Depending on your target language, these could range between trivial and major.
-
Structures (e.g.,
struct
in C).
These are like tuples, except each piece of data contained within has a user-defined name associated with it.
Depending on your target language, these could range between trivial and major.
-
Pointers.
C, C++, and Rust are some of the only languages with true pointers; most languages use references instead.
The distinction is that a pointer is a memory address, and we can manipulate pointers in many of the same ways in which we can manipulate integers (e.g., incrementing a pointer increments the address to which it points).
In contrast, references only abstractly reference something in memory; for example, the same reference may concretely refer to different memory locations at different times thanks to things like garbage collection.
References disallow direct arithmetic operations, since they aren't fixed to be integers.
-
First-order functions (e.g., C-like functions).
Common to many languages.
Some code is in memory which can take parameters and return values.
For a low-level target, this is potentially a major feature.
-
Access modifiers, such as
public
, private
, and protected
.
These are checked at compile-time, approximately at the same time as typechecking (it can be part of typechecking, a separate pass, or some combination thereof).
-
Subtyping, which allows us to have a general type and a more specific variant of that general type.
Both the general type and the specific variant are valid types.
Commonly used for handling class-based inheritance at the type level, though it's possible to separate the concepts of subclassing and subtyping (OCaml does this).
-
Function/method overloading.
Instead of function call and definition being based on a particular name, we base it on a particular signature, where the signature includes the name, number of parameters, parameter types, and potentially more (or less).
-
Algebraic data types.
Allow for the user to define a type, along with different variants of that type.
For example, we can define integers and real numbers to both be numbers.
Unlike with class-based object-oriented code, these variants are indistinguishable at the type level; with the prior example, the type would be
number
, and the variants would be integer
and real
.
(These variants are called constructors, which are unrelated to object-oriented constructors; the naming is unfortunate.)
To be useful, algebraic data types need to be combined with pattern matching (to see what specific kind of variant we have).
Generally, algebraic data types also need pattern matching in order to be useful.
Likely Major Features
-
Complex syntax, not based on S-expressions.
-
Function pointers.
Building on first-order functions and pointers, since functions are themselves just code in memory, we can refer to these functions via pointers of their own.
This is likely a minor feature if your target language already has higher-order functions.
-
Vector-based operations and optimizations.
Certain modern processor architectures support vector operations, which effectively allow for multiple elements of an array to be manipulated with single instructions.
Optimizing compilers can use these instructions to automatically optimize code that manipulates arrays with loops, in order to process multiple array elements in batches, instead of the usual one at a time.
-
Automatic function inlining.
This replaces a function call with the body of the function.
Care is needed to make sure this is safe to do (e.g., this won't work with recursive functions), and to reasonably ensure that performance will be improved (this optimization can backfire when functions get large).
-
Class-based inheritance.
Used by Java, C++, Python, and others.
Methods are put into classes, and it's known at compile time which methods are associated with which classes.
The actual class used by an object can vary at runtime.
-
Prototype-based inheritance.
Objects defer method calls they don't understand to a prototype, somewhat analogous to a parent class.
Unlike parent classes, prototypes can be changed at runtime, and often individual methods, too.
More flexible than class-based inheritance from a programmer standpoint, but it's harder to optimize and misuse can lead to confusing code.
-
Method overriding.
Usually phrased as a specific feature in class-based inheritance models, the ability to redefine a method in a subclass.
Access to a superclass' method may be available by a
super
-like mechanism.
Misuse can lead to code that's difficult to understand.
-
Operator overloading.
Related to function overloading, but we can do this with (potentially built-in) operations like
+
or -
.
Java intentionally doesn't support it (citing that it's easy to misuse), C++ has special syntax for it, and Python treats operators consistently with regular methods (it supports operator overloading without doing any extra work).
-
Generic programming, i.e., generics, type variables, etc.
Commonly-supported feature with many uses, though not universally supported (e.g., C and Go).
Different languages can implement generics in different ways; for example, C++ uses templates, which are powerful enough to automatically generate code at compile time.
Some languages additionally support bounded generics (e.g., Java), which allow for putting additional constraints on type variables.
If you're interested in implementing generics, traditional unbounded generics are one major feature, and bounded generics would count as a second major feature.
-
Resource aquisition is initialization (RAII).
Programming idiom that ties the creation of an object with the acquisition of some resource (e.g., memory, file, lock, etc.), and the destruction of the same object with releasing the resource.
Used in C++11 and Rust to semi-automatically deallocate memory when it is no longer needed, without needing special runtime support (and expense).
-
Higher-order functions.
Somewhat like first-order functions, except these can be defined at any point, and behave like a normal expression.
Unlike first-order functions, higher-order functions are allowed to close over local variables, internally saving the function in a data structure called a closure.
Closures maintain both the function code and the saved variables.
Incredibly powerful feature.
-
Pattern matching.
Effectively a super-powered
switch
, allows for cases to effectively be nested.
With full exhaustivity checking, this is likely a major feature, but without exhaustivity checking, it's likely a minor feature.
-
Type inference.
Frees programmers from having to write what the input/output/variable types are, leading to more concise code.
The types are automatically determined at compile time, so this is still statically-typed code.
Some languages (e.g., Haskell) and infer most types, whereas others (e.g., Scala) have more limited type inference.
Swift has type inference, and even C++11 has a limited form of type inference (with the
auto
type).
-
Lazy evaluation.
Expressions are not evaluated when they appear, but rather when their results are needed.
When evaluation occurs, the result is saved so that it can be returned directly as opposed to performing the computation again.
Haskell uniformly uses lazy evaluation, and Scala enables it selectively by marking variables with the
lazy
reserved word.
-
Type classes (not the same as class-based inheritance).
We first define a type class, which defines a series of function signatures which operate on a generic type.
We can then define an instance, which associates a specific type with a generic typeclass, and further defines what the functions actually do.
While this sounds a lot like class-based inheritance, the mechanisms are different.
For example, because the definition of the type class and the instance are separated, it's possible to effectively add functions/methods to a type which is not under your control (e.g., you could add your own custom method's to Java's
String
class, even though you cannot change String
's definition).
-
Implicit parameters, as seen in Scala.
Certain variables can be marked as being implicitly provided, so the programmer does not have to explicitly pass them.
Instead, the compiler will automatically find a value of the appropriate type which has been marked as being implicitly available, and will pass this automatically at compile time.
Unlike default parameters, which are dependent only on the function being called, implicit parameters additionally depend on the values in scope at the call site.
This feature can be used to implement typeclasses at the library level, and even to automatically generate code at compile time.
-
Tail call optimimzations.
If the last thing a function does is make a recursive call, then it's safe to directly jump to the function's body (like a loop), as opposed to setting up the call stack to later return (like a function).
Makes it possible to define recursive functions which use constant amounts of stack space.
Present in all practical functional and logical programming languages, and even modern C compilers can do this.
Ironically, if the target language is low-level, this might become trivial; it's often easier to implement a tail call optimization at the assembly level.
-
Nondeterministic execution (see nondeterministic algorithm).
Allows the same code to incrementally produce multiple answers.
This may sound strange, but this is only because we've been trained to think in terms of questions with deterministic (exactly one) answers.
For example, “what's 3 + 5?” has a deterministic answer, but “what two values equal 8?” has nondeterministic answers (e.g.,
1
and 7
, 2
and 6
, and so on).
-
Unification.
Kind of like assignment, except information can travel in both directions (perhaps even simultaneously!) instead of strictly right-to-left.
A key feature of most logic programming languages, unification allows us to blur the lines between inputs and outputs, enabling us to easily ask questions like “what inputs yield this particular output?“ (going from output to input), or even better “if the first input is
7
, and the last output is 8
, what other outputs are possible"? ” (mixed input/output).