Project Feature Ideas
The following is a non-exhaustive list of possible features you can implement in your language.
The purpose of this list to spark some ideas.
These features have been roughly divided into those historically from imperative, object-oriented, functional, and logical languages.
Modern languages often intermix features from different paradigms; don't feel obligated to think of features only from one group.
Note that features do not exist in isolation; combining even relatively simple features can lead to surprisingly complex interactions (recall the example in class combining increment with assignment).
Traditionally Imperative Features
-
Built-in datatypes, and operations on those data types (e.g.,
int
, double
, bool
, etc.).
These will be trivial (but likely practically necessary) for most target languages.
-
Mutable state; that is, variables can be updated to later hold different values.
Common in many languages, but strongly discouraged or even completely absent in functional and logical languages.
-
Loops (e.g.,
for
, while
, etc.).
Like mutable state, common in many languages, except for functional and logical languages (which instead use recursion for iteration).
To be useful, loops require mutable state to be present.
Loop-related features like break
and continue
may or may not be present.
-
Conditionals (e.g.,
if
).
Seen in most langauages, though often absent from logical languages (e.g., Prolog).
The most common way to perform conditional execution.
-
Pointers.
C, C++, and Rust are some of the only languages with true pointers; most languages use references instead.
The distinction is that a pointer is a memory address, and we can manipulate pointers in many of the same ways in which we can manipulate integers (e.g., incrementing a pointer increments the address to which it points).
In contrast, references only abstractly reference something in memory; for example, the same reference may concretely refer to different memory locations at different times thanks to things like garbage collection.
References disallow direct arithmetic operations, since they aren't fixed to be integers.
-
First-order functions (e.g., C-like functions).
Common to many languages.
Some code is in memory which can take parameters and return values.
-
Recursion.
Most languages which have subroutines allow for recursion, though certain early languages didn't (e.g., Fortran).
-
Dynamic memory allocation (e.g.,
new
in Java/C++, malloc
in C).
It's anticipated that most languages designed will have this feature.
Note that the allocation and deallocation of memory is often closely tied to the target system.
I will not require that you have a deallocation scheme in mind (e.g., garbage collection, free
), nor will I usually give credit for such a scheme, as this is usually a runtime system problem.
-
Function pointers.
Building on first-order functions and pointers, since functions are themselves just code in memory, we can refer to these functions via pointers of their own.
Note: if your target language has higher-order functions (more on those later), I will consider function pointers trivial.
-
Vector-based operations and optimizations.
Certain modern processor architectures support vector operations, which effectively allow for multiple elements of an array to be manipulated with single instructions.
Optimizing compilers can use these instructions to automatically optimize code that manipulates arrays with loops, in order to process multiple array elements in batches, instead of the usual one at a time.
-
Automatic function inlining.
This replaces a function call with the body of the function.
Care is needed to make sure this is safe to do (e.g., this won't work with recursive functions), and to reasonably ensure that performance will be improved (this optimization can backfire when functions get large).
Traditionally Object-Oriented Features
-
Class-based inheritance.
Used by Java, C++, Python, and others.
Methods are put into classes, and it's known at compile time which methods are associated with which classes.
The actual class used by an object can vary at runtime.
-
Prototype-based inheritance.
Objects defer method calls they don't understand to a prototype, somewhat analogous to a parent class.
Unlike parent classes, prototypes can be changed at runtime, and often individual methods, too.
More flexible than class-based inheritance from a programmer standpoint, but it's harder to optimize and misuse can lead to confusing code.
-
Access modifiers, such as
public
, private
, and protected
.
These are checked at compile-time, approximately at the same time as typechecking (it can be part of typechecking, a separate pass, or some combination thereof).
-
Method overriding.
Usually phrased as a specific feature in class-based inheritance models, the ability to redefine a method in a subclass.
Access to a superclass' method may be available by a
super
-like mechanism.
Misuse can lead to code that's difficult to understand.
-
Function/method overloading.
Instead of function call and definition being based on a particular name, we base it on a particular signature, where the signature includes the name, number of parameters, parameter types, and potentially more (or less).
-
Operator overloading.
Related to function overloading, but we can do this with (potentially built-in) operations like
+
or -
.
Java intentionally doesn't support it (citing that it's easy to misuse), C++ has special syntax for it, and Python treats operators consistently with regular methods (it supports operator overloading without doing any extra work).
-
Subtyping, which allows us to have a general type and a more specific variant of that general type.
Both the general type and the specific variant are valid types.
Commonly used for handling class-based inheritance at the type level, though it's possible to separate the concepts of subclassing and subtyping (OCaml does this).
-
Generic programming, i.e., generics, type variables, etc.
Commonly-supported feature with many uses, though not universally supported (e.g., C and Go).
Different languages can implement generics in different ways; for example, C++ uses templates, which are powerful enough to automatically generate code at compile time.
Some languages additionally support bounded generics (e.g., Java), which allow for putting additional constraints on type variables.
If you're interested in implementing generics, traditional unbounded generics would count as one non-trivial feature, and bounded generics would count as a second non-trivial feature.
-
Resource aquisition is initialization (RAII).
Programming idiom that ties the creation of an object with the acquisition of some resource (e.g., memory, file, lock, etc.), and the destruction of the same object with releasing the resource.
Used in C++11 and Rust to semi-automatically deallocate memory when it is no longer needed, without needing special runtime support (and expense).
Traditionally Functional Features
-
Higher-order functions.
Somewhat like first-order functions, except these can be defined at any point, and behave like a normal expression.
Unlike first-order functions, higher-order functions are allowed to close over local variables, internally saving the function in a data structure called a closure.
Closures maintain both the function code and the saved variables.
Incredibly powerful feature.
-
Pattern matching.
Effectively a super-powered
switch
, allows for cases to effectively be nested.
-
Algebraic data types.
Allow for the user to define a type, along with different variants of that type.
For example, we can define integers and real numbers to both be numbers.
Unlike with class-based object-oriented code, these variants are indistinguishable at the type level; with the prior example, the type would be
number
, and the variants would be integer
and real
.
(These variants are called constructors, which are unrelated to object-oriented constructors; the naming is unfortunate.)
To be useful, algebraic data types need to be combined with pattern matching (to see what specific kind of variant we have).
Generally, algebraic data types + pattern matching = one non-trivial feature; this somewhat depends on your target language.
-
Type inference.
Frees programmers from having to write what the input/output/variable types are, leading to more concise code.
The types are automatically determined at compile time, so this is still statically-typed code.
Some languages (e.g., Haskell) and infer most types, whereas others (e.g., Scala) have more limited type inference.
Swift has type inference, and even C++11 has a limited form of type inference (with the
auto
type).
-
Lazy evaluation.
Expressions are not evaluated when they appear, but rather when their results are needed.
When evaluation occurs, the result is saved so that it can be returned directly as opposed to performing the computation again.
Haskell uniformly uses lazy evaluation, and Scala enables it selectively by marking variables with the
lazy
reserved word.
-
Type classes (not the same as class-based inheritance).
We first define a type class, which defines a series of function signatures which operate on a generic type.
We can then define an instance, which associates a specific type with a generic typeclass, and further defines what the functions actually do.
While this sounds a lot like class-based inheritance, the mechanisms are different.
For example, because the definition of the type class and the instance are separated, it's possible to effectively add functions/methods to a type which is not under your control (e.g., you could add your own custom method's to Java's
String
class, even though you cannot change String
's definition).
-
Implicit parameters, as seen in Scala.
Certain variables can be marked as being implicitly provided, so the programmer does not have to explicitly pass them.
Instead, the compiler will automatically find a value of the appropriate type which has been marked as being implicitly available, and will pass this automatically at compile time.
Unlike default parameters, which are dependent only on the function being called, implicit parameters additionally depend on the values in scope at the call site.
This feature can be used to implement typeclasses at the library level, and even to automatically generate code at compile time.
-
Tail call optimimzations.
If the last thing a function does is make a recursive call, then it's safe to directly jump to the function's body (like a loop), as opposed to setting up the call stack to later return (like a function).
Makes it possible to define recursive functions which use constant amounts of stack space.
Present in all practical functional and logical programming languages, and even modern C compilers can do this.
Traditionally Logical Features
-
Nondeterministic execution (see nondeterministic algorithm).
Allows the same code to incrementally produce multiple answers.
This may sound strange, but this is only because we've been trained to think in terms of questions with deterministic (exactly one) answers.
For example, “what's 3 + 5?” has a deterministic answer, but “what two values equal 8?” has nondeterministic answers (e.g.,
1
and 7
, 2
and 6
, and so on).
-
Unification.
Kind of like assignment, except information can travel in both directions (perhaps even simultaneously!) instead of strictly right-to-left.
A key feature of most logic programming languages, unification allows us to blur the lines between inputs and outputs, enabling us to easily ask questions like “what inputs yield this particular output?“ (going from output to input), or even better “if the first input is
7
, and the last output is 8
, what other outputs are possible"? ” (mixed input/output).