The details below are copied from the Assignments and Project part of the course syllabus.
The bulk of your grade (70%) is based on a project, which will allow you to cover a subset of V&V in depth on real software. This breakdown reflects what I want you to take away from this class: adequate high-level knowledge of a variety of approaches, and in-depth knowledge of one or more areas of your choosing. Details of the project follow, roughly in chronological order:
Your project proposal should conform to this template. Example project proposals are shown below, to give a better idea of what these should look like. Note that you may use these proposals as-is, if you so choose.
The project proposal asks you to select advanced V&V techniques before you know all the related details. You are being asked to make a decision with incomplete information; it is wholly expected that this information will need to be revised later. In the meantime, some minimal details follow about the techniques we will discuss later in the class:
Despite the name, this is not a testing technique itself, but rather a technique to measure how effective a given testing technique is.
This allows us to answer questions like “is testing technique A more effective than testing technique B?”
More details can be found on this Wikipedia page, which serves as a good starting point.
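As a minimal sketch of the idea (the `max` function and the specific mutation are illustrative assumptions, not from the course materials): a mutation tool makes a small syntactic change to the code under test, producing a "mutant," and a test suite is judged by how many mutants it detects ("kills").

```java
// Illustrative mutation-testing sketch: a mutant is a small syntactic
// change to the code under test; a test "kills" the mutant if it can
// tell the mutant apart from the original.
public class MutationDemo {
    // Original code under test.
    static int max(int a, int b) { return a > b ? a : b; }

    // Mutant: the comparison operator > was replaced with <
    // (a common mutation operator).
    static int maxMutant(int a, int b) { return a < b ? a : b; }

    public static void main(String[] args) {
        System.out.println("original: " + max(1, 2));   // 2
        System.out.println("mutant:   " + maxMutant(1, 2)); // 1
        // A test asserting max(1, 2) == 2 passes on the original but
        // fails on the mutant, so this mutant is "killed". The fraction
        // of mutants killed is the suite's mutation score.
        System.out.println("killed: " + (max(1, 2) != maxMutant(1, 2)));
    }
}
```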
The idea here is that we can run what is fundamentally the same test with different inputs, and these inputs can be generated in a straightforward manner. This allows us to run the same kind of test under slightly different parameters, helping to expose edge cases. JUnit 5 (a popular testing framework for Java programs) makes this possible with Parameterized Tests. These links are provided as examples of this technique; the particular way in which JUnit does this has a number of problems associated with it.
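The core idea can be sketched in plain Java, without JUnit (the `isEven` function and the input range are illustrative assumptions): one test body runs once per generated input.

```java
// A minimal sketch of the parameterized-testing idea: the same test
// body runs under slightly different parameters, generated by a loop.
public class ParameterizedDemo {
    static boolean isEven(int n) { return n % 2 == 0; }

    public static void main(String[] args) {
        // Generate inputs in a straightforward manner: a simple loop,
        // including negative values to help expose edge cases.
        for (int n = -10; n <= 10; n++) {
            // The "same test" with a different parameter each time:
            // n and n + 2 always have the same parity.
            if (isEven(n) != isEven(n + 2)) {
                throw new AssertionError("failed for n = " + n);
            }
        }
        System.out.println("all parameterized cases passed");
    }
}
```

In JUnit 5 itself, the loop would be replaced by a `@ParameterizedTest` method with a source annotation supplying the inputs.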
The previously-described approach doesn't scale up well when testing operations which work with complex data. For example, when testing an API that works with binary trees, it's not clear how to automatically generate different kinds of trees using only a loop. However, binary trees can be described using Context-Free Grammars. We can write generators of sentences which behave according to a context-free grammar, and thus write binary tree generators. We can add randomness to the generation by using Probabilistic Context-Free Grammars. A surprising number of testing problems can be (at least mostly) phrased using context-free grammars.
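A small sketch of this idea follows; the grammar and its probabilities are assumptions chosen for illustration:

```java
import java.util.Random;

// Generating binary trees from a probabilistic context-free grammar.
// Illustrative grammar:
//   Tree -> "leaf"                        (probability 0.6)
//   Tree -> "(node " Tree " " Tree ")"    (probability 0.4)
public class TreeGenerator {
    static final Random RNG = new Random(42); // fixed seed for repeatability

    static String genTree(int depth) {
        // Force a leaf past a depth bound so generation always terminates.
        if (depth <= 0 || RNG.nextDouble() < 0.6) {
            return "leaf";
        }
        return "(node " + genTree(depth - 1) + " " + genTree(depth - 1) + ")";
    }

    public static void main(String[] args) {
        // Each call yields a (possibly different) random tree; these
        // strings could then be parsed into real tree values and fed
        // to the API under test.
        for (int i = 0; i < 3; i++) {
            System.out.println(genTree(4));
        }
    }
}
```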
We can describe systems in terms of properties we expect them to have, and these properties can be expressed with executable code.
For example, if we are writing a function add(x, y) which is intended to add x and y together, then one property may be add(x, y) == add(y, x) (that is, addition is commutative).
From a high level, it is straightforward to go from properties to tests, as long as we have a way of filling in variables with appropriate data.
With the example above, this means trying different values of x and y, and ensuring that the property is never violated.
Testing frameworks exist that can assist with property-based testing, including:
Ideally, the testing framework can automatically handle the generation of inputs, but in many cases it may be required to somehow assist the framework in this task.
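The commutativity property above can be checked by hand in plain Java; real property-based testing frameworks automate the input generation and also shrink failing inputs to minimal counterexamples. This is only a sketch of the idea:

```java
import java.util.Random;

// A minimal property-based-testing sketch: fill in the property's
// variables with many random values and check the property each time.
public class PropertyDemo {
    static int add(int x, int y) { return x + y; }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int trial = 0; trial < 1000; trial++) {
            int x = rng.nextInt();
            int y = rng.nextInt();
            // The property from the text: addition is commutative.
            if (add(x, y) != add(y, x)) {
                throw new AssertionError(
                    "property violated: x=" + x + ", y=" + y);
            }
        }
        System.out.println("property held for 1000 random inputs");
    }
}
```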
Concolic Execution is an automated, white-box technique that attempts to generate a variety of test inputs which collectively exhaustively test a given software system. In practice, most software systems would require an infinite number of inputs for fully exhaustive testing, so certain compromises must be made. A variety of tools exist which can perform concolic execution; see the link for more details. Getting these tools working correctly, and having them generate useful tests, can require a significant amount of effort; this technique is not as automated as it may first seem.
Model Checking is generally intended for testing software at the specification level, not the implementation level. Specifications are written in special languages, and variants of Linear Temporal Logic are popular for this purpose. Some tools allow for verification (proving the specification correct), as opposed to just testing.
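To give a flavor of what such specifications look like, here is one illustrative Linear Temporal Logic property (not from the course materials), stating that every request is eventually followed by a grant:

```latex
% "Always, if a request occurs, then eventually a grant occurs."
\Box\,(\mathit{request} \rightarrow \Diamond\,\mathit{grant})
```

A model checker would exhaustively explore the states of the specification, either confirming the property or producing a counterexample trace.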
Dafny is a relatively basic programming language with built-in verification functionality. The goal is to write programs which can be proven correct, according to a given specification. Unlike model checking, this verification extends to the implementation, not just the specification. While the Dafny language itself is very simple, there is a definite learning curve to getting it to verify even seemingly simple things. For example, it took me:
While Dafny has a pretty steep learning curve, it's surprisingly easy to pick up relative to other verification approaches. However, Dafny is relatively young, and the incredibly high complexity of its implementation makes it difficult to trust.
Dependent Type Theory allows us to encode a Type System which is incredibly expressive.
We can then rephrase abstract questions like “Does this program work correctly?” as the equivalent, much more concrete “Does this program have any type errors?”
Dependent types are commonly used in programming languages research to verify properties of programming languages, and Coq is a particularly popular and well-established tool for working with dependent types.
This verification approach is mature, relatively easy to trust, and behaves in a manner which can be predicted.
However, this requires an exceptional amount of background information, and it can be downright frustrating to learn.
Unless you're already familiar with this approach, verifying even trivial things can be a challenge (I once spent 12 hours trying to prove that a + (b + c) was equal to (a + b) + c in Coq, and I eventually gave up).
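For a sense of what such a proof can look like when the groundwork already exists, here is a sketch in Lean (a related proof assistant) rather than Coq; Lean's standard library already proves this fact as `Nat.add_assoc`, so the "proof" is just an appeal to that lemma:

```lean
-- Associativity of addition on natural numbers, via the library lemma
-- Nat.add_assoc : (a + b) + c = a + (b + c); .symm flips the equality.
example (a b c : Nat) : a + (b + c) = (a + b) + c :=
  (Nat.add_assoc a b c).symm
```

Proving the same fact from scratch (by induction, without the library lemma) is where the real difficulty lies.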
Note that the above list is not exhaustive; there are many more advanced V&V techniques out there. Anything you find that is not covered under basic techniques is considered an advanced V&V technique for this purpose, and you may select any advanced V&V techniques here, including those which are not listed.
We will be covering verification-based techniques in the course (week 10 and beyond), and these may count as advanced V&V techniques for the purposes of the project. However, these tend to be much more difficult to use, and we won't get into their details until relatively late in the course. As such, it is recommended that you stay away from verification techniques for the project.
A template for the final project report is here. This template includes the sections you should have in this report, along with the value of those sections.
To further help understand what I'm looking for in the report, an example final report has been provided. The corresponding project proposal for this report is here. This example project covers the codebase here.