Ben Hardekopf CS 162 Assignment 5

Due Date

Due 11:59:59pm Monday, February 23

Description

The goal of this assignment is to become familiar with Prolog, and to gain some experience with logic programming. While you may develop your code using any Prolog implementation you want for this assignment, we recommend using SWI-Prolog, which is available on CSIL via the swipl command. We will be testing your solutions using SWI-Prolog, so make sure your solutions work correctly on SWI-Prolog before submitting via turnin.

For both part A and part B, we use the term “procedure” to refer to the code that you must provide. A procedure is a non-empty set of clauses which have the same name and arity. For example, the between0And5 procedure is shown below, which simply unifies its only argument with a number in the range [0, 5]:

between0And5(0).
between0And5(1).
between0And5(2).
between0And5(3).
between0And5(4).
between0And5(5).

For the sake of this assignment, a single clause also constitutes a procedure. That is, if you are able to write one of the intended procedures using only a single clause, this is still perfectly acceptable. For example, with part B, if there is only one solve clause, this is fine.

Part A: Tutorial

The first part of this assignment is to go through a tutorial covering Prolog, available here. You need not read chapters 7, 13, 14, and 15. We have provided a set of companion exercises for the tutorial here, indexed by tutorial chapter, which must be completed as part of the assignment.

In addition to the tutorial itself, you should also take a look at this code, which implements a simple graph and graph traversal in Prolog. This may prove useful in implementing the ontology portion later in the assignment.

Part B: List Operations

This part of the assignment involves you implementing a series of operations over lists. Download the skeleton code here. You must fill in all the portions labeled ---FILL ME IN---.

Part C: Ontologies

An ontology is a formal description of the knowledge within a domain, complete with ideas and the relationships between said ideas. Ontologies allow for users to rapidly explore a knowledge base, and well-defined representations allow for even automated exploration.

Within the domain of Biology, an ontology of high importance is that of the Gene Ontology (GO). The GO is intended to be a highly descriptive web of knowledge, specific to biological processes (e.g., cell division), cellular components (e.g., organelles), and the functions of related molecules (e.g., protein binding). This structure allows for scientists to rapidly see which different biological components are related to each other and how, along with relevant paper citations. This allows for questions to be answered easily without doing a full literature search, which is important for a fast-moving field with a vast body of knowledge.

The GO can be, and often is, represented as a series of relations between concepts. This is a rather natural representation, given that a significant amount of information is conveyed through connections between concepts, and that this representation is amenable to the incorporation of new information.

This relational representation lends itself to logic programming. Not only does logic programming make the problem of representing the ontology simple, it provides a natural mechanism to query this representation. Overall, the GO can be treated like a larged directed acyclic graph, and queries over the GO often behave as various kinds of graph traversals (you may wish to review this code again from the tutorial).

For this assignment, you will be working with a small subset of the GO, already written as part of a partial logical program. You can download the program here. Over this subset of the GO, you must write the following procedures:

A procedure named toisaroot, which makes a list showing the path from some given concept C to the root, which is always 'cellular component'. For example, the query:
```
toisaroot('DNA helicase complex', List).
```
...should yield only:
```
List = ['DNA helicase complex', 'catalytic complex', 'protein complex', 
        'macromolecular complex', 'cellular component']
```
Not every concept will have a root, reflecting the fact that not every concept can reach 'cellular component' strictly via is_a relationships. For example, the query:
```
toisaroot(cytoplasm, List).
```
...should fail, since there is no way to reach 'cellular component' via is_a relationships from cytoplasm.
A procedure named subconcept to determine if a given concept C1 is a subconcept of some concept C2. This is true if at least one of the following holds:
1. C1 is a part_of C2
2. C1 is_a C2
3. There exists some concept C3 that either is_a or is part_of C2 such that C1 is a subconcept of C3. In other words, C1 either is_a or is part_of C2 transitively.
For example, the query:
```
subconcept('intracellular part', X).
```
...should yield that X could be one of: 'cell part', intracellular, cell, or 'cellular component'. Duplicates are ok, and even expected, which reflect the fact that there were different ways to derive the same information.

Part D: Solving Sudoku Puzzles

For this portion of the assignment, you will write a program in Prolog that can solve Sudoku puzzles. Sudoku is defined as a 9x9 grid, composed of nine 3x3 subgrids. Initially, the grid is partially filled in with numbers. The goal of Sudoku is to fill in the remaining empty squares of the grid with respect to the following three constraints:

Each 9-square row must contain each digit in the range [1, 9]. No repeats are possible.
Each 9-square column must contain each digit in the range [1, 9]. No repeats are possible.
Each 9-square 3x3 subgrid must contain each digit in the range [1, 9]. No repeats are possible.

We have provided some template code to help solve this, provided here. To solve this problem, fill in the procedure named solve that takes a list of nine lists, where each inner list represents a row in the Sudoku puzzle. The inner lists are all of length nine, where each inner list element is either an integer in the range [1, 9] or an unbound variable. An example query is like so (substituting ... for additional list components):

Board = [[1, _, ...], ...], solve(Board).

The solve procedure binds all the unbound variables in a way that satisfies the above three constraints. Once all variables have been bound, as long as the above three constraints have not been violated, a solution has been reached. As a hint, the built-in predicates var(X) and nonvar(X) may come in handy.

As for implementing a naive solve, the general idea is the following:

Nondeterministically bind each blank space to a number between 1 and 9.
Ensure that the Sudoku constraints are not violated.
- If they are not violated, then we have a solution.
- If they are violated, then failure occurs, triggering backtracking to fill in a different integer than previously chosen.
Repeat the previous two steps repeatedly until either:
1. We find a solution.
2. We have tried all possible solutions, and all of them violate the Sudoku constraints.

Sudoku is an example of a problem wherein we can write very simple solutions in Prolog thanks to built-in nondeterminism and backtracking, though most of these tend to be unrealistically slow. In this context, “unrealistically slow” literally means that a solver takes days or even weeks to complete, versus only a few hundred milliseconds on an optimized implementation. While we usually don't care about performance in this class, for this portion of the assignment, it is vital for the purpose of being able to test your code.

For full credit, your solution:

Must be correct. If a board can be solved, then it can provide at least one solution. If a board cannot be solved, then solve should fail.
Must be reasonably fast. For our purposes, this means that it should terminate within 10 seconds on the provided tests on CSIL. Solvers that take longer than this are very unlikely to be able to run through our test suite, which is of comparable difficulty. Failing to terminate within the time limit is considered test failure. For comparison, our reference implementation takes well under a second on CSIL on the provided boards.
Must not use the clpfd library. This library was in many ways written around Sudoku, and a solution which is specific to the library is even provided in the documentation. We are intentionally disallowing this library to highlight a weakness of plain logic programming. We will be discussing this weakness, along with how to address it, later in the course (in fact, the clpfd was written to address this weakness).

In the interest of making your solver fast, we provide some possible ideas for optimization. Many, many other ideas are possible, though we know for a fact that these optimizations can get things speedy, since our own reference implemenation uses them. We list these ideas below:

Whenever an integer is inserted into a blank space (i.e., an uninstantiated variable becomes instantiated), recheck the Sudoku constraints. This is in contrast to filling in all integers and then checking constraints. The fundamental idea here is that we want to fail as quickly as possible upon doing something which violates constraints. Any extra work is a complete waste. Of interest here is the is_set predicate, which can be used to determine if all the elements of a list are unique, even if that list contains uninstantiated variables.
Check only the constraints relevant to changes made in between the last checking of constraints. For example, say we have a partially-instantiated Sudoku board which violates no constraints, and then we fill in the square at the top left row. In this case, we only need to check constraints on the first row, the first column, and the 3x3 cube in the top left, since those are the only constraints relevant to this square. It is pointless to check, say, the 3x3 cube in the bottom right, because nothing has changed in this cube since the last time the constraint was checked. The idea here is to reduce the amount of information which is needed in order to determine if we violated constraints.
Preprocess the board so that all columns (which we can get with the columnAsList helper) and all 3x3 cubes (which we can get with the getCube helper) are immediately available in some data structure which is passed around. Running these helpers is expensive on the provided board, as in the worst case they will end up traversing the entire board. This adds up very quickly, and makes the difference between traversing the whole board around a dozen times versus doing it potentially millions of times. The idea here is to reduce the cost of checking constraints.
Instantiate blank squares from left to right, top to bottom. This is arguably the most basic strategy to use, as it falls naturally from our board representation. While the effectiveness of this pattern varies wildly between different boards, our internal reference solution behaves this way. This is important because the internal test suite consists only of tests which our own implementation can handle within a second, so the test suite itself is biased towards boards which are most easily solved by filling in numbers left to right, top to bottom. While many different possible orderings are possible, in general no single ordering is optimal, which comes from the fact tha Sudoku is fundamentally NP-complete.

For the sake of efficiency, your implementation should not simply try all possible combinations. Instead, you should bind one variable at a time, and check that the above three constraints have not been violated. In case of violation, choose a different number. If there are no more numbers to try, backtrack to a previous choice and go from there. Remember to leverage the Prolog engine for this purpose: it can (and should) automatically handle all the backtracking logic. Only solutions which perform the above optimization (or better) will receive full credit. Of interest here is the is_set predicate, which can be used to determine if all the elements of a list are unique, even if that list contains uninstantiated variables.

Rubric

Tutorial: 10%
List Operations: 20%
Ontology:

toisaroot: 10%
subconcept: 10%

Sudoku: 50%

Deliverables

You can turnin all your code with the provided command:

turnin assign5@cs162 companion/*.pl list_routines.pl ontology.pl sudoku.pl

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

THIS PAGE ARCHIVES A PAST CS162 CLASS; THIS COURSE IS NOT ACTIVE

Due Date

Description

Part A: Tutorial

Part B: List Operations

Part C: Ontologies

Part D: Solving Sudoku Puzzles

Rubric

Deliverables