Due Date

Due 11:59:59pm Wednesday, March 15

Description

The goal of this assignment is to become familiar with Prolog and to gain some experience with logic programming. While you may develop your code using any Prolog implementation you want for this assignment, we recommend using SWI-Prolog, which is available on CSIL via the swipl command. We will be testing your solutions using SWI-Prolog, so make sure your solutions work correctly on SWI-Prolog before submitting via turnin.

For both part A and part B, we use the term “procedure” to refer to the code that you must provide. A procedure is a non-empty set of clauses which have the same name and arity. For example, the between0And5 procedure is shown below, which simply unifies its only argument with a number in the range [0, 5]:

between0And5(0).
between0And5(1).
between0And5(2).
between0And5(3).
between0And5(4).
between0And5(5).

For the sake of this assignment, a single clause also constitutes a procedure. That is, if you are able to write one of the intended procedures using only a single clause, this is still perfectly acceptable. For example, with part B, if there is only one solve clause, this is fine.

Part A: Tutorial

The first part of this assignment is to go through a tutorial covering Prolog, available here. You need not read chapters 7, 13, 14, and 15. We have provided a set of companion exercises for the tutorial here, indexed by tutorial chapter, which must be completed as part of the assignment.

In addition to the tutorial itself, you should also take a look at this code, which implements a simple graph and graph traversal in Prolog. This may prove useful in implementing the ontology portion later in the assignment.

Part B: List Operations

This part of the assignment involves you implementing a series of operations over lists. Download the skeleton code here. You must fill in all the portions labeled ---FILL ME IN---.

Part C: Ontologies

An ontology is a formal description of the knowledge within a domain, complete with ideas and the relationships between said ideas. Ontologies allow for users to rapidly explore a knowledge base, and well-defined representations allow for even automated exploration.

Within the domain of Biology, an ontology of high importance is that of the Gene Ontology (GO). The GO is intended to be a highly descriptive web of knowledge, specific to biological processes (e.g., cell division), cellular components (e.g., organelles), and the functions of related molecules (e.g., protein binding). This structure allows for scientists to rapidly see which different biological components are related to each other and how, along with relevant paper citations. This allows for questions to be answered easily without doing a full literature search, which is important for a fast-moving field with a vast body of knowledge.

The GO can be, and often is, represented as a series of relations between concepts. This is a rather natural representation, given that a significant amount of information is conveyed through connections between concepts, and that this representation is amenable to the incorporation of new information.

This relational representation lends itself to logic programming. Not only does logic programming make the problem of representing the ontology simple, it provides a natural mechanism to query this representation. Overall, the GO can be treated like a large directed acyclic graph, and queries over the GO often behave as various kinds of graph traversals (you may wish to review this code again from the tutorial).

For this portion of the assignment, you will be working with a small subset of the GO, already written as part of a partial logical program. You can download the program here. Over this subset of the GO, you must write the following procedures:

  1. A procedure named toisaroot, which makes a list showing the path from some given concept C to the root, which is always 'cellular component'. For example, the query:

    toisaroot('DNA helicase complex', List).
    
    ...should yield only:
    List = ['DNA helicase complex', 'catalytic complex', 'protein complex', 
            'macromolecular complex', 'cellular component']
    
    Not every concept will have a root, reflecting the fact that not every concept can reach 'cellular component' strictly via is_a relationships. For example, the query:
    toisaroot(cytoplasm, List).
    
    ...should fail, since there is no way to reach 'cellular component' via is_a relationships from cytoplasm.

  2. A procedure named subconcept to determine if a given concept C1 is a subconcept of some concept C2. This is true if at least one of the following holds:

    1. C1 is a part_of C2

    2. C1 is_a C2

    3. There exists some concept C3 that either is_a or is part_of C2 such that C1 is a subconcept of C3. In other words, C1 either is_a or is part_of C2 transitively.

    For example, the query:
    subconcept('intracellular part', X).
    
    ...should yield that X could be one of: 'cell part', intracellular, cell, or 'cellular component'. Duplicates are ok, and even expected, which reflect the fact that there were different ways to derive the same information.

Rubric

Deliverables

You can turnin all your code with the provided command:
turnin assign6@cs162 companion/*.pl list_routines.pl ontology.pl