Tips on Debugging Prolog

Prolog is likely a very different language than what you're used to. Not only is the language pure like Scala (at least the subset we're using), it can be very difficult to wrap your head around nondeterministic evaluation, especially when things aren't working correctly. I have in this page some tips on debugging Prolog which I've used for my own research. Most of these I discovered through trial-and-error, and it is, in my opinion, woefully incomplete. If you have any suggestions / comments for improvement, let me know (at kyledewey dot cs dot ucsb edu) and I'll add it here with your name attached.

Warnings

The usual advice with warnings in any language is to treat them as errors. I've found this to be especially true of Prolog; I've never had a warning that didn't actually indicate an error of some sort. The warnings can get lost in the noise with SWI-PL since it has a huge banner when you first load it up, but the banner can be discarded if you give it the -q option on loading it. Below are some common warnings, what they mean, and their solutions.

Warning Name What It Means Example Why It Is Not an Error Why It Is a Warning How to Fix It
Singleton variables: ... In some scope, a variable was introduced but never used again. This is analogous to having a function which takes a parameter x but doesn't use x.
foo(X).
Semantically meaningful - unify this new variable with something used in whatever position it has, and then discard the variable. Often indicates a typo. A programmer may have meant to use Foo in two placed, but ended up typing Fo for one of the places. This is a nasty bug which can be hard to find. Because of this warning, the compiler would flag this. If this is actually your intention, use underscore (_) instead. Underscore is precisely for the purpose of acting as a placeholder, and it immediately indicates to anyone reading your code that you don't care about whatever is in this position.
Clauses of ... are not together in the source-file The different rules and facts corresponding to one procedure are interspersed with others.
foo(1).
bar(2).
foo(3).
foo(X) :- X > 4.
Strictly speaking, there is no need to have all the clauses together. This tends to be a mistake in large source files, where someone accidentally defines something with the same name as something that was defined previously. While the intention is to define a completely separate procedure, this ends up attaching something onto an existing procedure, which will have a very different effect. This is easy to miss and hard to debug. Interestingly enough, some Prolog engines (e.g., GNU Prolog) treat this as a warning, but will full-on discard the clauses at the end! Put clauses with the same name and arity next to each other in the source file. To fix the example:
foo(1).
foo(3).
foo(X) :- X > 4.
bar(2).

Errors

Since Prolog is dynamically typed, errors tend to show up only at runtime. Some of the error messages are extremely unhelpful. Worse yet, many type errors end up manifesting as unification failure, triggering backtracking in Prolog, whereas in say, Python, we might throw an exception. There is more information about the latter problem in the section titled “Using trace”; this section covers the sort of explicit error messages you might see.

Error Name What It Means Example How to Fix It
Arguments are not sufficiently instantiated Something was expecting an instantiated term, but was instead given an uninstantiated variable. For certain predicates, this is an error condition. One such predicate is is/2, which is used for performing arithmetic. is/2 cannot perform arithmetic on anything but numbers, so it does not know how to handle an uninstantiated variable.
X is 1 + Y. % Y is uninstantiated
If possible, ensure that the variables are instantiated before passing them along. In many situations, this is possible just by reordering things a bit.
Arithmetic: ... is not a function is/2 was passed something that was not a predefined arithmetic function described here. This is somewhat specific to SWI-PL, as some other engines will allow you to define your own operators.
X is foo.
Usually, this means that somehow, something that should be a number ended up getting unified with something else.

Using trace

trace is a powerful tool for letting you observe your code operating in action. trace allows you to see:

To use trace, simply issue the query trace before performing whatever queries you'd like to trace. For example:

trace.
foo(2, X). % will trace through the execution of `foo`

To demonstrate, consider the following code:

% -Num: Some number                                                             
% -Min: Minimum value, inclusive                                                
% -Max: Maximum value, inclusive                                                
%                                                                               
% Nondeterministically binds `Num` to all values between Min and Max            
allBetween(Num, Min, Max) :-
Min =< Max,
       Min = Num.
allBetween(Num, Min, Max) :-
       Min =< Max,
	      NewMin is Min +1,
	      allBetween(Num,NewMin,Max).

We run this code using trace like so:

?- trace.
true.

[trace]  ?- X = 2, allBetween(X, 0, 3).
   Call: (7) _G1752=2 ? creep
   Exit: (7) 2=2 ? creep
   Call: (7) allBetween(2, 0, 3) ? creep
   Call: (8) 0=<3 ? creep
   Exit: (8) 0=<3 ? creep
   Call: (8) 0=2 ? creep
   Fail: (8) 0=2 ? creep
   Redo: (7) allBetween(2, 0, 3) ? creep
   Call: (8) 0=<3 ? creep
   Exit: (8) 0=<3 ? creep
   Call: (8) _G1880 is 0+1 ? creep
   Exit: (8) 1 is 0+1 ? creep
   Call: (8) allBetween(2, 1, 3) ? creep
   Call: (9) 1=<3 ? creep
   Exit: (9) 1=<3 ? creep
   Call: (9) 1=2 ? creep
   Fail: (9) 1=2 ? creep
   Redo: (8) allBetween(2, 1, 3) ? creep
   Call: (9) 1=<3 ? creep
   Exit: (9) 1=<3 ? creep
   Call: (9) _G1883 is 1+1 ? creep
   Exit: (9) 2 is 1+1 ? creep
   Call: (9) allBetween(2, 2, 3) ? creep
   Call: (10) 2=<3 ? creep
   Exit: (10) 2=<3 ? creep
   Call: (10) 2=2 ? creep
   Exit: (10) 2=2 ? creep
   Exit: (9) allBetween(2, 2, 3) ? creep
   Exit: (8) allBetween(2, 1, 3) ? creep
   Exit: (7) allBetween(2, 0, 3) ? creep
X = 2 ;
   Redo: (9) allBetween(2, 2, 3) ? creep
   Call: (10) 2=<3 ? creep
   Exit: (10) 2=<3 ? creep
   Call: (10) _G1886 is 2+1 ? creep
   Exit: (10) 3 is 2+1 ? creep
   Call: (10) allBetween(2, 3, 3) ? creep
   Call: (11) 3=<3 ? creep
   Exit: (11) 3=<3 ? creep
   Call: (11) 3=2 ? creep
   Fail: (11) 3=2 ? creep
   Redo: (10) allBetween(2, 3, 3) ? creep
   Call: (11) 3=<3 ? creep
   Exit: (11) 3=<3 ? creep
   Call: (11) _G1889 is 3+1 ? creep
   Exit: (11) 4 is 3+1 ? creep
   Call: (11) allBetween(2, 4, 3) ? creep
   Call: (12) 4=<3 ? creep
   Fail: (12) 4=<3 ? creep
   Redo: (11) allBetween(2, 4, 3) ? creep
   Call: (12) 4=<3 ? creep
   Fail: (12) 4=<3 ? creep
   Fail: (11) allBetween(2, 4, 3) ? creep
   Fail: (10) allBetween(2, 3, 3) ? creep
   Fail: (9) allBetween(2, 2, 3) ? creep
   Fail: (8) allBetween(2, 1, 3) ? creep
   Fail: (7) allBetween(2, 0, 3) ? creep
false.

The above query is basically asking if 2 is between 0 and 3, albeit in a very inefficient way. The allBetween procedure ends up searching through every number between 0 and 2 before discovering that 2 = 2, resulting in success. The Fails in the above trace show it trying incorrect values, and the Redos in the above trace show it backtracking to make alternative decisions. After finally succeeding on 2, we can immediately see a Redo in the output, which occurs because the user asked for additional solutions via the use of semicolon. This causes it to check 3, which subsequently fails. At this point it runs out of alternative choices, leading ultimately to failure. This makes sense, since 2 is between 0 and 3 only once; if it had succeeded multiple times, then this would mean that 2 must be between these numbers multiple times, which doesn't make a whole lot of sense.

trace is nice because it tells us so much information about our code, but the problem is that sometimes this is way too much information. On larger programs, trace quickly becomes useless as it swamps the programmer with debugging information, most of which is irrelevant to whatever is beging debugged at the moment. For situations like this, there is another form of the trace command, which takes a specific prodecure to trace. For example, consider the following:

:- trace(foo).         % only the `foo` procedure is traced
:- some_long_query(X). % calls `foo` inside somewhere

With the above code, only the foo prodecure will be traced, and everything else will run as normal. This means that the only debugging output you'll get is that specific to foo, and nothing else. This will not even trace the internals of foo - only the initial call and return or failure of foo. If you want more debugging output than this, trace/1 can be issued multiple times, as with:

:- trace(foo).
:- trace(bar).
:- some_long_query(X). % calls `foo` and `bar` inside somewhere

With the above code, both the foo and bar procedures will be traced.

Approaches to Using trace

Based on my experiences, the version of trace that traces everything is usually way too much. Even on relatively small problems we can easily get megabytes of debugging output which must be read by hand, which is not feasible. A better approach is to piecewise trace through specific procedures you suspect to be buggy. That is, a good idea is to develop a hypothesis why you're code isn't working, and then test that hypothesis by performing calls to trace with surgical precision. At the very least, this will tell you if code is being called, which is actually very useful information; I have personally been surprised on many occasions to find some helper I had defined was never called, which was related to some subtle bug.

Using format/2

The format/2 built-in predicate is analogous to printf in C: it allows you to do formatted printing to the terminal. In many languages, using printf-like routines in key places during the debugging phase can quickly find errors. This is also true for Prolog, but the output results can get confusing due to nondeterminism. For example, consider the following code and query:

foo(1).
foo(2).
foo(3).

:- foo(X), format('~w~n', [X]).

The query in the above code calls the foo/1 procedure, which nondeterministically binds one of 1, 2, or 3 to X. The subsequent call to format/2 will print out the number followed by a newline. What's particularly interesting about the above code is that it actually prints all three numbers, despite the fact that the code's structure does not have any sort of looping built in. This is because nondeterministic implicitly puts loops into our code, and it is generally not at all obvious where these are going to be. This is just something to keep in mind if you start peppering your code with format/2 calls.