Binary search is an algorithm for finding a target item in a sorted list. As it’s quite a simple algorithm, it makes a nice example for introducing a few features in Clojure.
First, a review of the algorithm. If you haven’t encountered binary search before, it’s a good idea to try it out yourself on paper using a sorted list with around 10 elements in it.
The input to the algorithm is a sorted list which we’ll call numlist (for simplicity, we’ll assume the list contains integers, but the algorithm generalizes easily to any type of data provided methods of ordering the data and testing for equality exist), and a target to be searched for. Then:
- Set lower to 0 and upper to (length of list – 1). (Lists, like arrays, are assumed to be indexed from 0.)
- Test if lower > upper. If so, target is not found, so return nil.
- Calculate mid = (lower + upper)/2 (using integer division).
- Find the data stored at location mid and call it midvalue.
- If midvalue > target then if target is in the list, it’s in the portion of the list below mid so set upper to mid – 1 and repeat from step 2.
- If midvalue < target then if target is in the list, it’s in the portion above mid so set lower to mid + 1 and repeat from step 2.
- If midvalue == target, then target has been found, so return mid as the location of target in the list.
The algorithm is naturally recursive, in that each time target isn’t found, the list gets chopped in half (or as near to half as integer division will allow) and the search continues in the truncated portion of the list. It is this halving process that makes binary search an efficient algorithm, with logarithmic behaviour. For a list of length 2^n, it will take at most n comparisons to either find target or prove it’s not present.
The Clojure code looks like this:
(defn binsearch [numlist target] (def uppercount (dec (count numlist))) (loop [lower 0 upper uppercount] (if (> lower upper) nil (let [mid (quot (+ lower upper) 2) midvalue (nth numlist mid)] (cond (> midvalue target) (recur lower (dec mid)) (< midvalue target) (recur (inc mid) upper) (= midvalue target) mid)))))
We’ll step through the code to explain the various features. On line 1, the function binsearch is defined, and it takes two arguments: numlist and target. numlist is a list rather than a single number, while target is a bare number. After loading the function into the REPL, it can be called with a line like
(binsearch '(1 3 4 5 6 7 8) 5)
Note that when giving a list as an argument to a function, you must prefix it with a single quote. This tells the REPL that the list is to be treated as a list of data, and not as a function. If you omit the quote, you’ll get an error stating that ‘1’ (the first element in the list) can’t be defined as a function (the actual error message is #<CompilerException java.lang.ClassCastException: java.lang.Integer cannot be cast to clojure.lang.IFn but that’s what it’s saying).
The output from the above call is just 3, since the target of 5 is found at location 3 in numlist.
Anyway, back to the code. On line 2, we define a variable uppercount as (dec (count numlist)). ‘count’ counts the number of elements in a list, and ‘dec’ decrements its argument by 1, so this statement assigns (size of list – 1) to the variable uppercount.
On line 3, we begin a loop. We discussed loops in an earlier post. Here we initialize lower and upper as instructed by the algorithm. You might wonder, by the way, why we used a separate ‘def’ statement to define uppercount and then use this variable to initialize upper. Why couldn’t we just initialize upper to (dec (count numlist)) directly in the loop’s parameter list? The answer seems to be that there is a bug in the Clojure compiler that complains about a mismatch in data types if you try to put a function call in a loop’s parameter list. At least I’m assuming it’s a bug – there doesn’t seem to be any reason why you shouldn’t be able to do this.
Line 5 implements step 2 in the algorithm, and terminates the function if target is not found. Line 6 opens a let statement, and calculates mid and midvalue. Note the use of the quot function, which calculates the integer quotient, discarding any remainder. Recall that variables defined within a let have a scope that is restricted to the let.
Line 7 uses the nth function to find element number mid in numlist. The ‘nth’ function takes two arguments: the first is a list and the second is an integer giving the index of the required element.
Line 8 starts a ‘cond’ statement, which is a bit like a ‘switch’ statement in Java. A ‘cond’ consists of a series of pairs. Within each pair, the first element must be a predicate (returning true or false), and the second element is code that is to be run if the first element is true. This ‘cond’ tests the 3 conditions in the algorithm above.
If midvalue > target, then we make a recursive call with upper replaced by mid – 1 (remember ‘dec’ from above). Recall that within a loop, the ‘recur’ causes a recursive call to the loop statement, not to the enclosing function, so the loop is re-entered on line 3, with lower unchanged and upper set to (dec mid).
Line 10 handles the case where midvalue < target in a similar way. Line 11 ends the recursion if target has been found, and returns mid as its location in the list.
The ‘cond’ function allows the use of an ‘:else’ clause which is called if all the earlier predicates return false. Thus we could replace (= midvalue target) mid on line 11 by :else mid.