Binary search is an algorithm for finding a target item in a sorted list. As it’s quite a simple algorithm, it makes a nice example for introducing a few features in Clojure.

First, a review of the algorithm. If you haven’t encountered binary search before, it’s a good idea to try it out yourself on paper using a sorted list with around 10 elements in it.

The input to the algorithm is a sorted list which we’ll call **numlist** (for simplicity, we’ll assume the list contains integers, but the algorithm generalizes easily to any type of data provided methods of ordering the data and testing for equality exist), and a **target** to be searched for. Then:

- Set
**lower**to 0 and**upper**to (length of list – 1). (Lists, like arrays, are assumed to be indexed from 0.) - Test if
**lower**>**upper**. If so,**target**is not found, so return**nil**. - Calculate
**mid**= (**lower**+**upper**)/2 (using integer division). - Find the data stored at location
**mid**and call it**midvalue**. - If
**midvalue > target**then if**target**is in the list, it’s in the portion of the list below**mid**so set**upper**to**mid**– 1 and repeat from step 2. - If
**midvalue < target**then if**target**is in the list, it’s in the portion above**mid**so set**lower**to**mid**+ 1 and repeat from step 2. - If
**midvalue == target**, then**target**has been found, so return**mid**as the location of**target**in the list.

The algorithm is naturally recursive, in that each time **target** isn’t found, the list gets chopped in half (or as near to half as integer division will allow) and the search continues in the truncated portion of the list. It is this halving process that makes binary search an efficient algorithm, with logarithmic behaviour. For a list of length 2^n, it will take at most n comparisons to either find **target** or prove it’s not present.

The Clojure code looks like this:

(defn binsearch [numlist target] (def uppercount (dec (count numlist))) (loop [lower 0 upper uppercount] (if (> lower upper) nil (let [mid (quot (+ lower upper) 2) midvalue (nth numlist mid)] (cond (> midvalue target) (recur lower (dec mid)) (< midvalue target) (recur (inc mid) upper) (= midvalue target) mid)))))

We’ll step through the code to explain the various features. On line 1, the function **binsearch** is defined, and it takes two arguments: **numlist** and **target**. **numlist** is a list rather than a single number, while **target** is a bare number. After loading the function into the REPL, it can be called with a line like

(binsearch '(1 3 4 5 6 7 8) 5)

Note that when giving a list as an argument to a function, you must prefix it with a single quote. This tells the REPL that the list is to be treated as a list of data, and not as a function. If you omit the quote, you’ll get an error stating that ‘1’ (the first element in the list) can’t be defined as a function (the actual error message is #<CompilerException java.lang.ClassCastException: java.lang.Integer cannot be cast to clojure.lang.IFn but that’s what it’s saying).

The output from the above call is just 3, since the **target** of 5 is found at location 3 in **numlist**.

Anyway, back to the code. On line 2, we define a variable **uppercount** as **(dec (count numlist))**. ‘count’ counts the number of elements in a list, and ‘dec’ decrements its argument by 1, so this statement assigns (size of list – 1) to the variable **uppercount**.

On line 3, we begin a **loop**. We discussed loops in an earlier post. Here we initialize **lower** and **upper** as instructed by the algorithm. You might wonder, by the way, why we used a separate ‘def’ statement to define **uppercount** and then use this variable to initialize **upper**. Why couldn’t we just initialize **upper** to **(dec (count numlist))** directly in the loop’s parameter list? The answer seems to be that there is a bug in the Clojure compiler that complains about a mismatch in data types if you try to put a function call in a loop’s parameter list. At least I’m assuming it’s a bug – there doesn’t seem to be any reason why you shouldn’t be able to do this.

Line 5 implements step 2 in the algorithm, and terminates the function if **target** is not found. Line 6 opens a **let** statement, and calculates **mid** and **midvalue**. Note the use of the **quot** function, which calculates the integer quotient, discarding any remainder. Recall that variables defined within a **let** have a scope that is restricted to the **let**.

Line 7 uses the **nth** function to find element number **mid** in **numlist**. The ‘nth’ function takes two arguments: the first is a list and the second is an integer giving the index of the required element.

Line 8 starts a ‘cond’ statement, which is a bit like a ‘switch’ statement in Java. A ‘cond’ consists of a series of pairs. Within each pair, the first element must be a predicate (returning true or false), and the second element is code that is to be run if the first element is true. This ‘cond’ tests the 3 conditions in the algorithm above.

If **midvalue > target**, then we make a recursive call with **upper** replaced by **mid** – 1 (remember ‘dec’ from above). Recall that within a loop, the ‘recur’ causes a recursive call to the loop statement, not to the enclosing function, so the loop is re-entered on line 3, with **lower** unchanged and **upper** set to **(dec mid)**.

Line 10 handles the case where **midvalue < target** in a similar way. Line 11 ends the recursion if **target** has been found, and returns **mid** as its location in the list.

The ‘cond’ function allows the use of an ‘:else’ clause which is called if all the earlier predicates return false. Thus we could replace **(= midvalue target) mid** on line 11 by **:else mid**.