Tag Archives: Clojure loops

Binary search in Clojure

Binary search is an algorithm for finding a target item in a sorted list. As it’s quite a simple algorithm, it makes a nice example for introducing a few features in Clojure.

First, a review of the algorithm. If you haven’t encountered binary search before, it’s a good idea to try it out yourself on paper using a sorted list with around 10 elements in it.

The input to the algorithm is a sorted list which we’ll call numlist (for simplicity, we’ll assume the list contains integers, but the algorithm generalizes easily to any type of data provided methods of ordering the data and testing for equality exist), and a target to be searched for. Then:

  1. Set lower to 0 and upper to (length of list – 1). (Lists, like arrays, are assumed to be indexed from 0.)
  2. Test if lower > upper. If so, target is not found, so return nil.
  3. Calculate mid = (lower + upper)/2 (using integer division).
  4. Find the data stored at location mid and call it midvalue.
  5. If midvalue > target then if target is in the list, it’s in the portion of the list below mid so set upper to mid – 1 and repeat from step 2.
  6. If midvalue < target then if target is in the list, it’s in the portion above mid so set lower to mid + 1 and repeat from step 2.
  7. If midvalue == target, then target has been found, so return mid as the location of target in the list.

The algorithm is naturally recursive, in that each time target isn’t found, the list gets chopped in half (or as near to half as integer division will allow) and the search continues in the truncated portion of the list. It is this halving process that makes binary search an efficient algorithm, with logarithmic behaviour. For a list of length 2^n, it will take at most n comparisons to either find target or prove it’s not present.

The Clojure code looks like this:

(defn binsearch [numlist target]
  (def uppercount (dec (count numlist)))
  (loop [lower 0
        upper uppercount]
    (if (> lower upper) nil
      (let [mid (quot (+ lower upper) 2)
            midvalue (nth numlist mid)]
        (cond
          (> midvalue target) (recur lower (dec mid))
          (< midvalue target) (recur (inc mid) upper)
          (= midvalue target) mid)))))

We’ll step through the code to explain the various features. On line 1, the function binsearch is defined, and it takes two arguments: numlist and target. numlist is a list rather than a single number, while target is a bare number. After loading the function into the REPL, it can be called with a line like

(binsearch '(1 3 4 5 6 7 8) 5)

Note that when giving a list as an argument to a function, you must prefix it with a single quote. This tells the REPL that the list is to be treated as a list of data, and not as a function. If you omit the quote, you’ll get an error stating that ‘1’ (the first element in the list) can’t be defined as a function (the actual error message is #<CompilerException java.lang.ClassCastException: java.lang.Integer cannot be cast to clojure.lang.IFn but that’s what it’s saying).

The output from the above call is just 3, since the target of 5 is found at location 3 in numlist.

Anyway, back to the code. On line 2, we define a variable uppercount as (dec (count numlist)). ‘count’ counts the number of elements in a list, and ‘dec’ decrements its argument by 1, so this statement assigns (size of list – 1) to the variable uppercount.

On line 3, we begin a loop. We discussed loops in an earlier post. Here we initialize lower and upper as instructed by the algorithm. You might wonder, by the way, why we used a separate ‘def’ statement to define uppercount and then use this variable to initialize upper. Why couldn’t we just initialize upper to (dec (count numlist)) directly in the loop’s parameter list? The answer seems to be that there is a bug in the Clojure compiler that complains about a mismatch in data types if you try to put a function call in a loop’s parameter list. At least I’m assuming it’s a bug – there doesn’t seem to be any reason why you shouldn’t be able to do this.

Line 5 implements step 2 in the algorithm, and terminates the function if target is not found. Line 6 opens a let statement, and calculates mid and midvalue. Note the use of the quot function, which calculates the integer quotient, discarding any remainder. Recall that variables defined within a let have a scope that is restricted to the let.

Line 7 uses the nth function to find element number mid in numlist. The ‘nth’ function takes two arguments: the first is a list and the second is an integer giving the index of the required element.

Line 8 starts a ‘cond’ statement, which is a bit like a ‘switch’ statement in Java. A ‘cond’ consists of a series of pairs. Within each pair, the first element must be a predicate (returning true or false), and the second element is code that is to be run if the first element is true. This ‘cond’ tests the 3 conditions in the algorithm above.

If midvalue > target, then we make a recursive call with upper replaced by mid – 1 (remember ‘dec’ from above). Recall that within a loop, the ‘recur’ causes a recursive call to the loop statement, not to the enclosing function, so the loop is re-entered on line 3, with lower unchanged and upper set to (dec mid).

Line 10 handles the case where midvalue < target in a similar way. Line 11 ends the recursion if target has been found, and returns mid as its location in the list.

The ‘cond’ function allows the use of an ‘:else’ clause which is called if all the earlier predicates return false. Thus we could replace (= midvalue target) mid on line 11 by :else mid.

Loops in Clojure

We saw in the post on recursion in Clojure that the language doesn’t support any loops in the conventional sense. Rather, iteration must be done using recursion, with tail recursion the favoured mechanism. It may come as a surprise that Clojure contains a function called loop. It’s a bit cheeky, since the Clojure loop is really just a shorthand way of doing recursion. To see how this works, first recall the implementation of the power function (for raising a number x to an integer power y) we provided earlier:

(defn power
  ([x y] (power x y 1))
  ([x y current]
  (if (= y 0)
    current
    (if (> y 0)
      (recur x (- y 1) (* x current))
      (recur x (+ y 1) (/ current x))))))

The recursion consists of two steps: an initialization step in which the recursive part of the function is called, with current set to 1 as a starting point, and the recursive part of the function in which the answer is built up by successively multiplying or dividing (depending on whether y is positive or negative) current by x.

The loop function allows us to merge the initialization step into the recursive part of the function, so that only one argument list is required, rather than the two we used above. Let’s see how power is rewritten using loop.

(defn power
  [x y]
  (loop [exponent y
         current 1.0]
    (if (= exponent 0)
      current
      (if (> exponent 0)
        (recur (- exponent 1) (* x current))
        (recur (+ exponent 1) (/ current x))))))

We see that this version of power has a single argument list, taking as input x and y. The loop begins on line 2. A loop’s first argument is a vector (in square brackets) of symbol-value pairs. For each pair, the first element is bound to the second element. So in this example, exponent is bound to the value of y, and current is bound to 1.0.

The second argument of a loop is a statement that is run. Apart from the use of the symbol exponent in place of y, this code is identical to our earlier version. The key point is that when a recur function is encountered, the recursion starts again with the statement within the loop, not with the function that contains the loop. Thus here, the recursion reverts back to line 5, not line 1.

The loop has a superficial resemblance to a for loop in Java. The loop begins with the call to loop, and the pairs in the initialization vector are like the initialization condition in a for loop. The statement following the initialization is like the body of statements in a for loop. However, whereas a for loop has a specific termination condition stated in its first line, a loop can terminate only if the statement it runs recursively contains an anchor step that stops the recursion. In our example here, that condition is the test to see if exponent is zero on line 5. Just as it is possible to write an infinite loop in Java, so it is possible in Clojure if there is no anchor step, or the anchor step is never reached. As always, it’s up to the programmer to ensure this doesn’t happen.

We can expand this example a bit by applying the power function in another function. A method originally due to Newton for finding the nth root of a number goes like this. To find the nth root of number to within a tolerance tol:

  1. Make an initial guess of 1.0 for the root.
  2. Calculate guess to the nth power.
  3. Calculate the absolute value of the difference between number and guess to the nth power.
  4. If this difference is less than tol, return guess as the answer.
  5. If not, take the average of guess and number/(guess^(n-1)) as the next guess and repeat from step 2.
To implement this algorithm, we’ll need a few auxiliary functions; one for calculating the absolute value, one for calculating the average of two numbers and one for checking if a guess is within the tolerance. None of these is recursive, so we can just give their code:
(defn abs
  "Absolute value of argument"
  [n]
  (if (< n 0)
    (* n -1)
    n))

(defn avg
  "Average of two arguments"
  [a b]
  (/ (+ a b) 2))

(defn within-tol?
    "Is guess^n within tolerance of number?"
  [number n tol guess]
  (let [diff (- (power guess n) number)]
    (if (< (abs diff) tol)
      true
      false)))

Hopefully, these are fairly straightforward so we don’t need to comment on them. (OK, there are probably more succinct ways in which these could be implemented in Clojure, but I’m sticking to the elementary techniques we’ve looked at so far.)

One thing is worthy of note, however. Note that the name of the function for testing the tolerance ends with a question mark. Clojure allows more characters to be part of symbol names than most languages. The rules are

  • Symbol names may contain any alphanumeric character (although they cannot begin with a number).
  • Names may also contain the characters * + ! – _ and ?
  • A colon : is also allowed, but not at the beginning or end of the name, and it is not allowed to be repeated within the name. (Actually, colons are probably best avoided…)

For example, a function that returns a boolean result can end in a question mark, since it is determining whether some condition is true or false. The within-tol? function above does just that.

Anyway, with these functions in place, we can write the main function that uses a loop to calculate the nth root of a number:

(defn nth-root
  "Calculates the nth root of number to a tolerance tol"
  [number n tol]
  (loop [guess 1.0]
    (if (within-tol? number n tol guess)
      guess
      (recur (avg guess (/ number (power guess (- n 1))))))))

The function takes 3 arguments: the number, the value of n that specifies which root is to be calculated, and the tolerance to which we want the answer. The loop initializes guess to 1.0. The body of the loop first checks to see if the guess is within the tolerance (this is the anchor step that will stop the recursion), and, if so, returns the current guess as the answer.

If the answer isn’t yet good enough, a recursive call is made to the loop with the new guess as specified in the algorithm above.

As you might expect, the program takes longer to run as you either increase n or decrease the tolerance tol. This is due to the mathematics behind the algorithm: Newton’s method isn’t the most efficient method for finding roots if a high precision is needed. However, it’s a good exercise in recursion.