Tuesday, August 31, 2010

Testing with FsCheck

Now, we will try to test the sort function from before using FsCheck

First, we download and build FsCheck. Then, in the F# interactive shell, (or an .fsx file), we run the following commands.

#I "/home/dave/lib/fsharp/fscheck"
#r "FsCheck.dll"

This sets the load path to where I built the FsCheck dll, and references the dll. The FsCheck Quick Start and documentation have a lot more information than I am about to give...

First, we need a property to test. Since this is a sort function, we would like to see that the data is sorted. We need a function that given an array of data, sorts it, and checks if it is sorted. To keep things easier for now, we will use an integer array. Also, I have learned the hard way to make the tests fail first.. so this first version will be a bad test.


let prop_is_sorted (data:int []) =
    let N = Array.length data
    [|1..N-1|]
    |> Array.forall (fun i ->
            data.[i] >= data.[i-1])
    

This function takes an array of data, and checks that each element should be after the one before it. But, we didn't sort yet, so it should fail on some data. Let's run it to find out.

> Check.Quick prop_is_sorted;;
Falsifiable, after 3 tests (3 shrinks) (StdGen (544801007,295317223)):
[|1; 0|]
val it : unit = ()

The test framework generated three different test cases. Two passed, and the third failed. Also, the framework shrank the failing test case three times, reducing the size of the data set that caused the failure. Shrinking doesn't always find a smaller data set, but when it does, it narrows the range of possible causes of the problem.

Let's update the test function to call our sort function, before checking if the array is sorted.

let prop_is_sorted (data:int []) =
    let N = Array.length data
    sort data
    [|1..N-1|]
    |> Array.forall (fun i ->
            data.[i] >= data.[i-1])

And running it gives us...

> Check.Quick prop_is_sorted;;
Ok, passed 100 tests.
val it : unit = ()

That is great! Except we don't know what kind of data really went into the tests... FsCheck provides ways to get some insight into the data going into the test. In this case, the default generators are probably doing a good enough job, but more complex cases will require more care.

A bad sort function could still pass this test. Perhaps a function that given an array of length N, returns an array with the numbers from 1 to N. We should write more tests on other properties of a sort function to make sure that our sort is doing what it is supposed to. In this case, we have a simple test we can use ... regression test against the system sort.

    
let prop_compare_with_system (data:int[]) =
    let a = Array.sort data // out of place
    sort data // in place
    a = data

This test passes again with 100 successes, and doesn't pass if we give a faulty sort function.

This type of randomized testing is a nice alternative to unit testing. In some cases, it is much less verbose, and it can generate test cases that I would never think of trying myself.

Now that we have some tests in place, we can implement some of the more advanced tweaks to the quicksort algorithm...

Thursday, August 26, 2010

Simple Quicksort in F#

Here is a very simple quicksort function in F#. Quicksort is a generally fast, general purpose sorting algorithm. It works well because it is a divide and conquer algorithm that does all of its work in place.

Quicksort implementations can be cache friendly because it keeps working on smaller and smaller portions of the data set. If the data is in an array, the computer won't have as many cache misses while the algorithm is running. In modern computers, memory bandwidth is one of the limiting factors in how fast things can go.

F# is a multi-paradigm language, which is a good thing for Quicksort. Although I like working with functional code, this code looks a lot nicer when we use mutable state and arrays.

This implementation does the bare minimum needed. The fanciest part of the code is the median-of-three pivot selection. The reason I included this pivot selection is because the rest of the code is simpler if we can assume that the first element of a subarray is smaller than the current pivot. The loops inside the main partition function get a little cleaner.

In an F# program, code must be defined before it is used. So, the main function here is at the bottom of the listing. To get a top down view, read from the bottom up, or vice versa.

The main algorithm consists of two steps (besides checking for empty subarrays)

  1. Pick a pivot element, and partition the sub-array into a chunk less than or equal to the pivot, and a chunk greater than or equal to the pivot.
  2. Recursively sort the subarrays.

The recursion stops when we have a subarray of length 1. In the code here, we have another case for when the subarray has length 2.

There are many other improvements that we can make to the basic algorithm. In the future I may implement some of them, and also, I would like to show how to test this code with FsCheck, a randomized test framework.

let inline swap (data:'a[]) a b =
    let t = data.[a]
    data.[a] <- data.[b]
    data.[b] <- t
    
/// Sort three values from an array.  Put the smallest
/// at index a, biggest at c, and median at b.    
let inline median_of_three (data:'a[]) a b c =
    if data.[b] < data.[a] then
        swap data a b
    if data.[c] < data.[b] then
        swap data c b
        if data.[b] < data.[a] then
            swap data a b 

/// partition the data from index a to b (inclusive)
/// use the element at position b as the pivot, and
/// assume data.[a] is <= data.[b].
let inline partition (data:'a[]) a b =
    // After this is called, return the new position
    // of the pivot value.  After the call, all values before
    // the pivot position should be <= to the pivot, and all values
    // after should be >= the pivot.
    let mutable i = a - 1
    let mutable j = b
    let pivot = data.[b]
    while i < j do
        i <- i + 1
        while data.[i] < pivot do
            i <- i + 1

        j <- j - 1
        while data.[j] > pivot do
            j <- j - 1
            
        if i < j then
            swap data i j
    // Now, i is >= j.  data.[i] is >= pivot.
    // data.[j] <= pivot.  So, move the pivot
    // to the middle where it belongs, and move
    // the value at i to the end of the array.
    swap data i b

    i

let inline sort (data:'a[]) =
    let rec loop a b =
        if a >= b then ()
        elif a + 1 = b then
            if data.[a] > data.[b] then swap data a b
        else
            let c = (a+b)/2
            median_of_three data a b c
            let p = partition data a b
            loop a (p-1)
            loop (p+1) b
    loop 0 (data.Length - 1)
Ideone.com | CBAMC

Saturday, August 21, 2010

GCD in F#

Now we will write a function to calculate the greatest common divisor of two numbers. The Euclidean Algorithm is an efficient, and ancient, method of finding the GCD.

Euclid's algorithm takes advantage of some properties of the gcd function and the modulus function. Here is a loose explanation of why the algorithm works

gcd(a,b) = gcd(b, a - q*b) for any integer q
a % b = a - q*b from some integer q,
so gcd(a,b) = gcd(b, a % b)

Then, we just stop the algorithm when b is zero. And now for the F# code.

let rec gcd a b =
    if b = 0 then a
    else
        gcd b (a % b)

Aside from the syntax, and checking for the end condition, this looks like the math.

And now, a few tests.

let f a b = 
    printfn "gcd(%d,%d) = %d" a b (gcd a b)
f 10 9
f (13*61*2*2*2*3) (13*2*3*5*7)
f -10 12345

With the output,

gcd(10,9) = 1
gcd(19032,2730) = 78
gcd(-10,12345) = 5
ideone.com/hHN7j

Monday, August 16, 2010

Using the sieve to factor numbers

Previously, we saw an overly simple implementation of the Sieve of Eratosthenes. Without any optimizations, that code is pretty much only good for puzzles and learning. Even for puzzles, it may be too slow. Some of the puzzle problems out there require code to find the prime factorization of numbers. For small numbers, trial division might be ok. Often, the problems require the factorization for every number less than one million. Then... it gets slow. Using the same idea as the Sieve, we can prepare an array that contains a prime factor for every number, or one if the number is prime. Then, to get the full factorization of the number, we can walk the array. Here is the code to create the array:
def divisors(N):
    """return the biggest prime divisor for numbers less than N"""
    result = [1 for x in xrange(N)]
    for d in xrange(2,N):
        if result[d] > 1:  continue
        for j in xrange(d, N, d):
                result[j] = d
    return result
 
Then, to find the factorization of a number, we can look it up in the array to find a factor, then look up (n/p) in the array to find the next factor, and keep going until we are down to a prime number.
divs = divisors(2000)
 
def factor(n, divisors=divs):
        result = []
        while n > 1:
                p = divisors[n]
                if p == 1:
                        result.append(n)
                        return result
                while n % p == 0:
                        result.append(p)
                        n /= p
        return result
        
print 1000, factor(1000)
print 24, factor(24)
print 17*31, factor(17*31)
 
Often, the factorization is used for things like computer Euler's totient function, and then we don't care to return the factors, just the result of using them. I used ideone.com to work on this code: Ideone.com | RLRZt

Saturday, August 14, 2010

Sieve of Eratosthenes

Here is a simple implementation of the Sieve of Eratosthenes in Python.

from math import sqrt
def sieve(N):
    """Sieve of Eratosthenes.  Returns primes less than N

    http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
    """
    # Set up the array.  2 is the first prime, not 1.
    ar = range(N)
    ar[1] = 0

    limit = 1 + int(sqrt( N))
    for d in xrange(2,limit):
        if ar[d]:
            # d is prime!  mark off the multiples of d.
            # note that we can start with d*d instead of 2*d.
            for j in xrange(d*d,N,d):
                ar[j] = 0
    return [x for x in ar if x]
There are several optimizations that people tend to do when they write this algorithm. But, this gets us most of the way. I keep rewriting this function for Project Euler problems, but I should really keep a library of the functions I use, or better yet, use Sage.

Friday, August 13, 2010

Simple prime test in F#

Here is the same algorithm as the last post , but in F#. There is no way to break out of a for loop in F#, besides throwing an exception, so this code uses recursion instead.
let is_prime n =
    if n < 2 then false
    else
        let limit = float n |> sqrt |> fun x -> x + 1. |> int
        let rec loop d =
            if d = limit then true
            elif n % d = 0 then false
            else loop (d + 1)
        loop 2

> [1..100] |> List.filter is_prime;;
val it : int list =
  [2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41;
   43; 47; 53; 59; 61; 67; 71; 73; 79; 83; 89; 97]

Thursday, August 12, 2010

Simple prime test in Python

A simple way to test if a number is prime is to check if the number is divisible by any other number besides one and itself. This can take a lot of time, but for small numbers, it might be ok.
def is_prime(n):
   # assumes n > 0
   if n < 2: return True
   for d in range(2,n):
       if n % d == 0: return False 
   return True
This simple function can be sped up by not checking as many numbers. We don't need to test all divisors up to n-1. The largest possible divisor is n/2, so we could stop there. But, even better, we can show that if there are any divisors, there must be divisor less than the sqrt(n). If p*q = n, then either p or q must be less than or equal to sqrt(n). Otherwise, p*q would be bigger than n. So, we can just loop up to sqrt(n) because we only need to find a divisor of n, not every divisor. Here is the updated code:
import math
def is_prime2(n):
    if n < 2: return False
    for d in xrange(2, int(math.sqrt(n)) + 1):
        if n % d == 0: return False
    return True

>>> [x for x in xrange(40) if is_prime2(x)]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

Finally, we could use the same code to find a divisor of a number. In fact, it finds the smallest divisor. Some of the more advanced primality tests do not determine any of the factors of the number.
def is_prime3(n):
    """returns the smallest divisor of n.  
       returns 1 if the number is prime, or less than 2"""
    if n < 2: return 1
    for d in xrange(2, int(math.sqrt(n)) + 1):
        if n % d == 0: return d
    return 1