Life Codecs @ NamingCrisis.net

Ruminations. Reflections. Refractions. Code.

Jan 26, 2010 - how to software dev

fold / reduce

Recently, I’ve been messing around with a fair bit of Groovy, a little bit of Scala, and a background-favourite (whatever that means), Python.

All 3 languages have functional traits, one functional idea is reduction where a collection of items is operated upon and finally reduced (hence the name) into a single value. From a practical viewpoint, this could be an overall analysis derived for all items in that collection – a sales percentage, a sum, or some other statistic. I am not a functional programming guru, so apologies for the rudimentary description. I was curious to see how each language handles it, and how I might write an implementation of reduce().

In particular I came across Scala’s List.foldLeft(). It also has a corresponding foldRight (and both functions have shorter operator notations :/ and :\ — which look like emoticons as I type this). Right and left determine where in the list it starts converging/reducing. Left starts from the start of the list (if you imagine the list horizontally rather than vertically, with the first/top most item on the left), while right starts at the bottom/end of the list. Depending on values in the list, and/or operations performed on them, the traversal order could obviously impact the result. Thus we note that fold-left and fold-right are basically types of reduce.

On to some code. I’ll keep it simple, albeit a little boring and predictable, the reduction will yield a sum of all elements in the list (yawn…). We’ll see some interesting properties of this simplicity however, and why it matters. I’ll start off with using the stock API for each language, and then roll my own. In all cases, essentially the following happens:

  • Decide on a collection of items (usually repeatably ordered!) to reduce
  • Apply the fold-left/reduce call which takes in an initial value to be used as we wish, in our case it will be a total result, initialised to zero
  • The fold-left/reduce call also takes a closure which is called for each item in the collection. The closure must return/yield a single value each time it is called (i.e. for each item in the collection), and takes 2 parameters, the running total and the current item being iterated over in the collection. I say ‘running total’ for our example, but more generally: for the first item in the collection, it is passed in our initial value, for subsequent items it is passed in the last ‘reduced’ value, i.e. the value that it yielded in the previous iteration.
  • A value is finally yielded and returned to the caller after the last item in the collection has been visited. In our case, the total.

Scala:

1
assert(List(1, 2, 3).foldLeft() { (total, item) => total + item } == 6)

Groovy:

1
2
3
4
5
6
7
assert [1, 2, 3].inject() { total, item -> total + item } == 6
/*
Note: inject() is on the (GDK) Collection supertype, java.lang.Collection.inject(), see:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#inject(java.lang.Object value, groovy.lang.Closure closure)
This is interesting, because a HashSet is a collection, and it's ordering is arbitrary, which means
you want to be careful if your collection is a Set, and your operation expects an ordering!
*/

Python:

1
2
# reduce(func, sequence, init)
assert reduce(lambda total, item: total + item, [1, 2, 3], ) == 6

A Few Notes on Syntax (skip this section as needed):

For the Scala and Groovy versions, note the syntactic sugar, both inject() and foldLeft() are functions that take 2 parameters, the initial value and a closure. It’s just that the closure can be defined as if it is ‘attached’ or ‘hooked to’ to the function rather than passed in, rather nice. With Groovy, the closure arg has to be the last one to allow this syntax (otherwise you can pass it in ‘foo(“bar”, { /*closure */ }, “meow”)’ like a regular parameter. In Scala, it depends on how the closure parameter is defined – in the case of foldLeft it is in its own parameter list, and so is not part of the initial set of args. I’m still a Scala n00b however, so I’ll stop there!

The Python case looks a little funnier, I am using the lambda form to create a new anonymous function, I could have also defined a function in the scope, and passed its name in as the first argument. This would be needed if I had something more involved in the function which the lambda form could not accomodate or accomodated hell-ishly (its body has to be a single expression only, no statements). Python doesn’t support more complicated anonymous functions as a language construct (there are code-level hacks though).

Implementing fold-left:

Now let’s try re-implementing fold-left using Python for clarity. Python has the idea of ‘sequences’ – positionally-ordered collections (from Learning Python 3E) – lists, strings, and tuples. However I am not sure why there’s no specific ‘sequence’ type, because this would have allowed setting a foldLeft/reduce call on this supertype leading to a more OO feel where you could do sequence.foldLeft(…). Instead the approach is much more procedural as you saw above. I’ll stick to the procedural approach for my implementation so arbitrary sequences can be passed in generically. I have not had a look at the source code for foldLeft in the languages above, I shall soon, however here’s my go at it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
def foldLeft(sequence, reducefunc, initval):
    if not sequence: return initval
    for i, item in enumerate(sequence):
        return foldLeft(sequence[i+1:], reducefunc, reducefunc(initval, item))

def foldLeftMutable(sequence, reducefunc, initval):
    if not sequence: return initval
    reduced = initval
    for item in sequence:
        reduced = reducefunc(reduced, item)
    return reduced
 
# Test it.  
adderfunc = lambda total, item: total+item
# immutable version    
assert foldLeft([1, 2, 3, 4, 5, 6], adderfunc, ) == 21
assert foldLeft([], adderfunc, ) ==
assert foldLeft(None, adderfunc, ) ==
# mutable version
assert foldLeftMutable([1, 2, 3, 4, 5, 6], adderfunc, ) == 21
assert foldLeftMutable([], adderfunc, ) ==
assert foldLeftMutable(None, adderfunc, ) ==

There are 2 versions of the fold-left functions above, foldLeft() and foldLeftMutable(). The latter holds mutable state, albeit localised to the function (the localvar reduced), so probably no big deal (considered a non-functional implementation though?). Both functions have O(N) complexity1. I initially thought I had something exponential for the recursive version, but then realised that I was returning at each iteration, phew! Of course, O(N) also assumes that reducefunc here is O(1)! foldLeft() though holds zero mutable state even within the function. The sequence is sliced resulting in a new sequence each time before the recursive call. This is probably overkill, resulting in new objects each time – but kinda cool anyway from an immutability viewpoint. Needless to say that immutability is based on the assumption that items in the sequence are also immutable, or if mutable, are not being messed around with by the passed in reducefunc argument (in this case adderfunc). But as far as the direct objects that foldLeft() deals with – the sequence, initval, and reducefunc – it in no way changes them.

foldLeft() is, however, I think tail-recursive (another thing I need to read up on!) so it could just be converted to a plain iterative version by the bytecode compiler which would result in something similar to foldLeftMutable(), any specific comments on this would be highly appreciated. Anyway, bed time.

Footnotes:

1 I just realised that slicing the list to produce a new list is likely to be O(N) rather than a constant time operation, which would invalidate my claim that the recursive version is also O(N), grrrr. Now I really am off to sleep.