Sunday, 16 November 2008

Another day, another blog.....

I have another blog over at wordpress now; a nice light theme to contrast with this one. :)

Tuesday, 22 April 2008

Python Homework

At work, Python is one of many languages we utilize. In fact, Python is one of the APIs we offer for accessing our services.

We generate our Python SDK using our (sort of) open source SDK generation system, which is hosted by our friends at osmosoft. You can find the SDK generation code here.

Anyway, there has been much debate about langauge proliferation within our group. As one of the most enthusiastic exponents of a polyglot programming, I perform quite a bit of evangelism within our group. Most recently, I've started the work of setting up a Python focus group, and in the course of doing so, have begun setting some "homework" for some of my keen fellow engineers.

Always ahead of the game, Kerry has already started blogging about some of these exploits. The second exercise is an interesting one. I'll discuss the thinking behind the exercise, followed by my solution.

Here's the exercise itself:

Write a recursive function that calculates the value of Fibonacci numbers. These are your acceptance criteria:

  • Calculating fib(250) must return 7896325826131730509282738943634332893686268675876375
  • The function must use recursion. No intermediary data structures, etc.
  • The implementation must be written in pure python – no C extension modules, that's cheating.
  • The function must calculate the 250th Fibonacci number in under one second. You will get extra points if:
  • You can also demonstrate a proof for the Reciprocal Fibonacci constant, meeting the following conditions
    • Your proof must also run in under one second
    • Your proof must not duplicate any of the concerns addressed by your original Fibonacci function implementation.
    • Your proof is allowed to call into the Fibonacci function though!
The reciprocal Fibonacci constant is defined here. Here's a couple of hints about things to look for:
  • Memoization
  • Function decorators

Well now. What's the point of this exercise?

The point here is to get people thinking about function composition. Re-use is so often quoted as being important in our work, and yet in our industry at large, we often see developers failing to apply even basic functional (de!)composition, let alone OOP and other high level programming paradigms.

Properly composed functions are a basic unit of abstraction in most languages, and Python is no exception.

Implementation

The performance requirements of the spec are really a red herring. The key to this exercise is to avoid duplicating the performance enhancing code, which is required to avoid the slowdown (and possible program failure) resulting from excessive recursion. I decided to abstract a simple caching algorithm as a function decorator, which is then applied to both recursive functions (fibonacci and the fib' constant implementation). You could use the 'memoized' decorator anywhere you have a function that is referentially transparent. Here's the code:

#!/usr/bin/env python

def memoized(fun):
  target = fun
  cache = {}
  def memoized(*args, **kwargs):
      if cache.has_key(args[0]):
          return cache[args[0]]

      result = target(*args, **kwargs)
      cache[args[0]] = result
      return result
  return memoized

@memoized
def fib(n):
  if n < 2: return n
  return fib(n - 1) + fib(n - 2)

@memoized
def fibConstant(n):
  if(n == 1):
      return (1.0 / fib(n))
  else:
      return (1.0 / fib(n)) + fibConstant(n - 1.0)

if __name__=='__main__':
  import profile
  profile.run('print fib(250)')
  profile.run('print fibConstant(93.0)')

Monday, 11 February 2008

I recently noticed this post and wanted to make some general comments about Erlang, functional programming and OO. I've decided to address some of these issues over the coming weeks, as much of this debate is raging in my workplace currently. But for now, I'd like to address a question raised some weeks ago by one of my co-workers. Along with some fellow Erlangers, I was demonstrating Erlang and OTP to selected members of our 60 strong engineering group during a planning meeting in Dublin. One of the attendees asked a telling question: "what about when things change." Reading between the lines, I could tell that my friend and colleague was really asking about the ease with which you can modify (and thereby maintain and potentially extend) Erlang code. His point of comparison was Java or, to be less specific, languages/platforms that support the Object Oriented paradigm.

The ways in which an object oriented language/platform facilitates change are many, but perhaps the best known of these is polymorphism. The ability to change behaviour at runtime by supplying an invocation target which shares a common, base interface, but overrides the target function with its own implementation: this "ability" is not unique to OO platforms at all. The mechanism used by languages such as C++, Java and .NET varies a little, but the terminology does not. In a langauge that uses manifest typing, such virtual resolution of an invocation target, relies on a mechnism known as dynamic binding (sometimes also called dynamic dispatch). Functional languages are not without support for changing their invocation targets at runtime, but in general, use a different set of mechanics to do so. The strongly typed, non-strict, functional programming langauge Haskell, for example, has full support for polymorphic types and functions, but the mechanics of polymorphism are wildly different to those used by imperative languages.

Another set of langauges also support polymorphism, without the use of parameterized types and type constraints found in Haskell and minus the implementation and/or interface inheritance requirements of C++, Java and the .NET family of languages. These dynamically typed languages include Python, Ruby, and Perl, to name but a few. Such langauges support the kind of ad-hoc polymorphism found in other imperative, Object Oriented langauges. It is both to this family of langauges, as well as the family of functional programming langauges, that Erlang belongs.

Erlang does support polymorphism, as do most other functional languages. Erlang is not strongly typed, it is dynamically typed; In this respect, Erlang is unlike the Algol familiy of langauges to which Java and C# belong. This difference is not as apparent as Erlang's functional heritage, but can be just as conceptually tricky to understand if you have little or no background in dynamic languages. The polymorphism offerred by dynamically typed langauges isn't based on the 'class' of a reference, but rather on the messages being passed; A concept sometimes called duck typing. Because Erlang's type system is not rooted in the mechanics of classification and implementation inheritance, it is sometimes difficult to see that duck typing is there, just beneath the surface. I suspect this is why some folks who're apparently from a dynamic-langauges background fail to see it.
The idea behind duck typing is that you can send a message to an object regardless of (the object's) type, as long as the callee can respond to this message. Here's a mind blowingly simple example:

#!/usr/bin/env python
# demo.py

class Foo:
    def quack(): print 'Foo quacks like a duck'

class Bar:
    def quack(): print 'Bar quacks like a duck'

def test(duck):
    duck.quack()

foo = Foo()
bar = Bar()
[ test(duck) for duck in [ foo, bar ] ]
You'll note that Foo and Bar aren't part of the same inheritance hierarchy! This doesn't matter here, as we only care that the message being sent is supported by the callee. Now to Erlang, which supports polymorphism like this in two ways. The first is simple, and involves calling into modules:
%%% here's our consumer
consumer(Mod) -> Mod:execute().
This consumer's use of Mod is polymorphic, given that you can pass any module (i.e. an atom which is a module name) which supports a function with signature execute/0. There's no type constraint here that stops us from passing in any old nonsense (like a list or number) instead of a module name. You can use a combination of pattern matching and guards to enforce some degree of type safety if you need it (which you will sometimes and wont others). The only constraint we can really apply here is that Mod is an atom, unless we create a guard function that tries to evaluate the atom and see if we can treat it like a module with a given function. For example:
%%% We want to ensure that ....

-module(demo).
-compile(export_all).

is_compatible_module(CallHandlerMod) -> lists:any(
    fun({call,4}) -> true;
       (_)        -> false
    end,
    CallHandlerMod:module_info(functions)
).

make_call(CallHandlerMod, CallData)
    when is_atom(CallHandlerMod) ->
        make_call(is_compatible_module(CallHandlerMod), CallHandlerMod, CallData).
make_call(true, CallHandlerMod, CallData) ->
    %% do some work, and then
    CallHandlerMod:call(13, CallData, "ignored", "deleted");
make_call(false, _, _) -> false.

Now let's consider receiving messages from other processes. When you send a message (asynchronously) to another process, you pass the message to a process identifier (Pid) using the send operator, like so:

SomePid ! { message, "a message..." }

You can also call a registered process simply by using the atom it was registered with. Unless we're working with a registered process, we'll typically be passed the Pid either as a function argument, like so:

someFunction({Name, SenderPid})
    ->
        %%% ... some more code....
        SenderPid ! {message, Data}.
or in an incoming message, where the sender passed its own Pid (self) like so:
TargetPid ! {self(), {message, Data}}
which is reused by the receiver later on:
someFunction() ->
    receive
        {SenderPid, {message, Data}}
            -> %%% some code...
            SenderPid ! {response, Response}
    end.
Note that there is no 'hard' contract between sender and reciever, except for the structure of the data being passed between the two. This isn't really much diferrent from the inter-module calls we made earlier. Here, instead of a module name passed in as the "Mod" argument, we have a process identifier with which we interact. Instead of a function signature consisting of modulename:function-name/arity, we have a signature made up entirely of the kind of data structure the callee will accept. If the caller passes, for example, the tuple {foo, bar}, then several things can happen:
  • If the callee understands this message, the call effectively succeeds. Just as with a polymorphic method call, we don't know what the implementing class will actually do!
  • If the callee ignores this message (there are several mechanisms for doing so), then the call is ignored. Note that this option (silent failure) is dependant on the server (e.g. the callee), not the client.
  • If the callee decides to explode on receiving unrecognisable mail, the usual trap_exit Erlang error handling dance ensues.

Another option, if the sender supplied their own Pid, is for the server (the callee) to respond with an {i_dont_understand_you} message, leaving the client to decide what's appropriate. This mechanism also allows for changing recievers dynamically. You can pass any process identifier you like (even changing the server Pid at runtime) and the client code doesn't need to change to accomodate this. This decoupling is far looser than a direct method call, even when it is dispatched polymorphically. Erlang systems are typically highly decoupled and this is not only due to the nature of asynchronous message passing, but also because of Erlang's dynamically typed nature.

In a subsequent post, we'll look at how inter (erlang) process messaging can be used to implement continuations.