24/09/2014
Functional programming or Efficient programming?
The expressivity of functional programming is awesome: in a few lines, you can write complex functions. Plus it is much more secure than imperative programming. However this comes at a price, a performance price: everything has to be the result of a function call.
This means that if you want to compute both a shortest path between two node AND the cost of this path, you should theoretically call two different functions, that would do almost the exact same job. This is rudely inefficient and inelegant.
To overcome this issue, functional programming allows you to return tupples. The only catch with this approach is that the name of the function is less adequate: to be correct in the shortest path example it would be compute-shortest-path-and-associated-cost.
Imperative programming however does not elegantly solve this problem: to return multiple argument in C you have to use the arguments of the function - which is as simple and efficient that it is awful. On a side note in a logical programming language, i.e. Prolog, this is the only way possible to return arguments - no wonder I fucking hated it back at the uni.
In order to make my Java code better understandable, elegant and secure, I inserted in my graphical framework several usual features of functional programming language, absent in java. Notably: tupples and lambda functions. As a consequence I tend to use iterators more.
use of tupples
Tupples are used to return several argument from one function call, for instance again, a shortest path request. The alternatives would be to:
- return in the arguments (unthinkable: against code readability and clarity);
- store the relevant data (e.g. node parent lists and node costs) of the first function call in a global data structure which can be access by another function call (even more unthinkable: would be a total mess);
- returning a dedicated result structure for each function (unthinkable but a bit less: will bloat the framework with barely used classes);
- returning a hashtable containing the different information accessed by different keys (why not?).
A templated tupple structure is therefore a good compromise solution between the last two so-so solutions, it is a data structure that can be used by several functions, and the type information make it dedicated to the return data of this function.
However, because template have a precise length, it is necessary to create Tupple classes for each number of argument. We will have therefore Tupple2<K1, K2> for tupple of two objects, Tupple3<K1, K2, K3>, for tupples of three object, etc. In my implementation I stopped at three.
The only issue with it is that templated code is very verbose, but this is a problem of Java, not of this approach to tackling the problem.
use of lambda functions
The purpose of the support of lambda function is not to be fun to play around with when you don't know them but to provide code bricks that can be easily combined to perform complex and secure (and slow) operations.
In the framework lambda function have been used extensively to build the math.algebra section, whose most used member is the Semiring class.
This class is actually the very reason why lambda function where introduced: because the operation perform to optimize constraints, compute shortest path, make probabilistic inference, compute reachability all share the exact same mathematical structure, the one of a semiring, they just differ in the valuation and combination they use.
Therefore by providing a generic Semiring object we save a lot of coding by factoring it all in a few key functions and instantiations, this is exactly what I did in the framework.
The drawback of all this theoretical power is pragmatical slowness. I benchmarked simple usage of lambda functions (map/reduce + list/iterator + ad hoc lambda function) against the same functionality written with a proper loop (for + list/iterator + ad hoc loop code): lambda function execute about 3 times slower than the standard loop. This is why when I latter read that Scala was 3 times slower than Java I was everything but surprised: I knew exactly why.
the choice: lambda or not
I first knew about lambda calculus after I finished my studies. I was unofficially following a class about "mathematical foundations of logic" at Paris 7 university. This class blew my mind: we had an awesome teacher (Gille Dowek) for an awesome topic. (On a side note again, a very good profane read about this very topic is the Logicomix book).
Since then I wanted to study it further, and this meant implement it and play with it. This is what I did, and by thinking intensely in a functional way it gave me an even deeper understanding of what was going on in code, making structure apparent.
While the abstraction is powerful to handle an combine different pieces of code, it is performance-wise quite catastrophic. This is why I removed almost all usage of map/reduce loops in my code and replace them by straight loops. It was a bit of a hearthbreaker at the beginning, but I knew I was doing the right thing.
Now it is used in the code of the framework only in two place: one as a souvenir where it won't hinder performances, and one where the power of the abstraction is more interesting than speedup of regular loops as it positively strongly impact on code reusability, readability and quality.
In the end of the day, it is always setting the trade-off among what matters to you, but you need to have it clear first.
Subscribe to:
Posts (Atom)