25/07/2014

The Difficulties of Designing a Graph Library


I am currently building a graph library for the language Java, and as such I am faced with several tantalizing design questions, to which I'll try to give an overview here and the response I came up with.

Since my Master thesis till my PhD I have been working with my very own libraries to support my inference algorithms, which I kept expending, improving and redesigning, to a point I believe is ripe for diffusion and use.

I here review over some of the difficulties and choices one is faced in order to answer this fundamental and intricate question:

How to represent a graph?

while being:
- computationally efficient
- easily reusable by the end-users
- easily reusable by the developers

Here I am not talking about choosing a data structure to represent a graph ADT between an adjacency list graph, and incidence edge graph or an edge list graph.

Well, not only.

I mean how to design any implementation such that the three design constraints hold.

First and foremost: what is a graph?

Short and easy answer: It is a set of relations between objects.

Second: how to represent these relations?

We can have:
- for each object associated a set of all the related objects, this is an adjacency-based representation
- for each relation associated the set of objects within this relation, this is an incidence-based representation

Third: how to implement these representations?

adjacency-based representation:
- a list of objects for each objects: adjacency list
- a square matrix indexed by the objects: adjacency matrix

incidence-based representation:
- a list of objects for each relation: edge list
- a rectangular matrix indexed by the objects and the relations: incidence matrix

Fourth: where to store these information?

This is the actually the most fundamental question, because the whole design of the library will revolve around that. Where to store the information depends on how the information is used.

We could:
- store the list at the level of the objects: one list for each objects, nothing in the graph structure
- store the list at the level of the graph: nothing in the objects, one map object->list<object> in the graph structure

Fifth: how do we choose among all these options?

Implementation-wise, regarding representations and implementation, we don't: we should provide each representation, and let the users choose which one he wants to use according to its needs.

Usage-wise it's up to the performance for a particular usage, or the way a particular algorithm is expressed.

For instance Kruskal's algorithm for maximal spanning tree is based on edge list, Dijkstra's algorithm for shortest paths is based on adjacency list and the Floyd-Warshall algorithm for shortest paths is based on adjacency matrix.

No comments:

Post a Comment