Notes: July 2014

25/07/2014

The Difficulties of Designing a Graph Library

I am currently building a graph library for the language Java, and as such I am faced with several tantalizing design questions, to which I'll try to give an overview here and the response I came up with.

Since my Master thesis till my PhD I have been working with my very own libraries to support my inference algorithms, which I kept expending, improving and redesigning, to a point I believe is ripe for diffusion and use.

I here review over some of the difficulties and choices one is faced in order to answer this fundamental and intricate question:

How to represent a graph?

while being:
- computationally efficient
- easily reusable by the end-users
- easily reusable by the developers

Here I am not talking about choosing a data structure to represent a graph ADT between an adjacency list graph, and incidence edge graph or an edge list graph.

Well, not only.

I mean how to design any implementation such that the three design constraints hold.

First and foremost: what is a graph?

Short and easy answer: It is a set of relations between objects.

Second: how to represent these relations?

We can have:
- for each object associated a set of all the related objects, this is an adjacency-based representation
- for each relation associated the set of objects within this relation, this is an incidence-based representation

Third: how to implement these representations?

adjacency-based representation:
- a list of objects for each objects: adjacency list
- a square matrix indexed by the objects: adjacency matrix

incidence-based representation:
- a list of objects for each relation: edge list
- a rectangular matrix indexed by the objects and the relations: incidence matrix

Fourth: where to store these information?

This is the actually the most fundamental question, because the whole design of the library will revolve around that. Where to store the information depends on how the information is used.

We could:
- store the list at the level of the objects: one list for each objects, nothing in the graph structure
- store the list at the level of the graph: nothing in the objects, one map object->list<object> in the graph structure

Fifth: how do we choose among all these options?

Implementation-wise, regarding representations and implementation, we don't: we should provide each representation, and let the users choose which one he wants to use according to its needs.

Usage-wise it's up to the performance for a particular usage, or the way a particular algorithm is expressed.

For instance Kruskal's algorithm for maximal spanning tree is based on edge list, Dijkstra's algorithm for shortest paths is based on adjacency list and the Floyd-Warshall algorithm for shortest paths is based on adjacency matrix.

05/07/2014

Fan control on ASUS UX32VD under linux

Out of the box, the fan is either full speed or off. This is very annoying, but there exists several practical solutions. I here report my experience and how you can safely and easily fix the issue.

My configuration is an ASUS ux32vd with first ubuntu 13.10 and now 14.04.

In a nutshell

the problem disapear after running the provided program for a few days.

1) Introduction

The most comprehensive thread on the net about the issue is a very interesting read for technical details, but that you may want to skip in order to just get a working solution. In a nutshell it is possible to control the fan speed by writing in the memory of the embedded controller, however not doing it properly through the ACPI interface is hazardous.

Some integrated work around have been developed: among the most complete I found on the internet are one solution based on a kernel module, and here is another solution based on a shell script and acpi_call. They unfortunately lack documentation which can be an obstacle for the layman. I tried, used and modified the last one by Emil Lind, which is based on a shell script and the acpi_call module. I here report more information about how to properly use it, as well as my own modification.

2) What you need to know

First it has to be noted that contrary to what has been written elsewhere, what is controlled by writing to the embedded controller memory through ACPI calls is not the actual speed of the fan, but the maximal speed of the fan.

Secondly, the danger of running these programs is of not exiting them in a safe state, i.e. not to restore the automatic control of the fan and a maximal fan speed when they exit, which could theoretically bring the laptop to overheat.

If you have this in mind an ascertain that each time you use it it leaves properly you can safely use this program.

Thirdly, it has to be noted that after experimenting with this program, the problem disappeared: the fan where able to react proportionally to heat without any ad-hoc fan control loop in the background. So you can expect experimenting the same after a while.

3) How you can fix the fan

I here provide a heavily modified version of the shell program asusfanctrld of Emil Lind: it has a control loop which automatically switches between a set of presets RPM temperature thresholds. The thresholds have been determined empirically in order to minimise the switching between states and provide and overall unnoticeable and cool experience ;) The real fix is the following: for an unknown reason after a few days of use of the following program, the problem disapear by itself. It may reappear if you boot windows. In which case, running again the provided program will fix the issue.

3.1) In more details:
there are four predefined temperature thresholds and four predefined fan RPM values. When a threshold is hit from the upper of lower values, the fan speed is set to the corresponding RPM until another threshold, lower or higher, is reached. ACPI commands are written through the special device /proc/acpi/call provided by the acpi_call module kernel module that must be installed.

3.2) Install:
- step 1 : the acpi_call module has to be downloaded, compiled and loaded (from a root shell: insmod acpi_call.ko) prior to running the fan control loop, which is the most complicated task to be done to get a silent zenbook.
- step 2 : the following shell script as to be copied/pasted to a file, let us call it asusfanctrlnico.sh

3.3) Usage:
The script has to be run as root, leave it running in a windows to check everything runs smoothly until you either get confident enough in it or the fan problems goes away. In order to allow the hard drive to sleep you need to comment out the file logging commands in the main function.

It is not meant to be used by startup scripts. It however can with the appropriate kill commands added to the init.d or start-stop-daemon services (which I didn't as the problem went away after using the program for some days).

3.4) The code of asusfanctrlnico.sh :

ACPI_CALL=/proc/acpi/call

# average temperature without control:
# sleep ~ 27 - 30
# on, idle ~ 45 - 50
# internet browsing, text editing ~ 50 - 58
# temperature threshold (Celsius)

HIGH=70
MID=60
LOW=55
LOWEST=50
MARGIN=1

# corresponding fan RPM
LOWESTRPM=0x35
LOWRPM=0x50
MIDRPM=0x70
HIGHRPM=0xFF

DEBUGACPI=0
DEBUGBEHV=1

ME="$(basename $0)"

LOGFILE=asusfanlog
LOGFILETEMP=fantemplog

fatal() {
    logger -id -t $ME -s "FATAL: $@"
}
info() {
    logger -id -t $ME "INFO: $@  `date +"%H %M %S %s"`"
    echo "INFO $@  `date +"%H %M %S %s"`" >> $LOGFILE
    if [ "$DEBUGBEHV" -gt 0 ]; then
        echo ""
    fi
    echo -en "INFO $@  `date +"%H %M %S %s"`\n"
}
debug_acpi() {
    if [ $DEBUGACPI -gt 0 ]; then
        echo "DEBUG: $@" >> $LOGFILE
    fi
}
debug_behaviour() {
    if [ $DEBUGBEHV -gt 0 ]; then
        echo "DEBUG: $@" >> $LOGFILE
    fi
}


if ! [ -e $ACPI_CALL ]; then
    modprobe acpi_call
fi
if ! [ -e $ACPI_CALL ]; then
    fatal "You must have acpi_call kernel module loaded."
    fatal modprobe acpi_call
    exit 1
fi

if [ "$(id -u)" != "0" ]; then
    echo "Must be run as root."
    exit 1
fi

set_fanspeed() {
    RPM=$1
    command='\_SB.PCI0.LPCB.EC0.ST98 '$1
    echo "$command" > "${ACPI_CALL}"
    debug_acpi $(cat "${ACPI_CALL}")
}

set_fanstate() {
    current=$STATE
    asked=$1
    if [ "$current" != "$asked" ]; then
        info "reset fan state: $current => $asked"
        if [ "$asked" = "high" ]; then
            set_fanspeed $HIGHRPM
        elif [ "$asked" = "mid" ]; then
            set_fanspeed $MIDRPM
        elif [ "$asked" = "low" ]; then
            set_fanspeed $LOWRPM
        elif [ "$asked" = "lowest" ]; then
            set_fanspeed $LOWESTRPM
 else
      fail "error unknown state: $asked"
 fi
 STATE=$asked
    fi
}

set_auto() {
    command='\_SB.ATKD.QMOD 0x02'
    echo "$command" > "${ACPI_CALL}"
    debug_acpi $(cat "${ACPI_CALL}")
    command='\_SB.PCI0.LPCB.EC0.SFNV 0 0'
    echo "$command" > "${ACPI_CALL}"
    debug_acpi $(cat "${ACPI_CALL}")
    CONFIG="auto"
    STATE="auto"
}

set_manual() {
 TEMP=$1
 if [ $TEMP -ge $HIGH -o \( "$STATE" = "high" -a $TEMP -ge $[$MID+$MARGIN] \) ]; then
  set_fanstate high
 elif [ $TEMP -ge $MID -o \( "$STATE" = "mid" -a $TEMP -ge $[$LOW+$MARGIN] \) ]; then
  set_fanstate mid
 elif [ $TEMP -ge $LOW -o \( "$STATE" = "low" -a $TEMP -ge $[$LOWEST+$MARGIN] \)  ]; then
  set_fanstate low
 else
  set_fanstate lowest
 fi
}

set_fanmode() {
 asked=$1
 if [ "$asked" = "auto" ]; then
  set_auto
  CONFIG="auto"
  STATE="auto"
  info "set fan mode: auto"
 elif [ "$asked" = "manual" ]; then
  CONFIG="manual"
  STATE="auto"
  info "set fan mode: manual"
 else
      fail "error unknown mode: $asked"
 fi
}
fail(){
 info "$@"
 set_fanmode auto
 exit 1
}

fatal_callback() {
 fail "stop, fatal signal"
}
sigusr1_callback() {
 set_fanmode manual
 logger -id -s -t $ME "got SIGUSR1, invoking manual control of fans"
 info "siguser1 mode : auto => manual"
}
sigusr2_callback() {
 set_fanmode auto
 logger -id -s -t $ME "got SIGUSR2, setting mode to auto"
 info "siguser1 mode : manual => auto"
}

main() {
    trap "fatal_callback" INT TERM EXIT KILL
    trap "sigusr1_callback" SIGUSR1
    trap "sigusr2_callback" SIGUSR2

    info start
    set_fanmode manual
    if [ $DEBUGBEHV -gt 0 ]; then
        echo ""
    fi
    echo "`date +"%s"` CONFIG $LOWEST $LOW $MID $HIGH  $LOWESTRPM $LOWRPM $MIDRPM $HIGHRPM" >> $LOGFILETEMP
    LASTTEMPSTATE=""
    while :; do
        TEMP0=$[$(</sys/devices/virtual/thermal/thermal_zone0/temp)/1000]
        TEMP1=$[$(</sys/devices/virtual/thermal/thermal_zone1/temp)/1000]
 TEMP=$TEMP1
 if [ $DEBUGBEHV -gt 0 ]; then
            echo -en "\r${TEMP0} ${TEMP1} mode=$CONFIG state=$STATE"
 fi
        if [ "$CONFIG" = "manual" ]; then
  set_manual $TEMP
 fi
 TEMPSTATE="$TEMP0 $TEMP1 $CONFIG $STATE $(cat /proc/loadavg | cut -d\  -f 1,2,3)"
 if [ "$TEMPSTATE" != "$LASTTEMPSTATE" ]; then
  echo "`date +"%s"` $TEMPSTATE" >> $LOGFILETEMP
  LASTTEMPSTATE=$TEMPSTATE
 fi
        sleep 1
    done
}

set -e
main

04/07/2014

Goodness of fit test for log-normal and exponential distributions by bootstrapping

Stackexchange users be welcome to this page :), if you find the program useful or have suggestions, please feel free to leave a comment.

The poweRlaw R library provides the bootstrap_p function which allows to test the goodness of fit of a power law to the data using bootstrapping. Unfortunately the library does not provides such methods for other distributions. We here propose such functions for log-normal and exponential models.
The code is based on the principle of the paper by Clauset et al. 2009 to test goodness of fit by bootstrapping for the log-normal and exponential distributions and reuse some of the powerLaw functions.

I wrote it after realizing that the code provided on the page http://users.dimi.uniud.it/~massimo.franceschet/R/fit.html meant for the same purpose was incorrect: the way the goodness of fit is computed is not correct as it does not strictly follow the procedure described in the paper, and implemented in the poweRlaw package.
Specifically: the synthetic data generation procedure is half implemented, it does not search for the best xmin as provided by the extimate_xmin function of the poweRlaw package, and finally the ks.test discards all the ties, while powerLaw doesn't with its built-in KS test.

Usage: a poweRlaw discrete distribution object has to be given as the first parameter. Prints the p-value on the output. Note that following the paper, about 2500 simulations should be generated to get a 2 points precision.

Interpretation of the p-value: a p-value < 0.1 rejects the hypothesis that the tested law generated the data, while a value > 0.1 does not reject it, but does not disprove the model either.

Note that model comparison based on the likelihood ratio (like the Vuong test, which is implemented in the compare_distribution function of poweRlaw) is better able to tell which of two models better fits the data and if there are enough data to confidently claim so.

Code for log-normal distributions:

bootstrap_ln = function(m,nbsims=5) {
 xmin = m$getXmin()
 ntot = length(m$dat)
 n1 = sum(m$dat<xmin)
 n2 = ntot - n1
 print(n1)
 print(n2)
 print(ntot)

 start_time = Sys.time()

 m_est = estimate_xmin(m)

 end_time = Sys.time()
 total_time = difftime(end_time, start_time, units="secs")

 mean = m_est$pars[1]
 sd = m_est$pars[2]

 print(m_est$KS)
 print(unlist(m_est))

 print(total_time)

 count=0
 for(i in 1:nbsims) {
  # generate synthetic data set
  if(xmin>1) {
   syn1 = sample(m$dat[m$dat<xmin],n1,replace=TRUE)
  } else {
   syn1 = c()
  }
  # sample from distribution
  syn2 = c()
  while(length(syn2)<n2) {
   n2bis = n2-length(syn2)
   syn2bis = round(rlnorm(n2bis,meanlog=mean,sdlog=sd))
   syn2bis = syn2bis[syn2bis>0]
   syn2 = c(syn2,syn2bis)
  }  
  syn = c(syn1,syn2)
  print(length(syn))
  # estimate gof of syn data  
  msyn = dislnorm$new(syn)
  syn_est = estimate_xmin(msyn)
  print(syn_est$KS)
  print(unlist(syn_est))
  if(syn_est$KS >= m_est$KS) {
   count = count + 1
  }
 }
 print(count/nbsims)
}

example:

dsyn = round(rlnorm(1000,2.4,0.98))
dsyn = dsyn[dsyn>0]
length(dsyn)
e_lnsyn = dislnorm(dsyn)
e_lnsyn$setXmin(estimate_xmin(e_lnsyn))
bootstrap_ln(e_lnsyn)

Code for exponential distributions:


bootstrap_exp = function(m,nbsims=5) {
 xmin = m$getXmin()
 ntot = length(m$dat)
xmin
 n1 = sum(m$dat<xmin)
 n2 = ntot - n1
 print(n1)
 print(n2)
 print(ntot)

 start_time = Sys.time()

 m_est = estimate_xmin(m)

 end_time = Sys.time()
 total_time = difftime(end_time, start_time, units="secs")

 rate = m_est$pars[1]

 print(m_est$KS)
 print(unlist(m_est))

 print(total_time)

 count=0
 for(i in 1:nbsims) {
  # generate synthetic data set
  if(xmin>1) {
   syn1 = sample(m$dat[m$dat < xmin],n1,replace=TRUE)
  } else {
   syn1 = c()
  }
  # sample from distribution
  syn2 = c()
  while(length(syn2)<n2) {
   n2bis = n2-length(syn2)
   syn2bis = round(rexp(n2bis,rate=rate))
   syn2bis = syn2bis[syn2bis>0]
   syn2 = c(syn2,syn2bis)
  }  
  syn = c(syn1,syn2)
  # estimate gof of syn data  
  msyn = disexp$new(syn)
  syn_est = estimate_xmin(msyn)
  print(syn_est$KS)
  print(unlist(syn_est))
  if(syn_est$KS >= m_est$KS) {
   count = count + 1
  }
 }
 print(count/nbsims)
}