Monday, September 14, 2009

Wanda's Wisdom

You may be gone tomorrow, but that doesn't mean that you weren't here today.

Monday, September 7, 2009

Break the Cycle: Local Class Definitions

In the process of analyzing the memory behavior of your Python application, you will sooner or later stumble across reference cycles. It is generally a good idea to avoid creating reference cycles, though not every cycle is worth breaking (using weak methods, for instance, avoided some reference cycles but increased the invocation cost).

Debugging reference cycles can be simplified by creating a graphical representation of the reference graph, e.g. using graphviz. Marius Gedminas provides a set of tools to facilitate building graphs at his homepage. Similar facilities exist in Pympler. The latter has improved considerably since the official 0.1 release, so be sure to grab the version from the svn trunk.

When you know what objects are involved in cyclic dependencies you will want to know why these occurred in the first place, which is not always trivial to figure out. While working on the integration of Bottle (which is really great BTW) in Pympler, I stumbled across an interesting case:

import gc
gc.disable()

def f():
    class Foo(object):
        pass
f()

from pympler.gui.garbage import GarbageGraph
GarbageGraph(reduce=True).render('cycle1.png', format='png')


This snippet creates a reference cycle, which is rendered to cycle1.png.



Apparently, defining a class in the local scope of a function or method creates a reference cycle. Lifting the class definition to the module level avoids the reference cycle. It is even more interesting that class objects create reference cycles by design when they go out of scope:

>>> import gc
>>> gc.disable()
>>> class Foo(object):
...     pass
...
>>> del Foo
>>> gc.collect()
6


So what? Well, it is evidently beneficial to define classes in modules or other classes, and not in functions or methods.
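The effect can also be observed without Pympler. The following is a minimal sketch, assuming CPython; the helper name define_local_class is made up for illustration. It keeps only a weak reference to the local class and checks whether the class is still alive after the function has returned and before the cycle collector runs.

```python
import gc
import weakref


def define_local_class():
    # Hypothetical helper: defines a class locally and returns only a
    # weak reference, so no strong reference leaves the function.
    class Foo(object):
        pass
    return weakref.ref(Foo)


gc.collect()   # start from a clean slate
gc.disable()   # make sure the collector does not run behind our back
try:
    ref = define_local_class()
    alive_before = ref() is not None   # class survives: it is part of a cycle
    collected = gc.collect()           # number of unreachable objects found
    alive_after = ref() is not None    # class is gone once the cycle is broken
finally:
    gc.enable()

print(alive_before, collected > 0, alive_after)
```

If the class were freed by reference counting alone, alive_before would already be False. On CPython it stays True until gc.collect() is called.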

Tuesday, September 1, 2009

Substitute assert statements with unittest methods using vim

In the Python community, it's not generally agreed upon whether to use the assert statement or the assert* methods from the unittest module. As some commentators pointed out in a recent discussion, there are (a few) good reasons to prefer the assertion methods, e.g. better error messages.
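To illustrate the point about error messages, here is a small sketch. The helper message_of and the two check functions are made up for this example, and instantiating unittest.TestCase directly assumes Python 3. It runs the same failing comparison once as a plain assert statement and once through assertEqual, and captures the message of the resulting AssertionError.

```python
import unittest


def message_of(check):
    """Run check() and return the AssertionError message, or None if it passes."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return None


case = unittest.TestCase()


def with_statement():
    assert 1 + 1 == 3            # fails without any detail


def with_method():
    case.assertEqual(1 + 1, 3)   # fails and reports both operands


print(repr(message_of(with_statement)))
print(repr(message_of(with_method)))   # '2 != 3'
```

The plain statement only tells you that something was false, whereas the method reports the two values that differed.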

Here are some vim substitution commands that make the transition from assert statements to the appropriate methods easier:

:%s/assert \(.\+\) == \(.\+\)/self.assertEqual(\1, \2)/gc
:%s/assert \(.\+\) != \(.\+\)/self.assertNotEqual(\1, \2)/gc
:%s/assert \(.\+\)/self.assertTrue(\1)/gc
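For files that are easier to process in bulk, the same three rewrites can be sketched in Python with the re module. This is only an approximation of the vim commands above: the names RULES and convert_line are invented for this example, there is no interactive confirmation step, and assertTrue is used as the non-deprecated spelling of assert_. The order of the rules matters, since the generic pattern would also match the comparison forms.

```python
import re

# Most specific patterns first; the last rule catches everything else.
RULES = [
    (re.compile(r"\bassert (.+) == (.+)"), r"self.assertEqual(\1, \2)"),
    (re.compile(r"\bassert (.+) != (.+)"), r"self.assertNotEqual(\1, \2)"),
    (re.compile(r"\bassert (.+)"), r"self.assertTrue(\1)"),
]


def convert_line(line):
    """Apply the first matching rule to a single source line."""
    for pattern, replacement in RULES:
        new_line, count = pattern.subn(replacement, line)
        if count:
            return new_line
    return line


for source in ["    assert x == 1", "assert a != b", "assert ok"]:
    print(convert_line(source))
```

Like the vim version, the greedy patterns can mangle statements that carry a message (assert x == 1, "oops"), so the result still needs a manual review.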

Wednesday, August 19, 2009

Convert Images to A4 PDF

Converting raster images to PDF in a printable format can be achieved using the ImageMagick convert utility with the page parameter:

convert -page a4 *.png images.pdf


The converter, however, does not quite do what I expected. Images are resized to fill the A4 page, but the aspect ratio is preserved and no margin is added. This actually leads to differently sized pages for images with different ratios (which is common for scanned documents, for example).

In order to create equal-sized PDF pages from a bunch of images, a margin or border needs to be added to the images. Doing this manually is a cumbersome process. Therefore, I've written a little Python script which adds a (white) border to the individual images to enforce an aspect ratio compatible with A4 pages. The script creates a PDF file from a bunch of image files with uniform A4 page size:

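import sys
from subprocess import Popen, PIPE

# A4 page dimensions in millimeters; only their ratio matters here.
PAGE_WIDTH = 210.0
PAGE_HEIGHT = 297.0

files = sys.argv[1:-1]             # input images
output = sys.argv[-1]              # resulting PDF
tmp = ["a4%s" % f for f in files]  # names of the bordered copies
for f, t in zip(files, tmp):
    # identify prints e.g. "scan.png PNG 800x600 ..."; the third field is the size.
    p = Popen(["identify", f], stdout=PIPE, universal_newlines=True)
    dim = p.communicate()[0].split()[2]
    w, h = [float(d) for d in dim.split('x')]
    bw, bh = 0, 0
    if w / h < PAGE_WIDTH / PAGE_HEIGHT:
        # Image is narrower than A4: pad left and right.
        nw = PAGE_WIDTH * h / PAGE_HEIGHT
        bw = int((nw - w) / 2)
    else:
        # Image is wider than A4: pad top and bottom.
        nh = PAGE_HEIGHT * w / PAGE_WIDTH
        bh = int((nh - h) / 2)
    # Set the border color before applying the border to the loaded image.
    Popen(["convert", f, "-bordercolor", "white",
           "-border", "%dx%d" % (bw, bh), t]).communicate()
# Combine the bordered copies into a single PDF with A4 pages.
Popen(["convert", "-page", "a4"] + tmp + [output]).communicate()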


Save the script to img2a4pdf.py and invoke it like this:

python img2a4pdf.py *.png output.pdf


Maybe someone will find it useful.
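The border arithmetic is easier to check in isolation. Below is a small sketch that pulls it out into a function; the name a4_border is made up here, and the script above does the same computation inline. It returns the border in pixels that has to be added on each side so that the padded image has roughly the A4 aspect ratio.

```python
PAGE_WIDTH = 210.0   # A4 width in mm
PAGE_HEIGHT = 297.0  # A4 height in mm


def a4_border(width, height):
    """Return the (horizontal, vertical) border in pixels, per side."""
    if width / height < PAGE_WIDTH / PAGE_HEIGHT:
        # Too narrow for A4: widen the image.
        target_width = PAGE_WIDTH * height / PAGE_HEIGHT
        return int((target_width - width) / 2), 0
    # Too wide for A4 (or already matching): make the image taller.
    target_height = PAGE_HEIGHT * width / PAGE_WIDTH
    return 0, int((target_height - height) / 2)


print(a4_border(1000, 1000))  # (0, 207)
print(a4_border(500, 1000))   # (103, 0)
```

A square 1000x1000 image thus becomes 1000x1414 after padding, which is within a pixel of the A4 ratio; the small error comes from truncating the border to whole pixels.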

Wednesday, July 22, 2009

The Latest Kick

Once again, a news item on Tagesschau.de:

"It is obvious that our dairy farmers are suffering right now," she said. "This is about real people, not statistics on a sheet of paper."


A little further down:

The German Farmers' Association had seen the slaughter of cows as one way to clear the oversupply of milk on the European market: if one and a half million animals were killed, that could provide the kick needed to climb back out of the deep valley.


With a bit of luck, the milk in your coffee after the steak may soon come from the same animal. Bon appétit!

After all, the meat from one and a half million slaughtered cattle would not even last six months in Germany. Who wouldn't want to be a vegetarian?

Sunday, July 19, 2009

Inkscape PDF Export

Inkscape is a nice vector drawing program, especially for illustrating LaTeX documents. In the past, I've always exported to EPS first and then converted the docs to PDF using epstopdf. This way, the bounding box is confined to the actual region of interest.

Unfortunately, transparency information is lost in the process. What works, though, is to fit the page to the selection just before directly exporting to PDF. Go to File > Document Properties and press Fit page to selection on the Page tab.

Friday, July 17, 2009

Almost done

At last, I've completed my final thesis. Seven exciting years as a student passed almost too quickly. A week from now, it'll all be over. Finally, time for traveling, working on Pympler and SCons, and getting an interesting job.

In the process of finding a new home, our old server will probably be disconnected before very long. Therefore, it is time to find a new haven for ideas, thoughts and casual code snippets.