Accumulo fails to start, Waiting for accumulo to be initialized

Also manifests as “Instance Id does not exist in Zookeeper”.

I was trying to set up a standalone instance of Hadoop + Zookeeper + Accumulo yesterday and found that after every reboot I needed to destroy and recreate my Accumulo instance, otherwise it wouldn’t connect to Zookeeper. It didn’t seem to make a difference whether I did a clean shutdown or not.

Turns out I didn’t read zoo.cfg very carefully:

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper

Make sure that your Zookeeper data is being saved in a spot where it won’t get blown away on every reboot!
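If you want to catch this before the next reboot does, here is a minimal sketch of a check, assuming zoo.cfg is a plain key=value file with # comments. The helper names read_data_dir and is_under_tmp are made up for this example, and treating only /tmp as volatile is an assumption about your system.

```python
from pathlib import PurePosixPath


def read_data_dir(cfg_text):
    """Return the dataDir value from zoo.cfg-style text, or None if it is absent."""
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dataDir":
            return value.strip()
    return None


def is_under_tmp(data_dir, volatile_root="/tmp"):
    """True if data_dir is volatile_root itself or lives somewhere below it."""
    path = PurePosixPath(data_dir)
    root = PurePosixPath(volatile_root)
    return path == root or root in path.parents


if __name__ == "__main__":
    # Hypothetical config text; in practice read your real zoo.cfg instead.
    sample = "tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
    data_dir = read_data_dir(sample)
    if data_dir is not None and is_under_tmp(data_dir):
        print("dataDir is under /tmp and may not survive a reboot:", data_dir)
```

Point it at the text of your own zoo.cfg; if it complains, move dataDir somewhere persistent before you initialize Accumulo.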

MongoEngine Class Diagram Tool

I just put together a quick tool to generate UML Class Diagrams from a MongoEngine document schema. It’s a pretty convenient way to get a big-picture overview of a project’s data model.

The project is available on GitHub. Instructions are in the README file.

Thanks to my employers at PHEMI Health Systems for allowing me to push this code to the open-source community.  And thanks to the MongoEngine team for such a useful project!

Fixing missing dropdown button in Chrome date input

If you’re running the current version of Chrome (25.0.1364.97 as of this writing) you might find that the dropdown button is not showing properly on date input fields.  What’s actually happening is that the dropdown is getting rendered below the date input, which can make it invisible if your control isn’t big enough.

Here’s the CSS workaround I use to correct this:

input[type=date]::-webkit-calendar-picker-indicator {
 display: inline-block;
}

input[type=date] {
 min-height: 24px;
 vertical-align: top;
 white-space: nowrap;
}


Rendering HTML as PDF in Python

Weasyprint is the best Python tool I’ve found to convert HTML to PDF.  It’s well maintained, with frequent releases, great documentation, and a solid rendering engine.  But for some reason it’s really hard to find when doing a Google search for “python html pdf converter”.  So here’s my best effort at giving it a little googlejuice: use Weasyprint!

Ironically, the top hit for the search “python html pdf converter”, xhtml2pdf, is a real mess. Poorly maintained, unclear ownership, sparsely documented.

There’s no reason to use anything except Weasyprint to convert HTML documents to PDF from Python!

LDAP group membership not updating?

If you are new to LDAP and have to set up an LDAP server for your organization, you may find that changes you make on the server sometimes do not show up when you query from a client.

For instance, maybe you’ve defined a set of POSIX groups and have begun adding members to those groups with the memberUid attribute.  And then when you go to one of your LDAP clients and run ‘groups’ or ‘id’ or ‘getent group’ you don’t see the group membership you just set up.

Check if you’re running nscd.

ps -ef | grep nscd

If you are, restart it.

sudo /etc/init.d/nscd restart

Much better, right?

Producer/Consumer pattern with ZeroMQ

If you’re using ZeroMQ to distribute jobs across a bunch of worker processes, keep in mind that the examples given here…

…don’t include any flow control.  Therefore, if your producer is producing messages much faster than your consumers can consume them, the messages will buffer.  And buffer.  To the point that you will either run out of memory, or your workers will start to page out and their performance will degrade badly.

The good news is, there’s an easy fix for this.  Use ZMQ’s “High Water Mark” feature to implement blocking on PUSH- and PULL-type sockets.  In pyzmq, use the method:

socket.setsockopt(zmq.HWM, LIMIT)

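Set this on the sockets for both the producer and the consumers, where LIMIT is the maximum number of outstanding messages.  Note that zmq.HWM is the ZeroMQ 2.x option; ZeroMQ 3.x and later split it into zmq.SNDHWM and zmq.RCVHWM.  I’ve set PRODUCER_LIMIT to n*CONSUMER_LIMIT, where n is the number of consumers, and that seems to work pretty well.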
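Here’s a minimal sketch of that setup, assuming a pyzmq version that provides Socket.set_hwm(), which sets the send and receive high water marks where the underlying libzmq has split them. The endpoint name, the limits, and the helper names bounded_socket and demo are made up for illustration, and both ends run in one process over inproc only to keep the sketch self-contained; in real use the producer and consumers would be separate processes.

```python
import zmq

CONSUMER_LIMIT = 100   # hypothetical cap on messages queued per consumer
NUM_CONSUMERS = 4      # hypothetical number of worker processes
PRODUCER_LIMIT = NUM_CONSUMERS * CONSUMER_LIMIT


def bounded_socket(context, socket_type, limit):
    """Create a socket whose message queues hold at most `limit` messages."""
    sock = context.socket(socket_type)
    # The high water mark only applies to connections made after it is set,
    # so set it before calling bind() or connect().
    sock.set_hwm(limit)
    return sock


def demo(endpoint="inproc://jobs"):
    """Push one job through a bounded PUSH/PULL pair and return what arrived."""
    context = zmq.Context()
    producer = bounded_socket(context, zmq.PUSH, PRODUCER_LIMIT)
    producer.bind(endpoint)
    consumer = bounded_socket(context, zmq.PULL, CONSUMER_LIMIT)
    consumer.connect(endpoint)

    # send() blocks once the high water mark is reached, which is the
    # flow control the producer needs.
    producer.send(b"job-1")
    job = consumer.recv()

    producer.close()
    consumer.close()
    context.term()
    return job


if __name__ == "__main__":
    print(demo())
```

With the limits in place, a producer that gets too far ahead of its workers simply waits in send() instead of filling up memory.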

Updated RTree Implementation

I’ve updated my Java R-Tree implementation on GitHub.  These changes fix a few bugs and add one feature:

  1. Ensure that the bounding boxes are tightened properly when nodes split.
  2. Correctly calculate new dimensions in tighten().
  3. Ensure that a non-leaf root node always has at least two elements.
  4. Add implementations of QuadraticPickSeeds and QuadraticPickNext.

Also updated the test suite.  There are no known bugs in the current implementation; please get in touch if you find any problems.

Setting compiler options with SBT

There’s a lot of misinformation out there about how to set javac and scalac compiler options in sbt. You might see stuff about defining a custom Build object in a .scala file, or a custom Project object. It’s all nonsense; I guess it applies to previous versions of sbt. In sbt 0.11.2, you set javac and scalac compiler options as shown on the sbt examples page here:

[gist id=2363470]