Big Data at SXSW (begin shameless plug)

On Monday March 14, I’ll be presenting at a SXSW workshop called “Big Data and APIs for PHP Developers”, along with several co-presenters.

We’ll be talking about what Big Data is, how to work with it, Big Data APIs (how to design and implement your own, and how to consume them), data visualization, and the wonders of MapReduce. I’ll talk through a case study around Socorro: the nature of the data we have, how we manage it, and some of the challenges we have faced so far.

Workshops are new at SXSW. They are longer than the traditional panel – 2.5 hours – so we can actually get into some technical content. We plan on making our presentation a conversation about data, with plenty of war stories.

Hope to see you there!

Being Open

I was recently privileged to be invited to come and give a talk at AOL, on the work we do with Socorro, how Mozilla works, and what it means to be open.

The audience for the talk was a new group at AOL called the Technology Leadership Group. It consists of exceptional technical people – engineers and operational staff – from all parts of the organization, who have come together to form a group of thought leaders.

One of the first items on their agenda is, as Erynn Petersen, who looks after Developer and Open Source Evangelism, puts it: “how we scale, how we use data in new and interesting ways, and what it means to treat all of our projects as open source projects.” My task was partly to talk about how open source communities work, what the challenges are, and how AOL might go about becoming more open.

It’s amazing how things come full circle.

I think every person in the audience was a user of Open Source, and many of them were already Open Source contributors on a wide range of projects. Some had been around since the days when Netscape was acquired by AOL.

I’ll include the (limited) slides here, but the best part of the session in my opinion was the Q&A. We discussed some really interesting questions, and I’ll feature some of those here. (I want to note that I am paraphrasing/summarizing the questions as I remember them, and am not quoting any individual.)

Q: Some of our software and services are subscription-based. If we give that code away, we lose our competitive advantage – no one will pay for it anymore.

A: There are a bunch of viable business models that revolve around making money out of open source. The Mozilla model is fairly unusual in the space. The most common models are:

  • Selling support, training, or a built and bundled version of the software. This model is used by Red Hat, Canonical, Cloudera, and many others.
  • Dual licensing models. One version of the software is available under an open source license and another version is available with a commercial license for embedding. This is (or has been) the MySQL model.
  • Selling a hosted version of Open Source software as a service. This model is used by GitHub (git) and Automattic (WordPress), among others.
  • It’s also perfectly valid to make some of your software open and leave some proprietary. This is the model used by 37signals – they work on Ruby on Rails and sell SaaS such as Backpack and Basecamp.

Another point is that at Mozilla, our openness *is* our competitive advantage. Our users know that we have no secret agenda: we’re not in it for the money, but we’re also not in it to mine or exploit their personal data. We exist to care for the Open Web. There has been a lot of talk lately about this, best summarized by this statement, which you’ll see in blog posts and tweets from Mozillians:

Firefox answers to no-one but you.

Q: How do we get started? There’s a lot of code – how do we get past the cultural barriers of sharing it?

A: It’s easier to start open than to become open after the fact. However, it can be done – if it couldn’t, Mozilla wouldn’t exist. Our organization was born from the opening of Netscape. A good number of people in the room were at AOL during the Netscape era, too, which must give them a sense of déjà vu. After drafting this post I revisited jwz’s blog post about leaving the Mozilla project back in those days, and I recommend reading it, as it covers a lot of these issues.

My answer is that there’s a lot to think about here:

  • What code are we going to make open source? Not everything has to be open source, and it doesn’t have to happen all at once. I suggest starting up a site and repository that projects can graduate to as they become ready for sharing. Here at Mozilla, basically everything we work on is open source as a matter of principle (“open by default”), but some of it is more likely to be reused than other parts. Tools and libraries are a great starting point.
  • How will that code be licensed? This is partly a philosophical question and partly a legal question. Legal will need to examine the licensing and ownership status of existing code. You might want a contributors’ agreement for people to sign too. Licenses vary and the answer to this question is also dependent on the business model you want to use.
  • How will we share things other than the code? This includes bug reports, documentation, and so on.
  • How will the project be governed? If I want to submit a patch, how do I do that? Who decides if, when, and how that patch will be applied? There are various models for this ranging from the benevolent dictator model to the committee voting model.

I would advise starting with a single project and going from there.

Q: How will we build a community and encourage contributions?
A: This is a great question. We’re actually trying to answer this question on Socorro right now. Here’s what we are doing:

  • Set up paths of communication for the community: mailing lists, newsgroups, and discussion forums.
  • Make sure you have developer documentation as well as end-user documentation.
  • If the project is hard to install, consider providing a VM with everything already installed. (We plan to do this both for development and for users who have a small amount of data.)
  • Choose some bugs and mark them as “good first bug” in your bug tracking system.
  • Make the patch submission process transparent and documented.

There was a lot more discussion. I really enjoyed talking to such a smart and engaging group, and I wish AOL the very best in their open source initiative.

The new Socorro

(This post first appeared on the Mozilla Webdev Blog.)

As per my previous blog post, we migrated Socorro to the new data center in Phoenix successfully on Saturday.

This has been a mammoth exercise that sprang from two separate roots:

  • In June last year, we had a configuration problem that looked like a spike in crashes for a pre-release version of Firefox (3.6.4). This incident (known, inaccurately, as “the crash spike”) made it clear that crash-stats is on the critical path to shipping Firefox releases.
  • Near the end of Q3 we realized we were rapidly approaching the capacity of the current Socorro infrastructure.

Around that time we also had difficulty with Socorro stability. This spawned the creation of a master Socorro Stability Plan, with six key areas. I’ll talk about each of these in turn.

Improve stability

Here, we solved some HBase-related issues, upgraded our software, scheduled restarts three times a week, and, most importantly, sat down to conduct an architectural review.

The specific HBase issue that we finally solved had to do with intra-cluster communication. (We needed to upgrade our Broadcom NIC drivers and kernel to fix a known issue that surfaces when they are used with HBase: the number of TCP connections grows and grows until the box crashes. Solving this removed the need for constant system restarts.)

Architectural improvements

We collect incoming crashes directly to HBase. Because HBase is relatively new and we’d had stability problems with it, we decided to get HBase off the critical path for production uptime. We rewrote the collectors to fall back seamlessly to disk if HBase was unavailable, and, optionally, to use disk as primary storage. As part of this process, the system that moves crashes from disk to HBase was replaced: it went from single-threaded to multi-threaded, which makes playing catch-up after an HBase downtime much, much faster.
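
As a rough sketch of that fallback pattern (the real collectors are written in Python; this PHP illustration and its helper functions store_in_hbase() and store_on_disk() are hypothetical):

<?php

// Illustrative only: store a crash in primary storage (HBase), falling back
// to local disk if HBase is unavailable, or using disk as primary if asked.
// store_in_hbase() and store_on_disk() are hypothetical helpers.
function store_crash($crash_id, $dump, $use_disk_as_primary = false)
{
    if (!$use_disk_as_primary) {
        try {
            store_in_hbase($crash_id, $dump);
            return 'hbase';
        } catch (Exception $e) {
            // HBase is down or unreachable; fall through to disk.
        }
    }
    // A separate mover process later picks these up and loads them into HBase.
    store_on_disk($crash_id, $dump);
    return 'disk';
}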

We still want to put a write-through caching layer in front of HBase. This quarter, the Metrics team is prototyping a system for us to test, using Hazelcast.

Build process

We now have an automated build system. When code is checked into svn, Hudson notices, creates a build, and runs tests. The build is then deployed to staging. Now that we are on the new infrastructure, we will deploy new releases from Hudson as well.

Improved release practices

We have a greatly improved set of release guidelines, including writing a rollback plan for every major release. We did this as part of the migration, too. Developers now have read-only access to everything in production: we can audit configs and tail logs.

The biggest change here, though, is switching to Puppet to manage all of our configurations. Socorro has a lot of moving parts, each with its own configuration, and thanks to the fine work by the team, all of these configurations are automatically and centrally managed and can be changed and deployed at the push of a button.

Improved insight into systems

As part of the migration, we audited our monitoring systems. We now have many more Nagios monitors on all components and have spent a lot of time tuning them. We also set up Ganglia and have a good feel for what the system looks like under normal load.

We still intend to build a better ops dashboard so we can see all of these checks and balances in one place.

Move to bigger, better hardware in PHX

This one is the doozy. Virtually all Socorro team resources have been on this task full-time since October, and we managed to subsume half of the IT/Ops team as well. I’d like to give a special shout-out here to NetOps, who patiently helped with our many, many requests.

It’s worth noting that part of the challenge here is that Socorro has a fairly large number of moving parts. I present, for your perusal, a diagram of our current architecture.
Socorro Architecture, 1.7.6
I’ve blogged about our architecture before, and will do so again soon, so this is just a teaser.

One of the most interesting parts of the migration was the extensive smoke and load testing we performed. We set up a farm of 40 SeaMicro nodes and used them to launch crashes at the new infrastructure. This allowed us to find network bottlenecks and misconfigurations and to tune the system, so that on the day of the migration we were really confident that things would go well. QA also really helped here: with a huge number of automated tests on the UI, we knew that things were looking great from an end-user perspective.

The future

We now have a great infrastructure and set of processes to build on. The goal of Socorro – and Webtools in general – is to help Firefox ship as fast and seamlessly as possible, by providing information and tools that help to build the best possible browser. (I’ll talk more about this in future blog posts, too.)

Thanks

I’ll wrap up by thanking every member of the team that worked on the migration, in no particular order:

  • Justin Dow, Socorro Lead Sysadmin, Puppetmaster
  • Rob Helmer, Engineer, Smoke Tester Extraordinaire and Automater of Builds
  • Lars Lohn, Lead Engineer
  • Ryan Snyder, Web Engineer
  • Josh Berkus, PostgreSQL consultant DBA
  • Daniel Einspanjer, Metrics Architect
  • Xavier Stevens, HBase wizard
  • Corey Shields, Manager, Systems
  • Matthew Zeier, Director of Ops and IT
  • Ravi Pina, NetOps
  • Derek Moore, NetOps
  • Stephen Donner, WebQA lead
  • David Burns, Automated Tester
  • Justin Lazaro, Ops

Good job!
Mozilla Crash Reporting Team

PHP and Big Data

This post first appeared as part of the PHP Advent Calendar.

Big data, data science, analytics. These are some of the hottest buzzwords in tech right now. Five years ago, the boasting rights went to the geek with the largest number of users; these days, whoever has the biggest data wins.

There are a number of approaches to dealing with vast quantities of data, but one of the best known is Apache Hadoop. Hadoop is a toolkit for managing large data sets, based originally on the Google whitepapers about MapReduce and the Google File System. For Socorro, the Mozilla crash reporting system, we use HBase, a non-relational (NoSQL) database that is part of the Hadoop ecosystem.

The Hadoop world is largely a Java world, since all the tools are written in Java. However, if you feel the same way about Java as Sean Coates, you should not lose hope. You, too, can use PHP to work with Hadoop.

Let’s start by understanding MapReduce. This is a framework for distributed processing of large datasets.

A MapReduce job consists of two pieces of code:

A Mapper
The job of the Mapper is to map input key-value pairs to output key-value pairs.
A Reducer
The Reducer receives and collates results from Mappers.
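
As a concrete illustration, consider the word-count example we will build below. Given the input line “to be or not to be”, a Mapper emits one key-value pair per word; the framework then groups the pairs by key before handing them to a Reducer, which sums the values:

mapper input:    to be or not to be
mapper output:   (to, 1) (be, 1) (or, 1) (not, 1) (to, 1) (be, 1)
reducer input:   (be, [1, 1]) (not, [1]) (or, [1]) (to, [1, 1])
reducer output:  (be, 2) (not, 1) (or, 1) (to, 2)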

More parts are needed to make this work:

  • An Input reader generates splits of data for each Mapper to work through.
  • A Partition function takes the output of Mappers and chooses a destination Reducer.
  • An Output writer takes the output of the Reducers and writes it to the Hadoop Distributed File System (HDFS).

In summary, the Mapper and Reducer are the core functionality of a MapReduce job. Now, let’s get set up to write a Mapper and Reducer against Hadoop with PHP.

Setting up Hadoop is a non-trivial task; luckily, a number of VMs are available to help. For this example, I am using the Training VM from Cloudera. (You’ll need VMware Player for Windows or Linux, or VMware Fusion for OS X, to run this VM.)

Once you’ve started the VM, open up a terminal window. (This VM is Ubuntu based.)

The VM you have just installed comes with a sample data set: the complete works of Shakespeare. You’ll need to put these files into HDFS so that we can work with them. Run the following commands:

cd ~/git/data
tar vzxf shakespeare.tar.gz
hadoop fs -put input /user/training/input

You can confirm this worked by listing your HDFS home directory: hadoop fs -ls /user/training

Next, we need to create the mapper and reducer. To demonstrate these, we’ll reproduce what is often referred to as the canonical MapReduce example: word count.

You can find the Java version of this code in the Cloudera Hadoop Tutorial.

As you can see (if you know Java), the mapper reads words from input, and for each word it encounters, emits to standard output the word and the value 1 to indicate that the word has been encountered. The reducer takes output from mappers and aggregates it to produce a set of words and counts.

The easiest way to communicate from PHP to Hadoop and back again is using the Hadoop Streaming API. This expects mappers and reducers to use standard input and output as a pipe for communication.

This is how we write the word count mapper in PHP, which we’ll name mapper.php:

#!/usr/bin/php
<?php

$input = fopen("php://stdin", "r");

while ($line = fgets($input)) {
    $line = strtolower($line);
    // Split the line on non-word characters.
    if ($words = preg_split("/\W/", $line, -1, PREG_SPLIT_NO_EMPTY)) {
        foreach ($words as $word) {
            // Emit "word<TAB>1" for each word encountered.
            echo "$word\t1\n";
        }
    }
}

fclose($input);

We open standard input for reading a line at a time, split that line into an array along word boundaries using a regular expression, and emit output as the word encountered followed by a 1. (I delimited this with tabs, but you may use whatever you like.)

Now, here’s the reducer (reducer.php):

#!/usr/bin/php
<?php

$input = fopen("php://stdin", "r");
$counts = array();

while ($line = fgets($input)) {
    // Each input line is "word<TAB>count", as emitted by the mapper.
    $tuple = explode("\t", trim($line));
    if (!isset($counts[$tuple[0]])) {
        $counts[$tuple[0]] = 0;
    }
    $counts[$tuple[0]] += (int) $tuple[1];
}

fclose($input);

foreach ($counts as $word => $count) {
    echo "$word $count\n";
}

Again, we read a line at a time from standard input, and summarize the results in an array. Finally, we write out the array to standard output.

Copy these scripts to your VM, and once you have saved them, make them executable:

chmod a+x mapper.php
chmod a+x reducer.php
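
Before submitting the job to Hadoop, you can sanity-check the scripts with an ordinary shell pipeline, which mimics what Streaming does (map, sort by key, then reduce). This assumes the scripts are in your current directory and the extracted Shakespeare files are still in ~/git/data/input:

cat ~/git/data/input/* | ./mapper.php | sort | ./reducer.php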

You can run this example code in the VM using the following command:

hadoop \
jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+320.jar \
-mapper mapper.php \
-reducer reducer.php \
-file mapper.php \
-file reducer.php \
-input input \
-output wordcount-php-output

In the output from the command you will see a URL where you can trace the execution of your MapReduce job in a web browser as it runs. Once the job has finished running, you can view the output in the location you specified:

hadoop fs -ls /user/training/wordcount-php-output

You should see something like:

Found 2 items
drwxr-xr-x   - training supergroup          0 2010-12-14 15:40 /user/training/wordcount-php-output/_logs
-rw-r--r--   1 training supergroup     279706 2010-12-14 15:40 /user/training/wordcount-php-output/part-00000

You can view the output, too:

hadoop fs -cat /user/training/wordcount-php-output/part-00000

An excerpt from the output should look like this:

yeoman 13
yeomen 1
yerk 2
yes 211
yest 1
yesterday 25

This is a pretty trivial example, but once you have this set up and running, it’s easy to extend this to whatever you need to do. Some examples of the kinds of things you can use it for are inverted index construction, machine learning algorithms, and graph traversal. The data you can transform is limited only by your imagination, and, of course, the size of your Hadoop cluster. That’s a topic for another day.
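
As a taste of how you might extend it, here is a minimal sketch of a mapper for inverted index construction. It is purely illustrative: it assumes the Streaming API exposes the name of the current input file through the map_input_file environment variable (as contemporary Hadoop releases did), and a matching reducer would simply collect the unique file names seen for each word:

#!/usr/bin/php
<?php

// Hypothetical variant of mapper.php: emit "word<TAB>filename" pairs so a
// reducer can build an inverted index (word => documents containing it).
$file = getenv("map_input_file");
if ($file === false) {
    $file = "unknown";
}

$input = fopen("php://stdin", "r");

while ($line = fgets($input)) {
    $line = strtolower($line);
    foreach (preg_split("/\W/", $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        echo "$word\t$file\n";
    }
}

fclose($input);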

The future of crash reporting

This post first appeared in the Mozilla Webdev Blog on August 5 2010.

In recent blog posts I’ve talked about our plans for Socorro and our move to HBase.

Today, I’d like to invite community feedback on the draft of our plans for Socorro 2.0. In summary, we have been moving our data into HBase, the Hadoop database. In 1.7 we began exclusively using HBase for crash storage. In 1.8 we will move the processors and minidump_stackwalk to Hadoop.

Here comes the future

In 1.9, we will enable pulling data from HBase for the webapp via a web services layer. This layer is also known as “the pythonic middleware layer”. (Nominations for a catchier name are open. My suggestion of calling it “hoopsnake” was not well received.)

In 2.0 we will expose HBase functionality to the end user. We also have a number of other improvements planned for the 2.x releases, including:

  • Full text search of crashes
  • Faceted search
  • Ability for users to run MapReduce jobs from the webapp
  • Better visibility for explosive and critical crashes
  • Better post-crash user engagement via email

Full details can be found in the draft PRD. If you prefer the visual approach you can read the slides I presented at the Mozilla Summit last month.

Give us feedback!

We welcome all feedback from the community of users – please take a look and let us know what we’re missing. We’re also really interested in feedback about the best order in which to implement the planned features.

You can send your feedback to laura at mozilla dot com – I look forward to reading it.

Moving Socorro to HBase

This post first appeared in the Mozilla Webdev Blog on July 26 2010.

We’ve been incredibly busy over on the Socorro project, and I have been remiss in blogging. Over the next week or so I’ll be catching up on what we’ve been doing in a series of blog posts. If you’re not familiar with Socorro, it is the crash reporting system that catches, processes, and presents crash data for Firefox, Thunderbird, Fennec, Camino, and SeaMonkey. You can see the output of the system at http://crash-stats.mozilla.com. The project’s code is also being used by people outside Mozilla: most recently, Vigil Games have been using it to catch crashes from Warhammer 40,000: Dark Millennium Online.

Back in June we launched Socorro 1.7, and we’re now approaching the release of 1.8. In this post, I’ll review what each of these releases represents on our roadmap.

First, a bit of history on data storage in Socorro. Until recently, when crashes were submitted, the collector placed them into storage in the file system (NFS). Because of capacity constraints, the collector follows a set of throttling rules in its configuration file in order to make a decision about how to disseminate crashes. Most crashes go to deferred storage and are not processed unless specifically requested. However, some crashes are queued into standard storage for processing. Generally this has been all crashes from alpha, beta, release candidate and other “special” versions; all crashes with a user comment; all crashes from low volume products such as Thunderbird and Camino; and a specified percentage of all other crashes. (Recently this has been between ten and fifteen percent.)
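
Purely as an illustration of the kind of decision those throttling rules encode (the real collector is written in Python and reads its rules from its configuration file; the field names and thresholds below are hypothetical):

<?php

// Illustrative sketch only – not Socorro’s actual throttling code.
function should_process(array $crash, $sample_rate = 0.15)
{
    // Alpha, beta, release candidate, and other “special” versions: always process.
    if (preg_match('/(a|b|rc)\d*$/', $crash['version'])) {
        return true;
    }
    // Crashes with a user comment: always process.
    if (!empty($crash['user_comment'])) {
        return true;
    }
    // Low-volume products: always process.
    if (in_array($crash['product'], array('Thunderbird', 'Camino'))) {
        return true;
    }
    // Sample a percentage of everything else (recently 10-15%).
    return mt_rand(1, 100) <= $sample_rate * 100;
}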

The monitor process watched standard storage and assigned jobs to processors. A processor would pick up crashes from standard storage, process them, and write them to two places: our PostgreSQL database, and back into file system storage. We had been using PostgreSQL for serving data to the webapp, and the file system storage for serving up the full processed crash.

For some time prior to 1.7, we’d been storing all crashes in HBase in parallel with writing them into NFS. The main goal of 1.7 was to make HBase our chief storage mechanism. This involved rewriting the collector and processor to write into HBase. The monitor also needed to be rewritten to look in HBase rather than NFS for crashes awaiting processing. Finally, we have a web service that allows users to pull the full crash, and this also needed to pull crashes from HBase rather than NFS.

Not long before code freeze, we decided we should add a configuration option to the processor to continue storing crashes in NFS as a fallback, in case we had any problems with the release. This would allow us to do a staged switchover, putting processed crashes in both places until we were confident that HBase was working as intended.

During the maintenance window for 1.7 we also took the opportunity to upgrade HBase to the latest version. We are now using Cloudera’s CDH2 Hadoop distribution and HBase 0.20.5.

The release went fairly smoothly, and three days later we were able to turn off the NFS fallback.

We’re now in the final throes of 1.8. While we now have crashes stored in HBase, we are still capacity-constrained by the number of processors available. In 1.8, the processors and their associated minidump_stackwalk processes will be daemonized and moved to run on the Hadoop nodes. This means we will be able to scale the number of processors horizontally with the size of the data. Right now we are running fifteen Hadoop nodes in production, and this is planned to increase over the rest of the year.

Some of the associated changes in 1.8 are also really exciting. We are introducing a new component to the system, called the registrar. This process will track heartbeats for each of the processors. Also in this version, we have added an introspection API for the processors. The registrar will act as a proxy, allowing us to request status and statistical information for each of the processors. We will need to rebuild the status page (visible at http://crash-stats.mozilla.com/status) to use this new API, but we will have much better information about what each processor is doing.

Update: we’re frozen on 1.8 and expect release later this month.

Socorro: Mozilla’s Crash Reporting System

(Cross-posted from the Mozilla WebDev blog.)

Recently, we’ve been working on planning out the future of Socorro.  If you’re not familiar with it, Socorro is Mozilla’s crash reporting system.

You may have noticed that Firefox has become a lot less crashy recently – we’ve seen a 40% improvement over the last five months.  The data from crash reports enables our engineers to find, diagnose, and fix the most common crashes, so crash reporting is critical to these improvements.

On our peak day each week, we receive 2.5 million crash reports and process 15% of those, for a total of 50 GB.  In total, we receive around 320 GB each day!  Right now we are handicapped by the limitations of our file system storage (NFS) and our database’s ability to handle really large tables.  However, we are in the process of moving to Hadoop, and currently all our crashes are also being written to HBase.  Soon this will become our main data storage, and we’ll be able to do a lot more interesting things with the data.  We’ll also be able to process 100% of crashes.  We want to do this because the long tail of crashes is increasingly interesting, and we may be able to get insights from the data that were not previously possible.

I’ll start by taking a look at how things have worked to date.

History of Crash Reporting

Current Socorro Architecture

The data flows as follows:

  • When Firefox crashes, the crash is submitted to Mozilla by a part of the browser known as Breakpad.  At Mozilla’s end, this is where Socorro comes into play.
  • Crashes are submitted to the collector, which writes them to storage.
  • The monitor watches for crashes arriving, and queues some of them for processing.  Right now, we throttle the system to only process 15% of crashes due to capacity issues.  (We also pick up and transform other crashes on demand as users request them.)
  • Processors pick up crashes and process them.  A processor gets its next job from a queue in our database and invokes minidump_stackwalk (a part of Breakpad), which combines the crash with symbols, where available.  The results are written back into the database.  Some further processing to generate reports (such as top crashes) is done nightly by a set of cron jobs.
  • Finally, the data is available to Firefox and Platform engineers (and anyone else who is interested) via the web UI at http://crash-stats.mozilla.com

Implementation Details

  • The collector, processor, monitor and cron jobs are all written in Python.
  • Crashes are currently stored in NFS, and processed crash information in a PostgreSQL database.
  • The web app is written in PHP (using the Kohana framework) and draws data both from Postgres and from a Pythonic web service.

Roadmap

Future Socorro releases are a joint project between Webdev, Metrics, and IT.  Some of our milestones focus on infrastructure improvements, others on code changes, and still others on UI improvements.  Features generally work their way through to users in this order.

  • 1.6 – 1.6.3 (in production)

    The current production version is 1.6.3, which was released last Wednesday.  We don’t usually do a second level of point releases, but we did 1.6.1, 1.6.2, and 1.6.3 to get Out Of Process Plugin (OOPP) support out to engineers as it was implemented.

    When an OOPP becomes unresponsive, a pair of twin crashes are generated: one for the plugin process and one for the browser process.  For beta and pre-release products, both of these crashes are available for inspection via Socorro.  Unfortunately, Socorro throttles crash submissions from released products due to capacity constraints.  This means one or the other of the twins may not be available for inspection.  This limitation will vanish with the release of Socorro 1.8.

    You can now see whether a given crash signature is a hang or a crash, and whether it was plugin or browser related.  In the signature tables, if you see a stop sign symbol, that’s a hang.  A window means it is crash report information from the browser, and a small blue brick means it is crash report information from the plugin.

    If you are viewing one half of a hang pair for a pre-release or beta product, you’ll find a link to the other half at the top right of the report.

    You can also limit your searches (using the Advanced Search Filters) to look just at hangs or just at crashes, or to filter by whether a report is browser or plugin related.

  • 1.7 (Q2)

    We are in the process of baking 1.7.  The key feature of this release is that we will no longer be relying on NFS in production. All crash report submissions are already stored in HBase, but with Socorro 1.7, we will retrieve the data from HBase for processing and store the processed result back into HBase.

  • 1.8 (Q2)

    In 1.8, we will migrate the processors and minidump_stackwalk instances to run on our Hadoop nodes, further distributing our architecture.  This will give us the ability to scale up to the amount of data we have as it grows over time. You can see how this will simplify our architecture in the following diagram.

    New Socorro Architecture

    With this release, the 15% throttling of Firefox release channel crashes goes away entirely.

  • 2.0 (Q3 2010)

    You may have noticed 1.9 is missing.  In this release we will be making the power of HBase available to the end user, so expect some significant UI changes.

    Right now we are in the process of specifying the PRD for 2.0.  This involves interviewing a lot of people on the Firefox, Platform, and QA teams.  If we haven’t scheduled you for an interview and you think we ought to talk to you, please let us know.

Features under consideration

  • Full text search of crashes
  • Faceted search: start by finding crashes that match a particular signature, and then drill down into them by category.  Which of these crashes involved a particular extension or plugin?  Which ones occurred within a short time after startup?
  • The ability to write and run your own MapReduce jobs (training will be provided)
  • Detection of “explosive crashes” that appear quickly
  • Viewing crashes by “build time” instead of clock time
  • Classification of crashes by component

This is a big list, obviously. We need your feedback – what should we work on first?

One thing that we’ve learned so far through the interviews is that people are not familiar with the existing features of Socorro, so expect further blog posts with more information on how best to use it!

How to get involved

As always, we welcome feedback and input on our plans.

You can contact the team at socorro-dev@mozilla.com, or me personally at laura@mozilla.com.

In addition, we always welcome contributions.  You can find our code repository at
http://code.google.com/p/socorro/

We hold project meetings on a Wednesday afternoon – details and agendas are here
https://wiki.mozilla.org/Breakpad/Status_Meetings

Parenting Versus Programming

This post was written for and first appeared in the PHP Advent Calendar 2009.

Advent calendars are about Christmas, and for me Christmas has always been a time for family. This year I have recently joined the ranks of the parents among you. I am taking a short break from work and focusing on being a mother rather than being a programmer. This has led me to reflect on the similarities between parenting and coding. I present these here for your enlightenment, or so you can laugh at me.

Lesson One: Smells

Babies, like programs, are associated with a variety of interesting smells. If you try to ignore the smells associated with babies, they only get worse over time. Then the screaming starts.

It was Martin Fowler who popularized the notion of code smells. These are defined as parts of your code that do things in an ugly way, or to put it a different way, they are hacks. Typically, when we find these parts of code, our eyes begin to glaze over, and we enter a strange, zombie-like state. (This will also be familiar to parents.) In this state, we are paralyzed and thus prevented from doing anything about the smell. It is only when we emerge from the cave of code smells into the daylight of clean code that we can again be productive. I am not sure whether it is the horror of the code, a tendency to procrastinate that is endemic among programmers, or simply a fear of breaking things that prevents most people from doing something about the mess. It is always the smelliest parts of the code that are the most fragile.

The lesson programmers can learn from babies, here, is to face your smells and get rid of them as quickly as possible. Then everybody’s happy.

Lesson Two: Sleep

I have now been programming for mumble years. Not long after I started work at Mozilla, a delightful Canadian journalist asked me what it was like to work with a group of people “so much younger than myself.” While I am not actually that old, I am old enough to have picked up some skills. One of these skills is the ability to survive on 4 to 6 hours of sleep a night for extended periods. Although this is not my favorite hobby, I have gained a certain level of mastery.

Another thing I’ve learned is that surviving on 4 to 6 hours of sleep a night for an extended period makes you mildly deranged. With parenting, as with programming, it is sometimes required. When you achieve a certain level of derangement, you find that very silly things start to sound like good ideas, or they just start to happen, whether you intended them to or not. For example, you find yourself accidentally filing the baby in the filing cabinet or implementing a new framework.

This is a lesson both programmers and parents can learn from the military: sleep when you can do so safely, and as often as possible. If you can snatch a few minutes of nap here and there, it may not make you feel a lot better, but your degraded IQ will recover somewhat.

Lesson Three: With Great Power Comes Great Responsibility

When you set out to implement a new web app or raise a child — and yes, I do realize these things are not quite on the same scale — you have a great responsibility to do a good job. Otherwise, everyone who has to interact with your app or child in the future will curse your name. Repeatedly. Your baby — whether it is human or code — is totally dependent upon you to do a good job.

Lesson Four: It Takes a Village

On that note, it is important to realize that it is very hard to raise a child or write a web app completely on your own. Some things are better done with a little help from your friends. Whether that help is providing a role model, a shoulder to cry on, advice when you just don’t know what to do, or purely someone to vent to, it really helps to surround yourself with people you can rely on.

That summarizes the commonalities between parenting and programming that I have learned over the last 7 weeks. I suspect I have a great deal more to learn.

One final note: If anyone approaches you to work on a web app that will take 18 years, run away as fast as you can, and do not look back.

Seven Things

I feel like I’m about last to the party, but after getting tagged by Ben Ramsey I thought I’d contribute to this meme/tag/whatever that’s going around the PHP community.  I intend to blog more in 2009, so this is a starting point anyway.

Seven Things you didn’t know about me:

1.   I learned to program in the 4th grade on an Apple II, in LOGO.  A high school near me had a program where a group of selected nerds from local schools would go there once a week and share two machines.  I was the only one in primary (elementary) school.

2. I used to do a lot of singing – school choir, madrigals, musicals, an a capella group with my friends, and a youth gospel band.  I sing alto.

3. I’m not religious and consider myself a (non-militant) atheist.  We were raised that way, since my family is a combination of Catholic, Jewish, and Presbyterian and my parents didn’t believe in organized religion.  Despite this, I went to an Anglican girls’ school for 11 years (also see the above-mentioned gospel band) and am technically Jewish.

4. I met my husband Luke Welling in Advanced Software Engineering at college (RMIT University in Melbourne, Australia).  We had to do a review of each other’s code as part of an assignment.  The first words he ever said to me were, “This is crap.”

5.  I moved house every 1-2 years when I was a kid, left school at 16, went back at 19, and then spent far too much time nerding out at college, which I loved.

6.  I’ve been riding horses since I was four years old.  At four, I rode my Shetland pony Froggy through the house to annoy my mother.  (I believe I succeeded.)

7.  I sold my first article to a magazine when I was about 13 years old.  It was a light humorous piece about the challenges involved in buying a horse and was printed in a national horse magazine.  (I always wanted to be a writer when I grew up…or possibly a veterinarian…or maybe a secret agent.  I never thought it would end up being tech books that I wrote.)  I have also written one complete bad novel, and the beginnings of several others.  (NaNoWriMo, I’m looking at you.)

OK, now here are my tag-ees.

Tag-wise, here are the rules:

  • Link your original tagger(s), and list these rules on your blog.
  • Share seven facts about yourself in the post—some random, some weird.
  • Tag seven people at the end of your post by leaving their names and the links to their blogs.
  • Let them know they’ve been tagged by leaving a comment on their blogs and/or Twitter.

A Year at Mozilla

This week marks one year that I have been at Mozilla.  I’ve always found milestones a good time for reflection, so I tend to think back around these times.

Since I started at Mozilla, I’ve been lucky enough to work on some great projects, including:

– Developing the AMO (http://addons.mozilla.org) API, used by Firefox 3 for the Addons Manager

– Scaling SUMO (http://support.mozilla.com) in preparation for Download Day

– Leading development for SUMO

– Helping plan the PHP5 migration for our web properties, and migrating AMO

– Working with Chris Pollett on full text search for AMO

– Working with Jacob Potter, one of our awesome interns this summer, on Melanion, our loadtesting webapp

– Working with Legal on an upcoming project

– Designing and planning a Single Sign On solution for all of the Mozilla web properties.

There’s been a lot of travel, including to the superb Firefox Summit at Whistler, which was one of the highlights of my year.

I’ve also been pretty slack about blogging over the last year, I note – some of these things really deserve their own entries.

The Mozilla firehose takes a while to absorb, but finally it dawns on you that this place is really, really different from other companies, and in a very good way.  John Lilly has been calling it a “chaord”, which is an excellent description – pushing control and responsibility out to the edges.  In some ways it reminds me of academia, with regard to both the autonomy we have and the rigor in the way we do things; in other ways, of the organic anarchy of many other Open Source projects.

I’m also really lucky, and feel privileged, to work with such a good group of people, both in my own team and in the whole of the organization.

On a more personal note, I’m a much happier person now than I was when I started this job.  I don’t think I’ll ever be the same person who came to the USA for three months three years ago, but I guess time changes everyone.  (Even this year hasn’t been straightforward or quiet on a personal level, but it’s been easier.)

Here’s to many more years of good work with the good people at Mozilla.