Tag Archives: Mozilla

DXR: code search and static analysis

I asked Erik Rose from my team to blog about his work on DXR (docs), the code search and static analysis tool for the Firefox codebase.  He did so on the Mozilla Webdev blog, so it would show up on Planet Mozilla. Today, it was pointed out to me that the Webdev blog is not on Planet.

It’s a great post, summarizing all the things he’s done in the last few months.  Go read the article: DXR digests the Firefox codebase.

Rapid releases: one webdev’s perspective

People still seem to be very confused about why Mozilla has moved to the new rapid release system. I thought I’d try and explain it from my perspective. I should point out that I am not any kind of official spokesperson, and should not be quoted as such. The following is just my own personal opinion.

Imagine, now, you work on a team of web developers, and you only get to push new code to production once a year, or once every eighteen months. Your team has decided to wait until the chosen twenty new features are finished, and not ship until those are totally done and passed through a long staging and QA period. The other hundreds of bugs/tickets you closed out in that 12-18 months would have to wait too.

Seems totally foreign in these days of continuous deployment, doesn’t it?

When I first heard about rapid releases, back at our December All Hands, I had two thoughts. The first was that this was absolutely the right thing to do. When stuff is done we should give it to users. We shouldn’t make them wait, especially when other browsers don’t make them wait.

The second thought was that this was completely audacious and I didn’t know if we could pull it off. Amazingly, it happened, and now Mozilla releases use the train model and leave the station every six weeks.

So now users get features shortly after they are done (big win), but there’s been a lot of fallout. Some of the fallout has been around internal tools breaking – we just pushed out a total death sprint release of Socorro to alleviate some of this, for example. Most of the fallout, however, has been external. I see three main areas, and I’ll talk about each one in turn.

Version numbers

The first thing is pushback on version numbers. I see lots of things like:
“Why is Mozilla using marketing-driven version numbers now?”
“What are they trying to prove?”
“How will I know which versions my addons are compatible with?”
“How will I write code (JS/HTML/CSS) that works on a moving target?”

Version numbers are on the way to becoming much less visible in Firefox, like they are in webapps, or in Chrome, for that matter. (As I heard a Microsoft person say, “Nobody asks ‘Which version of Facebook are you running?’”) So to answer: it’s not marketing-driven. In fact, I think not having big versions full of those twenty new features has been much, much harder for the Engagement (marketing) team to know how to market. I see a lot of rage around version numbers in the newsgroups and on tech news sites (HN, Slashdot, etc), which tells me that we haven’t done a good job communicating this to users. I believe this is a communication issue rather than because it’s a bad idea: nowhere do you see these criticisms of Chrome, which uses the same method.

(This blog post is, in part, my way of trying to help with this.)

Add-on compatibility

The add-ons team has been working really hard to minimize add-on breakage. In realistic terms, most add-ons will continue to work with each new release, they just need a version bump. The team has a process for bumping the compatible versions of an add-on automatically, which solves this problem for add-ons that are hosted on addons.mozilla.org. Self-hosted add-ons will continue to need manual updating, and this has caused problems for people.

The goal is, as I understand it, for add-on authors to use the Add-on SDK wherever possible, which will have versions that are stable for a long time. (Read the Add-ons blog on the roadmap for more information on this.)

Enterprise versions

The other thing that’s caused a lot of stress for people at large companies is the idea that versions won’t persist for a long time. Large enterprises tend not to upgrade desktop software frequently. (This is the sad reason why so many people are still on IE6.)

There is an Enterprise Working Group working on these problems: we are taking it seriously.

finally:

Overall, getting Firefox features to users faster is a good thing. Some of the fallout issues were understood well in advance and had a mitigation plan: add-on incompatibility for example. Some others we haven’t done a good job with.

I truly believe if we had continued to release Firefox at the rate of a major version every 18 months or so, that we would have been on a road to nowhere. We had to get faster. It’s a somewhat risky strategy, but it’s better to take that risk than quietly fade away.

At the end of the day we have to remember the Mozilla mission: to promote openness and innovation on the web. It’s hard to promote innovation within the browser if you ship orders of magnitude more slowly than your competitors.

Notice that I mention the mission: people often don’t know or tend to forget that Mozilla isn’t in this for the money. We’re a non-profit. We’re in it for the good of the web, and we want to do whatever’s needed to make the web a better and more open place. We do these things because we’re passionate about the web.

I’ve never worked anywhere else where the mission and the passion were so prominent. We may sometimes do things you don’t agree with, or communicate them in a less-than-optimal way, but I really want people who read this to understand that our intentions are positive and our goal is the same as it’s always been.

Being Open

I was recently privileged to be invited to come and give a talk at AOL, on the work we do with Socorro, how Mozilla works, and what it means to be open.

The audience for the talk was a new group at AOL called the Technology Leadership Group. It consists of exceptional technical people – engineers and operational staff – from all parts of the organization, who have come together to form a group of thought leaders.

One of the first items on their agenda is, as Erynn Petersen, who looks after Developer and Open Source Evangelism, puts it: “how we scale, how we use data in new and interesting ways, and what it means to treat all of our projects as open source projects.” My task was partly to talk about how open source communities work, what the challenges are, and how AOL might go about becoming more open.

It’s amazing how things come full circle.

I think every person in the audience was a user of Open Source, and many of them were already Open Source contributors on a wide range of projects. Some had been around since the days when Netscape was acquired by AOL.

I’ll include the (limited) slides here, but the best part of the session in my opinion was the Q&A. We discussed some really interesting questions, and I’ll feature some of those here. (I want to note that I am paraphrasing/summarizing the questions as I remember them, and am not quoting any individual.)

Q: Some of our software and services are subscription-based. If we give that code away, we lose our competitive advantage – no one will pay for it anymore.

A: There are a bunch of viable business models that revolve around making money out of open source. The Mozilla model is fairly unusual in the space. The most common models are:

  • Selling support, training, or a built and bundled version of the software. This model is used by Red Hat, Canonical, Cloudera, and many others.
  • Dual licensing models. One version of the software is available under an open source license and another version is available with a commercial license for embedding. This is (or has been) the MySQL model.
  • Selling a hosted version of Open Source software as a service. This model is used by Github (git) and Automattic (WordPress), among others.
  • It’s also perfectly valid to make some of your software open and leave some proprietary. This is the model used by 37signals – they work on Ruby on Rails and sell SaaS such as Backpack and Basecamp.

Another point is that at Mozilla, our openness *is* our competitive advantage. Our users know that we have no secret agenda: we’re not in it for the money, but we’re also not in it to mine or exploit their personal data. We exist to care for the Open Web. There has been a lot of talk lately about this, best summarized by this statement, which you’ll see in blog posts and tweets from Mozillians:

Firefox answers to no-one but you.

Q: How do we get started? There’s a lot of code – how do we get past the cultural barriers of sharing it?

A: It’s easier to start open than to become open after the fact. However, it can be done – if it couldn’t be done Mozilla wouldn’t exist. Our organization was born from the opening of Netscape. A good number of people in the room were at AOL during the Netscape era, too, which must give them a sense of deja vu. I revisited jwz’s blog post about leaving the Mozilla project, back in those days after I drafted this post, and I recommend reading it as it talks about a lot of the issues.

My answer is that there’s a lot to think about here:

  • What code are we going to make open source? Not everything has to be open source, and it doesn’t have to happen all at once. I suggest starting up a site and repository that projects can graduate to as they become ready for sharing. Here at Mozilla basically everything we work
    on is open source as a matter of principle (“open by default”), but someof it is more likely to be reused than other parts. Tools and libraries are a great starting point.
  • How will that code be licensed? This is partly a philosophical question and partly a legal question. Legal will need to examine the licensing and ownership status of existing code. You might want a contributors’ agreement for people to sign too. Licenses vary and the answer to this question is also dependent on the business model you want to use.
  • How will we share things other than the code? This includes bug reports, documentation, and so on.
  • How will the project be governed? If I want to submit a patch, how do I do that? Who decides if, when, and how that patch will be applied? There are various models for this ranging from the benevolent dictator model to the committee voting model.

I would advise starting with a single project and going from there.

Q: How will we build a community and encourage contributions?
A: This is a great question. We’re actually trying to answer this question on Socorro right now. Here’s what we are doing:

  • Set up paths of communication for the community: mailing lists, newsgroups, discussion forums
  • Make sure you have developer documentation as well as end user documentation
  • If the project is hard to install, consider providing a VM with everything already installed. (We plan to do this both for development and for users who have a small amount of data.)
  • Choose some bugs and mark them as “good first bug” in your bug tracking system.
  • Make the patch submission process transparent and documented.

There was a lot more discussion. I really enjoyed talking to such a smart and engaging group, and I wish AOL the very best in their open source initiative.

Socorro: Mozilla’s Crash Reporting System

(Cross-posted from the Mozilla WebDev blog.)

Recently, we’ve been working on planning out the future of Socorro.  If you’re not familiar with it, Socorro is Mozilla’s crash reporting system.

You may have noticed that Firefox has become a lot less crashy recently – we’ve seen a 40% improvement over the last five months.  The data from crash reports enables our engineers to find, diagnose, and fix the most common crashes, so crash reporting is critical to these improvements.

We receive on our peak day each week 2.5 million crash reports, and process 15% of those, for a total of 50 GB.  In total, we receive around 320Gb each day!  Right now we are handicapped by the limitations of our file system storage (NFS) and our database’s ability to handle really large tables.   However, we are in the process of moving to Hadoop, and currently all our crashes are also being written to HBase.  Soon this will become our main data storage, and we’ll be able to do a lot more interesting things with the data.  We’ll also be able to process 100% of crashes.  We want to do this because the long tail of crashes is increasingly interesting, and we may be able to get insights from the data that were not previously possible.

I’ll start by taking a look at how things have worked to date.

History of Crash Reporting

Current Socorro Architecture

The data flows as follows:

  • When Firefox crashes, the crash is submitted to Mozilla by a part of the browser known as Breakpad.  At Mozilla’s end, this is where Socorro comes into play.
  • Crashes are submitted to the collector, which writes them to storage.
  • The monitor watches for crashes arriving, and queues some of them for processing.  Right now, we throttle the system to only process 15% of crashes due to capacity issues.  (We also pick up and transform other crashes on demand as users request them.)
  • Processors pick up crashes and process them.  A processor gets its next job from a queue in our database, invokes minidump_stackwalk (a part of Breakpad) which combines the crash with symbols, where available.  The results are written back into the database.   Some further processing to generate reports (such as top crashes) is done nightly by a set of cron jobs.
  • Finally, the data is available to Firefox and Platform engineers (and anyone else that is interested) via the webui, at http://crash-stats.mozilla.com

Implementation Details

  • The collector, processor, monitor and cron jobs are all written in Python.
  • Crashes are currently stored in NFS, and processed crash information in a PostgreSQL database.
  • The web app is written in PHP (using the Kohana framework) and draws data both from Postgres and from a Pythonic web service.

Roadmap

Future Socorro releases are a joint project between Webdev, Metrics, and IT.  Some of our milestones focus on infrastructure improvements, others on code changes, and still others on UI improvements.  Features generally work their way through to users in this order.

  • 1.6 – 1.6.3 (in production)

    The current production version is 1.6.3, which was released last Wednesday.  We don’t usually do second dot point releases but we did 1.6.1, 1.6.2, and 1.6.3 to get Out Of Process Plugin (OOPP) support out to engineers as it was implemented.

    When an OOPP becomes unresponsive, a pair of twin crashes are generated: one for the plugin process and one for the browser process.  For beta and pre-release products, both of these crashes are available for inspection via Socorro.  Unfortunately, Socorro throttles crash submissions from released products due to capacity constraints.  This means one or the other of the twins may not be available for inspection.  This limitation will vanish with the release of Socorro 1.8.

    You can now see whether a given crash signature is a hang or a crash, and whether it was plugin or browser related.  In the signature tables, if you see a stop sign symbol, that’s a hang.  A window means it is crash report information from the browser, and a small blue brick means it is crash report information from the plugin.

    If you are viewing one half of a hang pair for a pre-release or beta product, you’ll find a link to the other half at the top right of the report.

    You can also limit your searches (using the Advanced Search Filters) to look just at hangs or just at crashes, or to filter by whether a report is browser or plugin related.

  • 1.7 (Q2)

    We are in the process of baking 1.7.  The key feature of this release is that we will no longer be relying on NFS in production. All crash report submissions are already stored in HBase, but with Socorro 1.7, we will retrieve the data from HBase for processing and store the processed result back into HBase.

  • 1.8 (Q2)

    In 1.8, we will migrate the processors and minidump_stackwalk instances to run on our Hadoop nodes, further distributing our architecture.  This will give us the ability to scale up to the amount of data we have as it grows over time. You can see how this will simplify our architecture in the following diagram.

    New Socorro Architecture

    With this release, the 15% throttling of Firefox release channel crashes goes away entirely.

  • 2.0 (Q3 2010)

    You may have noticed 1.9 is missing.  In this release we will be making the power of Hbase available to the end user, so expect some significant UI changes.

    Right now we are in the process of specifying the PRD for 2.0.  This involves interviewing a lot of people on the Firefox, Platform, and QA teams.  If we haven’t scheduled you for an interview and you think we ought to talk to you, please let us know.

Features under consideration

  • Full text search of crashes
  • Faceted search: start by finding crashes that match a particular signature, and then drill down into them by category.
    Which of these crashes involved a particular extension or plugin?  Which ones occured within a short time after startup?
  • The ability to write and run your own Map/Reduce jobs (training will be provided)
  • Detection of “explosive crashes” that appear quickly
  • Viewing crashes by “build time” instead of clock time
  • Classification of crashes by component

This is a big list, obviously. We need your feedback – what should we work on first?

One thing that we’ve learned so far through the interviews is that people are not familiar with the existing features of Socorro, so expect further blog posts with more information on how best to use it!

How to get involved

As always, we welcome feedback and input on our plans.

You can contact the team at socorro-dev@mozilla.com, or me personally at laura@mozilla.com.

In addition, we always welcome contributions.  You can find our code repository at
http://code.google.com/p/socorro/

We hold project meetings on a Wednesday afternoon – details and agendas are here
https://wiki.mozilla.org/Breakpad/Status_Meetings