
How and Why I Monitor My Java Web App

All Developers Should Proactively Monitor Their Applications In Production with time-based graphing (RRD, etc.) and log analysis

It has always surprised me that so many developers confidently send their applications off to production and then fail to monitor them afterward.  "It is not my problem now" and "the IT guys will let me know if there is an issue" seem to be the common attitudes.  Then, when there is a fault, these programmers end up performing emergency tweaks or panicked post-mortem analysis.  The interpersonal stresses caused by such events can quickly lead to a blame game, which is not healthy for the application, the developers, or the internal and external customers.

With some basic monitoring, faults and hard scaling limits can be prevented or at least dealt with proactively.  At my current company, xtendx, we host our servers with Aspectra, who provide system monitoring on a cornucopia of parameters via an RRD-based system called tacMON from terreActive.  For time-based monitoring needs, tacMON generates graphs on these parameters over four time ranges: Daily, Weekly, Monthly and Yearly.  For example, here are two interesting graphs:

CPU % Idle Over 24 Hours

terreActive tacMON CPU Graph Over 24 Hours

Tomcat Heap Size in Megabytes Over 7 Days

terreActive tacMON Tomcat Heap Size Over 7 Days

With graphs like these I can constantly monitor the health of my application to prevent faults, tune it to scale even higher, or start the slow and expensive process of expanding the server or network infrastructure.  At the office, we have set up a two-screen monitoring system with 16 graphs that we watch throughout the day.  These screens are hooked up to two old desktop PCs running Ubuntu 9.04 and Firefox with the browsers in full screen mode.  Once every few minutes an HTML META tag refreshes the images.  Our own low-budget NOC (Network Operations Center), if you will.
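
A page like that needs very little markup. The following Python sketch generates one; it is only an illustration, and the graph URLs, the five-minute interval and the four-column layout are all assumptions rather than the actual xtendx setup.

```python
from html import escape


def build_dashboard_html(graph_urls, refresh_seconds=300, columns=4):
    """Return a bare-bones HTML page that shows the graphs in a grid.

    The META refresh tag makes the browser reload the page, and with it
    the graph images, every `refresh_seconds` seconds.
    """
    cells = []
    for index, url in enumerate(graph_urls):
        cells.append(
            '<img src="%s" alt="graph %d">' % (escape(url, quote=True), index + 1)
        )
        # Start a new row after every `columns` images.
        if (index + 1) % columns == 0:
            cells.append("<br>")
    return (
        "<html><head>"
        '<meta http-equiv="refresh" content="%d">'
        "<title>NOC</title></head><body>%s</body></html>"
        % (refresh_seconds, "\n".join(cells))
    )


if __name__ == "__main__":
    # Hypothetical graph URLs; substitute whatever your monitoring system serves.
    urls = ["http://monitoring.example/graphs/%d.png" % n for n in range(1, 9)]
    print(build_dashboard_html(urls))
```

Save the output as a local HTML file, open it in a full screen browser, and the screen takes care of itself.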

(The paper mache rooster is 'Chuckie', our mascot--orange and black are our company colors. I can get you one too for SFr. 16 if you are interested.)

xtendx Network Operations Center (Lite)


Over time we have come to this list of core metrics that are indicative of failure (server response time) or resource exhaustion (network bandwidth, CPU):
  • Server-level
    • Bandwidth
    • CPU load
    • Disk space
    • I/O operations
  • Application-level
    • Java Heap Size
    • Server Response Time
    • Number of Threads

Beyond these core metrics, there are dozens of other parameters that we can investigate and monitor.  All of these values are monitored 24/7 by the engineering team at Aspectra too.
  • System
    • Fan speeds
    • Temperature (CPU, Power Supply, etc.)
    • Up Time
    • CPU interrupts per second and context switches per second
    • Memory Free
    • Swap Space % Used
  • Application
    • Various MySQL parameters
    • Other Java Memory Sizes (Eden, etc.)
All of this is critical to proactively detecting faults, being aware of usage patterns, and forecasting future resource requirements.  It also looks kinda sexy and my management likes to show it off. 

Beyond this, I also take a regular tour through the logs that Java, Tomcat and my application produce.  It only takes a few minutes to run

   grep ERROR logs/mylogfile.txt

and look for recent stack traces and other errors in catalina.out or localhost-yyyy-mm-dd.log.  
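
If you would rather have one summary across all the log files, the same tour can be scripted. This is a minimal Python sketch, not a tool I am describing above: the `logs` directory name, the file suffixes, and the idea that problems show up as lines containing ERROR or SEVERE (the level name Tomcat's default java.util.logging output uses) followed by indented `at ...` stack frames are all assumptions about your setup.

```python
import re
from pathlib import Path

# Assumption: error lines contain ERROR or SEVERE, and Java stack frames are
# indented lines starting with "at " or lines starting with "Caused by:".
ERROR_PATTERN = re.compile(r"\bERROR\b|\bSEVERE\b")
STACK_PATTERN = re.compile(r"^\s+at\s|^Caused by:")


def scan_log(lines):
    """Return (error_lines, stack_frame_count) for an iterable of log lines."""
    errors = []
    stack_frames = 0
    for line in lines:
        if ERROR_PATTERN.search(line):
            errors.append(line.rstrip("\n"))
        elif STACK_PATTERN.search(line):
            stack_frames += 1
    return errors, stack_frames


def scan_directory(log_dir):
    """Yield (file_name, error_lines, stack_frame_count) per log file."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return
    for path in sorted(directory.iterdir()):
        if path.suffix in {".log", ".txt", ".out"}:
            with path.open(errors="replace") as handle:
                errors, frames = scan_log(handle)
            yield path.name, errors, frames


if __name__ == "__main__":
    for name, errors, frames in scan_directory("logs"):
        print("%s: %d error lines, %d stack frames" % (name, len(errors), frames))
```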

If you are not monitoring your application and platform continuously and looking at your log files on a regular basis then you are asleep at the wheel!  A crash is inevitable.

Getting Engaged in the Global Internet Era

Using Multimedia Technologies and Mediums to Announce Our Engagement

Earlier this week, I proposed to my girlfriend, Robyn.  There is an obvious protocol regarding the order in which folks are notified and how personal it is.   For us, the situation is complicated by the distances between us and many of our friends and family members.    (Robyn is from Durban, South Africa.  I grew up in Michigan, Hawaii and California, but currently live in Zürich, Switzerland.  We both have lived in the UK, and we met in Portland, Oregon.)  Here is how it played out:
  1. Tuesday 8pm, over dinner: Propose!  "Yes!"
  2. Tuesday 9pm: Skype Out to Alice, my sister, on her mobile phone.  She is going camping with her family and our mother north of Los Angeles.
  3. Tuesday 9:30pm: Attempt to Skype Out to both mother and father--no luck.  Mom's signal is weak and dad's phone goes straight to voice mail.
  4. Wednesday 7pm: Skype Dad's mobile phone in Michigan, chat for a bit.  Skype Mom's mobile, get through at the camping spot, all is good.
  5. Wednesday 8pm: Post cryptic line on Facebook status that the friends we have in common will understand:  "Joe E. called it months ago. We officially and openly hear the bells ringing some time soon--the laces of two boots are getting all knotted up."  Robyn texts people madly on her mobile phone.  Her own Facebook status reads "said YES!"
  6. Wednesday 8:30pm: We both update our status to 'engaged'.  (It's not official until it's on Facebook!)
  7. Wednesday 9pm: Skype Out or Video Skype with close friends, like Whitey in Massachusetts, who are available.  Emails for the rest.
  8. Wednesday 10pm: Post new thread to forums of BootsnAll.com, the travel community website that we and many of our friends met through.
  9. Wednesday 10:20pm: Post to Facebook pictures of Robyn reading up on visa requirements on the web.
    Robyn Hobbs researching visas to America and the Schengen Zone
  10. Thursday: Respond to various emails, IMs and BnA posts throughout the day.
  11. Thursday 3:30pm: Robyn (@dopeyzn) tweets on Twitter "got engaged to @stuinzuri... super duper stoked :)"
  12. Friday 1pm: Find that our engagement ranks a post on BootBlog: There’s gonna be a Bootie wedding!  We are celebrities now!
  13. Friday 4:30pm: If one googles "Stu and Robyn" the first link is to the BootBlog post!
What strikes me as interesting, and a change from how I would expect things to have played out just two decades ago, is the cornucopia of technologies and mediums we used to communicate our engagement.  Not just email and traditional telephones, but also internet messaging, web site forums, Voice over IP, cellular telephony, blogs, Facebook and Twitter.  Real time communications, store and forward, push.  Text, voice, video and still photography.

Welcome to the future.  May we live happily ever after.

Stack Overflow: Voting Patterns in Detail

Up, Down, all around. Offensive? Close? Spam! Inform Moderator...

Continuing to investigate user voting patterns on Stack Overflow has become a hobby (obsession?) of mine.  Thanks in part to my curiosity and in part to nobody_ (now known mysteriously as 'Kyle Cronin'; the administrator of the Unofficial Stack Overflow Meta Discussion Forum) egging me on, I quickly whipped up a graph showing the propensity to "Up Vote versus Reputation".

Stack Overflow: Percentage of Up Votes (of Up + Down) versus Five Tiers of User Reputation
Up votes as a percentage (% = up / (up + down)) for five reputation tiers of users with at least one up vote, at least one down vote and a reputation of at least 100 (the threshold at which one is allowed to down vote). The x-axis represents the five user tiers.
The first three represent ~5,000 each, the fourth ~1,500 and the fifth ~125.
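
The percentage in the caption is simple to reproduce. Here is a minimal Python sketch of the calculation; the tier floors are passed in as a parameter because the exact reputation boundaries of the five tiers are not stated here, and averaging the per-user percentages within a tier (rather than pooling all votes) is an assumption.

```python
def up_vote_percentage(up, down):
    """Percentage of up votes among all up and down votes cast."""
    return 100.0 * up / (up + down)


def average_by_tier(users, tier_floors):
    """Average each user's up-vote percentage within reputation tiers.

    users       -- iterable of (reputation, up_votes, down_votes)
    tier_floors -- ascending list of minimum reputations, one per tier
    Users with no up votes, no down votes, or a reputation below the first
    floor are skipped, matching the filter described in the caption.
    """
    buckets = [[] for _ in tier_floors]
    for reputation, up, down in users:
        if up < 1 or down < 1 or reputation < tier_floors[0]:
            continue
        # Highest tier whose floor the user reaches.
        tier = max(i for i, floor in enumerate(tier_floors) if reputation >= floor)
        buckets[tier].append(up_vote_percentage(up, down))
    return [sum(b) / len(b) if b else None for b in buckets]


if __name__ == "__main__":
    # Made-up users and made-up tier floors, purely for illustration.
    sample_users = [(150, 9, 1), (200, 8, 2), (5000, 1, 1), (50, 10, 1)]
    print(average_by_tier(sample_users, [100, 1000]))
```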


It is very clear that users with higher reputations are more likely to down vote.  But this led to other questions, such as:
  • Do the users with older accounts, especially beta users, make up the negative voting club?
  • Did the down votes shift downwards because of new features introduced to Stack Overflow, specifically new voting options like 'Spam', 'Offensive', 'Inform Moderator' and 'Close'?
To try and answer the first question, I queried the database and tinkered with the results in Excel; the resulting graph is below.  The blue line is "Average % Up Votes of All Votes by User Join Date".  The red series is the "Average Reputation by Join Date".

Stack Overflow: Up Votes as a Percentage of Up+Down Votes Over Time and Average Reputation by Join Date


Notes on the above graph:
  • The percentage is only for users with
    • at least one up vote
    • one down vote
    • reputation of at least 100
  • The yellow data point on the Average Reputation series represents the day Stack Overflow sign-ups were open to the general public.  Our own little Eternal September, if you will.  (Not that bad, actually.)
  • The purple spike in the %-Up Votes is caused by Neil Butterworth, who has both many votes (~1700 in the data dump) and a 50/50 Up Vote versus Down Vote ratio.
  • The leveling of the average reputation curve a few weeks after Stack Overflow went public (end of September/early October) is interesting.  It seems, to no surprise, that beta users and the initial public users are much more into SO than the follow-up users.
  • The far left data point represents seven users who got accounts on 31 July 2008 and are the movers-n-shakers of Stack Overflow (among them Jeff Atwood, Jarrod Dixon, Joel Spolsky, and Jon Galloway), who understandably have very high reputation scores.
To try and answer the second question above ("Did down votes shift to other types of voting options as they came online?"), I queried and graphed the data again to produce a view of "Vote Type as a Percentage of All Votes Cast". For example, where in the past SOpedians would down vote a question or answer if they found it was spam, later on they could mark the post as spam or offensive. While the two options are not mutually exclusive, the down vote costs the user a reputation point. Since roughly 90% of votes are up votes, the graph zooms in on the top 10%.


Stack Overflow: Vote Types as a Percentage of All Votes Over Time


It seems that the new voting options do not impact up/down voting patterns significantly.  Note the sudden growth of 'close' votes in the trailing week or two of the graph.  It seems to me that this is a change in the raw data rather than a sudden burst of close votes, but I am not sure, because I myself did not earn the power to vote to close a question until around that time.   Also, 'close' votes are only valid for questions and not answers, unlike up and down votes.

The last few weeks of the data dump look interesting, so I zoomed in there and produced the graph below.


Stack Overflow: Vote Types as a Percentage of All Votes Over Time


The burst and subsequent tapering off of Spam, Offensive and Inform Moderator votes seems very suspicious to me.  Was this actual activity?  Or was there a data collection issue?  Or was that when these voting features were created?   I'm guessing it was a data collection issue.  Future data dumps will show this to be true or not, I hope.

Whatever the cause, the number of votes here is still too small to impact the percentage of up votes over time to any degree.  My conclusion is that
  1. SOpedians with high reputations are more likely to vote down questions and answers
  2. As Stack Overflow gains more and more users with lower reputations, these users, being less likely to vote down, bring up the overall percentage of up votes over time.

Stack Overflow: Down Votes vs. Up Votes vs. Reputation

Where do the 'personalities' and 'forces' of Stack Overflow reside in a three-dimensional plot of Down Votes, Up Votes and Reputation scores?

My fascination with up and down voting patterns persists.  Below are some statistics and a graph derived from asking the data "How do the top 1000 Stack Overflow users relate to each other with regards to up votes and down votes?"  Some results:

  • Random statistics on all users (not top 1000)
    • Only 278 SOpedians have voted down more than 100 times
    • A puny 121 have down voted more than up voted
    • A mere 11 have down voted more than 500 times
  • Mr. Down Voter: Rich B  (0.52 up votes for every down vote)
    • 1796 Down
    • 932 Up
  • Mr. Up Voter: JB King (298 up votes for every down vote)
    • 15 Down
    • 4474 Up
  • Mr. Even Handed: (There are two)
  • Other Personalities
Stack Overflow Badges: Reputation versus Up Votes versus Down Votes


Notes on the above graph:
  • The x-axis is Down Votes,
  • the y-axis is Up Votes, and
  • the z-axis (bubble size) is Reputation Score
  • The x-axis and y-axis are not proportional, meaning if one were to draw a line from the origin at a 45° angle, that line would not represent a 1:1 x:y relationship
Impressions and Analysis:
  • By and large, the Top 10 SOpedians are more likely to vote up than the top 1000
  • By and large, everybody in the top 1000 is more likely to vote up than down
  • Joel Spolsky doesn't vote so much, relative to Jeff Atwood--they are the public faces of stackoverflow

Handmade wrist strap with flashlight for my Canon 350D

Robyn took my Aspectra-branded lanyard with small flashlight and modified it on her mother's sewing machine

Over the years, I've never been fond of the around-the-neck camera straps.  Those things inhibit the position I want to place the camera in.  So, I have opted for those lanyard thingies companies hand out at technology conferences and events. 

Last fall, the system administration & co-location vendor my company uses, Aspectra, invited me to an event where we received a lanyard with a decent halogen flashlight attached. 

This turns out to be very handy: One little issue that has come up with photographing in a club environment is the need to step around in the dark corners where all the cables and whatnot live.  With the flashlight, this is much safer.  What did not work for me was the lanyard's large loop and 360° swivel.

While in Durban, South Africa last month, Robyn broke out her mother's sewing machine and did two things: 
  1. created a tight, wrist size loop
  2. mounted the flashlight near the wrist loop for easy access
I am quite pleased.  If there were more hours in the day I'd try to market this to camera shops.  The StuCord 3000!  Or something like that.  Hmmm...

Oh, and note the new fixed 50mm fast lens, a Canon f/1.8!  I am very happy with it and glad I did not spend 5x more for the f/1.4.

Custom Canon Wrist strap

Stack Overflow: Badge Analysis Over Time

The 87/18 Rule Applied to Stack Overflow Badges as Awarded Over the Past Nine Months

Another day, and another Stack Overflow database dump XML to play with.  Some quick statistics from the badges.xml file:
  • 62 distinct badges
  • 239,005 user badges awarded 
  • 49,261 users have received at least one badge
  • The Top 11 badges (of 62, making 18% of distinct badges) make up 87% of badges awarded
    • Teacher (13.1% of all badges awarded)
    • Student (12.4%)
    • Supporter (10.6%)
    • Scholar (10.1%)
    • Editor (9.8%)
    • Nice Answer (9.6%)
    • Autobiographer (5.3%)
    • Critic (4.8%)
    • Commentator (4.1%)
    • Popular Question (3.6%)
    • Organizer (3.2%)
Sorta interesting, but it is worth noting that most of these are handed out like parking tickets in Venice Beach. (Easy badges in italics.)  More interesting were the anomalies that only become visible when graphed over time.  This graph shows the top eleven badges awarded over time, each as a percentage of the same top eleven.  This is better at showing relative increases or decreases in badge awarding events.
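
Both numbers, the share of all awards taken by the top badges and the per-day percentages within that top group, are quick to compute once the awards are loaded. A minimal Python sketch follows; it works on plain (day, badge name) pairs, and how you get those out of badges.xml is left open because the file layout is not shown here. The sample data in the usage lines is made up.

```python
from collections import Counter


def top_badge_share(badge_names, top_n):
    """Return (top_badges, share): the top_n most awarded badge names and
    the fraction of all awards that went to them."""
    counts = Counter(badge_names)
    top = counts.most_common(top_n)
    total = sum(counts.values())
    return [name for name, _ in top], sum(count for _, count in top) / total


def daily_relative_share(awards, badges):
    """For each day, the percentage each badge in `badges` contributes to
    that day's awards, counting only the badges in `badges`.

    awards -- iterable of (day, badge_name) pairs
    """
    per_day = {}
    for day, name in awards:
        if name in badges:
            per_day.setdefault(day, Counter())[name] += 1
    return {
        day: {name: 100.0 * counts[name] / sum(counts.values()) for name in badges}
        for day, counts in sorted(per_day.items())
    }


if __name__ == "__main__":
    # Illustrative awards only, not taken from the data dump.
    sample = [
        ("2009-05-26", "Teacher"), ("2009-05-26", "Teacher"),
        ("2009-05-26", "Student"), ("2009-05-26", "Guru"),
        ("2009-05-27", "Student"),
    ]
    top, share = top_badge_share([name for _, name in sample], top_n=2)
    print(top, share)
    print(daily_relative_share(sample, top))
```

Plotting each badge's daily percentage as a stacked series gives a graph of the kind described above; a day on which one badge drops to zero stands out immediately.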

Stack Overflow Badges- Top 11 Over the First 9 Months as a Percentage

A few things caught my eye:
  • Beta Days: Things are pretty erratic in the beta days, but that is to be expected with a significantly smaller user base who were actively trying features out as they come online. 

  • Days Long Outage: There is a days-long gap in the data in mid-April.  No badges were handed out for about four or five days, but they were eventually awarded when the problem was fixed.  I did not see a mention of any failures on  blog.stackoverflow.com, so the cause of this outage is a mystery to me. 

  • Drastic drop in new Organizer badges: Once the outage was resolved, the relative number of Organizer badges drops permanently by two-thirds!  Clearly an Illuminati conspiracy to keep us SOpedians down.

    • ~28 Organizer badges are awarded per day for the three weeks prior to the outage
    • ~8 Organizer badges are awarded per day for the following three weeks

    (UPDATE: Geoff Dalgas, a coder at Stack Overflow, posts on The (unofficial) StackOverflow meta-Discussion Forum that this behavior is due to a database refactoring that allowed them to distinguish between Question edits and tag edits.)

  • Number of Popular Questions badges awarded daily grows over time: It starts out at near zero, and grows over time to be a considerable fraction of the total.  I guess this is to be expected as questions pick up more and more views over time.

  • No Popular Question badges awarded for 27 May: And, unlike the above outage, they do not seem to have been awarded retroactively.  The missing badges show in both the absolute graph (not shown, trust me) and the relative graph (close up below.)

    (UPDATE: I asked about this anomaly on the new Meta Stack Overflow site (No Popular Question badges awarded for 27 May?), and apparently (to be confirmed) the system's view counter was down that day, so no questions were incremented over the threshold for a badge.)
Stack Overflow Badges- Missing Popular Question badges?

Stack Overflow: Up and Down Voting Pattern Analysis

SOpedians are getting nicer as time goes on, except for the occasional flare-up

The kids at stackoverflow.com, most prominently Jeff Atwood, recently released the Creative Commons licensed data behind Stack Overflow via BitTorrent, and I eagerly downloaded the database dump and imported it into MySQL for some analysis of voting patterns.

Since the beta, I have always been a fan of the down vote. Many SOpedians find them hostile and mean--to the point of getting their knickers all bunched up. My belief is that they have a cleansing effect on the questions and answers. Down votes are in the spirit of (what I have interpreted the founders' goals to be for) Stack Overflow.

After whipping up an embarrassingly crude Python script to import the voting data into MySQL, I ran a simple query that gave me the up and down vote totals for each day. Then I graphed it all in Excel and added three new series: up-to-down ratio, 9-day up-to-down ratio average, and an up-to-down trend line.
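
The three derived series do not need Excel. Below is a minimal Python sketch of them, offered as an illustration rather than a copy of the spreadsheet: it assumes the daily totals are already available as (day, up, down) tuples, that the 9-day average is a trailing one, and that the trend line is an ordinary least-squares fit.

```python
def daily_ratios(daily_totals):
    """daily_totals: list of (day, up_votes, down_votes), oldest first.
    Returns the up-to-down ratio per day (None when there were no down votes)."""
    return [up / down if down else None for _, up, down in daily_totals]


def moving_average(values, window=9):
    """Trailing moving average; None until a full window of values exists."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            averages.append(None)
        else:
            averages.append(sum(chunk) / window)
    return averages


def linear_trend(values):
    """Least-squares line through (index, value); returns (slope, intercept).
    Needs at least two non-None values."""
    points = [(x, y) for x, y in enumerate(values) if y is not None]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Made-up daily totals, purely for illustration.
    totals = [("day 1", 10, 1), ("day 2", 22, 2), ("day 3", 36, 3)]
    ratios = daily_ratios(totals)
    print(ratios, moving_average(ratios, window=2), linear_trend(ratios))
```

A positive slope from `linear_trend` corresponds to the up-to-down ratio increasing over time.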

Stack Overflow Voting Patterns Over Time


My interpretations:

  • Stack Overflow went live in mid-September, hence the huge jump in votes then. No surprise there.
  • The humps are weekdays, the troughs are weekends, and the winter holidays are clearly visible.
  • For every down vote there are 10 to 12 up votes.
  • The up vote to down vote ratio is increasing over time. My gut tells me that this is related to the introduction and expansion of post closing, deleting and moderator warning functionality. Or maybe "Down Voting Fatigue" sets in with many users? The ideas that maybe SOpedians are posting less junk or that they are just being nicer as time goes on are ridiculous!
  • There is a huge spike in down votes and/or a corresponding drop in up votes on 21 February. There does not seem to be any one post that sparked this. Interesting.
  • The single most down voted post is in response to What is the most spectacular way to shoot yourself in the foot with C++? with (as of 6 June 2009) 39 down votes! (This does not show in the graph...just an ad hoc 'I wonder...' query.)

Interesting stuff. It will be entertaining to comb over the Stack Overflow data in more detail in the future.