Use of dashboards for performance tuning

In his article, Jason Gorman describes six requirements for the practical use of dashboards in performance-related testing.

As a performance tester, I often need to provide snapshot summaries of system performance à la dashboards. The key requirements are normally the timeliness and relevance of the data being presented.

So sticking with Jason’s requirements, how do you implement good dashboard design?

1. Cognitive Simplicity – you should be able to digest the meaning of it in a matter of seconds …
I think the key here is to understand your audience. Generally, the higher up the management food chain you go, the simpler they like the presentation. For a tech audience, I like to present data in a form they can relate to, such as transaction distributions represented by a miniature line chart and a vertical box plot. Take, for example, the following graphical representation:
[Figure: box plot with line chart]
I find that to be a popular format. The line chart gives you an instant snapshot of data over time (in this case asvc_t response time on the active queue), and you can easily see where response times spiked in the sample. More useful IMHO, though, is the box plot. The vertical box plot to the right indicates that the top 25% of transactions measured ranged from 0 to 60 msecs, indicating a problem with 25% of my results. Coupled with visual indicators such as colour, where red represents an exceeded threshold, I find these snapshots very useful and popular with other tech peeps.
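
As a rough sketch of the numbers behind such a box plot, the following Python computes a five-number summary for a set of response times and attaches a colour flag. The function name `box_plot_summary`, the sample values, and the choice to compare the upper quartile against the threshold are all assumptions made for illustration; a real dashboard might colour on a different statistic.

```python
import statistics


def box_plot_summary(samples_ms, threshold_ms):
    """Return the five-number summary of the samples plus a colour flag."""
    ordered = sorted(samples_ms)
    # The three quartile cut points; "inclusive" treats the data as the full population.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    # Assumption: flag red when the upper quartile exceeds the threshold.
    colour = "red" if q3 > threshold_ms else "green"
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "colour": colour,
    }


# Hypothetical response times in milliseconds, checked against a 35 ms threshold.
summary = box_plot_summary([50, 10, 40, 20, 30], threshold_ms=35)
```

The resulting dictionary holds everything a charting library needs to draw the box, the whiskers, and the colour indicator.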

A bit higher up the food chain, I find people tend to get suckered by corny dashboard widgets as in the following:
[Figure: CPU gauge]

2. Timeliness – the information should, as much as possible, be a reflection of the reality now …
This is actually a difficult requirement to achieve. Normally the limiting factor for me is the size of the data that needs to be analyzed in order to provide concise, meaningful snapshots of information, particularly when that data lives on multiple providers or sources that are separate from your analysis solution. In any case, the best way to achieve timeliness is to automate the processing as much as possible. To that end, I normally collect data via scheduled cron jobs and feed it into my dashboard solution through some form of batch processing. Currently I generate data into CSV files, then upload that data through a web-based front end for further processing. Sometimes I am dealing with files of 300MB+ of raw data, so you may wish to introduce some form of data aggregation or thinning in your solution. An example of the web front end I use is available at numbrcrunchr.com, but it is really not suitable for Internet use: you need high-speed bandwidth to upload (and process) such huge files.
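
One possible way to thin raw data before upload is to collapse samples into fixed time buckets, keeping the mean and the peak of each bucket. The sketch below assumes a CSV with hypothetical `timestamp` (epoch seconds) and `value` columns; the real column names and bucket size will depend on your collector.

```python
import csv
import io


def thin_samples(rows, bucket_seconds=60):
    """Collapse raw samples into one (bucket_start, mean, peak) tuple per bucket."""
    buckets = {}
    for row in rows:
        ts = int(float(row["timestamp"]))
        value = float(row["value"])
        start = ts - ts % bucket_seconds  # round down to the start of the bucket
        count, total, peak = buckets.get(start, (0, 0.0, value))
        buckets[start] = (count + 1, total + value, max(peak, value))
    return [
        (start, total / count, peak)
        for start, (count, total, peak) in sorted(buckets.items())
    ]


# Small in-memory stand-in for a raw CSV file (made-up values).
SAMPLE = "timestamp,value\n0,10\n30,20\n60,5\n"
thinned = thin_samples(csv.DictReader(io.StringIO(SAMPLE)))
```

Because `csv.DictReader` reads one row at a time, the same function can be pointed at an open file handle without loading a 300MB+ file into memory. Keeping the peak alongside the mean means spikes are not averaged away.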

3. Actionability – (yes, that’s a made-up word) – you should be able to take some kind of action to correct any problems revealed by the dashboard …
I’m afraid this is something that my own implementation lacks, and I fear it could be quite difficult to provide, given the permutations and combinations of advice you may need to offer for a given scenario. However, more often than not, thresholds in your dashboards can provide the necessary triggers to instigate corrective action. Whether that is as simple as sending an email or SMS, restarting a service, or flagging the problem in some sort of log, it is the start of your action plan. Actually telling people how to resolve it, now that’s a challenge!
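
A minimal sketch of threshold-driven triggers might look like the following. The metric names, limits, and the `log_breach` action are hypothetical; in practice the action could just as well send an email or restart a service.

```python
def check_thresholds(metrics, thresholds):
    """Return (name, value, limit) for every metric above its limit."""
    return [
        (name, metrics[name], limit)
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]


def dispatch(breaches, actions, default_action):
    """Run the action registered for each breached metric, or the default."""
    for name, value, limit in breaches:
        actions.get(name, default_action)(name, value, limit)


log = []


def log_breach(name, value, limit):
    # Simplest possible action: record the breach in a log.
    log.append(f"{name}={value} exceeded {limit}")


# Hypothetical latest readings and their limits.
breaches = check_thresholds(
    {"asvc_t_ms": 75.0, "cpu_pct": 40.0},
    {"asvc_t_ms": 60.0, "cpu_pct": 90.0},
)
dispatch(breaches, actions={}, default_action=log_breach)
```

Separating detection from dispatch keeps the threshold check testable on its own and lets each metric map to a different corrective action later.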

4. Constructiveness – it should point you towards practical improvements, rather than just damning current performance …
I think as a performance tester you always need to be cognizant of this, as being overly negative in both your analysis and your presentation of information can relegate you to the too hard basket. I try to remove any personal comments from my analysis, such as:
“the sub system performance of the SAN was very poor and the application could not provide a consistent response time” – this only isolates you from the production support and application development teams
Try and concentrate on the aspects which you may be able to tune or improve such as:
“disk io wait was observed to be high, it is recommended that cache hit ratios and network latencies are checked by the ABC support team for the period 03.00-04.00 am”

5. Approachability – it should not be threatening or put people off seeking feedback …
Once again, I think this relates to your intended audience and being aware of what they expect. One performance measurement I particularly hate is RAG (red, amber, green traffic lights), but I find it commonplace in most places I work. Oversimplification of dashboards can be just as bad as over-the-top detailed widgets! Imagine people’s reaction to your assessment of their overall application performance as ‘red’. I think you need to be granular enough in your approach, and descriptive enough, that it encourages feedback or more questions: “Do you know why response times peaked at 3.00am?” “Why yes, we run an Oracle backup at that time”, etc.

6. Practicality – a dashboard needs to not be a burden on the people who manage and use it …
I find web-based tools are the best. At the end of the day, it depends on what the company already has in place. Try not to re-invent the wheel, as there are a lot of great applications out there (SiteScope, OpenView, SAS etc). But if you get the urge, try some of the many charting APIs available. I personally prefer ChartDirector, but there are tons of options on offer!
