<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Altentee &#187; Performance &#38; Test Automation Experts &#187; R</title>
	<atom:link href="http://altentee.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://altentee.com</link>
	<description>Performance and Test Automation Experts</description>
	<lastBuildDate>Sat, 12 Jun 2010 00:35:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>By the power of R, statistical computing at your fingertips</title>
		<link>http://altentee.com/2007/by-the-power-of-r-statistical-computing-at-your-fingertips/</link>
		<comments>http://altentee.com/2007/by-the-power-of-r-statistical-computing-at-your-fingertips/#comments</comments>
		<pubDate>Wed, 05 Dec 2007 02:07:18 +0000</pubDate>
		<dc:creator>Tim Koopmans</dc:creator>
				<category><![CDATA[90kts]]></category>
		<category><![CDATA[Altentee]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://90kts.com/blog/2007/by-the-power-of-r-statistical-computing-at-your-fingertips/</guid>
		<description><![CDATA[<p>I&#8217;ve explored in previous posts the use of tools such as onboard Analytics (LoadRunner), off-the-shelf tools (Excel) and custom web based implementations (JGraph, ChartDirector) used to analyze the nitty gritty of performance metrics.</p>
<p>The use of each of these tools is governed by some common factors:</p>
the Expediency factor &#8211; the timeliness of data being analyzed as measured [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve explored in previous posts the use of tools such as onboard Analytics (LoadRunner), off-the-shelf tools (Excel) and custom web based implementations (JGraph, ChartDirector) used to analyze the nitty gritty of performance metrics.</p>
<p>The use of each of these tools is governed by some common factors:</p>
<ul>
<li><strong>the Expediency factor</strong> &#8211; how quickly data can be analyzed, measured as the time between capturing the raw data and analyzing it.</li>
<li><strong>the Pimp factor</strong> &#8211; the &#8216;prettiness&#8217; of the presentation. Particularly important for benchmarking or external (public) facing documents. Never underestimate the importance of how you present your results.</li>
<li><strong>the Share-ability factor</strong> &#8211; how portable the analysis needs to be. Particularly important when working with different technology groups such as middleware, storage or network subject matter experts.</li>
<li><strong>the Proprietary factor</strong> &#8211; sometimes you just can&#8217;t escape this. Your heart may lie with supporting open source, but often your pay check dictates that you must use proprietary formats, templates and the like as already set up by the client. Particularly pertinent with tools like LoadRunner and QALoad.</li>
</ul>
<p><span id="more-97"></span><br />
Enter stage right, the <a href="http://www.r-project.org/">R-Project</a>.</p>
<blockquote><p>R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.</p></blockquote>
<p>Using R escapes the proprietary factor, although some employers may object to GPL-licensed software in their environment. It is a tad weak on the share-ability factor but is very strong on expediency and the subsequent timeliness of your analysis. Oh and the pimp factor, well, that just depends on your imagination. For this demo I&#8217;m sticking with corporate grey on a white background. <img src='http://altentee.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>So how do you use it?</strong><br />
The main reason I&#8217;m finding R so powerful is its ability to quickly analyze and present meaningful statistical information from a raw data source. In my current implementation, I&#8217;m using a sequence of simple MySQL import scripts to load raw data (e.g. iostat, vmstat etc.) into a database table. For my demo, the table name is based on the server I am sampling.</p>
<p>Then, using a simple R script executed from the R console, I loop through the columns of that table (no need to know the column names in advance, as <code>dbListFields</code> enumerates them) and present each one as a line chart with a lowess curve fitted.</p>
<p>The script itself is as simple as this.<br />
<code>library(RMySQL)<br />
m <- dbDriver("MySQL")<br />
con <- dbConnect(m, group="DatabaseName")</p>
<p>perfstat <- function(metric, table, dtgFrom, dtgTo) {<br />
    print(metric)<br />
    query<-paste("SELECT ", metric, " result FROM ", table, "<br />
    WHERE dtg BETWEEN ", dtgFrom, " AND ", dtgTo, sep="")<br />
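    rs<-dbSendQuery(con, query)<br />
    df<-fetch(rs,n=-1)<br />
    dbClearResult(rs)  # release the result set before the next query<br />
    quartz()  # macOS-only device; dev.new() is the portable equivalent<br />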
    plot(df$result, yaxs ="i", type="l", col="grey", xlab="Elapsed Time",<br />
    ylab=metric, main=paste("Line chart of ",metric), lwd=1)<br />
    lines(lowess(df$result,f=.05,iter=100), col="blue", lwd=0.25)<br />
}</p>
<p>table<-"perfstats_ServerName"<br />
dtgFrom<-"'2007-11-29 19:00:00'"<br />
dtgTo  <-"'2007-11-30 19:00:00'"<br />
fields <- dbListFields(con, table)<br />
for (i in fields) {<br />
if (i == "dtg") print("skipping") else<br />
perfstat(paste("`",i,"`",sep=""),table, dtgFrom, dtgTo) }</p>
<p>dbDisconnect(con)</code></p>
<p>In that example, I'm plotting performance stats from a remote server over a 24 hour period, sampled at a 5 second interval, and presenting them as follows.<br />
<a href='http://90kts.com/blog/wp-content/uploads/2007/12/r-output.png' title='r sample output'><img src='http://90kts.com/blog/wp-content/uploads/2007/12/r-output.thumbnail.png' alt='r sample output' /></a></p>
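<p>To get a feel for what the lowess fit adds without needing a database, here is a minimal sketch on synthetic data. The <code>cpu</code> series, its shape and the sample count are made up for illustration; only <code>lowess</code>, <code>plot</code> and <code>lines</code> are taken from the script, and <code>iter</code> is left at its default of 3 to keep things quick.</p>
<p><code># Synthetic stand-in for one metric: 720 samples (one hour at a 5 second interval)<br />
set.seed(42)<br />
n <- 720<br />
cpu <- 50 + 20 * sin(seq(0, 2 * pi, length.out = n)) + rnorm(n, sd = 8)<br />
<br />
# lowess() returns a list with x and y components, one point per sample<br />
fit <- lowess(cpu, f = 0.05, iter = 3)<br />
<br />
# Raw samples in grey, smoothed trend in blue<br />
plot(cpu, type = "l", col = "grey", xlab = "Sample", ylab = "cpu", main = "Line chart of cpu")<br />
lines(fit, col = "blue")</code></p>
<p>The <code>f</code> argument is the fraction of points used for each local fit, so a smaller value follows the raw line more closely and a larger one gives a flatter trend.</p>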
<p>With a slight twist of the code, you can use my favourite representation (boxplot) to show the distribution of your data. Code like this:<br />
<code>library(RMySQL)<br />
m <- dbDriver("MySQL")<br />
con <- dbConnect(m, group="mysql")</p>
<p>perfstat <- function(metric, table, dtgFrom, dtgTo) {<br />
    print(metric)<br />
    query<-paste("SELECT ", metric, " result, device FROM ", table, "<br />
    WHERE dtg BETWEEN ", dtgFrom, " AND ", dtgTo, " ORDER BY device", sep="")<br />
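    rs<-dbSendQuery(con, query)<br />
    df<-fetch(rs,n=-1)<br />
    dbClearResult(rs)  # release the result set before the next query<br />
    quartz()  # macOS-only device; dev.new() is the portable equivalent<br />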
	# set plot parameters<br />
	op <- par(las=2, lwd=0.5, omd=c(0,1,0,1), mar=c(5,15,4,2), cex=0.8)</p>
<p>    boxplot(result ~ device, data = df, col="lightgray", horizontal=TRUE, lwd=0.5, range=0, notch=TRUE,<br />
		xlab=metric, main=paste("Boxplot of ",metric))<br />
	par(op)  # restore the previous plot parameters<br />
}</p>
<p>table<-"_iostat"<br />
dtgFrom<-"'2007-11-13 18:00:00'"<br />
dtgTo  <-"'2007-11-14 18:00:00'"<br />
fields <- dbListFields(con, table)<br />
for (i in fields) {<br />
if (i == "dtg") print("skipping") else<br />
perfstat(paste("`",i,"`",sep=""),table, dtgFrom, dtgTo) }</p>
<p>dbDisconnect(con)</code><br />
This produces boxplots grouped by device ID, something like this:<br />
<a href='http://90kts.com/blog/wp-content/uploads/2007/12/r-boxplot.png' title='r boxplot'><img src='http://90kts.com/blog/wp-content/uploads/2007/12/r-boxplot.png' alt='r boxplot' /></a></p>
<p>This makes it very easy to spot performance bottlenecks from a 20,000' level.</p>
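<p>If you want the numbers behind the picture, a rough sketch like the following ranks devices by their median. The data frame here is synthetic and the device names (<code>sda</code>, <code>sdb</code>, <code>sdc</code>) are hypothetical; in practice <code>df</code> would hold the <code>result</code> and <code>device</code> columns returned by the query.</p>
<p><code># Synthetic stand-in for the query result: one slow device among three<br />
set.seed(1)<br />
df <- data.frame(<br />
    device = rep(c("sda", "sdb", "sdc"), each = 200),<br />
    result = c(rexp(200, 1/5), rexp(200, 1/5), rexp(200, 1/40))<br />
)<br />
<br />
# Median per device, largest first, so the likely bottleneck is listed at the top<br />
med <- sort(tapply(df$result, df$device, median), decreasing = TRUE)<br />
print(med)<br />
<br />
# Same grouping drawn as a boxplot; range=0 extends the whiskers to the extremes<br />
boxplot(result ~ device, data = df, horizontal = TRUE, range = 0, col = "lightgray")</code></p>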
<p>I'm a relative noob with this program, so I will be exploring better ways to script and automate with it in future. But for the time being, its ability to quickly analyze data in a meaningful manner makes it one of the sharper tools in your toolbox.</p>
<p>I'm currently using it from my MacBook Pro, but hope to integrate it on some production Windows servers in the near future (using its ability to output straight to png). This is a great product: it's free and relatively easy to use. You could use this tool to automate ongoing performance benchmarking aspects of your application development life cycle, or as a simple health check for your key systems.</p>
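<p>As a rough sketch of that png output: opening the <code>png()</code> device instead of an on-screen one writes the chart to a file, so no display is needed. The file name, dimensions and random data below are assumptions for illustration, and whether <code>png()</code> is available depends on how R was built (check <code>capabilities("png")</code>).</p>
<p><code># Write the chart to a file in the session's temporary directory<br />
out <- file.path(tempdir(), "perfstat.png")<br />
png(filename = out, width = 800, height = 600)<br />
<br />
# Placeholder series standing in for a real metric<br />
set.seed(7)<br />
plot(cumsum(rnorm(500)), type = "l", col = "grey", xlab = "Elapsed Time", ylab = "metric")<br />
<br />
# Closing the device flushes the image to disk<br />
dev.off()<br />
print(file.exists(out))</code></p>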
]]></content:encoded>
			<wfw:commentRss>http://altentee.com/2007/by-the-power-of-r-statistical-computing-at-your-fingertips/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
