<!DOCTYPE HTML> <html> <head> <title>Boomerang Howto #0: How to read data out of a beacon or before_beacon event handler</title> <meta http-equiv="Content-type" content="text/html; charset=utf-8"> <link rel="stylesheet" type="text/css" href="../boomerang-docs.css"> </head> <body> <span style="float:right;"><a href="../">All Docs</a> | <a href="index.html">Index</a></span> <h1>Boomerang Howto #0:<br>How to read data out of a beacon or before_beacon event handler</h1> <p> For all subsequent examples, you'll need to pull performance data out of the beacon or out of a <code>before_beacon</code> event handler. This howto explains how to do that. </p> <h2>Beacon results back to your server</h2> <p> For most cases you'd want to send performance data back to your server so you can analyse it later and take action against it. The first thing you need to do is set up a url that the javascript will use as a beacon. We'll look at the back end details a little later. You tell boomerang about your beacon URL by passing the <code>beacon_url</code> parameter to the <code>BOOMR.init()</code> method: </p> <pre> <script src="boomerang.js" type="text/javascript"></script> <script type="text/javascript"> BOOMR.init({ beacon_url: "http://yoursite.com/path/to/beacon.gif" }); </script> </pre> <p> I've used beacon.gif as an example, but it could really be any thing. You could write a script in PHP or C# or JSP to handle the beacons as well. I just use a URL that does nothing on the back end, and then later look at my apache web logs to get the beaconed parameters in a batch. </p> <h3>Beacon parameters</h3> <p> The beacon that hits your server will have several parameters. Each plugin also adds its own parameters, so if you have custom plugins set up, you'll get parameters from them as well. This is what you get from the default install: </p> <h4>boomerang parameters</h4> <dl> <dt>v</dt> <dd>Version number of the boomerang library in use.</dd> <dt>u</dt> <dd>URL of page that sends the beacon.</dd> </dl> <h4>roundtrip plugin parameters</h4> <dl> <dt>t_done</dt> <dd><strong>[optional]</strong> Perceived load time of the page.</dd> <dt>t_page</dt> <dd><strong>[optional]</strong> Time taken from the head of the page to page_ready.</dd> <dt>t_resp</dt> <dd><strong>[optional]</strong> Time taken from the user initiating the request to the first byte of the response.</dd> <dt>t_other</dt> <dd><strong>[optional]</strong> Comma separated list of additional timers set by page developer. Each timer is of the format <code>name|value</code>. The following timers may be included: <dl> <dt>t_load</dt> <dd><strong>[optional]</strong> If the page were prerendered, this is the time to fetch and prerender the page.</dd> <dt>t_prerender</dt> <dd><strong>[optional]</strong> If the page were prerendered, this is the time from start of prefetch to the actual page display. It may only be useful for debugging.</dd> <dt>t_postrender</dt> <dd><strong>[optional]</strong> If the page were prerendered, this is the time from prerender finish to actual page display. It may only be useful for debugging.</dd> <dt>boomerang</dt> <dd>The time it took boomerang to load up from first byte to last byte</dd> <dt>boomr_fb</dt> <dd><strong>[optional</strong> The time it took from the start of page load to the first byte of boomerang. Only included if we know when page load started.</dd> </dl></dd> <dt>r</dt> <dd>URL of page that set the start time of the beacon.</dd> <dt>r2</dt> <dd><strong>[optional]</strong> URL of referrer of current page. Only set if different from <code>r</code> and <code>strict_referrer</code> has been explicitly turned off.</dd> <dt>rt.start</dt> <dd>Specifies where the start time came from. May be one of <code>cookie</code> for the start cookie, <code>navigation</code> for the W3C navigation timing API, <code>csi</code> for older versions of Chrome or <code>gtb</code> for the Google Toolbar.</dd> <dt>rt.bstart</dt> <dd>The timestamp when boomerang showed up on the page</dd> <dt>rt.end</dt> <dd>The timestamp when the done() method was called</dd> </dl> <h4>bandwidth & latency plugin</h4> <dl> <dt>bw</dt> <dd>User's measured bandwidth in bytes per second</dd> <dt>bw_err</dt> <dd>95% confidence interval margin of error in measuring user's bandwidth</dd> <dt>lat</dt> <dd>User's measured HTTP latency in milliseconds</dd> <dt>lat_err</dt> <dd>95% confidence interval margin of error in measuring user's latency</dd> <dt>bw_time</dt> <dd>Timestamp (seconds since the epoch) on the user's browser when the bandwidth and latency was measured</dd> </dl> <h2>Read results from javascript</h2> <p> There may be cases where rather than beacon results back to your server (or alongside beaconing), you may want to inspect performance numbers in javascript itself and perhaps make some decisions based on this data. You can get at this data before the beacon fires by subscribing to the <code>before_beacon</code> event. </p> <pre> BOOMR.subscribe('before_beacon', function(o) { <span class="comment">// Do something with o</span> }); </pre> <p> Your event handler is called with a single object parameter. This object contains all of the beacon parameters described above except for the <code>v</code> (version) parameter. To get boomerang's version number, use <code>BOOMR.version</code>. </p> <p> In all these howto documents, we use the following code in the <code>before_beacon</code> handler: </p> <pre> BOOMR.subscribe('before_beacon', function(o) { var html = ""; if(o.t_done) { html += "This page took " + o.t_done + "ms to load<br>"; } if(o.bw) { html += "Your bandwidth to this server is " + parseInt(o.bw/1024) + "kbps (&#x00b1;" + parseInt(o.bw_err*100/o.bw) + "%)<br>"; } if(o.lat) { html += "Your latency to this server is " + parseInt(o.lat) + "&#x00b1;" + o.lat_err + "ms<br>"; } document.getElementById('results').innerHTML = html; }); </pre> <h2>Back end script</h2> <p> A simple back end script would look something like this. Note, I won't include the code that gets the URL parameters out of your environment. I assume you know how to do that. The following code assumes these parameters are in a variable named <code>params</code>. The code is in Javascript, but you can write it in any language that you like. </p> <pre> function extract_boomerang_data(params) { var bw_buckets = [64, 256, 1024, 8192, 30720], bw_bucket = bw_buckets.length, i, url, page_id, ip, ua, woeid; <span class="comment">// First validate your beacon, make sure all datatypes </span> <span class="comment">// are correct and values within reasonable range </span> <span class="comment">// We'll also want to detect fake beacons, but that's more complex </span> if(! validate_beacon(params)) { return false; } <span class="comment">// You may also want to do some kind of random sampling at this point</span> <span class="comment">// Figure out a bandwidth bucket. </span> <span class="comment">// we could get more complex and consider bw_err as well, </span> <span class="comment">// but for this example I'll ignore it </span> for(i=0; i<bw_buckets.length; i++) { if(params.bw <= bw_buckets[i]) { bw_bucket = i; break; } } <span class="comment">// Now figure out a page id from the u parameter </span> <span class="comment">// Since we might have a very large number of URLs that all </span> <span class="comment">// map onto a very small number (possibly 1) of distinct page types </span> <span class="comment">// It's good to create page groups to simplify performance analysis. </span> url = canonicalize_url(params.u); <span class="comment">// get a canonical form for the URL</span> page_id = get_page_id(url); <span class="comment">// get a page id. (many->1 map?) </span> <span class="comment">// At this point we can extract other information from the request </span> <span class="comment">// eg, the user's IP address (good for geo location) and user agent </span> ip = get_user_ip(); <span class="comment">// get user's IP from request </span> woeid = ip_to_woeid(ip); <span class="comment">// convert IP to a Where on earth ID</span> ua = get_normalized_uastring(); <span class="comment">// get a normalized useragent string</span> <span class="comment">// Now insert the data into our database </span> insert_data(page_id, params.t_done, params.bw, params.bw_err, bw_bucket, params.lat, params.lat_err, ip, woeid, ua); return true; } </pre> <h3>Scaling up</h3> <p> The above code works well when you have a few thousand requests in a day. If that number starts growing to the hundreds of thousands or millions, then your beacon handler quickly becomes a bottleneck. It can then make sense to simply batch process the beacons. Let the requests come in to your apache (or other webserver) logs, and then periodically (say once an hour), process those logs as a batch and do a single batch insert into your database. </p> <p> My talk from IPC Berlin 2010 on <a href="http://www.slideshare.net/bluesmoon/scaling-mysql-writes-through-partitioning-ipc-spring-edition">scaling MySQL writes</a> goes into how we handled a large number of beacon results with a single mysql instance. It may help you or you may come up with a better solution. </p> <h2>Statistical analysis of the data</h2> <p> Once you've got your data, it's useful to do a bunch of statistical analysis on it. We'll cover this in a future howto, but for now, have a look at my ConFoo 2010 talk on <a href="http://www.slideshare.net/bluesmoon/index-3441823">the statistics of web performance</a>. </p> <p class="perma-link"> The latest code and docs is available on <a href="http://github.com/lognormal/boomerang/">github.com/lognormal/boomerang</a> </p> <p id="results"> </p> <script src="../../boomerang.js" type="text/javascript"></script> <script src="howtos.js" type="text/javascript"></script> <script type="text/javascript"> BOOMR.init({ user_ip: '10.0.0.1', BW: { base_url: '../../images/', cookie: 'HOWTO-BA' }, RT: { cookie: 'HOWTO-RT' } }); </script> </body> </html> <!-- Copyright (c) 2011, Yahoo! Inc. All rights reserved. Copyrights licensed under the BSD License. See the accompanying LICENSE.txt file for terms. -->