Monday, November 22, 2010

Routing Traffic with JavaScript

As we prepared to launch our new beta site at Edmunds, we had to decide how to gradually move users over without permanently locking them in and without disrupting our current infrastructure. Our plan was to start small, redirecting just a few percent of users, and to ramp up as we gained confidence in the performance and stability of the new site. 



We considered all the obvious options: a server-side-only approach, network-based solutions and a mix of server- and client-side logic. But we eventually settled on a pure JavaScript solution. It's probably the simplest of all the approaches we considered, and it allowed us to achieve our goals with the least cost, effort and risk. In this post I'll discuss our routing logic, issues we ran into and some details you'll want to think about if you plan to implement something similar.



THE BASICS


Our core traffic routing logic is straightforward. When a user visits the legacy site (i.e., the current production site), they have an X percent chance of being allocated as a beta user. If they get lucky and end up in the pool of beta users, the script redirects them to the beta site using their current legacy URL path and query string. The allocation decision (legacy or beta) is stored in a cookie, and on repeat visits the script uses the stored value instead of re-allocating, so the user stays pinned to the same site.
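A minimal sketch of that decision follows. It is not the Edmunds script: the function name `allocate`, the `"beta"`/`"legacy"` values and passing the stored cookie value and random source in as arguments (so the logic can be exercised outside a browser) are all assumptions.

```javascript
// Hypothetical sketch of the core routing decision; names are illustrative.
// storedSite: value previously saved in the routing cookie, or null on a first visit.
// betaPercent: the "X percent" of new visitors to allocate to beta (0-100).
// rng: a function returning a number in [0, 1); defaults to Math.random.
function allocate(storedSite, betaPercent, rng = Math.random) {
  // Repeat visitor: reuse the stored decision so the user stays pinned.
  if (storedSite === "beta" || storedSite === "legacy") {
    return { site: storedSite, isNew: false };
  }
  // First visit: draw once and compare against the target percentage.
  const site = rng() * 100 < betaPercent ? "beta" : "legacy";
  return { site, isNew: true };
}
```

On a first visit the caller would write `site` to the routing cookie, and on the legacy site it would redirect only when the result is `"beta"`.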



URL TRANSLATION


It might seem that reusing the legacy URL path and query string to build a beta site URL would be too simple to handle the full range of URL translation between the old and new sites, and indeed it is. It only works because the routing script doesn't have to handle anywhere near the full range of possible URLs. Redirection (and thus URL translation) only happens from landing pages, and the beta site needs to handle those URLs anyway. If it didn't, we would see a nasty drop in traffic when we switch over to the new site. Most companies will likely have the same requirement to support well-known landing page URLs, so this simple approach is probably enough in most cases.
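The translation itself amounts to swapping the host. This sketch assumes a hypothetical `betaOrigin` such as `https://beta.example.com` and uses the modern `URL` API, which browsers of 2010 did not have. Setting `pathname` and `search` on a parsed beta URL, rather than resolving the legacy path as a relative URL, keeps a path like `//other.host/x` from changing the destination host.

```javascript
// Hypothetical helper: reuse the legacy path and query string on the beta host.
// betaOrigin (e.g. "https://beta.example.com") is an assumed value.
function toBetaUrl(legacyHref, betaOrigin) {
  const legacy = new URL(legacyHref);
  const beta = new URL(betaOrigin);
  // Copy path and query onto the beta URL; the host always stays betaOrigin's.
  beta.pathname = legacy.pathname;
  beta.search = legacy.search;
  // The legacy fragment (#...) is not carried over.
  return beta.href;
}
```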



On the subject of URL translation, keep in mind that a client-side redirect will effectively wipe out the original referrer URL and make it appear as if all traffic to the new site's landing pages is coming from the legacy site. This makes it difficult to compare incoming traffic. The easy solution is to capture the referrer URL on the legacy page and store it in a cookie before performing the redirect, then provide a clear API so that tracking code on the destination page can retrieve and reset the value.
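One way to carry the referrer across the redirect is sketched below. The cookie name `beta_orig_ref`, the `saveReferrer`/`takeReferrer` functions and the shared parent `domain` (for example `.example.com`) are hypothetical, and the approach only works if the legacy and beta hosts can share a cookie domain. In the browser, `doc` would be `document`.

```javascript
// Hypothetical cookie name; not from the original script.
const REFERRER_COOKIE = "beta_orig_ref";

// Return the decoded value of one cookie from a document.cookie-style string, or null.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}

// Legacy page: call just before redirecting, while doc.referrer is still the original.
function saveReferrer(doc, domain) {
  doc.cookie = REFERRER_COOKIE + "=" + encodeURIComponent(doc.referrer) +
    "; path=/; domain=" + domain;
}

// Beta page: tracking code calls this to get the original referrer and clear it.
function takeReferrer(doc, domain) {
  const value = readCookie(doc.cookie, REFERRER_COOKIE);
  if (value !== null) {
    // Expire the cookie so a later page view doesn't report a stale referrer.
    doc.cookie = REFERRER_COOKIE + "=; path=/; domain=" + domain +
      "; expires=Thu, 01 Jan 1970 00:00:00 GMT";
  }
  return value;
}
```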



RANDOM NUMBERS


Something that may interest the more mathematically inclined is the quality of random number generation in JavaScript. Like many simple random functions, JavaScript's built-in Math.random() is a pseudorandom generator and not a high-quality source of randomness. You certainly wouldn't use it as part of a cryptographic algorithm, and we weren't even sure it was random enough to hit a routing percentage with any degree of accuracy. We tested a few high-quality JavaScript random number generators along with the built-in function and ultimately decided that Math.random() is "good enough" at the scale we're working at, with a lot less CPU overhead than the alternatives. Last I heard, we were achieving a routing percentage within 0.05 percent of our target.
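To judge whether a generator is "good enough", it helps to know how much error even a perfect one would show. With allocation probability p and n visitors, the observed rate has a standard error of sqrt(p(1 - p)/n); at p = 5% and n = 1,000,000 that is about 0.022 percentage points. The sketch below computes that baseline and runs a simple simulation against any generator. The function names and figures are illustrative, not our actual test harness.

```javascript
// Sketch: simulate allocations and return the observed beta percentage.
// rng can be Math.random or any alternative generator returning [0, 1).
function measureRate(betaPercent, trials, rng = Math.random) {
  let beta = 0;
  for (let i = 0; i < trials; i++) {
    if (rng() * 100 < betaPercent) beta++;
  }
  return (beta / trials) * 100;
}

// One standard error, in percentage points, for an ideal generator:
// 100 * sqrt(p * (1 - p) / n).
function standardErrorPoints(betaPercent, trials) {
  const p = betaPercent / 100;
  return 100 * Math.sqrt((p * (1 - p)) / trials);
}
```

For example, `measureRate(5, 1e6)` with `Math.random` should usually land within a few hundredths of a point of 5, which is roughly what sampling noise alone predicts.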



REDIRECTING WITH JAVASCRIPT


Initially, we performed the redirect by assigning a new URL to window.location.href. But we soon discovered that in Internet Explorer, href assignments aren't treated as true redirects, and the URL you are redirecting from is added to the browser's history. This is technically correct behavior, but definitely not what we wanted. Other browsers apparently treat a change to window.location.href during page load as a redirect and leave the original page out of the history.



The net effect of IE's behavior was that if a user clicked a link to our legacy site (say, from a search results page), was redirected to beta and then hit the back button, they would "return" to the legacy landing page. The landing page would then redirect them to the beta site again, making for less-than-thrilled customers and slightly confused analysts. The solution is to use window.location.replace() instead of changing window.location.href. In all browsers we tested, this keeps the original URL out of the history and makes the back button work as expected.
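The fix amounts to calling `replace()` instead of assigning to `href`. In this sketch `loc` is `window.location`, passed in only so the function can be tested; the check that skips a redirect to the current URL is an extra safeguard we're assuming, not something described above.

```javascript
// Sketch: redirect without leaving the legacy landing page in the history.
// Returns true if a redirect was issued.
function redirectTo(loc, url) {
  if (loc.href === url) {
    return false; // already there; redirecting would only loop
  }
  // Not `loc.href = url`: in IE that records the current page in the history,
  // so Back returns to a page that immediately redirects again.
  loc.replace(url);
  return true;
}
```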



ONE SCRIPT TO RULE THEM ALL


So far I've only discussed how the script works when it runs on our legacy site. But we serve the exact same script from the beta site to keep all the logic and configuration in one place. The core logic is different for beta visitors (it doesn't redirect and it makes sure that new visitors to the beta site get allocated correctly), but there is quite a bit of shared code. For instance, we store a number of routing-related attributes in a single cookie and the code that manages and wraps that cookie in an object is used by both code paths.
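A wrapper along these lines could back both code paths. The `RoutingCookie` class and the attribute names (`site`, `version`) are hypothetical; packing the attributes as an encoded query string with `URLSearchParams` is just one convenient format.

```javascript
// Hypothetical wrapper that packs several routing attributes into one cookie value.
class RoutingCookie {
  constructor(raw) {
    // raw is the decoded attribute string, e.g. "site=beta&version=2", or null.
    this.params = new URLSearchParams(raw || "");
  }
  get(name) {
    return this.params.get(name); // null when the attribute is missing
  }
  set(name, value) {
    this.params.set(name, String(value));
    return this; // allow chaining
  }
  // Encode the whole attribute set so it is safe to store as one cookie value.
  serialize() {
    return encodeURIComponent(this.params.toString());
  }
  static parse(encoded) {
    return new RoutingCookie(encoded ? decodeURIComponent(encoded) : null);
  }
}
```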



Another shared feature is cookie versioning. Whenever the script executes, it checks a version number stored in the beta routing cookie against the version number configured in the script. If the two don't match, the script ignores the cookie and re-allocates the user as if it were their first visit to the site. This allows us to reset the pool of beta users at any time, and it provides an "escape hatch" if we want to shut down beta traffic altogether.
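The version check can be as small as the function below. `COOKIE_VERSION` and the shape of `state` (the parsed cookie attributes) are assumptions. To shut beta traffic off entirely, one would presumably bump the version and set the beta percentage to zero at the same time.

```javascript
// Stands in for the version number configured in the script (hypothetical value).
// Bumping it invalidates every existing routing cookie.
const COOKIE_VERSION = 3;

// state: parsed cookie attributes, e.g. { version: "3", site: "beta" }, or null.
// Returns the stored site if the cookie is current, or null to force re-allocation.
function storedSiteIfCurrent(state, configVersion = COOKIE_VERSION) {
  if (!state || Number(state.version) !== configVersion) {
    return null; // missing or stale cookie: treat as a first visit
  }
  return state.site || null;
}
```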



The script also has features to skip processing on certain internal URLs (e.g. an explicit beta site opt-in page) and to ignore requests from specific referrers. Again, these are equally useful for both legacy and beta requests.
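Both skip rules reduce to a simple predicate evaluated before any routing. The paths and referrer hosts in this sketch are placeholders, and matching referrers by exact host name is an assumption; a real list might need prefix or pattern matching.

```javascript
// Hypothetical skip rules; these values are placeholders.
const SKIP_PATHS = ["/beta/opt-in", "/beta/opt-out"];
const IGNORED_REFERRER_HOSTS = ["monitor.example.com"];

// Returns true when the router should do nothing for this request.
function shouldSkip(pathname, referrer) {
  if (SKIP_PATHS.includes(pathname)) return true;
  if (!referrer) return false; // direct visits have an empty referrer
  let host;
  try {
    host = new URL(referrer).hostname;
  } catch (e) {
    return false; // unparseable referrer: don't skip on its account
  }
  return IGNORED_REFERRER_HOSTS.includes(host);
}
```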



SITE-AWARE


Since we put all of the routing logic in a single script, it needs to know which site it's being served from to execute the correct logic. Our initial solution was to have the script look at the current URL's host name to figure out if it's on the beta or legacy site. This works fine in production, where host names are stable and predictable, but it fell down quickly in our more dynamic test environments. We kept the automatic configuration logic as a fallback, but added support for a global variable to override the automatically detected site. It's a little inelegant, but it means that a template author can guarantee that either the beta or legacy "version" of the routing script will execute whenever a specific template is served. And given that the routing script is only included in one template on each site, it doesn't introduce a lot of manual overhead.
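A sketch of that detection order: an explicit override first, then the host name. `BETA_HOSTS` and the global name `betaRouterSite` are hypothetical, as is defaulting unknown hosts to the legacy site.

```javascript
// Hypothetical list of host names that identify the beta site.
const BETA_HOSTS = ["beta.example.com"];

// override: value of a global a template can set before including the router.
function detectSite(hostname, override) {
  if (override === "beta" || override === "legacy") {
    return override; // explicit template setting wins
  }
  // Fallback: infer from the host name (dependable in production only).
  return BETA_HOSTS.includes(hostname) ? "beta" : "legacy";
}
```

In the browser the call would look like `detectSite(window.location.hostname, window.betaRouterSite)`, with the template setting `var betaRouterSite = "beta";` before including the routing script.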



TESTABILITY


A big problem with something that's designed to act randomly is that it's, well, random, even when it's working correctly. So we added a feature called "test actions" that lets you request a specific, repeatable outcome by passing in a URL parameter. We added actions to force allocation to the beta or legacy site, to reset the routing cookie and to simulate manual beta site opt-in and opt-out. This has proven very handy for developers as well as testers.
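Test actions map naturally onto a table of state transformations keyed by a query parameter. The parameter name `betaTest`, the action names and the state shape (`site`, `optedIn`) are hypothetical, and the returned state would still need to be written back to the routing cookie.

```javascript
// Hypothetical test actions: each maps the current routing state to a new one.
const TEST_ACTIONS = {
  forceBeta: (state) => ({ ...state, site: "beta" }),
  forceLegacy: (state) => ({ ...state, site: "legacy" }),
  reset: () => null, // drop the state; the next run re-allocates
  optIn: (state) => ({ ...state, site: "beta", optedIn: true }),
  optOut: (state) => ({ ...state, site: "legacy", optedIn: false }),
};

// Returns the new state, or the unchanged state if no valid action was requested.
function applyTestAction(search, state) {
  const action = new URLSearchParams(search).get("betaTest");
  // Own-property check so names like "toString" are not treated as actions.
  const known = action !== null &&
    Object.prototype.hasOwnProperty.call(TEST_ACTIONS, action);
  return known ? TEST_ACTIONS[action](state) : state;
}
```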



CONCLUSION


A purely client-side mechanism to route traffic doesn't sit well with everyone. Some feel that there isn't enough real-time control over its behavior, and to others it just feels counterintuitive. However, after working through a few rough spots, our JavaScript router has turned out to be a simple and effective solution. And a nice side effect is that it's very easy to get rid of. When we switch over to the new site, we'll remove a single include from a single template and it will be gone without a trace.



If you have questions or suggestions, or if you've implemented a different kind of solution to the same problem, please tell us about it in the comments.


