Tech at Edmunds.com: 2010

Monday, December 27, 2010

Santa Came Bearing Gifts

As a thank-you token for the work accomplished in 2010, Edmunds gave all its 400+ employees Apple TVs today! The shipment was supposed to be here last week, but somewhere between here and Taiwan something got stalled. It's like Christmas Day all over again here in the office :-)

The sticker reads:

"Dear Ismail Elshareef:

Thanks for making 2010 such a successful year. We accomplished a lot and have much for which to be proud. Enjoy this memento as a token of our appreciation. We're looking forward to working together to make 2011 even better.

Happy Holidays.

Avi"

Thank you, Avi! We're definitely looking forward to exciting new adventures in 2011.

Stay tuned ... and Happy New Year!

Friday, December 17, 2010

Edmunds.com to Engineers: We Want You!

Edmunds.com has been recognized as one of the "Top 20 Places to Work" by BusinessWeek, Architectural Record, Wall Street Journal and the Los Angeles Business Journal. As someone who's been at Edmunds for almost 3 1/2 years now, I wholeheartedly agree.

If you are a team player, problem solver, initiative taker, proactive innovator and fun all around, then you'll be right at home here at Edmunds. Aside from having one of the coolest office spaces ever, we are located in Santa Monica near the beach. Believe me, it doesn't get any better than that out here in Southern California :-) The salaries are very competitive and the benefits are exceptionally generous. (The image above is from our inaugural Hack Days event, which allows engineers and others to innovate on projects they feel passionate about. We are planning to have more of these events in the future!)

So if you're intrigued, check out the listings below and let us know if one or more piques your interest:

Sr. Software Engineer: This position will involve managing application integration activities, providing consulting services to other IT peer groups and assisting in the development of complex initiatives.

Software Configuration Management Engineer: Responsible for defining and automating the release process to allow for fast automated deployment and provisioning to integration test and production environments.

Sr. Front-End Engineer: Responsible for leading software design and development in creating standardized, performant and engaging user experiences using web standards (XHTML, OO JavaScript, CSS).

Sr. Software Engineer - Data Architecture: Looking for an exceptional Java engineer (ninja level) to play a key role in this evolution of our technology. The position is within the data architecture team and will be responsible for writing heavy code. Our software organization is an open environment where the best idea wins. You'll have the opportunity to work with many new exciting technologies such as Oracle Coherence, Apache SOLR and Hadoop.

Sr. Software Test Engineer: Responsible for developing test specifications and test case designs as well as developing/implementing test tools to uncover product defects for new and existing component releases.

Sr. Systems Administrator: Provide technology support and direction to the team, ensuring that all technologies work effectively together to deliver the technical/infrastructure architecture for large initiatives.

Sr. Systems Administrator - Applications: Support and provide direction to the team, helping to ensure that all technologies work effectively together to deliver a high-performing, reliable implementation of Java based applications. The ideal candidate must be creative, self motivated, innovative, detail oriented, and eager to work with the latest technologies along with a strong desire to maintain and help improve Edmunds' applications and supporting services.

Sr. Systems Engineer - Production Engineering: Leverage cross-discipline expertise in systems management, networking, applications and software development to design, implement, and operationalize systems and application solutions in Edmunds' high-volume, 24/7 production environments.

Systems Administrator: Provide technology support to all of operations, ensuring that all technologies work effectively together to deliver the technical/infrastructure architecture for large initiatives.

Many other positions are open and you can find the complete list on our site. Don't forget to follow us on Twitter and check out our Facebook page to stay up-to-date on all the last-minute developments in the world of technology at Edmunds.

Hope to hear from you soon!

Wednesday, December 15, 2010

The Reënvisioned Edmunds.com is Now Live!

It's been a long time coming: 26 months, to be exact. Finally, the completely re-architected, redesigned and re-imagined Edmunds.com is now live, and early indicators show that our page views are up 20% compared with the legacy site. That's tremendous news for the company and for everyone who's been working really hard on this redesign over the past two years.

If you're wondering what's changed on the site, the answer is everything! Here's a quick breakdown:

Completely new UI/UX geared towards getting out of the consumers' way and letting them get what they want done fast and with ease.
All our pages are made of small, reusable components that can be combined in various permutations to deliver new pages.
Smarter page rendering techniques geared towards making our pages "feel" faster and responsive to our consumers.
Brand new Content Management System that was built in house in an MVC architecture to facilitate content publishing, scheduling, maintenance and distribution.
We use Perforce for version control of both code and content.
Solr is our search engine.
No RDBMS in Production! We use Oracle Coherence in-memory grid to supply data to our pages.
Built our own publishing (data+content) system based on the Messaging pattern.
Built our own traffic routing system, which was partially discussed in this post.
The processes of server provisioning and application deployments are far more efficient.
Introduced Hadoop and Netezza to our Data Warehousing.
Built an impressive suite of automated tests and data validators to ensure bugs are caught early and protected against in the future.

These are just quick highlights of what went into the redesign. We are extremely excited about this milestone in our company's history and we're looking forward to improving on the great work that we released today and innovating further to take us to the next step of where we want to go.

Don't forget: You can be part of the team too! We're hiring :-)

Tuesday, December 7, 2010

Edmunds as a Platform

At Edmunds, we aspire to fully empower the automotive consumer. We've done a good job so far in achieving that, but I think more could be done. Here are my thoughts on how we can do that. I'd love to hear what you think about it (whether you think it's nuts or actually achievable). Thank you in advance :-)

Here is the automotive experience life cycle as I see it:

Research: the consumer is looking around for his/her next vehicle

Negotiations: the consumer is negotiating (or not) pricing/features with dealer or private seller

Acquisition: the consumer buys or leases or rents the vehicle

Operation: the consumer owns the vehicle

Maintenance: everything the vehicle experiences while owned by the consumer (e.g. accidents, service, etc.)

Sale: the consumer wants to sell the vehicle

I believe we have done well in some stages (1, 2 and 6) and hardly scratched the surface in the rest. There are tremendous opportunities here to give the automotive consumer support and information in every single stage of that life cycle.

But we cannot do it alone.

In order to truly empower the consumer in every step of the automotive experience, I feel that we need to allow the community at large to use our data and systems to create value for our consumers through applications--both mobile and wired--and services. The way to do that is to build an open platform.

What is a "Platform"?

At a high level, a platform is an ecosystem that provides the following:

Data Accessibility: providing our data in an easy and standard format ready for consumption by the larger community.

Code Extensibility: Our software should be open and extensible. The community can contribute and build upon our readily available codebase.

Service Oriented Approach: Be visible throughout the life cycle.

Goals

My goal is to ultimately reach 涅槃, or Nirvana, in automotive consumer empowerment. But in order to do so, we need to lay down a solid foundation that will help us get there. That foundation is illustrated in the Edmunds Platform Pyramid below (Figure 1.0).

Near Future Milestones

In the near future, the goals are:

Technical Brand and Community Presence: The very first step is to get out there and create our technical brand through frequent technical blog posts, technical speaking engagements, community gatherings, hosted tech mini-conferences and an active presence on Twitter as an "Edmunds" network of techies. This blog is a manifestation of this goal :-)

Complete Documentation: This is the linchpin of all our efforts. In the end, the developer is our consumer and if the developer doesn't know how to use or find our platform, then we've failed.

Internal Open APIs: Getting our developers to use our APIs as if they were 3rd party APIs.

Public Open APIs: Building on the success of the Internal Open APIs, we open them to the public.

Far Future Milestones

Our future accomplishments should build on the ones mentioned above.

API Virtualization: Without changing our underlying API layer, we'll add a layer to customize the APIs for the devices and partnerships they're serving.

Edmunds Labs: A dedicated team of code and product evangelists and enthusiasts that works on bringing Edmunds and the technical community closer together through mutual collaboration.

Open Sourcing: Taking our tried and tested products and open sourcing them through Edmunds Labs.

Tech Partnerships: Through Edmunds Labs, foster relationships with software and hardware technical partners that will help us expand our business and product scope.

Edmunds OS: Build on top of an open source mobile platform (Android?) to deliver a web-based operating system for cars--the ultimate mobile devices.

What do you think? Is there value in such an approach? You can comment below or tweet your thoughts with #edmundsapi. Looking forward to hearing from you.

Monday, December 6, 2010

Scanning for Re-Targeting

Of late, there has been a lot of talk in the media about user re-targeting through unauthorized browser cookies that are unknown to the website serving the content. Ensuring our users' privacy and protecting them from being re-targeted have always been high on the Edmunds radar, and cookie testing has long been an integral part of our testing practices. Every aspect of a cookie is part of the test: its content, date, name and domain, and how and when it is set.

A few months ago, we revisited our test automation tools and libraries and realized we can very easily tweak them to comb through our website to scan for any unauthorized cookies. Some of the considerations we had in mind were:

1. We wanted the tool to be JavaScript enabled, as many of the cookies are set as a result of a JavaScript event or action.

2. Since we work with many ad agencies and vendors, our list of approved vendors and domains is dynamic. We wanted the tool to be able to respect a list of trusted domains.

It did not take us long to come up with a scanner based on Selenium RC, which we already use for web testing at Edmunds. We have been using the tool for periodic scanning, and we are happy that it has served the purpose well. We also presented it to the OPA several weeks back.

Now, that's not the only thing we are excited about. We have been working hard on cleaning up our code a little so that we can open-source the tool. Details of the source code and the project are coming soon. Stay tuned!
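The core check of such a scanner can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the Edmunds tool: the function names, the cookie shape ({ name, domain }) and the idea of passing in an already collected cookie list are all hypothetical. A real scanner would gather the cookies from a JavaScript-enabled browser driven by Selenium.

```javascript
// Hypothetical sketch: flag cookies whose domain is not on the trusted list.
// The cookie shape ({ name, domain }) and all names here are assumptions.
function isTrustedDomain(cookieDomain, trustedDomains) {
  // Cookie domains often carry a leading dot (".example.com"); strip it.
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  return trustedDomains.some(function (trusted) {
    const t = trusted.toLowerCase();
    // Match the trusted domain itself or any of its subdomains.
    return domain === t || domain.endsWith("." + t);
  });
}

function findUnauthorizedCookies(cookies, trustedDomains) {
  return cookies.filter(function (cookie) {
    return !isTrustedDomain(cookie.domain, trustedDomains);
  });
}
```

Because the trusted list is just an array argument, it can be reloaded before each scan, which fits a vendor list that changes often.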

Monday, November 22, 2010

How Edmunds Got in The Fast Lane

When we set out to redesign insideline.com back in late 2008, we set big goals for ourselves. Some of those goals included creating our very own Content Management System (CMS), Publishing System and Digital Assets Management (DAM) System and in-memory Distributed Data Grid. On the user-facing front, we had the following goals:

Better Performance: Our pages need to be and feel fast to our users (onLoad fires in < 1.5 sec)

Richer Content: We need to serve larger photos, more videos and interactive components

Better Revenue: Needless to say, we need to increase our revenue

The Challenge

The challenge we had with our front-end goals was that insideline.com generated revenue through ad impressions. Almost all web developers know that adding a 3rd-party component to a page, ads included, can degrade the user experience. See Figure 1.0.

Figure 1.0 Eternal struggle between delivering high performing pages and including 3rd-party components with the user experience hanging in the balance

We had to find a way to reconcile our performance metric with our need to sustain and grow our ad impressions. We knew ahead of time that it wasn't going to be easy: the struggle between achieving optimal page performance and including 3rd-party components on a page has been discussed frequently in the development community without any proposed solution.

The Process

Before we jumped to solutions, we started the process by taking inventory of all the 3rd-party components that we serve on our pages and by noting the specific formats in which we include them. This is what we found:

All 3rd-Party Components existed in either an iFrame or a JavaScript format

All 3rd-Party Components could be lazy-loaded, except for:

Components that use document.write (e.g. DoubleClick ads)

Components that depend on DOM events like onDOMReady or onLoad (e.g. Omniture, Brightcove)

Our immediate response to that was, "We're going to hack all 3rd-party components to make them all lazy-loadable." So we tried the following:

Overriding document.write: Cache the output of document.write and then place it on the page when you're ready.

iFrame'N'Copy: (A term we coined) Load the 3rd-party component in an empty page that lives on our domain within an iFrame and then copy the content of that page when it's done loading.

Both techniques were successful in certain situations and in certain browsers. However, they were not viable solutions that we would implement on any of our sites in Production.
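For readers unfamiliar with the first technique, here is a minimal sketch of overriding document.write. It is an illustration only, not the code we tried: the function name captureDocumentWrite and the release() method are hypothetical, and the `doc` parameter stands in for the browser's document so the idea can be shown without a page.

```javascript
// Hypothetical sketch of the "override document.write" idea.
// `doc` is any object with a write() method (the real `document` in a browser).
function captureDocumentWrite(doc) {
  const originalWrite = doc.write;
  const buffer = [];
  // While capturing, calls to doc.write() are buffered instead of rendered.
  doc.write = function () {
    // document.write accepts several string arguments; keep them all.
    buffer.push(Array.prototype.join.call(arguments, ""));
  };
  return {
    // Restore the original write() and return everything that was buffered.
    release: function () {
      doc.write = originalWrite;
      return buffer.join("");
    }
  };
}
```

In a browser, the buffered markup would then be placed into a container element once the page is ready. One known limitation of that step is that script elements inserted through innerHTML are not executed, which may be part of why an approach like this only works in some situations.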

Then, we had an epiphany that later helped us form an approach toward 3rd-party components and page performance. The revelation was:

Anything You Cannot Lazy-Load Should Be Treated as a Black Box.

This concept might seem trivial to some but it was crucial for us to move forward.

Our Approach

We decided that in order to deliver fast pages that serve 3rd-party components, we needed to follow three principles:

You Can't Control It All: Our attempt to force some of our 3rd-party components into the "lazy-load box" was a humbling experience

Speed Up What You Can: If we were going to treat things that are out of our control as black boxes, we needed to ensure that things we have control over are lightning fast

Defer Everything You Can't Control: If it's a black box, we'll make sure it is the last thing that is called on the page before it's done loading.

These principles saved us a lot of time and effort. They also helped us develop a JavaScript Loader, which works as follows:

A PAGESETUP object (lightweight static object) is created at the top of the page to act as a placeholder for everything that needs to be loaded on the page.

As the page gets parsed, each component gets to register the following with PAGESETUP:

File dependencies

Code snippet to be executed

Order of execution

The Loader Core logic is included at the bottom of the page and as soon as it's parsed and rendered, it does the following:

Go through PAGESETUP list of file dependencies and add them onto the page asynchronously

When dependencies are loaded, loop through the code snippets and execute them in order

One of the things that the loader was designed to do is to be independent from DOM events, especially onLoad and onDOMReady. This allows us to render the page to our user as soon as the browser parses and executes the Loader Core logic that is included at the bottom of the page. We will have a follow-up post on the JavaScript Loader with code snippets and such.
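Until that follow-up post, the flow described above can be sketched roughly as follows. This is not the actual Edmunds loader: createPageSetup, register, runLoader and the injected loadFile callback are hypothetical names chosen for illustration, and only the general shape (register while parsing, load files asynchronously, then run snippets in order) comes from the description above.

```javascript
// Hypothetical sketch of a PAGESETUP-style registry and loader core.
// All names (createPageSetup, register, runLoader, loadFile) are assumptions.
function createPageSetup() {
  return {
    files: [],     // unique file dependencies, in registration order
    snippets: [],  // { order, run } entries registered by components
    register: function (files, run, order) {
      for (const file of files) {
        if (this.files.indexOf(file) === -1) this.files.push(file);
      }
      this.snippets.push({ order: order, run: run });
    }
  };
}

// loadFile(url, done) must fetch one file and call done() when it is ready.
// In a browser it could append an async <script> element and wait for its
// load event; it is injected here so the core logic stays environment-free.
function runLoader(pageSetup, loadFile) {
  let pending = pageSetup.files.length;
  function executeSnippets() {
    pageSetup.snippets
      .slice()
      .sort(function (a, b) { return a.order - b.order; })
      .forEach(function (snippet) { snippet.run(); });
  }
  if (pending === 0) {
    executeSnippets();
    return;
  }
  pageSetup.files.forEach(function (file) {
    loadFile(file, function () {
      pending -= 1;
      // Run the snippets only after the last dependency has arrived.
      if (pending === 0) executeSnippets();
    });
  });
}
```

Nothing in this sketch waits for onLoad or onDOMReady: the snippets run as soon as the last file reports back, which is the property the loader was designed around.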

Figure 2.0 demonstrates the flow in which our JavaScript Loader handles 3rd-party components.

Figure 2.0 3rd-Party Component Handling in JavaScript Loader

The Results

We were able to successfully achieve our goals for insideline.com. The homepage load time went from 9 seconds on the old site to 1.4 seconds on the new site. With JavaScript ads on the page, the load time goes up to 1.6 seconds on average. Ads load faster on the site and we have noticed a considerable reduction in impression discrepancies, which in turn raised our ad revenue on the site by 3%.

Throughout the process, we were able to confirm and validate the results using Firebug and its extensions YSlow, Page Speed, in addition to WebPageTest. These tools have played an invaluable role in ensuring that we deliver the best user experience without compromising our revenue.

When we applied the same principles to our main site redesign (beta.edmunds.com), we were able to cut the loading time in half. We have also received initial results that indicate that our total page views per session have increased 17%. Obviously that cannot be totally attributed to page performance since we did a major redesign of the site as well, but we are confident that page performance helped bring that number up. Please note that right now, only 25% of our total traffic is redirected to the new beta site. A full launch will be taking place in a few months.

Also, our Page Speed score has jumped 20+ points; see Figure 3.0.

Figure 3.0 Page Speed Scores for both edmunds.com legacy and new beta site

In The Works

We are currently working on the second generation JavaScript Loader, which will streamline the process even further and ensure even more optimized performance. We are also going to open source the component for the community to use and build upon.

Together, we'll make the web faster one site at a time :-)

P.S. We're hiring :-)

Routing Traffic with JavaScript

As we prepared to launch our new beta site at Edmunds, we had to decide how to gradually move users over without permanently locking them in and without disrupting our current infrastructure. Our plan was to start small, redirecting just a few percent of users, and to ramp up as we gained confidence in the performance and stability of the new site.

We considered all the obvious options: a server-side-only approach, network-based solutions and a mix of server- and client-side logic. But we eventually settled on a pure JavaScript solution. It's probably the simplest of all the approaches we considered, and it allowed us to achieve our goals with the least amount of cost, effort and risk. In this post I'll discuss our routing logic, issues we ran into and some details you'll want to think about if you plan to implement something similar.

THE BASICS

Our core traffic routing logic is straightforward. When a user visits the legacy site (i.e. the current production site), they have an X percent chance of being allocated as a beta user. If they get lucky and end up in the pool of beta users, the script redirects them to the beta site using their current, legacy URL path and query string. The allocation decision (legacy or beta) is stored in a cookie and, on repeat visits, the script uses the stored value instead of re-allocating so the user stays pinned to the same site.
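As a rough illustration of that logic (not our production script), the allocation decision and the URL reuse might look like the sketch below. The function names, the "beta"/"legacy" cookie values and the injectable `random` argument are assumptions made for the example.

```javascript
// Hypothetical sketch of the allocation step. `storedSite` is the value read
// from the routing cookie (or null on a first visit); `random` is injectable
// so the decision can be tested. Names and values are assumptions.
function allocateSite(storedSite, betaPercent, random) {
  if (storedSite === "beta" || storedSite === "legacy") {
    return storedSite; // repeat visitor: stay pinned to the same site
  }
  const roll = (random || Math.random)() * 100; // value in [0, 100)
  return roll < betaPercent ? "beta" : "legacy";
}

// Build the beta URL by reusing the legacy path and query string.
function toBetaUrl(betaOrigin, path, query) {
  return betaOrigin + path + query;
}
```

In a browser, `path` and `query` would come from window.location.pathname and window.location.search, and the result of allocateSite would be written back to the routing cookie.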

URL TRANSLATION

It might seem like reusing the legacy URL path and query string to create a beta site URL would be too simple to handle the full range of URL translation between the old and new sites -- and indeed it is. It only works because the routing script doesn't have to handle anywhere near the full range of possible URLs. Redirection (and thus, URL translation) only happens from landing pages and the beta site needs to handle those URLs anyway. If it didn't, we would see a nasty decrease in traffic when we switch over to the new site. Most companies will likely have the same requirement to support well-known, landing page URLs, so this simple approach is probably enough in most cases.

On the subject of URL translation, keep in mind that a client side redirect will effectively wipe out your original referrer URL and make it appear as if all the traffic to the new site's landing pages is coming from the legacy site. This makes it difficult to compare incoming traffic. The easy solution is to capture the referrer URL on the legacy page and store it in a cookie before performing the redirect. Then provide a clear API so that tracking code on the destination page can retrieve and reset the value.
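A minimal sketch of such an API is shown below, assuming a small cookie wrapper. The `store` object with get, set and remove methods, the key name and both function names are hypothetical; the sketch also assumes both sites can read the same cookie, for example because it is set on a shared parent domain.

```javascript
// Hypothetical sketch: `store` stands in for a cookie wrapper with
// get(name), set(name, value) and remove(name). All names are assumptions.
const REFERRER_KEY = "originalReferrer";

// Called on the legacy page just before redirecting.
function saveReferrer(store, referrer) {
  if (referrer) store.set(REFERRER_KEY, referrer);
}

// Called by tracking code on the beta page: read the value once, then clear it.
function consumeReferrer(store) {
  const referrer = store.get(REFERRER_KEY) || null;
  store.remove(REFERRER_KEY);
  return referrer;
}
```

Clearing the value on read keeps a stale referrer from being reported on a later page view.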

RANDOM NUMBERS

Something that may be of interest to the more mathematically inclined is the quality of random number generation in JavaScript. Like many simple random functions, JavaScript's built-in Math.random() isn't really very random at all. You certainly wouldn't use it as part of a cryptographic algorithm, and we weren't even sure if it was random enough to hit a routing percentage with any degree of accuracy. We tested a few high quality JavaScript random number generators along with the built-in function and ultimately decided that Math.random() is "good enough" at the scale we're working at and with a lot less CPU overhead compared to the alternatives. Last I heard, we were achieving a routing percentage within 5 hundredths of a percent of our target.
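If you want to run a similar check yourself, a simple simulation along these lines will do. It is a hypothetical sketch rather than the test we ran: the function name and parameters are made up for the example, and the optional `random` argument lets you plug in an alternative generator for comparison.

```javascript
// Hypothetical check: how close does a generator get to a target percentage?
function measureAllocation(targetPercent, visitors, random) {
  const next = random || Math.random;
  let beta = 0;
  for (let i = 0; i < visitors; i += 1) {
    if (next() * 100 < targetPercent) beta += 1;
  }
  return (beta / visitors) * 100; // achieved percentage
}
```

Calling measureAllocation(5, 1000000) a few times gives a feel for how far the achieved percentage drifts from the 5 percent target at a given traffic volume.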

REDIRECTING WITH JAVASCRIPT

Initially, we performed the actual redirect by assigning a new URL to window.location.href. But we soon discovered that in Internet Explorer, href assignments aren't treated as actual redirects and the URL you are trying to redirect from is added to the browser's history. This is technically correct behavior but definitely not what we wanted. Other browsers apparently interpret changing the value of window.location.href during a page load as a redirect and they leave the original page out of the history.

The net effect of IE's behavior was that if a user clicked a link to our legacy site (say, from a search results page), was redirected to beta and then hit the back button, they would "return" to the legacy landing page. The landing page would then redirect them back to the beta site again, making for less than thrilled customers and slightly confused analysts. The solution is to use window.location.replace() instead of changing window.location.href. In all browsers we tested, this keeps the original URL out of the history and makes the back button work as expected.
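In code, the difference is a single call. The wrapper function below is hypothetical and exists only so the window object can be passed in; window.location.replace() itself is a standard browser API.

```javascript
// `win` is the browser's window object; passing it in keeps the function testable.
function redirectWithoutHistory(win, url) {
  // replace() swaps the current history entry instead of adding a new one,
  // so the back button skips the legacy landing page.
  win.location.replace(url);
}
```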

ONE SCRIPT TO RULE THEM ALL

So far I've only discussed how the script works when it runs on our legacy site. But we serve the exact same script from the beta site to keep all the logic and configuration in one place. The core logic is different for beta visitors (it doesn't redirect and it makes sure that new visitors to the beta site get allocated correctly), but there is quite a bit of shared code. For instance, we store a number of routing-related attributes in a single cookie and the code that manages and wraps that cookie in an object is used by both code paths.

Another shared feature is cookie versioning. Whenever the script executes, it checks a version number stored in the beta routing cookie against the version number configured in the script. If the two don't match, the script ignores the cookie and proceeds to re-allocate the user as if it were their first visit to the site. This allows us to reset the pool of beta users at any time and it provides an "escape hatch" if we want to shut down beta traffic altogether.
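The version check can be sketched as below. The cookie is shown as an already parsed object, and the field names, the constant and the function name are assumptions for illustration rather than our actual cookie format.

```javascript
// Hypothetical sketch of the version check. The cookie is represented as a
// parsed object such as { version: 3, site: "beta" }; names are assumptions.
const ROUTING_VERSION = 3;

function readAllocation(routingCookie) {
  if (!routingCookie || routingCookie.version !== ROUTING_VERSION) {
    return null; // missing or stale cookie: treat as a first visit
  }
  return routingCookie.site;
}
```

Bumping ROUTING_VERSION in the script invalidates every existing cookie at once, which is what makes the reset possible.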

The script also has features to skip processing on certain internal URLs (e.g. an explicit beta site opt-in page) and to ignore requests from specific referrers. Again, these are equally useful for both legacy and beta requests.

SITE-AWARE

Since we put all of the routing logic in a single script, it needs to know which site it's being served from to execute the correct logic. Our initial solution was to have the script look at the current URL's host name to figure out if it's on the beta or legacy site. This works fine in production, where host names are stable and predictable, but it fell down quickly in our more dynamic test environments. We kept the automatic configuration logic as a fallback, but added support for a global variable to override the automatically detected site. It's a little inelegant, but it means that a template author can guarantee that either the beta or legacy "version" of the routing script will execute whenever a specific template is served. And given that the routing script is only included in one template on each site, it doesn't introduce a lot of manual overhead.
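A sketch of that fallback order is shown below. The function name, the `override` parameter (standing in for the global variable) and the "beta." host-name prefix rule are all assumptions for the example; our real detection rules are not shown in this post.

```javascript
// Hypothetical sketch. `override` plays the role of the global variable a
// template author can set; the host-name rule is an assumed convention.
function detectSite(hostname, override) {
  if (override === "beta" || override === "legacy") {
    return override; // explicit setting wins over auto-detection
  }
  return hostname.indexOf("beta.") === 0 ? "beta" : "legacy";
}
```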

TESTABILITY

A big problem with something that's designed to act randomly is that it's, well, random -- even when it's working correctly. So we added a feature called "test actions" that allows you to request a specific, repeatable outcome by passing in a URL parameter. We added actions to force allocation to the beta or legacy site, to reset the routing cookie and to simulate manual beta site opt-in and opt-out. This has proven to be very handy for developers as well as testers.
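Reading a test action from the URL could look like the sketch below. The parameter name "routingTest" and the action names are hypothetical, chosen to mirror the actions listed above; URLSearchParams is a standard API in current browsers and Node.js.

```javascript
// Hypothetical sketch. The parameter name "routingTest" and the action names
// are assumptions; URLSearchParams is a standard browser and Node.js API.
const TEST_ACTIONS = ["forceBeta", "forceLegacy", "resetCookie", "optIn", "optOut"];

function getTestAction(queryString) {
  const action = new URLSearchParams(queryString).get("routingTest");
  // Ignore unknown values so a mistyped parameter never changes routing.
  return TEST_ACTIONS.indexOf(action) !== -1 ? action : null;
}
```

In a browser, the argument would be window.location.search, and the routing script would branch on the returned action before doing any random allocation.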

CONCLUSION

A purely client-side mechanism to route traffic doesn't sit well with everyone. Some feel that there isn't enough real-time control over the behavior and it just feels counterintuitive to others. However, after working through a few rough spots, our JavaScript router has turned out to be a simple and effective solution. And a nice side effect is that it's very easy to get rid of. When we switch over to the new site, we'll remove a single include from a single template and it will be gone without a trace.

If you have questions or suggestions, or if you've implemented a different kind of solution to the same problem, please tell us about it in the comments.

Thursday, November 11, 2010

Edmunds.com iPhone App Video

We launched our iPhone app back in early October. Since then, we have received great feedback from those of you who have used it. We listened, and yesterday we submitted an update to the App Store with many of the enhancements and improvements that you asked for.

Here's a video of how our app works. Go ahead and download it now; it is FREE.

Tuesday, November 9, 2010

There is No Fighting in The War Room

Over the past few months we have been working on a data warehouse project that will basically be a relaunching of the data warehouse on a new platform. As with any project, the end is a very stressful time. The team has no option but to finish and it often leads to long hours and late nights. As part of this project, we determined that it was best to pull the team into a war room.

What is a war room? And why are we talking about it on a data warehouse project?

Well, our war room doesn't have any of the cool things that you would imagine. We don't have a map with model tanks and soldiers and some kind of instrument to slide them around with. Also missing are faceless people behind computer screens with a beeping radar screen. Sadly, we have no giant map on the wall with beeping lights. In fact we don't have a single beeping graphic!

No, our war room is just a conference room that has a bunch of laptops, one desk phone and is surrounded by dry erase boards. However, that does not mean we are not doing some really exciting things.

The initial reaction to telling 10 people that they are going to be sitting in a room for 10 hours and forced to stare at either each other or their computers was unsurprisingly not good. People did not want to give up their desks and the quietness and privacy that come along with that. The first day people had a really hard time with being in the room and everyone had their headphones blasting to avoid any noise or distractions. It was like we had just pushed everyone's desks together. In fact, people were still chatting on IM instead of talking even though they were sitting no more than 4 feet from each other.

By day 2, I started noticing that people were talking more. Instead of being emailed, questions were asked as they popped up in development. Developers started collaborating and working with each other to solve problems. As the week went on, the headphones were mostly gone as people learned how to concentrate while there is noise. Questions were still being asked, and gone was the waiting and slowness of going back and forth through email. The team completed more work in a shorter time than they had during the whole project.

It's been a week and it has not all been perfect. People are on edge from being locked in a room all day with others. One of the things I try to do is ask that, if something only impacts two people, they go to the smaller conference room next door. One of the problems is that there are a lot of different personalities and naturally there will be friction. But, for the most part, what I've seen is a team that is more effective and productive.

If we are going to continue in the future with this model, I think it will be important to come up with rules for the war room. A few suggestions:

Before asking a question, think - Don't just blurt out questions as soon as they pop into your head. Try to solve the problem yourself for a minute first, and then ask away

If a conversation only involves you and one other person, kindly use a different area to have that conversation

More to come...

We are still in the war room and it's not perfect in here but so far things are working and we are getting our project done. I hope it continues and that we can continue to experiment with things like this in the future.

Updates to follow once the project is complete.

Monday, November 1, 2010

How to Make Useful Paper Prototypes

I have a confession to make. While I am a BIG supporter of Design Thinking and user centered product development, I was not a big fan of the practice of paper prototyping. Paper prototyping, the act of creating complex graphical user interfaces by handwritten means, was not something I thought to be real world and particularly useful. Take a look at the example below:

I made this. It's supposed to represent a web page that contains a scrollable grid of data points and allows you to filter data by checking off range parameters in the left side check boxes.

It took me about 5 minutes to make, which is the purpose of paper prototyping. You make a mock-up of the GUI fast, present it to the user, get some feedback and go back to the drawing board.

It's not very good. Putting it in front of a user would give you results with marginal significance. Thus, my original aversion to paper prototypes. I figured that for the amount of time that it takes me to create a useful graphical user interface (GUI) emulation by hand, I could just as well write out a reasonable GUI in HTML.

Right?

Wrong! I was being a Mr. Know-It-All.

Take a look at this:

The photo above is a paper prototype that our Visual Designers made. It's a paper prototype that is useful and relevant!

OK, it's not made by a hand putting ink to paper. But, so what? It suits the purpose of paper prototyping. The work represents a reasonable GUI that can be shown to a user for feedback and subsequent revision. The fact that the Visual Designers could make it quickly on paper, without having to utilize the services of an HTML-geek makes it very cost effective. I particularly like the way that the prototype emulates horizontal scrolling. Having the paper data grid move back and forth behind another paper overlay that contains the anchored GUI elements is very inventive.

Just goes to show you what a little thought and creativity can do. The designers analyzed the need at hand, assessed the skill sets and resources available and did what had to be done, efficiently and competently. Who could ask for more? When done correctly, paper prototypes rock!

BTW: here is one of the Visual Designers. Her name is Carolyn. Her work made me rethink my attitude toward paper prototypes. She rocks!

Thursday, October 28, 2010

Best Practices for Architecting High Volume, High Performance Publishing for Data Intensive Website

Greg Rokita, Director, Senior Technical Architect at Edmunds, gave a talk last weekend at SoCal Code Camp about the best practices of designing a publishing system for high volume sites.

In his talk, Greg discussed:

Challenges with Enterprise Data Publishing

Layered Approach with Open Source Projects

System Design

System Monitoring

The talk generated a lot of excitement about our publishing system. We would love to hear what you think!

Wednesday, October 27, 2010

Brown Bag Series: Paddy Does Coherence Part 2

Paddy Hannon, VP of Enterprise Software and Data Architecture, talks about Oracle Coherence in a series of Brown Bags at Edmunds, Inc.

In this video, Paddy covers the following:

- Coherence Data Distribution Models

Monday, October 25, 2010

Keeping Data Backward Compatible with Coherence POF

We have been using Oracle Coherence for a little over 2 years now. All things considered it has served us well, specifically in regard to keeping our data backward compatible. By "backward compatibility" I mean the ability to modify data points on existing domain objects without forcing an upgrade of applications that don't yet need the new data points.

Coherence has a technology called "POF" (Portable Object Format) which provides this capability. This article dives into only one of the many uses of POF: backward compatibility. Also, before you continue reading, please check out Paddy's video on Coherence if you aren't very familiar with what this technology does.

First off, objects stored in Coherence must be serializable in one of the following formats:

Java Serialization

ExternalizableLite

Portable Object Format (POF)

Objects must be serializable because Coherence is a "distributed" object caching system, meaning that even if one JVM creates the object, it will most likely end up being serialized across the network to another JVM responsible for caching it. Since our objects represent our "data", we need to make sure that we store them in an extensible manner so that data modifications can be made without causing a ripple effect of application deployments. Here at Edmunds we have more than 50 different deployable applications (WAR, standard Java app, etc.) that use various distributed caches, so we cannot afford to re-deploy all of our applications if we are simply refactoring objects.

Taking the above into consideration when choosing a serialization format: although Java serialization does allow classes to be "versioned" to some extent by using the "serialVersionUID" (following these guidelines), it is still slow and produces a rather large object. Coherence has two additional forms of object serialization that help with both performance and extensibility:

ExternalizableLite is an extension of Java Serialization with added basic compression.

POF (Portable Object Format), on the other hand, offers a much more sophisticated serialization format than either standard Java serialization or ExternalizableLite.
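To make the "rather large object" point about Java serialization concrete, here is a small sketch that uses only the JDK. It does not measure POF or ExternalizableLite. It simply compares standard Java serialization against writing the two field values by hand, which is roughly the idea the leaner formats build on. The class name SerializationSizeDemo and its nested Person are made up for this illustration and are not part of our code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSizeDemo {

    // Hypothetical value object, used only for this comparison.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Size in bytes when written with standard Java serialization,
    // which also writes a stream header and a class descriptor.
    static int javaSerializedSize(Person p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size in bytes when only the two field values are written by hand.
    static int handWrittenSize(Person p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(p.firstName);
            out.writeUTF(p.lastName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Person p = new Person("Ada", "Lovelace");
        System.out.println("Java serialization:  " + javaSerializedSize(p) + " bytes");
        System.out.println("Hand-written fields: " + handWrittenSize(p) + " bytes");
    }
}
```

The hand-written form carries no class metadata at all, which is also why it cannot evolve on its own. That is the gap a versioned format such as POF is meant to fill.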

While POF has lots of great features, the one we find most useful is that it allows object data to be "versioned" very easily. This has proven very useful for keeping our data backward compatible so that no application is broken if another is upgraded or updated.

Take the example class "Person" below:

import java.io.IOException;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.EvolvablePortableObject;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;

public class Person extends AbstractEvolvable implements
        EvolvablePortableObject {

    private String firstName;
    private String lastName;

    private static final int POF_DATA_VERSION = 1;

    private static final int FIRST_NAME_POF_INDEX = 1;
    private static final int LAST_NAME_POF_INDEX = 2;

    /**
     * {@inheritDoc}
     * Coherence uses this method before calling writeExternal().
     */
    public int getImplVersion() {
        return POF_DATA_VERSION;
    }

    public void readExternal(PofReader pofReader) throws IOException {
        firstName = pofReader.readString(FIRST_NAME_POF_INDEX);
        lastName = pofReader.readString(LAST_NAME_POF_INDEX);
        // Keep any fields written by a newer version of this class.
        this.setFutureData(pofReader.readRemainder());
    }

    public void writeExternal(PofWriter pofWriter) throws IOException {
        // POF_DATA_VERSION is written automatically by Coherence before
        // calling this method.

        pofWriter.writeString(FIRST_NAME_POF_INDEX, firstName);
        pofWriter.writeString(LAST_NAME_POF_INDEX, lastName);
        if (this.getFutureData() != null) {
            pofWriter.writeRemainder(this.getFutureData());
        }
    }
}

"Person" is using POF (POF_DATA_VERSION = 1) and has two properties: firstName and lastName. Let's assume that the serialized "Person" object is live in production and being used by several applications. Now let's say that gender becomes a critical property of the "Person" object and we need to add it. The class will now look like:

public class Person extends AbstractEvolvable implements
        EvolvablePortableObject {

    private String firstName;
    private String lastName;

    // New POF field
    private String gender;

    // Up the data version to 2
    private static final int POF_DATA_VERSION = 2;

    private static final int FIRST_NAME_POF_INDEX = 1;
    private static final int LAST_NAME_POF_INDEX = 2;

    // New POF index for gender.
    private static final int GENDER_POF_INDEX = 3;

    public int getImplVersion() {
        return POF_DATA_VERSION;
    }

    public void readExternal(PofReader pofReader) throws IOException {
        firstName = pofReader.readString(FIRST_NAME_POF_INDEX);
        lastName = pofReader.readString(LAST_NAME_POF_INDEX);

        // Only attempt to read gender if the data being read in via
        // PofReader was written at version 2 or later. Version 1 data
        // has no gender field.
        if (pofReader.getVersionId() >= 2) {
            gender = pofReader.readString(GENDER_POF_INDEX);
        }

        this.setFutureData(pofReader.readRemainder());
    }

    public void writeExternal(PofWriter pofWriter) throws IOException {

        pofWriter.writeString(FIRST_NAME_POF_INDEX, firstName);
        pofWriter.writeString(LAST_NAME_POF_INDEX, lastName);

        // No logic here as we are just adding a field and our data is
        // always only written to the cache in one place - the JMS
        // listener.
        pofWriter.writeString(GENDER_POF_INDEX, gender);

        if (this.getFutureData() != null) {
            pofWriter.writeRemainder(this.getFutureData());
        }
    }
}

The updates above are:

Add new "gender" field

Add new POF index for "gender"

Increment the data version

Add logic to the readExternal() method so that the new "gender" field is only read if you are working with data at or above the version that introduced the "gender" field

Note that we hardcoded the version (2) that is required to read in the "gender" field. We did this so that when future modifications are made it is easy to determine what version a particular data point is compatible with.

Another thing to point out is that you should always use a new, incremented POF index when adding fields to a "POF'able" object rather than reusing an existing one. There may be another version of that same object writing a totally different type of field at that index, which would cause ClassCastExceptions, etc.

Now both version 1 and 2 of the Person object can co-exist in production at the same time in our data grid and the applications that currently use version 1 will remain intact while we can create or update other applications to take advantage of the data available in version 2. In effect, versioning allows for peace of mind when it comes to deploying updated data objects or refactoring our data to accommodate new business requirements.
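To show why the two versions can co-exist, here is a toy sketch of the same idea using only the JDK. It is not Coherence code and the wire format (an int version followed by UTF strings) is an assumption made up for this example. The classes PersonV1 and PersonV2 are hypothetical stand-ins for an old and a new application. The point it illustrates is the "future data" round trip: the old reader keeps the bytes it does not understand and writes them back untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EvolvableRoundTripDemo {

    // Toy wire format (an assumption, not POF): int version, then UTF strings.

    /** What an application still on version 1 knows about a person. */
    static class PersonV1 {
        static final int IMPL_VERSION = 1;
        int dataVersion = IMPL_VERSION;
        String firstName;
        String lastName;
        byte[] futureData = new byte[0]; // bytes written by newer versions

        static PersonV1 read(byte[] wire) {
            try (DataInputStream in = open(wire)) {
                PersonV1 p = new PersonV1();
                p.dataVersion = in.readInt();
                p.firstName = in.readUTF();
                p.lastName = in.readUTF();
                p.futureData = rest(in); // keep what we do not understand
                return p;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        byte[] write() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(Math.max(dataVersion, IMPL_VERSION));
                out.writeUTF(firstName);
                out.writeUTF(lastName);
                out.write(futureData); // pass unknown fields through untouched
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }
    }

    /** Version 2 adds gender after the two original fields. */
    static class PersonV2 {
        static final int IMPL_VERSION = 2;
        int dataVersion = IMPL_VERSION;
        String firstName;
        String lastName;
        String gender; // stays null when the data was written at version 1
        byte[] futureData = new byte[0];

        static PersonV2 read(byte[] wire) {
            try (DataInputStream in = open(wire)) {
                PersonV2 p = new PersonV2();
                p.dataVersion = in.readInt();
                p.firstName = in.readUTF();
                p.lastName = in.readUTF();
                if (p.dataVersion >= 2) { // same guard as in readExternal()
                    p.gender = in.readUTF();
                }
                p.futureData = rest(in);
                return p;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        byte[] write() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(Math.max(dataVersion, IMPL_VERSION));
                out.writeUTF(firstName);
                out.writeUTF(lastName);
                out.writeUTF(gender == null ? "" : gender);
                out.write(futureData);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }
    }

    static DataInputStream open(byte[] wire) {
        return new DataInputStream(new ByteArrayInputStream(wire));
    }

    // Reads every byte that is left in the in-memory stream.
    static byte[] rest(DataInputStream in) throws IOException {
        byte[] remaining = new byte[in.available()];
        in.readFully(remaining);
        return remaining;
    }

    public static void main(String[] args) {
        PersonV2 written = new PersonV2();
        written.firstName = "Ada";
        written.lastName = "Byron";
        written.gender = "F";

        // An old application reads the entry, edits it and writes it back.
        PersonV1 old = PersonV1.read(written.write());
        old.lastName = "Lovelace";

        PersonV2 again = PersonV2.read(old.write());
        System.out.println(again.firstName + " " + again.lastName + " " + again.gender);
        // prints "Ada Lovelace F"
    }
}
```

The version 1 code never learns what "gender" is, yet the value survives its update. In the real class above, setFutureData() and writeRemainder() play the role that the futureData byte array plays in this sketch.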

Hopefully this article showcases one of the ways in which POF can be useful as your serialization format. I say this because there are many other reasons to consider POF as your serialization format other than backwards compatibility alone.

Friday, October 22, 2010

Evolution of Software Engineering

The actual coding of software is really just one small piece in the much larger fabric of delivering bona fide functional value to users. Here at Edmunds, on the eve of the full site launch of our redesigned site (two years in the making), we find ourselves in the new.

New infrastructure, new processes, new tools, new ways of working with each other and new ways of thinking about ourselves, i.e., how do we as technologists (or for that matter as individuals) provide optimal value in an organization that has organically, seemingly overnight, evolved to a place where nearly everything that has successfully brought us where we're at is being turned on its head? The roles that we've played and hung our collective hats on are, for the first time, being significantly redefined in the pursuit of innovation: to break out of the practice of doing relatively small, incremental improvements and into producing something truly different.

The general ethos of the Edmunds team has come to re-conceptualize "software development" to be the broader end-to-end process cycle of delivering value to the user: from idea creation (what to build & why) to production deployment (what, when, who and how often). What this holistic view ultimately means is that Software Development = Product Development. So as the organization starts a new chapter with the adoption of Design Thinking, it feels surprisingly comfortable alongside concepts established (or at least in the process of being established) such as Agile software development, DevOps, test-driven development (TDD), Scrum, and Kanban swarming. In fact, in a way, it feels like the natural evolution of these practices. It feels comfortable because, unbeknown to ourselves, we've evolved.

So I find myself asking, if software has no value until it's in the hands of real people (users), then what are the things that stand between the idea and effective instantiation of that idea? The answer to the question, and following the question to its natural conclusion (what should be done about it?), is a set of even newer infrastructure and newer process and newer tools. Moreover, it involves yet further redefinition of roles that have to be first broken before they are truly remade. Reevaluating the tasks and responsibilities of roles like "software engineer", "QA engineer" or "release manager" is just the start. A deeper look will naturally lead to questioning some of the principles that have historically been sacrosanct: the requisite checks & balances between Dev & QA, governance of what gets approved (or not) for a release, the concepts of what is a "release", the ceremony around actually pushing functionality to production. Many of these conventions exist to stabilize an inherently dysfunctional system. A system where developers hack out code with little regard for building quality into their code; a system inflexible and so prone to errors that it must be tightly and painfully gated to ensure adequate quality.

But what happens when an organization has sufficiently evolved beyond this?

Stay tuned and find out.

Thursday, October 21, 2010

How Edmunds Broadcasts to MSNBC

Edmunds is on the news a lot. People such as CEO Jeremy Anwyl, Senior Analyst Jessica Caldwell, Senior Consumer Advice Editor Philip Reed and the rest of our expert editorial crew get a lot of air time.

You'd think that these people were on airplanes all the time flying around the country to various studios. But, they're not.

Actually, the reality of the situation is a lot more interesting. Here at Edmunds we actually have a broadcast studio. It's not big. It's a room about 10 ft. by 10 ft. You could mistake it for someone's office. But, just about every time you see someone from Edmunds on TV, if he or she is not directly at a table with an interviewer, we're broadcasting from this little room. On your television screen it seems as if we have facilities on par with CNN.

So how does all this work?

We use a product named ReadyCam by VideoLink. Our ReadyCam installation is turnkey. We paid VideoLink to come in and set up the studio for us.

They put in the lighting, camera and earphone hook up. The complete broadcast is handled by VideoLink remotely. They manage our transmission, acting as a virtual cameraman, lighting and sound technician.

Say, MSNBC wants to do an interview with one of our Industry Analysts. MSNBC will coordinate with VideoLink and Edmunds to determine an air time. Prior to the broadcast one of our media technicians goes into the studio and turns on the room equipment. Once the equipment is powered up, VideoLink takes over.

Our analyst has to do nothing more than sit at the studio desk and be an expert. VideoLink hands off the video feed to MSNBC and acts as cameraman for the MSNBC director running the entire broadcast.

The analyst interacts with the broadcaster via the remote controlled earphone that's part of the ReadyCam package. If something goes wrong in the studio, VideoLink takes care of the problem.

Using ReadyCam, we're able to create broadcast quality content without having to incur the cost of staffing and maintaining a broadcast quality studio.

It's pretty cool.

Where to Put Our Data Grid?

I wrote a short post about our data grid topology on my personal blog a few days ago. However, I feel that the topic deserves more conversation, so I'm going to cover it in a bit more detail here.

When we first started out, we used an RDBMS as the data repository to power our site. We had a very classic 3-tier architecture with big servers sitting behind the site that powered a collection of relational databases. As part of our development cycle, we moved code from a team's development servers to a shared development integration environment. From there, we deployed to a QA environment, then to a Staging environment and finally to Production.

At each stage the server configurations start to look more and more like Production. I won't get into how painful the deployment process has been in the past because we have made large improvements in automating our deployment processes. The point I want to stress here is that each environment had a full stack. At any given time there were at least four central database server farms that needed to be kept up-to-date with the latest data. In addition, there were times when a schema change was making its way through the stack, thus making the task of keeping data fresh even more problematic.

With our new architecture, we have moved away from a relational database. Now we are leveraging a Coherence data grid and Solr search servers. However, we have kept our original topology of having a full stack per environment.

But I still question myself: Given a modular, service-orientated architecture, does keeping a full stack in each of our many environments make sense? My hypothesis is that a full stack does not make sense.

With some of our other systems we have moved away from having a full stack per environment. For example, our publishing system is deployed as a shared resource. All of our environments plug into this shared environment. The publishing system still supports Development, QA/TEST, and Production environments. Changes to the publishing system still go through dev and test environments prior to being deployed into production. However, that stack is only used for pushing changes to the publishing system, and it only interacts with pre-release environments to test actual changes to the publishing system. At any given point, all environments use the production version of the publishing system. My thinking is that we could take the same approach to our data storage systems. That is, both Solr and Coherence data services could be moved through a track where all environments would plug directly into the production version of the data service.

The advantage of working off of the production data service is that all environments would be in sync with data. Also, developers would be able to test changes more easily and ensure that their new code will work with what is in production. Such a deployment topology will allow us more visibility and control. We'll know the versions of our data services that are being used. Also, this new deployment topology will provide a streamlined mechanism for delivering changes to our data services. Since developers will be managing a shared resource with multiple clients, the data service developers will need to consider backwards compatibility while developing their code.

The disadvantage of using a deployment topology that works off of production data is that bad code in development can affect our production web site. This is a pretty big deal for Edmunds. Our entire revenue stream is derived from the web. Perhaps, what we need to do is use two production grids--one for internal/pre-Production use and one for the Production website?
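As a thought experiment, the two-grid idea can be sketched as a simple mapping from environment to grid. Everything here is hypothetical: the environment list follows the ones mentioned earlier in this post, and the grid names are placeholders, not real endpoints of ours.

```java
import java.util.EnumMap;
import java.util.Map;

public class GridTopologySketch {

    enum Environment { DEVELOPMENT, QA, STAGING, PRODUCTION }

    // Hypothetical grid names; the post does not name real endpoints.
    static final String INTERNAL_GRID = "internal-production-grid";
    static final String WEBSITE_GRID = "website-production-grid";

    // Every pre-Production environment shares one production-grade grid.
    // Only the live site talks to the grid that serves customer traffic.
    static Map<Environment, String> twoGridTopology() {
        Map<Environment, String> topology = new EnumMap<>(Environment.class);
        for (Environment env : Environment.values()) {
            topology.put(env, env == Environment.PRODUCTION ? WEBSITE_GRID : INTERNAL_GRID);
        }
        return topology;
    }

    public static void main(String[] args) {
        twoGridTopology().forEach((env, grid) -> System.out.println(env + " -> " + grid));
    }
}
```

Under this arrangement bad code in development can still hurt the shared internal grid, but it cannot reach the grid behind the Production website. The cost is keeping two grids in sync instead of one.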

What are your thoughts? Have any of you considered alternative deployment strategies for your data services? If so, what have you tried? I'd love to hear from anyone out there that has ideas, comments, suggestions.

Aloha,

Paddy Hannon

Edmunds at SoCal Code Camp

This Saturday and Sunday, October 23-24, some of us will be talking at SoCal Code Camp at USC.

On Saturday, Greg Rokita will be giving a talk on Best Practices for Architecting High Volume, High Performance Publishing for Data Intensive Web Site. During the same time slot, I will be giving a talk on Mitigating Advertisement Impact on Page Performance. On Sunday, Bob Reselman will give a talk on The Most Important Technology @ SoCal Code Camp.

We are looking forward to it. Stop by and say hello if you can. It would be great to hang out with fellow geeks.

We're also hiring!

Tuesday, October 19, 2010

Brown Bag Series: Paddy Does Coherence Part 1

Paddy Hannon, VP of Enterprise Software and Data Architecture, talks about Oracle Coherence in a series of Brown Bags at Edmunds, Inc.

In this video, Paddy covers the following:

- Coherence as a distributed hashmap
- Serialization methods in Coherence
- Coherence Portable Object Format, or POF.

Sunday, October 17, 2010

The Possibilities Ahead

A couple of books have been circulating around the office for the past few weeks. Both subject matters are directly related to our objective to continually promote and support a culture of innovation and creativity. The first book, Open Leadership by Charlene Li (Kindle|Print), discusses how to be a leader in a time where everyone is endlessly connected and where opinions and biases can make or break a business. The other book, Where Good Ideas Come From by Steven Johnson (Kindle|Print), presents a brief history of innovation through amusing anecdotes and provides a set of tools that help spot and nurture good ideas.

The two books might seem to cover unrelated subjects at first glance, yet they are connected by common themes, most notably the concepts of The Adjacent Possible and Liquid Networks. These two themes have also been covered by Malcolm Gladwell in both Blink (Kindle|Print) and Outliers (Kindle|Print). Just about any business book that talks about innovation and creativity covers these two themes to a certain extent.

Steven Johnson writes that, "what the adjacent possible tells us is that at any moment the world is capable of extraordinary change, but only certain changes can happen." At the heart of the Adjacent Possible theory is that good ideas (or anything for that matter) will come to exist and thrive if and only if all the elements needed for its survival already exist and the proper connections between them are made.

That's what Charlene Li does for us in her Open Leadership book. She identifies the elements needed and the connections required to accomplish open leadership. She provides the parts and we're responsible for building the machine. The same is true for "thin-slicing" that Malcolm Gladwell talks about in Blink. Our ability to make quick decisions based on very limited information is a direct result of many years of cerebral and emotional growth that couldn't have been possible had the right elements not been available or the proper connections not been made. Come to think of it, the Adjacent Possible is the basis of every evolution be it natural or man-made.

The other theme is Liquid Networks, which references environments that promote what Steven Johnson calls, "information spillover." He uses MIT's Building 20 and Microsoft's Building 99 as examples of fluidity in the office space. Charlene Li's call for open leadership inherently promotes liquid networks. In Malcolm Gladwell's Outliers, the environments in which his subjects thrived were all liquid networks.

At Edmunds, we realize that there is a finite set of permutations of space, talent, and process that, once implemented, could create a fertile environment where extraordinary ideas become a reality. We have been exploring the "edges of possibility" for a while now and I'd like to think that we've made considerable progress. We have used Agile rather successfully for over three years now, and we're currently experimenting with Design Thinking and seating arrangements to promote "information spillover" and hone our skills at identifying the parts needed to bring our ideas into the adjacent possible.

This very blog and our soon-to-be-released APIs, products and tools are also part of our effort to expand our circle of influence for a broader information spillover. We believe that the community will identify and build new parts that will make bigger and grander ideas enter the realm of adjacent possible.

Above all, we believe highly in our talent and we are always looking for passionate individuals to join our ranks.

Exciting times do lie ahead :-)