Thursday, October 21, 2010

Where to Put Our Data Grid?

I wrote a short post about our data grid topology on my personal blog a few days ago. However, I feel that the topic deserves more conversation, so I'm going to cover it in a bit more detail here.


When we first started out, we used an RDBMS as the data repository to power our site. We had a classic 3-tier architecture with big servers sitting behind the site that powered a collection of relational databases. As part of our development cycle, we moved code from a team's development servers to a shared development integration environment. From there, we deployed to a QA environment, then to a Staging environment and finally to Production.


At each stage, the server configurations looked more and more like Production. I won't get into how painful the deployment process has been in the past, because we have since made large improvements in automating it. The point I want to stress here is that each environment had a full stack. At any given time there were at least four central database server farms that needed to be kept up-to-date with the latest data. In addition, there were times when a schema change was making its way through the stack, which made the task of keeping data fresh even more problematic.


With our new architecture, we have moved away from a relational database. Now we are leveraging a Coherence data grid and Solr search servers. However, we have kept our original topology of having a full stack per environment.


But I still ask myself: Given a modular, service-oriented architecture, does keeping a full stack in each of our many environments make sense? My hypothesis is that a full stack does not make sense.


With some of our other systems we have moved away from having a full stack per environment. For example, our publishing system is deployed as a shared resource. All of our environments plug into this shared environment. The publishing system still supports Development, QA/TEST, and Production environments. Changes to the publishing system still go through dev and test environments before being deployed into production; however, that stack is used only for pushing changes to the publishing system, and it interacts with pre-release environments only to test actual changes to the publishing system. At any given point, all environments use the production version of the publishing system. My thinking is that we could take the same approach to our data storage systems. That is, both the Solr and Coherence data services could be moved through their own release track, while all environments would plug directly into the production version of the data service.


The advantage of working off of the production data service is that all environments would be in sync with data. Also, developers would be able to test changes more easily and ensure that their new code will work with what is in production. Such a deployment topology will allow us more visibility and control. We'll know the versions of our data services that are being used. Also, this new deployment topology will provide a streamlined mechanism for delivering changes to our data services. Since developers will be managing a shared resource with multiple clients, the data service developers will need to consider backwards compatibility while developing their code.
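To make the backwards-compatibility point concrete, here is a minimal sketch of a client-side reader that accepts both an older and a newer record shape from a shared data service. The record type, field names, and the idea that a "trim" field was added later are all assumptions made up for illustration; they are not taken from our actual Coherence or Solr schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VehicleRecord:
    # Hypothetical record shape; field names are illustrative only.
    make: str
    model: str
    # Assumed to be added in a later (hypothetical) version of the service.
    trim: Optional[str] = None


def parse_vehicle(payload: dict) -> VehicleRecord:
    """Build a record from either the older or the newer payload shape."""
    return VehicleRecord(
        make=payload["make"],
        model=payload["model"],
        # Older payloads lack "trim", so fall back to None instead of failing.
        trim=payload.get("trim"),
    )
```

With something along these lines, a client built against the newer service keeps working when it reads a record written by the older one, which is the property every environment sharing one data service would depend on.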


The disadvantage of using a deployment topology that works off of production data is that bad code in development can affect our production web site. This is a pretty big deal for Edmunds, because our entire revenue stream is derived from the web. Perhaps what we need to do is use two production grids--one for internal/pre-Production use and one for the Production website?
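The two-grid idea could be sketched roughly as a lookup from environment to grid. The environment names come from this post; the grid labels and host names below are hypothetical placeholders, and this is only an illustration of the routing rule, not a description of anything we run today.

```python
# Hypothetical host names for the two production-grade grids.
GRID_HOSTS = {
    "internal": "grid-internal.example.com",
    "production": "grid-prod.example.com",
}

# Pre-release environments named in this post, all sharing the internal grid.
PRE_PRODUCTION = {"development", "integration", "qa", "staging"}


def grid_for(environment: str) -> str:
    """Return the grid host an environment should plug into."""
    env = environment.lower()
    if env == "production":
        return GRID_HOSTS["production"]
    if env in PRE_PRODUCTION:
        return GRID_HOSTS["internal"]
    # Fail loudly so a misnamed environment never lands on a grid by accident.
    raise ValueError(f"unknown environment: {environment!r}")
```

The point of keeping the rule this small is that only the Production website can ever reach the Production grid, so bad code in a pre-release environment stays on the internal grid.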


What are your thoughts? Have any of you considered alternative deployment strategies for your data services? If so, what have you tried? I'd love to hear from anyone out there who has ideas, comments, or suggestions.


Aloha,


Paddy Hannon





