Data Sharing in a Service-Oriented Environment
Enterprise software has become very complex, and SOA (Service-Oriented Architecture) is usually presented as the solution. Whether it is a good solution is debatable. As I said in a previous article (see Model-Driven SOA), I have my doubts. Solving the current problems in enterprise software requires at least a model-driven approach. Such an approach can bridge the gap between business and IT, and it can deliver the flexibility that fast-changing markets demand. However, I miss something else in most SOA stories. When reading them I usually ask myself: where is the data?
In a previous article (SOA defined in a formal way) I gathered some common principles that I found in existing definitions of SOA:
- IT assets are described and exposed as Services.
- Services mirror real-world business activities – comprising the enterprise (or inter-enterprise) business processes.
- Services have standardized contracts.
- Service contracts impose low consumer coupling requirements and are themselves decoupled from their surrounding environment.
- Services expose their functionality through an abstract interface.
- Services are reusable.
- Services can be composed into higher-level business processes.
- Services are supplemented with communicative metadata by which they can be effectively discovered and interpreted.
As you can see, all these principles focus on the services and their functionality. In principle that isn’t a bad thing: because of their nature, services are reusable, composable, scalable, and so on. However, where do all these services get their data from? What about stateful services? Few articles address these subjects, but that has recently changed.
Grid-Enabled Service-Oriented Architecture
David Chappell and David Berry have written a very nice article in The SOA Magazine. In their article, SOA – Ready for Primetime: The Next-Generation, Grid-Enabled Service-Oriented Architecture, they present an approach to solving the performance challenges of SOA. They pay a lot of attention to stateful services and data handling.
To achieve high availability and reliability for stateful services, they propose a middle-tier caching layer:
A critical part of a grid-enabled SOA environment is a middle-tier caching layer. This layer provides a JCache-compliant, in-memory, distributed data grid solution for state data that is being used by services in a service-oriented solution.
The middle-tier caching layer offloads the memory storage of a service instance to other machines across the grid. This effectively provides a distributed shared memory pool that can be linearly scaled across a heterogeneous grid of machines (which can include both high-powered, big-memory boxes, and other lower-cost commodity hardware).
In a grid-enabled, stateful service-oriented environment (one that makes use of this middle-tier caching layer), all the data objects that an application or service puts in the grid are automatically available to and accessible by all other applications and services in the grid, and none of those data objects will be lost in the event of a server failure. To support this, a group of constantly cooperating caching servers coordinate updates to shared data objects using cluster-wide concurrency control. (read more)
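To make the idea in the quote above more concrete, here is a minimal in-process sketch in Python. It is not the JCache API and not the product the authors describe. The class `DataGrid`, its methods, and the choice of one backup copy per entry are assumptions made purely for illustration. Each "node" is a plain dictionary, and a single lock stands in for cluster-wide concurrency control.

```python
import threading


class DataGrid:
    """Toy stand-in for a distributed middle-tier cache (illustrative only)."""

    def __init__(self, node_count=3, backups=1):
        if backups >= node_count:
            raise ValueError("need more nodes than backup copies")
        # Each "node" is just a dict; a real grid would span several machines.
        self._nodes = [dict() for _ in range(node_count)]
        self._alive = [True] * node_count
        self._backups = backups
        # One lock stands in for cluster-wide concurrency control.
        self._lock = threading.Lock()

    def owners(self, key):
        """Return the primary node index followed by the backup node indexes."""
        first = hash(key) % len(self._nodes)
        return [(first + i) % len(self._nodes) for i in range(self._backups + 1)]

    def put(self, key, value):
        # Write the entry to the primary and to every backup that is still alive.
        with self._lock:
            for index in self.owners(key):
                if self._alive[index]:
                    self._nodes[index][key] = value

    def get(self, key):
        # Read from the first live owner that holds the entry.
        with self._lock:
            for index in self.owners(key):
                if self._alive[index] and key in self._nodes[index]:
                    return self._nodes[index][key]
        raise KeyError(key)

    def fail_node(self, index):
        # Simulate a server crash: everything in that node's memory is lost.
        with self._lock:
            self._alive[index] = False
            self._nodes[index].clear()


if __name__ == "__main__":
    grid = DataGrid(node_count=3, backups=1)
    grid.put("order-42", {"status": "open"})      # written by one service
    grid.fail_node(grid.owners("order-42")[0])    # the primary node crashes
    print(grid.get("order-42"))                   # another service still reads it
```

The sketch leaves out what makes a real grid hard: network partitions, re-creating backups after a failure, and rebalancing when nodes join. It only shows why a backup copy on a second node lets services keep reading shared state when one server is lost.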
They also propose a nice solution for the way services make use of database persistence:
The whole point of in-memory access is to avoid the overhead of database persistence, which is known to become a bottleneck under peak processing loads. To address this issue, an SOA grid can make use of asynchronous write-behind queues, which eventually update to a database (see Figure 2). We stress the words "asynchronous" and "eventually" here because the operation of updating the database should not interfere or block the update of the grid that is holding the state data for the application or service. (read more)
David and David focus mostly on performance, but along the way they present a very nice approach to data handling within an SOA. In essence, they define a layer between data and services. An interesting question now is: what should be kept in mind when designing such a layer?
In search of sound theory on this subject, I stumbled upon a PhD thesis about Extended Enterprise integration approaches by Frank Goethals. Part of this thesis covers the B2B information sharing solution space. I think this theory can also be applied to the way services share data. Even when services act only within organizational borders, the theory still applies because of the nature of services. Services (or better, the components implementing one or more services) should have contractually specified interfaces and explicit context dependencies only (see SOA and Service Identification). Because of these properties there is (theoretically) no difference between services hosted inside or outside an organization.
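The write-behind idea in the quote above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the class `WriteBehindCache`, the `store_write` callable that persists one key/value pair, and the `flush` helper are all hypothetical names. The point it shows is that `put` updates the in-memory state and returns without waiting for the database, while a background thread drains the queue later.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory state with an asynchronous write-behind queue (illustrative only)."""

    def __init__(self, store_write):
        self._data = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        # Hypothetical callable that persists one (key, value) pair to the database.
        self._store_write = store_write
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def put(self, key, value):
        # Update memory and enqueue the write; no database call happens here.
        with self._lock:
            self._data[key] = value
            self._pending.put((key, value))

    def get(self, key):
        with self._lock:
            return self._data[key]

    def _drain(self):
        # Background loop: the database is updated "eventually".
        while True:
            key, value = self._pending.get()
            try:
                self._store_write(key, value)
            except Exception:
                # A real system would retry or log; this sketch drops the write.
                pass
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until every queued write has been handed to the store.
        self._pending.join()


if __name__ == "__main__":
    database = {}                                  # stand-in for a real database
    cache = WriteBehindCache(store_write=database.__setitem__)
    cache.put("order-42", "shipped")
    print(cache.get("order-42"))                   # readable immediately from memory
    cache.flush()                                  # wait for the write-behind queue
    print(database["order-42"])                    # now persisted as well
```

The trade-off is visible even in this toy version: between `put` and the background write, the database lags behind the in-memory state, so the cache layer itself has to be reliable enough to hold the only current copy.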
According to Goethals’ theory (which is based on sound theoretical and practical research), the following problems show up when trying to share data between services:
- Valuable information sharing practices have to be identified.
- Services have different viewpoints on objects.
- An appropriate data format has to be defined.
- Different parties have to make investments. This point is less relevant if the data sharing doesn’t cross organizational borders, but it can still matter when different departments (each with its own budget) implement their own services that have to share data.
- Services become dependent upon the service levels provided by the data sharing systems.
- Services must preserve the value of the functional proposition of the data sharing.
- Data ownership may not be well arranged.
- The involved parties change over time.
Goethals has defined a solution space for these problems, consisting of three dimensions:
- Decision control: who makes decisions about data.
- Data storage: where is the data stored and who stores it.
- Data transmission: how is the data transferred and who transfers it.
These three dimensions are visualized in Figure 1. Five possible solutions are also shown in this solution space. The two extremes are:
- Shared space: everything centralized.
- Point-to-Point exchanges: everything decentralized.
Figure 1 – The 3-dimensional information sharing solution space with 5 mapped solutions in it 
In the thesis each solution is evaluated against the eight problems presented above and against the other solutions.
Data sharing in a service-oriented environment doesn’t get enough attention. David Chappell and David Berry present a nice approach to this problem. However, data sharing isn’t trivial and has more than one solution. To choose a good data sharing solution, it is important to pay attention to some common problems and to evaluate the candidate solutions within a solution space. Frank Goethals presents such a solution space in his PhD thesis. I recommend his thesis to everyone struggling with a data sharing problem!
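The three dimensions and the two extremes can be written down as a small data structure, which is handy when comparing candidate designs. The Python sketch below is my own simplification, not Goethals' notation. It assumes each dimension takes only the two values centralized and decentralized, and the names `Locus`, `SharingSolution`, and `is_extreme` are hypothetical. Only the two extremes named above are encoded. The three intermediate solutions are left out because their coordinates are not given here.

```python
from dataclasses import dataclass
from enum import Enum


class Locus(Enum):
    """Assumed two-valued scale for each dimension of the solution space."""
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"


@dataclass(frozen=True)
class SharingSolution:
    name: str
    decision_control: Locus     # who makes decisions about data
    data_storage: Locus         # where the data is stored and who stores it
    data_transmission: Locus    # how the data is transferred and who transfers it

    def is_extreme(self):
        # True when all three dimensions take the same value.
        values = {self.decision_control, self.data_storage, self.data_transmission}
        return len(values) == 1


SHARED_SPACE = SharingSolution(
    "Shared space", Locus.CENTRALIZED, Locus.CENTRALIZED, Locus.CENTRALIZED
)
POINT_TO_POINT = SharingSolution(
    "Point-to-point exchanges",
    Locus.DECENTRALIZED, Locus.DECENTRALIZED, Locus.DECENTRALIZED,
)

if __name__ == "__main__":
    # A made-up mixed design, NOT one of the five solutions from the thesis.
    hypothetical_mix = SharingSolution(
        "Hypothetical mix", Locus.CENTRALIZED, Locus.DECENTRALIZED, Locus.DECENTRALIZED
    )
    for solution in (SHARED_SPACE, POINT_TO_POINT, hypothetical_mix):
        print(solution.name, solution.is_extreme())
```

Placing a concrete design in this structure forces an explicit answer to the three questions (who decides, who stores, who transmits) before any technology is chosen.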
 David Chappell and David Berry, SOA – Ready for Primetime: The Next-Generation, Grid-Enabled Service-Oriented Architecture. The SOA Magazine, 2007.
 Frank Goethals, Classifying and Assessing Extended Enterprise Integration Approaches. Katholieke Universiteit Leuven, December 2006.