SOA and Service Identification
Systems developed using the principles of a Service-Oriented Architecture can give an organization a decent improvement in flexibility or, if you like, agility. This increase in flexibility is obtained by reusing so-called services. Because the implementation of the orchestration is separated from that of the services, the resulting system can easily be changed by orchestrating the services in another way or by using other services in a given orchestration. In an ideal situation the services themselves don't need to change. The question now is: how can we identify these services in such a way that we get the flexibility we need without losing performance or introducing a new governance problem?
Component versus Service
In the introduction services and their orchestration were mentioned, but what are services really? In the world of SOA, talking about services mostly triggers two kinds of reactions. People with a ‘business background’ think about ‘Business Services’, while people with a more technical background mostly think about ‘Web Services’. To get a more unified view one should make a distinction between services and components.
A service delivers a certain business function. Both a single process step and an orchestration of more than one process step can be a service. A more formal definition is given by Grönroos: "A (series of) activities of more or less intangible nature that normally, but not necessarily, take place in interactions between the customer and service employees and/or physical resources or goods and/or systems of the service provider, which are provided as solutions to customer problems". Of course a ‘customer’ and a ‘service provider’ do not have to be different companies.
A component can be seen as an IT asset implementing one or more services. Theoretically a component is not linked to any environment. A more practical definition is given by Szyperski: "A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties". A business component is a software component that implements a certain set of services from a business domain.
Figure 1 visualizes the definitions for services and components and shows how they relate to each other.
Figure 1 – SOA Layers 
Issues in Service Identification
The identification of services can have a large effect on the resulting IT landscape. In fact most of the advantages a Service-Oriented Architecture can offer depend on what the system classifies as services. More specifically: the granularity of the services is very important in reaching flexibility and reuse of services. Some other issues are also related to this granularity; a couple of them are explained below. Note: I am talking about automated services here. Of course services exist which need human involvement; how to deal with those is covered in the previous post.
Flexibility: By employing a SOA one tries to establish a flexible IT landscape which is easily adaptable when changing business needs demand this. However, when all the needed functionality is defined in, for instance, three different services, not much differentiation is possible in orchestrations. By distinguishing a lot of small services, a lot of different orchestrations can be developed, which can also be reused as services. Roughly speaking: the higher the granularity, the higher the resulting flexibility.
Performance: Using orchestration and services can affect the performance of the total system. Nowadays BPEL (Business Process Execution Language) is mainly used for defining the orchestration, WSDL (Web Services Description Language) for defining the service interface and SOAP (Simple Object Access Protocol) for defining the messages. All these standards are XML based. This makes them ‘human-readable’, but also creates a lot of overhead for system-to-system communication. Imagine the difference in performance between a Java function call (which is compiled into byte code and runs in-process) and a service invocation that sends a SOAP message over HTTP. Conclusion: the higher the granularity, the more SOAP calls are needed, and more SOAP calls mean lower performance.
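The arithmetic behind this conclusion can be sketched in a few lines of Python. This is only an illustration: the function name and all latency figures below are assumptions chosen for the example, not measurements.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
WORK_MS = 50.0          # assumed time spent on the actual business logic
SOAP_CALL_MS = 20.0     # assumed overhead of one SOAP-over-HTTP round trip
LOCAL_CALL_MS = 0.001   # assumed overhead of one in-process function call


def orchestration_latency_ms(work_ms, calls, per_call_ms):
    """Latency of one orchestration: the work itself plus per-call overhead."""
    return work_ms + calls * per_call_ms


# The same work, exposed as 3 coarse services or as 30 fine-grained ones.
coarse = orchestration_latency_ms(WORK_MS, calls=3, per_call_ms=SOAP_CALL_MS)
fine = orchestration_latency_ms(WORK_MS, calls=30, per_call_ms=SOAP_CALL_MS)
in_process = orchestration_latency_ms(WORK_MS, calls=30, per_call_ms=LOCAL_CALL_MS)

print(f"coarse services: {coarse} ms")
print(f"fine services:   {fine} ms")
print(f"in-process:      {in_process:.2f} ms")
```

Under these assumed numbers the per-call overhead, not the business logic, dominates as soon as the services become small. Real figures depend heavily on the network, the message size and the XML stack in use.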
Reuse: The use of services, defined in a unified way, gives the opportunity to reuse these services in an easy way. A service directory can be created and different process orchestrations can reuse the same service. But the granularity of the services also affects the possibilities you have. Again, when specifying just a few services reuse is very hard or isn't possible at all. It is easy to see that using smaller services will give more opportunities for reuse.
Complexity: Implementing a SOA for a big enterprise can result in a lot of services. The governance of all these services is a big challenge. A service directory is needed with good search capabilities. Furthermore all services need to be specified in a clear, unified way. These metadata specifications are hot issues in the current market. But that's not all! Think about different versions of the same service due to further development, changing regulations, bug fixes, and so on. Services that are developed in different business units of a company and differ only slightly can also give a lot of trouble. When different business units use the same service, the question arises who is responsible for it. You can imagine that a higher granularity will push the complexity to a maximum.
Figure 2 summarizes the issues in service identification. Do not try to read relations into the distances and curves or search for a scientific foundation; the figure just attempts to clarify how the issues relate. The x-axis shows the granularity. From left to right the granularity increases, meaning that the services become smaller. The y-axis shows the four issues. Low and high complexity, flexibility and performance are straightforward. High reuse means that a large percentage of the services is used in more than one orchestration.
Figure 2 – Overview of issues in service identification
How can we find the optimal decomposition of our business needs into services? First some issues in component identification are explained. Subsequently a strategy is proposed for choosing the service landscape that best fits the enterprise architecture.
Issues in Component Identification
Which services and data should be in the same component? This question doesn't have one ‘right’ answer. When identifying the different components a lot of issues play a role as well. A couple of them are explained in detail.
Existing systems: When attempting to identify business components one always has to deal with existing systems. A green-field approach will (almost) never occur. This means that the existing IT landscape has to be analyzed to determine which components already exist and which services they deliver. These existing components can be best-of-breed or custom-made applications. The problem with these components is that they do not always fit exactly into the new service-oriented architecture. If they deliver more services than you need, these services should be disabled. However, not every application supports that. In some cases it is also difficult to make system-to-system connections with such applications, for example when you need data from an existing application for use in another component, or when you would like to force an existing application to use a data source you provide. This article will not delve deeper into this field, called Enterprise Application Integration, but be aware of the difficulties influencing the identification of business components. For more information on this subject, a good starting point is the book by Linthicum.
Performance: Which services and data are coupled in the same component can affect the performance of the whole system. In principle the following heuristic can be applied: "Choose the elements so that they are as independent as possible; that is, elements with low external complexity (low coupling) and high internal complexity (high cohesion)". This is not only a good heuristic for reducing the complexity of your system; placing strongly related elements in the same component also reduces the component-to-component communication needed. When components are deployed on different servers, or when communication protocols with some overhead are used, the performance advantages delivered by a good component identification process can be huge.
Maintainability: One of the most important issues in IT is maintainability. Maintainability can be defined as "The ease with which a software system or component can be modified or adapt to a changed environment". Using small, well-defined components that satisfy the heuristic stated above can help a lot in increasing the maintainability of a system. Big, monolithic components often lead to so-called spaghetti code, meaning that they are horrible to maintain for anyone other than the developers who built the component. This leads to the same conclusion as for the previous issue: a good component identification process can increase maintainability significantly.
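To make the heuristic concrete, here is a minimal Python sketch that scores a candidate grouping by its internal weight (cohesion) and external weight (coupling). The element names, the relationship strengths and the function name are all hypothetical and serve only as an illustration.

```python
# Hypothetical relationship strengths between elements (services and data objects).
WEIGHTS = {
    frozenset({"create_order", "Order"}): 5,
    frozenset({"check_order", "Order"}): 2,
    frozenset({"create_invoice", "Invoice"}): 5,
    frozenset({"create_invoice", "Order"}): 1,
}


def cohesion_and_coupling(components):
    """Return (internal weight, external weight) for a grouping into components."""
    # Map every element to the index of the component that contains it.
    owner = {element: index
             for index, group in enumerate(components)
             for element in group}
    internal = external = 0
    for pair, weight in WEIGHTS.items():
        a, b = tuple(pair)
        if owner[a] == owner[b]:
            internal += weight   # relationship stays inside one component
        else:
            external += weight   # relationship crosses a component boundary
    return internal, external


good = [{"create_order", "check_order", "Order"}, {"create_invoice", "Invoice"}]
poor = [{"create_order", "Invoice"}, {"check_order", "create_invoice", "Order"}]

print(cohesion_and_coupling(good))  # (12, 1)
print(cohesion_and_coupling(poor))  # (3, 10)
```

The first grouping keeps almost all relationship weight inside the components, so little component-to-component communication is needed. The second pushes most of it across the boundary.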
So component identification is important, as is service identification. Can this be achieved optimally? Or do we have some best practices?
Service identification can best be performed in a top-down manner. After defining the process architecture the needed services can be determined. The issues mentioned before (complexity, flexibility, reuse and performance) should be kept in mind. As we have seen, using small services gives us high flexibility and a high degree of reuse. On the other hand complexity increases, while performance decreases. A single optimal size for every service doesn't exist. The best approach is to determine the processes within the process architecture of the enterprise with which the enterprise differentiates itself from competitors. For example, accounting processes are mostly not the differentiators, but processes describing the handling of client support could be. The services used in these differentiating processes should be kept small to achieve high adaptability. The alignment of business and IT should be as high as possible for these processes. For other processes best-of-breed applications can be bought.
Business component identification can be done in a more analytical way. First, both the existing application architecture and the best-of-breed systems that are to be bought must be analyzed in a bottom-up approach. Once the existing components and the services they deliver are listed, the gaps between the services identified in the previous step and the existing services should be filled in. To do this a formal approach, the three-dimensional method for business component identification (BCI-3D), can be used to determine a (near-)optimal composition into components based on the heuristic already stated, "Choose the elements so that they are as independent as possible; that is, elements with low external complexity (low coupling) and high internal complexity (high cohesion)", in which ‘elements’ can be replaced by components. BCI-3D aims at grouping business tasks and their corresponding information objects into business components. So, based on the process architecture and the information architecture, services and data can be grouped into components. The basis of this method is the construction of a weighted graph containing all services from the process architecture and all data objects from the information architecture as nodes. The edges consist of service-service connections, service-data connections and data-data connections. Figure 3 shows an example. The nodes at the top of the graph represent the data objects, the nodes at the bottom represent the different process steps or services. The graph is constructed based on a process and information architecture defined in DEMO (Design & Engineering Methodology for Organisations). Details can be found in the work of Albani and Dietz listed in the references.
Figure 3 – Relevant relationships for the business component identification method (BCI-3D)
Each edge can be weighted based on the strength of the relationship between two nodes. For example, the relation between a service and an object it creates (and thus owns) is much stronger than the relation between a service and an object it merely uses. So the edge corresponding to the first relation gets a much bigger weight than the edge corresponding to the second relation. Once the graph is fully constructed, a starting solution (a decomposition of the graph) is generated using a greedy graph-partitioning algorithm. The Kernighan and Lin graph-partitioning algorithm then improves this starting solution. Because it is a heuristic, the result is a (near-)optimal rather than a guaranteed optimal decomposition of the given graph into subgraphs with high internal weight and low external weight: the components we are searching for.
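The two-step idea (greedy start, then improvement by swapping nodes) can be sketched in Python as follows. This is a deliberately simplified pairwise-swap local search in the spirit of Kernighan and Lin, not the full algorithm and not the BCI-3D implementation. It only splits the graph into two equally sized components, and all node names, edge weights and function names are hypothetical.

```python
from itertools import product

# Hypothetical weighted graph: "creates" relations (5) outweigh "uses" relations (1-2).
EDGES = {
    ("register_claim", "Claim"): 5,
    ("assess_claim", "Claim"): 2,
    ("register_claim", "assess_claim"): 3,
    ("pay_claim", "Payment"): 5,
    ("approve_payment", "Payment"): 2,
    ("pay_claim", "approve_payment"): 3,
    ("pay_claim", "Claim"): 1,
}
NODES = sorted({node for edge in EDGES for node in edge})


def weight(a, b):
    """Weight of the undirected edge between a and b (0 if there is none)."""
    return EDGES.get((a, b), 0) + EDGES.get((b, a), 0)


def cut_weight(part_a, part_b):
    """Total weight of the edges that cross the component boundary."""
    return sum(weight(a, b) for a, b in product(part_a, part_b))


def greedy_start(nodes, seed):
    """Grow one component from a seed, always adding the most strongly connected node."""
    part_a, rest = {seed}, set(nodes) - {seed}
    while len(part_a) < len(nodes) // 2:
        best = max(sorted(rest), key=lambda n: sum(weight(n, m) for m in part_a))
        part_a.add(best)
        rest.remove(best)
    return part_a, rest


def refine(part_a, part_b):
    """Repeatedly apply the single swap that lowers the cut weight most."""
    part_a, part_b = set(part_a), set(part_b)
    while True:
        current = cut_weight(part_a, part_b)
        best_gain, best_pair = 0, None
        for a, b in product(sorted(part_a), sorted(part_b)):
            gain = current - cut_weight((part_a - {a}) | {b}, (part_b - {b}) | {a})
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:          # no swap improves the cut: local optimum
            return part_a, part_b
        a, b = best_pair
        part_a, part_b = (part_a - {a}) | {b}, (part_b - {b}) | {a}


start_a, start_b = greedy_start(NODES, seed="Claim")
final_a, final_b = refine(start_a, start_b)
print(sorted(final_a), sorted(final_b), cut_weight(final_a, final_b))
```

On this small example the greedy start already groups the claim-related services with the `Claim` object and the payment-related services with the `Payment` object, leaving a single weak edge between the two components. Starting `refine` from a poor split such as `{"Claim", "Payment", "register_claim"}` versus the rest reaches the same grouping after one swap. The full Kernighan and Lin procedure differs in that it evaluates whole sequences of tentative swaps, which lets it escape some local optima this sketch would get stuck in.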
Building an IT landscape for an enterprise using a SOA isn't trivial. The identification of services and business components affects many of the issues discussed above. The list of issues mentioned is not exhaustive. If you know other important issues or have experience implementing a SOA (especially with regard to the identification of services and components), please react!
 Grönroos, C. Service Management and Marketing. A customer relationship management approach. 2nd edition, Chichester: Wiley, 2001.
 Szyperski, Clemens. Component Software – beyond Object-Oriented Programming, Addison Wesley, 2002.
 Tilak Mitra, Business-driven Development. 09-12-2005. http://www-128.ibm.com/developerworks/webservices/library/ws-bdd/index.html#figure3
 D. Linthicum. Enterprise Application Integration. Addison-Wesley, 2000.
 Matinlassi Mari, Niemelä Eila. The Impact of Maintainability on Component-based Software Systems. Proceedings of the 29th EUROMICRO Conference "New Waves in System Architecture" (EUROMICRO'03), 2003.
 Antonia Albani, Jan L. G. Dietz. Enterprise Ontology based design of Inter-Enterprise Information Systems. 2006.
 Jungnickel, D. "The Greedy Algorithm," in: Graphs, Networks and Algorithms, D. Jungnickel (ed.), Springer, Berlin, 2005, pp. 123-146.
 Kernighan, B.W., and Lin, S. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal (49) 1970, pp 291-307.