Hardly any integration project fails because of technical issues. When you analyze why a project failed, it is mostly the discussion and communication between the parties involved that caused the project to be terminated prematurely.
One of the most fiercely fought battles surrounds the ownership of core data. This discussion is less about the responsibility for important but very static reference information, such as postal codes or calendar information, and more about information that changes rapidly and is core to the business: client information, contracts,… The fight is very often not about the right place to store the information – it follows the rule that ownership brings power.
If I hold the ownership of the customer data, I own the customer, and the other departments have to come to me if they want their share of the information. I have participated in a number of discussions and design meetings that did not try to solve a technical problem but focused on company management issues, such as the right way to handle information and who owns the business processes. If the design meetings head in this direction, you are only a second away from the worst possible catastrophe for an integration project: political design paralysis and imminent project termination. The bad news: before the project is terminated, a lot of time and money is going to be spent and a lot of people will be working for the bin. And that is not the worst outcome: after the termination, the general search for a guilty party will always point to the integration team, the weakest link in the organization. The underlying cause of the problem, the failed discussion on data and functional ownership, will not be addressed.
I am not the only architect who has seen this. Every integration architect with a few years of experience can tell these war stories – you will not hear them at conferences or in product meetings with vendors, but in the evenings when they meet friends and former co-workers.
Coming home from these meetings, and knowing these discussions, every architect starts thinking and tries to find a technical solution for an organizational problem. And in this case there might even be one.
If the best place for the data is not in the applications, why not keep it in a place that is neutral to all parties: the integration infrastructure? As I have done my homework, I have a canonical representation of the data structure. I even have it down to the level where attributes are typed (which is very good practice if I want to avoid integration problems), so why should I not keep the data in canonical form in the integration layer?
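To make the idea of a typed canonical representation concrete, here is a minimal sketch in Python. The entity name `CanonicalCustomer` and all of its fields are assumptions made up for illustration; a real canonical model would come out of the organization's own design work.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical form of a customer (field names are assumptions)."""

    customer_id: str
    name: str
    postal_code: str
    customer_since: date

    def __post_init__(self) -> None:
        # Enforce the declared attribute types when the record is created,
        # so a badly typed value is rejected before it enters the integration layer.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


customer = CanonicalCustomer("C-1", "Acme Ltd", "10115", date(2020, 1, 1))
```

Passing the postal code as the integer `10115` instead of a string would raise a `TypeError` here, which is the kind of mismatch typed attributes are meant to catch early.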
Technically, the solution sounds pretty straightforward. For the core entities of the operation/company/firm you develop a number of core services. The different business applications can use these services to coordinate (create, update, retrieve…) their internal representations of the business entities with each other. And as these entities change, the central storage of the data is updated.
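The core services described above could look roughly like the following sketch. It is an in-memory illustration only: the class name `CustomerCoreService`, the `subscribe` mechanism, and the dictionary-based records are assumptions, not the API of any particular integration product.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Listener = Callable[[str, Record], None]


class CustomerCoreService:
    """Hypothetical core service that holds the central copy of one entity type."""

    def __init__(self) -> None:
        self._store: Dict[str, Record] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # A business application registers to hear about changes.
        self._listeners.append(listener)

    def create(self, key: str, record: Record) -> None:
        if key in self._store:
            raise KeyError(f"{key} already exists")
        self._store[key] = dict(record)
        self._notify(key)

    def update(self, key: str, changes: Record) -> None:
        # Raises KeyError if the entity is unknown to the central store.
        self._store[key].update(changes)
        self._notify(key)

    def retrieve(self, key: str) -> Record:
        # Return a copy so callers cannot modify the central record directly.
        return dict(self._store[key])

    def _notify(self, key: str) -> None:
        for listener in self._listeners:
            listener(key, self.retrieve(key))


# Usage: a CRM system creates the customer, a billing system follows the changes.
service = CustomerCoreService()
billing_view: Dict[str, Record] = {}
service.subscribe(lambda key, record: billing_view.update({key: record}))
service.create("C-1", {"name": "Acme Ltd", "city": "Berlin"})
service.update("C-1", {"city": "Hamburg"})
```

After the update, the billing application's internal view holds the same city as the central store, without the two applications ever talking to each other directly.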
Similar approaches are already in use for a different kind of data. Many organizations use central repositories for their master data, such as reference information, ZIP codes, etc. These stores became important as a technical answer to a technical issue: how to coordinate base information on a technical level. The most common example of centrally stored information is the well-known and well-established handling of user accounts in an LDAP server or Active Directory structure.
So, technically, storing the data in the infrastructure is not a problem. The necessary frameworks are either provided by the vendors of the integration frameworks (like Tibco) or can be built at a basic level by the integration team itself. Some service buses do not offer the level of guaranteed delivery that the integration architect or the business requires, and as a result the integration team has already built a level of storage into the bus to ensure that the data is delivered correctly. These solutions can be updated and expanded to accommodate the missing services you need for the integration approach.
So the need for a solution might be there, and the technology is there for sure. Where are the pitfalls of the solution, and why has it not been used everywhere yet?
The first reason is, as usual, cost. Building the central data repository creates another replica of data that already exists in the environment. This additional data storage produces costs, both in building the solution and in operating and modifying the structures. These costs can be quantified, whereas the savings are more hidden.
The first level of savings lies in the increased data consistency in the IT environment. This is an indirect saving for the operations manager, who can reduce the number of staff needed to maintain the existing data and receives fewer service calls to his departments.
A second level of savings lies in the application-independent storage of corporate information. This becomes important when the decision is made either to introduce new functionality and solutions or to replace existing solutions. In the first case, the centrally stored information is a good basis for populating the new systems; some enrichment processes will be required, but they can be planned in. When existing solutions are replaced, the central data storage is a perfect master copy of the core data and a basis for data cleansing on the way.
A second reason for the solution's lack of popularity is missing sponsorship. What caused the discussion in the first place, the fight over the ownership of the information, is also a problem for the solution. The owners of the established solutions in the corporate world have to become sponsors of a piece of functionality that is outside their domain. This obstacle can only be overcome if the lead of the integration team has enough resources at hand to build the solution outside the direct project planning. The described solution is therefore very successful in organizations in which the integration of applications is seen as a core function and is equipped with a project-independent budget. As part of a single project, this solution might get a less favorable reception. I can usually determine the ability of an integration team to implement this kind of solution by looking at the organization and the sponsorship of the canonical data model. I have found that organizations that handle their canonical model as a permanent central function of their integration are much better suited to build this solution.
The final obstacle for the implementation of the solution is the timeline of many integration projects. If the conception of the central storage becomes part of the critical path of the project, it is very likely to be rushed and built as a pure data cache. It then loses a lot of the features that make it a benefit for all involved parties:
– knowledge of the structure of the information is available during the integration design process: the connectivity and extension of the central storage remain the challenges for the work in the project
– additional synergies – e.g. the use of the central data storage for business intelligence processes – can only be introduced as part of independent projects as only these projects sponsor the maintenance and modification of the register.
To summarize: the central data repository can be a most useful tool in an organization that has very strong and independent systems which need to be integrated. It removes one of the main obstacles to the successful completion of integration projects, the discussion of data ownership, by introducing a central and independent place to handle the data. It is technically available and can be introduced into a service architecture. But it also establishes permanent costs, which need to be justified by benefits exceeding its use as a neutral broker.
A final word of wisdom to all the architects who read this: do not rely on a technical solution for a management problem. The solution described here might work in many cases, but more often you need to talk to the sponsoring manager to address the underlying problems.