The Data Hub Workflow
Data Hub implements a workflow that is common to all data integration use cases. It consists of three primary phases: Loading, Composition, and Publication. Data is first loaded and converted to raw items, which are then composed into a canonical view. Finally, the canonical items are transformed into target items and published to the target system. This process is represented in the diagram below.
Source → Load → Composition → Publication → Target Adapter
Spring Integration provides the primary channel for loading data from data sources. Loading can use virtually any TCP- or UDP-based protocol supported by the Spring inbound channel adapters. During the load phase, the data is converted to raw items ready for processing. This usually means splitting the rows, removing duplicates, and resolving the data into raw item fragments, expressed as key-value pairs, ready for composition.
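The load steps described above (splitting rows, removing duplicates, resolving key-value fragments) can be sketched in a few lines. This is an illustrative Python sketch only, not Data Hub or Spring Integration code. The function name `load_raw_items`, the `RawProduct` type tag, and the sample CSV data are all assumptions made for the example.

```python
import csv
import io


def load_raw_items(text, item_type):
    """Split delimited source text into de-duplicated raw item fragments."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    raw_items = []
    for row in reader:
        # A fragment is the row's key-value pairs plus a (hypothetical) type tag.
        fragment = {"type": item_type, **row}
        fingerprint = tuple(sorted(fragment.items()))
        if fingerprint in seen:
            continue  # drop exact duplicates
        seen.add(fingerprint)
        raw_items.append(fragment)
    return raw_items


# Sample input: the second data row duplicates the first.
source = "sku,name\nA1,Lamp\nA1,Lamp\nB2,Desk\n"
raw = load_raw_items(source, "RawProduct")
```

Here `raw` holds two fragments, one for `A1` and one for `B2`, because the repeated `A1` row is discarded before composition.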
Data Hub is designed to handle concurrent processing: raw data items are loaded into one or more parallel data feeds, and those feeds supply the data pools. In this way, incoming data can be segregated into macro groups for composition and processed more quickly. By default, Data Hub has a single default feed and a global data pool. New strategies for loading data via feeds into data pools can be configured by defining additional feeds and pools.
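The relationship between feeds and pools can be pictured with a small sketch. It assumes, purely for illustration, that each feed supplies exactly one pool. The feed and pool names, the `FEED_TO_POOL` mapping, and both functions are hypothetical and do not reflect how Data Hub itself is configured.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical feed-to-pool mapping; the names are illustrative only.
FEED_TO_POOL = {"DEFAULT_FEED": "GLOBAL", "PRODUCT_FEED": "PRODUCT_POOL"}


def load_feed(feed_name, raw_items):
    """Tag each raw item with the pool that its feed supplies."""
    pool = FEED_TO_POOL[feed_name]
    return [(pool, item) for item in raw_items]


def load_all(feeds):
    """Load several feeds in parallel and collect the items per pool."""
    pools = defaultdict(list)
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(load_feed, name, items)
            for name, items in feeds.items()
        ]
        # Results are read in submission order, so pool contents are stable.
        for future in futures:
            for pool, item in future.result():
                pools[pool].append(item)
    return dict(pools)


pools = load_all({
    "DEFAULT_FEED": [{"sku": "A1"}],
    "PRODUCT_FEED": [{"sku": "B2"}, {"sku": "C3"}],
})
```

Each pool then holds only the items that arrived through its own feed, which is what allows the groups to be composed separately.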
Because raw items enter the data feed as fragments rather than as a single, monolithic data block, errors do not interrupt the data load, and items can be retried multiple times once their errors are corrected. A simple RESTful GET request returns the entire history of all actions in a given Data Hub data pool. Together, these properties make Data Hub well suited as a staging platform for a master data management strategy.
The next phase, converting the data into canonical items, consists of two processes: grouping and composition. Both are controlled by handlers. Grouping handlers pull the raw imported items into coherent groups, while composition handlers apply the rules by which the canonical items are composed. New composition rules can be introduced by implementing custom handlers. Additionally, the effect of composition handlers can be altered simply by changing their order of execution.
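The following sketch illustrates the idea of grouping and composition handlers, and why handler order matters. It is written in Python for brevity and does not use Data Hub's handler interfaces. All function names and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def group_by_key(raw_items, key):
    """Grouping handler: collect raw items that share the same key value."""
    groups = defaultdict(list)
    for item in raw_items:
        groups[item[key]].append(item)
    return dict(groups)


def merge_attributes(canonical, items):
    """Composition handler: copy every attribute; later items win."""
    for item in items:
        canonical.update(item)


def uppercase_name(canonical, items):
    """Composition handler: upper-case the name if one is already composed."""
    if "name" in canonical:
        canonical["name"] = canonical["name"].upper()


def compose(groups, handlers):
    """Build one canonical item per group by running handlers in order."""
    result = {}
    for key, items in groups.items():
        canonical = {}
        for handler in handlers:
            handler(canonical, items)
        result[key] = canonical
    return result


raw_items = [
    {"sku": "A1", "name": "Lamp"},
    {"sku": "A1", "price": "10"},
    {"sku": "B2", "name": "Desk"},
]
groups = group_by_key(raw_items, "sku")
merged_then_upper = compose(groups, [merge_attributes, uppercase_name])
upper_then_merged = compose(groups, [uppercase_name, merge_attributes])
```

With the handlers in the first order, the composed name for `A1` is `LAMP`. In the reverse order, `uppercase_name` runs before any name has been merged, so the name remains `Lamp`.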
Canonical items represent a master data type view that is independent of the structure of both source and target systems. It is during this phase that the power of Data Hub as a data staging platform is seen. The canonical view provides a reference model, a standard or template that may be reused regardless of source or target system. Switching from one source or target to another need have no impact on the structure of this canonical data, which represents a data archetype.
It is also during this phase that data can be consolidated from multiple sources, the integrity of data checked, and any quality issues remedied. The entire history of the data transformation is stored, creating an archive that can be used for auditing purposes. Imported data is open to inspection at any phase in the Data Hub workflow, allowing complete transparency of the data processing, and error remediation prior to publishing to target systems.
From the canonical item view, data is transformed into target data items ready for export to the target system. Publication handlers transform or exclude items, or convert them into a list of other, possibly modified, canonical items. You may also choose to publish canonical items that have a DELETED status; if your target adapter supports this, the items are removed from your target system. As with other phases of the data transformation process, custom publication handlers may be written to transform the data from the canonical view to virtually any target structure. However, the changes made during publication are not persisted.
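A publication step of this kind might look like the sketch below. It is a Python illustration under stated assumptions, not Data Hub's publication API. The handler names, the target field names (`code`, `price`, `delete`), and the convention that a handler returns `None` to exclude an item are all hypothetical.

```python
import copy


def drop_unpriced(item):
    """Publication handler: exclude live items that have no price."""
    if item.get("status") != "DELETED" and "price" not in item:
        return None
    return item


def to_target(item):
    """Publication handler: rename fields to a hypothetical target layout."""
    return {
        "code": item["sku"],
        "price": item.get("price"),
        "delete": item.get("status") == "DELETED",
    }


def publish(canonical_items, handlers):
    """Run the handlers over copies, leaving the canonical items untouched."""
    target_items = []
    for canonical in canonical_items:
        item = copy.deepcopy(canonical)
        for handler in handlers:
            item = handler(item)
            if item is None:  # a handler excluded this item
                break
        if item is not None:
            target_items.append(item)
    return target_items


canonical_items = [
    {"sku": "A1", "price": "10"},
    {"sku": "B2"},
    {"sku": "C3", "status": "DELETED"},
]
targets = publish(canonical_items, [drop_unpriced, to_target])
```

In this sketch the unpriced item `B2` is excluded, the DELETED item `C3` is passed on with a delete marker for the target adapter to act on, and the canonical items themselves are left unchanged because the handlers only work on copies.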
Once the data has been processed into target items, outbound extensions or adapters then provide the means for delivering the data to target systems.