What is data integration?
Data integration unites data from many different sources and formats into a single, unified view, so it can be consistently accessed across the enterprise.
Achieving this goal on a large scale depends on the success of many smaller-scale operations. For example, answering a single query submitted by a business user may require harnessing data from dozens of sources—enterprise resource planning (ERP) tools, websites, customer-facing applications, and so on. A data integration platform can unite those data points into a cohesive result.
Similarly, modern data integration can help eliminate or significantly mitigate any issues that might arise from a diversity of data tools within teams across the enterprise. Such teams—perhaps even within the same unit of the business—may all be using different tools from different vendors. This, of course, has the potential to create plenty of redundancies, inefficiencies, and inaccuracies, which is why integration is so crucial.
4 common integration categories
Enterprises can approach data integration in a number of ways. First, there are general umbrella categories into which many individual integrations fall:
1. ETL
Many forms of data integration involve extract, transform, and load (ETL) operations, in which copies of data sets from multiple data sources are collected, harmonized, and loaded into a data warehouse or database.
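The extract-harmonize-load flow can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source records and field names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two source systems with different schemas.
crm_rows = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
erp_rows = [{"CustomerID": 2, "Name": "Alan Turing"}]

def transform(rows, id_key, name_key):
    """Harmonize a source's records into the warehouse's common shape."""
    return [(r[id_key], r[name_key]) for r in rows]

# Load: write the harmonized copies into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
for source, id_key, name_key in [(crm_rows, "cust_id", "full_name"),
                                 (erp_rows, "CustomerID", "Name")]:
    warehouse.executemany("INSERT INTO customers VALUES (?, ?)",
                          transform(source, id_key, name_key))

rows = warehouse.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada Lovelace'), (2, 'Alan Turing')]
```

Note that transformation happens *before* loading—the warehouse only ever sees data in the harmonized shape.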
2. ELT
With extract, load, and transform (ELT), data isn't copied but is instead moved intact and loaded into a repository built for big data—to eventually be transformed for use in big data analytics projects.
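The contrast with ETL is that raw data lands first and is shaped later. A minimal sketch, again using hypothetical records and SQLite as a stand-in repository:

```python
import json
import sqlite3

# Hypothetical raw events, loaded intact with no upfront transformation.
raw_events = ['{"user": "a", "amount": 10}', '{"user": "b", "amount": 5}']

lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE raw_events (payload TEXT)")
lake.executemany("INSERT INTO raw_events VALUES (?)",
                 [(e,) for e in raw_events])

# Transformation happens later, at query time, inside the repository.
lake.create_function("json_amount", 1, lambda p: json.loads(p)["amount"])
total = lake.execute("SELECT SUM(json_amount(payload)) FROM raw_events").fetchone()[0]
print(total)  # 15
```

Deferring the transform keeps ingestion simple and preserves the original payloads for future analytics uses that weren't anticipated at load time.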
3. Data virtualization
Data virtualization allows data engineers and scientists to present a unified view of data collected from disparate sources through a virtualized interface. Data virtualization interfaces typically let users present views based on different data models, rather than a single unifying model. Because data is retrieved from the source platforms each time a query is run, the approach can be resource-intensive and introduce latency.
4. Data propagation and data streaming
With data propagation, applications are used to make duplicates of data—usually from a single data source, but sometimes from multiple databases. The copies are then transferred to other databases. Data propagation techniques, including change data capture (CDC), are useful for certain real-time data transactions. Data replication is another type of propagation, often used for backup and disaster recovery. The replication mechanism doesn't perform integration per se, but it can be used to that end.
Meanwhile, as the name of the process implies, streaming data integration is necessary for anyone looking to unite many streams of real-time data into one continuous stream.
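The core idea behind change data capture—propagating only the rows that changed, rather than recopying everything—can be shown with a toy diff between two snapshots. The tables and values here are hypothetical; real CDC tools typically read a database's transaction log instead of comparing snapshots.

```python
# Two snapshots of a hypothetical source table, keyed by row ID.
source_before = {1: "alice@old.example", 2: "bob@example.com"}
source_after  = {1: "alice@new.example", 2: "bob@example.com",
                 3: "carol@example.com"}

def capture_changes(before, after):
    """Emit (key, value) pairs for rows that were inserted or updated."""
    return [(k, v) for k, v in after.items() if before.get(k) != v]

replica = dict(source_before)           # replica starts in sync with "before"
for key, value in capture_changes(source_before, source_after):
    replica[key] = value                # propagate only the deltas

print(replica == source_after)  # True
```

Shipping only the deltas is what makes propagation cheap enough to run continuously, which is why CDC underpins many near-real-time integrations.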
Data integration strategies
The most notable data integration strategies are as follows:
Manual integration
In a manual data integration, data engineers must manually track down pertinent data from every relevant source, accessing different interfaces as needed and combining the results to present all necessary data at once. This is "integration" only in the loosest sense, as it doesn't provide a literal unified view of the data.
The chief advantage of this strategy is that it's inexpensive, and may be viable for certain niche projects, or in small organizations. But for most enterprise-scale use cases, manual integration may be counterproductive. It's a time- and resource-intensive process that can be prone to data entry errors.
Application-based integration
This integration strategy, sometimes known as enterprise application integration (EAI), uses applications as the engine of its success. The applications locate and integrate data from a wide range of sources by communicating with one another through application programming interfaces (APIs).
Because it is largely automated, application-based integration is convenient for data teams. It can also be ideal for organizations that split their data resources between cloud and on-premises infrastructure through hybrid cloud deployments.
On the other hand, the process can be difficult to manage and execute for enterprises that have more than a small number of mission-critical on-premises or cloud apps. Latency and cost issues may arise when moving between different deployment options. Also, the automation in application-based integration requires careful planning of which sources will be integrated and how common keys and data types will be handled.
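The last point—agreeing on common keys up front—is where most of the planning effort goes. The sketch below joins responses from two stub functions standing in for application APIs; the function names and fields are hypothetical, and a real EAI setup would call HTTP endpoints instead.

```python
# Hypothetical stand-ins for two application APIs.
def crm_api_get_customers():
    return [{"customer_id": 7, "name": "Grace Hopper"}]

def billing_api_get_invoices():
    return [{"customer_id": 7, "total": 120.0}]

# Integrate by joining the responses on the agreed common key.
invoices = {i["customer_id"]: i for i in billing_api_get_invoices()}
unified = [
    {**c, "invoice_total": invoices.get(c["customer_id"], {}).get("total")}
    for c in crm_api_get_customers()
]
print(unified[0]["invoice_total"])  # 120.0
```

If the two applications disagree on what identifies a customer, no amount of automation can join their records correctly—hence the emphasis on planning common keys and data types.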
Uniform data access
This strategy allows for a unified view of data without extracting it from any source system, doing so via a model that "translates" data from different sources so that the data is uniform when presented.
The simplified access and reduced data storage needs are notable advantages of this strategy. But simultaneously accessing many sources can put a significant burden on the overall system. It also involves the complexity of dealing with several different technologies—all of which may have different data types, data structures, and query languages.
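A "translating" layer of this kind can be sketched as a generator that reads each source in its native format and yields rows in one common shape, without copying anything into a warehouse. The CSV and JSON sources here are hypothetical stand-ins for heterogeneous systems.

```python
import csv
import io
import json

# Hypothetical heterogeneous sources: one CSV export, one JSON feed.
csv_source = "id,name\n1,Ada\n"
json_source = '[{"id": 2, "name": "Alan"}]'

def unified_view():
    """Translate each source on demand into one common row shape."""
    for row in csv.DictReader(io.StringIO(csv_source)):
        yield {"id": int(row["id"]), "name": row["name"]}
    for row in json.loads(json_source):
        yield {"id": row["id"], "name": row["name"]}

names = sorted(r["name"] for r in unified_view())
print(names)  # ['Ada', 'Alan']
```

Every query re-reads the live sources, which is exactly the trade-off described above: no duplicated storage, but the source systems carry the query load.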
Middleware data integration
Software known as middleware—which serves as a layer between applications and the operating system they run on—facilitates this integration strategy. Middleware data integration can be quite useful for app development.
Unfortunately, it can be time-intensive and expensive to implement and maintain middleware. Development costs may add up quickly, and optimal use of this integration method requires the oversight of expert staff.
Hand-coded integration
Data teams that take this approach develop a proprietary integration solution by building it from scratch with structured query language (SQL), Python, or other languages commonly used in data engineering. A DIY integration like this gives engineers the most control, but also requires considerable time and effort to create and maintain.
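A hand-coded integration is often a small script like the one below, which deduplicates customer records from two hypothetical exports, keeping the most recently updated copy of each. The field names and merge rule are illustrative assumptions.

```python
# Hypothetical exports of the same customer from two systems.
export_a = [{"email": "ada@example.com", "city": "London",
             "updated": "2023-01-01"}]
export_b = [{"email": "ada@example.com", "city": "Leeds",
             "updated": "2023-06-01"}]

merged = {}
for record in export_a + export_b:
    key = record["email"]
    # Keep whichever copy was updated most recently (ISO dates sort lexically).
    if key not in merged or record["updated"] > merged[key]["updated"]:
        merged[key] = record

print(merged["ada@example.com"]["city"])  # Leeds
```

The control is total—every merge rule is yours to write—but so is the maintenance burden when source schemas change.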
Common data storage
With this method, data is copied from various source systems and placed in a data warehouse—as such, it's also called data warehousing or physical data integration. Common data storage is both efficient and ideal for analytics needs, but creating copies increases data storage needs, which can lead to rising costs. Organizations interested in this strategy will benefit from a data warehousing platform that isn't bound by the process's traditional limitations.
Data integration benefits and use cases
Integrating data can give significant advantages to enterprises. It simplifies data access and visibility for all business units. When teams and departments have more data at their disposal, they can use it more effectively, which boosts its inherent value.
Data integration also makes collaboration easier and more efficient, as the increased access helps keep siloing to a minimum, enabling self-service and data democratization. Last but not least, the process helps organizations maintain the accuracy and integrity of their data.
The value of data integration is perhaps best illustrated by its impact in specific settings.
Unify the healthcare ecosystem
Medical professionals deal with many different sources of data, ranging from electronic health records (EHRs) to various clinical data systems. Integration helps caregivers, healthcare facility staff, medical insurers, and patients stay on the same page.
Improve marketing and the customer experience
Customer data integration is one of the most common data integration use cases. Consolidating customer data and presenting it in a unified view creates a fuller, more accurate portrait of every buyer. As a result, enterprises can market the most relevant products to prospects and customers, and further improve their experience when interacting with customer service agents in an omnichannel environment.
Consolidate business and customer data
Integrating customer data with information from across the business's departments gives company leaders the ability to create a single source of truth about the state of the company. Customer sales data can be juxtaposed with KPIs regarding the supply chain or manufacturing units. This permits a more holistic approach to problem-solving. For example, a C-suite leader looking at integrated data can say, "These specific supply chain issues are causing this particular revenue loss. How do we fix this?"
Overcoming integration challenges with cloud analytics
Integrating data from legacy systems and newer data sources—e.g., data hosted on-premises vs. cloud-based data—can sometimes be difficult, with incompatibility risks in either scenario. Also, once a data integration solution is implemented, the data team can't set it and forget it, because inconsistent oversight can lead to data quality issues, compliance lapses, and other problems.
The most imposing challenge is the sheer volume of data that enterprise-scale organizations produce, process, and manage—a volume that will only continue to grow, making integration harder over time. Developing a strong data integration strategy—with buy-in from all relevant stakeholders—that takes these and other factors into account is critical. But it's equally important to have the right data integration tool backing up such an effort.
Teradata Vantage can be that tool. Its cloud data integration capabilities, cloud compatibility, and unparalleled analytics engines help you not only successfully integrate data, but also analyze and operationalize it for maximum bottom-line benefit.
To learn more, read Gartner's 2021 Magic Quadrant report, which names Teradata a leading vendor in the cloud database management system (DBMS) space.
Unlock more answers with data integration