Marketing and Business Development

Robert Eve

Subscribe to Robert Eve: eMailAlertsEmail Alerts
Get Robert Eve: homepageHomepage mobileMobile rssRSS facebookFacebook twitterTwitter linkedinLinkedIn

Related Topics: Virtualization Magazine, SOA & WOA Magazine

Virtualization: Article

Ten Mistakes to Avoid When Virtualizing Data

Meeting the ever-changing information needs of today's enterprises

Mistake #3 - Missing the Hybrid Opportunity
In many cases, the best data integration solution is a combination of virtual and physical approaches. There is no reason to be locked into one way or the other. Figure 2 illustrates hybrid use cases, followed by a description of some examples.

  • Physical Data Warehouse and/or Data Mart Schema Extension: This is a way to extend existing schemas, such as adding current operations data to historical repositories.
  • Physical Warehouses, Marts and/or Stores Federation: This is a way to federate multiple physical consolidated sources, such as two or more sales data marts after a merger.
  • Data Warehouse and/or Data Mart Prototyping: This is a way to prototype new warehouses or marts, to accelerate an early stage leading into a larger BI initiative.
  • Data Warehouse and/or Data Mart Source Data Access: This is a way to provide a warehouse or mart with virtual access to source data, such as XML or packaged applications that may not be easily supported by the current ETL tool, or to integrate readily available, already federated views.
  • Data Mart Elimination: This is a way to eliminate or replace physical marts with virtual ones, such as stopping rogue data mart proliferation by providing an easier, more cost-effective virtual option.

Mistake #4 - Assuming Perfect Data Is Prerequisite
Poor data quality is a pervasive problem in enterprises today. While correcting and perfecting source data is the ultimate goal, we may end up leaving our source data alone while settling for cleaning up the data in a warehouse or mart during the consolidation and transformation phases of physical data consolidation.

When data quality issues are simple format discrepancies that reflect implementation details in various systems, data virtualization solutions easily resolve these types of common data discrepancies with zero impact on performance. Some examples include a Part_id field in one source system that reads VARCHAR, while a similar field in another source has INTEGER. Or, Sales_Regions in one system does not match Field_Territories in another. If "heavy-lifting" cleanups are required, integrating with specialized data quality solutions at runtime often meets the business needs, while opening up the opportunity for data virtualization.

Mistake #5 - Anticipating Negative Impact on Operational Systems
Although operational systems are often one of the primary data sources used when virtualizing data, the runtime performance of these systems is not typically impacted as a result. Yet, designers have been schooled to think about data volumes in terms of the size of the physical store and the throughput of the nightly ETLs. When using a virtual approach, designers should instead consider the amount of data that end solutions will actually query on any individual query, and how often these queries will run. If the queries are relatively small (for example, 10,000 rows) and broad (across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operation systems will be light.

System designers and architects anticipating negative impact on operational systems are typically underestimating the speed of the latest data virtualization solutions. Certainly, Moore's Law has accelerated hardware and networks. In addition, 64-bit JVMs, high-performance query optimization algorithms, push-down techniques, caching, clustering and more have advanced the software side of the solution as well.

Taking the time to calculate required data loads helps avoid misjudging the potential impact on the operational systems. One best practice for predicting actual performance impact is to test-drive several of the biggest queries using a data virtualization tool of choice.

Mistake #6 - Failing to Simplify the Problem
While the enterprise data environment is understandably complex, it is usually unnecessary to develop complex data virtualization solutions. The most successful data virtualization projects are broken into smaller components, each addressing pieces of the overall need. This simplification can occur in two ways: by leveraging tools and by right-sizing integration components.

Data virtualization tools help address three fundamental challenges of data integration:

  1. Data Location: Data resides in multiple locations and sources.
  2. Data Structure: Data isn't always in the required form.
  3. Data Completeness: Data frequently needs to be combined with other data to have meaning.

Data virtualization middleware simplifies the location challenge by making all data appear as if it is available from one place, rather than where it is actually stored.

Data abstraction simplifies data complexity by transforming data from its native structure and syntax into reusable views and Web services that are easy for business solutions' developers to understand and business solutions to consume.

Data federation combines data to form more meaningful business information, producing a single view of a customer or a get inventory balances composite service, as examples. Data can be federated from both consolidated stores such as the enterprise data warehouse as well as original sources such as transaction systems.

Successful right-sizing of data integration components requires smart decomposition of requirements. Virtualized views or services built using data virtualization work best when aimed at serving focused needs. These can then be leveraged across multiple use cases and/or combined to support more complex needs.

A recently published book by a team of experts from five technology vendors including Composite Software, An Implementor's Guide to Service Oriented Architecture - Getting It Right, identifies three levels of virtualized data services that allow designers and architects to design smaller, more manageable data integration components as follows:

  • Physical Services: Physical services lie just above the data source, and they transform the data into a form that is easily consumed by higher-level services.
  • Business Services: Business services embody the bulk of the transformation logic that converts data from its physical form into its required business form.
  • Application Services: Application services leverage business services to provide data optimally to the consuming applications.

In this way, solution developers can draw from these simpler, focused data services (relational views work similarly), significantly simplifying their development efforts today, and providing greater reuse and agility tomorrow.

Mistake #7 - Treating SQL/Relational and XML/Hierarchical as Separate Silos
Historically, data integration has focused on supporting business intelligence applications needs, whereas process integration focused on optimizing business processes. These two divergent approaches led to different architectures, tools, middleware, methods, teams and more. However, because today's data virtualization middleware is equally adept at relational and hierarchical data, it is a mistake to silo these key data forms.

This is especially important in cases where a mix of SQL and XML is required; for example, when combining XML data from an outside payroll processor with relational data from an internal sales force automation system to serve XML data within a single view of a sales rep performance portal.

Not only will a unified approach lead to better solutions regardless of data type, but developers and designers will gain experience outside their traditional core areas of expertise.

Mistake #8 - Implementing Data Virtualization Using the Wrong Infrastructure
The loose coupling of data services in a services-oriented architecture (SOA) environment is an excellent use for data virtualization. As a result, SOA is one of data virtualization's most frequent use cases. However, there is sometimes confusion about when to deploy enterprise service bus (ESB) middleware and when to use information servers to design and run the data services typically required.

ESBs are excellent for mediating various transactional and data services. However, they are not designed to support heavy-duty data functions such as high-performance queries, complex federations, XML/SQL transformations, and so forth as required in many of today's enterprise application use cases. On the other hand, data virtualization tools provide an easy-to-use, high-productivity data service development environment and a high-performance, high-reliability runtime information server to meet both design and runtime needs. ESBs can then mediate these services as needed.

Mistake #9 - Segregating Data Virtualization People and Processes
As physical data consolidation technology and approaches have matured, supporting organizations in the form of Integration Competency Centers (ICC) along with best practice methods and processes have grown in support. These centers improve developer productivity, optimize tool usage, reduce project risk, and more. In fact, 10 specific benefits are identified in a book written by two experts at Informatica, Integration Competency Center: An Implementation Methodology.

It would be a mistake to assume that these ICCs, which have evolved from support of physical data consolidation approaches and middleware, can not or should not also be leveraged in support of data virtualization. By embracing data virtualization, ICCs can compound the technology value of data virtualization with complementary people and process resources.

Mistake #10 - Failing to Identify and Communicate Benefits
While data virtualization can accelerate new development, perform quicker change iterations, and reduce both development and operating costs, it's a mistake to assume these benefits sell themselves, especially in tough business times when new technology investment is highly scrutinized.

Fortunately, these benefits can (and should) be measured and communicated.  Here are some ideas for accomplishing this:

  • Start by using the virtual versus physical integration decision tool described previously to identify several data virtualization candidates as a pilot.
  • During the design and development phase for these projects, track the time it takes using data virtualization and contrast it to the time it would have taken using traditional physical approaches.
  • Use this time savings to calculate two additional points of value: time to solution reduction and development cost savings.
  • To measure lifecycle value, estimate the operating costs of extra physical data stores that are saved because of virtualization.
  • Add these hardware operating costs to the estimated development lifecycle cost savings that occur from faster turns on break-fix and enhancement development activities.
  • Finally, package the results of these pilot projects along with an extrapolation across future projects, and communicate them to business and IT leadership.

Industry analysts agree that best practices' leaders draw from portfolios containing both physical and virtual data integration tools to meet the ever-changing information needs of today's enterprises. Multiple use cases across a broad spectrum of industries and government agencies illustrate the mission-critical benefits derived from data virtualization. These benefits include reduced time-to-solution, lower overall costs for both implementation and on-going maintenance, and greater agility to adapt to change. By becoming familiar with common mistakes to avoid, enterprises arm themselves with the wisdom necessary to successfully implement data virtualization in their data integration infrastructures, and thereby begin to reap the benefits.


  • Composite Software, in conjunction with data virtualization users and industry analysts, developed a simple decision-making tool for determining when to use a virtual, physical or hybrid approach to data integration. Free copies are available online.

More Stories By Robert Eve

Robert "Bob" Eve is vice president of marketing at Composite Software. Prior to joining Composite, he held executive-level marketing and business development roles at several other enterprise software companies. At Informatica and Mercury Interactive, he helped penetrate new segments in his role as the vice president of Market Development. Bob ran Marketing and Alliances at Kintana (acquired by Mercury Interactive in 2003) where he defined the IT Governance category. As vice president of Alliances at PeopleSoft, Bob was responsible for more than 300 partners and 100 staff members. Bob has an MS in management from MIT and a BS in business administration with honors from University of California, Berkeley. He is a frequent contributor to publications including SYS-CON's SOA World Magazine and Virtualization Journal.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.