This chapter describes the key elements of a meta data repository architecture and explains how to tie data warehouse architecture into the architecture of the meta data repository. After reviewing these essential elements, I examine the three basic architectural approaches for building a meta data repository and discuss the advantages and disadvantages of each. Last, I discuss advanced meta data architecture techniques such as closed-loop and bidirectional meta data, which are gaining popularity as our industry evolves.
Anyone who has worked on a decision support project understands that the biggest challenge in building a data warehouse is integrating all of the disparate sources of data and transforming the data into meaningful information. The same is true for a meta data repository. A meta data repository typically needs to be able to integrate a variety of types and sources of meta data and turn the resulting stew into meaningful, accessible business and technical meta data. For example, a company may have a meta data requirement to show its business users the business definition of a field that appears on a data warehouse report. The company probably used a data modeling tool to construct the physical data models to store the data presented in the report’s field. Let’s say the business definition for the field originates from an outside source (i.e., it is external meta data) that arrives in a spreadsheet report. The meta data integration process must create a link from the meta data on the table’s field in the report to the business definition for that field in the spreadsheet. When we look at the process in this way, it’s easy to see why integration is no easy feat. (Just consider creating the necessary links to all of the various types and sources of data and the myriad delivery forms that they involve.) In fact, integrating the data is probably the most complex task in the meta data repository implementation effort.
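The linking step in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table and column names, the layout of the spreadsheet (shown here as CSV text), and the function `link_definitions` are all hypothetical, and a real integration process would also have to reconcile names that do not match exactly.

```python
import csv
import io

# Hypothetical technical meta data, as it might be exported from a data modeling tool.
physical_columns = [
    {"table": "SALES_FACT", "column": "NET_SALES_AMT", "datatype": "DECIMAL(12,2)"},
    {"table": "SALES_FACT", "column": "UNIT_QTY", "datatype": "INTEGER"},
]

# Hypothetical external business meta data, as it might arrive in a spreadsheet
# (represented here as CSV text so the sketch is self-contained).
spreadsheet = io.StringIO(
    "table,column,business_definition\n"
    "SALES_FACT,NET_SALES_AMT,Gross sales less returns and discounts\n"
)


def link_definitions(columns, definitions_file):
    """Attach a business definition to each physical column, keyed on (table, column)."""
    definitions = {
        (row["table"].upper(), row["column"].upper()): row["business_definition"]
        for row in csv.DictReader(definitions_file)
    }
    linked = []
    for col in columns:
        key = (col["table"].upper(), col["column"].upper())
        # None marks a column with no matching definition, so gaps stay visible.
        linked.append({**col, "business_definition": definitions.get(key)})
    return linked


linked = link_definitions(physical_columns, spreadsheet)
for entry in linked:
    print(entry["table"], entry["column"], "->", entry["business_definition"])
```

Even in this toy form, the second column ends up without a definition, which hints at how much reconciliation work a real repository faces once many sources and delivery formats are involved.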
Scalable
If integration is the most difficult of the meta data architecture characteristics to achieve, scalability is the most important characteristic. A meta data repository that is not built to grow, and grow substantially over time, will soon become obsolete. Three factors are driving the current proliferation of meta data repositories:
Continuing growth of decision support systems. As we discussed in Chapter 1, businesses are constantly demanding greater and greater functionality from their decision support systems. It is not unusual for both the size of a data warehouse database and the number of users accessing it to double in the first year of operation. As these decision support initiatives continue to grow, the meta data repository must be able to expand to address the increasing functional requirements.
Recognition of the value of enterprise-wide meta data. During the past three or four years, companies have begun to recognize the value that a meta data repository can bring to their decision support initiatives. Companies are now beginning to expand their repository efforts to include all of their information systems, not just decision support. I am aware of two Fortune 100 firms that are looking to initiate an enterprise-wide meta data solution. As soon as one of these major companies builds a repository to support all of its information systems, many others are likely to follow suit. Chapter 11, The Future of Meta Data, addresses the value of applying enterprise-wide meta data to corporate information systems.
Increasing reliance on knowledge management. Knowledge management is a discipline that promotes the application of technology to identifying, capturing, and sharing all of a company’s information assets (e.g., documents, policies, procedures, databases, and the inherent knowledge of the company’s workforce). The concept of knowledge management is a good one: Capture the information assets and make them available throughout the enterprise. However, knowledge management is generating mixed reviews in the real world. Companies are just now beginning to understand that a meta data repository is the technical backbone that is necessary to implement a knowledge management effort. Software vendors and corporations alike are now expanding their meta data solutions to provide a real-world approach to knowledge management. (Once again, Chapter 11, The Future of Meta Data, offers a detailed discussion of this topic.)
META DATA: IT’S NOT JUST FOR DECISION SUPPORT
A number of years ago I was speaking at a conference in Chicago about the value that meta data can bring to a decision support system. After the talk, a member of the audience approached me and asked why I limited my meta data discussion to only those topics under decision support, since meta data can support all of a company’s IT systems. I agreed that meta data can significantly aid a corporation’s IT systems, but explained that I did not address it during the talk because it was difficult enough to convince people that meta data can help a decision support system, let alone provide value to every information system in the company.
My stance on this topic and my presentations have changed dramatically in the past few years. Now that people understand the value, they’re looking for the specifics of how to use enterprise-wide meta data most effectively and apply it across their information systems.
Robust
As with any system, a meta data repository must have sufficient functionality and performance to meet the needs of the organization that it serves. The repository’s architecture must support both business and technical user reports and views of the meta data, and it must provide acceptable user access to these views. Other functionality required from the meta data architecture includes:
- Ability to handle time- or activity-generated events
- Import/export capability
- Support for data lineage
- Security setup and authorization facilities
- Archival and backup facilities
- Ability to produce business and technical reports
Customizable
If the meta data processes are home-grown (i.e., built without the use of meta data integration or access tools), then customization is not a problem since the entire application is tailored for the specific business environment. If, however, a company uses meta data tools to implement the repository architecture (as most do), the tools need to be customized to meet the specific current and future needs of the meta data initiative.
Customization is a major issue for companies that purchase prepackaged meta data solutions from software vendors. These solutions are generally so rigid in their architecture that they cannot fill the specific needs of any company. In the case of a meta data solution, one size definitely does not fit all! To be truly effective, these prepackaged solutions require a significant amount of customization to tailor them for each business environment.
Open
The technology used for the meta data integration and access processes must be open and flexible. For example, the database used to store the meta data is generally relational, but the meta data architecture should be sufficiently flexible to allow a company to switch from one relational database to another without massive architectural changes.
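One common way to keep the choice of database open is to confine all database-specific details to a single small layer. The sketch below is an assumption-laden illustration, not a prescribed design: the class `MetaDataStore`, the table `element_definition`, and its columns are hypothetical. It relies only on Python's standard DB-API 2.0 conventions and uses the built-in `sqlite3` module as a stand-in for whichever relational database a company actually runs.

```python
import sqlite3


class MetaDataStore:
    """Keeps all database-specific details behind one small interface."""

    def __init__(self, connection, placeholder="?"):
        # `connection` is any DB-API 2.0 connection; `placeholder` is the
        # parameter marker that the driver expects ("?" for sqlite3).
        self.conn = connection
        self.ph = placeholder

    def create_schema(self):
        cur = self.conn.cursor()
        cur.execute(
            "CREATE TABLE element_definition ("
            "element_name VARCHAR(64) PRIMARY KEY, "
            "business_definition VARCHAR(255))"
        )
        self.conn.commit()

    def save_definition(self, name, definition):
        cur = self.conn.cursor()
        cur.execute(
            f"INSERT INTO element_definition VALUES ({self.ph}, {self.ph})",
            (name, definition),
        )
        self.conn.commit()

    def get_definition(self, name):
        cur = self.conn.cursor()
        cur.execute(
            "SELECT business_definition FROM element_definition "
            f"WHERE element_name = {self.ph}",
            (name,),
        )
        row = cur.fetchone()
        return row[0] if row else None


# Only this line names a specific database product.
store = MetaDataStore(sqlite3.connect(":memory:"))
store.create_schema()
store.save_definition("POLICY_NBR", "Unique identifier assigned to an insurance policy")
print(store.get_definition("POLICY_NBR"))
```

Under these assumptions, moving to another relational database would mean supplying a different connection and parameter marker while the rest of the repository code stays unchanged. Vendor-specific SQL features would still need attention in practice.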
Also, an open meta data repository enables a company to share meta data externally and, most important, to make it accessible to all users. If, for example, a company decides to Web-enable all of its meta data reports, the processes for providing access to these reports should be able to use any standard Web browser.
Key Elements of Meta Data Architecture
In addition to the general characteristics of good architecture, all good meta data repositories share a set of key elements that are essential for success, regardless of the architectural approach used to build the repository. In short, all good repositories:
- Are based on clear, well-defined management direction
- Use the same front end as the data warehouse
- Use the same entity and attribute naming standards throughout
- Incorporate multiple sources of meta data
- Include automated and reusable processes
- Use a standardized integration process
- Use a flexible meta model
- Manage multiple versions of meta data
- Incorporate update facilities
- Use a component-based multitier architecture
- Incorporate a security management scheme
- Incorporate cross-tool meta data dependency and lineage
Clear Management Direction
A set of clear, well-defined repository requirements is critical to the success of the meta data project. While this may not seem like an architectural issue, it is. I have seen more than one repository effort in which management’s changes in direction forced severe changes in the repository architecture.
Probably the most extreme case of misdirection that I dealt with involved a company that, for many years, depended on UNIX-based hardware and a Sybase database. When we began to evaluate meta data tools, therefore, we focused on tools that would be compatible with UNIX and Sybase. After we had selected the tools and finished designing the repository architecture, the company hired a new CIO, who quickly decided to replace Sybase and the UNIX boxes with IBM DB2 running on a mainframe. This edict absolutely devastated our repository project and threw the IT department into general disarray since the staff was configured to support Sybase and UNIX. The tools we had selected would have worked well with a UNIX box, but were likely to be far less satisfactory on a mainframe. This change in management direction made the tools that we had selected far less than optimal for the company’s environment, but the new CIO was reluctant to allow us to go through the tool selection process a second time. As a result, we had to implement using tools that were not well suited for the environment. See Chapter 6, Building the Meta Data Project Plan, for details on how to clearly define the project scope.
The Same Front End
Whenever possible, the meta data repository should use the same front end as the data warehouse. Business users do not like to learn new tools, so it’s always best to limit the number of tools that they need to use.
There is a caveat to this, however. If the decision support system’s front end cannot meet the needs of the meta data repository, it is far better to select or build a new one than to make do simply to spare users from learning a new tool. Using an inappropriate front end can severely limit the functionality of the meta data repository and is sure to cause more user dissatisfaction than learning a new tool would.
Entity and Attribute Naming Standards
At most companies, the vast majority of data is stored in relational databases of some sort. The physical names used to represent the entities (i.e., tables) and attributes (i.e., fields) in these databases should be standardized. For example, policy number is a common attribute in an insurance company database. Policy number may be physically named Policy_Num, Policy_Nbr, or Policy_No. If an insurance company is not consistent in its naming standards, that is, if it uses more than one of these names to refer to the attribute policy number, problems arise when we use a meta data integration tool to prepare the company’s data for a repository. Meta data integration tools compare entity and attribute names across transformation programs to see if they represent the same data element. Most tools would interpret Policy_Num, Policy_Nbr, and Policy_No as three different data elements, which makes the meta data in the repository look “cluttered” and makes it difficult to use.
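The effect of inconsistent names can be shown with a short Python sketch. It is a simplified illustration, not a description of how any particular integration tool works: the abbreviation table `STANDARD_ABBREVIATIONS` and the function `standardize` are hypothetical, and the literal set comparison merely stands in for a tool that matches names exactly.

```python
# Hypothetical abbreviation standard: every variant maps to one approved token.
STANDARD_ABBREVIATIONS = {"NUM": "NBR", "NO": "NBR", "NBR": "NBR"}


def standardize(physical_name):
    """Rewrite a physical name using the approved abbreviation for each token."""
    tokens = physical_name.upper().split("_")
    return "_".join(STANDARD_ABBREVIATIONS.get(token, token) for token in tokens)


names = ["Policy_Num", "Policy_Nbr", "Policy_No"]

# A literal comparison treats each spelling as a separate data element.
literal_elements = set(names)

# With a naming standard applied, the three spellings collapse into one element.
standardized_elements = {standardize(name) for name in names}

print(len(literal_elements))    # 3
print(standardized_elements)    # {'POLICY_NBR'}
```

A lookup table like this one is a stopgap at best. Enforcing the standard when tables are first designed avoids the need to reconcile names afterward.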
Ideally, businesses should standardize their database naming conventions throughout the enterprise, but, after many years of consulting, I’ve yet to find a Global 2000 company that has done this across all systems. At a minimum, though, companies should standardize their database and file naming conventions across their data warehousing projects, and many manage to do this.