Governance, Stewardship, and Quality of Temporal Data in a Data Warehousing Context

By James Kobielus

Organizations must hold people accountable for their actions, and that depends on having the right data, tools, and processes for keeping track of the precise sequence of events over time.

Timing is everything when you’re trying to pinpoint the parties who are personally responsible in any business context. Consequently, time-series discovery is the core task of any good investigator, be they Sherlock Holmes or his real-world counterparts in the hunt for perps and other parties of interest.

Audit trails are essential for time-series discovery in legal proceedings, and they support equivalent functions in compliance, security, and other business application contexts. Audit trails must describe correctly and consistently the prior sequence of events, so that organizations can identify precisely who took what actions when under which circumstances.

To help identify the parties responsible for specific actions, decisions, and outcomes, the best audit trails should, at minimum, support longitudinal analysis, which rolls up records into a comprehensive view of the entire sequence of events. But the databases where the audit trails are stored should also support time-validity analysis, which rolls back time itself to show the exact state of all the valid data available to each responsible party at the time they made their decisions. Without the former, you can’t fit each event into the larger narrative of what transpired. Without the latter, you can’t fit each event into the narrative of who should be punished or exonerated.
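To make the two kinds of analysis concrete, here is a minimal Python sketch. It assumes an audit trail is simply a list of timestamped change records; the `AuditEvent` type, its attribute names, and the sample data are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    """One hypothetical audit-trail record: who set which attribute to what, and when."""
    recorded_at: datetime
    actor: str
    attribute: str
    value: str


def timeline(events):
    """Longitudinal view: every event, rolled up in chronological order."""
    return sorted(events, key=lambda e: e.recorded_at)


def state_as_of(events, moment):
    """Time-validity view: the attribute values that were visible at `moment`."""
    state = {}
    for event in timeline(events):
        if event.recorded_at > moment:
            break  # later events were not yet known at `moment`
        state[event.attribute] = event.value
    return state


# Illustrative data only; the records arrive out of order on purpose.
events = [
    AuditEvent(datetime(2015, 3, 2, 9, 0), "alice", "credit_limit", "5000"),
    AuditEvent(datetime(2015, 3, 4, 14, 30), "bob", "credit_limit", "9000"),
    AuditEvent(datetime(2015, 3, 3, 11, 15), "carol", "risk_rating", "high"),
]

sequence = [e.actor for e in timeline(events)]
snapshot = state_as_of(events, datetime(2015, 3, 3, 12, 0))
```

Here `sequence` gives the order in which the parties acted, while `snapshot` shows only what was on record at noon on March 3, before the second change to the credit limit. A real data warehouse would answer both questions with queries rather than in application code.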

All of that requires strong data quality, which relies, in turn, on having access to databases and tools that facilitate the requisite governance and stewardship procedures. Data warehouses are where you should be keeping your system-of-record data to support time-series analyses. Consequently, temporal data management is an intrinsic feature of any mature data warehousing, governance, and stewardship practice. Indeed, the ability to traverse data over time is at the very heart of the concept of data warehousing, as defined long ago by Bill Inmon: “a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management’s decisions.”

Many organizations have deployed transactional databases such as IBM DB2 for data warehousing and temporal data management. If they use a high-performance implementation, such as DB2 with BLU Acceleration software running on IBM POWER8 processors, they can do in-memory time-series analyses of large audit trails with astonishing speed. If you want further depth on DB2’s native temporal data management features, I strongly recommend this technical article.

Temporal data management concepts may be unfamiliar to some data warehousing professionals. Here’s a recent article that provides a good primer on temporal database computing. As the author states, “A temporal database will show you the actual value back then, as it was known back then, and the actual value back then, as it is known now.”

These concepts are a bit tricky to explain clearly, but I’ll take a shot. The “actual value back then, as it is known now” is tracked along the “valid time” dimension, which records when a fact was true in the real world; it may be updated or corrected within a temporal database if the previously believed value is found to have been in error. The “actual value back then, as it was known back then” is tracked along the “transaction time” dimension, which records when the database held each version; that record remains unchanged and may diverge from the valid-time picture as the latter is corrected.

Essentially, this arrangement enables the record of any historical data to be corrected at any time in the future. It also preserves the record, for each point in the past, of that moment’s own erroneous picture of the even deeper past. This gets to the “what they knew and when they knew it” heart of personal responsibility.
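A small Python sketch may help show how a store can keep both pictures at once. It assumes each stored version carries two date ranges, one for when the fact held in the real world and one for when the database asserted it. The `Version` type, the `lookup` function, and the supplier-rating data are hypothetical illustrations; this is not the syntax or API of DB2 or any other database.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # stand-in for "no end date yet"


@dataclass(frozen=True)
class Version:
    """One hypothetical bitemporal row version; end dates are exclusive."""
    value: str
    valid_from: date     # when the fact became true in the real world
    valid_to: date
    recorded_from: date  # when the database began asserting this version
    recorded_to: date    # when a correction superseded it


def lookup(versions, valid_at, known_at):
    """Return the value for `valid_at` as the database believed it on `known_at`."""
    for v in versions:
        if (v.valid_from <= valid_at < v.valid_to
                and v.recorded_from <= known_at < v.recorded_to):
            return v.value
    return None


# Illustrative only: a supplier rating recorded as "A" on 2015-01-05,
# then corrected to "C" on 2015-06-01. The old row is closed, not deleted.
history = [
    Version("A", date(2015, 1, 1), FOREVER, date(2015, 1, 5), date(2015, 6, 1)),
    Version("C", date(2015, 1, 1), FOREVER, date(2015, 6, 1), FOREVER),
]

as_known_then = lookup(history, date(2015, 2, 1), date(2015, 2, 1))
as_known_now = lookup(history, date(2015, 2, 1), date(2015, 12, 31))
```

In this sketch `as_known_then` is the February rating as a decision-maker would have seen it in February, and `as_known_now` is the same rating after the June correction. Because corrections close out old versions instead of overwriting them, both answers remain available.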

As I was reading this recent article that discusses time-series data in an Internet of Things (IoT) context, the association of temporality with personal responsibility came into new focus. What if, through IoT, we were able to save every last datum that each individual person produced, accessed, viewed, owned, or otherwise came into contact with at each point in time? And what if we could roll it back to infer what they “knew” and “when they knew it” on a second-by-second basis?

This is not a far-fetched scenario. As the IoT gains ubiquity in our lives, it will become very realistic (for the moment, let’s overlook the staggering big-data management and analytics challenges that this would entail). And as this temporal data gets correlated with geospatial, social, and other data sources, and mined through data lineage tools, it will become possible to roll up high-resolution, 360-degree portraits of personal responsibility. We’ll have a full audit trail of exactly who knew (individually and collectively) what, when, where, how, why, and with what consequences.

Whether you’re a prosecuting attorney building a case, a law-enforcement official trying to uncover terrorist plots in the nick of time, or an IT security administrator trying to finger the shadowy perpetrators of a hack attack, these IoT-infused discovery tools will prove addictive.

The effectiveness of governance in the modern world will depend on our ability to maintain the requisite audit trails in whatever data warehouse or other well-governed repository best suits our operational requirements.

About James

James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in big data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, data management, and next best action technologies. Follow James on Twitter: @jameskobielus
