5 January, 2017
Most Big Data Projects Fail: 4 Ways to Make Yours Succeed!
It’s well known that most big data projects fail. There are many reasons for this, and it turns out that very few stem from the particulars of unproven technology. In fact, new projects built on well-established technologies often fail spectacularly (e.g., the Obamacare website). Though most post-mortem surveys are quick to blame scope, changing business requirements, or even a lack of project management, in our experience projects typically fail because teams:
– don’t pay attention to changes required in operating process,
– don’t recognize the lack of operating skills in support staff,
– don’t plan for operational integration, and
– don’t plan for sufficient operational oversight.
In short, it is missing operational planning that most often leads to big data project failure. To save your project, you need to understand how each of these operational hurdles impacts your organization and how to overcome it.
In my experience leading projects for hundreds of global enterprises, the single most common reason big data projects fail is that no one is accountable for identifying and implementing the necessary process changes. Consider as an example the seemingly basic task of preparing a forecast on a traditional data warehouse like Teradata, an analytic database like Vertica, and a big data system like Hadoop. From an underlying technology perspective, the data warehouse and the analytic database are the most similar, since both follow traditional relational database structures. Yet from a process perspective, moving a forecasting routine from a data warehouse to a big data system is more likely to succeed than moving it to an analytic database.
The reason is that the processes for data integration, managing system resources, and running analytics are very similar for centralized data management systems like a warehouse or a big data system. IT teams have processes and policies for managing data lakes, for allocating resources based on user requests, for chargeback of shared resources, and for monitoring signs of possible system issues. In contrast, the processes for provisioning, sizing, and monitoring an analytic database are drastically different: the isolation of data, the control over resources, and the chargeback mechanisms don’t map to existing warehouse infrastructure. Additionally, the system’s performance characteristics and failure modes are completely different. While the difference between a data warehouse and an analytic database may seem like a minor technical distinction, from an IT operating-process perspective it can mean the difference between a successful project and likely failure.
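To make the resource-allocation and chargeback point concrete: on a shared Hadoop cluster, this kind of policy typically lives in the YARN Capacity Scheduler configuration, where the cluster is carved into per-team queues whose usage can be metered and billed back. The sketch below assumes hypothetical queue names (`forecasting`, `etl`, `adhoc`) and illustrative capacity percentages; an analytic database has no direct equivalent of this shared-queue model, which is exactly the operational gap the paragraph above describes.

```xml
<!-- capacity-scheduler.xml (sketch): split a shared Hadoop cluster
     into per-team queues so usage can be tracked and charged back.
     Queue names and percentages here are illustrative assumptions. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>forecasting,etl,adhoc</value>
  </property>
  <!-- Forecasting workloads are guaranteed 40% of cluster resources -->
  <property>
    <name>yarn.scheduler.capacity.root.forecasting.capacity</name>
    <value>40</value>
  </property>
  <!-- ...but may burst up to 60% when the cluster is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.forecasting.maximum-capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
    <value>20</value>
  </property>
</configuration>
```

Because every team's jobs run through a named queue, IT can report consumption per queue and apply existing chargeback processes; standing up a separate analytic database bypasses this machinery entirely, which is why its operating processes must be designed from scratch.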