Big data. Many words and emotions flow from two such little words. Technologists and IT mavens have their take. That group in marketing hears the siren call of a one-stop shop for their social media analysis. C-level executives in all industries are aware of it and are seeking its value for their organizations.
As the volume, velocity, and variety of data continue to grow, proving the value of big data environments and initiatives is a common topic of discussion. One approach is to point to articles about successful implementations by others facing similar objectives. Another approach is to believe all that is said in webinars provided solely by product vendors. This article suggests steps for building a proof of concept that can lead to a decision on implementing a full-scale big data environment.
The Project Champion
Your organization's culture will determine whether you start your proof of concept as a skunkworks or as a formally organized project. Either way, the project will need a visionary leader with close ties to executives in one or more business units. This leader is not the person who can recite the command line installation procedures for Hadoop from memory. This person is charged with solving a pain point in the company. He or she is your champion. That person knows that the problem can be solved within 30 or 60 days, and knows that 90 days is too long. The champion likely has more business acumen than development chops, and isn't likely to be drawn into needlessly long discussions in technical jargon. The champion is also interested in data and knows enough about development to stay away from the bits.
Address a Real Problem
The proof of concept isn't meant to solve the problems that have the attention of a dozen managers. The champion must concisely state a single problem. People on the project must be able to clearly grasp the purpose and desired outcome. The team of developers and analysts implementing the proof of concept is probably fewer than ten people, and a team of three might be sufficient. If the team is one person and one champion, the organization is perhaps not placing enough value on the proof of concept.
Use Real Data
A proof of concept whose results need to be explained with '...and if we were to use actual data…' has an air of wasted cycles to executives. The champion has a real problem, so use real data. A few terabytes of corporate data and a few related public datasets might be all that is needed to surface the measures of a pain point.
To The Cloud!
There are a number of well-positioned companies offering cloud-based big data platforms. Yes, one of the attractions of big data is the commodity servers that can be used to scale out an environment rapidly and relatively inexpensively. But if you build internally, there may be roadblocks to introducing hardware to a network, especially if the hardware is foreign to current corporate standards. Proofs of concept should avoid roadblocks, not roll through them. The champion can probably put the cloud fees on a corporate card.
Model The Solution
Whether you implement a Hadoop-based or NoSQL solution will be a function of the Real Problem that you are addressing. In either case, a datastore will need to be modeled. Data acquisition and loading procedures will be implemented and tested. Multiple iterations will take place. Restartability of processes will be developed. Analysis methodologies will be chosen and tried and changed. The champion's problem will be front-and-center to the team's efforts. It is at this phase of the proof of concept that restraint must be exercised. Resist the urge to build out a much larger project that could, when finalized, solve many more problems. Within a proof of concept, that larger build is just as likely to create many more problems.
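As one illustration of what restartability can mean for a loading procedure, here is a minimal sketch in Python. It assumes the load can be split into ordered batches. The function name load_batches, the load_fn callback, and the JSON checkpoint file are hypothetical choices made for this example, not part of any particular platform.

```python
import json
import os


def load_batches(batches, load_fn, checkpoint_path):
    """Load batches in order, resuming after the last completed batch.

    batches: ordered list of work items (hypothetical, e.g. file names).
    load_fn: callable that loads one batch and raises on failure.
    checkpoint_path: file recording how many batches have completed.
    Returns the number of batches loaded during this run.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["batches_done"]

    loaded_now = 0
    for index, batch in enumerate(batches):
        if index < done:
            continue  # completed in an earlier run, skip it
        load_fn(batch)  # an exception here leaves the checkpoint untouched
        # Write to a temporary file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"batches_done": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
        loaded_now += 1
    return loaded_now
```

If a run fails partway through, calling load_batches again with the same arguments picks up at the failed batch instead of reloading everything. The sketch assumes that a batch which fails has no partial effect that needs undoing. A real loading procedure would have to handle that case.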
Think alerts and timelines rather than page after page of charts. Key findings will be tied to measures defined by the champion and sought by the team from the beginning. Show those in a couple of beautiful visualizations using evaluation versions of popular tools.
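To make "alerts rather than charts" concrete, the sketch below checks a handful of measures against thresholds and reports only the ones that are breached. The Measure class, the find_alerts function, and the example measure names and numbers are all hypothetical. In practice the champion would define the measures and thresholds.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    value: float
    threshold: float
    higher_is_worse: bool = True  # assumption: most measures alert when too high


def find_alerts(measures):
    """Return one alert message per measure that breaches its threshold."""
    alerts = []
    for m in measures:
        if m.higher_is_worse:
            breached = m.value > m.threshold
        else:
            breached = m.value < m.threshold
        if breached:
            alerts.append(f"ALERT: {m.name} = {m.value} (threshold {m.threshold})")
    return alerts


# Hypothetical measures for illustration only.
example = [
    Measure("order returns (%)", 7.5, 5.0),
    Measure("on-time shipments (%)", 96.0, 90.0, higher_is_worse=False),
]
for line in find_alerts(example):
    print(line)
```

Only the first measure breaches its threshold here, so a single alert line is produced. The point is that the team reports exceptions tied to the champion's measures rather than every number it computed.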
Measures and Metrics and the Story
What do people remember? The management team remembers a small number of measures, if they are repeated several times during the project. The champion remembers the progress and the emerging solution as defined by the metrics. The executives remember the story of how the pain point arose, was found, and was solved.
Document The Steps
When the proof of concept is shown to be successful in solving a pain point, and is greenlighted for larger implementation, documentation on the server setup and data integration tasks will speed the next phase.
Are big data projects ever started with just a couple of months of server time from Amazon? Yes, and a key to success is a proof of concept project focused on a real need, defined by a person who can usher the project along to completion.
Dave Leininger has been a Data Consultant for 30 years. In that time, he has discussed data issues with managers and executives in hundreds of corporations and consulting companies in 20 countries. Mr. Leininger has shared his insights on data warehouse, data conversion, and knowledge management projects with multi-national banks, government agencies, educational institutions and large manufacturing companies. Reach him at Fusion Alliance at dleininger@FusionAlliance.com.