During a recent trawl of my archives I happened across an article from DM Review, now Information Management, first published back in 2000. The topic: Top Ten Data Mining Business Questions. This was back in the days before the term predictive analytics was common parlance, but I read through the list and wondered – have we really changed that much? I'm paraphrasing a bit, but here is the list.
1. What are the business benefits? Have you figured out what you can do with this? Can you quantify and measure the benefits? Have you really worked out what business problem you are trying to solve? This is an iterative set of questions and, to my mind, fits well with the CRISP-DM methodology, laid out back in 1996 but still regularly used in real-life projects.
2. What technical know-how do I need? How technical do my users need to be? Can business users make sense of this without an in-depth background in statistics? Is the interface easy to use without programming skills? These questions were being asked before the data scientist job title had been invented, and I find that they're still common concerns when I talk to clients today. I am convinced that intelligent, data-literate individuals in your business can learn to be excellent analysts with a little expert hand-holding.
3. How clear will the results be? The results of your analysis should be presented to business users in plain English, accompanied by graphs and using terms that they can easily understand and act on. You may substitute 'management' for 'business users' here too…
4. What about follow-up questions? Analysis generally spawns more questions and more analysis. Will users be able to quickly mine data to answer these? Can they do quick 'throw away modelling' easily to test new theories? Is it easy enough to use that analysts can use their creativity to try out their ideas? Quickly?
5. And what about the business users? The original article was making a point here about having numerous people doing analysis and the importance of ensuring that lots of different groups with different perspectives could mine the data. I’m not sure this has worked out as expected. These days I tend to see organisations with separate groups of insight analysts and data scientists who liaise with the business users but mine the data on their behalf. It will be interesting to see how this changes as the predictive analytics and big data fields mature further.
6. How accurate, complete and consistent are the analytic techniques? Can we use techniques which analyse all the data, not just samples? Does it work? Are the algorithms correct? Is the range of techniques wide enough to satisfy the wildest dreams of our analytical team?
7. Can we engage in incremental analysis? The point here was really about automation. Can we update models and analysis daily, weekly or monthly as new data comes to us? This is still an important consideration today, and that’s before we even start thinking about the real-time analysis debate.
8. How effective is the data handling? How much data can the system deal with? Can we mine against the database directly? The debate probably still rages on in the face of big data, but these are very real considerations. How do organisations deal with the massive volumes of data they're collecting and make sure they are using them as effectively as possible? And what happened to the sampling methods of classical statistics? Have we moved to a whole new way of thinking about data analysis?
9. Can the analytics system be integrated with our existing systems? IT infrastructure environments have changed dramatically since the original article was published, and yet this consideration is still as important as ever. You can’t sign up to a total systems redesign just to enable the analysis of some data. Data analysis tools need to be open and agnostic about data types, sizes and locations in order to give the flexibility needed.
10. What support will be available? How will I manage once the system is installed? Will I need specific maintenance staff? Another database administrator? This is much less of a concern these days than it was thirteen years ago. Analytics today is folded into day-to-day system management and shouldn’t really need a dedicated support infrastructure.
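To make question 7 a little more concrete, here is a minimal sketch of what incremental analysis can look like in practice: a summary that is updated as each new batch of data arrives, rather than being recomputed from scratch over all the history. It uses Welford's method for a running mean and variance, which is a stand-in I have chosen for illustration and not something from the original article. The class name `RunningStats` and the order values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean and variance, updated one batch at a time."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, batch):
        """Fold a new batch of values into the summary (Welford's method)."""
        for value in batch:
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical daily batches of order values (made-up figures).
stats = RunningStats()
stats.update([120.0, 80.0, 100.0])  # first day's data
stats.update([90.0, 110.0])         # next day's data, old data not revisited
print(stats.count, stats.mean, stats.variance)
```

The same idea scales up to real models: the summary (or model) keeps only what it needs to absorb tomorrow's data, so a daily, weekly or monthly refresh can run automatically without reloading everything that came before.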
So has the data mining / predictive analytics landscape changed that much in 13 years? Well, yes and no.
Yes, in that advanced and predictive analytics is much more mainstream today. It’s the heart and soul of decision-making in organisations across the globe, enabling facts and unseen patterns to inform everything from individual customer decisions up to large-scale corporate direction.
No, in that the concerns people have about starting on their analytics journey are still much the same as those expressed 13 years ago. It’s right to worry early about defining the business goals and measurable benefits. It’s right to make sure you have people who can use software to its best effect. And, most importantly of all, it’s right to make sure the results can be translated and explained simply to managers who can use them.