What is a data scientist?
The predictive analytics field seems to love nothing more than giving a new name to an established concept. In my last post I argued that the concept of ‘big data’ itself is nothing new. Over the last few years I’ve seen more and more job ads recruiting ‘data scientists’ and I find myself rather unjustifiably irritated by the rise of this new job title. I can’t quite put my finger on what it is that irritates me about it, but I think it’s tied up with the creation of a new job title for something which isn’t really all that new, and a related annoyance at the attempt to make it sound cool, almost as if adding ‘science’ to the end of something validates it. A bit like turning a cookery lesson into domestic science. I didn’t hold with that use of words either.
The rise of the data scientist has gone hand in hand with the rise of big data, by which I mean the rise of the use of the terminology rather than the use of the technology. They are not really functions which are exclusively bound together but they are certainly on the same bandwagon. The data scientist role complements big data projects because of the range, width and scale of data involved.
In reality what does a data scientist do?
I’ve heard it said that a data scientist “represents an evolution from the business or data analyst role”. During my 20 or so years in the analytics business I’ve hired and worked with statisticians, research analysts, researchers, customer insight experts, business intelligence people, data miners and more. That list of job titles probably represents an accurate progression of the general shift in job titles over the past 20 years as well. And all of them are still out there. But how do they differ?
The formal skills required are comparable across all these roles. The training is similar. Each has a solid foundation in modeling, statistics, analytics and mathematical approaches. All need to be data literate in various forms. All need to consider the approaches of hypothesis-led analysis and data-led analysis. All need to consider the differences between data collected exclusively for analysis and data collected without analysis in mind. What you can ask of your data will vary dramatically depending on these factors. Do I need to be a data scientist to do this – or can a statistician do it?
Some people will tell you that what sets the data scientist apart is “strong business acumen” and the “ability to communicate findings to both business and IT leaders”. Again I’d argue that good analysts have always had to be able to do this. Failure to understand how the analytics can help the business means the work is essentially just a back office research project. If we look at the CRISP-DM methodology, developed in the late 1990s as a standard process for analytics, the very first step is business understanding – and the process flows backwards and forwards through this step. It’s always been essential for analysts to understand the business problem and to be able to communicate the results in terms of something practical that can be used within the business. If, as an analyst, you weren’t doing this then you weren’t very good at your job.
What is happening today is that more responsibility for identifying business problems and selecting the appropriate problems to tackle is being pushed into the hands of analysts. Here, with the right skills and the right tools, an open-minded analytics team can direct the company towards its objectives. Whether this is sensible depends entirely on the company and its people.
A good analyst is inquisitive: exploring, asking questions, digging deeper, doing “what if” analysis, questioning existing assumptions, looking for new sources of information. A really top-tier analyst should then work in partnership with the business leaders to communicate informed conclusions and recommendations across the organisation. These skills remain the same regardless of the job title. Whether statistician, analyst or data scientist, the ability to translate findings into actionable insights is critical.
Perhaps a data scientist is just a plain old analyst who, in a market which is heating up, needs to differentiate themselves from the old world view of the basement research statistician. Predictive analytics is getting sexier – maybe analysts want sexy job titles to match.