The Art of Data Science
Several articles have been published emphasizing why data should be treated as an organizational asset, and many more focus on deriving value by analysing structured data and Big Data to gain deep business insights. Now, companies are going a step further and turning their data assets into products and services that carry commercial value in themselves. Early examples of such data products include “people you may know”, introduced by LinkedIn, and Amazon's suggestions of products you may like based on your current search, which now extend to suggesting similar “looking” products. In addition to online businesses, a variety of traditional industries are now developing data products to further their business interests.
Data science enables the conversion of data assets into data products. Not surprisingly, demand is very high for this hot new profession. International Data Corporation (IDC) predicts that the need for data science professionals will keep growing by the day; for the year 2018 alone, it put the requirement at 181,000 such professionals in the US. McKinsey Global Institute estimated the shortage of data scientists last year alone at 190,000.
DATA SCIENCE – UP CLOSE
Data Science helps uncover hidden patterns in large volumes of structured and unstructured data, which can then be deployed commercially. What differentiates it from standard Business Intelligence is that one does not know in advance what one is looking for and instead attempts to uncover hidden patterns of commercial value.
Take, for instance, an attempt to classify video clips as, say, political, sports, humour, self-improvement, etc. without manually opening them. To achieve this, the data scientist will need to study numerous subjective characteristics of each clip, such as voice modulation, sentiment, colour, the text obtained through speech-to-text and NLP, or another parameter yet unknown, to detect repeatable patterns that will enable accurate classification of the clips.
As a discipline of study, Data Science combines the technology of data analysis, visualization, statistics, and mathematics with knowledge of business imperatives. Statistics plays a central role in fitting patterns to data sets. With descriptive statistics one can qualify, categorize and describe what is shown by the available data, while inferential statistics helps in deducing possibilities beyond the available data. While statistical techniques provide quantitative insight, sound business knowledge helps translate it into business outcomes.
Given that we are now generating more data than ever, the need to identify patterns in such huge volumes of data has never been greater. And with technological advances, doing so is becoming more feasible. This can only mean that there is little excuse for organizations to overlook the value that can be gained through data science.
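The difference between descriptive and inferential statistics can be illustrated with a minimal Python sketch. The sample values and the function names `describe` and `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation (z = 1.96), which is a simplification for a sample this small.

```python
import math
import statistics

# Hypothetical sample: daily order counts observed over ten days.
sample = [120, 135, 128, 142, 131, 125, 138, 129, 133, 127]


def describe(values):
    """Descriptive statistics: summarize only what the available data shows."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def mean_confidence_interval(values, z=1.96):
    """Inferential statistics: estimate a range for the unseen population mean.

    Assumes a normal approximation; z = 1.96 corresponds to roughly 95%
    confidence. This is an illustrative simplification.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - z * standard_error, mean + z * standard_error)


summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(f"Approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```

The first function says something only about the ten observed days; the second makes a hedged statement about days that were not observed, which is where business judgement is needed to decide whether the estimate is useful.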
TURNING DATA INTO PRODUCT
One can find several definitions of a data product depending on the aspect of data analysis that one stresses. Simply put, a data product requires analysing large volumes of structured and unstructured data to identify patterns that can be gainfully used.
Creating a data product involves:
- Defining a problem.
- Postulating the desired outcome.
- Determining the data required for analysis and ensuring its cleanliness, completeness and authenticity.
- Using statistics, visualization techniques, and domain knowledge to analyse the data from several perspectives and uncover patterns or trends.
- Designing experiments, once a pattern is observed, to confirm its accuracy and repeatability in different scenarios.
- Representing the successful pattern as an algorithm, or building models that a machine can learn and use for analysis.
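The last three steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the discount and sales figures, the three regional scenarios, the 0.9 correlation threshold, and the choice of a straight-line model. It is one simple way to walk through the steps, not a prescribed method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical data: (discount in percent, units sold) in three regions.
scenarios = {
    "north": ([0, 5, 10, 15, 20], [100, 112, 119, 131, 140]),
    "south": ([0, 5, 10, 15, 20], [80, 89, 101, 108, 121]),
    "west": ([0, 5, 10, 15, 20], [95, 104, 116, 124, 137]),
}

# Step 1: uncover a pattern in one scenario.
xs, ys = scenarios["north"]
pattern_strength = pearson(xs, ys)

# Step 2: confirm that the pattern repeats in every scenario.
THRESHOLD = 0.9  # assumed cut-off for calling the pattern repeatable
repeatable = all(pearson(x, y) > THRESHOLD for x, y in scenarios.values())

# Step 3: represent the confirmed pattern as a model a machine can apply.
slope, intercept = fit_line(xs, ys)


def predict_units(discount):
    """Predict units sold for a given discount using the fitted line."""
    return slope * discount + intercept


print(f"correlation in north: {pattern_strength:.3f}, repeatable: {repeatable}")
print(f"predicted units at a 12% discount: {predict_units(12):.1f}")
```

In practice the confirmation step would use data the pattern was not derived from, and the model would usually be richer than a single straight line.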
To succeed with a data product, it is essential to have quality data. This ensures that the patterns being fitted to the data are not obscured by errant or outlying data. Clean, relevant data thus helps shorten the pattern identification cycle and increases the data product's chances of success.
Typically, data products are bundled with other offerings that generate revenue. However, it is essential to estimate the additional value the data product will bring to those offerings and whether the effort spent on creating it is justifiable.
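As a small illustration of how a single errant value can obscure a pattern, the sketch below filters outliers with the interquartile range (IQR) rule. The readings are made up, the function name `remove_outliers` is hypothetical, and the 1.5 multiplier is a common convention rather than something this article prescribes.

```python
import statistics

# Hypothetical readings with one errant value (500).
readings = [52, 49, 51, 50, 48, 53, 50, 500]


def remove_outliers(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


cleaned = remove_outliers(readings)
print(f"mean before cleaning: {statistics.mean(readings):.1f}")
print(f"mean after cleaning:  {statistics.mean(cleaned):.1f}")
```

Whether a flagged value is a genuine error or a meaningful rare event is a judgement that needs domain knowledge, so a filter like this is a starting point rather than a final answer.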
APPLICATIONS OF DATA SCIENCE
Data science finds critical usage in many key sectors. In the financial domain, it is being used to uncover fraud and to test the risk models that evaluate credit risk. In retail, targeted offers to prospective buyers are increasing conversion rates. In fact, data science finds applications in every industry that generates data.
Some analytical applications of data science include:
- Increasing viewership by understanding patterns of media content consumption.
- Dynamic pricing of products and services by understanding shopping patterns.
- Credit Risk, Treasury Risk, Fraud Management.
- Insurance claims prediction.
- Enhanced supply chain capabilities by understanding demand and supply patterns.
- Patient diagnostics and treatment from real time patient data.
- Increasing support by understanding voter preferences during polls.
WHAT MAKES A DATA SCIENTIST?
- An analytical mindset.
- Broad statistical expertise – Understanding data mining, inference, prediction.
- Understanding of data structures, data management, visualizations, and algorithms.
- Understanding of business domain.
- Working knowledge of the relevant tools. R or Python is used for building data models and doing analysis. Hive, whose query language resembles SQL, is used for querying and analysing big data in Hadoop. Shiny is a web application framework for the R language, with an accompanying application server, used to deploy data products as interactive web apps. Java, Perl, and C/C++ are also increasingly being used for the purpose. Others include Pig, Spark, and MATLAB.
More industries are waking up to the benefits of data science. Traditional businesses are also exploiting data science to build data products that will propel their respective businesses. The benefits of data science are just unfolding. Are you in the fray?