Keeping Up with the Quants
This book is a gentle introduction to using analytics in business. The authors describe analytical thinking and provide a framework for applying it in any organization. They assume that the reader will not be doing the actual number crunching, but will be orchestrating the process of using analytics to make better business decisions. The target audience is executives and managers who want to be better informed consumers of data and analytics.
Main Idea
Key Topics
- Analytics vocabulary
- A 3 stage, 6 step framework for analytical thinking
- How creativity plays a role
- Developing quantitative analysis capabilities
- Effectively working with quants
Why Everyone Needs Analytical Skills
Quant vocabulary
Analytics – the use of data, statistics and models as a decision making tool and for adding value to customer transactions.
- Descriptive – Also known as reporting. A numeric description of events; shows what happened and when.
- Predictive – Uses past data to predict the likelihood of future events based on relationships between variables.
- Prescriptive – An attempt to explain why something happened or the probability of some event occurring in the future. Models the result of manipulating one independent variable and controlling everything else.
- Optimization – Finding the ideal level of a variable, for example, in product pricing.
Qualitative analysis – Includes exploratory research, incorporates unstructured data, does not apply any statistical analysis.
Quantitative analysis – Using statistical, mathematical and computational techniques on structured data.
- statistics
- forecasting
- data mining
- text mining
- optimization
- experimental design – test and control groups
Structured data – Data that fits neatly into rows and columns, generally quantitative, and the metadata can be defined.
Unstructured data – Digital information in the form of text, images and video. The growth of social networks and user driven web content has led to an explosion of unstructured data that can be captured and repurposed.
Big data – Refers to a massive volume of data, often unstructured. It requires preparation before it can be analysed, using new technologies like Hadoop. For example, the Facebook content added by users is considered to be big data by any definition: Facebook has 30 billion data items (status updates, photos, etc.) added by 600 million users.
Quants – The professionals who can make sense of data.
The first analytical thinking example in the book is taken from Cigna, a health service company. The vice-president of clinical operations had lots of data on hand, with reports and metrics about the frequency of hospital readmissions. However, she had no idea what was influencing these numbers one way or another. She wanted to know if the phone consultations with Cigna customers were making a difference in reducing hospital readmissions. To investigate, she worked with the analytics group to use a “matched case control” methodology, which matched pairs of patients as closely as possible, where one had received coaching calls and the other had not. She was able to learn what types of call center interventions were effective and for which diseases. From there, she was able to prioritize spending and encourage staff to spend more time on the activities that have the most impact on their customers’ health.
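The matched case control idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patient` fields, the matching attributes (age band and condition) and the sample records are all hypothetical and are not taken from the book or from Cigna.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout, invented for this illustration.
    patient_id: str
    age_band: str
    condition: str
    coached: bool      # received coaching calls
    readmitted: bool   # was readmitted to hospital


def match_pairs(patients):
    """Pair each coached patient with an uncoached one that shares
    the same age band and condition. Each control is used only once."""
    pool = {}
    for p in patients:
        if not p.coached:
            pool.setdefault((p.age_band, p.condition), []).append(p)
    pairs = []
    for p in patients:
        if p.coached:
            candidates = pool.get((p.age_band, p.condition), [])
            if candidates:
                pairs.append((p, candidates.pop(0)))
    return pairs


def readmission_rates(pairs):
    """Return (coached_rate, control_rate) across the matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    n = len(pairs)
    coached_rate = sum(c.readmitted for c, _ in pairs) / n
    control_rate = sum(u.readmitted for _, u in pairs) / n
    return coached_rate, control_rate


# Made-up sample data.
patients = [
    Patient("a1", "60-69", "diabetes", True, False),
    Patient("a2", "60-69", "diabetes", False, True),
    Patient("b1", "70-79", "heart failure", True, True),
    Patient("b2", "70-79", "heart failure", False, True),
    Patient("c1", "60-69", "diabetes", True, False),
    Patient("c2", "60-69", "diabetes", False, False),
    Patient("d1", "50-59", "asthma", True, False),  # no control available
]

pairs = match_pairs(patients)
coached_rate, control_rate = readmission_rates(pairs)
print(f"{len(pairs)} pairs: coached {coached_rate:.2f}, control {control_rate:.2f}")
```

A real study would match on many more attributes and would test whether the difference in rates is larger than chance would explain. The point of the sketch is only that comparing like with like is what makes the comparison meaningful.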
Where can analytics be effective?
- Marketing – pricing, store locations, promotion targeting, web site customization, digital advertising
- Supply Chain Management – inventory management, location of distribution centers, truck loading, delivery routing
- Finance – financial performance drivers, forecasts
- Human Resources – hiring practices, predicting attrition, compensation, what training for which employees
- Research and Development – product features most desired by customers, product or service effectiveness, product design appeal
Situations where analytics is a poor fit:
- personal preferences
- unimportant decisions
- speed is required
- one time decisions
Situations where analytics is a good fit:
- decisions that are made repeatedly
- the necessary time is available to gather data and do the analysis
- the decision is important enough to justify the investment
Three Analytical Thinking Stages
Stage 1: Framing the Problem
Step 1: Problem Recognition
Steps for stakeholder management:
- Identifying all stakeholders
- Documenting stakeholder needs
- Assessing stakeholders’ interest and influence
- Managing expectations
- Taking actions
- Reviewing status and repeating
This is also the time to think about the story that will be told with the data. There are six types of analytical stories:
- The CSI Story – Solving operational problems that crop up by using data to confirm the problem and then to find the solution. For example, analyzing why ecommerce transactions fail.
- The Eureka Story – This is a more purposeful look at a bigger problem: a deeper analysis over a longer period of time to examine potential changes to strategy or aspects of a business model. For example, Expedia’s decision to waive fees for travel changes or cancellations was based on an analysis of customer behaviour and market forces over an extended period of time.
- The Mad Scientist Story – This involves rigorous testing and measuring the results against a control group; also known as A/B or split testing in the world of website analytics. For example, for a restaurant chain, answering the question of what level and type of remodeling improves walk-in traffic?
- The Survey Story – This is classic quantitative research: asking customers specific questions and analyzing the answers. The results can be problematic, as other factors might have an influence, most notably how the questions are asked. It is also difficult to ascertain that the survey sample is truly representative of the demographic to be studied.
- The Prediction Story – Answering the question, what factors drove past events? Also known as predictive analytics or predictive modeling – for example, the “next best offer”, an often automated presentation to the customer of the offer that they are most likely to accept, based on demographics and past transactions.
- Here’s what happened Story – presenting facts, often over time. This type of story is well suited to visual representations. The challenge lies in presenting a wealth of data in a report that is both understandable and attention getting.
The story should start with a broad scope, but be narrowed down to a specific and testable set of hypotheses before moving on to the next step.
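For the Mad Scientist story, a common way to compare a test group against a control group in website analytics is a two-proportion z-test. The sketch below assumes the outcome is a simple yes/no conversion per visitor; the function name and the visitor counts are made up for illustration and do not come from the book.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference between two
    conversion rates, using the pooled standard error."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical split test: A is the control page, B is the variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be chance alone. The test is only as trustworthy as the random assignment of visitors to the two groups.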
Step 2: Review of previous findings
This step is about asking, “has this story been told before?” Business problems are rarely unique, so one is likely to find an approach that worked for some other organization facing a similar challenge. This research will encourage refining the problem, modifying the scope, or perhaps seeking out different stakeholders. Framing the problem is an iterative process.
By way of example, the authors walk us through attacking a marketing mix problem using the six step framework. The executives had an intuitive feeling that spending on marketing campaigns was too high given the disappointing results. Using a combination of consultants and internal staff, they gathered and consolidated data, then developed and refined models, and came to the conclusion that spending was not too high, but the mix needed adjustment in order to improve results. Their data analysis pointed them to spending more on television advertising.
Stage 2: Solving the Problem
This book advocates hypothesis driven research and analysis. Another approach to analytics is data mining and machine learning, which is well suited to big data environments where there is an exceptionally large volume of data to examine. Software driven solutions can be a complement to manual data analysis techniques. They are also suited for automated decision systems, where a high number of decisions need to be made rapidly. A good example of this is in digital ad placement.
Step 3: Modeling
Based on experience and previous findings, a decision needs to be made as to what variables should be part of the model. Variables can be subjective but still very useful. Subjective in this sense means quantifying something not easily measurable.
Step 4: Data Collection
- binary – 1 or 0; yes or no
- categorical or nominal – categories, like a place (city) or flavour (blueberry)
- ordinal – numbers, where a greater number means more of something. For example, in a survey response, a higher number means agreeing more to a statement
- numerical (interval and ratio) – numbers with standard units, like weight in kilograms, distance in miles
Even subjective concepts can be quantified. The example given is the set of variables defined by early erectile dysfunction (ED) researchers. There is no device to measure “erection confidence”, but the test subject can assign a value to it. This data is very useful even if there is no way to prove scientifically that the values are accurate.
Adding different data to the mix can be valuable. A team of students working on the Netflix film recommendation problem added Internet Movie Database (IMDB) data to the Netflix data to get better results than the team using only the Netflix data. More and better data will beat a better algorithm almost every time. We are seeing more examples of this in professional sports and the insurance industry.
Step 5: Data Analysis
There are many software packages available for different types of analysis. The larger enterprise systems have overlapping features, but each one has its strengths. This is a small sampling:
Reporting – telling the “here’s what happened” data story
- IBM Cognos
- Microsoft Excel, SQL Server (Reporting Services), SharePoint
- MicroStrategy
- SAP Business Objects
- QlikTech
- Tableau
- TIBCO Spotfire
- IBM SPSS
- R
- SAS
Types of Models
Example models:
- correlation analysis – establishing the relationship between two numeric variables
- chi-square, goodness of fit – identifying significant relationships between nominal categories
- regression analysis – also known as multiple or linear regression, this analysis fits data to an equation, then extends it beyond the sample data
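Correlation and simple linear regression can be sketched with nothing more than the standard library. The advertising and sales figures below are invented for illustration, and the helper names `pearson_r` and `fit_line` are hypothetical; a statistics package would normally do this work.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Made-up data: advertising spend and sales, both in thousands.
ad_spend = [10, 20, 30, 40, 50]
sales = [27, 43, 66, 84, 105]

r = pearson_r(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)

# Extending the fitted equation beyond the sample data.
predicted_sales_at_60 = slope * 60 + intercept
print(f"r = {r:.3f}, sales = {slope:.2f} * spend + {intercept:.2f}")
```

For a single predictor, the square of `r` is the R-squared of the fitted line. Extending the line to a spend of 60 assumes that the relationship continues to hold outside the observed range, which is exactly the kind of assumption worth questioning.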
Key Statistical Concepts
- ANOVA – analysis of variance
- Causality – the relationship between an event and a second event, where the first is understood to bring about the second
- Cluster analysis – grouping observations together; common exploratory technique in data mining
- Correlation – quantifies how strongly two or more variables are related; a correlation on its own does not establish causation
- Dependent variable – (explained variable or response variable) – the variable that the analysis attempts to predict or explain
- Factor analysis – identifies the underlying relationships between a large number of variables, leading to the creation of “factors”, which are amalgamated composite variables
- Chi square or goodness-of-fit test – a measure of how well a data sample fits a distribution
- Hypothesis testing – answering the question, does the theoretical result match the actual result? Is the theory consistent with the observed behaviour?
- Independent variable – (explanatory variable, predictor variable, regressor) – a known variable used to predict or explain another, dependent, variable, which is of unknown value
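- p-value – the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true; a small p-value is evidence against the null hypothesis. It is not the probability that the null hypothesis is true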
- regression – estimating a dependent value based on independent values
- R-squared (R2) – how well a regression line fits the data, a higher value meaning a closer fit
- Significance level (alpha, α) – the threshold for rejecting the null hypothesis: reject it when the p-value falls below α
- t-test, Student’s t-test – a test of whether the means of two data sets differ by more than chance alone would explain
- Type I error or α (alpha) error – a false positive, or incorrectly rejecting the null hypothesis
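Several of these concepts (null hypothesis, p-value, significance level, Type I error) fit together in a permutation test, sketched below. This is one simple way to obtain a p-value and is swapped in here for readability; it is not a method described in the book, and the two groups of numbers are made up.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    Null hypothesis: the group labels do not matter. Under it, shuffling
    the labels should produce differences as large as the observed one
    fairly often.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    combined = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(combined)
        diff = abs(mean(combined[:n_a]) - mean(combined[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements for a control group and a test group.
control = [12, 14, 15, 15, 16, 17, 18, 19]
treated = [22, 23, 24, 25, 25, 26, 27, 29]

alpha = 0.05  # significance level chosen before looking at the data
p_value = permutation_p_value(control, treated)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject null hypothesis: {reject_null}")
```

If the null hypothesis were actually true, a test run at `alpha = 0.05` would still reject it about one time in twenty. That is the Type I error rate.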
Stage 3: Communicating and Acting on Results
Step 6: Results presentation and action
George Roumeliotis of the Intuit data science group suggests a six step outline for a data story:
- My understanding of the business problem
- How I will measure the business impact
- What data is available
- The initial solution hypothesis
- The solution
- The business impact of the solution
Direct communication and collaboration with the client is encouraged. Technical jargon and reference to specific statistical techniques should be avoided: “Nobody cares about your R squared.”
Visual analytics are an attractive way to tell a data story. There are many types of graphics or charts, each suited to showing a different perspective of the data. Here are some common examples:
- Relationships among data points – scatterplot, matrix plot, heat map, network diagram
- Comparison of values or frequencies – bar chart, histogram, bubble chart
- Illustrating the change in one variable in relation to another (usually a change over time) – line graph, stack graph
- Visualizing parts of a whole – pie chart, tree map
- Data across geography – overlaying colours, bubbles or spikes on a map
- Text frequencies – tag cloud, phrase net
The challenge of the presentation, no matter how it is delivered, is to bring attention to a theory that the data supports. Interactivity, video, and gamification can be used to get ideas across in a creative manner.
Beyond the Report
Savvy companies don’t just keep analytics in reports and presentations. It is possible to embed analytics into customer facing applications. For example, the professional social network LinkedIn has several analytics driven features on their website, like the “People You May Know” feature, which dynamically recommends connections based on the information in your profile and the existing members of your network.
Quantitative Analysis and Creativity
Success with analytics requires creativity and imagination. This is how right brain thinking fits into each of the six steps:
- Framing the problem – intuition about a hypothesis on how a complex result can be predicted by simple and measurable factors
- Review of previous findings – finding relevant analytical techniques that are not from the same industry
- Modeling – using different or unconventional choices of variables to get better results
- Data collection – deciding on what to collect and how, especially for measurements that are not easily quantifiable
- Data analysis – The one step where creativity can lead to trouble.
- Results presentation – Crafting a compelling story that will stick
Software and left-brain analysis can find patterns in data, but creative thinking must be applied to make sense of the patterns and find which ones are useful.
Developing Quantitative Analysis Capabilities
- The laws of probability and randomness, and their misunderstanding, which is described as “the greatest intellectual shortcoming of most adults.”
- Disproportionate or even great significance attributed to random events, as written about by Nassim Nicholas Taleb in his book, Fooled by Randomness.
- Random walk hypothesis – changes in stock prices have no discernible pattern or trend.
- Regression to the mean – Extreme results tend to be followed by results closer to the average, so normal variation in data can lead to false conclusions. For example, you can win at the casino, but if you keep going back, your results will drift toward the expected outcome, which is a loss.
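Regression to the mean is easy to see in a small simulation. The sketch below assumes each observed score is a stable "skill" plus random luck; the model, the names and the parameters are all invented for illustration.

```python
import random
from statistics import mean


def simulate(n_players=2000, top_fraction=0.1, seed=1):
    """Return the average score of round-one top performers in round
    one and again in round two."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    # Each round's score is the same skill plus fresh, independent luck.
    round1 = [s + rng.gauss(0, 1) for s in skill]
    round2 = [s + rng.gauss(0, 1) for s in skill]
    n_top = int(n_players * top_fraction)
    top = sorted(range(n_players), key=lambda i: round1[i], reverse=True)[:n_top]
    return mean(round1[i] for i in top), mean(round2[i] for i in top)


top_round1, top_round2 = simulate()
print(f"top group: round 1 = {top_round1:.2f}, round 2 = {top_round2:.2f}")
```

The top group still does better than average in round two, because part of its success was skill. Its average falls back, though, because the luck that helped in round one does not repeat. Reading that drop as a real decline would be a false conclusion.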
Developing quantitative habits means demanding numbers when ideas are presented and using data to investigate theories and ideas. By the same token, numbers can never be trusted as they are presented. Poor sampling and stale data can render an analysis irrelevant. Be suspicious in three ways: relevance, accuracy, and interpretation.
Causation arguments are a common fallacy: always remember that “correlation is not causation”. To test for causation, ask if random assignment was part of the experiment. This is often unethical or not practical in health related studies, but correlation is often presented as causation in the media. (Heavy drinking causes cancer in ten year study – were test subjects randomly asked to drink heavily or abstain from drinking for ten years?) Correlations are still very useful, but a correlation should never be interpreted as causation.
Questions should always be asked about any analysis or numbers presented as the truth. For example, if an average is presented, questions need to be asked about standard deviation and outliers. Some questions to ask about any quantitative analysis:
- Is there any data to support the hypothesis?
- Can you describe the source data used in the analysis?
- Is the sample data representative of the population?
- Did any outliers affect the results?
- What are the assumptions built into the model?
- Were other analytical approaches considered?
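The point about averages, standard deviation and outliers can be checked with Python's standard `statistics` module. The order values below are made up, with one deliberately extreme value.

```python
from statistics import mean, median, stdev

# Hypothetical order values in dollars; the last one is an outlier.
order_values = [40, 42, 45, 47, 50, 52, 55, 400]

average = mean(order_values)         # pulled upward by the outlier
middle = median(order_values)        # barely affected by the outlier
spread = stdev(order_values)         # sample standard deviation
average_without_outlier = mean(order_values[:-1])

print(f"mean = {average:.2f}, median = {middle:.2f}, stdev = {spread:.2f}")
print(f"mean without the outlier = {average_without_outlier:.2f}")
```

A reported average of about 91 would suggest typical orders that no customer in this sample actually placed. A large gap between the mean and the median, or a standard deviation larger than the mean, is a cue to ask about outliers.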
Working with Quants
- Executives and business decision makers
- Business professionals and staff
- Quantitative analysts or data scientists
Karl Kempf, an Intel Fellow who is also known as the Chief Mathematician or Uber Quant, heads the decision engineering group at Intel. He offers some insights into being an effective business quant:
- Encourage an understanding of the business and how intuition guides business decisions.
- Build mutual respect between executives and quants.
- Engage the skeptics and the naysayers, as it forces the data analysts to make a convincing case for their conclusions.
- Work with the business experts to determine inputs (sources of data), outputs (how the results will be consumed), key variables and the relationships between the variables.
- Analytics is an iterative process of building the model and incorporating feedback from different parts of the business.
- manufacturing
- inventory and logistics planning up and down the supply chain
- product development
- Time and attention
- An understanding of the necessary investment in time and dollars
- To know enough about math and statistics to have a general idea how proposed models work
- Push back if something is not clearly explained
- Promote to staff the importance of using analytics
At a minimum, business executives should understand the following concepts:
- Measures of central tendency (mean, median, mode)
- Probability and distributions
- Sampling
- Basics of correlation and regression analysis
- Rudiments of experimental design
- Interpretation of visual analytics
Assumptions
Understanding the assumptions baked into a model means knowing under what circumstances the model becomes less useful. All models are based on assumptions, and at some point the assumptions become far fetched. For example, a deteriorating economy will erode consumers’ willingness to pay higher prices, so if price increases are part of the model, then the model is only useful during times of economic expansion. Assumptions are based on data, which implies that the future will be somewhat like the past. Changes in context or the environment will invalidate assumptions. Always ask penetrating questions about any model, and use experiments to test hypotheses and models. Insist that any technical jargon be translated into plain language that everyone can understand.
What should be expected of analytics professionals:
- Interested in solving business problems, not just applying quantitative methods and doing the analysis
- Able to talk in business and financial terms
- Can build effective working relationships with staff
- Can provide time and cost estimates
- Have the patience to explain technical concepts
- Build rapid prototypes and use an iterative approach of rapidly incorporating feedback
- Provide sound bites and graphics so that the results can be easily communicated and understood
Closing thoughts
- A statistics or math background is not required to engage with “quants” and leverage the power of analytics.
- “Solving the problem” and in-depth analysis is only one step in the analytical thinking process: framing the problem beforehand and communicating the results afterwards are equally important stages.
- Creativity and imagination play a role in analytical thinking and data driven decision making.