2  Business Analytics

2.1 Introduction

Analytics are ubiquitous. They are all around us, and they make our daily lives a lot simpler.

Google looks not just for the words you search for; it can guess, fairly accurately, what those words mean and what you are really searching for. Netflix and YouTube can often tell what you might want to watch next. Gmail can classify your email into real email, junk, promotions, and social messages. CVS knows which coupons to offer you. Your newsfeed shows you the stories you are likely to be interested in. Your employer probably has a good idea of whether you are a flight risk. LinkedIn can show you jobs suited to your skills. Companies can react to your reviews, even though they receive thousands of reviews every day. Your computer can recognize your face. Zillow can estimate the value of almost any home with reasonable accuracy.

All of this is made possible by data and analytics. And while it may look like magic, in the end it is mostly linear algebra and calculus at work behind the scenes.

In this set of notes, structured almost as a book, we are going to look behind the curtain and see what makes all of this possible. We will examine the ‘why’, the ‘what’, and the ‘how’: we will try to understand why something makes sense and what problems can be solved, and we will look at the how through practical examples of solving such problems.

Problems that data can solve often don’t look like they can be solved by data; they almost always appear to require human intelligence and judgment. This means we will constantly be restructuring and reformulating problems into forms that the analytic tools at hand can solve. This makes analytics as much a creative art as it is a matter of math and algorithms.

More recently, a new class of analytics tools — large language models (LLMs) like ChatGPT, Claude, and Gemini — has made it possible for ordinary users to interact with data in plain English, without writing a single line of code. Whether it is generating a Python script, summarizing a long report, or answering questions about a dataset, the boundaries of what is tractable with data have expanded dramatically. Yet the fundamentals have not changed: data quality, clear problem framing, and sound statistical thinking remain the foundations on which all useful analytics are built. These notes are therefore structured to develop both the timeless concepts and an awareness of how the modern AI landscape extends them.

Code
from IPython.display import YouTubeVideo
YouTubeVideo('lHQeD9uJ0e0', width=672, height=378)

2.1.1 Who needs analytics?

So who needs analytics? Anyone who needs to make a decision. Analytics support humans in making decisions, and can sometimes take the task of making decisions off their plates entirely.

Around us, we see many examples of analytics in action. The phrases in parentheses suggest analytical tools that can help with each task.

  • Governments analyze and predict pension and healthcare bills (Time series)
  • Google estimates whether you will click an ad (Classification)
  • Amazon shows you what you are likely to buy next (Association rules)
  • Insurance companies predict who will live, die, or have an accident (Classification)
  • Epidemiologists forecast disease outbreaks (Regression, RNNs)
  • Netflix predicts what you would like to watch next (Recommender systems)
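The association rules behind the Amazon example rest on two simple measures: support (how often items appear together) and confidence (how often the second item appears, given that the first one does). The sketch below computes both over a handful of hypothetical baskets using only the Python standard library. The basket contents and the helper names `support` and `confidence` are illustrative assumptions; real recommenders work at far larger scale with more refined methods.

```python
from itertools import combinations

# Hypothetical shopping baskets (illustrative data, not real transactions)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent, given the antecedent is in the basket."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Score every one-item -> one-item rule that occurs at least once in the data
items = sorted(set().union(*baskets))
rules = []
for a, b in combinations(items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        s = support({lhs, rhs}, baskets)
        if s > 0:
            rules.append((lhs, rhs, s, confidence({lhs}, {rhs}, baskets)))

# Highest-confidence rules first: the raw material for "customers who bought X also bought Y"
for lhs, rhs, s, c in sorted(rules, key=lambda r: -r[3])[:3]:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Here butter -> bread has a confidence of 1.0 because every hypothetical basket containing butter also contains bread.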

2.1.2 Defining analytics

But before we dive deeper, let us pause for a moment to think about what we mean by analytics. A quick search will reveal several definitions of analytics, and they are probably all accurate. A key thing about analytics, though, is that they are data-based, and that they provide an intuition or understanding we did not have before. Another way of saying this is that analytics provide insights. Business analytics are actionable insights from data. Understanding the fundamental concepts underlying business analytics is essential for success at all levels in most organizations today.

So here is an attempted definition: analytics are data-based, actionable insights.
- They are data-based – which means opinions alone are not analytics
- They are actionable – which means they drive decisions, helping select a course of action from among the multiple options available
- They are insightful – which means they uncover things that weren’t known before with certainty

We will define analytics broadly to mean anything that allows us to profit from data. Profit includes not only improving the bottom line by increasing revenues or reducing costs, but also anything that helps us achieve our goals, for example making our customers happier, improving the dependability of our products, or improving patient outcomes in a healthcare setting. Depending on the business, these may or may not have a directly defined relationship to profits.

Analyzing data with a view to profiting from it has been called many different things, such as data mining, business analytics, data science, and decision science, and you will find people on the internet who can eloquently debate the fine differences between all of these.

We will not get into the semantics; instead we will focus on everything that allows us to profit from data, no matter what scholars call it. In the end, terminology makes no difference: our goal is to use data to improve outcomes for ourselves, our families, our customers, and our shareholders. To achieve this goal, we will not limit ourselves to one branch or a single narrow interpretation of analytics. If something can help us, we will include it in our arsenal. Beyond the classic examples listed earlier, that arsenal now supports applications such as:
- AI assistants interpret context and generate human-like text, code and analysis (Large Language Models)
- Hospitals predict patient readmission risk to allocate nursing resources proactively (Classification, Survival Analysis)
- Manufacturers detect equipment faults before they cause downtime (Anomaly Detection, IoT Time Series)
- Retailers optimize dynamic pricing in real time based on demand signals (Reinforcement Learning, Optimization)
- Hiring tools screen resumes and flag candidates, and are now subject to regulatory scrutiny (Classification, AI Ethics)

Code
YouTubeVideo('fQJWIOhYCWI', width=672, height=378)

2.1.3 What we will cover

Much of today’s analytical work relies on machine learning and artificial intelligence algorithms. These notes provide a high-level understanding of how these algorithms structure problems and solve them. Later chapters extend this to deep learning, natural language processing (NLP), and large language models (LLMs), the technologies behind tools like ChatGPT and Claude. By the end, a reader should be able not just to understand these systems conceptually, but also to build simplified versions of them in Python.

2.1.4 What do analytics look like?

Analytics manifest themselves in multiple forms:
1. Analytical dashboards and reports: providing information to support decisions. This is the most common way analytics are consumed.
2. Embedded analytics: Analytics embedded in an application, for example, providing intelligent responses based on user interactions, such as a bot responding to a query, or a workflow routing a transaction to a particular path based on data.
3. Automated analytics: Analytics embedded in a process, where an analytic drives an automated decision or application behavior, for example an instant decision on a credit card application.
4. Generative / Conversational Analytics: A user interacts with data or a model in natural language, asking questions, requesting summaries, or generating code directly from a plain-English description of the desired analysis. Tools like ChatGPT Advanced Data Analysis or Claude are practical examples of this form in action today.

The boundary between embedded and automated analytics can be fuzzy.

2.1.5 The practical perspective

Analytics vary in sophistication, and data can be presented in different forms. For example, data may be available as:
- Raw, or near-raw data: Counts, observed facts, sensor readings
- Summarizations: Subtotals, sorts, filters, pivots, averages, basic descriptive statistics
- Time series: Comparison of the same measure over time
- Predictive analytics: Classification, prediction, clustering
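The first three forms can be sketched on a few hypothetical sales records with the Python standard library: the records themselves are the raw data, subtotals and an average are summarizations, and a month-over-month comparison is a simple time series view. The figures and variable names are made up for illustration; in practice a library such as pandas would usually do this work.

```python
from collections import defaultdict
from statistics import mean

# Raw data: individual observed facts (hypothetical monthly sales records)
raw = [
    ("2024-01", "north", 120), ("2024-01", "south", 80),
    ("2024-02", "north", 130), ("2024-02", "south", 70),
    ("2024-03", "north", 150), ("2024-03", "south", 90),
]

# Summarization: subtotal of units by region, plus the average units per record
by_region = defaultdict(int)
for month, region, units in raw:
    by_region[region] += units
avg_units = mean(units for _, _, units in raw)

# Time series: the same measure (total units) compared month over month
by_month = defaultdict(int)
for month, _, units in raw:
    by_month[month] += units
months = sorted(by_month)
growth = {m: by_month[m] - by_month[p] for p, m in zip(months, months[1:])}

print(dict(by_region))  # subtotal per region
print(avg_units)        # basic descriptive statistic
print(growth)           # change versus the previous month
```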

We will address all of the above in greater detail. One key thing to keep in mind is that greater sophistication and complexity do not mean superior analytics; fitness for purpose is more important. In practice, the use of analytics takes multiple forms, most of which can be bucketed into the following categories:

  • Inform a decision by providing facts and numbers (e.g., in a report)
  • Recommend a course of action based on data
  • Automatically make a decision and execute it

Because we repeatedly emphasize decision making as a key goal of analytics, it is worth distinguishing between types of decisions. This matters because how we build our solutions, with respect to repeatability, performance, and scalability, depends on the nature and frequency of the decisions they support. Broadly, there are:

  • One-time decisions, which require discovering useful patterns and relationships as a one-time exercise, and
  • Repeated decisions, which often need to be made at scale and require automation. These can benefit from even small improvements in decision-making accuracy.
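A back-of-the-envelope calculation shows why small accuracy gains matter for repeated decisions. The volumes and dollar values below are hypothetical assumptions, and the model is deliberately simple: each additional correct decision is assumed to be worth a fixed amount relative to an incorrect one.

```python
# Hypothetical figures, chosen only to illustrate the effect of scale
decisions_per_year = 1_000_000   # e.g., automated credit decisions
value_per_correct = 50.0         # assumed value (in dollars) of getting one decision right instead of wrong
baseline_accuracy = 0.90
improved_accuracy = 0.91         # a one-percentage-point improvement

def annual_gain(accuracy_before, accuracy_after, n, value):
    """Extra value from the additional correct decisions made over n decisions."""
    extra_correct = (accuracy_after - accuracy_before) * n
    return extra_correct * value

gain = annual_gain(baseline_accuracy, improved_accuracy, decisions_per_year, value_per_correct)
print(f"Extra value per year: ${gain:,.0f}")

# The same improvement applied to a single one-time decision
one_time_gain = annual_gain(baseline_accuracy, improved_accuracy, 1, value_per_correct)
print(f"Extra expected value for one decision: ${one_time_gain:.2f}")
```

Under these assumptions, a one-point accuracy gain is worth about $500,000 a year across a million repeated decisions, but only about $0.50 in expected value for a single one-time decision.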

2.1.6 Terminology confusion

Often, we see a distinction drawn between descriptive, predictive, and prescriptive analytics.

  • Descriptive Analytics: Describe what has happened in the past through reports, dashboards, statistics, traditional data mining etc.
  • Predictive Analytics: Use modeling techniques based on past data to predict the future or determine correlations between variables. Includes linear regression, time series analysis, data mining to find patterns for prediction, simulations.
  • Prescriptive Analytics: Identify the best course of action to take based on a rule, where the rule may itself be based on a model’s prediction.
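The three categories can be contrasted on one small, hypothetical problem: monthly demand for a product. The sketch below summarizes past demand (descriptive), fits a least-squares straight-line trend to forecast next month (predictive), and applies a simple reorder rule to that forecast (prescriptive). The data, the linear trend model, and the reorder rule are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical monthly demand (units) for one product
demand = [100, 110, 125, 130, 145]
months = list(range(1, len(demand) + 1))

# Descriptive: summarize what has already happened
avg_demand = mean(demand)

# Predictive: fit a least-squares straight line and extrapolate one month ahead
x_bar, y_bar = mean(months), mean(demand)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, demand)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * (len(demand) + 1)

# Prescriptive: a simple (hypothetical) rule turns the prediction into an action
on_hand = 60  # hypothetical current inventory
order_qty = max(0, round(forecast) - on_hand)

print(f"average demand: {avg_demand}")
print(f"forecast for month {len(demand) + 1}: {forecast:.1f}")
print(f"recommended order: {order_qty} units")
```

With these numbers the fitted trend is 11 units per month, the forecast for month 6 is 155 units, and the rule recommends ordering 95 units.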

Other labels for types of analysis include exploratory, inferential, causal, and mechanistic.

There are also other terms one might come across in the context of business analytics:

  • Data Mining: the process of discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques.
    Source: Gartner, quoted from Data Mining for Business Intelligence by Shmueli et al
  • Data science: the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
    Source: DataRobot website
  • Data analysis: a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
    Source: Wikipedia

2.1.7 What about big data?

Big data essentially means datasets that are too large for traditional data processing systems and therefore require new processing technologies. But what was big yesterday is probably only medium-sized, or even small, by today’s standards, so the phrase big data has no precise definition. What counts as big data today may well be a manageable size for the phone of the future.

Big data technologies support data processing, data engineering, and data science activities. For example, Apache Spark, a big data processing engine, includes MLlib, a built-in library of machine learning algorithms.

2.1.8 AI, ML and Deep Learning

Yet another way to slice the business analytics landscape is into AI, ML, and deep learning.

  • Artificial Intelligence: the effort to automate intellectual tasks normally performed by humans. This term has the most expansive scope, and includes Machine Learning and Deep Learning.

  • Machine Learning: A machine-learning system is one that is trained rather than explicitly programmed. It’s presented with many examples relevant to a task, and it finds statistical structure in these examples that eventually allows the system to come up with rules for automating the task.

  • Deep Learning: Deep learning is a specific subfield of machine learning: learning from data that puts an emphasis on learning successive layers of increasingly meaningful representations. The “deep” refers to successive layers of representations. Neural networks (NNs) are nearly always the technology that underlies deep learning systems.

    Source: Deep Learning with Python, François Chollet

  • Generative AI: AI models that can generate new content, such as text, images, audio, and code, that resembles human-created material. LLMs (GPT-4, Claude, Gemini) are the most prominent examples. Unlike traditional predictive models, which output a single prediction, generative models produce rich, open-ended outputs.

  • MLOps: The practice of deploying, monitoring, and maintaining ML models in production — the operational side of data science. Building a model is only half the job; keeping it performing reliably in production is the other half.

  • AI Regulation: The EU AI Act (2024) and analogous regulations in other jurisdictions impose obligations on high-risk AI uses (e.g., hiring, credit, healthcare). Practitioners should be aware that model governance and explainability are increasingly not optional.
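The machine learning definition above, a system that is trained rather than explicitly programmed, can be illustrated with a one-feature threshold classifier (a decision stump). In the sketch below, a hand-written rule uses a cutoff a person chose, while the learned rule picks whichever cutoff misclassifies the fewest hypothetical labeled examples. The data and the fraud framing are assumptions for illustration only; with ten examples, the learned cutoff demonstrates the idea rather than producing a trustworthy model.

```python
# Hypothetical labeled examples: (transaction amount, was it fraud?)
examples = [
    (20, False), (35, False), (50, False), (80, False), (120, False),
    (300, True), (450, True), (90, False), (600, True), (250, True),
]

# Explicitly programmed: a person writes the rule and picks the cutoff
def hand_written_rule(amount):
    return amount > 100

def errors(threshold, data):
    """Count examples misclassified by the rule 'fraud if amount > threshold'."""
    return sum((amount > threshold) != label for amount, label in data)

# Trained: try each observed amount as a cutoff and keep the one with the fewest errors
candidates = sorted(amount for amount, _ in examples)
learned_threshold = min(candidates, key=lambda t: errors(t, examples))

def learned_rule(amount):
    return amount > learned_threshold

print("hand-written rule errors:", errors(100, examples))
print("learned threshold:", learned_threshold, "errors:", errors(learned_threshold, examples))
```

On this data the hand-written cutoff of 100 misclassifies one example, while the learned cutoff of 120 fits all ten.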

2.1.9 Does terminology matter?

This may be repetitive, but it is worth stressing: our goal is to profit from data, using any and all computational techniques and resources we can get our hands on. We will use many of these terms interchangeably. We will mostly talk about business analytics in the context of improving decisions and business outcomes, and refer to the general universe of tools and methods as data science. These tools and methods include Business Intelligence, Data Mining, Forecasting, Modeling, Machine Learning and AI, Big Data, NLP and other techniques that help us profit from data.

That makes sense because, for a business practitioner, the differences between all of these “types of analytics” are largely semantic. What we are trying to do is generate value of some sort using data and numbers, and creating something useful is more important than what we call it. We will use analytics to mean anything we can do with data that provides a profit or helps us achieve our goals.

2.1.10 Where are analytics used?

The widest applications of data science and analytics have been in marketing for tasks such as:
- Targeted marketing
- Online advertising
- Recommendations for cross-selling

Yet just about every other business area has benefited from advances in data science:
- Accounting
- HR
- Compliance
- Supply chain
- Web analytics
- You name it…

2.1.11 Analytics as strategic assets

Data and data science capabilities are strategic assets. In the early days of adoption, they provide a competitive advantage that allows us to outdo our competition; as they mature and become part of the mainstream, they become essential for survival.

Both data and data science capabilities are increasingly the deciding factor behind who wins and who loses. Both are needed: having lots of data is not sufficient for success; the capability to apply data science to the available data in profitable use cases is key.
This is because even the best data science team can yield little value without the right data, meaning data with some predictive power. And the best data cannot yield insights without the right data science talent.

Organizations need to invest in both data and data science to benefit from analytics.

This is why understanding the fundamental concepts underlying business analytics is essential for success at all levels in most organizations. This involves knowing, at a conceptual level, the data science process; what algorithms can do, including the ideas behind classification and prediction, AI, and ML; and how models are evaluated. We will cover all of these in due time.

2.1.12 Our goal

By the time you finish these notes, you should be able to:
- Assess, when faced with a business problem, whether and how data can be used to arrive at a solution or improve performance
- Act competently and with confidence when working with data science teams
- Identify opportunities where data science teams can apply technical solutions, monitor their implementation progress, and review the business benefits
- Oversee analytical teams and direct their work
- Identify data driven competitive threats, and be able to formulate a strategic response
- Critically evaluate data science proposals (from internal teams, consultants)
- Simplify and concisely explain data driven approaches to leadership, management and others
- Leverage generative AI tools (LLMs) as productivity multipliers in your analytics workflow, for writing code, exploring data, summarizing results, and drafting insights
- Evaluate and govern AI systems responsibly, understanding the limitations, biases, and risks of models so you can advocate for ethical and compliant use