Module 3

Module 3: Summarizing Data – Analytics and Reporting

So far, we’ve retrieved individual rows of data and connected related information. But often, applications need to display summarized information – think dashboards showing total sales, user counts, average ratings, or reports grouping data by category. SQL’s aggregate functions and grouping capabilities are perfect for this.


Lesson 3.1: Aggregate Functions: COUNT, SUM, AVG, MAX, MIN

Aggregate functions perform a calculation on a set of rows and return a single, summary result.

Application Scenario: Answering questions like: How many users do we have? What’s our total revenue? What’s the average product price? What’s the most expensive item? When was the first order placed?

Here are the most common aggregate functions:

  • COUNT(column) or COUNT(*): Counts the number of rows.
    • COUNT(*): Counts all rows returned by the query.
    • COUNT(column): Counts rows where the specified column is NOT NULL.
    • COUNT(DISTINCT column): Counts the number of unique non-NULL values in the column.
  • SUM(column): Calculates the sum of all values in a numeric column.
  • AVG(column): Calculates the average of all values in a numeric column.
  • MAX(column): Finds the maximum value in a column (works for numbers, dates, and text).
  • MIN(column): Finds the minimum value in a column (works for numbers, dates, and text).

Example 1: Count the total number of registered users

SQL

SELECT COUNT(*) AS total_users
FROM users;

Result: A single row with a single column total_users showing the count.

Example 2: Count how many products are currently in stock

SQL

SELECT COUNT(product_id) AS products_in_stock
FROM products
WHERE stock_quantity > 0;

Example 3: Calculate the total revenue from all orders

SQL

SELECT SUM(total_amount) AS total_revenue
FROM orders;

Example 4: Find the average price of all products

SQL

SELECT AVG(price) AS average_product_price
FROM products;

Example 5: Find the price of the most expensive product

SQL

SELECT MAX(price) AS max_price
FROM products;

Example 6: Find the date the first user registered

SQL

SELECT MIN(registration_date) AS first_registration
FROM users;

Example 7: Count the number of distinct categories that have products

SQL

SELECT COUNT(DISTINCT category_id) AS number_of_categories_with_products
FROM products
WHERE category_id IS NOT NULL; -- Important if category_id could be NULL

Using Aggregate Functions with WHERE:

You can use a WHERE clause to filter the rows before the aggregate function is applied.

Example 8: Calculate the total revenue from orders placed by user_id = 1

SQL

SELECT SUM(total_amount) AS alice_total_revenue
FROM orders
WHERE user_id = 1;

Try it yourself:

  1. Count the total number of orders.
  2. Find the average stock quantity across all products.
  3. Find the minimum total_amount for any order.
  4. Count how many distinct users have placed orders. (Hint: Use COUNT(DISTINCT ...) on the orders table).

Lesson 3.2: Grouping Data: GROUP BY

Aggregate functions are powerful on their own, but they become even more useful when combined with the GROUP BY clause. GROUP BY allows you to divide the rows of your table into groups based on the values in one or more columns and then apply an aggregate function to each group.

Application Scenario: Generating reports like:

  • Number of products per category.
  • Total sales amount per user.
  • Average order value per month.
  • Number of users who registered per day.

Syntax:

SQL

SELECT
    column_to_group_by_1,
    column_to_group_by_2, -- Optional: Group by multiple columns
    AGG_FUNC(column_to_aggregate)
FROM
    tablename
[WHERE condition] -- WHERE filters rows BEFORE grouping
GROUP BY
    column_to_group_by_1,
    column_to_group_by_2 -- Must include all non-aggregated columns from SELECT
[ORDER BY ...]; -- Optional: Sort the grouped results

Important Rule: Any column in the SELECT list that is not an aggregate function must be included in the GROUP BY clause. This is because GROUP BY collapses multiple rows into one summary row per group; the database needs to know which values to show for the non-aggregated columns (it will be the value common to all rows in that group).

Example 1: Count the number of products in each category

SQL

SELECT
    category_id,
    COUNT(product_id) AS num_products -- Apply COUNT to each group
FROM
    products
GROUP BY
    category_id -- Group rows based on their category_id
ORDER BY
    category_id;

Result: Rows showing each category_id and the count of products associated with it.

Example 2: Calculate the total amount spent by each user

SQL

SELECT
    user_id,
    SUM(total_amount) AS total_spent_per_user,
    COUNT(order_id) AS number_of_orders -- Can have multiple aggregates
FROM
    orders
GROUP BY
    user_id -- Group rows by user_id
ORDER BY
    total_spent_per_user DESC; -- Show biggest spenders first

Result: Rows showing each user_id, the sum of their total_amount across all orders, and the number of orders they placed.

Example 3: Combining JOIN and GROUP BY – Count products per category name

We need the category name from the categories table.

SQL

SELECT
    c.category_name, -- Select the name from categories table
    COUNT(p.product_id) AS num_products
FROM
    products AS p
INNER JOIN
    categories AS c ON p.category_id = c.category_id -- Join first
GROUP BY
    c.category_name -- Group by the category name
ORDER BY
    c.category_name;

Result: Rows showing each category name and the count of products in that category.

Try it yourself:

  1. Find the average stock_quantity for products in each category_id.
  2. Find the date of the most recent order placed by each user_id. (Hint: Use MAX on order_date).
  3. Calculate the total quantity of items sold for each product_id using the order_items table. (Hint: SUM(quantity) grouped by product_id).

Lesson 3.3: Filtering Groups: HAVING

We know WHERE filters rows before they are processed by GROUP BY and aggregate functions. But what if you want to filter the results after the grouping has happened, based on the result of an aggregate function?

Application Scenario:

  • Show only categories that have more than 5 products.
  • List only users who have placed 3 or more orders.
  • Find products whose average order quantity is greater than 2.

This is what the HAVING clause is for. It filters the grouped rows.

Syntax:

SQL

SELECT
    column_to_group_by,
    AGG_FUNC(column_to_aggregate)
FROM
    tablename
[WHERE condition] -- Filters individual rows BEFORE grouping
GROUP BY
    column_to_group_by
HAVING
    aggregate_condition -- Filters groups AFTER grouping
[ORDER BY ...];

Key Difference: WHERE vs. HAVING

  • WHERE filters individual rows. It operates before GROUP BY. You generally cannot use aggregate functions directly in the WHERE clause.
  • HAVING filters entire groups based on an aggregate result. It operates after GROUP BY. You can (and usually do) use aggregate functions in the HAVING clause.

Example 1: Find categories with more than 2 products

SQL

SELECT
    category_id,
    COUNT(product_id) AS num_products
FROM
    products
GROUP BY
    category_id
HAVING
    COUNT(product_id) > 2; -- Filter groups where the count is > 2

Result: Only those category_ids (and their counts) that have more than 2 products associated with them.

Example 2: Find users who have placed 2 or more orders

SQL

SELECT
    user_id,
    COUNT(order_id) AS num_orders
FROM
    orders
GROUP BY
    user_id
HAVING
    COUNT(order_id) >= 2; -- Filter groups (users) with 2 or more orders

Example 3: Combining WHERE, GROUP BY, and HAVING

Find users who registered in or after 2025 (WHERE) and have spent more than $100 in total (HAVING).

SQL

SELECT
    o.user_id,
    u.email, -- Need to join to get email
    SUM(o.total_amount) AS total_spent
FROM
    orders AS o
INNER JOIN
    users AS u ON o.user_id = u.user_id
WHERE
    u.registration_date >= '2025-01-01' -- Filter users BEFORE grouping
GROUP BY
    o.user_id, u.email -- Group by user ID and email (email needed in SELECT)
HAVING
    SUM(o.total_amount) > 100; -- Filter groups AFTER aggregation

Try it yourself:

  1. From the order_items table, find the product_ids where the total quantity sold across all orders is greater than 5.
  2. Find the category_ids where the average product price is less than $50.
  3. List users (user_id) who have placed exactly 1 order.

Mini-Project 3: Store Analytics

Time to generate some insights from our E-commerce data using aggregation, grouping, and filtering!

  1. Product Count per Category: Write a query that lists each category name (category_name) and the total number of products (product_count) within that category. Order the results by category name. (Requires: JOIN, GROUP BY, COUNT).
  2. Average Order Value: Calculate the single average value of the total_amount across all orders in the orders table. Alias the result as average_order_value. (Requires: AVG).
  3. Active Customers: Find the user_id and email of users who have placed 2 or more orders. List the user’s email and the total number of orders they placed, aliased as num_orders. (Requires: JOIN, GROUP BY, COUNT, HAVING).

Fantastic! You’ve now learned how to summarize data using aggregate functions (COUNT, SUM, AVG, MAX, MIN), how to apply these functions to specific groups using GROUP BY, and how to filter those groups using HAVING. These skills are crucial for building reporting features and gaining analytical insights from application data. Next, we’ll look at how to modify data using INSERT, UPDATE, and DELETE.