Module 3: Summarizing Data – Analytics and Reporting
So far, we’ve retrieved individual rows of data and connected related information. But often, applications need to display summarized information – think dashboards showing total sales, user counts, average ratings, or reports grouping data by category. SQL’s aggregate functions and grouping capabilities are perfect for this.
Lesson 3.1: Aggregate Functions: COUNT
, SUM
, AVG
, MAX
, MIN
Aggregate functions perform a calculation on a set of rows and return a single, summary result.
Application Scenario: Answering questions like: How many users do we have? What’s our total revenue? What’s the average product price? What’s the most expensive item? When was the first order placed?
Here are the most common aggregate functions:
COUNT(column)
orCOUNT(*)
: Counts the number of rows.COUNT(*)
: Counts all rows returned by the query.COUNT(column)
: Counts rows where the specifiedcolumn
is NOT NULL.COUNT(DISTINCT column)
: Counts the number of unique non-NULL values in the column.
SUM(column)
: Calculates the sum of all values in a numeric column.AVG(column)
: Calculates the average of all values in a numeric column.MAX(column)
: Finds the maximum value in a column (works for numbers, dates, and text).MIN(column)
: Finds the minimum value in a column (works for numbers, dates, and text).
Example 1: Count the total number of registered users
SQL
SELECT COUNT(*) AS total_users
FROM users;
Result: A single row with a single column total_users
showing the count.
Example 2: Count how many products are currently in stock
SQL
SELECT COUNT(product_id) AS products_in_stock
FROM products
WHERE stock_quantity > 0;
Example 3: Calculate the total revenue from all orders
SQL
SELECT SUM(total_amount) AS total_revenue
FROM orders;
Example 4: Find the average price of all products
SQL
SELECT AVG(price) AS average_product_price
FROM products;
Example 5: Find the price of the most expensive product
SQL
SELECT MAX(price) AS max_price
FROM products;
Example 6: Find the date the first user registered
SQL
SELECT MIN(registration_date) AS first_registration
FROM users;
Example 7: Count the number of distinct categories that have products
SQL
SELECT COUNT(DISTINCT category_id) AS number_of_categories_with_products
FROM products
WHERE category_id IS NOT NULL; -- Important if category_id could be NULL
Using Aggregate Functions with WHERE
:
You can use a WHERE
clause to filter the rows before the aggregate function is applied.
Example 8: Calculate the total revenue from orders placed by user_id = 1
SQL
SELECT SUM(total_amount) AS alice_total_revenue
FROM orders
WHERE user_id = 1;
Try it yourself:
- Count the total number of orders.
- Find the average stock quantity across all products.
- Find the minimum
total_amount
for any order. - Count how many distinct users have placed orders. (Hint: Use
COUNT(DISTINCT ...)
on theorders
table).
Lesson 3.2: Grouping Data: GROUP BY
Aggregate functions are powerful on their own, but they become even more useful when combined with the GROUP BY
clause. GROUP BY
allows you to divide the rows of your table into groups based on the values in one or more columns and then apply an aggregate function to each group.
Application Scenario: Generating reports like:
- Number of products per category.
- Total sales amount per user.
- Average order value per month.
- Number of users who registered per day.
Syntax:
SQL
SELECT
column_to_group_by_1,
column_to_group_by_2, -- Optional: Group by multiple columns
AGG_FUNC(column_to_aggregate)
FROM
tablename
[WHERE condition] -- WHERE filters rows BEFORE grouping
GROUP BY
column_to_group_by_1,
column_to_group_by_2 -- Must include all non-aggregated columns from SELECT
[ORDER BY ...]; -- Optional: Sort the grouped results
Important Rule: Any column in the SELECT
list that is not an aggregate function must be included in the GROUP BY
clause. This is because GROUP BY
collapses multiple rows into one summary row per group; the database needs to know which values to show for the non-aggregated columns (it will be the value common to all rows in that group).
Example 1: Count the number of products in each category
SQL
SELECT
category_id,
COUNT(product_id) AS num_products -- Apply COUNT to each group
FROM
products
GROUP BY
category_id -- Group rows based on their category_id
ORDER BY
category_id;
Result: Rows showing each category_id
and the count of products associated with it.
Example 2: Calculate the total amount spent by each user
SQL
SELECT
user_id,
SUM(total_amount) AS total_spent_per_user,
COUNT(order_id) AS number_of_orders -- Can have multiple aggregates
FROM
orders
GROUP BY
user_id -- Group rows by user_id
ORDER BY
total_spent_per_user DESC; -- Show biggest spenders first
Result: Rows showing each user_id
, the sum of their total_amount
across all orders, and the number of orders they placed.
Example 3: Combining JOIN
and GROUP BY
– Count products per category name
We need the category name from the categories
table.
SQL
SELECT
c.category_name, -- Select the name from categories table
COUNT(p.product_id) AS num_products
FROM
products AS p
INNER JOIN
categories AS c ON p.category_id = c.category_id -- Join first
GROUP BY
c.category_name -- Group by the category name
ORDER BY
c.category_name;
Result: Rows showing each category name and the count of products in that category.
Try it yourself:
- Find the average
stock_quantity
for products in eachcategory_id
. - Find the date of the most recent order placed by each
user_id
. (Hint: UseMAX
onorder_date
). - Calculate the total quantity of items sold for each
product_id
using theorder_items
table. (Hint:SUM(quantity)
grouped byproduct_id
).
Lesson 3.3: Filtering Groups: HAVING
We know WHERE
filters rows before they are processed by GROUP BY
and aggregate functions. But what if you want to filter the results after the grouping has happened, based on the result of an aggregate function?
Application Scenario:
- Show only categories that have more than 5 products.
- List only users who have placed 3 or more orders.
- Find products whose average order quantity is greater than 2.
This is what the HAVING
clause is for. It filters the grouped rows.
Syntax:
SQL
SELECT
column_to_group_by,
AGG_FUNC(column_to_aggregate)
FROM
tablename
[WHERE condition] -- Filters individual rows BEFORE grouping
GROUP BY
column_to_group_by
HAVING
aggregate_condition -- Filters groups AFTER grouping
[ORDER BY ...];
Key Difference: WHERE
vs. HAVING
WHERE
filters individual rows. It operates beforeGROUP BY
. You generally cannot use aggregate functions directly in theWHERE
clause.HAVING
filters entire groups based on an aggregate result. It operates afterGROUP BY
. You can (and usually do) use aggregate functions in theHAVING
clause.
Example 1: Find categories with more than 2 products
SQL
SELECT
category_id,
COUNT(product_id) AS num_products
FROM
products
GROUP BY
category_id
HAVING
COUNT(product_id) > 2; -- Filter groups where the count is > 2
Result: Only those category_id
s (and their counts) that have more than 2 products associated with them.
Example 2: Find users who have placed 2 or more orders
SQL
SELECT
user_id,
COUNT(order_id) AS num_orders
FROM
orders
GROUP BY
user_id
HAVING
COUNT(order_id) >= 2; -- Filter groups (users) with 2 or more orders
Example 3: Combining WHERE
, GROUP BY
, and HAVING
Find users who registered in or after 2025 (WHERE
) and have spent more than $100 in total (HAVING
).
SQL
SELECT
o.user_id,
u.email, -- Need to join to get email
SUM(o.total_amount) AS total_spent
FROM
orders AS o
INNER JOIN
users AS u ON o.user_id = u.user_id
WHERE
u.registration_date >= '2025-01-01' -- Filter users BEFORE grouping
GROUP BY
o.user_id, u.email -- Group by user ID and email (email needed in SELECT)
HAVING
SUM(o.total_amount) > 100; -- Filter groups AFTER aggregation
Try it yourself:
- From the
order_items
table, find theproduct_id
s where the total quantity sold across all orders is greater than 5. - Find the
category_id
s where the average product price is less than $50. - List users (
user_id
) who have placed exactly 1 order.
Mini-Project 3: Store Analytics
Time to generate some insights from our E-commerce data using aggregation, grouping, and filtering!
- Product Count per Category: Write a query that lists each category name (
category_name
) and the total number of products (product_count
) within that category. Order the results by category name. (Requires:JOIN
,GROUP BY
,COUNT
). - Average Order Value: Calculate the single average value of the
total_amount
across all orders in theorders
table. Alias the result asaverage_order_value
. (Requires:AVG
). - Active Customers: Find the
user_id
andemail
of users who have placed 2 or more orders. List the user’s email and the total number of orders they placed, aliased asnum_orders
. (Requires:JOIN
,GROUP BY
,COUNT
,HAVING
).
Fantastic! You’ve now learned how to summarize data using aggregate functions (COUNT
, SUM
, AVG
, MAX
, MIN
), how to apply these functions to specific groups using GROUP BY
, and how to filter those groups using HAVING
. These skills are crucial for building reporting features and gaining analytical insights from application data. Next, we’ll look at how to modify data using INSERT
, UPDATE
, and DELETE
.