Module 2: Connecting Data – Working with Related Information
In Module 1, we learned how to retrieve data from a single table. However, most real-world applications involve data spread across multiple related tables (like our users
, products
, and orders
tables). In this module, you’ll learn how to connect these tables to answer more complex questions.
Lesson 2.1: Why Multiple Tables? The Power of Relationships
You might wonder: why not just store everything in one giant table? For example, why not put all the user’s details (name, email) directly into every single order they place?
Let’s consider the problems with that approach:
- Data Redundancy: If a user places 10 orders, you’d store their name and email 10 times. This wastes space.
- Update Anomalies: If a user updates their email address, you’d have to find and change it in all 10 order records. If you miss one, the data becomes inconsistent. This is unreliable.
- Deletion Anomalies: If a user hasn’t placed any orders yet, where do you store their user information if it’s tied only to orders? Or, if you delete a user’s only order, do you lose their user record entirely?
- Insertion Anomalies: How do you add a new product category if no products belong to it yet?
To solve these problems, we use Normalization. This is the process of organizing data into multiple tables to reduce redundancy and improve data integrity. We split data into logical units (like users, products, orders) and then define relationships between them.
Primary Keys and Foreign Keys:
These relationships are maintained using keys:
- Primary Key (PK): A column (or set of columns) that contains a unique identifier for each row in a table. Examples:
users.user_id
,products.product_id
,orders.order_id
. No two rows in the same table can have the same Primary Key value, and it cannot be empty (NULL). - Foreign Key (FK): A column (or set of columns) in one table that refers to the Primary Key of another table. This creates the link. Examples:
orders.user_id
is a Foreign Key referencingusers.user_id
. This links an order to the user who placed it.products.category_id
is a Foreign Key referencingcategories.category_id
. This links a product to its category.order_items.order_id
referencesorders.order_id
, andorder_items.product_id
referencesproducts.product_id
. This links order line items to both the specific order and the specific product.
By splitting data logically and linking tables using Primary and Foreign Keys, we create a relational structure that is efficient, reliable, and flexible. The next step is learning how to query across these relationships.
Lesson 2.2: Combining Tables: INNER JOIN
The most common way to combine rows from two or more tables based on a related column is using an INNER JOIN
. An INNER JOIN
returns only the rows where the join condition (the link between the tables) is met in both tables.
Application Scenario: You need to display a list of orders, but instead of showing just the user_id
, you want to show the user’s actual email
address. Or, you want to list products along with their category names, not just the category_id
.
Syntax:
SQL
SELECT
tableA.column1,
tableB.column2
FROM
tableA
INNER JOIN
tableB ON tableA.related_column = tableB.related_column;
INNER JOIN tableB
: Specifies the second table you want to connect to.ON tableA.related_column = tableB.related_column
: This is the crucial join condition. It tells the database how the tables are related – typically by matching the Foreign Key in one table with the Primary Key in the other.
Example 1: Show Order IDs and the corresponding User’s Email
We need to connect orders
(which has user_id
) and users
(which has user_id
and email
).
SQL
SELECT
orders.order_id,
users.email -- Select the email from the users table
FROM
orders -- Start with the orders table
INNER JOIN
users ON orders.user_id = users.user_id; -- Join condition: match user_id in both tables
This query looks at each order, finds the matching user_id
in the users
table, and returns the order_id
and the corresponding user’s email
. Orders belonging to users not found in the users
table (which shouldn’t happen with proper Foreign Keys) would be excluded.
Example 2: Show Product Names and their Category Names
We need to connect products
(which has category_id
) and categories
(which has category_id
and category_name
).
SQL
SELECT
products.product_name,
categories.category_name -- Select the category name
FROM
products
INNER JOIN
categories ON products.category_id = categories.category_id; -- Join condition: match category_id
This returns a list of product names paired with the name of the category they belong to. Products without a matching category (or categories without matching products) would be excluded.
Think of an INNER JOIN
like the intersection in a Venn diagram – you only get the records that have a match in both tables based on the ON
condition.
Lesson 2.3: More Join Types: LEFT JOIN
What if you want to retrieve all rows from one table, and only the matching rows from a second table? For instance, maybe you want to list all users, and also show their order IDs if they have placed any orders. An INNER JOIN
wouldn’t work here, because it would exclude users who haven’t placed any orders.
This is where LEFT JOIN
(also called LEFT OUTER JOIN
) comes in. It returns all rows from the left table (the one listed after FROM
) and the matched rows from the right table (the one listed after LEFT JOIN
). If there’s no match in the right table for a row from the left table, the columns selected from the right table will have NULL
values for that row.
Application Scenario:
- Displaying a list of all registered users and how many orders each has placed (even if it’s zero). (Requires aggregation, which we’ll cover later, but
LEFT JOIN
is the foundation). - Listing all products and their category names, ensuring products appear even if their category is missing (perhaps
category_id
was allowed to be NULL).
Syntax:
SQL
SELECT
tableA.column1,
tableB.column2
FROM
tableA -- Left table
LEFT JOIN
tableB -- Right table
ON tableA.related_column = tableB.related_column;
Example 1: List ALL Users and any Order IDs they might have
We want all users, so users
is the left table. We join it with orders
.
SQL
SELECT
users.user_id,
users.email,
orders.order_id -- This will be NULL if the user has no orders
FROM
users -- Left table: All rows from here will be included
LEFT JOIN
orders ON users.user_id = orders.user_id; -- Join condition
This query returns one row for every user. If a user has placed orders, you might see multiple rows for that user (one for each order). If a user has placed no orders, you will still see one row for that user, but the orders.order_id
column will be NULL
.
Think of a LEFT JOIN
like a Venn diagram where you take the entire left circle, plus the intersection part of the right circle.
(Note: There’s also RIGHT JOIN
which keeps all rows from the right table, and FULL OUTER JOIN
which keeps all rows from both tables, filling in NULL
s where matches don’t exist. However, INNER JOIN
and LEFT JOIN
are by far the most commonly used in application development).
Lesson 2.4: Using Aliases
When writing queries, especially those involving joins, table and column names can become long and repetitive. Sometimes, you might even join a table to itself or join multiple tables that have columns with the same name (like id
). Aliases help simplify this.
- Table Aliases: Assign a short temporary name to a table within a query.
- Column Aliases: Assign a different, often more descriptive, temporary name to a column in the result set.
Application Scenario: Improve readability of complex queries, avoid ambiguity when column names clash.
Syntax:
- Table Alias:
FROM tablename AS alias
or simplyFROM tablename alias
- Column Alias:
SELECT column AS alias_name
orSELECT expression AS alias_name
Example 1: Rewriting INNER JOIN with Table Aliases
Let’s rewrite the query showing order IDs and user emails using aliases o
for orders
and u
for users
.
SQL
SELECT
o.order_id,
u.email -- Use aliases to refer to columns
FROM
orders AS o -- Assign alias 'o' to orders
INNER JOIN
users AS u ON o.user_id = u.user_id; -- Use aliases in the JOIN condition
This is much shorter and often easier to read once you get used to it.
Example 2: Rewriting LEFT JOIN with Table Aliases and a Column Alias
Let’s rewrite the query listing all users and their order IDs, using aliases and renaming order_id
in the output.
SQL
SELECT
u.user_id,
u.email,
o.order_id AS UserOrderID -- Rename 'order_id' to 'UserOrderID' in the output
FROM
users u -- Alias 'u' for users (AS is optional for table aliases)
LEFT JOIN
orders o ON u.user_id = o.user_id; -- Alias 'o' for orders
Using aliases is highly recommended for any query involving more than one table. It makes your SQL cleaner and easier to maintain.
Mini-Project 2: Connecting Store Data
Apply your knowledge of JOIN
s and aliases to retrieve combined information from the E-commerce database:
- Products with Category Names: Write a query to list all product names (
product_name
) along with the corresponding category name (category_name
). Use anINNER JOIN
and table aliases (p
for products,c
for categories). - Orders with User Emails: Write a query that lists each
order_id
, theorder_date
, and theemail
of the user who placed the order. Use anINNER JOIN
and table aliases (o
for orders,u
for users). - All Users and Their Orders: Write a query that lists the
user_id
andemail
for all users. For each user, also list theorder_id
of any orders they have placed. If a user has placed no orders, theiruser_id
andemail
should still appear, but theorder_id
should beNULL
. Use aLEFT JOIN
and table aliases.
Great job! You can now connect related tables using INNER JOIN
to find matching data and LEFT JOIN
to include all data from one table even if there’s no match in the other. You also learned how to use aliases to keep your queries readable. In the next module, we’ll explore how to summarize and group data using aggregate functions.