Module 2

Module 2: Connecting Data – Working with Related Information

In Module 1, we learned how to retrieve data from a single table. However, most real-world applications involve data spread across multiple related tables (like our users, products, and orders tables). In this module, you’ll learn how to connect these tables to answer more complex questions.


Lesson 2.1: Why Multiple Tables? The Power of Relationships

You might wonder: why not just store everything in one giant table? For example, why not put all the user’s details (name, email) directly into every single order they place?

Let’s consider the problems with that approach:

  1. Data Redundancy: If a user places 10 orders, you’d store their name and email 10 times. This wastes space.
  2. Update Anomalies: If a user updates their email address, you’d have to find and change it in all 10 order records. If you miss one, the data becomes inconsistent. This is unreliable.
  3. Deletion Anomalies: If a user hasn’t placed any orders yet, where do you store their user information if it’s tied only to orders? Or, if you delete a user’s only order, do you lose their user record entirely?
  4. Insertion Anomalies: How do you add a new product category if no products belong to it yet?

To solve these problems, we use Normalization. This is the process of organizing data into multiple tables to reduce redundancy and improve data integrity. We split data into logical units (like users, products, orders) and then define relationships between them.

Primary Keys and Foreign Keys:

These relationships are maintained using keys:

  • Primary Key (PK): A column (or set of columns) that contains a unique identifier for each row in a table. Examples: users.user_id, products.product_id, orders.order_id. No two rows in the same table can have the same Primary Key value, and it cannot be empty (NULL).
  • Foreign Key (FK): A column (or set of columns) in one table that refers to the Primary Key of another table. This creates the link. Examples:
    • orders.user_id is a Foreign Key referencing users.user_id. This links an order to the user who placed it.
    • products.category_id is a Foreign Key referencing categories.category_id. This links a product to its category.
    • order_items.order_id references orders.order_id, and order_items.product_id references products.product_id. This links order line items to both the specific order and the specific product.

By splitting data logically and linking tables using Primary and Foreign Keys, we create a relational structure that is efficient, reliable, and flexible. The next step is learning how to query across these relationships.


Lesson 2.2: Combining Tables: INNER JOIN

The most common way to combine rows from two or more tables based on a related column is using an INNER JOIN. An INNER JOIN returns only the rows where the join condition (the link between the tables) is met in both tables.

Application Scenario: You need to display a list of orders, but instead of showing just the user_id, you want to show the user’s actual email address. Or, you want to list products along with their category names, not just the category_id.

Syntax:

SQL

SELECT
    tableA.column1,
    tableB.column2
FROM
    tableA
INNER JOIN
    tableB ON tableA.related_column = tableB.related_column;
  • INNER JOIN tableB: Specifies the second table you want to connect to.
  • ON tableA.related_column = tableB.related_column: This is the crucial join condition. It tells the database how the tables are related – typically by matching the Foreign Key in one table with the Primary Key in the other.

Example 1: Show Order IDs and the corresponding User’s Email

We need to connect orders (which has user_id) and users (which has user_id and email).

SQL

SELECT
    orders.order_id,
    users.email -- Select the email from the users table
FROM
    orders -- Start with the orders table
INNER JOIN
    users ON orders.user_id = users.user_id; -- Join condition: match user_id in both tables

This query looks at each order, finds the matching user_id in the users table, and returns the order_id and the corresponding user’s email. Orders belonging to users not found in the users table (which shouldn’t happen with proper Foreign Keys) would be excluded.

Example 2: Show Product Names and their Category Names

We need to connect products (which has category_id) and categories (which has category_id and category_name).

SQL

SELECT
    products.product_name,
    categories.category_name -- Select the category name
FROM
    products
INNER JOIN
    categories ON products.category_id = categories.category_id; -- Join condition: match category_id

This returns a list of product names paired with the name of the category they belong to. Products without a matching category (or categories without matching products) would be excluded.

Think of an INNER JOIN like the intersection in a Venn diagram – you only get the records that have a match in both tables based on the ON condition.


Lesson 2.3: More Join Types: LEFT JOIN

What if you want to retrieve all rows from one table, and only the matching rows from a second table? For instance, maybe you want to list all users, and also show their order IDs if they have placed any orders. An INNER JOIN wouldn’t work here, because it would exclude users who haven’t placed any orders.

This is where LEFT JOIN (also called LEFT OUTER JOIN) comes in. It returns all rows from the left table (the one listed after FROM) and the matched rows from the right table (the one listed after LEFT JOIN). If there’s no match in the right table for a row from the left table, the columns selected from the right table will have NULL values for that row.

Application Scenario:

  • Displaying a list of all registered users and how many orders each has placed (even if it’s zero). (Requires aggregation, which we’ll cover later, but LEFT JOIN is the foundation).
  • Listing all products and their category names, ensuring products appear even if their category is missing (perhaps category_id was allowed to be NULL).

Syntax:

SQL

SELECT
    tableA.column1,
    tableB.column2
FROM
    tableA -- Left table
LEFT JOIN
    tableB -- Right table
    ON tableA.related_column = tableB.related_column;

Example 1: List ALL Users and any Order IDs they might have

We want all users, so users is the left table. We join it with orders.

SQL

SELECT
    users.user_id,
    users.email,
    orders.order_id -- This will be NULL if the user has no orders
FROM
    users -- Left table: All rows from here will be included
LEFT JOIN
    orders ON users.user_id = orders.user_id; -- Join condition

This query returns one row for every user. If a user has placed orders, you might see multiple rows for that user (one for each order). If a user has placed no orders, you will still see one row for that user, but the orders.order_id column will be NULL.

Think of a LEFT JOIN like a Venn diagram where you take the entire left circle, plus the intersection part of the right circle.

(Note: There’s also RIGHT JOIN which keeps all rows from the right table, and FULL OUTER JOIN which keeps all rows from both tables, filling in NULLs where matches don’t exist. However, INNER JOIN and LEFT JOIN are by far the most commonly used in application development).


Lesson 2.4: Using Aliases

When writing queries, especially those involving joins, table and column names can become long and repetitive. Sometimes, you might even join a table to itself or join multiple tables that have columns with the same name (like id). Aliases help simplify this.

  • Table Aliases: Assign a short temporary name to a table within a query.
  • Column Aliases: Assign a different, often more descriptive, temporary name to a column in the result set.

Application Scenario: Improve readability of complex queries, avoid ambiguity when column names clash.

Syntax:

  • Table Alias: FROM tablename AS alias or simply FROM tablename alias
  • Column Alias: SELECT column AS alias_name or SELECT expression AS alias_name

Example 1: Rewriting INNER JOIN with Table Aliases

Let’s rewrite the query showing order IDs and user emails using aliases o for orders and u for users.

SQL

SELECT
    o.order_id,
    u.email -- Use aliases to refer to columns
FROM
    orders AS o -- Assign alias 'o' to orders
INNER JOIN
    users AS u ON o.user_id = u.user_id; -- Use aliases in the JOIN condition

This is much shorter and often easier to read once you get used to it.

Example 2: Rewriting LEFT JOIN with Table Aliases and a Column Alias

Let’s rewrite the query listing all users and their order IDs, using aliases and renaming order_id in the output.

SQL

SELECT
    u.user_id,
    u.email,
    o.order_id AS UserOrderID -- Rename 'order_id' to 'UserOrderID' in the output
FROM
    users u -- Alias 'u' for users (AS is optional for table aliases)
LEFT JOIN
    orders o ON u.user_id = o.user_id; -- Alias 'o' for orders

Using aliases is highly recommended for any query involving more than one table. It makes your SQL cleaner and easier to maintain.


Mini-Project 2: Connecting Store Data

Apply your knowledge of JOINs and aliases to retrieve combined information from the E-commerce database:

  1. Products with Category Names: Write a query to list all product names (product_name) along with the corresponding category name (category_name). Use an INNER JOIN and table aliases (p for products, c for categories).
  2. Orders with User Emails: Write a query that lists each order_id, the order_date, and the email of the user who placed the order. Use an INNER JOIN and table aliases (o for orders, u for users).
  3. All Users and Their Orders: Write a query that lists the user_id and email for all users. For each user, also list the order_id of any orders they have placed. If a user has placed no orders, their user_id and email should still appear, but the order_id should be NULL. Use a LEFT JOIN and table aliases.

Great job! You can now connect related tables using INNER JOIN to find matching data and LEFT JOIN to include all data from one table even if there’s no match in the other. You also learned how to use aliases to keep your queries readable. In the next module, we’ll explore how to summarize and group data using aggregate functions.