Module 5

Module 5: Advanced Techniques for Complex Applications

You’ve mastered the fundamentals of reading, joining, aggregating, and modifying data. Now, let’s explore some advanced techniques that allow you to solve more complex problems, write cleaner queries, and perform sophisticated analysis directly within the database.


Lesson 5.1: Subqueries (Queries within Queries)

A subquery is simply a SELECT statement nested inside another SQL statement (SELECT, INSERT, UPDATE, or DELETE). They allow you to perform multi-step logic within a single query by using the results of an inner query in an outer query.

Application Scenario:

  • Finding products priced higher than the overall average price.
  • Identifying users who have purchased a specific, popular item.
  • Selecting orders whose total amount is greater than the average amount for that specific user (though Window Functions are often better for this).

Types of Subqueries:

Subqueries are powerful but can sometimes make queries complex and hard to read, especially when nested deeply. This leads us to CTEs.


    Subquery in the WHERE Clause: Used for filtering based on the result of another query. Often used with operators like IN, NOT IN, EXISTS, NOT EXISTS, or comparison operators (=, >, <, etc.) if the subquery returns a single value (scalar subquery). Example: Find products priced above the average product price. SQLSELECT product_name, price FROM products WHERE price > (SELECT AVG(price) FROM products); -- Subquery returns the single average price Example: Find users who have placed at least one order. SQLSELECT user_id, first_name, email FROM users WHERE user_id IN (SELECT DISTINCT user_id FROM orders); -- Subquery returns a list of user IDs who have ordered *(Alternatively, EXISTS can be used: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = users.user_id)) *
    Subquery in the SELECT Clause (Scalar Subquery): The subquery must return exactly one row and one column. Its value can then be included as a column in the outer query's results. Use these sparingly as they can sometimes impact performance. Example: Show each product's price alongside the overall maximum price in the store. SQLSELECT product_name, price, (SELECT MAX(price) FROM products) AS overall_max_price -- Scalar subquery FROM products;
    Subquery in the SELECT Clause (Scalar Subquery): The subquery must return exactly one row and one column. Its value can then be included as a column in the outer query's results. Use these sparingly as they can sometimes impact performance. Example: Show each product's price alongside the overall maximum price in the store. SQLSELECT product_name, price, (SELECT MAX(price) FROM products) AS overall_max_price -- Scalar subquery FROM products;
    Subquery in the FROM Clause (Derived Table): The result set of the subquery acts like a temporary table (a "derived table") that the outer query can select from, join with, or filter. The derived table must be given an alias. Example: Find categories where the count of products in that category is greater than 5. SQLSELECT category_id, num_products FROM (SELECT category_id, COUNT(*) AS num_products -- Inner query calculates counts per category FROM products GROUP BY category_id) AS category_product_counts -- Alias for the derived table WHERE num_products > 5; -- Outer query filters the derived table

      Lesson 5.2: Common Table Expressions (CTEs): WITH Clause

      Common Table Expressions (CTEs) provide a way to define temporary, named result sets that you can reference within a single SQL statement (SELECT, INSERT, UPDATE, or DELETE). They are often used to break down complex queries into simpler, logical steps, making them much more readable and maintainable than deeply nested subqueries.

      Application Scenario:

      • Complex financial reports requiring multiple calculation steps.
      • Finding customers matching several criteria defined in separate logical steps.
      • Refactoring complex queries involving multiple joins and aggregations.

      Syntax:

      SQL

      WITH cte_name1 AS (
          -- Query definition for the first CTE
          SELECT column1, column2 FROM tableA WHERE ...
      ), -- Comma separates multiple CTEs
      cte_name2 AS (
          -- Query definition for the second CTE (can reference cte_name1)
          SELECT columnA, columnB FROM tableB JOIN cte_name1 ON ...
      ) -- No comma before the final statement
      -- Main query that can reference any defined CTE
      SELECT
          c1.column1, c2.columnB
      FROM
          cte_name1 AS c1
      JOIN
          cte_name2 AS c2 ON c1.column2 = c2.columnA
      WHERE ...;
      

      Key Points:

      • Defined using the WITH keyword at the beginning of the statement.
      • Acts like a temporary view, existing only for the duration of the query.
      • Improves readability by separating logical units.
      • Can reference other CTEs defined earlier within the same WITH clause.

      Example 1: Rewrite the “Categories with > 5 products” query using a CTE (Compare this to the Derived Table example in Lesson 5.1)

      SQL

      WITH CategoryProductCounts AS ( -- Define the CTE to calculate counts
          SELECT category_id, COUNT(*) AS num_products
          FROM products
          GROUP BY category_id
      )
      -- Main query uses the CTE
      SELECT
          c.category_name,
          cpc.num_products
      FROM
          CategoryProductCounts AS cpc
      INNER JOIN
          categories AS c ON cpc.category_id = c.category_id
      WHERE
          cpc.num_products > 5;
      

      Many find this structure easier to follow than the derived table version.

      Example 2: Multi-step analysis – Find total spending for users who registered in 2025

      SQL

      WITH RecentUsers AS ( -- CTE 1: Find users registered in or after 2025
          SELECT user_id
          FROM users
          WHERE EXTRACT(YEAR FROM registration_date) >= 2025 -- Syntax may vary
      ),
      UserSpending AS ( -- CTE 2: Calculate total spending per user (references RecentUsers)
          SELECT o.user_id, SUM(o.total_amount) AS total_spent
          FROM orders AS o
          WHERE o.user_id IN (SELECT user_id FROM RecentUsers) -- Filter for recent users
          GROUP BY o.user_id
      )
      -- Final Query: Select user details and their spending
      SELECT
          u.user_id,
          u.email,
          us.total_spent
      FROM
          users AS u
      JOIN
          UserSpending AS us ON u.user_id = us.user_id
      ORDER BY
          us.total_spent DESC;
      

      CTEs are a valuable tool for organizing complex SQL logic.


      Lesson 5.3: (Optional but Recommended) Window Functions

      Window functions perform calculations across a set of rows related to the current row – this set is called the “window”. Unlike aggregate functions with GROUP BY, window functions do not collapse the rows; they return a value for each row based on the window.

      Application Scenario:

      • Ranking items within categories (e.g., top-selling products per category).
      • Calculating running totals or moving averages.
      • Comparing a row’s value to an aggregate calculated over a related set of rows (e.g., comparing an order’s amount to the average order amount for that specific customer).

      Syntax:

      function_name() OVER ( [PARTITION BY partition_column(s)] [ORDER BY order_column(s)] [frame_clause] )

      • function_name(): The window function itself (e.g., RANK(), SUM(), AVG()).
      • OVER (...): Keyword indicating it’s a window function.
      • PARTITION BY partition_column(s): (Optional) Divides the rows into partitions (groups). The function is applied independently to each partition. If omitted, the entire result set is one partition.
      • ORDER BY order_column(s): (Optional, but required/important for many functions) Specifies the order of rows within each partition. Crucial for ranking functions (RANK, DENSE_RANK, ROW_NUMBER) and functions sensitive to order (LAG, LEAD, running totals).
      • frame_clause: (Optional, more advanced) Defines the subset of rows within the partition relative to the current row (e.g., ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING).

      Common Window Functions:

      • Ranking: RANK(), DENSE_RANK(), ROW_NUMBER()
      • Analytic: LAG() (access previous row), LEAD() (access next row)
      • Aggregate as Window: SUM(), AVG(), COUNT(), MAX(), MIN() used with the OVER() clause.

      Example 1: Rank products by price within each category (highest price gets rank 1)

      SQL

      SELECT
          product_name,
          category_id,
          price,
          RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank_in_category
      FROM products;
      

      Result: Each product row now includes its price rank within its category.

      Example 2: Calculate the running total of order amounts for each user

      SQL

      SELECT
          order_id,
          user_id,
          order_date,
          total_amount,
          SUM(total_amount) OVER (PARTITION BY user_id ORDER BY order_date ASC) AS running_user_total
      FROM orders;
      

      Result: Each order row includes the cumulative sum of total_amount for that user up to that order’s date.

      Example 3: Show each order amount next to the average order amount for that specific user

      SQL

      SELECT
          order_id,
          user_id,
          total_amount,
          AVG(total_amount) OVER (PARTITION BY user_id) AS avg_order_amount_for_user
      FROM orders;
      

      Result: Each order row includes the overall average total_amount calculated across all orders placed by that same user.

      Window functions are incredibly powerful for analysis but can take some practice to master. They offer insights that are difficult or inefficient to obtain using only standard joins and group bys.


      Lesson 5.4: Working with Dates and Times

      Applications constantly deal with dates and times: timestamps for events, user registration dates, order dates, date ranges for reports, etc. SQL provides functions to manipulate and query this temporal data.

      Application Scenario:

      • Fetching records created within the last 24 hours.
      • Grouping sales data by month or year.
      • Calculating the age of user accounts.
      • Displaying timestamps in a user-friendly format (though often done in application code).

      Common Operations and Functions (Syntax often varies between PostgreSQL, MySQL, SQLite, etc. – Always check your specific DB documentation!):

      1. Getting the Current Date/Time:
        • CURRENT_TIMESTAMP (Standard SQL)
        • NOW() (Common in PostgreSQL, MySQL)
        • GETDATE() (SQL Server)
        • CURRENT_DATE, CURRENT_TIME
      2. Extracting Parts of a Date/Time:
        • EXTRACT(part FROM datetime_column) (Standard SQL - part can be YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, etc.)DATE_PART('part', datetime_column) (PostgreSQL)YEAR(date_col), MONTH(date_col), DAY(date_col) (MySQL, others)
        Example: Group orders by year and month SQLSELECT EXTRACT(YEAR FROM order_date) AS order_year, EXTRACT(MONTH FROM order_date) AS order_month, COUNT(*) AS num_orders, SUM(total_amount) AS monthly_revenue FROM orders GROUP BY order_year, order_month ORDER BY order_year, order_month;
      3. Date/Time Arithmetic: Adding or subtracting intervals.
        • datetime_col + INTERVAL ‘1 day’datetime_col - INTERVAL '3 month' (Standard SQL interval syntax)DATE_ADD(date, INTERVAL value unit), DATE_SUB(...) (MySQL)DATEDIFF(unit, start_date, end_date) (Calculates difference – unit/syntax varies)AGE(timestamp1, timestamp2) (PostgreSQL – calculates difference nicely)
        Example: Find orders placed in the last 30 days SQLSELECT order_id, order_date, total_amount FROM orders WHERE order_date >= CURRENT_DATE – INTERVAL ’30 day’; — Adjust syntax if needed
      4. Formatting (Often done in application layer):
        • TO_CHAR(datetime_col, ‘YYYY-MM-DD HH24:MI’) (PostgreSQL, Oracle)
        • DATE_FORMAT(datetime_col, '%Y-%m-%d %H:%i') (MySQL)
      5. Casting/Converting: Changing between date/time/timestamp types.
        • CAST(timestamp_col AS DATE) (Standard SQL)
        • timestamp_col::DATE (PostgreSQL shortcut)

      Time Zones: Handling time zones is complex! Databases often have specific data types (TIMESTAMP WITH TIME ZONE or TIMESTAMPTZ) and functions (AT TIME ZONE) to manage them. Often, a convention is chosen (like storing everything in UTC) and conversions are handled in the application layer.

      Working effectively with dates and times requires understanding the specific functions and syntax available in your chosen database system.


      Practice Suggestion for Module 5:

      Instead of a formal Mini-Project, try applying these advanced techniques to previous problems:

      1. Can you rewrite some of the complex JOIN queries from Mini-Project 2 or 3 using CTEs for better readability?
      2. Can you use subqueries to find users who ordered the most expensive product?
      3. Try applying window functions: Calculate the rank of users based on their total spending. Show each order alongside the date of the previous order placed by the same user (using LAG()).

      This module introduced powerful tools like Subqueries, CTEs, Window Functions, and Date/Time manipulation. While potentially more complex than the basics, they unlock sophisticated querying and analytical capabilities essential for many modern applications. Next, we’ll touch upon database design principles and performance considerations.