Capstone Project Module – Analytics, Science and Life

Capstone Project Module: Applying Your SQL Skills

Congratulations on making it through the core modules! You’ve learned how to query databases, modify data, connect related information, aggregate results, design table structures, and understand key real-world considerations like security and performance.

Now it’s time to put it all together. The Capstone Project module offers several scenarios where you can apply your SQL knowledge to solve practical problems, similar to tasks you might encounter in application development or data analysis.

Purpose: To synthesize your learning and gain confidence by building something tangible using SQL.

Instructions:

Choose one of the projects below that interests you the most.
Ensure you have your local database environment (SQLite, PostgreSQL, MySQL with a client tool) set up, as established in Module 0.
If the project requires a specific schema, use the provided CREATE TABLE statements (or the schema from the E-commerce database we’ve been using). Load any provided sample data.
Tackle the listed tasks by writing the required SQL queries.
Test your queries thoroughly in your SQL client to ensure they produce the expected results. Analyze the output.
Think about why the queries work and how they relate to the application scenario described.

Project Option 1: E-commerce Analytics Dashboard

Context: You are tasked with providing the SQL queries needed to populate key metrics for an internal analytics dashboard for the E-commerce store we’ve used throughout the course.

Objective: Write efficient SQL queries to calculate various performance indicators based on the existing users, products, categories, orders, and order_items tables. (Use the schema and sample data provided in Module 0).

Tasks: Write SQL queries to find the following:

Total Revenue: Calculate the sum of total_amount from all orders.
Revenue Last Month: Calculate the total revenue from orders placed in the previous calendar month. (Requires date functions based on CURRENT_DATE – syntax might vary, e.g., finding the first and last day of the previous month).
New Users This Month: Count how many users registered in the current calendar month.
Average Order Value (AOV): Calculate the average total_amount across all orders.
Best-Selling Products (by Quantity): List the product_id, product_name, and total quantity sold for the top 5 products, ordered by total quantity descending. (Requires joining order_items and products, using SUM, GROUP BY, ORDER BY, LIMIT).
Top Spending Customers: List the user_id, email, and the total sum of total_amount for the top 5 customers, ordered by their total spending descending. (Requires joining users and orders, using SUM, GROUP BY, ORDER BY, LIMIT).
Sales Revenue Per Category: List the category_name and the total revenue (SUM of order_items.quantity * order_items.price_per_unit) generated from products within each category. (Requires joining multiple tables: categories, products, order_items).

Project Option 2: Social Media Feed Generation

Context: You are building the backend for a simple social media platform. A core feature is generating a user’s feed, showing recent posts from people they follow.

Objective: Define a basic schema (or use the one provided) and write the SQL queries needed to support following users and generating a personalized feed.

Provided Schema: (Run these CREATE TABLE statements in your database)

SQL

CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    username VARCHAR(50) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL, -- Author of the post
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);

-- Junction table for follower relationship
CREATE TABLE followers (
    follower_id INTEGER NOT NULL, -- The user who is following
    following_id INTEGER NOT NULL, -- The user who is being followed
    PRIMARY KEY (follower_id, following_id), -- Ensures a user can't follow the same person twice
    FOREIGN KEY (follower_id) REFERENCES users(user_id) ON DELETE CASCADE,
    FOREIGN KEY (following_id) REFERENCES users(user_id) ON DELETE CASCADE
);

-- Optional: Add a likes table
CREATE TABLE likes (
    like_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL, -- User who liked the post
    post_id INTEGER NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, post_id), -- User can only like a post once
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE,
    FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE
);

Sample Data Insertion (Run these after creating tables):

SQL

-- Insert Users
INSERT INTO users (username) VALUES ('alice'), ('bob'), ('charlie'), ('diana');

-- Insert Posts (find user_ids first if needed, assume 1=alice, 2=bob, 3=charlie, 4=diana)
INSERT INTO posts (user_id, content) VALUES
(1, 'Hello world! My first post.'),
(2, 'Thinking about SQL today.'),
(1, 'Just joined this cool platform.'),
(3, 'Coding challenges are fun.'),
(4, 'Enjoying the weekend!'),
(2, 'Inner joins vs Outer joins...'),
(1, 'My third post here.');


-- Insert Follow relationships (Alice follows Bob, Charlie; Bob follows Alice; Charlie follows Alice)
INSERT INTO followers (follower_id, following_id) VALUES
(1, 2), -- Alice follows Bob
(1, 3), -- Alice follows Charlie
(2, 1), -- Bob follows Alice
(3, 1); -- Charlie follows Alice


-- Optional: Insert Likes (Alice likes Bob's post 2; Bob likes Alice's post 1)
-- Find post_ids first if needed. Assume post_id 2 is Bob's SQL post, post_id 1 is Alice's first post.
INSERT INTO likes (user_id, post_id) VALUES (1, 2), (2, 1);

Tasks: Write SQL queries to:

Find Following: Find the username of everyone that a specific user is following (e.g., find who user_id = 1 is following). Requires JOIN.
Find Followers: Find the username of everyone who is following a specific user (e.g., find who follows user_id = 1). Requires JOIN.
Generate Feed: This is the core task. For a specific user (e.g., user_id = 1), retrieve the content and created_at timestamp for all posts made by users that this specific user is following. The feed should be ordered by the post creation date, newest first, and limited to a reasonable number (e.g., 20 posts). (Requires joining posts and followers, filtering by follower_id, ORDER BY, LIMIT).
(Optional) Count Likes: For a specific post (e.g., post_id = 2), count how many likes it has received. (Requires COUNT on likes table with WHERE).
(Optional) Posts Liked by User: Find the content of all posts liked by a specific user (e.g., user_id = 1). (Requires joining likes and posts).

Project Option 3: Blog/CMS Backend Queries

Context: You are working on a Content Management System (CMS) or blog platform. You need to write the SQL queries that would power common data retrieval operations for displaying posts, authors, and categories.

Objective: Define a basic schema (or use the one provided) for a blog and write SQL queries to fetch data needed for typical blog features.

Provided Schema:

SQL

CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY AUTOINCREMENT,
    author_name VARCHAR(150) NOT NULL,
    bio TEXT
);

CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY AUTOINCREMENT,
    category_name VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY AUTOINCREMENT,
    author_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    body TEXT NOT NULL,
    published_at TIMESTAMP, -- NULL if draft, set when published
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (author_id) REFERENCES authors(author_id) ON DELETE CASCADE
);

-- Junction table for posts belonging to multiple categories
CREATE TABLE post_categories (
    post_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (post_id, category_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE,
    FOREIGN KEY (category_id) REFERENCES categories(category_id) ON DELETE CASCADE
);

Sample Data Insertion (Run these after creating tables):

SQL

-- Insert Authors
INSERT INTO authors (author_name, bio) VALUES
('Jane Doe', 'Technology blogger and enthusiast.'),
('John Smith', 'Travel writer and photographer.');

-- Insert Categories
INSERT INTO categories (category_name) VALUES ('Technology'), ('Travel'), ('SQL'), ('Web Development');

-- Insert Posts (find author_ids/category_ids if needed, assume 1=Jane, 2=John; 1=Tech, 2=Travel, 3=SQL, 4=WebDev)
-- Post 1: Tech post by Jane
INSERT INTO posts (author_id, title, body, published_at) VALUES
(1, 'The Future of AI', 'AI is rapidly evolving...', '2025-03-15 10:00:00');
INSERT INTO post_categories (post_id, category_id) VALUES (1, 1); -- Tech

-- Post 2: Travel post by John
INSERT INTO posts (author_id, title, body, published_at) VALUES
(2, 'Exploring the Alps', 'Hiking through breathtaking scenery...', '2025-03-20 14:30:00');
INSERT INTO post_categories (post_id, category_id) VALUES (2, 2); -- Travel

-- Post 3: SQL post by Jane
INSERT INTO posts (author_id, title, body, published_at) VALUES
(1, 'Mastering SQL Joins', 'Inner, Left, Right, Full Outer...', '2025-04-01 09:00:00');
INSERT INTO post_categories (post_id, category_id) VALUES (3, 1), (3, 3); -- Tech, SQL

-- Post 4: Draft post by Jane
INSERT INTO posts (author_id, title, body, published_at) VALUES
(1, 'Web Dev Trends 2025', 'Upcoming trends...', NULL); -- Draft
INSERT INTO post_categories (post_id, category_id) VALUES (4, 1), (4, 4); -- Tech, Web Development

Tasks: Write SQL queries to:

Get Published Posts: Select the title, body, published_at, and the author_name for all posts that have a non-NULL published_at date, ordered by published_at descending (most recent first). (Requires JOIN, WHERE, ORDER BY).
Get Posts by Category: Select the title and published_at date for all published posts belonging to a specific category name (e.g., ‘Technology’). (Requires joining posts, post_categories, categories, using WHERE on category_name and published_at).
Get Posts by Author: Select the title and published_at date for all published posts written by a specific author name (e.g., ‘Jane Doe’). (Requires joining posts and authors, using WHERE on author_name and published_at).
Get Post with Author and Categories: For a single specific post (e.g., post_id = 3), retrieve its title, body, the author_name, and a list/aggregation of all category_names it belongs to. (The category list part might require GROUP_CONCAT (MySQL/SQLite) or STRING_AGG (PostgreSQL) or simply returning multiple rows if aggregation is too complex).
(Optional) Count Posts per Author: List each author_name and the total number of published posts they have written. (Requires JOIN, WHERE, GROUP BY, COUNT).

Checking Your Work & Next Steps:

Run each query in your SQL client. Does the output look correct based on the task description and sample data?
Are you using the appropriate clauses (WHERE, JOIN, GROUP BY, HAVING, ORDER BY, LIMIT)?
Are your joins connecting tables on the correct keys?
Could your query be written more efficiently or readably (e.g., using aliases, CTEs for complex parts)?

Completing one of these projects demonstrates a solid practical understanding of SQL. This is a significant achievement! From here, you can continue exploring the advanced topics mentioned in Module 7 or start applying your SQL skills in your own projects or work. Good luck!