Improve the database FilmDB and Course Materials - DSH-Project-II


Publication

Github

Tsawke's Blog

Luogu Blog

Tasks

1. Database Enhancement

2. Lecture Notes Review

3. Course Content Proposal

Preface

The provided filmdb.sql includes directives such as PRAGMA that are incompatible with PostgreSQL/openGauss, its country codes are not strictly ISO 3166, and, most significantly, it is out of date (covering releases only up to 2019). This report therefore aims to refresh the database and keep it current.

Acknowledgement

I acknowledge the use of multiple AI tools during the preparation of this report (including ChatGPT 5.2 Thinking, Gemini 3 Pro, Gemini 3 Pro (High, Antigravity), Claude Opus 4.5 (Thinking, Antigravity), GPT-5.1-codex-max (Cursor), etc.) to support drafting, code assistance, and language refinement. All AI-generated content was subsequently reviewed, reorganized, edited, and cleaned manually to ensure accuracy, originality, and clarity. Where applicable, the outputs were further validated through practical experiments to confirm correctness and executability.

To ensure transparency and reproducibility, the core code has been fully disclosed, and execution screenshots are included in the report, so that all presented content remains meaningful, concise, and valuable to the reader. In addition, as the data is used solely for research and educational purposes, the data usage and attribution comply with the relevant requirements and guidelines of sources such as Kaggle and TMDB.

Environment

Tip: The Tutorial for Database Setup.pdf explains how to install the databases in this environment.

1. Database Enhancement

1.1 Data Sources

This project requires the data to be “as new as possible”: ideally updated to the day the program runs, and extensible to run daily in the future. This report provides a unified plan to achieve this.

1.1.1 Kaggle (Daily-updated TMDB dataset)

image-20251213022835374

Use a daily refreshed Kaggle dataset based on TMDB as the main baseline source.

1.1.2 TMDB Official API

Kaggle is updated daily, but the program must reflect “today” whenever it runs. Therefore, the TMDB official API is used as an incremental “delta” source after the Kaggle import.

1.2 Workflow

  1. Download Kaggle daily snapshot (baseline).

  2. Load into staging tables.

  3. Merge into FilmDB core tables (movies, countries, and any new enhancement tables).

  4. Read last sync timestamp from a local metadata table.

  5. Use TMDb Changes API to fetch changed IDs from last sync date to today.

  6. Pull details for each changed movie ID; apply inserts/updates in a controlled transaction.

  7. Validate constraints and export SQL.
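As a sketch, these seven steps can be wired together in a thin orchestration skeleton that records per-step status and stops on the first failure (the function and step names below are illustrative placeholders, not the actual pipeline scripts):

```python
from datetime import date

def run_pipeline(run_day: date, steps) -> list:
    """Run each workflow step in order; stop at the first failure,
    since later steps depend on earlier ones."""
    log = []
    for name, fn in steps:
        try:
            fn(run_day)
            log.append((name, "ok"))
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            break
    return log

# Placeholder no-op steps mirroring the seven workflow steps above.
steps = [(name, lambda day: None) for name in (
    "download_kaggle_snapshot", "load_staging_tables", "merge_into_core",
    "read_last_sync", "fetch_tmdb_changes", "apply_delta_updates",
    "validate_and_export",
)]
print(run_pipeline(date.today(), steps))
```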

1.3 Design

1.3.1 Preserve the original tables, add new tables and columns

Keep existing core tables (movies, countries, people, credits, etc.). Add new columns and new tables for external identifiers and richer metadata.

1.3.2 New high-utility tables

Enhance movies with new columns (nullable):

Add new tables:

  1. movie_update_log(sync_id, run_ts, source, dataset_version, start_date, end_date, status, notes) to record each update run.

  2. country_code_alias(alias_code, canonical_code) to map ISO codes into FilmDB’s internal codes (e.g., map es to sp to avoid breaking the existing countries table).

  3. staging_movies_* tables (temporary or persistent) to allow safe bulk imports and deterministic merges.
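As an illustration of the country_code_alias idea in item 2, the sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL/openGauss; the table and column names follow the report, while the helper function `canonical` is a hypothetical name:

```python
import sqlite3

# In-memory stand-in for the target DBMS; schema follows the report's design.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE country_code_alias (
    alias_code TEXT PRIMARY KEY,
    canonical_code TEXT NOT NULL)""")
# Map ISO 'es' to FilmDB's internal 'sp' without touching the countries table.
conn.execute("INSERT INTO country_code_alias VALUES ('es', 'sp')")

def canonical(code: str) -> str:
    """Resolve an incoming ISO code to FilmDB's internal code;
    codes without an alias pass through unchanged."""
    row = conn.execute(
        "SELECT canonical_code FROM country_code_alias WHERE alias_code = ?",
        (code,),
    ).fetchone()
    return row[0] if row else code

print(canonical("es"))  # -> sp
print(canonical("fr"))  # -> fr (no alias, passes through)
```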

Principles:

1.4 Compatibility Strategy for PostgreSQL and openGauss

1.4.1 Approach

Produce two SQL outputs for maximum safety:

Even though openGauss can run in PG compatibility mode, differences still exist (notably UPSERT syntax).

Generating two scripts avoids unexpected failures in real environments.

1.4.2 Avoid DB-specific UPSERT syntax by using staging + merge logic

Instead of relying on PostgreSQL ON CONFLICT or openGauss ON DUPLICATE KEY UPDATE, implement portable merge logic:

  1. Insert Kaggle rows into staging tables without strict constraints.

  2. Normalize and deduplicate in SQL queries that use NOT EXISTS joins.

  3. Update existing rows with deterministic rules (only update if new data is non-null and different). This approach works consistently across PostgreSQL and openGauss with minimal dialect differences.
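A minimal sketch of this staging + merge pattern, using Python's sqlite3 as a dialect-neutral stand-in (table and column names follow the report; the staging rows are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (tmdb_id INTEGER PRIMARY KEY, title TEXT, runtime INTEGER);
CREATE TABLE staging_movies_kaggle (tmdb_id INTEGER, title TEXT, runtime INTEGER);
INSERT INTO movies VALUES (1, 'Old Title', NULL);
INSERT INTO staging_movies_kaggle VALUES (1, 'Old Title', 120), (2, 'New Film', 95);
""")

# 1) Insert rows that do not yet exist, using NOT EXISTS instead of
#    ON CONFLICT / ON DUPLICATE KEY UPDATE (portable across dialects).
conn.execute("""
INSERT INTO movies (tmdb_id, title, runtime)
SELECT s.tmdb_id, s.title, s.runtime
FROM staging_movies_kaggle s
WHERE NOT EXISTS (SELECT 1 FROM movies m WHERE m.tmdb_id = s.tmdb_id)
""")

# 2) Deterministic update: only overwrite when the new value is
#    non-NULL and actually different from the stored one.
conn.execute("""
UPDATE movies
SET runtime = (SELECT s.runtime FROM staging_movies_kaggle s
               WHERE s.tmdb_id = movies.tmdb_id)
WHERE EXISTS (
  SELECT 1 FROM staging_movies_kaggle s
  WHERE s.tmdb_id = movies.tmdb_id
    AND s.runtime IS NOT NULL
    AND (movies.runtime IS NULL OR movies.runtime <> s.runtime)
)
""")
print(conn.execute(
    "SELECT tmdb_id, title, runtime FROM movies ORDER BY tmdb_id").fetchall())
```

The same two statements run unchanged on PostgreSQL and openGauss, which is the point of avoiding dialect-specific UPSERT forms.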

1.5 Pipeline

This pipeline is designed to be reproducible and runnable on any day. Kaggle provides a daily-updated TMDb baseline snapshot, and the TMDb official API is used to catch up to “today” by applying incremental changes since the last sync date.

1.5.1 One-time setup

  1. Configure Kaggle API:

    • Save Kaggle token to ~/.kaggle/kaggle.json.

    • Typical permissions requirement: chmod 600 ~/.kaggle/kaggle.json.

    • Download the dataset snapshot with Kaggle CLI:

      • kaggle datasets download -d alanvourch/tmdb-movies-daily-updates -p ~/workspace/LargeFiles/ --unzip

  2. Confirm baseline CSV location:

    • The extracted Kaggle CSV is already available at ~/workspace/LargeFiles/TMDB_all_movies.csv.

  3. Configure TMDB API:

    • Obtain a TMDb API credential (API Read Access Token / Bearer token).

    • Export it for the pipeline runner:

      • export TMDB_BEARER_TOKEN="YOUR_TOKEN_HERE".

The screenshot below shows my API Read Access Token; you need to apply for one at https://www.themoviedb.org/settings/api.

        image-20251213235027253

  4. Initialize DB schema enhancement (run once):

    • Apply schema extensions (new columns on movies, new tables pipeline_state, movie_update_log, country_code_alias, and staging tables).

    • This is executed by running 01_schema_enhancement.sql.

1.5.2 Daily baseline import from Kaggle

  1. Normalize Kaggle CSV to a stable staging format

    • Motivation: Kaggle datasets often contain malformed quoting that breaks standard parsers. The updated script uses Python's robust csv standard library to pre-parse data before vectorizing with Pandas, recovering ~99% of valid records (from <10k to >670k rows).

    • Input: ~/workspace/LargeFiles/TMDB_all_movies.csv

    • Output: ~/workspace/LargeFiles/tmdb_kaggle_normalized.csv

  2. Load Kaggle normalized CSV into staging

    • Truncate staging and load the normalized CSV into staging_movies_kaggle.

    • Use 02_load_kaggle_staging.sql.

    • PostgreSQL: Uses client-side \copy for simplicity.

    • openGauss: Uses robust container-file-copy (docker cp) plus server-side COPY to avoid shell buffer limits and encoding issues unique to gsql.

  3. Merge Kaggle staging into FilmDB core tables

    • Normalize and map country codes (e.g., ISO es mapped to FilmDB sp) via country_code_alias.

      • Insert new movies and update existing ones with deterministic rules:

        • Use tmdb_id as the primary identifier when available.

        • If tmdb_id is NULL, fall back to (title, country, year_released) as the legacy identity key.

    • Handle legacy-key collisions deterministically (e.g., if a new row’s legacy key matches an existing movie with a different tmdb_id, apply a traceable title suffix within length limits).

    • This is executed by 03_merge_kaggle_into_core.sql.

  4. Log baseline completion

    • Insert a success record into movie_update_log with source='kaggle' and the dataset version string if recorded.

    • This is included in 03_merge_kaggle_into_core.sql.
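The pre-parsing idea in step 1 can be sketched with the csv standard library: read tolerantly, keep only rows with the expected field count, and emit a normalized CSV for the vectorized loader. The column subset and the malformed sample row below are invented for illustration:

```python
import csv
import io

EXPECTED_FIELDS = ["tmdb_id", "title", "year_released"]  # illustrative subset

# Stand-in for the raw Kaggle file; the second data row has a stray quote
# and an extra field, mimicking the malformed quoting described above.
raw = io.StringIO(
    'tmdb_id,title,year_released\n'
    '1,"Good, Movie",2023\n'
    '2,Broken "quote,2024,extra\n'
    '3,Fine Film,2025\n'
)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(EXPECTED_FIELDS)
reader = csv.reader(raw)
next(reader)  # skip the original header
kept = dropped = 0
for row in reader:
    if len(row) == len(EXPECTED_FIELDS):
        writer.writerow(row)   # row is structurally valid: re-emit it
        kept += 1
    else:
        dropped += 1           # unrecoverable row: count and skip
print(kept, dropped)  # -> 2 1
```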

1.5.3 Incremental catch-up using TMDB API

  1. Determine the incremental window

    • Read the last successful sync date from pipeline_state.tmdb_last_sync.

    • Use it as start_date; set end_date to the current date (run day).

    • This can be retrieved by running 04_read_last_sync.sql.

  2. Fetch changed movie IDs and write a TMDb delta CSV

    • Use the TMDb “Changes” endpoint to fetch changed movie IDs between start_date and end_date (chunk into <=14-day windows if needed).

    • Fetch details for each changed ID, normalize fields into a stable delta CSV.

    • Output: ~/workspace/LargeFiles/tmdb_delta_normalized.csv.

  3. Load delta CSV into staging and merge into core

    1. Load tmdb_delta_normalized.csv into staging_movies_tmdb_delta using 05_load_tmdb_delta_staging.sql.

    2. Merge delta staging into core:

      • Insert missing movies by tmdb_id.

      • Update existing movies by tmdb_id using “fill-if-null / improve-if-empty” rules.

      • Update pipeline_state.tmdb_last_sync to the run day for the next one-click incremental update.

      • This is executed by 06_merge_tmdb_delta_into_core.sql.

  4. Log delta completion

    • Insert a success record into movie_update_log with source='tmdb'.

    • This is included in 06_merge_tmdb_delta_into_core.sql.
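The chunking mentioned in step 2 can be sketched as a small date-window helper that splits the sync gap into consecutive windows of at most 14 days (`chunk_windows` is a hypothetical name, not one of the pipeline scripts):

```python
from datetime import date, timedelta

def chunk_windows(start: date, end: date, max_days: int = 14):
    """Split [start, end] into consecutive windows of at most max_days,
    matching the per-request limit assumed for the TMDB Changes endpoint."""
    windows = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=max_days), end)
        windows.append((cur, nxt))
        cur = nxt
    return windows

# e.g. a 30-day gap becomes three windows: 14 + 14 + 2 days.
print(chunk_windows(date(2025, 11, 14), date(2025, 12, 14)))
```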

1.5.4 Data validation and integrity checks

After changes:

  1. Verify referential integrity:

    • Ensure every movies.country exists in countries.country_code.

  2. Verify uniqueness invariants:

    • Ensure no duplicated non-null tmdb_id values exist in movies (enforced by a unique index).

  3. Verify “newest year” signal:

    • Check MAX(year_released) to confirm the database now extends beyond 2019.

  4. Log validation results:

    • Record validation execution in movie_update_log for traceability.

These checks are executed by 07_validate_and_log.sql, orchestrated by the wrapper script validate_all.sh, which runs the verification on both databases.
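The first three checks amount to three small queries; the sketch below runs them against a toy schema in sqlite3 as a stand-in for the course DBMS (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_code TEXT PRIMARY KEY);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, country TEXT,
                     year_released INTEGER, tmdb_id INTEGER);
CREATE UNIQUE INDEX movies_tmdb_id_uq ON movies (tmdb_id);
INSERT INTO countries VALUES ('us'), ('sp');
INSERT INTO movies VALUES (1, 'us', 2025, 101), (2, 'sp', 1999, NULL);
""")

# 1) Referential integrity: every movies.country exists in countries.
orphans = conn.execute("""
  SELECT count(*) FROM movies m
  WHERE NOT EXISTS (SELECT 1 FROM countries c WHERE c.country_code = m.country)
""").fetchone()[0]

# 2) Uniqueness: no duplicated non-NULL tmdb_id values.
dups = conn.execute("""
  SELECT count(*) FROM (
    SELECT tmdb_id FROM movies WHERE tmdb_id IS NOT NULL
    GROUP BY tmdb_id HAVING count(*) > 1)
""").fetchone()[0]

# 3) Freshness signal: the newest year should extend beyond 2019.
max_year = conn.execute("SELECT max(year_released) FROM movies").fetchone()[0]

print(orphans, dups, max_year)  # -> 0 0 2025
```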

1.6 Implementation

To run the update pipeline conveniently and make it fully reproducible, we prepared a set of shell scripts that automate the complete workflow. A dedicated wrapper, run_with_cpu_limit.sh, enforces CPU affinity (pinning to cores 0–1) to prevent resource contention during intensive parallel data loading.

1.6.1 Environment Preparation

Before running any scripts, ensure the following environment is ready:

1.6.2 Database Initialization

These scripts rebuild the filmdb database from scratch, load the original filmdb.sql, and then apply schema extensions via 01_schema_enhancement.sql. This ensures the database contains the required staging tables, metadata tables, and the extra columns needed for the new pipeline (e.g., tmdb_id).

Note: This step will drop (delete) the existing filmdb database and recreate it. Do not run it if you need to preserve the current database state.

1.6.3 Daily Update (Kaggle Baseline + TMDB Delta + Validation)

These scripts execute the full daily ETL process:

  1. Kaggle baseline import: If ~/workspace/LargeFiles/TMDB_all_movies.csv exists, the script automatically normalizes it into a stable schema (fixed columns), loads it into staging tables, and then merges it into FilmDB core tables.

  2. TMDB incremental sync (delta layer): The script reads the “last sync date” from the database metadata table, calls the TMDB API to fetch changes from that date up to “today”, downloads the latest details for changed movies, loads the delta into staging, and merges it into core tables.

  3. Validation: After the merges, the script runs integrity checks (e.g., country foreign-key validity and tmdb_id uniqueness) and records the run status in the update log.

1.6.4 Export Deliverables (Final SQL Dumps)

After daily updates are complete, export the final deliverable SQL files. This produces two outputs to maximize compatibility and avoid environment-specific failures:

Output file: filmdb_pg.sql

Output file: filmdb_og.sql

1.6.5 Troubleshooting

If the baseline CSV is missing, re-download the Kaggle dataset and confirm it is extracted to the expected location before running the daily update scripts.

1.7 Results

1.7.1 init_pg.sh

image-20251214040407752

1.7.2 init_og.sh

image-20251214040356089

1.7.3 daily_update_pg.sh

image-20251214211145903

1.7.4 daily_update_og.sh

image-20251214205753125

1.7.5 validate_all.sh

image-20251214211213032

1.7.6 export_pg.sh

image-20251214211248850

1.7.7 export_og.sh

image-20251214211447092

1.7.8 Ultimate Files

image-20251214212803785


2. Lecture Notes Review

2.1 CS213 Lecture-01 CS213-DB-01.pdf

2.1.1 Slides Updates

2.1.1.1 Page 3

Replace with

2.1.1.2 Page 14

Change

2.1.1.3 Pages 15–16

Fix

2.1.1.4 Page 25

Replace with

2.1.1.5 Page 26

Replace with

2.1.1.6 Pages 29–36

Add one slide after Page 36

2.1.1.7 Pages 38–40

Replace with two concepts

2.1.1.8 Page 41

Replace with

2.1.1.9 Pages 61–64

Change

2.1.1.10 Page 74

Replace definitions with

Link: OpenTextBC normalization

2.1.1.11 Pages 76–83

Add one slide after Page 83

2.1.2 openGauss insertion

  1. Insert after Page 11 (DBMS introduction) Reason: The lecture introduces “DBMS” but gives no modern concrete system example; openGauss provides a locally relevant, PostgreSQL-like relational DBMS for later demos and labs. Link: openGauss overview

  2. Insert after Page 16 (history timeline ends at 2010s) Reason: Timeline stops at 2010s; add 2020s trends and introduce openGauss (2020–) to keep the “history” section current. Link: Huawei announcement

  3. Insert after Page 25 (row order) Reason: Students often confuse relational theory with SQL behavior; add a short openGauss query demo to reinforce “ORDER BY is required for guaranteed order”. Demo snippet for the new slide:

    Link: openGauss ORDER BY

    Note: openGauss may run in different compatibility modes; in PostgreSQL-like mode the behavior matches PostgreSQL, but you should still treat order as unspecified without ORDER BY.

  4. Insert after Page 26 (duplicates) Reason: The slide states duplicates are forbidden, but SQL query results can contain duplicates; add openGauss demo to show DISTINCT and mention constraints (PRIMARY KEY, UNIQUE). Demo snippet for the new slide:

    Link: openGauss DISTINCT

  5. Insert after Page 36 (keys) Reason: Keys are introduced conceptually; add one openGauss DDL slide mapping keys to real constraints, so students see “concept → SQL”. Demo snippet for the new slide:

  6. Insert after Page 83 (ER modeling) Reason: ER is taught, but students need the bridge to SQL; add an openGauss “ER → DDL” example, especially M:N junction tables. Demo snippet for the new slide:

2.1.3 References


2.2 CS213 Lecture-02 CS213-DB-02.pdf

2.2.1 Slides Updates

2.2.1.1 Page 9

Fix

2.2.1.2 Page 17

Add after the “SQL standard exists” point

2.2.1.3 Pages 24–25

Replace with

2.2.1.4 Page 28

Replace with

2.2.1.5 Pages 30–31

Fix

Replace the date-time bullets with

2.2.1.6 Page 32

Replace the “bad CREATE TABLE” example with a clean version

2.2.1.7 Page 36

Add

2.2.1.8 Page 46

Add

2.2.1.9 Page 49

Update

2.2.1.10 Pages 50–53

Add after the foreign key definition

2.2.1.11 Page 60

Replace with

2.2.1.12 Page 63

Add

2.2.1.13 Pages 65–66

Replace with

2.2.2 openGauss insertion

  1. Insert after Page 17 (SQL standard vs dialects) Reason: Students need one concrete dialect target; openGauss is a PostgreSQL-like dialect and fits the lecture’s “dialects” message. Link: openGauss overview

  2. Insert after Pages 24–25 (identifiers and naming) Reason: Quoting and case rules vary by product; add openGauss guidance to avoid teaching “quoted identifiers everywhere” habits. Suggested note: “Prefer lowercase snake_case; avoid quoted names unless required.”

  3. Insert after Page 28 (NULL vs empty string) Reason: Oracle’s empty-string-as-NULL is a common confusion point; add a one-line openGauss contrast so students don’t generalize Oracle behavior.

  4. Insert after Pages 30–31 (date/time types) Reason: The lecture mentions DATETIME vs TIMESTAMP; add an openGauss mapping slide so students know what to use in openGauss.

  5. Insert after Pages 50–53 (constraints and foreign keys) Reason: This is where students decide “do I actually enforce rules?”; add openGauss examples to show CHECK/UNIQUE/FK are real and enforceable. Demo snippet for the new slide:

  6. Insert after Pages 65–66 (INSERT, quoting, dates) Reason: This is where students write lots of literals; add openGauss “safe quoting + ISO date” examples to reduce beginner errors. Demo snippet for the new slide:

2.2.3 References

2.3 CS213 Lecture-03 CS213-DB-03.pdf

2.3.1 Slides Updates

2.3.1.1 Page 9

Replace with

2.3.1.2 Page 11

Fix

Replace the derived-table example with a clean version

2.3.1.3 Page 13

Fix

Add

2.3.1.4 Page 14

Add

2.3.1.5 Page 15

Add

2.3.1.6 Page 23

Add

2.3.1.7 Page 26

Fix

Replace with

2.3.1.8 Page 27

Fix

Replace with

2.3.1.9 Pages 29–31

Replace with

2.3.1.10 Pages 33–36

Replace with

2.3.1.11 Page 55

Replace with

2.3.1.12 Page 57

Fix the concatenation examples

2.3.1.13 Page 58

Replace with

2.3.1.14 Page 65

Fix

2.3.1.15 Pages 72–73 and 78–79

Replace with

2.3.2 openGauss insertion

  1. Insert after Page 9 (string vs identifier quoting) Reason: Students copy quoting habits early; openGauss follows PostgreSQL-style quoting (single quotes for strings, double quotes for identifiers). Link: gsql Tool Reference

  2. Insert after Pages 26–27 (case-insensitive matching, performance) Reason: The lecture warns that upper(title) in WHERE is slow; openGauss provides ILIKE as the explicit solution. Also, openGauss Dolphin compatibility can change LIKE behavior, so teach ILIKE to be unambiguous. Demo snippet:

    Link: Mode Matching Operators Link: Dolphin Character Processing Functions and Operators

  3. Insert after Pages 29–31 (ISO dates + parsing) Reason: Date input defaults vary by DBMS; openGauss students should learn ISO literals and explicit parsing. Demo snippet:

    Link: Date and Time Functions and Operators

  4. Insert after Pages 33–36 (timestamp “whole-day” filtering) Reason: This is a frequent real-world bug; add one openGauss example showing the half-open interval pattern. Demo snippet:

    Link: Date and Time Functions and Operators

  5. Insert after Page 55 (how to inspect tables in openGauss) Reason: Students need an actionable “inspect schema” workflow for the course DBMS. Demo snippet:

    Link: gsql Tool Reference Link: Meta-Command Reference

  6. Insert after Pages 78–79 (CASE for NULL labeling) Reason: Connect CASE to a simple reporting task and reinforce correct NULL handling in openGauss. Demo snippet:

    Link: Expressions

2.3.3 References

2.4 CS213 Lecture-04 CS213-DB-04.pdf

2.4.1 Slides Updates

2.4.1.1 Page 3

Replace with

2.4.1.2 Page 7

Fix

Add

2.4.1.3 Pages 14–15

Replace with

2.4.1.4 Page 16

Update

2.4.1.5 Page 19

Add

2.4.1.6 Page 21

Replace with

2.4.1.7 Page 22

Add

2.4.1.8 Pages 23–27

Replace with

2.4.1.9 Pages 33 and 67

Fix

Replace with a complete version

2.4.1.10 Page 45

Fix

2.4.1.11 Page 43

Replace with

2.4.1.12 Page 49

Update

2.4.1.13 Page 53

Replace with

2.4.1.14 Pages 68–69

Replace with

2.4.2 openGauss insertion

  1. Insert after Page 6 (DISTINCT basics) Reason: The lecture introduces DISTINCT but lacks a concrete course DBMS demo; add an openGauss runnable example and reinforce “row-level distinctness”. Demo snippet:

    Link: openGauss DISTINCT

  2. Insert after Page 13 (GROUP BY syntax rule) Reason: Students commonly select non-aggregated columns that are not in GROUP BY; add an openGauss example showing the correct pattern. Demo snippet:

    Link: openGauss GROUP BY Clause

  3. Insert after Page 19 (HAVING vs WHERE) Reason: The lecture introduces HAVING, but students confuse it with WHERE; add a short openGauss “before vs after grouping” demo. Demo snippet:

    Link: openGauss HAVING Clause

  4. Insert after Page 22 (optimizer visibility) Reason: The slides mention optimizers differ; add one openGauss tool students can use to observe plans and costs. Demo snippet:

    Link: openGauss EXPLAIN

  5. Insert after Page 27 (COUNT differences with NULLs) Reason: The lecture explains the concept, but a small openGauss demo makes it stick and prevents common mistakes in analytics queries. Demo snippet:

    Link: openGauss Aggregate Functions

  6. Insert after Page 54 (JOIN with aliases as the default style) Reason: The lecture shows multiple syntaxes (NATURAL/USING/ON); standardize on the safest teaching pattern for openGauss labs. Demo snippet:

    Link: openGauss JOIN Clause

  7. Insert after Page 69 (avoid accidental Cartesian products) Reason: The lecture warns about forgetting join predicates; add an openGauss-specific habit: always write explicit joins, and use CROSS JOIN only intentionally. Demo snippet:

    Link: openGauss JOIN Clause

2.4.3 References

2.5 CS213 Lecture-05 CS213-DB-05.pdf

2.5.1 Slides Updates

2.5.1.1 Page 10

Replace with

2.5.1.2 Page 16

Replace with

2.5.1.3 Page 18

Replace with

2.5.1.4 Pages 21–27

Add

2.5.1.5 Page 29

Replace with a simpler “correct” solution

2.5.1.6 Page 32

Replace with

2.5.1.7 Pages 35–36

Add

2.5.1.8 Pages 38–41

Replace the placeholder subquery with a complete version

2.5.1.9 Page 47

Fix the broken INTERSECT example

2.5.1.10 Pages 44 and 50

Add

2.5.1.11 Page 66

Replace with

2.5.1.12 Pages 76–79

Add

2.5.1.13 Page 81

Replace with

2.5.1.14 Page 85

Add

2.5.2 openGauss insertion

  1. Insert after Page 10 (outer join types in openGauss) Reason: The slide claims only LEFT OUTER JOIN matters; openGauss supports LEFT/RIGHT/FULL, so students should learn the full vocabulary and the “rewrite RIGHT as LEFT” habit. Link: JOIN (CROSS/INNER/LEFT/RIGHT/FULL)

  2. Insert after Page 16 (COALESCE in openGauss) Reason: This is the practical “show 0 instead of NULL” moment; add the official openGauss function reference to match what students will run in labs. Demo snippet:

    Link: COALESCE

  3. Insert after Page 27 (outer join filter placement in openGauss) Reason: The “outer join killer” is a frequent beginner bug; show the correct openGauss pattern: move the filter into ON. Demo snippet:

    Link: SELECT (JOIN nesting rules)

  4. Insert after Page 36 (UNION vs UNION ALL in openGauss) Reason: Students often default to UNION and accidentally lose rows; openGauss supports UNION ALL and documents the required column/type matching. Link: UNION Clause

  5. Insert after Page 51 (INTERSECT/EXCEPT naming and use in openGauss) Reason: The slides mention EXCEPT / MINUS; openGauss uses EXCEPT and supports set-operation type resolution similarly to UNION. Link: UNION, CASE, and Related Constructs

  6. Insert after Page 79 (safe anti-join: NOT EXISTS in openGauss) Reason: NOT IN + NULL is a real production pitfall; openGauss documents EXISTS/NOT EXISTS directly. Demo snippet:

    Link: Subquery Expressions (EXISTS/NOT EXISTS)

2.5.3 References

2.6 CS213 Lecture-06 CS213-DB-06.pdf

2.6.1 Slides Updates

2.6.1.1 Pages 3–9

Add

2.6.1.2 Page 11

Replace with

2.6.1.3 Page 14

Replace with

2.6.1.4 Page 15

Add

Example to add:

2.6.1.5 Pages 17–18

Replace with

2.6.1.6 Pages 22–26

Add

2.6.1.7 Pages 33–35

Replace with

Add

2.6.1.8 Page 37

Update

2.6.1.9 Pages 52–53

Fix (incomplete queries): Replace with a complete version

Add

2.6.1.10 Pages 60 and 67

Fix (broken SQL layout): Replace with a clean version

Fix (incomplete “top 2 per country” query): Replace with

2.6.2 openGauss insertion

  1. Insert after Page 11 (NULL ordering) Reason: The slides say NULL ordering is DBMS-dependent; add an openGauss demo showing explicit NULLS FIRST/LAST to avoid surprises across systems. Demo snippet for the new slide:

Link: openGauss ORDER BY Clause

  2. Insert after Page 14 (collation) Reason: The slide uses Oracle-flavored examples; add an openGauss-friendly note that PostgreSQL-like systems use COLLATE, and collation availability depends on installed locales. Suggested note: “Use COLLATE when locale-specific ordering matters; test on the target server.”

  3. Insert after Pages 22–26 (top-N and pagination) Reason: Students will implement paging immediately; add an openGauss example for LIMIT/OFFSET and a brief warning about large offsets. Demo snippet for the new slide:

Link: openGauss ORDER BY Clause

  4. Insert after Pages 33–35 (hierarchies) Reason: The lecture shows CONNECT BY (Oracle-specific) and mentions recursive queries next; add one openGauss slide that uses WITH RECURSIVE as the portable approach. Demo snippet for the new slide:

  5. Insert after Pages 52–53 (percentage of total) Reason: The slide’s main query is incomplete; add a runnable openGauss version so students can copy/paste and verify sum(...) over (). Demo snippet for the new slide: use the completed query from Section 2.6.1.9.

  6. Insert after Pages 60–67 (top-N per group) Reason: This is a high-value pattern (“top K per category”); add an openGauss slide with a clean row_number() solution and one sentence on tie behavior (row_number vs dense_rank). Demo snippet for the new slide: use the completed query from Section 2.6.1.10.

2.6.3 References

2.7 CS213 Lecture-07 CS213-DB-07.pdf

2.7.1 Slides Updates

2.7.1.1 Pages 8–12

Add

2.7.1.2 Page 9

Fix

2.7.1.3 Page 20

Fix

2.7.1.4 Page 21

Fix

2.7.1.5 Page 23

Change

2.7.1.6 Page 26

Replace with

2.7.1.7 Page 28

Fix

Add

2.7.1.8 Page 29

Fix

Add

2.7.1.9 Page 37

Replace with

2.7.1.10 Page 38

Add

2.7.1.11 Pages 54–59

Replace with

2.7.1.12 Page 68

Fix

2.7.1.13 Pages 70–73, 80–82

Fix

2.7.2 openGauss insertion

  1. Insert after Page 12 (full-text search overview) Reason: The lecture builds a manual word-index table; add one slide showing openGauss has built-in full-text search operators/functions so students see the “real” approach. (docs.opengauss.org) Link: openGauss Text Search Functions and Operators Demo snippet for the new slide:

  2. Insert after Page 28 (autocommit warning) Reason: Students using openGauss should practice explicit transactions in gsql, aligned with the “Beware of autocommit” message. (docs.opengauss.org) Link: openGauss gsql Tool Demo snippet for the new slide:
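
A possible snippet (the account table is illustrative):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;        -- or ROLLBACK; to undo both statements together
```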

  3. Insert after Page 29 (DDL and commits differ by DBMS) Reason: The slide highlights portability risk; add one openGauss example to show PostgreSQL-like “DDL is transactional” behavior in a concrete way. Demo snippet for the new slide:
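
A possible snippet showing transactional DDL (behavior should be verified on the target openGauss server):

```sql
START TRANSACTION;
CREATE TABLE ddl_demo (id integer);
ROLLBACK;                    -- the CREATE TABLE is undone with the transaction
SELECT * FROM ddl_demo;      -- now fails: relation "ddl_demo" does not exist
```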

  4. Insert after Page 46 (sequence function syntax) Reason: The lecture lists multiple syntaxes; add the openGauss function form and one key rule (currval/lastval require nextval in the session). (docs.opengauss.org) Link: openGauss Sequence Functions
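
A possible demo for this slide:

```sql
CREATE SEQUENCE order_seq;
SELECT nextval('order_seq');   -- advances the sequence and returns the value
SELECT currval('order_seq');   -- valid only after nextval in this session
SELECT lastval();              -- most recent nextval result in this session
```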

  5. Insert after Page 59 (retrieving generated IDs) Reason: Replace the “last generated value” habit with the clearer, safer openGauss pattern: RETURNING. (docs.opengauss.org) Link: openGauss INSERT Statement Demo snippet for the new slide:
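
A possible snippet (table and column names are illustrative):

```sql
INSERT INTO film (title) VALUES ('New Film')
RETURNING film_id;   -- the generated key comes back with the INSERT itself
```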

  6. Insert after Page 82 (PostgreSQL COPY / psql \copy) Reason: The lecture mentions PostgreSQL COPY and \copy; add the openGauss gsql \copy equivalent so students can import local files without server-side file access. (docs.opengauss.org) Link: openGauss gsql \copy Meta-Command Demo snippet for the new slide:
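
A possible snippet (file and table names are illustrative):

```sql
-- In gsql: client-side copy reads the file on the student's machine,
-- so no server-side file permissions are needed.
\copy film FROM 'film.csv' CSV HEADER
```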

2.7.3 References

2.8 CS213 Lecture-08 CS213-DB-08.pdf

2.8.1 Slides Updates

2.8.1.1 Page 3

Fix

2.8.1.2 Page 4

Fix

2.8.1.3 Page 8

Replace with

2.8.1.4 Pages 10–12

Add

2.8.1.5 Pages 15–16

Fix

2.8.1.6 Page 24

Replace with

2.8.1.7 Pages 25–26

Replace with

2.8.1.8 Pages 27–29

Tighten

2.8.1.9 Page 33

Add

2.8.1.10 Page 35

Replace with

2.8.1.11 Pages 36–39

Fix

Add

2.8.1.12 Page 41

Replace with

2.8.1.13 Pages 44–47

Fix

2.8.1.14 Pages 49–55

Replace with

2.8.1.15 Page 56

Replace with

2.8.2 openGauss insertion

  1. Insert after Page 12 (set-based update patterns) Reason: The slides focus on correlated subqueries; add the openGauss-friendly UPDATE ... FROM pattern as the default approach. Link: UPDATE Statement
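
A possible sketch of the pattern (film and film_stats are illustrative names):

```sql
-- Set-based update driven by a joined source table,
-- instead of a per-row correlated subquery.
UPDATE film f
SET popularity = s.popularity
FROM film_stats s
WHERE s.film_id = f.film_id;
```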

  2. Insert after Page 24 (duplicate join matches) Reason: Students need one concrete rule: ensure the join keys are unique, or the update result can be unpredictable. Show “add UNIQUE or pre-aggregate” as the fix. Link: UPDATE Statement

  3. Insert after Page 27 (MERGE as standard UPSERT) Reason: Confirm MERGE support and highlight the “match condition should be key-like” rule. Link: MERGE INTO Statement Guide
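
A possible MERGE sketch (staging_film and the listed columns are illustrative):

```sql
MERGE INTO film f
USING staging_film s
  ON (f.tmdb_id = s.tmdb_id)          -- match condition should be key-like
WHEN MATCHED THEN
  UPDATE SET title = s.title, popularity = s.popularity
WHEN NOT MATCHED THEN
  INSERT (tmdb_id, title, popularity)
  VALUES (s.tmdb_id, s.title, s.popularity);
```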

  4. Insert after Page 29 (dialect differences for UPSERT) Reason: The lecture shows MySQL/SQLite syntax; add the openGauss angle: ON DUPLICATE KEY UPDATE requires a unique constraint or unique index. Link: INSERT Guide

  5. Insert after Page 35 (TRUNCATE safety) Reason: The rollback claim is risky; add a TRUNCATE slide focused on what it does (fast page release + strong locks) and when to use it. Link: TRUNCATE Guide

  6. Insert after Page 47 (functions and procedures) Reason: The lecture introduces PL/pgSQL and procedures; add one minimal example for each so students can run them immediately. Link: PL/pgSQL Functions Link: CREATE PROCEDURE
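
A possible pair of minimal examples (app_log is an illustrative table; the procedure syntax follows the openGauss Oracle-style form and should be checked against the server's compatibility mode):

```sql
-- Minimal function: computes and returns a value.
CREATE OR REPLACE FUNCTION add_one(x integer) RETURNS integer AS $$
BEGIN
  RETURN x + 1;
END;
$$ LANGUAGE plpgsql;

SELECT add_one(41);

-- Minimal procedure: performs an action, invoked with CALL.
CREATE OR REPLACE PROCEDURE log_msg(msg text)
AS
BEGIN
  INSERT INTO app_log(message) VALUES (msg);
END;
/

CALL log_msg('hello');
```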

2.8.3 References

2.9 CS213 Lecture-09 CS213-DB-09.pdf

2.9.1 Slides Updates

2.9.1.1 Page 1

Fix

2.9.1.2 Page 3

Replace with

2.9.1.3 Page 6

Replace with

2.9.1.4 Page 9

Add

2.9.1.5 Pages 14–16

Update

Suggested replacement snippet:

2.9.1.6 Page 17

Replace with

2.9.1.7 Pages 19–25

Fix

2.9.1.8 Pages 38–40

Replace with

2.9.1.9 Page 43

Replace with

Suggested corrected table:

2.9.1.10 Page 44

Fix

2.9.1.11 Page 47

Update

2.9.1.12 Pages 53–59

Replace with

2.9.1.13 Pages 63–67

Replace with

2.9.2 openGauss insertion

  1. Insert after Page 3 (functions vs procedures) Reason: The slide text implies PostgreSQL has only functions; add a concrete openGauss target model: functions vs procedures and how students should call each. Link: CREATE PROCEDURE Link: CALL

  2. Insert after Page 6 (transaction boundaries) Reason: Students often try to “COMMIT inside a function”; add an openGauss note: use procedures for business operations and keep set-based SQL inside. Suggested note: “Use procedures for multi-step operations; keep functions lightweight and side-effect controlled.”

  3. Insert after Pages 14–16 (safe ID retrieval) Reason: Replace lastval() habit with the safer pattern students can copy: INSERT ... RETURNING in openGauss (PostgreSQL-like). Link: INSERT Demo snippet for the new slide:
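
A possible snippet (table and column names are illustrative):

```sql
INSERT INTO film (title) VALUES ('Another Film')
RETURNING film_id;   -- safer than lastval(): tied to this exact statement
```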

  4. Insert after Page 17 (calling syntax) Reason: The slide suggests calling “procedures” via SELECT; add openGauss examples that distinguish SELECT fn() vs CALL proc(). Demo snippet for the new slide:
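
A possible snippet (add_one and refresh_stats are hypothetical routines used only to contrast the two calling styles):

```sql
SELECT add_one(41);      -- functions are invoked inside queries
CALL refresh_stats();    -- procedures are invoked with CALL
```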

  5. Insert after Page 32 (trigger timing and NEW/OLD) Reason: When students implement triggers, they need one runnable openGauss example showing NEW / OLD and timing choice. Link: CREATE TRIGGER
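
A possible runnable sketch (film and updated_at are illustrative; PostgreSQL-style trigger syntax, to be verified on openGauss):

```sql
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := now();   -- NEW is the row about to be written
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_film_touch
BEFORE UPDATE ON film        -- BEFORE triggers may modify NEW; AFTER may not
FOR EACH ROW
EXECUTE PROCEDURE touch_updated_at();
```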

  6. Insert after Pages 43–47 (auditing pattern) Reason: The lecture demonstrates auditing in PostgreSQL; add an openGauss-ready version emphasizing TIMESTAMP, current_user, and one-row-per-changed-column design. Link: PL/pgSQL

2.9.3 References

2.10 CS213 Lecture-10 CS213-DB-10.pdf

2.10.1 Slides Updates

2.10.1.1 Page 6

Replace with

2.10.1.2 Pages 11–15

Update

2.10.1.3 Page 12

Clarify

2.10.1.4 Pages 18–20

Add

2.10.1.5 Page 21

Update

2.10.1.6 Pages 32–33

Add

2.10.1.7 Page 34

Improve

2.10.1.8 Pages 44–46

Add

2.10.1.9 Page 51

Add

2.10.1.10 Page 80

Add

2.10.1.11 Pages 96–102

Fix + Clarify

2.10.2 openGauss insertion

  1. Insert after Page 21 (buffering and read-ahead) Reason: Students learn storage hierarchy best when they can map it to real DB configuration knobs in a concrete system. Demo snippet for the new slide:
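
A possible snippet (parameter names follow PostgreSQL-style GUCs; confirm them on the target openGauss server):

```sql
SHOW shared_buffers;   -- size of the database buffer pool
SHOW work_mem;         -- per-operation memory for sorts and hashes
```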

  2. Insert after Page 33 (slotted pages and page size) Reason: The lecture discusses pages conceptually; a one-line query makes “page size” real and testable in openGauss. Demo snippet for the new slide:
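
A possible one-liner for the slide:

```sql
SELECT current_setting('block_size');   -- typically 8192: 8 KB pages
```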

  3. Insert after Page 46 (PK/UNIQUE create indexes “behind your back”) Reason: This is a perfect place to prove it with a catalog query in openGauss, reinforcing that constraints usually imply indexes. Demo snippet for the new slide:
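
A possible catalog-query demo (pk_demo is an illustrative throwaway table):

```sql
CREATE TABLE pk_demo (id integer PRIMARY KEY, code text UNIQUE);

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'pk_demo';
-- Two indexes appear even though none was created explicitly.
```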

  4. Insert after Page 80 (EXPLAIN) Reason: Students need a repeatable workflow for performance: write query → EXPLAIN → change index/query → re-check. openGauss is a practical default target. Demo snippet for the new slide:
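
A possible workflow sketch (table and column names are illustrative):

```sql
EXPLAIN ANALYZE
SELECT * FROM film WHERE release_year = 2019;

CREATE INDEX idx_film_year ON film (release_year);

EXPLAIN ANALYZE                      -- re-check: the plan should change
SELECT * FROM film WHERE release_year = 2019;
```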

  5. Insert after Pages 96–102 (LIKE/functions/date parts/implicit casts) Reason: The slides correctly warn about “dead indexes”, but students also need the standard fixes in a real DBMS: expression indexes, range predicates, and explicit casting. Demo snippet for the new slide:
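
A possible snippet collecting the standard fixes (names are illustrative):

```sql
-- A function on the indexed column disables the index...
--   SELECT * FROM film WHERE lower(title) = 'alien';
-- ...unless an expression index matches the predicate exactly:
CREATE INDEX idx_film_title_lower ON film (lower(title));

-- Date-part filters become index-friendly when rewritten as ranges:
SELECT * FROM film
WHERE release_date >= DATE '2019-01-01'
  AND release_date <  DATE '2020-01-01';
```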

2.10.3 References

2.11 CS213 Lecture-11 CS213-DB-11.pdf

2.11.1 Slides Updates

2.11.1.1 Page 5

Fix the syntax layout of CREATE VIEW (column list placement). Replace with

2.11.1.2 Page 10

Reorder the example so it is executable SQL. Replace with

2.11.1.3 Page 14

Polish wording + make the “SELECT *” warning more explicit.

2.11.1.4 Pages 18–19

Tone cleanup (remove insulting phrasing).

2.11.1.5 Page 23

Grammar fix.

2.11.1.6 Page 25

Add one mitigation sentence after the “function column breaks index usage” point.

2.11.1.7 Page 43

Fix REVOKE syntax (use FROM, not TO). Replace with

2.11.1.8 Page 48

Make the “schema usage” remark concrete with one example.

2.11.1.9 Page 51

Complete the GRANT statement.

2.11.1.10 Pages 52–55

Fix the “current user” function to be portable in PostgreSQL-like systems.

2.11.1.11 Page 54

Clarify the “don’t drop & recreate” warning with the exact action.

2.11.1.12 Page 58

Replace “alter view or replace view” with precise commands.

2.11.1.13 Pages 59–61

Add a 1-slide takeaway (security limitation).


2.11.2 openGauss insertion

2.11.2.1 Insert after Page 5 (View syntax)

Insert content: openGauss CREATE VIEW / CREATE OR REPLACE VIEW syntax and a minimal example. Reason: anchors the lecture’s generic SQL to the course engine. Official link: openGauss CREATE VIEW
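
A possible minimal example (table and column names are illustrative):

```sql
CREATE OR REPLACE VIEW v_recent_films (film_id, title, release_date) AS
SELECT film_id, title, release_date
FROM film
WHERE release_date >= DATE '2019-01-01';
```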

2.11.2.2 Insert after Page 25 (Computed expressions and indexes)

Insert content: expression index idea for computed predicates (when supported by the engine and function properties allow). Reason: provides a concrete “what to do next” for the performance pitfall. Official link: openGauss CREATE INDEX

2.11.2.3 Insert after Pages 35–36 (Views-on-views performance)

Insert content: materialized view as an option for expensive “reporting views” + refresh concept. Reason: offers a standard performance escape hatch when the view is intentionally heavy. Official link: openGauss CREATE MATERIALIZED VIEW
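
A possible sketch (names are illustrative; refresh behavior should be verified against the openGauss docs):

```sql
CREATE MATERIALIZED VIEW mv_films_per_year AS
SELECT release_year, count(*) AS n_films
FROM film
GROUP BY release_year;

REFRESH MATERIALIZED VIEW mv_films_per_year;   -- recompute on demand
```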

2.11.2.4 Insert after Pages 43–49 (Privileges)

Insert content: openGauss GRANT/REVOKE syntax reference + schema privilege note. Reason: students can immediately map the lecture’s conceptual DCL to exact commands. Official links:

2.11.2.5 Insert after Page 54 (Modifying a view)

Insert content: “don’t drop; use replace” workflow and how to verify a view definition (system catalogs / built-in helpers). Reason: matches the slide warning with a practical checklist. Official link: openGauss CREATE VIEW


2.11.3 References

2.12 CS213 Lecture-12 CS213-DB-12.pdf

2.12.1 Slides Updates

2.12.1.1 Page 4

Fix

Add

2.12.1.2 Page 6

Polish

2.12.1.3 Pages 6–8

Replace with

2.12.1.4 Pages 10–11

Refine

2.12.1.5 Pages 12–13

Replace

Clarify

2.12.1.6 Pages 16–18

Add

2.12.1.7 Pages 19–22

Improve

2.12.1.8 Page 25

Fix

Add

2.12.1.9 Pages 28–29

Fix

2.12.1.10 Page 30

Replace with

2.12.1.11 Page 32

Fix

Add

2.12.1.12 Page 39

Add

2.12.2 openGauss insertion

  1. Insert after Page 8 (updating a join view) Reason: The lecture shows “SQL Server vs most products”; add one concrete openGauss outcome so students learn what happens in the course DBMS. Link: openGauss CREATE VIEW

  2. Insert after Pages 12–13 (user-dependent filtering in a view) Reason: The example uses user; replace with CURRENT_USER and show the openGauss-friendly pattern for per-user filtering. Demo snippet for the new slide:

    Link: openGauss CREATE VIEW
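
A possible snippet (rental and username are illustrative):

```sql
CREATE OR REPLACE VIEW v_my_rentals AS
SELECT r.*
FROM rental r
WHERE r.username = CURRENT_USER;   -- CURRENT_USER is the portable spelling
```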

  3. Insert after Page 20 (WITH CHECK OPTION) Reason: This is the exact point where students see “disappearing rows”; add openGauss syntax + a 2-line insert/update demo that fails under CHECK OPTION. Link: openGauss CREATE VIEW
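
A possible demo (the price column is illustrative; CHECK OPTION support should be verified on the target server):

```sql
CREATE VIEW v_cheap_films AS
SELECT * FROM film WHERE price < 10
WITH CHECK OPTION;

INSERT INTO v_cheap_films (title, price)
VALUES ('Too Expensive', 50);      -- rejected: the row would escape the view
```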

  4. Insert after Pages 25–26 (INSTEAD OF triggers) Reason: The slides introduce INSTEAD OF triggers conceptually; add openGauss “how you actually do it” to connect the idea to practice. Links:

  5. Insert after Pages 33–37 (data dictionary access) Reason: The lecture names information_schema and pg_... views; add 2–3 catalog queries students can run in openGauss to list tables/views/columns. Suggested demo snippet:
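
A possible set of catalog queries (the 'film' table name is illustrative):

```sql
-- List tables in the public schema.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';

-- List views and their definitions.
SELECT table_name, view_definition
FROM information_schema.views
WHERE table_schema = 'public';

-- List the columns of one table.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'film';
```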

2.12.3 References

2.13 CS213 Lecture-13 CS213-DB-13.pdf

2.13.1 Slides Updates

2.13.1.1 Page 3

Replace with

2.13.1.2 Page 11

Add

2.13.1.3 Page 14

Replace with

2.13.1.4 Pages 15–16

Add

2.13.1.5 Pages 22–26

Add a note (terminology clarity)

2.13.1.6 Pages 30–31

Fix

2.13.1.7 Page 32

Replace with

2.13.1.8 Page 36

Replace with

2.13.1.9 Page 38

Fix (wording)

2.13.1.10 Page 43

Replace with

2.13.1.11 Pages 46–53

Add (one slide)


2.13.2 openGauss insertion

  1. Insert after Page 11 (cost estimation and catalog statistics) Reason: The slide introduces “statistics from the catalog”; add one practical openGauss workflow so students can connect “stats → plan choice”. Link: openGauss EXPLAIN Link: openGauss ANALYZE Demo snippet for the new slide:
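
A possible workflow snippet (table and column names are illustrative):

```sql
ANALYZE film;        -- refresh the optimizer's catalog statistics
EXPLAIN
SELECT * FROM film WHERE release_year = 2019;   -- plan reflects the new stats
```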

  2. Insert after Page 18 (Seq Scan vs Index Scan wording) Reason: The slide names PostgreSQL operators; openGauss uses the same style of plan nodes, so one EXPLAIN screenshot/snippet makes the terminology stick. Link: openGauss EXPLAIN Demo snippet for the new slide:
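
A possible contrast pair (names are illustrative; actual plan choices depend on table size and statistics):

```sql
EXPLAIN SELECT * FROM film WHERE film_id = 42;       -- expect an Index Scan
EXPLAIN SELECT * FROM film WHERE title LIKE '%man';  -- expect a Seq Scan
```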

  3. Insert after Pages 22–26 (clustered vs non-clustered indexes) Reason: These slides can mislead students into assuming “clustered index” is universal; add an openGauss note that physical clustering is not the default and is a separate operation. Link: openGauss CLUSTER

  4. Insert after Pages 32–34 (parsing, cache, and reuse) Reason: The lecture talks about caching parsed queries; openGauss examples should show the practical entry point: prepared statements (server-side) plus repeated EXECUTE. Link: openGauss PREPARE Demo snippet for the new slide:
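
A possible snippet (names are illustrative):

```sql
PREPARE films_by_year (integer) AS
  SELECT count(*) FROM film WHERE release_year = $1;

EXECUTE films_by_year(2018);
EXECUTE films_by_year(2019);   -- re-executes without re-parsing the statement
```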

  5. Insert after Pages 48–50 (two-phase commit) Reason: The lecture introduces 2PC conceptually; add an openGauss “prepared transaction” snippet so students see what 2PC looks like in SQL. Link: openGauss PREPARE TRANSACTION Demo snippet for the new slide:
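
A possible sketch (the account table is illustrative; prepared transactions typically require max_prepared_transactions > 0 on the server):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'tx_demo';   -- phase 1: vote and persist the prepared state
COMMIT PREPARED 'tx_demo';       -- phase 2 (or: ROLLBACK PREPARED 'tx_demo')
```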


2.13.3 References


3. Course Content Proposal

3.1 Distributed and Cloud-Native Databases Track

3.1.1 Replication, Read/Write Splitting, and Consistency Expectations

3.1.2 Distributed Transactions: 2PC, Failure Modes, and Saga Alternatives

3.1.3 Elasticity and Cost–Performance Co-Optimization in Cloud Deployments

3.1.4 HTAP and Real-Time Analytics: OLTP Meets OLAP


3.2 AI-Era Databases and Retrieval Track

3.2.2 Vector Search and Hybrid Retrieval: Concepts and Course-Level Practice

3.2.3 AI-Assisted Database Engineering Workflow: Verified, Reproducible, and Safe

3.2.4 Observability and Performance Regression: Evidence-Driven Debugging

3.2.5 Mini-Project Option: Search Feature for the Course Database

3.3 Privacy, Compliance, and Data Ethics Track

3.3.1 Data Classification and Minimization

3.3.2 Access Control and Auditability

3.3.3 De-identification, Masking, and Encryption Basics

3.3.4 Data Lifecycle: Retention, Deletion, and “Right to be Forgotten”


3.4 Database Low-Level Operating Principles Track

3.4.1 On-Disk Storage: Pages, Tuples, and Slotted Layout

3.4.2 Buffer Manager and I/O: Why Memory Settings Matter

3.4.3 WAL and Crash Recovery: Durability in Practice

3.4.4 Concurrency Control: MVCC, Locks, and Vacuum/GC

3.4.5 Query Execution Engine: Operators and Plan-to-Performance Mapping

3.4.6 Capstone Mini-Lab: “From Bytes to Behavior”

3.5 Summary

Overall, this proposal emphasizes three complementary directions for upgrading the course. First, it adds modern database topics—replication, cloud-native cost–performance trade-offs, HTAP, and practical retrieval—so students can connect core SQL skills to how real systems are deployed today. Second, it strengthens “how databases work” understanding by introducing low-level operating principles (storage pages, buffer management, WAL/recovery, MVCC/locking, and execution operators), so performance and correctness become explainable rather than a matter of trial and error. Third, it formalizes responsible AI usage: students may use AI tools to draft ideas, but must verify outputs against official documentation, minimal reproducible tests, and evidence-driven performance evaluation, ensuring the final work is accurate, safe, and maintainable.

Summary

In this report, we upgrade FilmDB and the CS213 course materials along three aligned directions.

We modernize FilmDB into a “run-any-day” update pipeline: a daily Kaggle TMDB snapshot provides the baseline, the official TMDB API supplies incremental deltas, and staging-based merges plus integrity checks keep the database current beyond the original 2019 cutoff. We preserve legacy compatibility while introducing stable identifiers (e.g., tmdb_id), deterministic deduplication, run logging, and two safe deliverables (filmdb_pg.sql and filmdb_og.sql) to reduce dialect risk.

We review the lecture slides for correctness and teachability by fixing errors and incomplete SQL, tightening misleading claims, standardizing safer patterns, and adding openGauss insertions with official references so students have a consistent concrete DBMS target.

We propose modern course modules covering cloud/distributed databases, AI-era retrieval and evidence-driven performance work, privacy/compliance engineering, and low-level database internals—so students learn both modern deployments and the underlying mechanisms, while using AI tools responsibly through verification and reproducible tests.