Database playground¶
info
- Updated 8/20/2022 - postgres content
- Updated 06/2025 - Puzzles and Postgresql on k8s
This repository supports simple studies for database playground (SQL, JPA) using technologies like db2, postgresql. duckdb. It is completed by study of some change data capture to move data from database to kafka topic.
SQL basics¶
In the SQL language we can differentiate the Data Definition Language (DDL), which are statements to create, change, or deleting tables, from Data Modification Language (DML), which are used to define statements to modify the data and to do not change the metadata.
The figure below illustrates some common basic patterns.
For deeper tutorial see the sqltutorial.org site. The Employee DDL is in DDL folder.
Examples of some basic commands applied to customers table.
select * from customers;
select distinct(name) from customers;
select count(distinct(rate)) from films;
-- how many customer has the name bob and are older than 18
select count(*) from customer where name = 'bob' and age >= 18;
-- sort by salary
select name, salary from customers order by salary DESC;
-- limit the number of records returned
SELECT name, salary FROM customers ORDER BY salary DESC LIMIT 10;
-- BETWEEN
select count(*) from payment where amount between 8 and 9;
select * from payment where payment_date between '2007-02-01' and '2007-2-15';
-- IN to test value in a list of options
select count(*) from payment where amount in(0.99, 1.98, 1.99);
-- LIKE and ILIKE (case non-sensitive) to do pattern matching on string
select * from customer where first_name ilike 'J%';
-- Modify the table
alter table customers add column email varchar(100);
-- update a unique record
update customers set email='max@email.com' where id = 4;
-- Aggregate min, max, avg, count,...
select round(avg(replacement_code),3) from film;
-- GROUP BY combined with aggregate. Who is the customer spending the most
select customer_id, sum(amount) from payment group by customer_id order by sum(amount) DESC;
-- get the day with the most transactions
select DATE(payment_date), sum(amount) from payment group by DATE(payment_date) order by sum(amount) DESC;
-- HAVING to allow us to use the aggregate result to filter the result along with group by
select customer_id, sum(amount) from payment
where staff_id = 2
group by customer_id having sum(amount) >= 110;
See also postgres study for information to run SQL on a local postgresql started with docker compose.
Exercises on the dvdrental database¶
See postgres to restore the database schema and data from the tar file.
-
How many payment transactions were greater than $5.00?
-
How many actors have a first name that starts with the letter P?
* How many unique districts are our customers from? -
How many films have a rating of R and a replacement cost between $5 and $15?
-
How many films have the word Truman somewhere in the title?
-
Which staff member processes the biggest number of transactions?
-
What is the avg replacement cost per film rating?
-
Customer eligible for platinum status having more than 40 transactions
-
Customer who spent more than 100$ with a given staff
Intermediate SQL¶
- Protect against null for value and substitute using COALESCE
Play with timestamp¶
-- get the year of a date
select extract(year from payment_date) as myyear from payment;
-- can use month, quarter, month, day
-- get how old is a record
select AGE(payment_date) from payment;
--
select TO_CHAR(payment_date, "MM-DD-YYYY") from payment;
Mathematical functions¶
Compute things from columns. Examples from postgresql
- Get list of students who scored better than average grade
-- avg grade
select AVG(grade) from test_scores
-- solution with subquery
select student, grade from test_scrores
where grade > ( select AVG(grade) from test_scores);
Self-join¶
A query in which table is joined to itself: used to compare values in a column of rows within the same table. Need to use aliases. Get the employe's name and the name of his manager:
select emp.name, report.name as manager from employees as emp
join employees as report
on emp.emp_id = report.report_id
- find the films with the same length presented as pair
from f1.title, f2.title, f1.length from film as f1
inner join film as f2
on f1.film_id != f2.film_id and f1.length = f2.length
- Compare the various amounts of films per movie rating
select
sum(CASE rating
WHEN 'R' THEN 1
ELSE 0
END) as R,
sum(CASE rating
WHEN 'PG' THEN 1
ELSE 0
END) as PG,
sum(CASE rating
WHEN 'PG-13' THEN 1
ELSE 0
END) as PG13
from film
Add views¶
It is a stored query to be executed often.
Table creation¶
For primary key try to using numerical type and Postgresl sequence like:
Use CHECK to put constraint on column and between column
Alter table¶
Alter table in postgresql - doc
Common queries to clean data¶
Remove duplicates¶
-- search for same transaction
SELECT transaction_id, COUNT(transaction_id)
FROM Transaction__Table
GROUP BY transaction_id
HAVING COUNT(transaction_id) > 1;
--
Quarkus with panache and DB2¶
The project is inventory app and the interesting part is the property file
quarkus.datasource.username=cbm39816
quarkus.datasource.password=
quarkus.datasource.url=jdbc:db2://dashdb-txn-sbox-yp-dal09-03.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;
quarkus.datasource.max-size=8
quarkus.datasource.min-size=2
quarkus.datasource.driver=com.ibm.db2.jcc.DB2Driver
quarkus.hibernate-orm.dialect=org.hibernate.dialect.DB2Dialect
quarkus.hibernate-orm.database.generation=drop-and-create
quarkus.hibernate-orm.database.default-schema=CBM39816
quarkus.hibernate-orm.log.sql=true
quarkus.hibernate-orm.sql-load-script=import.sql
quarkus.smallrye-openapi.path=/inventory/swagger
quarkus.swagger-ui.always-include=true