Skip to content

Database playground

info

  • Updated 8/20/2022 - postgres content
  • Update 11/2024

This repository supports simple studies for database playground (SQL, JPA) using technologies like db2, postgresql or mysql. It is completed by study of some change data capture to move data from database to kafka topic.

SQL basics

In the SQL language we can differentiate the Data Definition Language (DDL), which are statements to create, change, or deleting tables, from Data Modification Language (DML), which are used to define statements to modify the data and to do not change the metadata.

The figure below illustrates some common basic patterns.

For deeper tutorial see the sqltutorial.org site. The Employee DDL is in DDL folder.

Examples of some basic commands applied to customers table.

select * from customers;
select distinct(name) from customers;
select count(distinct(rate)) from films;

-- how many customer has the name bob and are older than 18
select count(*) from customer where name = 'bob' and age >= 18;
-- sort by salary
select name, salary from customers order by salary DESC;

-- limit the number of records returned
SELECT name, salary FROM customers ORDER BY salary DESC LIMIT 10;

-- BETWEEN
select count(*) from payment where amount between 8 and 9;
select * from payment where payment_date between '2007-02-01' and '2007-2-15';

-- IN to test value in a list of options
select count(*) from payment where amount in(0.99, 1.98, 1.99);

-- LIKE and ILIKE (case non-sensitive) to do pattern matching on string
select * from customer where first_name ilike 'J%';

-- Modify the table
alter table customers add column email varchar(100);
-- update a unique record
update customers  set email='max@email.com' where id = 4;

-- Aggregate min, max, avg, count,...
select round(avg(replacement_code),3) from film;

-- GROUP BY combined with aggregate. Who is the customer spending the most
select customer_id, sum(amount) from  payment group by customer_id order by sum(amount) DESC;

-- get the day with the most transactions
select DATE(payment_date), sum(amount) from  payment group by  DATE(payment_date) order by sum(amount) DESC;

-- HAVING to allow us to use the aggregate result to filter the result along with group by
select customer_id, sum(amount) from  payment  
where staff_id = 2 
group by  customer_id having sum(amount) >= 110;

See also postgres study for information to run SQL on a local postgresql started with docker compose.

Exercises on the dvdrental database

See postgres to restore the database schema and data from the tar file.

  • How many payment transactions were greater than $5.00?

    select count(*) from payment where amount > 5
    
  • How many actors have a first name that starts with the letter P?

    select count(first_name) from actor where first_name like 'P%';
    
    * How many unique districts are our customers from?

    select count(distinct(district)) from address;
    -- Retrieve the list of names for those distinct districts 
    select distinct(district) from address where district is not null and district != '' order by district asc;
    
  • How many films have a rating of R and a replacement cost between $5 and $15?

    select count(*) from film where  rating = 'R' and replacement_cost between 5 and 15;
    
  • How many films have the word Truman somewhere in the title?

    select count(*) from film where title ilike '%truman%';
    
  • Which staff member processes the biggest number of transactions?

    select staff_id, count(amount) from  payment group by  staff_id;
    
  • What is the avg replacement cost per film rating?

    select rating, round(avg(replacement_cost),2) from  film group by  rating;
    
  • Customer eligible for platinum status having more than 40 transactions

    select customer_id, count(amount) from  payment group by  customer_id having count(amount) >= 40 ORDER by count(amount);
    
  • Customer who spent more than 100$ with a given staff

    select customer_id, sum(amount) from  payment  where staff_id = 2 group by  customer_id having sum(amount) >= 100;
    

Intermediate SQL

  • Protect against null for value and substitute using COALESCE
select product, (price - coalesce(discount,0)) from products;

Play with timestamp

-- get the year of a date
select extract(year from payment_date) as myyear from payment;
-- can use month, quarter, month, day
-- get how old is a record
select AGE(payment_date) from payment;
--
select TO_CHAR(payment_date, "MM-DD-YYYY") from payment;

Mathematical functions

Compute things from columns. Examples from postgresql

  • Get list of students who scored better than average grade
-- avg grade
select AVG(grade) from test_scores
-- solution with subquery
select student, grade from test_scrores
where grade > ( select AVG(grade) from test_scores);

Self-join

A query in which table is joined to itself: used to compare values in a column of rows within the same table. Need to use aliases. Get the employe's name and the name of his manager:

select emp.name, report.name as manager from employees as emp
join employees as report 
on emp.emp_id = report.report_id 
  • find the films with the same length presented as pair
from f1.title, f2.title, f1.length from film as f1
inner join film as f2
on f1.film_id != f2.film_id and f1.length = f2.length
  • Compare the various amounts of films per movie rating
select
sum(CASE rating
    WHEN 'R' THEN 1
    ELSE 0
END) as R,
sum(CASE rating
    WHEN 'PG' THEN 1
    ELSE 0
END) as PG,
sum(CASE rating
    WHEN 'PG-13' THEN 1
    ELSE 0
END) as PG13
from film

Add views

It is a stored query to be executed often.

create view customer_info as 
SELECT ... -- the sql to repeat

Table creation

For primary key try to using numerical type and Postgresl sequence like:

CREATE TABLE players (player_id SERIAL PRIMARY_KEY, age smallint not null)

Use CHECK to put constraint on column and between column

hire_date DATE check (hire_date > birthday)

Alter table

-- drop a constraint
alter table people alter column name drop not null

Alter table in postgresql - doc

Common queries to clean data

Remove duplicates

-- search for same transaction
SELECT transaction_id, COUNT(transaction_id)
FROM Transaction__Table
GROUP BY transaction_id
HAVING COUNT(transaction_id) > 1;
-- 

Quarkus with panache and DB2

The project is inventory app and the interesting part is the property file

quarkus.datasource.username=cbm39816
quarkus.datasource.password=
quarkus.datasource.url=jdbc:db2://dashdb-txn-sbox-yp-dal09-03.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;
quarkus.datasource.max-size=8
quarkus.datasource.min-size=2
quarkus.datasource.driver=com.ibm.db2.jcc.DB2Driver
quarkus.hibernate-orm.dialect=org.hibernate.dialect.DB2Dialect
quarkus.hibernate-orm.database.generation=drop-and-create
quarkus.hibernate-orm.database.default-schema=CBM39816
quarkus.hibernate-orm.log.sql=true
quarkus.hibernate-orm.sql-load-script=import.sql
quarkus.smallrye-openapi.path=/inventory/swagger
quarkus.swagger-ui.always-include=true

Postgresql

See dedicated note.

Setting CDC on those table

The Kafka connector is deployed as part of Strimzi.

TBC