Advanced Databases for Beginners

Dave Cross <dave@mag-sol.com>

Magnum Solutions Ltd

Advanced Databases for Beginners

Databases have become ubiqitous
Almost every web application has a database at its backend
Databases are being designed by people who have no training in DB design
Or who have never used a "real" database
- MySQL 3.x is still very common
- Missing many of these features
Making basic errors
Database design is a skill

Well Designed Databases

Well designed databases are:
- Easier to write queries against
- Easier to maintain
- Faster

Intelligent Databases

Databases allow you to describe your data in detail
This is the best place for this information
Not in your application
Application logic needs to be duplicated when another application connects to your database
Database logic is already there

Practicalities

Advanced databases need an advanced database server
Commercial database systems
- Oracle
- Sybase
- DB2
- etc...

Practicalities (cont)

Open source database systems
- PostgreSQL
- MySQL (from version 4.0)
  - Version 5 is better
  - Version 3 is much much worse
- SQLite doesn't do a lot of this stuff
Examples based on MySQL 5.0

What We Will Cover

Naming conventions
Data types
Relationships
Normalisation
Indexes
Joins, Subqueries and Views
Triggers

What We Won't Cover

Performance Tuning
Stored Procedures
Too large to cover
Too much difference between database systems

What You Should Know Already

Simple table creation statements

  CREATE TABLE foo (id INTEGER, name CHAR(20))

Simple use of INSERT/SELECT/UPDATE/DELETE

Naming Conventions

Just as important as in code
Consistancy is important
Many RDBMSs not case sensitive
- therefore my_table better than MyTable
- and my_column better than MyColumn
my_objects vs my_object - pick one and stick with it

Naming Keys

Consistant name for primary key (use "id")

  CREATE TABLE book (
    id INTEGER,
    ...
  }

Consistant naming convention for foreign keys (use table name)

  CREATE TABLE book (
    ...
    author INTEGER
  )

Naming Convention examples

Bad example:

 CREATE table book (
   book_id INTEGER,
   book_title VARCHAR(100),
   book_author_id INTEGER
 }

Good example

 CREATE table book (
   id INTEGER,
   title VARCHAR(100),
   author INTEGER
 )

Naming Conventions vs ORM

Your ORM may insist on particular naming conventions
For tables
For primary keys
For foreign keys
This is generally a clue that your ORM is flawed

Data Types

Use the right data type for your data
Numbers for numbers, characters for characters
Choose the right size data type
- TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT
- CHAR, VARCHAR or TEXT
Use NOT NULL where appropriate

Data Types (cont)

You won't get it right first time
Learn to love ALTER TABLE
Sizing columns correctly can save a lot of disk space
Also effects index efficiency

Dates and Times

Store dates and times in datetime columns
Most databases can do date/time arithmetic on datetime columns
- ADDDATE, ADDTIME, DATEDIFF
Functions to extract parts of date
- YEAR, MONTH, MONTHNAME, DAY, DAYNAME, HOUR, MINUTE, SECOND
It's often much easier to do this in the database

Number 1 Date/Time Database Tip

Store dates and times as UTC
Treat all other timezones as display variations

Enumerated Data Types

Many database systems allow you to define enumerated data types
String type that allows one value from a predefined set

  CREATE TABLE person (
    id INTEGER,
    name CHAR(30),
    date_of_birth DATE,
    sex ENUM('M', 'F')
  )

Saves space

Keys and Relationships

Build relationships between tables
This is the power of relational databases
- (Tho' it's not why they are called "relational")

Primary Keys

Give each table in your database a primary key
Primary key is a unique identifier
Tip: Make it an integer

Choosing Primary Keys

Usually best if it's a purely internal value
A value you can control
External keys don't always remain integers
- "We're appending the location code to this key"
External keys don't remain unique
- "We're now going to take data from two systems"
- "This external system reuses old keys"

Defining Primary Keys

Define primary key in CREATE TABLE statement

 CREATE TABLE artist (
   id INTEGER AUTO_INCREMENT PRIMARY KEY,
   name VARCHAR(50)
 )

 CREATE TABLE artist (
   id INTEGER AUTO_INCREMENT,
   name VARCHAR(50),
   PRIMARY KEY (id)
 )

Defining Primary Keys (cont)

In MySQL use AUTO_INCREMENT
Can also use multi-column keys
- Not often necessary though
Defining a column as a primary key automatically makes it unique and not null

Fundamental Rules of Primary Keys

Never update a primary key
Never reuse a primary key

Foreign Keys

The other end of the relationship
A column in one table that contains values from the primary key of another table

For example an album table would contain the key of the artist

 CREATE TABLE album (
   id INTEGER PRIMARY KEY AUTO_INCREMENT,
   artist INTEGER,
   title VARCHAR(50),
   release_date DATE
 )

Using Foreign Keys

You can then get the artist details given the album id

 SELECT artist.name
 FROM   artist
 JOIN   album
 ON     album.artist = artist.id
 WHERE  album.id = ?

Or all albums for an artist

 SELECT title
 FROM   album
 JOIN   artist
 ON     album.artist = artist.id
 WHERE  artist.id = ?

Defining Foreign Keys

Notice that we haven't defined album.artist as a foreign key anywhere
Joins don't require foreign keys to be defined

But you can define them

 CREATE TABLE album (
   id INTEGER AUTO_INCREMENT PRIMARY KEY,
   artist INTEGER REFERENCES artist(id),
   title VARCHAR(50),
   release_date DATE
 )

Defining Foreign Keys (cont)

Alternative Syntax

 CREATE TABLE album (
   id INTEGER AUTO_INCREMENT PRIMARY KEY,
   artist INTEGER,
   title VARCHAR(50),
   release_date DATE,
   FOREIGN KEY (artist) REFERENCES artist (id)
 )

Why Define Foreign Keys?

If table joins work without defining foreign keys then why bother defining them?
Referential integrity
With foreign keys defined the database will watch your back and stop you doing anything stupid
Can't insert foreign keys where the primary keys don't exist
Can't delete primary keys that exist as a foreign key

Referential Integrity in Action

  mysql> insert album values (1, 'Alright Still', '2006-07-17', 1);
  ERROR 1452 (23000): Cannot add or update a child row:
   a foreign key constraint fails (`album/album`, CONSTRAINT
   `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))

  mysql> insert artist values (1, 'Lily Allen');
  Query OK, 1 row affected (0.02 sec)

  mysql> insert album values (1, 'Alright Still', '2006-07-17', 1);
  Query OK, 1 row affected (0.01 sec)

  mysql> delete from artist where id = 1;
  ERROR 1451 (23000): Cannot delete or update a parent row:
   a foreign key constraint fails (`album/album`, CONSTRAINT
   `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))

Big MySQL Foreign Key Caveat

Declaring foreign keys has no effect in the default (MyISAM) storage engine

To get referential integrity features use InnoDB

   CREATE TABLE album (
     id INTEGER AUTO_INCREMENT,
     artist INTEGER REFERENCES artist(id),
     title VARCHAR(50),
     release_date DATE
   ) ENGINE=InnoDB;

Configuring Foreign Keys

The deletion strategy is configurable
FOREIGN KEY (column) REFERENCES table (column) ON DELETE RESTRICT
- Default action. Can't delete
FOREIGN KEY (column) REFERENCES table (column) ON DELETE CASCADE
- Deletes all child records
- If you delete an artist all associated albums are automatically deleted

Configuring Foreign Keys (cont)

FOREIGN KEY (column) REFERENCES table (column) ON DELETE SET NULL
- All child records have foreign key set to null
- Effectively marks them as "unowned"
Also works for updates
- ON UPDATE RESTRICT, etc
- But you should never update a primary key

Relationship Cookbook

There are a small number of standard relationships that you often see
Standard ways of handling them
No point in reinventing wheels

One to Many

A parent object has many child objects
- Note: "many" can include "none"
A child object has exactly one parent object
Standard primary key/foreign key set-up

One to One

A parent object has exactly one child object
- Sometimes zero or one
A child object has exactly one parent object
Standard primary key/foreign key set-up
- But make the foreign key unique

Many to Many

A parent object has many child objects
A child object has many parent objects
e.g. An album has many songs, a song is on many albums
Create a linking table (album_track)
Two one to many relationships
Can also add more information to link table
- e.g. Track number

Normalisation

"Don't Repeat Yourself" for databases
Each piece of data appears once in the database
One definitive source for each fact
If the same information is stored more than once then versions can get out of step
Which instance do you trust?

Unnormalised Table

An example of unnormalised data
MP3 metadata

track title artist album year

1 Smile Lily Allen Alright Still 2006

2 Knock 'Em Out Lily Allen Alright Still 2006

3 LDN Lily Allen Alright Still 2006
Far too much repeated data

track	title	artist	album	year
1	Smile	Lily Allen	Alright Still	2006
2	Knock 'Em Out	Lily Allen	Alright Still	2006
3	LDN	Lily Allen	Alright Still	2006

Real Life Horror Story: MP3 Downloads

Converting Access database to MySQL
Artist table
Couldn't find the MP3 table
Took a closer look at the artist table

Real Life Horror Story: MP3 Downloads (cont)

name, contact details, track titles, file names
Last two were comma separated fields
artist table:

name contact details tracks files

Lily Allen lily@example.com Smile,
Knock 'Em Out,
LDN smile.mp3,
knock_em_out.mp3,
ldn.mp3
Good job song titles never contain commas!

name	contact details	tracks	files
Lily Allen	lily@example.com	Smile, Knock 'Em Out, LDN	smile.mp3, knock_em_out.mp3, ldn.mp3

Normalising MP3 Data

Need three tables
Artist, Album, Track
artist table:

artist

id name

1 Lily Allen
Normalising MP3 Data (cont)
- album table:
album

id artist title year

1 1 Alright Still 2006
track table

track

id album number title

1 1 1 Smile

2 1 2 Knock 'Em Out

3 1 3 LDN

artist
id	name
1	Lily Allen

album
id	artist	title	year
1	1	Alright Still	2006

track
id	album	number	title
1	1	1	Smile
2	1	2	Knock 'Em Out
3	1	3	LDN

Normalising Data

Many different normal forms
- Six or seven last time I checked
See Wikipedia article "Database Normalisation"
Third normal form is usually good enough
- Aka "common sense"
Not as complex as it might seem

Simple Guide to Normalising Data

Identify data objects (tables)
Identify data items (columns)
Assign items to appropriate objects
You need to understand the data
- (Or have good access to someone who does)

Normalised Data

A data table is in (third) normal form when its columns are dependent on...
- The key
- The whole key
- And nothing but the key

Know When To Stop

Normalisation can be taken too far
I once worked with a database that went far too far
Only three tables in the database
- Objects
- Attribute
- Associations
Very hard to work with

Denormalisation

Sometimes a normalised database has performance issues
Need to join too many tables to get data out
May consider planned denormalisation to counter this
Always start from a normalised database
Always know exactly which problems you are solving
Always use triggers to control the denormalised columns

Indexes

Indexes have two purposes in a database
To improve query performance
To impose uniqueness constraints

Indexes for Performance

An index is just like an index in a book
Provides a quick way to find data
Query engine looks up data in an index
Index points to the relevant data page
Faster than doing a table scan

Index Downsides

Indexes make queries faster
But they need to be kept up to date
Therefore updates are slower
No free lunches
Only add indexes that are going to be useful

Creating Indexes

Create indexes with CREATE INDEX

  CREATE INDEX index_name
  ON table_name (column1, ...)

Example:

  CREATE INDEX artist_ind
  ON album (artist)

Using Indexes

Now queries that use the artist column on album will be faster

  SELECT *
  FROM   album
  WHERE  artist = ?

Don't need to explicity use the index

Multi-Column Indexes

An index can contain more than one column

Example:

  CREATE INDEX artist_year_ind
  ON album (artist, year)

Speeds up queries of the form

  SELECT *
  FROM   album
  WHERE  artist = ? AND year = ?

Creating Unique Indexes

Unique indexes prevent duplicate rows from being inserted into tables
Primary keys are (by definition) also unique indexes

Add UNIQUE keywork to CREATE INDEX statement

  CREATE UNIQUE INDEX index_name
  ON table (column1, ...)

Unique Index Example

Each album can only have one track with each track number

  CREATE UNIQUE INDEX album_track_ind
  ON track (album, number)

Breaking Unique Indexes

Insert one track

 mysql> insert track (title,album,number)
     -> values ('Smile',1,1);
 Query OK, 1 row affected (0.02 sec)

Insert second track

  mysql> insert track (title,album,number)
      -> values ('Knock Em Out',1,1);
  ERROR 1062 (23000): Duplicate entry '1-1' for key 2

Alternative Index syntax

Can also create single-column unique indexes in column definitions

  CREATE TABLE foo (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    code CHAR(10) UNIQUE
  )

Can also create indexes in table definitions

  CREATE TABLE foo (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    code CHAR(10),
    UNIQUE (code)
  )

Joining Tables

Now you have a nicely normalised database
You need to query multiple tables to get any useful data out
Query multiple tables using table joins

Old Style Join Syntax

It's possible to join tables just using the WHERE clause

  SELECT album.title, album.release_date
  FROM   album, artist
  WHERE  artist.name = 'Pink Floyd'
  AND    artist.id = album.artist

Join clauses muddled up with selection clauses
Potentially confusing
Still seen, but best avoided

New Style Join Syntax

SQL now supports a more explicit JOIN clause

  SELECT album.title, album.release_date
  FROM   album
  JOIN   artist
  ON     artist.id = album.artist
  WHERE  artist.name = 'Pink Floyd'

Join clause clearly separated from selection clauses
Also more flexible

Types of Join

Cross join
Inner join
Outer join
- Left, right and full

Cross Join

  SELECT some columns
  FROM   table1, table2

No join clause
All rows in table 1 returned matched with all rows in table 2
If table 1 has m rows and table 2 has n rows, returns m*n rows
Also known as "Cartesian Product"
Usually not what you want

Inner Join

  SELECT some columns
  FROM   table1
  JOIN   table2
  ON     table1.col1 = table2.col2

Finds intersection between two tables
Returns matches where row exists in both tables
Default join type
Most used type

Left Outer Join

  SELECT some columns
  FROM   table1
  LEFT OUTER JOIN table2
  ON     tab1e1.col1 = table2.col2

Returns all rows from table 1 and matching data from table 2
Returns NULLs where no matching data in table 2

Right Outer Join

  SELECT some columns
  FROM   table1
  RIGHT OUTER JOIN table2
  ON     tab1e1.col1 = table2.col2

Returns all rows from table 2 and matching data from table 1
Returns NULLs where no matching data in table 1

Full Outer Join

  SELECT some columns
  FROM   table1
  FULL OUTER JOIN table2
  ON     tab1e1.col1 = table2.col2

Returns the combination of LEFT and RIGHT outer joins
Currently not supported by MySQL
That syntax doesn't work

Subqueries

Subqueries are another way to extract data from multiple tables

  SELECT *
  FROM   some_table
  WHERE  some_column IN
    (SELECT some_other_column
     FROM   some_other_table)

Selects all values of some_other_column from some_other_table
Selects rows from some_table where some_column contains one of those values

Nested Subqueries

Subqueries can be nested

  SELECT *
  FROM   some_table
  WHERE  some_column IN
    (SELECT some_other_column
     FROM   some_other_table
     WHERE  a_different_column IN
       (SELECT yet_another_column
        FROM   yet_another_table))

Comparison Subqueries

Can use other comparison operators with subqueries

  SELECT *
  FROM   some_table
  WHERE  some_column >
    (SELECT some_other_column
     FROM   some_other_table)

Comparisons with ANY or ALL

Useful when subquery doesn't return a single value

  SELECT *
  FROM   some_table
  WHERE  some_column > ANY
    (SELECT some_other_column
     FROM   some_other_table)

  SELECT *
  FROM   some_table
  WHERE  some_column > ALL
    (SELECT some_other_column
     FROM   some_other_table)

Row Subqueries

Can compare multiple values from the subquery

  SELECT *
  FROM   some_table
  WHERE  (some_column, another_column) =
    (SELECT some_other_column, a_second_column
     FROM   some_other_table)

Correlated Subqueries

Subqueries can use values from the outer query

  SELECT *
  FROM   some_table
  WHERE  some_column = ANY
    (SELECT some_other_column
     FROM   some_other_table
     WHERE  some_other_table.another_column
            = some_table.yet_another_column)

EXISTS and NOT EXISTS

  SELECT *
  FROM   some_table
  WHERE EXISTS
    (SELECT *
     FROM   some_other_table
     WHERE  some_table.some_column
            = some_other_table.some_other_column)

  SELECT *
  FROM   some_table
  WHERE NOT EXISTS
    (SELECT *
     FROM   some_other_table
     WHERE  some_table.some_column
            = some_other_table.some_other_column)

Subqueries Summary

Subqueries are a very powerful feature of SQL
The key to many kinds of complex data manipulation
Good support for them in MySQL 4.1
Well worth learning how to use them

Views

A view is a pre-built select statement that can be treated as a table

  CREATE VIEW view_name AS
  SELECT ...

  SELECT * FROM view_name

Display only certain columns from a table
Display only certain rows from a table
Create "denormalised" tables to make queries simpler
Create summary tables

Single Table Views

Only display certain columns

  CREATE VIEW view_name AS
  SELECT a_column, another_column
  FROM   a_table

Useful for security
Some users only allowed to see certain columns

Single Table Views (cont)

Only display certain rows

  CREATE VIEW view_name AS
  SELECT *
  FROM   table
  WHERE  some_condition_is_true

This year's albums

  CREATE VIEW albums_2006 AS
  SELECT *
  FROM   album
  WHERE  YEAR(release_date) = 2006

Denormalised Tables

Need to extract data by joining many tables
Pre-build complex joins as views

  CREATE VIEW mp3 AS
  SELECT artist.name AS artist,
         album.title AS title,
         track.title AS track
  FROM   artist
  JOIN   album ON album.artist = artist.id
  JOIN   track ON track.album = album.id;

Rename columns in result set

Summary Tables

Pre-build common summary tables as views

  CREATE VIEW albums_by_year AS
  SELECT YEAR(release_date),
         COUNT(*) AS album_count
  FROM   album
  GROUP BY YEAR(release_date)

Triggers

A trigger is a piece of executable code that is fired (triggered) when a particular database action occurs
Useful to run actions when data is inserted, updated or deleted
Triggers can run before or after INSERT, UPDATE or DELETE events

Defining Triggers

MySQL 5.x syntax

  CREATE TRIGGER name 
  BEFORE|AFTER INSERT|UPDATE|DELETE
  ON table_name
  FOR EACH ROW some_code

example:

  CREATE TRIGGER name BEFORE INSERT
  ON artist
  FOR EACH ROW some useful action

NEW and OLD

Triggers have access to "pseudo-tables" called NEW and OLD
NEW contains the new data for each row
OLD contains the old data for each row
OLD isn't available in INSERT triggers
NEW isn't available in DELETE triggers

Trigger Example: Auditing

Create a table with two extra columns - updater and updated

  CREATE TABLE some_table (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name CHAR(20),
    updater VARCHAR(60),
    updated DATETIME
  )

Trigger Example: Auditing (2)

Create triggers

  CREATE TRIGGER ins BEFORE INSERT
  ON some_table
  FOR EACH ROW
  SET NEW.updater = USER(), NEW.updated = NOW()

  CREATE TRIGGER upd BEFORE UPDATE
  ON some_table
  FOR EACH ROW
  SET NEW.updater = USER(), NEW.updated = NOW()

Trigger Example: Auditing (3)

Do some updates and inserts

  INSERT some_table (name) values ('Dave')
  INSERT some_table (name) values ('Perl')
  UPDATE some_table
  SET name = 'davorg' WHERE name = 'Dave'

  SELECT * FROM some_table
  +----+--------+----------------+---------------------+
  | id | name   | updater        | updated             |
  +----+--------+----------------+---------------------+
  |  1 | davorg | dave@localhost | 2006-08-27 16:52:45 |
  |  2 | Perl   | dave@localhost | 2006-08-27 16:52:05 |
  +----+--------+----------------+---------------------+

Possible extension - handling deletions

Trigger Example: Denormalisation (1)

Create parent and child tables with one parent column denormalised into the child table

  CREATE TABLE parent (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name CHAR(20)
  )

  CREATE TABLE child (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name CHAR(20),
    parent INTEGER REFERENCES parent(id),
    parent_name char(20)
  )

Trigger Example: Denormalisation (2)

Create denormalisation triggers

  CREATE TRIGGER ins_chd BEFORE INSERT
  ON child
  FOR EACH ROW SET NEW.parent_name =
                      (SELECT name
                       FROM   parent
                       WHERE  id = NEW.parent)

  CREATE TRIGGER upd_par AFTER UPDATE
  ON parent
  FOR EACH ROW UPDATE child
               SET    parent_name = NEW.name
               WHERE  parent = NEW.id

Trigger Example: Denormalisation (3)

Insert rows

  INSERT parent (name)
  VALUES ('Foo')

  INSERT child (name, parent)
  VALUES ('Child', 1)

  select * from child;
  +----+-------+--------+-------------+
  | id | name  | parent | parent_name |
  +----+-------+--------+-------------+
  |  1 | Child |      1 | Foo         |
  +----+-------+--------+-------------+

Trigger Example: Denormalisation (4)

Update parent

  UPDATE parent
  SET    name = 'Bar'
  WHERE  name = 'Foo'

  select * from child;
  +----+-------+--------+-------------+
  | id | name  | parent | parent_name |
  +----+-------+--------+-------------+
  |  1 | Child |      1 | Bar         |
  +----+-------+--------+-------------+

Book Recommendations

SQL In Nutshell - Kevin Kline

Book Recommendations

Introduction to Database Systems - CJ Date

Book Recommendations

SQL For Smarties - Joe Celko

Book Recommendations

Database in Depth - CJ Date

Some Conclusions

Database design is a complex and interesting area
Well designed databases are easier to work with
Use your database to describe your data in as much detail as possible
Upgrade to a database that supports these features
Hire a database expert

YAPC::Europe / 30th August 2006