Databases have become ubiqitous
Almost every web application has a database at its backend
Databases are being designed by people who have no training in DB design
Or who have never used a "real" database
MySQL 3.x is still very common
Missing many of these features
Making basic errors
Database design is a skill
Well designed databases are:
Easier to write queries against
Easier to maintain
Faster
Databases allow you to describe your data in detail
This is the best place for this information
Not in your application
Application logic needs to be duplicated when another application connects to your database
Database logic is already there
Advanced databases need an advanced database server
Commercial database systems
Oracle
Sybase
DB2
etc...
Open source database systems
PostgreSQL
MySQL (from version 4.0)
Version 5 is better
Version 3 is much much worse
SQLite doesn't do a lot of this stuff
Examples based on MySQL 5.0
Naming conventions
Data types
Relationships
Normalisation
Indexes
Joins, Subqueries and Views
Triggers
Performance Tuning
Stored Procedures
Too large to cover
Too much difference between database systems
Simple table creation statements
CREATE TABLE foo (id INTEGER, name CHAR(20))
Simple use of INSERT/SELECT/UPDATE/DELETE
Just as important as in code
Consistancy is important
Many RDBMSs not case sensitive
therefore my_table better than MyTable
and my_column better than MyColumn
my_objects vs my_object - pick one and stick with it
Consistant name for primary key (use "id")
CREATE TABLE book ( id INTEGER, ... }
Consistant naming convention for foreign keys (use table name)
CREATE TABLE book ( ... author INTEGER )
Bad example:
CREATE table book ( book_id INTEGER, book_title VARCHAR(100), book_author_id INTEGER }
Good example
CREATE table book ( id INTEGER, title VARCHAR(100), author INTEGER )
Your ORM may insist on particular naming conventions
For tables
For primary keys
For foreign keys
This is generally a clue that your ORM is flawed
Use the right data type for your data
Numbers for numbers, characters for characters
Choose the right size data type
TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT
CHAR, VARCHAR or TEXT
Use NOT NULL where appropriate
You won't get it right first time
Learn to love ALTER TABLE
Sizing columns correctly can save a lot of disk space
Also effects index efficiency
Store dates and times in datetime columns
Most databases can do date/time arithmetic on datetime columns
ADDDATE, ADDTIME, DATEDIFF
Functions to extract parts of date
YEAR, MONTH, MONTHNAME, DAY, DAYNAME, HOUR, MINUTE, SECOND
It's often much easier to do this in the database
Store dates and times as UTC
Treat all other timezones as display variations
Many database systems allow you to define enumerated data types
String type that allows one value from a predefined set
CREATE TABLE person ( id INTEGER, name CHAR(30), date_of_birth DATE, sex ENUM('M', 'F') )
Saves space
Build relationships between tables
This is the power of relational databases
(Tho' it's not why they are called "relational")
Give each table in your database a primary key
Primary key is a unique identifier
Tip: Make it an integer
Usually best if it's a purely internal value
A value you can control
External keys don't always remain integers
"We're appending the location code to this key"
External keys don't remain unique
"We're now going to take data from two systems"
"This external system reuses old keys"
Define primary key in CREATE TABLE statement
CREATE TABLE artist ( id INTEGER AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) )
Or
CREATE TABLE artist ( id INTEGER AUTO_INCREMENT, name VARCHAR(50), PRIMARY KEY (id) )
In MySQL use AUTO_INCREMENT
Can also use multi-column keys
Not often necessary though
Defining a column as a primary key automatically makes it unique and not null
Never update a primary key
Never reuse a primary key
The other end of the relationship
A column in one table that contains values from the primary key of another table
For example an album table would contain the key of the artist
CREATE TABLE album ( id INTEGER PRIMARY KEY AUTO_INCREMENT, artist INTEGER, title VARCHAR(50), release_date DATE )
You can then get the artist details given the album id
SELECT artist.name FROM artist JOIN album ON album.artist = artist.id WHERE album.id = ?
Or all albums for an artist
SELECT title FROM album JOIN artist ON album.artist = artist.id WHERE artist.id = ?
Notice that we haven't defined album.artist
as a foreign key anywhere
Joins don't require foreign keys to be defined
But you can define them
CREATE TABLE album ( id INTEGER AUTO_INCREMENT PRIMARY KEY, artist INTEGER REFERENCES artist(id), title VARCHAR(50), release_date DATE )
Alternative Syntax
CREATE TABLE album ( id INTEGER AUTO_INCREMENT PRIMARY KEY, artist INTEGER, title VARCHAR(50), release_date DATE, FOREIGN KEY (artist) REFERENCES artist (id) )
If table joins work without defining foreign keys then why bother defining them?
Referential integrity
With foreign keys defined the database will watch your back and stop you doing anything stupid
Can't insert foreign keys where the primary keys don't exist
Can't delete primary keys that exist as a foreign key
mysql> insert album values (1, 'Alright Still', '2006-07-17', 1); ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`album/album`, CONSTRAINT `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))
mysql> insert artist values (1, 'Lily Allen'); Query OK, 1 row affected (0.02 sec)
mysql> insert album values (1, 'Alright Still', '2006-07-17', 1); Query OK, 1 row affected (0.01 sec)
mysql> delete from artist where id = 1; ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`album/album`, CONSTRAINT `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))
Declaring foreign keys has no effect in the default (MyISAM) storage engine
To get referential integrity features use InnoDB
CREATE TABLE album ( id INTEGER AUTO_INCREMENT, artist INTEGER REFERENCES artist(id), title VARCHAR(50), release_date DATE ) ENGINE=InnoDB;
The deletion strategy is configurable
FOREIGN KEY (column) REFERENCES table (column) ON DELETE RESTRICT
Default action. Can't delete
FOREIGN KEY (column) REFERENCES table (column) ON DELETE CASCADE
Deletes all child records
If you delete an artist all associated albums are automatically deleted
FOREIGN KEY (column) REFERENCES table (column) ON DELETE SET NULL
All child records have foreign key set to null
Effectively marks them as "unowned"
Also works for updates
ON UPDATE RESTRICT, etc
But you should never update a primary key
There are a small number of standard relationships that you often see
Standard ways of handling them
No point in reinventing wheels
A parent object has many child objects
Note: "many" can include "none"
A child object has exactly one parent object
Standard primary key/foreign key set-up
A parent object has exactly one child object
Sometimes zero or one
A child object has exactly one parent object
Standard primary key/foreign key set-up
But make the foreign key unique
A parent object has many child objects
A child object has many parent objects
e.g. An album has many songs, a song is on many albums
Create a linking table (album_track)
Two one to many relationships
Can also add more information to link table
e.g. Track number
"Don't Repeat Yourself" for databases
Each piece of data appears once in the database
One definitive source for each fact
If the same information is stored more than once then versions can get out of step
Which instance do you trust?
An example of unnormalised data
MP3 metadata
track | title | artist | album | year |
---|---|---|---|---|
1 | Smile | Lily Allen | Alright Still | 2006 |
2 | Knock 'Em Out | Lily Allen | Alright Still | 2006 |
3 | LDN | Lily Allen | Alright Still | 2006 |
Far too much repeated data
Converting Access database to MySQL
Artist table
Couldn't find the MP3 table
Took a closer look at the artist table
name, contact details, track titles, file names
Last two were comma separated fields
artist table:
name | contact details | tracks | files |
---|---|---|---|
Lily Allen | lily@example.com | Smile, Knock 'Em Out, LDN |
smile.mp3, knock_em_out.mp3, ldn.mp3 |
Good job song titles never contain commas!
Need three tables
Artist, Album, Track
artist table:
artist | |||
---|---|---|---|
id | name | ||
1 | Lily Allen |
album | |||
---|---|---|---|
id | artist | title | year |
1 | 1 | Alright Still | 2006 |
track table
track | |||
---|---|---|---|
id | album | number | title |
1 | 1 | 1 | Smile |
2 | 1 | 2 | Knock 'Em Out |
3 | 1 | 3 | LDN |
Many different normal forms
Six or seven last time I checked
See Wikipedia article "Database Normalisation"
Third normal form is usually good enough
Aka "common sense"
Not as complex as it might seem
Identify data objects (tables)
Identify data items (columns)
Assign items to appropriate objects
You need to understand the data
(Or have good access to someone who does)
A data table is in (third) normal form when its columns are dependent on...
The key
The whole key
And nothing but the key
Normalisation can be taken too far
I once worked with a database that went far too far
Only three tables in the database
Objects
Attribute
Associations
Very hard to work with
Sometimes a normalised database has performance issues
Need to join too many tables to get data out
May consider planned denormalisation to counter this
Always start from a normalised database
Always know exactly which problems you are solving
Always use triggers to control the denormalised columns
Indexes have two purposes in a database
To improve query performance
To impose uniqueness constraints
An index is just like an index in a book
Provides a quick way to find data
Query engine looks up data in an index
Index points to the relevant data page
Faster than doing a table scan
Indexes make queries faster
But they need to be kept up to date
Therefore updates are slower
No free lunches
Only add indexes that are going to be useful
Create indexes with CREATE INDEX
CREATE INDEX index_name ON table_name (column1, ...)
Example:
CREATE INDEX artist_ind ON album (artist)
Now queries that use the artist column on album will be faster
SELECT * FROM album WHERE artist = ?
Don't need to explicity use the index
An index can contain more than one column
Example:
CREATE INDEX artist_year_ind ON album (artist, year)
Speeds up queries of the form
SELECT * FROM album WHERE artist = ? AND year = ?
Unique indexes prevent duplicate rows from being inserted into tables
Primary keys are (by definition) also unique indexes
Add UNIQUE keywork to CREATE INDEX statement
CREATE UNIQUE INDEX index_name ON table (column1, ...)
Each album can only have one track with each track number
CREATE UNIQUE INDEX album_track_ind ON track (album, number)
Insert one track
mysql> insert track (title,album,number) -> values ('Smile',1,1); Query OK, 1 row affected (0.02 sec)
Insert second track
mysql> insert track (title,album,number) -> values ('Knock Em Out',1,1); ERROR 1062 (23000): Duplicate entry '1-1' for key 2
Can also create single-column unique indexes in column definitions
CREATE TABLE foo ( id INTEGER PRIMARY KEY AUTO_INCREMENT, code CHAR(10) UNIQUE )
Can also create indexes in table definitions
CREATE TABLE foo ( id INTEGER PRIMARY KEY AUTO_INCREMENT, code CHAR(10), UNIQUE (code) )
Now you have a nicely normalised database
You need to query multiple tables to get any useful data out
Query multiple tables using table joins
It's possible to join tables just using the WHERE clause
SELECT album.title, album.release_date FROM album, artist WHERE artist.name = 'Pink Floyd' AND artist.id = album.artist
Join clauses muddled up with selection clauses
Potentially confusing
Still seen, but best avoided
SQL now supports a more explicit JOIN clause
SELECT album.title, album.release_date FROM album JOIN artist ON artist.id = album.artist WHERE artist.name = 'Pink Floyd'
Join clause clearly separated from selection clauses
Also more flexible
Cross join
Inner join
Outer join
Left, right and full
SELECT some columns FROM table1, table2
No join clause
All rows in table 1 returned matched with all rows in table 2
If table 1 has m rows and table 2 has n rows, returns m*n rows
Also known as "Cartesian Product"
Usually not what you want
SELECT some columns FROM table1 JOIN table2 ON table1.col1 = table2.col2
Finds intersection between two tables
Returns matches where row exists in both tables
Default join type
Most used type
SELECT some columns FROM table1 LEFT OUTER JOIN table2 ON tab1e1.col1 = table2.col2
Returns all rows from table 1 and matching data from table 2
Returns NULLs where no matching data in table 2
SELECT some columns FROM table1 RIGHT OUTER JOIN table2 ON tab1e1.col1 = table2.col2
Returns all rows from table 2 and matching data from table 1
Returns NULLs where no matching data in table 1
SELECT some columns FROM table1 FULL OUTER JOIN table2 ON tab1e1.col1 = table2.col2
Returns the combination of LEFT and RIGHT outer joins
Currently not supported by MySQL
That syntax doesn't work
Subqueries are another way to extract data from multiple tables
SELECT * FROM some_table WHERE some_column IN (SELECT some_other_column FROM some_other_table)
Selects all values of some_other_column
from some_other_table
Selects rows from some_table
where some_column
contains one of those values
Subqueries can be nested
SELECT * FROM some_table WHERE some_column IN (SELECT some_other_column FROM some_other_table WHERE a_different_column IN (SELECT yet_another_column FROM yet_another_table))
Can use other comparison operators with subqueries
SELECT * FROM some_table WHERE some_column > (SELECT some_other_column FROM some_other_table)
Useful when subquery doesn't return a single value
SELECT * FROM some_table WHERE some_column > ANY (SELECT some_other_column FROM some_other_table)
SELECT * FROM some_table WHERE some_column > ALL (SELECT some_other_column FROM some_other_table)
Can compare multiple values from the subquery
SELECT * FROM some_table WHERE (some_column, another_column) = (SELECT some_other_column, a_second_column FROM some_other_table)
Subqueries can use values from the outer query
SELECT * FROM some_table WHERE some_column = ANY (SELECT some_other_column FROM some_other_table WHERE some_other_table.another_column = some_table.yet_another_column)
SELECT * FROM some_table WHERE EXISTS (SELECT * FROM some_other_table WHERE some_table.some_column = some_other_table.some_other_column)
SELECT * FROM some_table WHERE NOT EXISTS (SELECT * FROM some_other_table WHERE some_table.some_column = some_other_table.some_other_column)
Subqueries are a very powerful feature of SQL
The key to many kinds of complex data manipulation
Good support for them in MySQL 4.1
Well worth learning how to use them
A view is a pre-built select statement that can be treated as a table
CREATE VIEW view_name AS SELECT ...
SELECT * FROM view_name
Display only certain columns from a table
Display only certain rows from a table
Create "denormalised" tables to make queries simpler
Create summary tables
Only display certain columns
CREATE VIEW view_name AS SELECT a_column, another_column FROM a_table
Useful for security
Some users only allowed to see certain columns
Only display certain rows
CREATE VIEW view_name AS SELECT * FROM table WHERE some_condition_is_true
This year's albums
CREATE VIEW albums_2006 AS SELECT * FROM album WHERE YEAR(release_date) = 2006
Need to extract data by joining many tables
Pre-build complex joins as views
CREATE VIEW mp3 AS SELECT artist.name AS artist, album.title AS title, track.title AS track FROM artist JOIN album ON album.artist = artist.id JOIN track ON track.album = album.id;
Rename columns in result set
Pre-build common summary tables as views
CREATE VIEW albums_by_year AS SELECT YEAR(release_date), COUNT(*) AS album_count FROM album GROUP BY YEAR(release_date)
A trigger is a piece of executable code that is fired (triggered) when a particular database action occurs
Useful to run actions when data is inserted, updated or deleted
Triggers can run before or after INSERT, UPDATE or DELETE events
MySQL 5.x syntax
CREATE TRIGGER name BEFORE|AFTER INSERT|UPDATE|DELETE ON table_name FOR EACH ROW some_code
example:
CREATE TRIGGER name BEFORE INSERT ON artist FOR EACH ROW some useful action
Triggers have access to "pseudo-tables" called NEW and OLD
NEW contains the new data for each row
OLD contains the old data for each row
OLD isn't available in INSERT triggers
NEW isn't available in DELETE triggers
Create a table with two extra columns - updater and updated
CREATE TABLE some_table ( id INTEGER PRIMARY KEY AUTO_INCREMENT, name CHAR(20), updater VARCHAR(60), updated DATETIME )
Create triggers
CREATE TRIGGER ins BEFORE INSERT ON some_table FOR EACH ROW SET NEW.updater = USER(), NEW.updated = NOW()
CREATE TRIGGER upd BEFORE UPDATE ON some_table FOR EACH ROW SET NEW.updater = USER(), NEW.updated = NOW()
Do some updates and inserts
INSERT some_table (name) values ('Dave') INSERT some_table (name) values ('Perl') UPDATE some_table SET name = 'davorg' WHERE name = 'Dave'
SELECT * FROM some_table +----+--------+----------------+---------------------+ | id | name | updater | updated | +----+--------+----------------+---------------------+ | 1 | davorg | dave@localhost | 2006-08-27 16:52:45 | | 2 | Perl | dave@localhost | 2006-08-27 16:52:05 | +----+--------+----------------+---------------------+
Possible extension - handling deletions
Create parent and child tables with one parent column denormalised into the child table
CREATE TABLE parent ( id INTEGER PRIMARY KEY AUTO_INCREMENT, name CHAR(20) )
CREATE TABLE child ( id INTEGER PRIMARY KEY AUTO_INCREMENT, name CHAR(20), parent INTEGER REFERENCES parent(id), parent_name char(20) )
Create denormalisation triggers
CREATE TRIGGER ins_chd BEFORE INSERT ON child FOR EACH ROW SET NEW.parent_name = (SELECT name FROM parent WHERE id = NEW.parent)
CREATE TRIGGER upd_par AFTER UPDATE ON parent FOR EACH ROW UPDATE child SET parent_name = NEW.name WHERE parent = NEW.id
Insert rows
INSERT parent (name) VALUES ('Foo')
INSERT child (name, parent) VALUES ('Child', 1)
select * from child; +----+-------+--------+-------------+ | id | name | parent | parent_name | +----+-------+--------+-------------+ | 1 | Child | 1 | Foo | +----+-------+--------+-------------+
Update parent
UPDATE parent SET name = 'Bar' WHERE name = 'Foo'
select * from child; +----+-------+--------+-------------+ | id | name | parent | parent_name | +----+-------+--------+-------------+ | 1 | Child | 1 | Bar | +----+-------+--------+-------------+
SQL In Nutshell - Kevin Kline
Introduction to Database Systems - CJ Date
SQL For Smarties - Joe Celko
Database in Depth - CJ Date
Database design is a complex and interesting area
Well designed databases are easier to work with
Use your database to describe your data in as much detail as possible
Upgrade to a database that supports these features
Hire a database expert