Advanced Databases for Beginners

Dave Cross <dave@mag-sol.com>

Magnum Solutions Ltd

Advanced Databases for Beginners

  • Databases have become ubiqitous

  • Almost every web application has a database at its backend

  • Databases are being designed by people who have no training in DB design

  • Or who have never used a "real" database

    • MySQL 3.x is still very common

    • Missing many of these features

  • Making basic errors

  • Database design is a skill

Well Designed Databases

  • Well designed databases are:

    • Easier to write queries against

    • Easier to maintain

    • Faster

Intelligent Databases

  • Databases allow you to describe your data in detail

  • This is the best place for this information

  • Not in your application

  • Application logic needs to be duplicated when another application connects to your database

  • Database logic is already there

Practicalities

  • Advanced databases need an advanced database server

  • Commercial database systems

    • Oracle

    • Sybase

    • DB2

    • etc...

Practicalities (cont)

  • Open source database systems

    • PostgreSQL

    • MySQL (from version 4.0)

      • Version 5 is better

      • Version 3 is much much worse

    • SQLite doesn't do a lot of this stuff

  • Examples based on MySQL 5.0

What We Will Cover

  • Naming conventions

  • Data types

  • Relationships

  • Normalisation

  • Indexes

  • Joins, Subqueries and Views

  • Triggers

What We Won't Cover

  • Performance Tuning

  • Stored Procedures

  • Too large to cover

  • Too much difference between database systems

What You Should Know Already

  • Simple table creation statements

    •   CREATE TABLE foo (id INTEGER, name CHAR(20))
  • Simple use of INSERT/SELECT/UPDATE/DELETE

Naming Conventions

  • Just as important as in code

  • Consistancy is important

  • Many RDBMSs not case sensitive

    • therefore my_table better than MyTable

    • and my_column better than MyColumn

  • my_objects vs my_object - pick one and stick with it

Naming Keys

  • Consistant name for primary key (use "id")

  •   CREATE TABLE book (
        id INTEGER,
        ...
      }
  • Consistant naming convention for foreign keys (use table name)

  •   CREATE TABLE book (
        ...
        author INTEGER
      )

Naming Convention examples

  • Bad example:

     CREATE table book (
       book_id INTEGER,
       book_title VARCHAR(100),
       book_author_id INTEGER
     }
  • Good example

     CREATE table book (
       id INTEGER,
       title VARCHAR(100),
       author INTEGER
     )

Naming Conventions vs ORM

  • Your ORM may insist on particular naming conventions

  • For tables

  • For primary keys

  • For foreign keys

  • This is generally a clue that your ORM is flawed

Data Types

  • Use the right data type for your data

  • Numbers for numbers, characters for characters

  • Choose the right size data type

    • TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT

    • CHAR, VARCHAR or TEXT

  • Use NOT NULL where appropriate

Data Types (cont)

  • You won't get it right first time

  • Learn to love ALTER TABLE

  • Sizing columns correctly can save a lot of disk space

  • Also effects index efficiency

Dates and Times

  • Store dates and times in datetime columns

  • Most databases can do date/time arithmetic on datetime columns

    • ADDDATE, ADDTIME, DATEDIFF

  • Functions to extract parts of date

    • YEAR, MONTH, MONTHNAME, DAY, DAYNAME, HOUR, MINUTE, SECOND

  • It's often much easier to do this in the database

Number 1 Date/Time Database Tip

  • Store dates and times as UTC

  • Treat all other timezones as display variations

Enumerated Data Types

  • Many database systems allow you to define enumerated data types

  • String type that allows one value from a predefined set

  •   CREATE TABLE person (
        id INTEGER,
        name CHAR(30),
        date_of_birth DATE,
        sex ENUM('M', 'F')
      )
  • Saves space

Keys and Relationships

  • Build relationships between tables

  • This is the power of relational databases

    • (Tho' it's not why they are called "relational")

Primary Keys

  • Give each table in your database a primary key

  • Primary key is a unique identifier

  • Tip: Make it an integer

Choosing Primary Keys

  • Usually best if it's a purely internal value

  • A value you can control

  • External keys don't always remain integers

    • "We're appending the location code to this key"

  • External keys don't remain unique

    • "We're now going to take data from two systems"

    • "This external system reuses old keys"

Defining Primary Keys

  • Define primary key in CREATE TABLE statement

     CREATE TABLE artist (
       id INTEGER AUTO_INCREMENT PRIMARY KEY,
       name VARCHAR(50)
     )
  • Or

     CREATE TABLE artist (
       id INTEGER AUTO_INCREMENT,
       name VARCHAR(50),
       PRIMARY KEY (id)
     )

Defining Primary Keys (cont)

  • In MySQL use AUTO_INCREMENT

  • Can also use multi-column keys

    • Not often necessary though

  • Defining a column as a primary key automatically makes it unique and not null

Fundamental Rules of Primary Keys

  • Never update a primary key

  • Never reuse a primary key

Foreign Keys

  • The other end of the relationship

  • A column in one table that contains values from the primary key of another table

  • For example an album table would contain the key of the artist

     CREATE TABLE album (
       id INTEGER PRIMARY KEY AUTO_INCREMENT,
       artist INTEGER,
       title VARCHAR(50),
       release_date DATE
     )

Using Foreign Keys

  • You can then get the artist details given the album id

     SELECT artist.name
     FROM   artist
     JOIN   album
     ON     album.artist = artist.id
     WHERE  album.id = ?
  • Or all albums for an artist

     SELECT title
     FROM   album
     JOIN   artist
     ON     album.artist = artist.id
     WHERE  artist.id = ?

Defining Foreign Keys

  • Notice that we haven't defined album.artist as a foreign key anywhere

  • Joins don't require foreign keys to be defined

  • But you can define them

     CREATE TABLE album (
       id INTEGER AUTO_INCREMENT PRIMARY KEY,
       artist INTEGER REFERENCES artist(id),
       title VARCHAR(50),
       release_date DATE
     )

Defining Foreign Keys (cont)

  • Alternative Syntax

     CREATE TABLE album (
       id INTEGER AUTO_INCREMENT PRIMARY KEY,
       artist INTEGER,
       title VARCHAR(50),
       release_date DATE,
       FOREIGN KEY (artist) REFERENCES artist (id)
     )

Why Define Foreign Keys?

  • If table joins work without defining foreign keys then why bother defining them?

  • Referential integrity

  • With foreign keys defined the database will watch your back and stop you doing anything stupid

  • Can't insert foreign keys where the primary keys don't exist

  • Can't delete primary keys that exist as a foreign key

Referential Integrity in Action

  •   mysql> insert album values (1, 'Alright Still', '2006-07-17', 1);
      ERROR 1452 (23000): Cannot add or update a child row:
       a foreign key constraint fails (`album/album`, CONSTRAINT
       `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))
  •   mysql> insert artist values (1, 'Lily Allen');
      Query OK, 1 row affected (0.02 sec)
      mysql> insert album values (1, 'Alright Still', '2006-07-17', 1);
      Query OK, 1 row affected (0.01 sec)
      mysql> delete from artist where id = 1;
      ERROR 1451 (23000): Cannot delete or update a parent row:
       a foreign key constraint fails (`album/album`, CONSTRAINT
       `album_ibfk_1` FOREIGN KEY (`artist`) REFERENCES `artist` (`id`))

Big MySQL Foreign Key Caveat

  • Declaring foreign keys has no effect in the default (MyISAM) storage engine

  • To get referential integrity features use InnoDB

       CREATE TABLE album (
         id INTEGER AUTO_INCREMENT,
         artist INTEGER REFERENCES artist(id),
         title VARCHAR(50),
         release_date DATE
       ) ENGINE=InnoDB;

Configuring Foreign Keys

  • The deletion strategy is configurable

  • FOREIGN KEY (column) REFERENCES table (column) ON DELETE RESTRICT

    • Default action. Can't delete

  • FOREIGN KEY (column) REFERENCES table (column) ON DELETE CASCADE

    • Deletes all child records

    • If you delete an artist all associated albums are automatically deleted

Configuring Foreign Keys (cont)

  • FOREIGN KEY (column) REFERENCES table (column) ON DELETE SET NULL

    • All child records have foreign key set to null

    • Effectively marks them as "unowned"

  • Also works for updates

    • ON UPDATE RESTRICT, etc

    • But you should never update a primary key

Relationship Cookbook

  • There are a small number of standard relationships that you often see

  • Standard ways of handling them

  • No point in reinventing wheels

One to Many

  • A parent object has many child objects

    • Note: "many" can include "none"

  • A child object has exactly one parent object

  • Standard primary key/foreign key set-up

One to One

  • A parent object has exactly one child object

    • Sometimes zero or one

  • A child object has exactly one parent object

  • Standard primary key/foreign key set-up

    • But make the foreign key unique

Many to Many

  • A parent object has many child objects

  • A child object has many parent objects

  • e.g. An album has many songs, a song is on many albums

  • Create a linking table (album_track)

  • Two one to many relationships

  • Can also add more information to link table

    • e.g. Track number

Normalisation

  • "Don't Repeat Yourself" for databases

  • Each piece of data appears once in the database

  • One definitive source for each fact

  • If the same information is stored more than once then versions can get out of step

  • Which instance do you trust?

Unnormalised Table

  • An example of unnormalised data

  • MP3 metadata

    tracktitleartist albumyear
    1SmileLily Allen Alright Still2006
    2Knock 'Em OutLily Allen Alright Still2006
    3LDN Lily Allen Alright Still2006

  • Far too much repeated data

Real Life Horror Story: MP3 Downloads

  • Converting Access database to MySQL

  • Artist table

  • Couldn't find the MP3 table

  • Took a closer look at the artist table

Real Life Horror Story: MP3 Downloads (cont)

  • name, contact details, track titles, file names

  • Last two were comma separated fields

  • artist table:

    namecontact detailstracksfiles
    Lily Allen lily@example.com Smile,
    Knock 'Em Out,
    LDN
    smile.mp3,
    knock_em_out.mp3,
    ldn.mp3

  • Good job song titles never contain commas!

Normalising MP3 Data

  • Need three tables

  • Artist, Album, Track

  • artist table:

    artist
    id name
    1Lily Allen

    Normalising MP3 Data (cont)

    • album table:

    album
    idartisttitleyear
    11Alright Still2006

  • track table

    track
    idalbumnumbertitle
    111Smile
    212Knock 'Em Out
    313LDN

Normalising Data

  • Many different normal forms

    • Six or seven last time I checked

  • See Wikipedia article "Database Normalisation"

  • Third normal form is usually good enough

    • Aka "common sense"

  • Not as complex as it might seem

Simple Guide to Normalising Data

  • Identify data objects (tables)

  • Identify data items (columns)

  • Assign items to appropriate objects

  • You need to understand the data

    • (Or have good access to someone who does)

Normalised Data

  • A data table is in (third) normal form when its columns are dependent on...

    • The key

    • The whole key

    • And nothing but the key

Know When To Stop

  • Normalisation can be taken too far

  • I once worked with a database that went far too far

  • Only three tables in the database

    • Objects

    • Attribute

    • Associations

  • Very hard to work with

Denormalisation

  • Sometimes a normalised database has performance issues

  • Need to join too many tables to get data out

  • May consider planned denormalisation to counter this

  • Always start from a normalised database

  • Always know exactly which problems you are solving

  • Always use triggers to control the denormalised columns

Indexes

  • Indexes have two purposes in a database

  • To improve query performance

  • To impose uniqueness constraints

Indexes for Performance

  • An index is just like an index in a book

  • Provides a quick way to find data

  • Query engine looks up data in an index

  • Index points to the relevant data page

  • Faster than doing a table scan

Index Downsides

  • Indexes make queries faster

  • But they need to be kept up to date

  • Therefore updates are slower

  • No free lunches

  • Only add indexes that are going to be useful

Creating Indexes

  • Create indexes with CREATE INDEX

      CREATE INDEX index_name
      ON table_name (column1, ...)
  • Example:

      CREATE INDEX artist_ind
      ON album (artist)

Using Indexes

  • Now queries that use the artist column on album will be faster

  •   SELECT *
      FROM   album
      WHERE  artist = ?
  • Don't need to explicity use the index

Multi-Column Indexes

  • An index can contain more than one column

  • Example:

      CREATE INDEX artist_year_ind
      ON album (artist, year)
  • Speeds up queries of the form

      SELECT *
      FROM   album
      WHERE  artist = ? AND year = ?

Creating Unique Indexes

  • Unique indexes prevent duplicate rows from being inserted into tables

  • Primary keys are (by definition) also unique indexes

  • Add UNIQUE keywork to CREATE INDEX statement

      CREATE UNIQUE INDEX index_name
      ON table (column1, ...)

Unique Index Example

  • Each album can only have one track with each track number

      CREATE UNIQUE INDEX album_track_ind
      ON track (album, number)

Breaking Unique Indexes

  • Insert one track

     mysql> insert track (title,album,number)
         -> values ('Smile',1,1);
     Query OK, 1 row affected (0.02 sec)
  • Insert second track

      mysql> insert track (title,album,number)
          -> values ('Knock Em Out',1,1);
      ERROR 1062 (23000): Duplicate entry '1-1' for key 2

Alternative Index syntax

  • Can also create single-column unique indexes in column definitions

      CREATE TABLE foo (
        id INTEGER PRIMARY KEY AUTO_INCREMENT,
        code CHAR(10) UNIQUE
      )
  • Can also create indexes in table definitions

      CREATE TABLE foo (
        id INTEGER PRIMARY KEY AUTO_INCREMENT,
        code CHAR(10),
        UNIQUE (code)
      )

Joining Tables

  • Now you have a nicely normalised database

  • You need to query multiple tables to get any useful data out

  • Query multiple tables using table joins

Old Style Join Syntax

  • It's possible to join tables just using the WHERE clause

  •   SELECT album.title, album.release_date
      FROM   album, artist
      WHERE  artist.name = 'Pink Floyd'
      AND    artist.id = album.artist
  • Join clauses muddled up with selection clauses

  • Potentially confusing

  • Still seen, but best avoided

New Style Join Syntax

  • SQL now supports a more explicit JOIN clause

  •   SELECT album.title, album.release_date
      FROM   album
      JOIN   artist
      ON     artist.id = album.artist
      WHERE  artist.name = 'Pink Floyd'
  • Join clause clearly separated from selection clauses

  • Also more flexible

Types of Join

  • Cross join

  • Inner join

  • Outer join

    • Left, right and full

Cross Join

  •   SELECT some columns
      FROM   table1, table2
  • No join clause

  • All rows in table 1 returned matched with all rows in table 2

  • If table 1 has m rows and table 2 has n rows, returns m*n rows

  • Also known as "Cartesian Product"

  • Usually not what you want

Inner Join

  •   SELECT some columns
      FROM   table1
      JOIN   table2
      ON     table1.col1 = table2.col2
  • Finds intersection between two tables

  • Returns matches where row exists in both tables

  • Default join type

  • Most used type

Left Outer Join

  •   SELECT some columns
      FROM   table1
      LEFT OUTER JOIN table2
      ON     tab1e1.col1 = table2.col2
  • Returns all rows from table 1 and matching data from table 2

  • Returns NULLs where no matching data in table 2

Right Outer Join

  •   SELECT some columns
      FROM   table1
      RIGHT OUTER JOIN table2
      ON     tab1e1.col1 = table2.col2
  • Returns all rows from table 2 and matching data from table 1

  • Returns NULLs where no matching data in table 1

Full Outer Join

  •   SELECT some columns
      FROM   table1
      FULL OUTER JOIN table2
      ON     tab1e1.col1 = table2.col2
  • Returns the combination of LEFT and RIGHT outer joins

  • Currently not supported by MySQL

  • That syntax doesn't work

Subqueries

  • Subqueries are another way to extract data from multiple tables

  •   SELECT *
      FROM   some_table
      WHERE  some_column IN
        (SELECT some_other_column
         FROM   some_other_table)
  • Selects all values of some_other_column from some_other_table

  • Selects rows from some_table where some_column contains one of those values

Nested Subqueries

  • Subqueries can be nested

  •   SELECT *
      FROM   some_table
      WHERE  some_column IN
        (SELECT some_other_column
         FROM   some_other_table
         WHERE  a_different_column IN
           (SELECT yet_another_column
            FROM   yet_another_table))

Comparison Subqueries

  • Can use other comparison operators with subqueries

  •   SELECT *
      FROM   some_table
      WHERE  some_column >
        (SELECT some_other_column
         FROM   some_other_table)

Comparisons with ANY or ALL

  • Useful when subquery doesn't return a single value

  •   SELECT *
      FROM   some_table
      WHERE  some_column > ANY
        (SELECT some_other_column
         FROM   some_other_table)
  •   SELECT *
      FROM   some_table
      WHERE  some_column > ALL
        (SELECT some_other_column
         FROM   some_other_table)

Row Subqueries

  • Can compare multiple values from the subquery

  •   SELECT *
      FROM   some_table
      WHERE  (some_column, another_column) =
        (SELECT some_other_column, a_second_column
         FROM   some_other_table)

Correlated Subqueries

  • Subqueries can use values from the outer query

  •   SELECT *
      FROM   some_table
      WHERE  some_column = ANY
        (SELECT some_other_column
         FROM   some_other_table
         WHERE  some_other_table.another_column
                = some_table.yet_another_column)

EXISTS and NOT EXISTS

  •   SELECT *
      FROM   some_table
      WHERE EXISTS
        (SELECT *
         FROM   some_other_table
         WHERE  some_table.some_column
                = some_other_table.some_other_column)
  •   SELECT *
      FROM   some_table
      WHERE NOT EXISTS
        (SELECT *
         FROM   some_other_table
         WHERE  some_table.some_column
                = some_other_table.some_other_column)

Subqueries Summary

  • Subqueries are a very powerful feature of SQL

  • The key to many kinds of complex data manipulation

  • Good support for them in MySQL 4.1

  • Well worth learning how to use them

Views

  • A view is a pre-built select statement that can be treated as a table

  •   CREATE VIEW view_name AS
      SELECT ...
      SELECT * FROM view_name
  • Display only certain columns from a table

  • Display only certain rows from a table

  • Create "denormalised" tables to make queries simpler

  • Create summary tables

Single Table Views

  • Only display certain columns

      CREATE VIEW view_name AS
      SELECT a_column, another_column
      FROM   a_table
  • Useful for security

  • Some users only allowed to see certain columns

Single Table Views (cont)

  • Only display certain rows

      CREATE VIEW view_name AS
      SELECT *
      FROM   table
      WHERE  some_condition_is_true
  • This year's albums

      CREATE VIEW albums_2006 AS
      SELECT *
      FROM   album
      WHERE  YEAR(release_date) = 2006

Denormalised Tables

  • Need to extract data by joining many tables

  • Pre-build complex joins as views

  •   CREATE VIEW mp3 AS
      SELECT artist.name AS artist,
             album.title AS title,
             track.title AS track
      FROM   artist
      JOIN   album ON album.artist = artist.id
      JOIN   track ON track.album = album.id;
  • Rename columns in result set

Summary Tables

  • Pre-build common summary tables as views

  •   CREATE VIEW albums_by_year AS
      SELECT YEAR(release_date),
             COUNT(*) AS album_count
      FROM   album
      GROUP BY YEAR(release_date)

Triggers

  • A trigger is a piece of executable code that is fired (triggered) when a particular database action occurs

  • Useful to run actions when data is inserted, updated or deleted

  • Triggers can run before or after INSERT, UPDATE or DELETE events

Defining Triggers

  • MySQL 5.x syntax

  •   CREATE TRIGGER name 
      BEFORE|AFTER INSERT|UPDATE|DELETE
      ON table_name
      FOR EACH ROW some_code
  • example:

      CREATE TRIGGER name BEFORE INSERT
      ON artist
      FOR EACH ROW some useful action

NEW and OLD

  • Triggers have access to "pseudo-tables" called NEW and OLD

  • NEW contains the new data for each row

  • OLD contains the old data for each row

  • OLD isn't available in INSERT triggers

  • NEW isn't available in DELETE triggers

Trigger Example: Auditing

  • Create a table with two extra columns - updater and updated

      CREATE TABLE some_table (
        id INTEGER PRIMARY KEY AUTO_INCREMENT,
        name CHAR(20),
        updater VARCHAR(60),
        updated DATETIME
      )

Trigger Example: Auditing (2)

  • Create triggers

      CREATE TRIGGER ins BEFORE INSERT
      ON some_table
      FOR EACH ROW
      SET NEW.updater = USER(), NEW.updated = NOW()
      CREATE TRIGGER upd BEFORE UPDATE
      ON some_table
      FOR EACH ROW
      SET NEW.updater = USER(), NEW.updated = NOW()

Trigger Example: Auditing (3)

  • Do some updates and inserts

      INSERT some_table (name) values ('Dave')
      INSERT some_table (name) values ('Perl')
      UPDATE some_table
      SET name = 'davorg' WHERE name = 'Dave'
  •   SELECT * FROM some_table
      +----+--------+----------------+---------------------+
      | id | name   | updater        | updated             |
      +----+--------+----------------+---------------------+
      |  1 | davorg | dave@localhost | 2006-08-27 16:52:45 |
      |  2 | Perl   | dave@localhost | 2006-08-27 16:52:05 |
      +----+--------+----------------+---------------------+
  • Possible extension - handling deletions

Trigger Example: Denormalisation (1)

  • Create parent and child tables with one parent column denormalised into the child table

      CREATE TABLE parent (
        id INTEGER PRIMARY KEY AUTO_INCREMENT,
        name CHAR(20)
      )
      CREATE TABLE child (
        id INTEGER PRIMARY KEY AUTO_INCREMENT,
        name CHAR(20),
        parent INTEGER REFERENCES parent(id),
        parent_name char(20)
      )

Trigger Example: Denormalisation (2)

  • Create denormalisation triggers

      CREATE TRIGGER ins_chd BEFORE INSERT
      ON child
      FOR EACH ROW SET NEW.parent_name =
                          (SELECT name
                           FROM   parent
                           WHERE  id = NEW.parent)
      CREATE TRIGGER upd_par AFTER UPDATE
      ON parent
      FOR EACH ROW UPDATE child
                   SET    parent_name = NEW.name
                   WHERE  parent = NEW.id

Trigger Example: Denormalisation (3)

  • Insert rows

      INSERT parent (name)
      VALUES ('Foo')
      INSERT child (name, parent)
      VALUES ('Child', 1)
  •   select * from child;
      +----+-------+--------+-------------+
      | id | name  | parent | parent_name |
      +----+-------+--------+-------------+
      |  1 | Child |      1 | Foo         |
      +----+-------+--------+-------------+

Trigger Example: Denormalisation (4)

  • Update parent

      UPDATE parent
      SET    name = 'Bar'
      WHERE  name = 'Foo'
  •   select * from child;
      +----+-------+--------+-------------+
      | id | name  | parent | parent_name |
      +----+-------+--------+-------------+
      |  1 | Child |      1 | Bar         |
      +----+-------+--------+-------------+

Book Recommendations

  • SQL In Nutshell - Kevin Kline

Book Recommendations

  • Introduction to Database Systems - CJ Date

Book Recommendations

  • SQL For Smarties - Joe Celko

Book Recommendations

  • Database in Depth - CJ Date

Some Conclusions

  • Database design is a complex and interesting area

  • Well designed databases are easier to work with

  • Use your database to describe your data in as much detail as possible

  • Upgrade to a database that supports these features

  • Hire a database expert