Relational Databases

What is a database?

An organized collection of data organized to model aspects of reality in a way that supports processes requiring information.
A relational database is a database whose organization is based on the relational model of data, as proposed by E. F. Codd in 1970.
Most relational database systems use SQL (Structured Query Language) as the language for querying and maintaining the database.

Motivation

Do you have multiple spreadsheets that are related?
Do you have multiple tabs in a spreadsheet that are related?
Do you have a gigantic spreadsheet (> 1M rows) that freezes Excel when you open it?
Do you have a slowly changing spreadsheet that you want to keep track of?

Relational Model

A database consists of a series of unordered tables (or relations) that can be manipulated using operations that return tables.
Purpose: provide a declarative method for specifying data and queries.
Users directly state what information the database contains and what information they want from it.

Design (more later)

Unless you walk into a project with a database already in production, you will probably have to design one yourself. You have to make decisions about

what tables to create,
what columns they contain, and
what are the relationships between tables.

This is usually not easy and takes time and effort.

We will talk more about strategies for this later (week 5).

Tables

Tables represent “things” in the real world.
Each table should only represent one “thing”
A “thing” can be an object or an event
- e.g. Customers, students, locations, sample
Tables are made up of rows and columns
Can imagine a table is kind of like a spreadsheet of data

Your turn

Discuss

Think of a spreadsheet that you have or have encountered
What “things” does it contain?
- More than one?
- Events, objects, both?
What columns are contained?

Uniqueness

Each row must be unique in a table
- This helps us avoid confusion in our tables
Can guarantee uniqueness by assigning a key to a row

Primary keys

Each table can have only one primary key
All columns (or combination of columns) in a table with unique values are referred to as candidate keys
Keys can be simple or composite
- simple key made up of one column
- composite key made up of two or more columns
You choose this in design phase of a DB (no right answer)
- choose the fewest columns necessary
- choose a key that seldom changes
- choose a key that is both simple and familiar to users

Your turn

Discuss

Pick one of the “things” from your spreadsheet
Are the rows unique?
If not, how would you ensure that they are?
What is a good primary key for your “thing”?

Foreign keys

Primary keys are functions of individual tables
When you create relationships between tables, need a way to join multiple tables together
A foreign key is a column in a table used to reference a primary key in another table

… can be complex

Foreign keys in a database created to model relationships in the real world.

Relationships between real-world entities can be complex, involving numerous entities each having multiple relationships with each other.

Tables can be related in one of three different ways: one-to-one, one-to-many or many-to-many.

One-to-one

Two tables are related in a one-to-one relationship if:

for every row in the first table, there is at most one row in the second table.

True one-to-one relationships seldom occur in the real world.

Often created to get around some limitation of the database management software rather than to model a real-world situation.

One-to-one related tables should always have the same primary key.

One-to-many

Two tables are related in a one-to-manyrelationship if:

for every row in the first table, there can be zero, one, or many rows in the second table, but for every row in the second table there is exactly one row in the first table.

E.g. link base tables to information stored in lookup tables - experiment table has date, treatment, plot; treatment table has information about each possible treatment.

Many-to-many

Two tables are related in a many-to-many relationship if:

for every row in the first table, there can be many rows in the second table, and for every row in the second table, there can be many rows in the first table.

E.g. student table can include classes; class table has multiple students as class list.

Most complicated relationship, may need an intermediary table depending on DBMS.

Your turn

Discuss

If your spreadsheet contains multiple “things”, how do they relate?
If not, think of multiple spreadsheets that you have that are related. What is the relationship?
What are the foreign keys you would use to link “things”?
What type of relationship(s) are you dealing with?

Relational Databases

It’s all relative

What is a database?

Motivation

Relational Model

Design (more later)

Tables and keys

Tables

Your turn

Uniqueness

Primary keys

Your turn

Foreign keys

Relationships…

… can be complex

One-to-one

One-to-many

Many-to-many

Your turn

Software

Before Friday…