What is a database?
- An organized collection of data organized to model aspects of reality in a way that supports processes requiring information.
- A relational database is a database whose organization is based on the relational model of data, as proposed by E. F. Codd in 1970.
- Most relational database systems use SQL (Structured Query Language) as the language for querying and maintaining the database.
Motivation
- Do you have multiple spreadsheets that are related?
- Do you have multiple tabs in a spreadsheet that are related?
- Do you have a gigantic spreadsheet (> 1M rows) that freezes Excel when you open it?
- Do you have a slowly changing spreadsheet that you want to keep track of?
Relational Model
- A database consists of a series of unordered tables (or relations) that can be manipulated using operations that return tables.
- Purpose: provide a declarative method for specifying data and queries.
- Users directly state what information the database contains and what information they want from it.
Design (more later)
Unless you walk into a project with a database already in production, you will probably have to design one yourself. You have to make decisions about
- what tables to create,
- what columns they contain, and
- what are the relationships between tables.
This is usually not easy and takes time and effort.
We will talk more about strategies for this later (week 5).
Tables
- Tables represent “things” in the real world.
- Each table should only represent one “thing”
- A “thing” can be an object or an event
- e.g. Customers, students, locations, sample
- Tables are made up of rows and columns
- Can imagine a table is kind of like a spreadsheet of data
Your turn
Discuss
-
Think of a spreadsheet that you have or have encountered
-
What “things” does it contain?
-
More than one?
-
Events, objects, both?
-
What columns are contained?
Uniqueness
- Each row must be unique in a table
- This helps us avoid confusion in our tables
- Can guarantee uniqueness by assigning a key to a row
Primary keys
- Each table can have only one primary key
- All columns (or combination of columns) in a table with unique values are referred to as candidate keys
- Keys can be simple or composite
- simple key made up of one column
- composite key made up of two or more columns
- You choose this in design phase of a DB (no right answer)
- choose the fewest columns necessary
- choose a key that seldom changes
- choose a key that is both simple and familiar to users
Your turn
Discuss
-
Pick one of the “things” from your spreadsheet
-
Are the rows unique?
-
If not, how would you ensure that they are?
-
What is a good primary key for your “thing”?
Foreign keys
- Primary keys are functions of individual tables
- When you create relationships between tables, need a way to join multiple tables together
- A foreign key is a column in a table used to reference a primary key in another table
… can be complex
Foreign keys in a database created to model relationships in the real world.
Relationships between real-world entities can be complex, involving numerous entities each having multiple relationships with each other.
Tables can be related in one of three different ways: one-to-one, one-to-many or many-to-many.
One-to-one
Two tables are related in a one-to-one relationship if:
for every row in the first table, there is at most one row in the second table.
True one-to-one relationships seldom occur in the real world.
Often created to get around some limitation of the database management software rather than to model a real-world situation.
One-to-one related tables should always have the same primary key.
One-to-many
Two tables are related in a one-to-manyrelationship if:
for every row in the first table, there can be zero, one, or many rows in the second table, but for every row in the second table there is exactly one row in the first table.
E.g. link base tables to information stored in lookup tables - experiment table has date, treatment, plot; treatment table has information about each possible treatment.
Many-to-many
Two tables are related in a many-to-many relationship if:
for every row in the first table, there can be many rows in the second table, and for every row in the second table, there can be many rows in the first table.
E.g. student table can include classes; class table has multiple students as class list.
Most complicated relationship, may need an intermediary table depending on DBMS.
Your turn
Discuss
-
If your spreadsheet contains multiple “things”, how do they relate?
-
If not, think of multiple spreadsheets that you have that are related. What is the relationship?
-
What are the foreign keys you would use to link “things”?
-
What type of relationship(s) are you dealing with?
Before Friday…