/
SQL

SQL

This brief guide is not comprehensive - it is intended to be easy going.  Words in UPPERCASE are SQL keywords.

SQL is pronounced sequel or S-Q-L.  It stands for structured query language and is a simple language used with databases.  You can use SQL to create databases, but most of the SQL you see is used to query databases - to get information out in the format you want.

SQL is like origami - you start with big spread of data, and use commands to fold the data down into the format you need.

To run SQL queries, you need to know HowToRunSQLQueries

Database

A database consists of several tables.

Table

Tables consist of rows and columns.  Columns go left to right, rows go top to bottom.

This is a table called Students:

ID

Surname

Forename

Country

Course

FTE

1

Smith

John

02

Biology

1.0

2

Cooper

Brian

05

Politics

1.0

3

Wertz

Heidi

14

Biology

0.5

4

Goodman

Pete

 

Archaeology

1.0

Query

To get this data, we run the following SQL query:

SELECT ID, Surname, Forename, Country, Course, FTE
FROM Students

There are two keywords here. 

FROM

The FROM keyword specifies the tables you want to choose data from. 

SELECT

The SELECT keyword specifies the columns you want to choose.

The Dangerous But Useful Asterisk

If you don't know what columns are in the table, you can use an asterisk temporarily, but you should always replace it with the explicit column names for safety.

SELECT *
FROM Students

The SELECT keyword is what you use to filter your data by column, so if you're just after course and ID:

SELECT ID, Course
FROM Students

gives:

ID

Course

1

Biology

2

Politics

3

Biology

4

Archaeology

WHERE

To filter by row, we need a new keyword: WHERE.

SELECT ID, Course
FROM Students
WHERE Course = "Biology"

gives:

ID

Course

1

Biology

3

Biology

If you want to be more restrictive, you can provide multiple WHERE conditions:

SELECT ID, Course
FROM Students
WHERE Course = "Biology" AND ID = 1

will return a single record. You can specify exact matches, or other conditions such as great than/less than ( ID > 10 ), logical rules (Course = "Biology" Or Course = "Politics") and so on.

ORDER BY

If you want to sort your results, use the ORDER BY clause.

SELECT Surname, Forename
FROM Students
WHERE Course = "Biology"
ORDER BY Surname, Forename

(note you can use a column in a WHERE clause even if it isn't in the SELECT clause)

Surname

Forename

Smith

John

Wertz

Heidi

The default sort order is sensible. If it doesn't suit, you can use ASC or DESC:

SELECT Surname, Forename
FROM Students
WHERE Course = "Biology"
ORDER BY Surname DESC, Forename ASC

If you don't specify a sort order for each column, it will be random.

Recap

SELECT choose columns.
FROM chooses tables.
WHERE chooses rows.
ORDER BY sorts rows.

GROUP BY

To aggregate data, use GROUP BY:

SELECT Course
FROM Students
GROUP BY Course

Course

Biology

Archaeology

Politics

Awesome.

This isn't usually what you want - aggregation is about counting/summing things. We need to create a new column:

COUNT

SELECT Course, COUNT(ID)
FROM Students
GROUP BY Department

Course

COUNT(ID)

Biology

2

Archaeology

1

Politics

1

We've used the ID column here as the thing to count, but it doesn't really matter - COUNT just counts up the number of rows which match your aggregation criteria.

SUM

If instead you want to total up a number column, use SUM. To relabel columns, use AS.

SELECT Course, SUM(FTE) AS TotalFTE, SUM(1) AS TotalStudents
FROM Students
GROUP BY Department

Course

TotalFTE

TotalStudents

Biology

1.5

2

Archaeology

1

1

Politics

1

1

Note you can specify numbers instead of column names to do calculations - e.g. Price * 1.2. Or you can calculate across columns, e.g. Price * VATRate.  There are also MIN() and MAX() and stuff like that.

JOIN

In a good database we try to use codes and lookup tables to minimise the need to change data in lots of places - say a department changes from "Educational Studies" to "Education".  If you used the text "Educational Studies" in your database, you'd need to replace /every/ reference.  If instead you used a code (e.g. 40), you can then maintain a lookup table which gives you the label relating to the code, and you only need to change the label in one place.

We have a code in our Students table so we need to relate it to its lookup, Country:

CountryCode

CountryName

02

England

05

France

06

Wales

14

Germany

We relate tables using JOIN, which comes in a couple of varieties:

INNER JOIN

SELECT Students.Surname, Students.Forename, Country.CountryName
FROM Students
INNER JOIN Country ON Students.Country = Country.CountryCode

gives:

Surname

Forename

CountryName

Smith

John

England

Cooper

Brian

France

Wertz

Heidi

Germany

The INNER JOIN part says we want to return rows where a match exists in both tables, using the match criteria of Students.Country = Country.CountryCode. The first part is the table name, the second is the column name. We don't get a result for Pete Goodman because his country is blank, and there isn't a valid lookup row in the Country table.

INNER JOINS are OK if you are certain about your data's integrity, but in the real world you want a way to see the empty values too, so you can then plug the gaps and improve your reporting.

LEFT JOIN

Left join gets all the values from the first table, and any rows that match from the second:

SELECT Students.Surname, Students.Forename, Country.CountryName
FROM Students
LEFT JOIN Country ON Students.Country = Country.CountryCode

gives:

Surname

Forename

CountryName

Smith

John

England

Cooper

Brian

France

Wertz

Heidi

Germany

Goodman

Pete

 

Now it's clearer that you have all your students showing, and one has a missing value you need to look into.

You can join with multiple conditions in a table:

LEFT JOIN Country ON Students.Country = Country.CountryCode AND Students.CensusYear = Country.CensusYear

And you can join multiple tables flexibly:

LEFT JOIN Country ON Students.Country = Country.CountryCode
LEFT JOIN Continents ON Country.CountryCode = Continents.CountryCode
LEFT JOIN Departments ON Students.Department = Departments.DepartmentCode

There's a RIGHT JOIN as well but it can become confusing to mix them together.

Some older SQL queries don't use JOIN and instead specify the relationship in the WHERE clause and the left join with a (+).  This isn't recommended as it's not compliant with the main SQL standard and is more difficult to read when it gets complex.  Compare:

SELECT *
FROM Table1, Table2, Table3, Table4
WHERE Table1.Col3 > 100
AND Table1.Col1 = Table2.Col1(+)
AND Table1.Col2 = Table2.Col2(+)
AND Table2.Col1 = Table3.Col1(+)
AND Table2.Col2 = Table3.Col2
AND Table3.Col1 = Table4.Col1(+)
AND Table3.Col2 = Table4.Col2(+)

(spot the mistake) with:

SELECT *
FROM Table1
LEFT JOIN Table2 ON Table1.Col1 = Table2.Col1 AND Table1.Col2 = Table2.Col2
LEFT JOIN Table3 ON Table2.Col1 = Table3.Col1 AND Table2.Col2 = Table3.Col2
LEFT JOIN Table4 ON Table3.Col1 = Table4.Col1 AND Table4.Col2 = Table4.Col2
WHERE Table1.Col3 > 100

It's hard to make the same mistake in the second example because the left join type is more explicit and visible.

Nesting

You can use queries in place of a table in the FROM clause which lets you do more complex querying in one statement - say you only want to get countries which have more than 100 students - first, do the sum:

SELECT Country, SUM(1) AS TotalStudents
FROM Students
GROUP BY Country

gives:

Country

TotalStudents

02

1

03

2

19

450

16

103

Then wrap this in brackets and use it in your FROM clause:

SELECT Country.CountryName, StuCount.TotalStudents
FROM
(
    SELECT Country, SUM(1) AS TotalStudents
    FROM Students
    GROUP BY Country
) StuCount
LEFT JOIN Country ON Students.Country = Country.CountryCode
WHERE StuCount.TotalStudents >= 100

gives:

CountryName

TotalStudents

Bulgaria

450

Micronesia

103

Nesting is a powerful way to take oodles of big data, perform a sequence of steps on it, in order to squish it down into exactly the simple form you want. It's also a superfast way to get lost in onion layers of logic and make small mistakes that are obscured until a few years later when something new crops up and breaks things.

Standards

This is about as far as you can get with the standard SQL syntax. Despite agreements all round, database vendors have decided they know best and have their own implementation of things that are so common across the world that you sometimes can't believe it. Say you want the last character of a value (we often do, as we keep programme length in this place of many codes).

Microsoft SQL Server / MySQL

SELECT RIGHT(ProgrammeCode,1)

Oracle

SELECT SUBSTR(ProgrammeCode, -1, 1)

Consult the documentation.