SQL
This brief guide is not comprehensive - it is intended to be easy going. Words in UPPERCASE are SQL keywords.
SQL is pronounced sequel or S-Q-L. It stands for structured query language and is a simple language used with databases. You can use SQL to create databases, but most of the SQL you see is used to query databases - to get information out in the format you want.
SQL is like origami - you start with big spread of data, and use commands to fold the data down into the format you need.
To run SQL queries, you need to know HowToRunSQLQueries
Database
A database consists of several tables.
Table
Tables consist of rows and columns. Columns go left to right, rows go top to bottom.
This is a table called Students:
ID | Surname | Forename | Country | Course | FTE |
---|---|---|---|---|---|
1 | Smith | John | 02 | Biology | 1.0 |
2 | Cooper | Brian | 05 | Politics | 1.0 |
3 | Wertz | Heidi | 14 | Biology | 0.5 |
4 | Goodman | Pete |
| Archaeology | 1.0 |
Query
To get this data, we run the following SQL query:
SELECT ID, Surname, Forename, Country, Course, FTE FROM Students
There are two keywords here.
FROM
The FROM keyword specifies the tables you want to choose data from.
SELECT
The SELECT keyword specifies the columns you want to choose.
The Dangerous But Useful Asterisk
If you don't know what columns are in the table, you can use an asterisk temporarily, but you should always replace it with the explicit column names for safety.
SELECT * FROM Students
The SELECT keyword is what you use to filter your data by column, so if you're just after course and ID:
SELECT ID, Course FROM Students
gives:
ID | Course |
---|---|
1 | Biology |
2 | Politics |
3 | Biology |
4 | Archaeology |
WHERE
To filter by row, we need a new keyword: WHERE.
SELECT ID, Course FROM Students WHERE Course = "Biology"
gives:
ID | Course |
---|---|
1 | Biology |
3 | Biology |
If you want to be more restrictive, you can provide multiple WHERE conditions:
SELECT ID, Course FROM Students WHERE Course = "Biology" AND ID = 1
will return a single record. You can specify exact matches, or other conditions such as great than/less than ( ID > 10 ), logical rules (Course = "Biology" Or Course = "Politics") and so on.
ORDER BY
If you want to sort your results, use the ORDER BY clause.
SELECT Surname, Forename FROM Students WHERE Course = "Biology" ORDER BY Surname, Forename
(note you can use a column in a WHERE clause even if it isn't in the SELECT clause)
Surname | Forename |
---|---|
Smith | John |
Wertz | Heidi |
The default sort order is sensible. If it doesn't suit, you can use ASC or DESC:
SELECT Surname, Forename FROM Students WHERE Course = "Biology" ORDER BY Surname DESC, Forename ASC
If you don't specify a sort order for each column, it will be random.
Recap
SELECT choose columns.
FROM chooses tables.
WHERE chooses rows.
ORDER BY sorts rows.
GROUP BY
To aggregate data, use GROUP BY:
SELECT Course FROM Students GROUP BY Course
Course |
---|
Biology |
Archaeology |
Politics |
Awesome.
This isn't usually what you want - aggregation is about counting/summing things. We need to create a new column:
COUNT
SELECT Course, COUNT(ID) FROM Students GROUP BY Department
Course | COUNT(ID) |
---|---|
Biology | 2 |
Archaeology | 1 |
Politics | 1 |
We've used the ID column here as the thing to count, but it doesn't really matter - COUNT just counts up the number of rows which match your aggregation criteria.
SUM
If instead you want to total up a number column, use SUM. To relabel columns, use AS.
SELECT Course, SUM(FTE) AS TotalFTE, SUM(1) AS TotalStudents FROM Students GROUP BY Department
Course | ||
---|---|---|
Biology | 1.5 | 2 |
Archaeology | 1 | 1 |
Politics | 1 | 1 |
Note you can specify numbers instead of column names to do calculations - e.g. Price * 1.2. Or you can calculate across columns, e.g. Price * VATRate. There are also MIN() and MAX() and stuff like that.
JOIN
In a good database we try to use codes and lookup tables to minimise the need to change data in lots of places - say a department changes from "Educational Studies" to "Education". If you used the text "Educational Studies" in your database, you'd need to replace /every/ reference. If instead you used a code (e.g. 40), you can then maintain a lookup table which gives you the label relating to the code, and you only need to change the label in one place.
We have a code in our Students table so we need to relate it to its lookup, Country:
02 | England |
05 | France |
06 | Wales |
14 | Germany |
We relate tables using JOIN, which comes in a couple of varieties:
INNER JOIN
SELECT Students.Surname, Students.Forename, Country.CountryName FROM Students INNER JOIN Country ON Students.Country = Country.CountryCode
gives:
Surname | Forename | |
---|---|---|
Smith | John | England |
Cooper | Brian | France |
Wertz | Heidi | Germany |
The INNER JOIN part says we want to return rows where a match exists in both tables, using the match criteria of Students.Country = Country.CountryCode. The first part is the table name, the second is the column name. We don't get a result for Pete Goodman because his country is blank, and there isn't a valid lookup row in the Country table.
INNER JOINS are OK if you are certain about your data's integrity, but in the real world you want a way to see the empty values too, so you can then plug the gaps and improve your reporting.
LEFT JOIN
Left join gets all the values from the first table, and any rows that match from the second:
SELECT Students.Surname, Students.Forename, Country.CountryName FROM Students LEFT JOIN Country ON Students.Country = Country.CountryCode
gives:
Surname | Forename | |
---|---|---|
Smith | John | England |
Cooper | Brian | France |
Wertz | Heidi | Germany |
Goodman | Pete |
|
Now it's clearer that you have all your students showing, and one has a missing value you need to look into.
You can join with multiple conditions in a table:
LEFT JOIN Country ON Students.Country = Country.CountryCode AND Students.CensusYear = Country.CensusYear
And you can join multiple tables flexibly:
LEFT JOIN Country ON Students.Country = Country.CountryCode LEFT JOIN Continents ON Country.CountryCode = Continents.CountryCode LEFT JOIN Departments ON Students.Department = Departments.DepartmentCode
There's a RIGHT JOIN as well but it can become confusing to mix them together.
Some older SQL queries don't use JOIN and instead specify the relationship in the WHERE clause and the left join with a (+). This isn't recommended as it's not compliant with the main SQL standard and is more difficult to read when it gets complex. Compare:
SELECT * FROM Table1, Table2, Table3, Table4 WHERE Table1.Col3 > 100 AND Table1.Col1 = Table2.Col1(+) AND Table1.Col2 = Table2.Col2(+) AND Table2.Col1 = Table3.Col1(+) AND Table2.Col2 = Table3.Col2 AND Table3.Col1 = Table4.Col1(+) AND Table3.Col2 = Table4.Col2(+)
(spot the mistake) with:
SELECT * FROM Table1 LEFT JOIN Table2 ON Table1.Col1 = Table2.Col1 AND Table1.Col2 = Table2.Col2 LEFT JOIN Table3 ON Table2.Col1 = Table3.Col1 AND Table2.Col2 = Table3.Col2 LEFT JOIN Table4 ON Table3.Col1 = Table4.Col1 AND Table4.Col2 = Table4.Col2 WHERE Table1.Col3 > 100
It's hard to make the same mistake in the second example because the left join type is more explicit and visible.
Nesting
You can use queries in place of a table in the FROM clause which lets you do more complex querying in one statement - say you only want to get countries which have more than 100 students - first, do the sum:
SELECT Country, SUM(1) AS TotalStudents FROM Students GROUP BY Country
gives:
Country | |
---|---|
02 | 1 |
03 | 2 |
19 | 450 |
16 | 103 |
Then wrap this in brackets and use it in your FROM clause:
SELECT Country.CountryName, StuCount.TotalStudents FROM ( SELECT Country, SUM(1) AS TotalStudents FROM Students GROUP BY Country ) StuCount LEFT JOIN Country ON Students.Country = Country.CountryCode WHERE StuCount.TotalStudents >= 100
gives:
Bulgaria | 450 |
Micronesia | 103 |
Nesting is a powerful way to take oodles of big data, perform a sequence of steps on it, in order to squish it down into exactly the simple form you want. It's also a superfast way to get lost in onion layers of logic and make small mistakes that are obscured until a few years later when something new crops up and breaks things.
Standards
This is about as far as you can get with the standard SQL syntax. Despite agreements all round, database vendors have decided they know best and have their own implementation of things that are so common across the world that you sometimes can't believe it. Say you want the last character of a value (we often do, as we keep programme length in this place of many codes).
Microsoft SQL Server / MySQL
SELECT RIGHT(ProgrammeCode,1)
Oracle
SELECT SUBSTR(ProgrammeCode, -1, 1)
Consult the documentation.