Subsections

8.2 SQL queries

The basic format of an SQL query is this:
SELECT columns
    FROM tables
    WHERE row_condition
    ORDER BY order_by_columns
The SQL keywords, SELECT, FROM, WHERE, and ORDER BY, are traditionally written in uppercase, though this is not necessary. The names of tables and columns depend on the database being queried, but they should always start with a letter and only contain letters, digits, and the underscore character, `_'. This will select the named columns from the specified tables and return all rows matching the row_condition. The order of the rows in the result is based on the columns named in the order_by_columns clause.


8.2.1 Selecting columns

The special character * selects all columns; otherwise, only those columns named are included in the result. If more than one column name is given, the column names must be separated by commas.
SELECT *
    ...

SELECT colname
    ...

SELECT colname1, colname2
    ...
The column name may be followed by a column alias, which can then be used anywhere within the query in place of the original column name (e.g., in the WHERE clause).

SELECT colname colalias
    ...

If more than one table is included in the query, and the tables share a column with the same name, a column name must be preceded by the relevant table name, with a full stop in between.

SELECT tablename.colname
    ...

Functions and operators may be used to produce results that are calculated from the column. The set of functions that is provided varies widely between DBMS, but the normal mathematical operators for addition, subtraction, multiplication, and division, plus a set of basic aggregation functions for maximum value (MAX), minimum value (MIN), summation (SUM), and arithmetic mean (AVG), should always be available.

SELECT MAX(colname)
    ...

SELECT colname1 + colname2
    ...

A column name can also be a constant value (number or text), in which case the value is replicated for every row of the result.

8.2.2 Specifying tables: The FROM clause

The FROM clause must contain at least one table and all columns that are referred to in the query must exist in at least one of the tables in the FROM clause.

If a single table is specified, then the result is all rows of that table, subject to any filtering applied by a WHERE clause.

SELECT colname
    FROM tablename
    ...

A table name may be followed by a table alias, which can be used in place of the original table name anywhere else in the query.

SELECT talias.colname
    FROM tablename talias
    ...

If two tables are specified, separated only by a comma, the result is all possible combinations of the rows of the two tables (a Cartesian product). This is known as a cross join.

SELECT ...
    FROM table1, table2
    ...

An inner join is created from a cross join by specifying a condition so that only rows that have matching values are returned (typically using a foreign key to match with a primary key). The condition may be specified within the WHERE clause (see Section 8.2.3), or as part of an INNER JOIN syntax as shown below.

SELECT ...
    FROM table1 INNER JOIN table2
        ON table1.primarykey = table2.foreignkey
    ...

An outer join extends the inner join by including in the result rows from one table that have no match in the other table. There are left outer joins (where rows are retained from the table named on the left of the join syntax), right outer joins, and full outer joins (where non-matching rows from both tables are retained).

SELECT ...
    FROM table1 LEFT OUTER JOIN table2
        ON table1.primarykey = table2.foreignkey
    ...

A self join is a join of a table with itself. This requires the use of table aliases.

SELECT ...
    FROM tablename alias1, tablename alias2
    ...


8.2.3 Selecting rows: The WHERE clause

By default, all rows from a table, or from a combination of tables, are returned. However, if the WHERE clause is used to specify a condition, then only rows matching that condition will be returned.

Conditions may involve any column from any table that is included in the query. Conditions usually involve a comparison between a column and a constant value, or a comparison between two columns.

A constant text value should be enclosed in single-quotes.

Valid comparison operators include: equality (=), greater-than or less-than (>, <), greater-than or equal-to or less-than or equal-to (>=, <=), and inequality (!= or <>).

SELECT ...
    FROM ...
    WHERE colname = 0;

SELECT ...
    FROM ...
    WHERE column1 > column2;

Complex conditions can be constructed by combining simple conditions with logical operators: AND, OR, and NOT. Parentheses should be used to make the order of evaluation explicit.

SELECT ...
    FROM ...
    WHERE column1 = 0 AND
          column2 != 0;

SELECT ...
    FROM ...
    WHERE NOT (stroke = 'Br' AND
               (distance = 50 OR
                distance = 100));

For the case where a column can match several possible values, the special IN keyword can be used to specify a range of valid values.

SELECT ...
    FROM ...
    WHERE column1 IN (value1, value2);

Comparison with text constants can be generalized to allow patterns using the special LIKE comparison operator. In this case, within the text constant, the underscore character, _, has a special meaning; it will match any single character. The percent character, %, is also special and it will match any number of characters of any sort.

SELECT ...
    FROM ...
    WHERE stroke LIKE '%stroke';

8.2.4 Sorting results: The ORDER BY clause

The order of the columns in the results of a query is based on the order of the column names in the query.

The order of the rows in a result is undetermined unless an ORDER BY clause is included in the query.

The ORDER BY clause consists of one or more column names. The rows are ordered according to the values in the named columns. The keyword ASC is used to indicate ascending order and DESC is used for descending order.

SELECT ...
    FROM ...
    ORDER BY columnname ASC;

The results can be ordered by the values in several columns simply by specifying several column names, separated by commas. The results are ordered by the values in the first column, but if several rows in the first column have the same value, those rows are ordered by the values in the second column.

SELECT ...
    FROM ...
    ORDER BY column1 ASC, column2 DESC;

8.2.5 Aggregating results: The GROUP BY clause

The aggregation functions MAX, MIN, SUM, and AVG (see Section 8.2.1) all return a single value from a column. If a GROUP BY clause is included in the query, aggregated values are reported for each unique value of the column specified in the GROUP BY clause.

SELECT column1, SUM(column2)
    FROM ...
    GROUP BY column1;

Results can be reported for combinations of unique values of several columns simply by naming several columns in the GROUP BY clause.

SELECT column1, column2, SUM(column3)
    FROM ...
    GROUP BY column1, column2;

The GROUP BY clause can include a HAVING sub-clause that works like the WHERE clause but operates on the rows of aggregated results rather than the original rows.

SELECT column1, SUM(column2) colalias
    FROM ...
    GROUP BY column1
        HAVING colalias > 0;

8.2.6 Subqueries

The result of an SQL query may be used as part of a larger query. The subquery is placed within parentheses but otherwise follows the same syntax as a normal query.

Subqueries can be used in place of table names within the FROM clause and to provide comparison values within a WHERE clause.

SELECT column1
    FROM table1
    WHERE column1 IN
        ( SELECT column2
              FROM table2
                  ...  );

Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.