In this article we shall examine the two modes of data search and retrieval using indexes- index seeks and index scans, and the differences between the two.
Firstly, let us revisit indexes briefly. An index in a SQL Server database is analogous to the index at the start of a book. That is, its function is to allow you to quickly find the data you are searching for inside the book; in the case of a database, the “book” is a table.
An index scan means that SQL Server reads all rows in a table, and then returns only those rows that satisfy the search criteria. When an index scan is performed, all the rows in the leaf level of the index are scanned. This essentially means that all of the rows of the index are examined instead of the table directly. This is sometimes contrasted to a table scan, in which all the table data is read directly. However, there is usually little difference between an index scan and a table scan.
You may wonder why the Query Optimizer may choose to do an index or table scan. Surely it is much faster to first look up data using an index than to go through all the rows in a table? In fact, for small tables data retrieval via an index or table scan is faster than using the index itself for selection. This is because the added overhead of first reading the index, then reading the pages containing the rows returned by the index, does not offer any performance improvement for a table with only a few rows.
Other reasons to use an index scan would be when an index is not selective enough, and when a query will return a large percentage (greater than 50%) of rows from the table. In such cases the additional overhead of first using the index may result in a small degradation of performance.
An index seek, on the other hand, means that the Query Optimizer relies entirely on the index leaf data to locate rows satisfying the query condition. An index seek will be most beneficial in cases where a small percentage (less than 10 or 15%) of rows will be returned. An index seek will only affect the rows that satisfy a query condition and the pages that contain these qualifying rows; this is highly beneficial, in performance terms, when a table has a very large number of rows.
It is also worth noting that it is usually not worthwhile to create indexes on low-cardinality columns as they would rarely be used by the Query Optimizer. A low-cardinality column is one that contains a very small range of distinct values, for example a ‘Gender’ column would have only two distinct values- Male or Female. An example of a high-cardinality column is of course the primary key column, in which each value is distinct.
In summary, the Query Optimizer generally tries to perform an index seek. If this is not possible or beneficial (for example when the total number of rows is very small) then an index scan is used instead.