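A data type determines the range of values a column can store and the operations that can be performed on it. For example, bit can store only 1, 0, and NULL, while varchar(10) can store up to 10 single-byte characters and supports string functions. When choosing a data type, satisfy the business and extensibility requirements while keeping the row as narrow as possible (row width is the total number of bytes occupied by all columns in a row). Best practice: use the data type that most closely fits the column's maximum value.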
1. Narrow rows save storage space and reduce the number of I/O operations
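As long as business and extensibility requirements are met, use narrow data types so that the row width is as small as possible. For example, to store date and time values in the format 'yyyy-mm-dd hh:mm:ss', using a string type such as varchar(19) or nvarchar(19) is unwise. datetime2(7) offers higher precision, but the fractional seconds would always be 0, so that precision is wasted. datetime (8 bytes) is a good fit; note that datetime2(0) also stores whole seconds and needs only 6 bytes, so it is narrower still.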
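SQL Server stores data row by row, and each 8 KB page can hold only a limited number of rows. For a query that reads the same number of rows, narrower rows mean that each page holds more rows, which both reduces the number of I/O operations and saves storage space.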
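The size difference can be checked directly. The following is a minimal sketch, assuming SQL Server 2008 or later (for datetime2 and inline variable initialization); the variable names and the sample timestamp are made up for illustration. DATALENGTH returns the number of bytes used to represent each value.

```sql
-- Hypothetical sample value stored in five different data types.
DECLARE @as_varchar    varchar(19)  = '2016-05-23 18:30:45';
DECLARE @as_nvarchar   nvarchar(19) = N'2016-05-23 18:30:45';
DECLARE @as_datetime   datetime     = '2016-05-23T18:30:45';
DECLARE @as_datetime2  datetime2(0) = '2016-05-23T18:30:45';
DECLARE @as_datetime27 datetime2(7) = '2016-05-23T18:30:45';

-- Compare the bytes needed for the same logical value.
-- Documented storage sizes: varchar 1 byte per character here,
-- nvarchar 2 bytes per character here, datetime 8 bytes,
-- datetime2(0) 6 bytes, datetime2(7) 8 bytes.
SELECT DATALENGTH(@as_varchar)    AS varchar_bytes,
       DATALENGTH(@as_nvarchar)   AS nvarchar_bytes,
       DATALENGTH(@as_datetime)   AS datetime_bytes,
       DATALENGTH(@as_datetime2)  AS datetime2_0_bytes,
       DATALENGTH(@as_datetime27) AS datetime2_7_bytes;
```

Variable-length string columns also carry per-row bookkeeping that DATALENGTH does not show, so the gap inside a table is somewhat larger than the raw byte counts suggest.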
2. Use the correct data type to reduce the number of conversions
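In SQL Server, both explicit and implicit type conversions come at a cost, so it is important to use the correct data type and avoid conversions. For example, if the stored data has the format 'yyyy-mm-dd hh:mm:ss', string types and date/time types can be implicitly converted to each other, but storing the values as varchar(19) or nvarchar(19) strings is still unwise: it wastes storage space, and the implicit conversions hurt performance.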
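The effect can be illustrated with a small sketch. The table dbo.OrderLog_Demo, its columns, and its indexes are hypothetical names introduced only for this example. Because datetime2 has higher data type precedence than varchar, comparing a varchar column with a datetime2 parameter converts the column side, which typically shows up as CONVERT_IMPLICIT in the execution plan and can prevent an efficient index seek.

```sql
-- Hypothetical demo table: the same timestamp stored as text and as a native type.
CREATE TABLE dbo.OrderLog_Demo
(
    OrderLogID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CreatedAtText varchar(19)       NOT NULL,  -- 'yyyy-mm-dd hh:mm:ss' as a string
    CreatedAt     datetime2(0)      NOT NULL   -- native date/time type
);

CREATE INDEX IX_OrderLog_Demo_CreatedAtText ON dbo.OrderLog_Demo (CreatedAtText);
CREATE INDEX IX_OrderLog_Demo_CreatedAt     ON dbo.OrderLog_Demo (CreatedAt);

DECLARE @since datetime2(0) = '2016-05-01T00:00:00';

-- Mismatched types: every CreatedAtText value must be converted to datetime2
-- before it can be compared with @since.
SELECT OrderLogID
FROM dbo.OrderLog_Demo
WHERE CreatedAtText >= @since;

-- Matching types: no conversion is needed, so the index on CreatedAt
-- can be used directly.
SELECT OrderLogID
FROM dbo.OrderLog_Demo
WHERE CreatedAt >= @since;
```

Comparing the actual execution plans of the two SELECT statements is one way to see the difference on a given server.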
3. Indexes created on narrow columns perform better
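A clustered index is best created on a column that is narrow, static, unique, and ever-increasing. A narrow column lets each index page hold more index keys, so the SQL Server engine passes through fewer nodes to locate a row; that is, the navigation path is shorter and lookups are faster.
Every nonclustered index contains the clustered index key columns in its index pages, either as the row locator or as part of the index key. If the clustered index key is wide, the index trees of all nonclustered indexes take up more storage space, more I/O operations are needed, and both updates and queries become slower.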
In general, it is best practice to create a clustered index on narrow, static, unique, and ever-increasing columns. This is for numerous reasons. First, using an updateable column as the clustering key can be expensive, as updates to the key value could require the data to be moved to another page. This can result in slower writes and updates, and you can expect higher levels of fragmentation. Secondly, the clustered key value is used in non-clustered indexes as a pointer back into the leaf level of the clustered index. This means that the overhead of a wide clustered key is incurred in every index created.
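As a sketch of this guidance, assume a hypothetical dbo.Customer_Demo table (all names here are illustrative). An int IDENTITY column is 4 bytes wide, is not normally updated, is unique when declared as the primary key, and increases with every insert, so it matches the four properties listed above.

```sql
-- Hypothetical table with a narrow, static, unique, ever-increasing clustered key.
CREATE TABLE dbo.Customer_Demo
(
    CustomerID int IDENTITY(1,1) NOT NULL,  -- 4-byte clustered key
    Email      varchar(100)      NOT NULL,
    CreatedAt  datetime2(0)      NOT NULL,
    CONSTRAINT PK_Customer_Demo PRIMARY KEY CLUSTERED (CustomerID)
);

-- Each entry in this nonclustered index also carries the clustered key
-- (CustomerID) so that the matching row can be located. A wider clustered
-- key would add that width to every entry of every nonclustered index.
CREATE NONCLUSTERED INDEX IX_Customer_Demo_Email
    ON dbo.Customer_Demo (Email);
```

Clustering on Email instead would make the key up to 100 bytes wide and updatable, and that cost would be repeated in every other index on the table.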
4. When creating a clustered index, prefer a unique clustered index
Quoted from "Performance Considerations of Data Types":
A clustered index created as part of a primary key will, by definition, be unique. However, a clustered index created with the following syntax,
CREATE CLUSTERED INDEX <index_name> ON <schema>.<table_name> (<key columns>);
will not be unique unless unique is explicitly declared, i.e.
CREATE UNIQUE CLUSTERED INDEX <index_name> ON <schema>.<table_name> (<key columns>);
In order for SQL Server to ensure it navigates to the appropriate record, for example when navigating the B-tree structure of a non-clustered index, SQL Server requires every row to have an internally unique id. In the case of unique clustered index, this unique row id is simply the clustered index key value. However, as SQL Server will not require a clustered index to be unique - that is, it will not prevent a clustered index from accepting duplicate values - it will ensure uniqueness internally by adding a 4-byte uniquifier to any row with a duplicate key value.
In many cases, creating a non-unique clustered index on a unique or mostly unique column will have little-to-no impact. This is because the 4-byte overhead is only added to duplicate instances of an existing clustered key value. An example of this would be creating a non-unique clustered index on an identity column. However, creating a non-unique clustered index on a column with many duplicate values, perhaps on a column of date data type where you might have thousands of records with the same clustered key value, could result in a significant amount of internal overhead.
Moreover, SQL Server will store this 4-byte uniquifier as a variable-length column. This is significant in that a table with all fixed columns and a large number of duplicate clustered values will actually incur 8 bytes of overhead per row, because SQL Server requires 4 bytes to manage this variable column (2 bytes for the count of variable-length columns in the row and 2 bytes for the offset of the variable-length column of the uniquifier column). If there are already variable-length columns in the row, the overhead is only 6 bytes: two for the offset and four for the uniquifier value. Also, this value will be present in all nonclustered indexes too, as it is part of the clustered index key.
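The overhead described in the quotation can be observed with a small experiment. This is a sketch under stated assumptions: the two Visit_*_Demo tables and their indexes are hypothetical, sys.all_objects is used only as a convenient row source and is assumed to contain at least 1000 rows, and the exact record sizes reported may vary by SQL Server version. Both tables hold the same rows, all with the same VisitDate; only the clustered index definition differs.

```sql
-- Non-unique clustered index on a column with many duplicate values.
CREATE TABLE dbo.Visit_NonUnique_Demo
(
    VisitID   int  NOT NULL,
    VisitDate date NOT NULL
);
CREATE CLUSTERED INDEX CIX_Visit_NonUnique_Demo
    ON dbo.Visit_NonUnique_Demo (VisitDate);

-- Unique clustered index: VisitID is added to the key to make it unique.
CREATE TABLE dbo.Visit_Unique_Demo
(
    VisitID   int  NOT NULL,
    VisitDate date NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX CIX_Visit_Unique_Demo
    ON dbo.Visit_Unique_Demo (VisitDate, VisitID);

-- Load up to 1000 rows that all share one VisitDate.
INSERT INTO dbo.Visit_NonUnique_Demo (VisitID, VisitDate)
SELECT TOP (1000)
       CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int),
       CAST('20160523' AS date)
FROM sys.all_objects;

INSERT INTO dbo.Visit_Unique_Demo (VisitID, VisitDate)
SELECT VisitID, VisitDate
FROM dbo.Visit_NonUnique_Demo;

-- Compare leaf-level record sizes of the two clustered indexes (index_id = 1).
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.record_count,
       s.min_record_size_in_bytes,
       s.max_record_size_in_bytes,
       s.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Visit_NonUnique_Demo'), 1, NULL, 'DETAILED') AS s
WHERE s.index_level = 0
UNION ALL
SELECT OBJECT_NAME(s.object_id),
       s.record_count,
       s.min_record_size_in_bytes,
       s.max_record_size_in_bytes,
       s.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Visit_Unique_Demo'), 1, NULL, 'DETAILED') AS s
WHERE s.index_level = 0;
```

Both tables contain only fixed-length columns, so any difference between the minimum and maximum record size in the non-unique table reflects the uniquifier and its variable-length bookkeeping. Note that the composite key in the second table is itself 4 bytes wider than VisitDate alone, so the trade-off should be weighed against any nonclustered indexes on the table.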
Original article: http://www.cnblogs.com/ljhdo/p/5521043.html