码迷,mamicode.com
首页 > 数据库 > 详细

mongodb基础-索引

时间:2016-05-23 15:19:12      阅读:217      评论:0      收藏:0      [点我收藏+]

标签:

1.索引介绍

mongodb的索引和我们遇到的rdbms的索引含义一样,原理也基本一样
首先我们先在一个没有索引的集合上做一个查询,具体的查询计划可以通过explain()函数获取:
> db.t1.find({"username":"user101"}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "suq.t1",
"indexFilterSet" : false,
"parsedQuery" : {
"username" : {
"$eq" : "user101"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"username" : {
"$eq" : "user101"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "mongodb1",
"port" : 27017,
"version" : "3.2.6",
"gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25"
},
"ok" : 1
}
如果要看详细的执行计划可以在explain添加executionStats或者allPlansExecution
> db.t1.find({"username":"user101"}).explain("allPlansExecution")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "suq.t1",
"indexFilterSet" : false,
"parsedQuery" : {
"username" : {
"$eq" : "user101"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user101\", \"user101\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user101\", \"user101\"]"
]
},
"keysExamined" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "mongodb1",
"port" : 27017,
"version" : "3.2.6",
"gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25"
},
"ok" : 1
}

这里的执行计划是collscan表示集合扫描,我们给文件username创建一个索引,使用db.collectionname.ensureIndex({"colname":[1/-1]})来创建   --新版本中使用createIndex来替换ensureIndex
其中1表示列的排序是从小大到排序,-1表示从大到小排序.
> db.t1.ensureIndex({"username":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
再执行上面的语句,发现已经可以使用index scan,下面是部分执行计划:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user101\", \"user101\"]"
]
}
}

2.复合索引

和其它数据库医院,mongodb也有复合索引,上面的例子当我们查询age和username,并且根据age和username进行排序的话,上面的那个例子中的索引就没什么作用了.这时候就要创建一个关于age和username的复合索引.
> db.t1.ensureIndex({"age":1,"username":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
执行计划:
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"age" : 1,
"username" : 1
},
"indexName" : "age_1_username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"age" : [
"[MinKey, MaxKey]"
],
"username" : [
"[MinKey, MaxKey]"
]
}
上面这个索引是先对age键进行排序,在对于age值相同的情况下再对username进行排序,
对于上面的复合索引的使用方式取决于查询类型,下面是主要的三种方式:
(1)db.t1.find({"age":21}).sort({"username":-1})
这种查询能够有效的利用上面的索引,因为age是排过序的首先通过索引定位到age的值,然后再按照索引的第二个键username进行扫描即可.这个查询不需要进行内存排序,因为username已经是排过序的了.
(2)db.t1.find({"age":{"$gte":21,"$lte":30}})
这个是一个范围查询,由于age是排过序的,所以也能很好的利用上面的索引.
(3)db.t1.find({"age":{"$gte":21,"$lte":30}}).sort("username":1})
这个是一个范围和排序查询.由于索引的前缀是age,那么首先通过索引范围扫描查询到满足age条件的索引,然后根据username进行排序,因为通过age得到的数据中的username是无序的,所以需要进行内存排序
对于上面这个查询还有一种索引即将username和age反转:
> db.t1.ensureIndex({"username":1,"age":1})
如果是上面这个索引,那么执行的顺序是,由于username是经过排序的,那么需要进行索引全部扫描,然后在扫描的结果中过滤出age在21到30直接的数据,这样的好处是不需要进行内存排序,但是需要扫描整个索引树.而上面的的情况是只要扫描索引范围然后进行内存排序.
这两种索引具体谁优谁强呢?没有定论,当数据量比较大,而且结果集却很小的时候用第一种方式的索引会比较好,因为数据量比较大的话全索引扫描是比较慢的,而结果集小的话在内存排序还是比较快的.
当数据量不大或者查询中使用了limit限制了查询结果数据量的时候用第二种方式比较好,因为扫描的index变小而且不需要进行排序.

$操作符如何使用索引

有些查询无法使用索引,有些查询能高效的使用索引,下面是不同的查询操作符的处理
(1)低效的操作符
有一些操作符完全无法使用索引,比如"$where"查询和检查一个键是否存在"$exists"
"$ne"查询不等查询通常需要扫描整个索引.
"$not"通常并不知道如何使用索引,大多数的"$not"都会退化为全表扫描
"$nin"就总是全表扫描
(2)范围
在使用复合索引的时候,一般将用于精确匹配的字段放在索引的前面,将用于范围匹配的字段放在后面.
(3)or查询
和oracle这些关系型数据库一样, "$or"的效率并不高,mongodb来说能使用"$in"的地方尽量不要使用"$or" 

索引和数组

mongodb可以深入文档内部,对嵌套字段和数据组建立索引.嵌套对象的和数组字段可以与复合索引中的顶级字段一起使用,大多数情况下和正常的索引字段的行为是一致的.
(1)索引嵌套文档
可以在嵌套文档的键上创建索引,方式与正常的键一样.例如下面这个文档:
> db.test4.findOne()
{
"_id" : ObjectId("573f390c9e178b5475b29d92"),
"name" : "brent",
"comment" : [
{
"name" : "bob",
"content" : "good"
},
{
"name" : "jack",
"content" : "repost"
}
]
}
我们可以对comment的name创建索引:
> db.test4.ensureIndex({"comment.name":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
可以使用这种方法对任意深层次的字段建立索引.
注意对嵌套文档本身建立索引和对嵌套文档中的某个字段建立索引是不一样的,最整个子文档建立索引只有在进行与子文档字段顺序完全匹配的时候才会用到此索引,无法对查询像db.test4.find({"comment.name":"brent"})这样的查询使用索引.
(2)索引数组
也可以对数组建立索引,这样就可以高效的搜索数组中的特定元素.例如下面这个集合,我们可以对fav字段建立索引:
> db.test3.find()
{ "_id" : ObjectId("573e874c9e178b5475b29d8e"), "name" : "brent", "fav" : [ "game", "film", "read" ] }
{ "_id" : ObjectId("573e87849e178b5475b29d8f"), "name" : "bob", "fav" : [ "drink", "football", "runing" ] }
{ "_id" : ObjectId("573e87cc9e178b5475b29d90"), "name" : "jack", "fav" : [ "read", "basketball", "drink" ] }
{ "_id" : ObjectId("573e8b719e178b5475b29d91"), "name" : "tom", "fav" : [ "chess", "cooking" ] }
> db.test3.ensureIndex({"fav":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
对数组建立索引,其实是对数组中的每个元素建立索引,无法将数组作为一个整体建立索引.无法使用数组索引查找特定的位置的数组元素,例如查找"fav.1":"game"这样是不能使用数组的.
一个索引中的数组字段最多只能有一个,这是为了防止索引条目的爆炸增长,否则文档会有N*M个索引条目
(3)多键数组
对于某个索引,如果这个键在某个文档中是一个数组,那么这个索引会被标记为多键索引,可以从explain()的输出中看到:
"isMultiKey" : true,
多键索引无法转变成非多键索引,唯一的方法就删除再重建索引,多键索引可能会比非多键索引慢一些.

索引基数

基数和oracle中的基数是一个道理,表示一个索引键的数据分布情况,如果一个字段只有两种可能,例如性别,那么这种键的基数就是非常低的.
通常一个键的基数越高,那么这个键上的索引就越有用.因为基数越高索引就能越快速的缩小搜索范围.

3.使用explain()和hint()

explain()分为三种模式,分别为:
(1)queryPlanner Mode
explain()不加任何参数默认即是这种模式,例如:
db.products.explain().count( { quantity: { $gt: 50 } } )
(2)executionStats Mode
explain("executionStats")添加了executionStats参数即是这种模式,例如:
db.products.explain("executionStats").find(
{ quantity: { $gt: 50 }, category: "apparel" }
)
(3)allPlansExecution Mode
explain()添加allPlansExecution参数即是这种模式,例如:
db.products.explain("allPlansExecution").update(
{ quantity: { $lt: 1000}, category: "apparel" },
{ $set: { reorder: true } }
)
注意,当我们做除了查询以外的explain的时候,需要将explain写在update这些函数之前,而且explain并不改变实际的数据,它只是显示具体的执行计划.
下面是对explain("allPlansExecution")的一些参数的解释:
> db.t1.explain("allPlansExecution").find({"username":"user101"})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "suq.t1",
"indexFilterSet" : false,
"parsedQuery" : {
"username" : {
"$eq" : "user101"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user101\", \"user101\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user101\", \"user101\"]"
]
},
"keysExamined" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "mongodb1",
"port" : 27017,
"version" : "3.2.6",
"gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25"
},
"ok" : 1
}

"stage" : "COLLSCAN",这里表示访问数据的方式,COLLSCAN就是全表扫描
"nReturned" : 2,这里表示运行返回的行数为2
executionTimeMillis,这个表示执行的时间,单位的毫秒
hint()
类似于oracle的hint,可以强制mongodb使用特定的索引,例如强制查询使用age,username上的索引:
> db.t1.find({"username":"user11"}).hint({"age":1,"username":1}).explain()

4.索引类型

下面是所有的索引类型:
Parameter Type Description
background boolean Optional. Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.
unique boolean

Optional. Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.

The option is unavailable for hashed indexes.

name string

Optional. The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.

Whether user specified or MongoDB generated, index names including their full namespace (i.e. database.collection) cannot be longer than the Index Name Limit.

partialFilterExpression document

Optional. If specified, the index only references documents that match the filter expression. See Partial Indexes for more information.

A filter expression can include:

You can specify a partialFilterExpression option for all MongoDB index types.

New in version 3.2.

sparse boolean

Optional. If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false. SeeSparse Indexes for more information.

Changed in version 3.2: Starting in MongoDB 3.2, MongoDB provides the option to create partial indexes. Partial indexes offer a superset of the functionality of sparse indexes. If you are using MongoDB 3.2 or later, partial indexes should be preferred over sparse indexes.

Changed in version 2.6: 2dsphere indexes are sparse by default and ignore this option. For a compound index that includes 2dsphere index key(s) along with keys of other types, only the 2dsphere index fields determine whether the index references a document.

2dgeoHaystack, and text indexes behave similarly to the 2dsphereindexes.

expireAfterSeconds integer Optional. Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection. See Expire Data from Collections by Setting TTL for more information on this functionality. This applies only to TTL indexes.
storageEngine document

Optional. Allows users to specify configuration to the storage engine on a per-index basis when creating an index. The value of thestorageEngine option should take the following form:

{ <storage-engine-name>: <options> }

Storage engine configuration specified when creating indexes are validated and logged to the oplog during replication to support replica sets with members that use different storage engines.

New in version 3.0.


5.索引管理

下面是索引的相关函数:
Name Description
db.collection.createIndex() Builds an index on a collection.
db.collection.dropIndex() Removes a specified index on a collection.
db.collection.dropIndexes() Removes all indexes on a collection.
db.collection.getIndexes() Returns an array of documents that describe the existing indexes on a collection.
db.collection.reIndex() Rebuilds all existing indexes on a collection.
db.collection.totalIndexSize() Reports the total size used by the indexes on a collection. Provides a wrapper around the totalIndexSize field of the collStats output.
cursor.explain() Reports on the query execution plan for a cursor.
cursor.hint() Forces MongoDB to use a specific index for a query.
cursor.max() Specifies an exclusive upper index bound for a cursor. For use withcursor.hint()
cursor.min() Specifies an inclusive lower index bound for a cursor. For use withcursor.hint()
cursor.snapshot() Forces the cursor to use the index on the _id field. Ensures that the cursor returns each document, with regards to the value of the _id field, only once.

Indexing Database Commands

Name Description
createIndexes Builds one or more indexes for a collection.
dropIndexes Removes indexes from a collection.
compact Defragments a collection and rebuilds the indexes.
reIndex Rebuilds all indexes on a collection.
validate Internal command that scans for a collection’s data and indexes for correctness.
geoNear Performs a geospatial query that returns the documents closest to a given point.
geoSearch Performs a geospatial query that uses MongoDB’s haystack index functionality.
checkShardingIndex Internal command that validates index on shard key.

Geospatial Query Selectors

Name Description
$geoWithin Selects geometries within a bounding GeoJSON geometry. The 2dsphere and 2dindexes support $geoWithin.
$geoIntersects Selects geometries that intersect with a GeoJSON geometry. The 2dsphere index supports $geoIntersects.
$near Returns geospatial objects in proximity to a point. Requires a geospatial index. The2dsphere and 2d indexes support $near.
$nearSphere Returns geospatial objects in proximity to a point on a sphere. Requires a geospatial index. The 2dsphere and 2d indexes support $nearSphere.

Indexing Query Modifiers

Name Description
$explain Forces MongoDB to report on query execution plans. See explain().
$hint Forces MongoDB to use a specific index. See hint()
$max Specifies an exclusive upper limit for the index to use in a query. See max().
$min Specifies an inclusive lower limit for the index to use in a query. See min().
$returnKey Forces the cursor to only return fields included in the index.
$snapshot Guarantees that a query returns each document no more than once. Seesnapshot().

上面我们一直用ensureIndex来创建索引那是旧版本所提供的函数,在最新的版本中推荐使用createIndex()函数,当然旧的也是可以使用的.
下面介绍创建几种索引的方法
创建索引的定义如下:
db.collection.createIndex(keysoptions)
其中keys表示创建的键,options是参数:
Parameter Type Description
background boolean Optional. Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.
unique boolean

Optional. Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.

The option is unavailable for hashed indexes.

name string

Optional. The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.

Whether user specified or MongoDB generated, index names including their full namespace (i.e. database.collection) cannot be longer than the Index Name Limit.

partialFilterExpression document

Optional. If specified, the index only references documents that match the filter expression. See Partial Indexes for more information.

A filter expression can include:

You can specify a partialFilterExpression option for all MongoDB index types.

New in version 3.2.

sparse boolean

Optional. If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false. SeeSparse Indexes for more information.

Changed in version 3.2: Starting in MongoDB 3.2, MongoDB provides the option to create partial indexes. Partial indexes offer a superset of the functionality of sparse indexes. If you are using MongoDB 3.2 or later, partial indexes should be preferred over sparse indexes.

Changed in version 2.6: 2dsphere indexes are sparse by default and ignore this option. For a compound index that includes 2dsphere index key(s) along with keys of other types, only the 2dsphere index fields determine whether the index references a document.

2dgeoHaystack, and text indexes behave similarly to the 2dsphereindexes.

expireAfterSeconds integer Optional. Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection. See Expire Data from Collections by Setting TTL for more information on this functionality. This applies only to TTL indexes.
storageEngine document

Optional. Allows users to specify configuration to the storage engine on a per-index basis when creating an index. The value of thestorageEngine option should take the following form:

{ <storage-engine-name>: <options> }

Storage engine configuration specified when creating indexes are validated and logged to the oplog during replication to support replica sets with members that use different storage engines.

New in version 3.0.


创建普通唯一索引:
> db.test2.createIndex({"name":1},{"background":true,"unique":true})
db.test2.createIndex({"name":1},{"name":"myindex1","background":true,"unique":true})

其中指定background参数这样在创建索引的时候就不会阻塞其它会话.类似于oracle的online.name是手工指定索引的名字.unique是指定索引是否为唯一索引
获取索引信息:
> db.test2.getIndexes()
删除索引,在这里即可以写上索引的名字(通过getIndexes获取)也可以写键的名字:
db.test2.dropIndex("myindex1")
删除集合中全部索引:
> db.test2.dropIndexes()
重建索引:
> db.test2.reIndex({"name":"name_1"})
查看索引的大小:
> db.test2.totalIndexSize()
32768
> db.test2.totalIndexSize({"name":"name_1"})
_id_ 16384
name_1 16384
32768
除了使用db函数以外还可以使用数据库命令,例如删除索引:
> db.runCommand({"dropIndexes":"test2",index:"myindex1"})
第一个键是删除动作,值为集合名,第二个是索引的名字.
例如createindex的命令如下:
db.runCommand(
{
createIndexes: <collection>,
indexes: [
{
key: {
<key-value_pair>,
<key-value_pair>,
...
},
name: <index_name>,
<option1>,
<option2>,
...
},
{ ... },
{ ... }
]
}
)
一个例子如下:
db.getSiblingDB("products").runCommand(
{
createIndexes: "inventory",
indexes: [
{
key: {
item: 1,
manufacturer: 1,
model: 1
},
name: "item_manufacturer_model",
unique: true
},
{
key: {
item: 1,
supplier: 1,
model: 1
},
name: "item_supplier_model",
unique: true
}
]
}
)
其它详细的操作可以参考官方文档reference中的索引部分:
我发现我看的这本书是基于老版本的mongodb2.4的,新的3.2有很多新特性和改进可能和老版的不一样,平时还是要多看官方文档.























mongodb基础-索引

标签:

原文地址:http://blog.csdn.net/su377486/article/details/51477375

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!