码迷,mamicode.com
首页 > 数据库 > 详细

mongodb文档 aggregate章节阅读的笔记

时间:2015-03-15 20:00:18      阅读:170      评论:0      收藏:0      [点我收藏+]

标签:

aggregate 包含3中不同的类型:

1.管道

2.单一功能聚合   (count,group,distinct)

3.map-reduce


管道表达式


管道表达式仅能操作当前在管道中的文档,不能代表其他的文档.

通常,表达式是没有状态的,并且在聚合过程中被计算,但是有一个例外的:累加器表达式


累加器,只能用在group  管道操作器中,主要的状态有:总共,最大,最小,和关联数据.



优化的方式:

$match he $sort管道操作器可以利用索引,当他们出现在管道的第一个位置.避免扫描集合内所有

文档.

从2.4开始引入的$geoNear管道操作器可以利用geospatial index.当使用,必须放在管道的开始位置.


尽管管道使用了索引.聚合仍然需要访问实际的文档.索引无法完全覆盖聚合管道.


初期过滤(Early Filtering)

如果聚合管道操作仅仅需要整个集合的部分数据,那么使用$match,$limit, 和 $skip 步骤来限制进入管道

的文档数量.在管道的入口端,$match操作会使用合适的索引来扫描,仅仅使复合条件的文档进入该管道.


在管道的开始处使用$match,紧接着使用$sort逻辑上等价于简单的利用sort的查询,使用索引.如果可能,

尽量将$match操作器放在管道的开始端.


额外的功能:

聚合索引有一个内部的优化阶段以提高聚合性能.

聚合管道支持在分片的集合上进行.



mongodb 的mapreduce操作

map方法处理每一个输入文档,map方法最后生成key-value 对.



输出的限制:必须在 bson document 的尺寸范围内,当前为16M.


mapreduce  输入输出的集合均支持在分片中的集合.




内置的优化机制:

1.project优化:聚合管道可以确定管道中需要多少个字段,因此,

管道只使用需要用到的字段


管道顺序优化:

 如果写成了:

$sort=>$match   会优化成:

$match=>$sort



{ $sort: { age : -1 } },

{ $match: { status: ‘A‘ } }


{ $match: { status: ‘A‘ } },

{ $sort: { age : -1 } }



{ $skip: 10 },

{ $limit: 5 }


{ $limit: 15 },

{ $skip: 10 }

 减少$skip的数量



{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },

{ $match: { year: 2014, category: { $ne: "Z" } } }


{ $match: { year: 2014 } },

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },

{ $match: { year: 2014, category: { $ne: "Z" } } }


合并优化

$sort + $limit 合并优化

当sort  后面紧接着 是limit.优化器会将 limit合并到 sort中,这样   就只需要存取前 n个值.

节省了内存.

The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit (page 403).



$limit + $limit合并优化

当连续2个$limit在一起时,会合并成1个,

并且选用  $limit较小的那个数字


连续两个 $skip

连续两个的$skip会合并成一个, 后面的数字为两个$skip的和.


连续两个的$match. 会合并成1个,条件会合并在一起.



{ $sort: { age : -1 } },

{ $skip: 10 },

{ $limit: 5 }



{ $sort: { age : -1 } },

{ $limit: 15 }

{ $skip: 10 }



example:

{ $sort: { age : -1 } },

{ $skip: 10 },

{ $limit: 5 }  

====>>>>

{ $sort: { age : -1 } },

{ $limit: 15 }

{ $skip: 10 }

====>>>>

$sort+$limit合并




{$limit: 100 },

{$skip: 5 },

{$limit: 10 },

{$skip: 2 }


==========>>>>>>>

{$limit: 100 },

{$limit: 15},

{$skip: 5 },

{$skip: 2 }

=========>>>>>>>>>

{ $limit: 15 },

{ $skip: 7 }



Result Size Restrictions

manage result sets that exceed this limit, the aggregate command can return result sets of any size if the command

return a cursor or store the results to a collection.

Changed in version 2.6: The aggregate command can return results as a cursor or store the results in a collection,

which are not subject to the size limit. The db.collection.aggregate() returns a cursor and can return result

sets of any size.



Memory Restrictions

Changed in version 2.6.

Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error.

To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to

write data to temporary files.



 当聚合操作执行在分片的集合中时,聚合管道被拆分成两部分.第一个管道执行在每个分片.或者当初期$match可以

通过分片key的断言排除一些分片集.管道只运行在相关联的分片上.

第二个管道运行在主分片上.合并第一个阶段执行的结果.并且在合并的结果的基础上在执行.主分片将最后的结果转发到

mongos.在2.6 以前.第二个管道运行在mongos上.]



map reduce  执行输入/输出在分片的collection.

如果是输入的集合为分片的集合,mongos 会自动平行地调度 map-reduce到每个分片中.无需额外处理.

如果输出的集合为分片的集合,

If the out field for mapReduce has the sharded value, MongoDB shards the output collection using the _id field

as the shard key.

? If the output collection does not exist, MongoDB creates and shards the collection on the _id field.

? For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce

operation to create the initial chunks distributed among the shards.

? mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During

the post-processing, each shard will pull the results for its own chunks from the other shards, run the final

reduce/finalize, and write locally to the output collection.



Map Reduce Concurrency





Return States with Populations above 10 Million

SELECT state, SUM(pop) AS totalPop

FROM zipcodes

GROUP BY state

HAVING totalPop >= (10*1000*1000)



db.zipcodes.aggregate( { $group :

{ _id : "$state",

totalPop : { $sum : "$pop" } } },

{ $match : {totalPop : { $gte : 10*1000*1000 } } } )



Return Average City Population by State

根据stat和city 取平均值.

db.zipcodes.aggregate( [

{ $group : { _id : { state : "$state", city : "$city" }, pop : { $sum : "$pop" } } },

{ $group : { _id : "$_id.state", avgCityPop : { $avg : "$pop" } } }

] )


// 对应的sql语句

select state, avg(sum_pop) from (select state, city, sum(pop) as sum_pop from zipcodes group by city, state) as temp 

group by temp.state


Return Largest and Smallest Cities by State


 返回最大.最小值


db.zipcodes.aggregate( { $group:

{ _id: { state: "$state", city: "$city" },

pop: { $sum: "$pop" } } },

{ $sort: { pop: 1 } },

{ $group:

{ _id : "$_id.state",

biggestCity: { $last: "$_id.city" },

biggestPop:

{ $last: "$pop" },

smallestCity: { $first: "$_id.city" },

smallestPop: { $first: "$pop" } } },

// the following $project is optional, and

// modifies the output format.

{ $project:

{ _id: 0,

state: "$_id",

biggestCity: { name: "$biggestCity", pop: "$biggestPop" },

smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } } )




Return the Five Most Common “Likes”



db.users.aggregate(

[

{ $unwind : "$likes" },   //   这里得注释一下:? The $unwind operator separates each value in the likes array, and creates a new version of the source document for every element in the array.   拆分数组用的...

{ $group : { _id : "$likes" , number : { $sum : 1 } } },

{ $sort : { number : -1 } },

{ $limit : 5 }

]

)





map-reduce   的例子:


数据结构:


{

_id: ObjectId("50a8240b927d5d8b5891743c"),

cust_id: "abc123",

ord_date: new Date("Oct 04, 2012"),

status: ‘A‘,

price: 25,

items: [ { sku: "mmm", qty: 5, price: 2.5 },

{ sku: "nnn", qty: 5, price: 2.5 } ]

}



Return the Total Price Per Customer


map   方法:

var mapFunction1 = function() {

emit(this.cust_id, this.price);

};


reduce 方法:

var reduceFunction1 = function(keyCustId, valuesPrices) {

return Array.sum(valuesPrices);

};


结合使用:

db.orders.mapReduce(

mapFunction1,

reduceFunction1,

{ out: "map_reduce_example" }     //指定输出到   "map_reduce_example"  这个集合中

)


========

待续...

文档下载地址:

http://pan.baidu.com/s/1jiFOM



mongodb文档 aggregate章节阅读的笔记

标签:

原文地址:http://my.oschina.net/u/1466553/blog/387272

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!