Original article: https://devblogs.nvidia.com/parallelforall/intersection-large-scale-graph-analytics-deep-learning/
Summary:
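1) Graphs are central to data analysis in social networks, and they keep getting larger. Memory capacity is growing too, but in-memory graph processing still hits its limits. The article therefore uses a partitioning technique based on Parallel Sliding Windows (PSW) to reduce how much memory graph processing needs.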
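2) Parallelizing graph processing. Edge-centric processing is easy to load-balance, but it shares little data between work items, so it is inefficient. Vertex-centric processing reuses data easily, but the degree distribution of many real-world graphs follows a power law: a few vertices have far more edges than the rest, and the threads that own them become stragglers. The proposed fix is to classify vertices by degree. A high-degree vertex is spread across multiple threads in parallel, while a low-degree vertex is handled by a single thread. This balances the load without giving up data reuse.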
3) The relationship between graphs and deep learning. This part is quite interesting, so it is quoted directly from the original: “Another challenge is its reliance on subject matter expertise. As with traditional machine learning, graph analysis requires a data scientist to identify important features of the data and build a solution from several algorithms to solve a larger problem. In the past few years, deep learning has emerged as a data analysis methodology that not only has had great success in numerous fields, but is also able to identify important features in data. In fact, in many cases deep learning solutions have out-performed systems built by subject matter experts with hand-crafted features. We would like to be able to take advantage of this quality by applying deep learning to graph analysis, but this is not straightforward. Deep learning requires regularized input, namely a vector of values, and real world graph data is anything but regular.”
4) How to convert graph data into a form that deep learning can process
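The idea is to launch random walks of a fixed number of hops from each vertex and treat each walk's sequence of visited vertices the way natural language processing treats a sentence as a sequence of words. A word-embedding model (DeepWalk uses skip-gram) is then trained on these vertex "sentences", and the resulting vectors serve as the feature data for each vertex.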
Original text: “Our goal then is to generate a representation of each node that encodes the information about its neighborhood in a relatively small and fixed sized vector. There are a number of different ways to accomplish this including extracting locally connected regions of the graph for analysis with convolutional neural networks and using the graph structure itself to build a recurrent neural network. However, we chose to implement an algorithm in FUNL that takes inspiration from natural language processing. The algorithm, called DeepWalk, was developed by Perozzi et al. (2014) at Stonybrook University.”
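I am not sure whether this is a fundamental problem in social-network analysis, but it is certainly an interesting problem for computational acceleration.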
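To make the PSW partitioning in point 1) concrete, here is a minimal in-memory Python sketch in the style of GraphChi, the system that introduced PSW. It assumes edges are `(src, dst)` pairs of integer vertex IDs split into equal contiguous intervals. `build_shards` and `sliding_window` are hypothetical names, and a real PSW implementation streams shards from disk rather than holding them in Python lists. Because each shard stores the in-edges of one interval sorted by source, the out-edges of any interval occupy a single contiguous slice of every other shard.

```python
from bisect import bisect_left


def build_shards(edges, num_vertices, num_intervals):
    """Split vertex IDs into contiguous intervals. Shard p holds every edge
    whose destination falls in interval p, sorted by source vertex."""
    size = -(-num_vertices // num_intervals)  # ceiling division
    intervals = [
        (min(p * size, num_vertices), min((p + 1) * size, num_vertices))
        for p in range(num_intervals)
    ]
    shards = [[] for _ in range(num_intervals)]
    for src, dst in edges:
        shards[dst // size].append((src, dst))
    for shard in shards:
        shard.sort()  # sort by (src, dst) so each source's edges are contiguous
    return intervals, shards


def sliding_window(intervals, shards, p):
    """Return the edges needed to process interval p: all of shard p
    (its in-edges) plus its out-edges, which form one contiguous window
    in each of the other shards."""
    lo, hi = intervals[p]
    in_edges = list(shards[p])
    out_edges = []
    for q, shard in enumerate(shards):
        if q == p:
            continue  # out-edges that stay inside interval p are already in shard p
        # Vertex IDs are non-negative, so (x, -1) sorts before every edge from source x.
        start = bisect_left(shard, (lo, -1))
        end = bisect_left(shard, (hi, -1))
        out_edges.extend(shard[start:end])
    return in_edges, out_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 1)]
intervals, shards = build_shards(edges, num_vertices=4, num_intervals=2)
for p in range(len(intervals)):
    in_e, out_e = sliding_window(intervals, shards, p)
    print(intervals[p], "in:", in_e, "out:", out_e)
```

Only shard p is needed in full; every other shard contributes one slice, which is what bounds memory use when the shards live on disk.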
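The degree-based scheme in point 2) can be simulated on the CPU as below. This is a hedged sketch, not the article's GPU kernel: the single `threshold`, and the names `classify_by_degree` and `build_work_items`, are assumptions, and each work item stands for one thread. A high-degree vertex's edge list is cut into contiguous chunks, so each thread still works on one vertex's adjacency while no thread receives more than `threshold` edges.

```python
def classify_by_degree(degrees, threshold):
    """Split vertex IDs into low-degree and high-degree groups."""
    low = [v for v, d in enumerate(degrees) if d <= threshold]
    high = [v for v, d in enumerate(degrees) if d > threshold]
    return low, high


def build_work_items(degrees, threshold):
    """Each work item is (vertex, first_edge, end_edge_exclusive) and is
    meant for one thread. A low-degree vertex gets a single item; a
    high-degree vertex's edge list is cut into chunks of at most
    `threshold` edges so that several threads share it."""
    low, high = classify_by_degree(degrees, threshold)
    items = [(v, 0, degrees[v]) for v in low]
    for v in high:
        for start in range(0, degrees[v], threshold):
            items.append((v, start, min(start + threshold, degrees[v])))
    return items


degrees = [1, 2, 100, 3]  # one hub vertex, as in a power-law graph
items = build_work_items(degrees, threshold=8)
print("one thread per vertex, max edges per thread:", max(degrees))
print("degree-binned, max edges per thread:", max(e - s for _, s, e in items))
```

With one thread per vertex, the thread owning vertex 2 processes 100 edges while the others process at most 3; after binning, every work item covers at most 8 edges.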
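The DeepWalk pipeline described in point 4) can be sketched in plain Python: truncated random walks become "sentences", skip-gram turns each walk into (center, context) pairs, and training yields one fixed-size vector per vertex. This is an illustrative sketch, not FUNL's GPU implementation. Note one swap: it trains with skip-gram plus negative sampling using uniformly drawn negatives for brevity, whereas the DeepWalk paper uses hierarchical softmax. All function names and hyperparameters here are hypothetical.

```python
import math
import random


def random_walks(adj, walks_per_vertex, walk_length, rng):
    """Truncated random walks in the style of DeepWalk: each pass shuffles
    the start vertices and walks from each one, stepping to a uniformly
    random neighbor until `walk_length` vertices are visited."""
    walks = []
    for _ in range(walks_per_vertex):
        starts = list(range(len(adj)))
        rng.shuffle(starts)
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window):
    """(center, context) pairs for vertices within `window` steps of each
    other in the same walk, like nearby words in a sentence."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_embeddings(pairs, num_vertices, dim, rng, negatives=2, lr=0.025, epochs=5):
    """Skip-gram with negative sampling (uniform negatives, a simplification).
    Returns one `dim`-length vector per vertex."""
    emb = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(num_vertices)]
    ctx = [[0.0] * dim for _ in range(num_vertices)]
    for _ in range(epochs):
        for center, context in pairs:
            # One positive target plus `negatives` randomly sampled vertices.
            targets = [(context, 1.0)]
            targets += [(rng.randrange(num_vertices), 0.0) for _ in range(negatives)]
            grad = [0.0] * dim
            for t, label in targets:
                score = sum(a * b for a, b in zip(emb[center], ctx[t]))
                g = lr * (label - sigmoid(score))
                for k in range(dim):
                    grad[k] += g * ctx[t][k]
                    ctx[t][k] += g * emb[center][k]
            for k in range(dim):
                emb[center][k] += grad[k]
    return emb


# Small undirected graph: triangle 0-1-2 plus a tail 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
rng = random.Random(0)
walks = random_walks(adj, walks_per_vertex=10, walk_length=6, rng=rng)
pairs = skipgram_pairs(walks, window=2)
vectors = train_embeddings(pairs, num_vertices=len(adj), dim=8, rng=rng)
print(len(vectors), "vertices ->", len(vectors[0]), "dimensional vectors")
```

However irregular the neighborhood, each vertex ends up with a vector of the same length `dim`, which is the regularized input the quoted passage says deep learning needs.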
Source: http://www.cnblogs.com/daxuelangren/p/5997726.html