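I. Preface
The previous installment showed how to write a large XML file with XmlWriter. XmlWriter is a little more cumbersome to use than LINQ to XML, but its advantage is that it consumes almost no memory. However, XmlWriter can only write XML. Reading XML requires XmlReader, which is the subject of this post.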
II. Preparation
First, prepare a large XML file:
private static void CreateLargeXmlFile(string fileName)
{
    using (var writer = XmlWriter.Create(fileName, new XmlWriterSettings { Indent = true }))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("root");
        for (int i = 0; i < 100; i++)
        {
            writer.WriteStartElement("folder");
            writer.WriteAttributeString("name", i.ToString());
            for (int j = 0; j < 100; j++)
            {
                writer.WriteStartElement("folder");
                writer.WriteAttributeString("name", j.ToString());
                for (int k = 0; k < 1000; k++)
                {
                    writer.WriteStartElement("file");
                    writer.WriteAttributeString("name", k.ToString());
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        writer.WriteFullEndElement();
        writer.WriteEndDocument();
    }
}
Running this method produces a large XML file of about 250 MB.
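Where the 250 MB comes from can be checked with a rough estimate: the file is dominated by its 100 × 100 × 1000 = 10,000,000 `<file>` lines. The sketch below assumes two-space indentation (the XmlWriterSettings default), a two-byte "\r\n" line break, and that each empty element is written as `<file name="k" />`; the class and method names are hypothetical.

```csharp
using System;

public static class XmlSizeEstimate
{
    // Rough size of the <file> lines written by CreateLargeXmlFile.
    // Assumptions: 6 spaces of indentation at depth 3, empty elements written
    // as <file name="k" />, and a line break of `newLineBytes` bytes.
    public static long EstimateFileBytes(int outer, int inner, int files, int newLineBytes = 2)
    {
        long perFolder = 0;
        for (int k = 0; k < files; k++)
        {
            int digits = k.ToString().Length;
            // indentation (6) + "<file name=\"" (12) + digits + "\" />" (4) + line break
            perFolder += 6 + 12 + digits + 4 + newLineBytes;
        }
        // Folder, root and declaration lines are ignored; together they add well under 1 MB.
        return perFolder * outer * inner;
    }
}
```

Under these assumptions the `<file>` lines alone come to 268,900,000 bytes, about 256 MiB, which is consistent with the roughly 250 MB observed.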
III. Reading XML with XmlReader
1. Basics
using (var reader = XmlReader.Create("test.xml"))
{
    // Move forward, in document order, to each <file> element
    while (reader.ReadToFollowing("file"))
    {
        var name = reader.GetAttribute("name");
        if (name.EndsWith("0"))
        {
            Console.WriteLine(name);
        }
    }
}
This code prints the name attribute of every file element in the XML whose name ends with "0".
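The same ReadToFollowing / GetAttribute loop can be tried on an in-memory string instead of the 250 MB file. This sketch (the class and method names are made up for illustration) collects the matching names into a list, and it also skips elements that have no name attribute, because GetAttribute returns null in that case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class BasicReaderDemo
{
    // Streams through `xml` and returns the "name" attribute of every element
    // called `elementName` whose name ends with `suffix`.
    public static List<string> NamesEndingWith(string xml, string elementName, string suffix)
    {
        var result = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // ReadToFollowing advances to the next element with this name, anywhere later in the document.
            while (reader.ReadToFollowing(elementName))
            {
                // GetAttribute returns null if the attribute is missing.
                var name = reader.GetAttribute("name");
                if (name != null && name.EndsWith(suffix, StringComparison.Ordinal))
                {
                    result.Add(name);
                }
            }
        }
        return result;
    }
}
```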
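2. An improved version
XmlReader has quite a few other methods, but using it this way is simply too tiring. For most operations the LINQ to XML API is powerful enough, so why not copy its shape and build a simple "LINQ to XmlReader"?
a. Extension method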
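using System.Collections.Generic;
using System.Xml;

public static class ElementsExtension
{
    // Yields a reader over each child element named `name` of the current element
    // (or each top-level node when the reader has not started yet).
    // Every yielded reader shares the underlying stream with `reader`.
    public static IEnumerable<XmlReader> Elements(this XmlReader reader, string name)
    {
        reader.Read();   // step onto the first child node
        // ReadToNextSibling skips the current node, so check it first in case it already matches
        bool found = (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                     || reader.ReadToNextSibling(name);
        while (found)
        {
            var result = reader.ReadSubtree();   // reader limited to this element
            result.Read();                       // move it onto the element itself
            yield return result;
            result.Close();                      // leaves `reader` at the end of this element
            found = reader.ReadToNextSibling(name);
        }
    }
}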
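The extension leans on two documented behaviours of ReadSubtree: the returned reader starts before its first node, so it has to be advanced once with Read(), and closing it leaves the original reader on the end tag of that element, where ReadToNextSibling can carry on. A small check against an in-memory string (the class and method names are hypothetical):

```csharp
using System.IO;
using System.Xml;

public static class SubtreeDemo
{
    // Repeats, by hand, what Elements does for one match, then reports where the
    // parent reader is after the sub-reader is closed and which sibling comes next.
    public static (XmlNodeType afterClose, string closedName, string nextName) Run(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();              // on <root>
            reader.ReadToDescendant("folder");   // on the first <folder>
            var sub = reader.ReadSubtree();      // starts in ReadState.Initial
            sub.Read();                          // now on that <folder>
            while (sub.Read()) { }               // consume the whole subtree
            sub.Close();                         // parent is left on </folder>
            var afterClose = reader.NodeType;
            var closedName = reader.Name;
            reader.ReadToNextSibling("folder");  // continue with the next sibling
            return (afterClose, closedName, reader.GetAttribute("name"));
        }
    }
}
```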
Using the extension method:
private static void ReadByReader()
{
    using (var reader = XmlReader.Create("test.xml"))
    {
        foreach (var item in
            from x in reader.Elements("root").First().Elements("folder")
            let parentName = x.GetAttribute("name")   // copy the value before the inner query moves the stream
            where int.Parse(parentName) % 20 == 0
            from y in x.Elements("folder")
            let folderName = y.GetAttribute("name")
            where int.Parse(folderName) % 30 == 0
            select string.Format("folder:{0}\tfolder:{1}", parentName, folderName))
        {
            Console.WriteLine(item);
        }
    }
}
Output:
folder:0 folder:0
folder:0 folder:30
folder:0 folder:60
folder:0 folder:90
folder:20 folder:0
folder:20 folder:30
folder:20 folder:60
folder:20 folder:90
folder:40 folder:0
folder:40 folder:30
folder:40 folder:60
folder:40 folder:90
folder:60 folder:0
folder:60 folder:30
folder:60 folder:60
folder:60 folder:90
folder:80 folder:0
folder:80 folder:30
folder:80 folder:60
folder:80 folder:90
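This is the expected result. Note also that while this method runs, the memory usage of the whole application stays almost at its initial level (7 MB), regardless of the huge size of the file being read.
b. Caveats
Note, however, that this extension has some problems. Because XmlReader works as a forward-only stream, some ways of writing the query produce unwanted results, for example: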
private static void ReadByReader_Incorrect()
{
    using (var reader = XmlReader.Create("test.xml"))
    {
        foreach (var item in
            from x in reader.Elements("root").First().Elements("folder")
            where int.Parse(x.GetAttribute("name")) % 20 == 0
            from y in x.Elements("folder")
            where int.Parse(y.GetAttribute("name")) % 30 == 0
            select string.Format("folder:{0}\tfolder:{1}", x.GetAttribute("name"), y.GetAttribute("name")))
        {
            Console.WriteLine(item);
        }
    }
}
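Compare it with the correct version: the only difference is that the let clauses are gone. On the surface the semantics are identical, but in fact the two XmlReaders x and y share one underlying stream, so when y advances the stream, x's position moves with it. The where clause on x still runs before the inner query starts, so the filtering is correct; but by the time select calls x.GetAttribute("name"), x is positioned on y's element. The output is: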
folder:0 folder:0
folder:30 folder:30
folder:60 folder:60
folder:90 folder:90
folder:0 folder:0
folder:30 folder:30
folder:60 folder:60
folder:90 folder:90
folder:0 folder:0
folder:30 folder:30
folder:60 folder:60
folder:90 folder:90
folder:0 folder:0
folder:30 folder:30
folder:60 folder:60
folder:90 folder:90
folder:0 folder:0
folder:30 folder:30
folder:60 folder:60
folder:90 folder:90
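As you can see, x and y return exactly the same values, which is clearly a flaw in the extension method itself. Still, as long as you keep these pitfalls in mind, it can simplify programming considerably.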
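The shared-stream effect can be reproduced on a tiny in-memory document. The sketch below is hypothetical: it carries its own copy of an Elements extension written the same way as the one above (with an extra check so that a first child that already matches is not skipped) and runs the same kind of nested query once with let and once without.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;

public static class SharedStreamDemo
{
    // Same approach as the Elements extension: one subtree reader per matching child.
    public static IEnumerable<XmlReader> Elements(this XmlReader reader, string name)
    {
        reader.Read();
        bool found = (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                     || reader.ReadToNextSibling(name);
        while (found)
        {
            var result = reader.ReadSubtree();
            result.Read();
            yield return result;
            result.Close();
            found = reader.ReadToNextSibling(name);
        }
    }

    // let copies the outer name into a string before the inner query moves the stream.
    public static List<string> WithLet(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return (from x in reader.Elements("root").First().Elements("folder")
                    let parentName = x.GetAttribute("name")
                    from y in x.Elements("folder")
                    select parentName + "/" + y.GetAttribute("name")).ToList();
        }
    }

    // Without let, x.GetAttribute runs after y has moved the shared stream.
    public static List<string> WithoutLet(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return (from x in reader.Elements("root").First().Elements("folder")
                    from y in x.Elements("folder")
                    select x.GetAttribute("name") + "/" + y.GetAttribute("name")).ToList();
        }
    }
}
```

On `<root><folder name="p1"><folder name="c1"/><folder name="c2"/></folder><folder name="p2"><folder name="c3"/></folder></root>`, WithLet returns p1/c1, p1/c2, p2/c3, while WithoutLet returns c1/c1, c2/c2, c3/c3, the same pattern as the incorrect output above.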
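IV. Performance comparison
Contenders: XmlReader with the two extensions from this article vs. LINQ to XML
1. Understanding the comparison:
2. Tests
Timing tool: CodeTimer
Test data: the roughly 250 MB XML file generated earlier, and a small XML file (29 KB) generated by the following code: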
static void CreateSmallXmlFile(string fileName)
{
    using (var writer = XmlWriter.Create(fileName, new XmlWriterSettings { Indent = true }))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("root");
        for (int i = 0; i < 10; i++)
        {
            writer.WriteStartElement("folder");
            writer.WriteAttributeString("name", i.ToString());
            for (int j = 0; j < 10; j++)
            {
                writer.WriteStartElement("folder");
                writer.WriteAttributeString("name", j.ToString());
                for (int k = 0; k < 10; k++)
                {
                    writer.WriteStartElement("file");
                    writer.WriteAttributeString("name", k.ToString());
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
    }
}
Test code:
static void ReadByReader(string fileName)
{
    using (var reader = XmlReader.Create(fileName))
    {
        // Enumerate the query for timing only; the results are discarded
        foreach (var item in
            from x in reader.Elements("root").First().Elements("folder")
            where x.GetAttribute("name") == "66"
            from y in x.Descendants("file")
            where y.GetAttribute("name") == "666"
            select y.GetAttribute("name"))
        { }
    }
}

static void ReadByLinq_NoCache(string fileName)
{
    // Loads the whole document on every call
    XDocument doc = XDocument.Load(fileName);
    foreach (var item in
        from x in doc.Root.Elements("folder")
        where (string)x.Attribute("name") == "66"
        from y in x.Descendants("file")
        where (string)y.Attribute("name") == "666"
        select (string)x.Attribute("name"))
    { }
}

static XDocument m_doc;

static void ReadByLinq_Cache(string fileName)
{
    // Loads the document once and reuses it on later calls
    if (m_doc == null)
        m_doc = XDocument.Load(fileName);
    foreach (var item in
        from x in m_doc.Root.Elements("folder")
        where (string)x.Attribute("name") == "66"
        from y in x.Descendants("file")
        where (string)y.Attribute("name") == "666"
        select (string)x.Attribute("name"))
    { }
}

// Main method
Console.Write("Ready:");
string fileName = "LargeXmlFile.xml";
int turnCount = 10;
Console.ReadLine();
CodeTimer.Time("Read by Xml Reader", turnCount, () => ReadByReader(fileName));
//CodeTimer.Time("Read by Linq to Xml (cache)", turnCount, () => ReadByLinq_Cache(fileName));
//CodeTimer.Time("Read by Linq to Xml (no cache)", turnCount, () => ReadByLinq_NoCache(fileName));
Console.ReadLine();
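ReadByReader calls x.Descendants("file") on an XmlReader, but only the Elements extension appears in this article. One possible shape for such an extension is sketched below; it is a guess, not the author's code. It reads forward until it reaches the end tag of the element the reader started on and yields a subtree reader for each matching element. Unlike LINQ to XML's Descendants, a match nested inside another match is not returned, because closing each subtree reader skips the rest of that match.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class DescendantsExtension
{
    // Hypothetical sketch: yields a subtree reader for each element named `name`
    // below the element `reader` is positioned on (e.g. one yielded by Elements).
    public static IEnumerable<XmlReader> Descendants(this XmlReader reader, string name)
    {
        // Not on an element, or an empty element like <folder/>: nothing below it, don't move.
        if (reader.NodeType != XmlNodeType.Element || reader.IsEmptyElement)
            yield break;

        int startDepth = reader.Depth;
        // Stops on the start element's end tag, which is at the same depth as the start element.
        while (reader.Read() && reader.Depth > startDepth)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
            {
                var sub = reader.ReadSubtree();
                sub.Read();        // position the sub-reader on the matching element
                yield return sub;
                sub.Close();       // skips the rest of the match, so nested matches are not visited
            }
        }
    }
}
```

Because the loop stops on the start element's end tag, the caller can keep moving through siblings afterwards, just as with Elements.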
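Each run executes only one of the tests (the other calls are commented out) to avoid unnecessary interference. The large XML file is run only 10 times, since otherwise it would take far too long; the small XML file is run 1000 times. The results are as follows:
a. Large XML file results: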
Ready:
Read by Xml Reader
Time Elapsed: 81,106ms
Time Elapsed (one time):8,110ms
CPU time: 81,062,500,000ns
CPU time (one time): 8,106,250,000ns
Gen 0: 310
Gen 1: 5
Gen 2: 0
Memory usage stayed at 8 MB.
Ready:
Read by Linq to Xml (cache)
Time Elapsed: 153,335ms
Time Elapsed (one time):15,333ms
CPU time: 77,687,500,000ns
CPU time (one time): 7,768,750,000ns
Gen 0: 209
Gen 1: 171
Gen 2: 27
Memory usage exceeded 800 MB. Because Windows had to fall back on virtual memory, the run was still slow even though the XML content was cached.
Ready:
Read by Linq to Xml (no cache)
Time Elapsed: 1,100,787ms
Time Elapsed (one time):110,078ms
CPU time: 532,859,375,000ns
CPU time (one time): 53,285,937,500ns
Gen 0: 1645
Gen 1: 1131
Gen 2: 124
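Memory climbed to more than 800 MB on each of the 10 runs; only after the laptop had been thoroughly tortured did the results finally come out...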
For large XML files, XmlReader wins outright, with a tiny memory footprint and very little garbage.
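CodeTimer is a small helper that is not part of the .NET class library, and its source is not shown here. To reproduce the elapsed-time and Gen 0/1/2 rows without it, a minimal stand-in can be built on Stopwatch and GC.CollectionCount. The name SimpleTimer is hypothetical, and CPU time is left out of this sketch.

```csharp
using System;
using System.Diagnostics;

public static class SimpleTimer
{
    // Hypothetical stand-in for CodeTimer (not its real code): runs `action`
    // `iterations` times and reports wall-clock time plus the number of
    // garbage collections per generation that happened meanwhile.
    public static (TimeSpan elapsed, int gen0, int gen1, int gen2) Time(int iterations, Action action)
    {
        // Start from a collected heap so earlier garbage does not trigger collections during timing.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        watch.Stop();

        return (watch.Elapsed,
                GC.CollectionCount(0) - gen0,
                GC.CollectionCount(1) - gen1,
                GC.CollectionCount(2) - gen2);
    }
}
```

For example, `SimpleTimer.Time(10, () => ReadByReader(fileName))` would time the large-file reader test in the same way as the CodeTimer call above.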
b. Small XML file: test code
Console.Write("Ready:");
string fileName = "SmallXmlFile.xml";
int turnCount = 1000;
Console.ReadLine();
CodeTimer.Time("Read by Xml Reader", turnCount, () => ReadByReader(fileName));
//CodeTimer.Time("Read by Linq to Xml (cache)", turnCount, () => ReadByLinq_Cache(fileName));
//CodeTimer.Time("Read by Linq to Xml (no cache)", turnCount, () => ReadByLinq_NoCache(fileName));
Console.ReadLine();
The results:
Ready:
Read by Xml Reader
Time Elapsed: 1,835ms
Time Elapsed (one time):1ms
CPU time: 1,156,250,000ns
CPU time (one time): 1,156,250ns
Gen 0: 19
Gen 1: 0
Gen 2: 0
Memory barely increased, and there was no Gen 1 or Gen 2 garbage.
Ready:
Read by Linq to Xml (cache)
Time Elapsed: 258ms
Time Elapsed (one time):0ms
CPU time: 31,250,000ns
CPU time (one time): 31,250ns
Gen 0: 0
Gen 1: 0
Gen 2: 0
Very fast, with no garbage produced (because the whole XDocument is still referenced by the cache).
Ready:
Read by Linq to Xml (no cache)
Time Elapsed: 1,589ms
Time Elapsed (one time):1ms
CPU time: 1,546,875,000ns
CPU time (one time): 1,546,875ns
Gen 0: 74
Gen 1: 11
Gen 2: 0
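The XML is small, so there is no memory pressure, but there is Gen 1 garbage.
The small-file results show that the main cost of LINQ to XML lies in loading the XML itself, while the query is very cheap: with the document cached, the whole run takes only about 1/6 of the no-cache time (258 ms vs 1,589 ms). The XmlReader approach, by contrast, has nothing that can be cached.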
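With no cache on either side, XmlReader takes about 15% longer in wall-clock time (1,835 ms vs 1,589 ms), but it puts far less pressure on the GC (19 Gen 0 collections and no Gen 1, against 74 Gen 0 and 11 Gen 1 collections) and uses about 25% less CPU time (1,156 ms vs 1,547 ms).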
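The percentages for the small file can be recomputed from the reported figures. The class below (its name is hypothetical) simply hard-codes those numbers, with CPU times converted from ns to ms, and derives the ratios.

```csharp
using System;

public static class SmallFileRatios
{
    // Figures copied from the small-file results above (1000 iterations each).
    const double ReaderElapsedMs = 1835, LinqNoCacheElapsedMs = 1589, LinqCacheElapsedMs = 258;
    const double ReaderCpuMs = 1156.25, LinqNoCacheCpuMs = 1546.875;
    const int ReaderGen0 = 19, LinqNoCacheGen0 = 74;

    // Extra wall-clock time of XmlReader relative to uncached LINQ to XML (about 0.155).
    public static double ReaderExtraTime => ReaderElapsedMs / LinqNoCacheElapsedMs - 1;
    // CPU time saved by XmlReader relative to uncached LINQ to XML (about 0.253).
    public static double ReaderCpuSaving => 1 - ReaderCpuMs / LinqNoCacheCpuMs;
    // XmlReader's Gen 0 count as a fraction of uncached LINQ to XML's (about 0.257).
    public static double Gen0Ratio => (double)ReaderGen0 / LinqNoCacheGen0;
    // How much faster the cached LINQ to XML run is than the uncached one (about 6.16).
    public static double CacheSpeedup => LinqNoCacheElapsedMs / LinqCacheElapsedMs;
}
```

On these figures XmlReader is about 15% slower in wall-clock time, its CPU time is about 25% lower, its Gen 0 count is roughly a quarter of LINQ to XML's, and the cached LINQ to XML run is about 6 times faster than the uncached one.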
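3. Conclusion
If all you need is to read XML, XmlReader has a considerable advantage. When the XML is very large, XmlReader is the only approach that guarantees memory will not become the limiting factor, and even when the file is not that large, XmlReader does not fall far behind the other approaches. Its biggest drawback is that the API is not very easy to use unless you add your own extension methods.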
Original article: http://www.cnblogs.com/sunshineground/p/4581139.html