Individual Project Records

Posted: 2014-09-22 18:36:53

At midnight on September 20, I finished my individual project, a word frequency program. You can find the detailed requirements at http://www.cnblogs.com/jiel/p/3978727.html

Before I started coding, I expected to finish in about 4 hours or less, because it did not seem difficult: maybe 1 hour for the I/O part and 3 hours for the functional part. In fact, the program took me about 6 hours, not including optimization or other work.

Since the program is not complex, I didn't use object-oriented techniques. I divided it into five modules: format checking, parameter configuration, word counting, word sorting, and file writing. Each module corresponds to one function: checkFormat, config, countWords, sortWords, and writeFile.

The first two modules and the last one are relatively simple. I only want to talk about the last module, file writing, because I am not familiar with file I/O in C#. This module is not hard, though: use a static method of the File class to get a FileStream, use that stream to initialize a StreamWriter, and then you can write to the file easily.
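
A minimal sketch of that file-writing step might look like the following. The class name WriteFileSketch, the WriteFile signature, and the output.txt path are assumptions made for illustration, not the project's actual code. File.Open is used here as one of the static File methods that return a FileStream; the post does not say which one was used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WriteFileSketch
{
    // Hypothetical helper: writes one "word:count" line per entry.
    public static void WriteFile(string path, IEnumerable<KeyValuePair<string, int>> entries)
    {
        // File.Open is a static method that returns a FileStream;
        // FileMode.Create overwrites an existing file or creates a new one.
        using (FileStream stream = File.Open(path, FileMode.Create))
        using (StreamWriter writer = new StreamWriter(stream))
        {
            foreach (KeyValuePair<string, int> entry in entries)
            {
                writer.WriteLine(entry.Key + ":" + entry.Value);
            }
        } // Disposing the writer flushes it and closes the underlying stream.
    }

    static void Main()
    {
        var entries = new List<KeyValuePair<string, int>>
        {
            new KeyValuePair<string, int>("ASD", 4),
            new KeyValuePair<string, int>("FILE", 3),
        };
        WriteFile("output.txt", entries);
        Console.Write(File.ReadAllText("output.txt"));
    }
}
```

The using blocks matter more than they look: without disposing (or at least flushing) the StreamWriter, buffered text may never reach the file.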

The other two modules, word counting and word sorting, are the core of this program. Naturally, I use a Hashtable to implement word counting and an ArrayList to implement word sorting. In simple mode, I first use Regex.Split to get all the strings between separators, and then check these strings one by one to see whether each is a word that satisfies the definition. If a string is a word, I add it to a word-count Hashtable and a word-word Hashtable. In the second Hashtable I record the spelling of the word that comes first in dictionary order.

In mode 2 and mode 3, things change. I cannot simply use a regex to pick out every run of two or three consecutive words, because successive matches do not overlap. For example, if the content is "how are you", I should get two two-word phrases, "how are" and "are you". My solution is to pass the start index parameter to the regex.Match method. I write regex.Match rather than Regex.Match here because regex is an object and the method is an instance method, not a static one.
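
The start-index idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the names PhraseSketch and FindPairs are hypothetical, the word rule (at least three letters followed by letters or digits) is inferred from the test cases, and the two words are assumed to be separated by exactly one space. After each match, the search restarts at the position of the second word, so that word can become the first word of the next pair.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PhraseSketch
{
    // Assumed word rule: three or more letters, then letters or digits.
    // The lookbehind stops a match from starting in the middle of a token such as "123file".
    static readonly Regex PairRule = new Regex(
        @"(?<![A-Za-z0-9])(?<first>[A-Za-z]{3}[A-Za-z0-9]*) (?<second>[A-Za-z]{3}[A-Za-z0-9]*)");

    // Hypothetical helper: returns every overlapping two-word phrase in the text.
    public static List<string> FindPairs(string text)
    {
        var pairs = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            // Instance method Match(string, int): begin searching at index 'start'.
            Match m = PairRule.Match(text, start);
            if (!m.Success) break;
            pairs.Add(m.Value);
            // Restart at the second word so it can open the next pair.
            // This index is always greater than 'start', so the loop terminates.
            start = m.Groups["second"].Index;
        }
        return pairs;
    }

    static void Main()
    {
        foreach (string pair in FindPairs("how are you"))
            Console.WriteLine(pair); // prints "how are", then "are you"
    }
}
```

A plain regex.Matches call on the same pattern would return only "how are" for this input, since the second match would have to start after "are".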

Looking back on the process of writing this code, I think it was not difficult as a whole. What really cost a lot of time was looking up information, because I am not very familiar with C#.

My work on performance analysis is as follows.

I have to say the performance analysis task really frustrated me. I searched the Internet for guides and did as they said, but I could not get the expected result. Below is my performance analysis graph.

[Image: performance analysis report]

From the report I can see that the countWords function took the most time, which makes sense, since counting words is the main job of this program. Within countWords, the functions GetFiles and myCountWordsInFiles split the time roughly half and half. The focus of my optimization is myCountWordsInFiles.

Let's look at the details for function myCountWordsInFiles.

[Image: profiling details for function myCountWordsInFiles]

Now I know that the optimization target is the function count.

To test my program, I constructed 10 test cases.

1. Test recognition of words

"file123 123file 1er u4y5 asd"

Should be:

asd:1
file123:1

2. Test processing of same words when ignoring case

"File FILE file asd Asd ASD AsD"

Should be:

ASD:4
FILE:3

3. Test recognition of two consecutive words

"abc def ghi jkl mno"

Should be:

abc def:1
def ghi:1
ghi jkl:1
jkl mno:1

4. Test sorting

"FILE file ASD asd asD ASC asc ASc"

Should be:

ASC:3
ASD:3
FILE:2

5. Test sorting of two consecutive words

"hello World
hello world

how are you
how Are you
How are you"

Should be:

Are you:3
How are:3
hello World:2

6. Test sorting of three consecutive words

"how are you

how Are you

fine thank you and YOU
fine Thank you And you
fine thank YOU and"

Should be:

fine Thank you:3
Thank you And:3
how Are you:2
you And you:2

7. Test empty directory and empty file

Should be: an empty output file

8. Test separator

"sgq&qwge#wet@wqe t$111sdf"

Should recognize words: sgq qwge wet wqe

9. Test files with suffix ".h", ".cs", ".cpp", ".txt" and files with other suffixes. 

Only content in files with suffix ".h", ".cs", ".cpp", ".txt" should be counted

10. Test with a large number of files covering all the cases above
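
The expected outputs in cases 1, 2, 4, and 8 suggest a few rules, and a small sketch can reproduce them. Everything here is inferred from those cases rather than taken from the project's code: a word is assumed to be three or more letters followed by letters or digits, every non-alphanumeric character is assumed to be a separator, counting ignores case, the spelling shown is the one that comes first in ordinal order, and ties in frequency are broken by the lowercase form (which is what case 6 appears to require). The names CountWordsSketch and CountAndSort are hypothetical, and the generic Dictionary and List stand in for Hashtable and ArrayList.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CountWordsSketch
{
    // Assumed word rule, inferred from cases 1 and 8.
    static readonly Regex WordRule = new Regex(@"^[A-Za-z]{3}[A-Za-z0-9]*$");

    // Hypothetical helper: returns "word:count" lines in output order.
    public static List<string> CountAndSort(string text)
    {
        var counts = new Dictionary<string, int>();     // lowercase key -> frequency
        var display = new Dictionary<string, string>(); // lowercase key -> smallest spelling seen

        // Split on runs of non-alphanumeric characters (assumed separator rule).
        foreach (string token in Regex.Split(text, "[^A-Za-z0-9]+"))
        {
            if (!WordRule.IsMatch(token)) continue; // also skips empty strings

            string key = token.ToLowerInvariant();
            int seen;
            counts.TryGetValue(key, out seen); // seen stays 0 for a new key
            counts[key] = seen + 1;

            string best;
            if (!display.TryGetValue(key, out best) || string.CompareOrdinal(token, best) < 0)
                display[key] = token;
        }

        // Sort by frequency, highest first; break ties by the lowercase key.
        var keys = new List<string>(counts.Keys);
        keys.Sort((a, b) =>
        {
            int byCount = counts[b].CompareTo(counts[a]);
            return byCount != 0 ? byCount : string.CompareOrdinal(a, b);
        });

        var lines = new List<string>();
        foreach (string key in keys) lines.Add(display[key] + ":" + counts[key]);
        return lines;
    }

    static void Main()
    {
        // Input of case 4; prints ASC:3, ASD:3, FILE:2 on separate lines.
        foreach (string line in CountAndSort("FILE file ASD asd asD ASC asc ASc"))
            Console.WriteLine(line);
    }
}
```

Keeping the count and the displayed spelling in two maps keyed by the lowercase form mirrors the word-count and word-word tables described earlier in the post.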

Maybe I can consider this project finished. But I do not think I got enough payback for the time I spent on it. And I think the evaluation standard, where rank n receives 1/n of the full points, really sucks. It puts heavy pressure on me. Maybe the biggest gain is that I just wrote my first English blog post. But knowing that I wrote it for CE under that kind of evaluation standard, I cannot say I am happy.

 


Original post: http://www.cnblogs.com/buaasts/p/3985877.html
