
Week 2: Word Frequency Counter Update

Date: 2016-09-14 23:12:08

New features added to the word frequency counter:

HTTPS: https://git.coding.net/li_yuhuan/WordFrequency.git

SSH: git@git.coding.net:li_yuhuan/WordFrequency.git

Code:

        static void Main(string[] args)
        {
            string str = "";
            int length = args.Length;
            
            switch (length)
            {
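                case 0:
                    {
                        // No arguments: count the words in one line typed on the console.
                        string line = Console.ReadLine();

                        if (line != null)
                        {
                            Frequency(line);
                            DictionarySort(m_wordList);
                        }
                        break;
                    }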

                case 1:
                    {
                        str = m_workPath + args[0] + ".txt";

                        if (File.Exists(str))
                        {
                            LoadFile(str);
                            DictionarySort(m_wordList);
                        }

                        break;
                    }

                case 2:
                    {
                        if ("-s" == args[0])
                        {
                            str = args[1];

                            if (File.Exists(str))
                            {
                                LoadFile(str);
                                DictionarySort(m_wordList);
                            }
                        }
                        else if ("dir" == args[0])
                        {
                            if (Directory.Exists(args[1]))
                            {
                                m_top = 10;
                                m_pathList.AddRange(Directory.GetFiles(args[1], "*.txt"));

                                foreach (string path in m_pathList)
                                {
                                    // Print the file name, then that file's statistics.
                                    Console.WriteLine(Path.GetFileName(path));

                                    LoadFile(path);
                                    DictionarySort(m_wordList);
                                }
                            }
                        }

                        break;
                    }

                default:
                    {
                        break;
                    }
            }
        }

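Main branches on the number of command-line arguments it receives:

1. With no arguments, it counts the words in a line of text typed on the console and sorts them by frequency.

2. With one argument, it checks whether a .txt file with that name exists in the working directory (m_workPath); if it does, it counts the words in the file and sorts them by frequency.

3. With two arguments:

    1) -s followed by a file path: if the file exists, it counts the words in the file and sorts them by frequency.

    2) dir followed by a directory path: if the directory exists, it counts and sorts the words of each .txt file in that directory separately, printing only the 10 most frequent words per file (m_top = 10).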
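The dispatch rules above can be condensed into a helper that only classifies the arguments. This is a minimal sketch, not the author's code: the `Mode` enum and `ArgumentParser.Classify` are hypothetical names introduced for illustration, and only the `-s` and `dir` flags come from the listing.

```csharp
// Hypothetical names for illustration; only "-s" and "dir" come from the listing.
public enum Mode { ConsoleText, NamedFile, ExplicitFile, Directory, Unsupported }

public static class ArgumentParser
{
    // Maps the raw argument array to the mode that Main would run.
    public static Mode Classify(string[] args)
    {
        switch (args.Length)
        {
            case 0:
                return Mode.ConsoleText;   // read one line from the console
            case 1:
                return Mode.NamedFile;     // <name>.txt in the working directory
            case 2:
                if (args[0] == "-s") return Mode.ExplicitFile;
                if (args[0] == "dir") return Mode.Directory;
                return Mode.Unsupported;   // unknown flag
            default:
                return Mode.Unsupported;   // three or more arguments
        }
    }
}
```

Separating classification from the file handling would make the argument rules testable without touching the file system; whether that is worth the extra type in a program this small is a judgment call.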

--------------------------------------------------------------------------------------------------------------------------

 

        static private void LoadFile(string filepath)
        {
            string line = string.Empty;
            
            using (StreamReader reader = new StreamReader(filepath))
            {
                line = reader.ReadLine();

                while (line != null)
                {
                    Frequency(line);
                    line = reader.ReadLine();
                }
            }
        }

Reads the file line by line and passes each line to Frequency for counting.
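The same line-by-line read could also be written with `File.ReadLines`, which enumerates lines lazily and closes the file when enumeration ends. The sketch below is one possible alternative, not the author's implementation; `LineReader.ForEachLine` and its callback parameter are hypothetical names.

```csharp
using System;
using System.IO;

public static class LineReader
{
    // Calls handleLine once per line and returns how many lines were read.
    // File.ReadLines streams the file instead of loading it all into memory.
    public static int ForEachLine(string filePath, Action<string> handleLine)
    {
        int count = 0;

        foreach (string line in File.ReadLines(filePath))
        {
            handleLine(line);
            count++;
        }

        return count;
    }
}
```

Under these assumptions, `LineReader.ForEachLine(path, Frequency)` would feed every line of the file to the Frequency method shown later in the post.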

--------------------------------------------------------------------------------------------------------------------------

 

        static private void Frequency(string line)
        {
            List<string> words = new List<string>();
            string word = string.Empty;
            // Characters that separate words.
            char[] split = { ' ', ',', '?', '.', '-', '"', ':', '\r', '\n', '(', ')' };

            words.AddRange(line.Split(split));

            foreach (string str in words)
            {
                if (str != "" && str != null)
                {
                    word = str.ToLower();

                    if (m_wordList.ContainsKey(word))
                    {
                        m_wordList[word] += 1;
                    }
                    else
                    {
                        m_wordList.Add(word, 1);
                    }
                }
            }
        }

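Splits the incoming line on the delimiter characters and stores the pieces in a list, then walks the list: each non-empty piece is lowercased, and its count in the Dictionary (m_wordList) is either incremented or initialized to 1.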
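The split-and-count step can be sketched on its own as follows. This is an illustration under stated assumptions rather than the author's code: `WordCounter.AddLine` is a hypothetical name, the delimiter set is assumed, and the dictionary is passed in as a parameter instead of living in a static field.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Assumed delimiter set for illustration.
    private static readonly char[] Delimiters =
        { ' ', '\t', ',', '.', '?', '!', ':', ';', '-', '"', '(', ')', '\r', '\n' };

    // Adds the words of one line to the running counts (case-insensitive).
    public static void AddLine(string line, Dictionary<string, int> counts)
    {
        if (string.IsNullOrEmpty(line)) return;

        // RemoveEmptyEntries drops the empty strings produced by adjacent delimiters.
        foreach (string token in line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = token.ToLowerInvariant();

            int current;
            counts.TryGetValue(word, out current); // current is 0 when the key is missing
            counts[word] = current + 1;
        }
    }
}
```

`TryGetValue` performs one lookup where `ContainsKey` followed by the indexer performs two, and `StringSplitOptions.RemoveEmptyEntries` makes the explicit empty-string check unnecessary.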

--------------------------------------------------------------------------------------------------------------------------

 

        static private void DictionarySort(Dictionary<string, int> dictionary)
        {
            if (dictionary.Count > 0)
            {
                List<KeyValuePair<string, int>> lst = new List<KeyValuePair<string, int>>(dictionary);

                lst.Sort(delegate (KeyValuePair<string, int> s1, KeyValuePair<string, int> s2)
                {
                    return s2.Value.CompareTo(s1.Value);
                });

                dictionary.Clear();

                Console.WriteLine("total  " + lst.Count + "  words\n");

                int k = 0;

                foreach (KeyValuePair<string, int> kvp in lst)
                {
                    if (k < m_top)
                    {
                        Console.WriteLine(kvp.Key + ":" + kvp.Value);
                        k++;
                    }
                }

                Console.WriteLine("----------------------------\n");
            }
        }
    }

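Copies the dictionary's key-value pairs into a list, sorts the list by count in descending order, and clears the dictionary so that the next file starts from empty. It then prints the number of distinct words followed by the m_top most frequent entries.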
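The sort-and-truncate step could also be expressed with LINQ. The following is a minimal sketch, not the author's method; `TopWords.GetTop` is a hypothetical name, and the alphabetical tie-break is an addition that the original listing does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWords
{
    // Returns at most `top` entries, highest count first.
    // Ties are broken alphabetically so the output order is deterministic.
    public static List<KeyValuePair<string, int>> GetTop(Dictionary<string, int> counts, int top)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }
}
```

`List<T>.Sort` is an unstable sort, so with the comparison delegate alone the relative order of words that share a count is not guaranteed; the `ThenBy` key is one way to pin it down.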

 --------------------------------------------------------------------------------------------------------------------------

Sample runs:

Feature 1:

[Screenshot]

--------------------------------------------------------------------------------------------------------------------------

Feature 2:

[Screenshot]

--------------------------------------------------------------------------------------------------------------------------

Feature 3:

[Screenshot]

--------------------------------------------------------------------------------------------------------------------------

Feature 4: (not finished)

 


Original article: http://www.cnblogs.com/li-yuhuan/p/5873704.html
