An Example of Text Processing with C++

Date: 2015-07-01 10:03:06

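Recently I got the urge to memorize vocabulary again. I found a word-frequency list online containing the 20,000 most frequently used words. It was a PDF, and since I wanted to import the words into a phone app, I first converted it to txt. The converted text was messy, with the order scrambled in many places, but every entry was still a number (the word's frequency rank) immediately followed by a word, so I decided to tidy up the format with a program. The goal: each line holds two strings, the rank first and the corresponding word second, with the words sorted from most to least frequently used.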

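A scripting language would probably be more convenient for text processing, but I only know a little Python and am not very familiar with it. I know C and C++ better, and I had just finished reading C++ Primer, so I figured the C++ standard library could do the job quickly and conveniently as well. I ended up with the following plan: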

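Read the text with fstream from the standard library; the >> operator skips whitespace, so each read yields one string
Count the strings as they are read: an odd count means a rank, an even count means a word
Store each rank-word pair in a pair<int, string>, first converting the rank from string to int with stoi
Push each pair into a vector, then sort the vector with sort, ordering by the word's rank, i.e. the int value in pair<int, string>
Finally, print the vector with formatting to get the final result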

The final code is as follows:

#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <utility>
#include <iomanip>

using namespace std;

int main()
{
    int count = 0;
    int key;
    string str;
    vector< pair<int, string> > words;

    ifstream fin("wordfreq-20000.txt");
    if (!fin) {
        cerr << "Cannot open wordfreq-20000.txt" << endl;
        return 1;
    }
    // Read one whitespace-delimited string at a time
    while (fin >> str) {
        ++count;
        if (count % 2 == 1)
            key = stoi(str);
        else {
            pair<int, string> tmp{key, str};
            words.push_back(tmp);
        }
    }

    // pair compares by first (the rank), then by second (the word)
    sort(words.begin(), words.end());
    for (const auto &w : words)
        cout << setw(5) << w.first << "  " << w.second << endl;

    return 0;
}

So it seems that being a programmer is a pretty decent profession after all :)

Tags: c++, text processing

Original article: http://blog.csdn.net/kristpan/article/details/46706661
