tf.contrib.learn.preprocessing.VocabularyProcessor()

时间：2018-05-07 14:47:35 阅读：794 评论：0 收藏：0 [点我收藏+]

标签：text item color 填充 mbed processor 运行 ppi lam

tf.contrib.learn.preprocessing.VocabularyProcessor (max_document_length, min_frequency=0, vocabulary=None, tokenizer_fn=None)

参数：

max_document_length: 文档的最大长度。如果文本的长度大于最大长度，那么它会被剪切，反之则用0填充。
min_frequency: 词频的最小值，出现次数小于最小词频则不会被收录到词表中。
vocabulary: CategoricalVocabulary 对象。
tokenizer_fn：分词函数

例子：

from tensorflow.contrib import learn
import numpy as np
max_document_length = 4
x_text =[
    ‘i love you‘,
    ‘me too‘
]
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
vocab_processor.fit(x_text)
print next(vocab_processor.transform([‘i me too‘])).tolist()
x = np.array(list(vocab_processor.fit_transform(x_text)))
print x

运行结果为：

[1, 2, 3, 0]
[[1 4 5 0]
 [2 3 0 0]]

看一下词和索引的对应：

embedding_size = len(vocab_processor.vocabulary_)
print embedding_size
vocab_dict = vocab_processor.vocabulary_._mapping
sorted_vocab = sorted(vocab_dict.items(), key = lambda x : x[1])
vocabulary = list(list(zip(*sorted_vocab))[0])
print vocabulary

结果是：

6
[‘<UNK>‘, ‘i‘, ‘me‘, ‘too‘, ‘love‘, ‘you‘]

tf.contrib.learn.preprocessing.VocabularyProcessor()

标签：text item color 填充 mbed processor 运行 ppi lam

原文地址：https://www.cnblogs.com/helloworld0604/p/9002337.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行