
deepctr_torch Experience

Posted: 2020-07-18 00:41:06


1. First, we need to set up the environment for deepctr_torch

pip install -U deepctr_torch

2. The current deepctr_torch release does not support torch 1.5.0, so we need to downgrade torch to 1.4.0 (you can check the latest compatibility status at https://pypi.org/project/deepctr-torch/#history)

pip install --default-timeout=100 --no-cache-dir torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
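After installing, it is worth confirming that the torch build you ended up with is actually below 1.5.0. A minimal sketch of such a check (the helper names and the 1.5.0 ceiling are assumptions based on the note above, not part of deepctr_torch; in practice you would pass in `torch.__version__`):

```python
# Hypothetical helper: parse a version string like "1.4.0+cpu" and check
# that it is below 1.5.0, the release reported to break deepctr_torch here.
def version_tuple(v):
    # Strip any local build suffix such as "+cpu" before splitting.
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split("."))

def is_supported(v, ceiling=(1, 5, 0)):
    return version_tuple(v) < ceiling

print(is_supported("1.4.0+cpu"))  # True
print(is_supported("1.5.0"))      # False
```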

3. Then we can run the examples in the examples directory

python run_deepfm.py

4. The DeepFM example code below is copied from Zhihu

import pandas as pd
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

from deepctr_torch.models import DeepFM
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_feature_names

if __name__ == "__main__":
    data = pd.read_csv('./criteo_sample.txt')

    sparse_features = ['C' + str(i) for i in range(1, 27)]
    dense_features = ['I' + str(i) for i in range(1, 14)]

    data[sparse_features] = data[sparse_features].fillna('-1')
    data[dense_features] = data[dense_features].fillna(0)
    target = ['label']

    # 1.Label Encoding for sparse features,and do simple Transformation for dense features
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
    mms = MinMaxScaler(feature_range=(0, 1))
    data[dense_features] = mms.fit_transform(data[dense_features])

    # 2.count #unique features for each sparse field,and record dense feature field name

    fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=data[feat].nunique(), embedding_dim=4)
                              for feat in sparse_features] + [DenseFeat(feat, 1)
                              for feat in dense_features]

    dnn_feature_columns = fixlen_feature_columns
    linear_feature_columns = fixlen_feature_columns

    feature_names = get_feature_names(linear_feature_columns + dnn_feature_columns)

    # 3.generate input data for model

    train, test = train_test_split(data, test_size=0.2)
    train_model_input = {name:train[name] for name in feature_names}
    test_model_input = {name:test[name] for name in feature_names}

    # 4.Define Model,train,predict and evaluate
    model = DeepFM(linear_feature_columns, dnn_feature_columns, task='binary')
    model.compile("adam", "binary_crossentropy",
                  metrics=["binary_crossentropy"])

    history = model.fit(train_model_input, train[target].values,
                        batch_size=256, epochs=10, verbose=2, validation_split=0.2, )
    pred_ans = model.predict(test_model_input, batch_size=256)
    print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
    print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))

5. The results are shown below

cpu
Train on 128 samples, validate on 32 samples, 1 steps per epoch
Epoch 1/10
0s - loss:  0.7001 - binary_crossentropy:  0.7001 - val_binary_crossentropy:  0.6912
Epoch 2/10
0s - loss:  0.6863 - binary_crossentropy:  0.6863 - val_binary_crossentropy:  0.6804
Epoch 3/10
0s - loss:  0.6727 - binary_crossentropy:  0.6727 - val_binary_crossentropy:  0.6702
Epoch 4/10
0s - loss:  0.6596 - binary_crossentropy:  0.6596 - val_binary_crossentropy:  0.6602
Epoch 5/10
0s - loss:  0.6469 - binary_crossentropy:  0.6469 - val_binary_crossentropy:  0.6506
Epoch 6/10
0s - loss:  0.6343 - binary_crossentropy:  0.6343 - val_binary_crossentropy:  0.6414
Epoch 7/10
0s - loss:  0.6220 - binary_crossentropy:  0.6220 - val_binary_crossentropy:  0.6329
Epoch 8/10
0s - loss:  0.6101 - binary_crossentropy:  0.6101 - val_binary_crossentropy:  0.6251
Epoch 9/10
0s - loss:  0.5985 - binary_crossentropy:  0.5985 - val_binary_crossentropy:  0.6176
Epoch 10/10
0s - loss:  0.5870 - binary_crossentropy:  0.5870 - val_binary_crossentropy:  0.6102
test LogLoss 0.6146
test AUC 0.5273
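The two test metrics reported above are simple to compute by hand, which helps when sanity-checking a run. A small sketch on toy labels and predicted probabilities (the numbers here are illustrative, not from the run above):

```python
import math

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# Log loss: average negative log-likelihood of the true labels.
log_loss = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p)
    for y, p in zip(y_true, y_prob)
) / len(y_true)

# AUC: fraction of (positive, negative) pairs where the positive
# example is scored higher; ties count as half.
pos = [p for y, p in zip(y_true, y_prob) if y == 1]
neg = [p for y, p in zip(y_true, y_prob) if y == 0]
pairs = [(p, n) for p in pos for n in neg]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(round(log_loss, 4))  # 0.4723
print(round(auc, 4))       # 0.75
```

An AUC of 0.5273, as in the run above, is barely better than random ranking, which is unsurprising for 10 epochs on a 200-row sample.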

6. With that, we are done implementing DeepFM in deepctr_torch

 


Original post: https://www.cnblogs.com/huzdong/p/13333980.html
