Comments (6)

shenweichen avatar shenweichen commented on May 27, 2024

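@ThanatosXPF Hello, and thanks for using the library. This error occurs because a feature index in the prediction samples falls outside the range of the predefined Embedding dictionary. I suggest either taking the test-set samples into account when building the feature config, or using hash encoding to map all features into a fixed-size space. The predict and predict_on_batch methods themselves should be fine.
You can try running the following example: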

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, roc_auc_score
from deepctr.models import DeepFM

if __name__ == "__main__":
    data = pd.read_csv('./criteo_sample.txt')

    sparse_features = ['C' + str(i) for i in range(1, 27)]
    dense_features = ['I' + str(i) for i in range(1, 14)]

    data[sparse_features] = data[sparse_features].fillna('-1')
    data[dense_features] = data[dense_features].fillna(0)
    target = ['label']

    # 1. Label-encode the sparse features and apply a simple transformation to the dense features
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
    mms = MinMaxScaler(feature_range=(0, 1))
    data[dense_features] = mms.fit_transform(data[dense_features])

    # 2. Count the unique values of each sparse field and record the dense field names
    sparse_feature_dict = {feat: data[feat].nunique()
                           for feat in sparse_features}
    dense_feature_list = dense_features

    # 3. Generate input data for the model
    train, test = train_test_split(data, test_size=0.2)
    train_model_input = [train[feat].values for feat in sparse_feature_dict] + \
        [train[feat].values for feat in dense_feature_list]
    test_model_input = [test[feat].values for feat in sparse_feature_dict] + \
        [test[feat].values for feat in dense_feature_list]

    # 4. Define the model, then train, predict and evaluate
    model = DeepFM({"sparse": sparse_feature_dict,
                    "dense": dense_feature_list}, final_activation='sigmoid')
    model.compile("adam", "binary_crossentropy",
                  metrics=['binary_crossentropy'])

    history = model.fit(train_model_input, train[target].values,
                        batch_size=256, epochs=10, verbose=2, validation_split=0.2)
    pred_ans = model.predict(test_model_input, batch_size=256)
    print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
    print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))

Hope this solves your problem.
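As a rough illustration of the hash-encoding suggestion, here is a minimal sketch in plain Python. It is not DeepCTR's own API: the helper name hash_bucket, the bucket count of 1000 and the sample values are all assumptions made for the example. The idea is only that every value, including one never seen during training, lands in a fixed index range, so the embedding lookup cannot go out of bounds.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map any feature value to a stable index in [0, num_buckets).

    Hypothetical helper for illustration; md5 is used only because it gives
    the same result across runs and processes, unlike Python's built-in hash().
    """
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Assumed bucket count; this is the size you would declare for the sparse field.
NUM_BUCKETS = 1000

# Made-up sample values; "zz_unseen" plays the role of a value absent from training.
train_values = ["a1", "b2", "c3"]
test_values = ["a1", "zz_unseen"]

train_idx = [hash_bucket(v, NUM_BUCKETS) for v in train_values]
test_idx = [hash_bucket(v, NUM_BUCKETS) for v in test_values]

# Every index is within the declared range, seen or unseen.
print(all(0 <= i < NUM_BUCKETS for i in train_idx + test_idx))
```

The trade-off is that distinct values can collide in the same bucket, so the bucket count is usually chosen well above the expected number of distinct values.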

from deepctr.

daisy-belle avatar daisy-belle commented on May 27, 2024

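@shenweichen Hello. When I run run_regression_movielens.py, the first run completes without any problem, but running it a second time with everything unchanged produces the same error as above.
Output of the first run: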
WARNING:tensorflow:From C:\work\project\learn_test\DeepCTR-master\deepctr\layers.py:411: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2019-01-23 10:47:09.156000: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Train on 128 samples, validate on 32 samples
Epoch 1/1

 - 1s - loss: 13.5552 - mean_squared_error: 13.5552 - val_loss: 15.7727 - val_mean_squared_error: 15.7727
test MSE 14.5247

Process finished with exit code 0

Error output from the second run:
Epoch 1/1
Traceback (most recent call last):
File "C:/work/project/learn_test/DeepCTR-master/examples/run_regression_movielens.py", line 31, in
batch_size=256, epochs=1, verbose=2, validation_split=0.2,)
File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1639, in fit
validation_steps=validation_steps)
File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 215, in fit_loop
outs = f(ins_batch)
File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\backend.py", line 2986, in __call__
run_metadata=self.run_metadata)
File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 86 is not in [0, 7)
[[{{node linear_emb_4-age/embedding_lookup}} = ResourceGather[Tindices=DT_INT32, _class=["loc:@training/Adam/gradients/linear_emb_4-age/embedding_lookup_grad/Reshape"], dtype=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](linear_emb_4-age/embeddings, sparse_emb_4-age/Cast)]]

Process finished with exit code 1
And this is with the same data. Could this still be a case of a feature index exceeding the range of the predefined Embedding dictionary?
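One way to check that question directly is to compare the indices actually fed to the model against the vocabulary size declared for each field. The sketch below is a plain-Python illustration, not part of DeepCTR: the function name find_out_of_range and the sample columns are made up, with the "age" values chosen to mirror the "86 is not in [0, 7)" message in the traceback.

```python
def find_out_of_range(feature_columns, vocab_sizes):
    """Return {field: (min_bad, max_bad, declared_size)} for fields whose
    indices fall outside [0, declared_size). Hypothetical helper."""
    problems = {}
    for name, values in feature_columns.items():
        size = vocab_sizes[name]
        bad = [v for v in values if not 0 <= v < size]
        if bad:
            problems[name] = (min(bad), max(bad), size)
    return problems


# Made-up data: "age" contains a raw value (86) although only 7 indices were declared.
columns = {"age": [0, 3, 86], "gender": [0, 1]}
sizes = {"age": 7, "gender": 2}

report = find_out_of_range(columns, sizes)
print(report)  # {'age': (86, 86, 7)}
```

An empty result would mean every index fits its declared range, which would point away from the encoding step as the cause.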


shenweichen avatar shenweichen commented on May 27, 2024

@daisy-belle Hello, how did you run it the second time? Was it with the

$  python .\run_regression_movielens.py

command? The two runs should not affect each other.


daisy-belle avatar daisy-belle commented on May 27, 2024

@shenweichen I ran it in the PyCharm IDE on Windows both times. After the first run I changed nothing and ran it again, and that is when the error above appeared.


shenweichen avatar shenweichen commented on May 27, 2024

@daisy-belle I tried it in PyCharm and had no problem... Try running it twice from the command line...


daisy-belle avatar daisy-belle commented on May 27, 2024

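@shenweichen Right, I just tried again, and two runs back to back really do work fine. The error above only appears when I start the second run before the first one has finished. I hadn't noticed at first and assumed the first run had already completed 😂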

