Comments (6)
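@ThanatosXPF Hi, and thanks for using the library. This error occurs because, at prediction time, a sample contains a feature index that falls outside the range of the predefined Embedding dictionary. When building the feature config, I suggest either taking the test-set samples into account as well, or using hash encoding to map all features into a fixed-size space. The predict and predict_on_batch methods themselves should be fine.
You can try running the simple example below: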
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, roc_auc_score
from deepctr.models import DeepFM

if __name__ == "__main__":
    data = pd.read_csv('./criteo_sample.txt')

    sparse_features = ['C' + str(i) for i in range(1, 27)]
    dense_features = ['I' + str(i) for i in range(1, 14)]

    data[sparse_features] = data[sparse_features].fillna('-1')
    data[dense_features] = data[dense_features].fillna(0)
    target = ['label']

    # 1. Label Encoding for sparse features, and do simple Transformation for dense features
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
    mms = MinMaxScaler(feature_range=(0, 1))
    data[dense_features] = mms.fit_transform(data[dense_features])

    # 2. Count #unique features for each sparse field, and record dense feature field name
    sparse_feature_dict = {feat: data[feat].nunique()
                           for feat in sparse_features}
    dense_feature_list = dense_features

    # 3. Generate input data for model
    train, test = train_test_split(data, test_size=0.2)
    train_model_input = [train[feat].values for feat in sparse_feature_dict] + \
        [train[feat].values for feat in dense_feature_list]
    test_model_input = [test[feat].values for feat in sparse_feature_dict] + \
        [test[feat].values for feat in dense_feature_list]

    # 4. Define Model, train, predict and evaluate
    model = DeepFM({"sparse": sparse_feature_dict,
                    "dense": dense_feature_list}, final_activation='sigmoid')
    model.compile("adam", "binary_crossentropy",
                  metrics=['binary_crossentropy'])
    history = model.fit(train_model_input, train[target].values,
                        batch_size=256, epochs=10, verbose=2, validation_split=0.2)
    pred_ans = model.predict(test_model_input, batch_size=256)
    print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
    print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))
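
For the hash-encoding suggestion, here is a minimal sketch of the idea rather than anything DeepCTR ships. The helper name hash_encode, the bucket count, and the sample values are all made up for illustration. It only shows that a value never seen in training still lands inside the fixed index range:

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical fixed size of the hashed feature space


def hash_encode(value, num_buckets=NUM_BUCKETS):
    """Map any raw feature value to a stable index in [0, num_buckets)."""
    # md5 is used only as a deterministic hash (Python's built-in hash()
    # is salted per process for strings, so it is not stable across runs).
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Made-up raw values: the last test value never appears in training.
train_values = ["a", "b", "c"]
test_values = ["a", "zzz-unseen"]

train_ids = [hash_encode(v) for v in train_values]
test_ids = [hash_encode(v) for v in test_values]

# Every index, including the one for the unseen value, is within range.
print(all(0 <= i < NUM_BUCKETS for i in train_ids + test_ids))
```

With an encoding like this, the size recorded for the field in sparse_feature_dict would presumably be the bucket count instead of data[feat].nunique(). The trade-off is that distinct values can collide in the same bucket.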
Hope this solves your problem.
from deepctr.
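@shenweichen Hi, when I run run_regression_movielens.py, the first run completes correctly with no problems, but running it a second time with everything unchanged raises the same error as above.
First run output: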
WARNING:tensorflow:From C:\work\project\learn_test\DeepCTR-master\deepctr\layers.py:411: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2019-01-23 10:47:09.156000: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Train on 128 samples, validate on 32 samples
Epoch 1/1
- 1s - loss: 13.5552 - mean_squared_error: 13.5552 - val_loss: 15.7727 - val_mean_squared_error: 15.7727
test MSE 14.5247
Process finished with exit code 0
Second run output (with the error):
Epoch 1/1
Traceback (most recent call last):
  File "C:/work/project/learn_test/DeepCTR-master/examples/run_regression_movielens.py", line 31, in <module>
    batch_size=256, epochs=1, verbose=2, validation_split=0.2,)
  File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1639, in fit
    validation_steps=validation_steps)
  File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 215, in fit_loop
    outs = f(ins_batch)
  File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\keras\backend.py", line 2986, in __call__
    run_metadata=self.run_metadata)
  File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda2\envs\py3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 86 is not in [0, 7)
[[{{node linear_emb_4-age/embedding_lookup}} = ResourceGather[Tindices=DT_INT32, _class=["loc:@training/Adam/gradients/linear_emb_4-age/embedding_lookup_grad/Reshape"], dtype=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](linear_emb_4-age/embeddings, sparse_emb_4-age/Cast)]]
Process finished with exit code 1
And this is with the same data. Could this still be a case of feature indices exceeding the range of the predefined Embedding dictionary?
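
The message "indices[0,0] = 86 is not in [0, 7)" says that the value 86 was looked up in an embedding table with only 7 rows. One way to check the inputs before calling fit is sketched below. The function find_out_of_range and the sample values are hypothetical and not part of DeepCTR or TensorFlow:

```python
def find_out_of_range(indices, vocab_size):
    """Return (position, value) pairs whose value falls outside [0, vocab_size)."""
    return [(pos, idx) for pos, idx in enumerate(indices)
            if not 0 <= idx < vocab_size]


# Made-up encoded values for the age field; 86 mimics the value in the traceback.
age_indices = [3, 86, 0, 6]

bad = find_out_of_range(age_indices, vocab_size=7)
print(bad)  # [(1, 86)]
```

An empty result would suggest the encoded column matches the declared size. A non-empty one suggests the model was fed values that had not been label-encoded the way the feature config expected.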
@daisy-belle Hi, how did you run it the second time? Was it with the
$ python .\run_regression_movielens.py
command? The two runs should not affect each other.
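@shenweichen I'm running it on Windows from the PyCharm IDE, both times. After the first run finishes I run it a second time without changing anything, and that is when the error above appears.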
@daisy-belle I tried it in PyCharm and had no problems... Could you try running it twice from the command line?
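@shenweichen Yes, I just tried again, and two runs one after the other do work fine. The error above only appears when I start the second run while the first one is still running. I didn't notice at first and assumed the first run had already finished 😂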