Comments (1)
import random

fi = open("local_test", "r")
ftrain = open("local_train_splitByUser", "w")
ftest = open("local_test_splitByUser", "w")

while True:
    rand_int = random.randint(1, 10)
    noclk_line = fi.readline().strip()
    clk_line = fi.readline().strip()
    if noclk_line == "" or clk_line == "":
        break
    if rand_int == 2:
        print >> ftest, noclk_line
        print >> ftest, clk_line
    else:
        print >> ftrain, noclk_line
        print >> ftrain, clk_line

This script splits the test set (local_test) into train and test. Isn't that a mistake? That said, it looks like the earlier step, local_aggretor.py, has already done the split.
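I have the same impression. The DIN, DIEN, and CAN code all split the data this way, and local_aggretor.py has already done the split: for a user with n historical behaviors, the training set uses n-1 of them. At prediction time, the first n-1 behaviors are used to predict the target, which is the n-th behavior.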
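The idea of predicting the n-th behavior from the first n-1 can be sketched roughly as below. This is only an illustration, not the code in local_aggretor.py: the function name leave_last_out and the sample item names are hypothetical, and the behaviors are assumed to be already sorted by time.

```python
def leave_last_out(behaviors):
    """Split one user's time-ordered behaviors into (history, target).

    The first n-1 behaviors form the history and the n-th is the target.
    Hypothetical helper for illustration only.
    """
    if len(behaviors) < 2:
        raise ValueError("need at least two behaviors to form a history and a target")
    return behaviors[:-1], behaviors[-1]


# Made-up example: a user with n = 4 behaviors.
history, target = leave_last_out(["item_a", "item_b", "item_c", "item_d"])
print(history, target)
```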
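As for split_by_user.py, I personally don't think it is very useful. With this kind of split, each user produces only two samples (one positive, one negative). The users at prediction time are also completely different from the users seen in training, so the user_id feature can no longer be used. The paper does use this feature, though, so in practice we don't need split_by_user.py; it may be a file the author used for testing and left in the repository. On the other hand, splitting this way does appear to give better generalization.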
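For anyone who wants to try the by-user split on Python 3, here is a minimal sketch of the same logic as the script quoted in the question. It is not the repository's code: the function split_pairs and its seed parameter are hypothetical, and it assumes, as the quoted script does, that the input alternates one non-click line and one click line per user.

```python
import random


def split_pairs(lines, seed=None):
    """Send consecutive (noclk, clk) line pairs to train or test.

    Mirrors the quoted script: each pair goes to test with probability
    1/10 and to train otherwise. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    train, test = [], []
    it = iter(lines)
    # zip(it, it) yields consecutive pairs; an unpaired trailing line is dropped.
    for noclk_line, clk_line in zip(it, it):
        noclk_line = noclk_line.strip()
        clk_line = clk_line.strip()
        if noclk_line == "" or clk_line == "":
            break
        bucket = test if rng.randint(1, 10) == 2 else train
        bucket.append(noclk_line)
        bucket.append(clk_line)
    return train, test


# Made-up input: 50 users, one non-click and one click line each.
sample = []
for i in range(50):
    sample.append("0\tuser%d\n" % i)
    sample.append("1\tuser%d\n" % i)
train, test = split_pairs(sample, seed=0)
print(len(train), len(test))
```

To work on the real files, the lines could be read from open("local_test") and the two returned lists written out with print(line, file=ftrain) and print(line, file=ftest).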
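I suggest trying both splitting approaches, although the model input may need to change. Note also that the first approach makes the training set grow sharply (from 30M to 28.8GB).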
from dien.