recbole-cdr's Introduction


RecBole-CDR

RecBole-CDR is a library built upon RecBole for reproducing and developing cross-domain recommendation algorithms.

Highlights

  • Automatic and compatible data processing for cross-domain recommendation: The library defines a unified data structure for cross-domain recommendation that inherits all the data pre-processing strategies of RecBole. Data that overlap across domains are matched automatically.
  • Flexible and customizable model training strategies: The library provides four basic training modes for cross-domain recommendation, which users can combine arbitrarily. Training strategies can also be customized in the same way as in the original RecBole.
  • Extensive cross-domain recommendation algorithms: Building on the unified data structure and flexible training strategies, several cross-domain recommendation algorithms are implemented and can be compared fairly with one another.
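
As an illustration of how training modes combine, the configurations quoted in the issues on this page write the schedule as a list of MODE:EPOCHS strings, such as ["BOTH:300"] or ['SOURCE:300','TARGET:300','OVERLAP:300']. The sketch below parses that format into (mode, epochs) pairs. It is not RecBole-CDR's own parser: the function name parse_train_epochs is hypothetical, and the set of four mode names is an assumption taken from those examples.

```python
# Hypothetical helper, not part of RecBole-CDR: it only illustrates the
# "MODE:EPOCHS" schedule format seen in the train_epochs setting.
VALID_MODES = {"SOURCE", "TARGET", "BOTH", "OVERLAP"}  # assumed mode names


def parse_train_epochs(train_epochs):
    """Turn ["SOURCE:300", "BOTH:100"] into [("SOURCE", 300), ("BOTH", 100)]."""
    schedule = []
    for entry in train_epochs:
        mode, sep, epochs = entry.partition(":")
        if not sep:
            raise ValueError(f"expected 'MODE:EPOCHS', got {entry!r}")
        mode = mode.strip().upper()
        if mode not in VALID_MODES:
            raise ValueError(f"unknown training mode {mode!r}")
        # Stages run in the order listed, so the list order is preserved.
        schedule.append((mode, int(epochs)))
    return schedule


if __name__ == "__main__":
    print(parse_train_epochs(["SOURCE:300", "TARGET:300", "OVERLAP:300"]))
```

Keeping the schedule as an ordered list makes it explicit that the stages are separate phases, each with its own epoch budget.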

Requirements

recbole==1.0.1
torch>=1.7.0
python>=3.7.0

Quick-Start

With the source code, you can use the provided script for initial usage of our library:

python run_recbole_cdr.py

This script runs the CMF model with ml-1m as the source-domain dataset and ml-100k as the target-domain dataset.

To run a different model, pass an additional command-line parameter:

python run_recbole_cdr.py --model=[model]

Implemented Models

We list currently supported Cross-Domain Recommendation models:

Result

Dataset

We collected and organized three pairs of datasets with one source domain and one target domain which are commonly used in cross-domain recommendation. Here we provide these datasets for reference:

Hyper-parameters

We carefully tuned the hyper-parameters of the implemented models on these datasets and provide them here for reference:

  • Cross-domain-recommendation on Amazon datasets;
  • Cross-domain-recommendation on Book-Crossing datasets;
  • Cross-domain-recommendation on Douban datasets;

Contributing

Please let us know if you encounter a bug or have any suggestions by filing an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and submitted as PRs.

The Team

RecBole-CDR is developed and maintained by members of RUCAIBox. The main developers are Zihan Lin (@linzihan-backforward), Gaowei Zhang (@Wicknight) and Shanlei Mu (@ShanleiMu).

Acknowledgement

The implementation is based on the open-source recommendation library RecBole.

Please cite the following paper as the reference if you use our code or processed datasets.

@inproceedings{zhao2021recbole,
  title={Recbole: Towards a unified, comprehensive and efficient framework for recommendation algorithms},
  author={Wayne Xin Zhao and Shanlei Mu and Yupeng Hou and Zihan Lin and Kaiyuan Li and Yushuo Chen and Yujie Lu and Hui Wang and Changxin Tian and Xingyu Pan and Yingqian Min and Zhichao Feng and Xinyan Fan and Xu Chen and Pengfei Wang and Wendi Ji and Yaliang Li and Xiaoling Wang and Ji-Rong Wen},
  booktitle={{CIKM}},
  year={2021}
}

recbole-cdr's People

Contributors

linzihan-backforward, wicknight, wyh-han

recbole-cdr's Issues

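Question: poor results with the AUC and LogLoss metrics

Hello, I am a user of recbole-cdr and would like to use it for a cross-domain CTR task, which requires AUC and LogLoss as output metrics. However, both metrics turn out to be very poor, and I would appreciate suggestions on tuning the parameters or models.
I tested with the two datasets under recbole_cdr/dataset_example in the code (source: ml-1m, target: ml-100k), using threshold=4 to filter the labels. Whichever base model I use, the output AUC is around 0.6. On the same target dataset, however, single-domain model code from elsewhere (I tested DeepFM) reaches AUC > 0.75.
I have adjusted some hyper-parameters (such as xx_xx_num_interval, the learning rate, valid_metric, and even threshold=3), without any noticeable improvement.
Below are the recbole-cdr model parameters I used, for reference:

1. Parameter file sample.yaml: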

# dataset config
gpu_id: 0
state: INFO
field_separator: "\t"
use_gpu: True
seed: 2000
reproducibility: True
data_path: 'dataset/'
checkpoint_dir: 'saved'
show_progress: True
save_dataset: False
dataset_save_path: ~
save_dataloaders: False
dataloaders_save_path: ~
log_wandb: False
wandb_project: 'recbole_cdr'
normalize_all: True

# training settings
train_epochs: ["BOTH:300"]
train_batch_size: 2048
learner: adam
neg_sampling:
  uniform: 1
eval_step: 1
stopping_step: 10
clip_grad_norm: ~
weight_decay: 1e-3
loss_decimal_place: 6
require_pow: False

# evaluation settings
eval_args: 
  split: {'RS':[0.8,0.1,0.1]}
  group_by: None
  mode: labeled
repeatable: False
metrics: ['AUC', 'LogLoss']
valid_metric: AUC
valid_metric_bigger: True
eval_batch_size: 2048
metric_decimal_place: 6

source_domain:
  dataset: ml-1m
  data_path: 'dataset/'
  seq_separator: " "
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

target_domain:
  dataset: ml-100k
  data_path: 'dataset/'
  seq_separator: ","
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  threshold:
    rating: 4
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[5,inf)"
  item_inter_num_interval: "[5,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

2. Python file:

import argparse
from recbole_cdr.quick_start import run_recbole_cdr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', type=str, default='DTCDR', help='name of models')
    parser.add_argument('--config_files', type=str, default='sample.yaml', help='config files')

    args, _ = parser.parse_known_args()

    config_file_list = args.config_files.strip().split(' ') if args.config_files else None
    print(config_file_list)
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
3. YAML parameters of DTCDR, one of the base models:
embedding_size: 64
base_model: NeuMF
learning_rate: 0.0005
mlp_hidden_size: [64, 64]
dropout_prob: 0.3
alpha: 0.3

Thank you for your help!

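[💡SUG] EMCDR training time

Following the parameter settings, I passed the EMCDR parameters in through parameter_dict. I also changed eval_batch_size to 40960000. I am using the Amazon Book and Movie datasets (5-core). The run has already taken 10 hours and has reached epoch 267 with no sign of stopping, and performance is improving in steps of about 0.001. Do the authors have a reference for parameters versus model performance? Roughly how many epochs does it need to run?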

Ranking Metrics Question

Hi, I am trying to calculate the NDCG@10, MRR@10, and HR@10 on my dataset. My dataset is quite small, and some users may not rate up to 10 items in the test set. Usually, we look at the top 10 predictions and check to see if the predictions are in the ground truth for the given user. But, if the user in the test set does not rate that many items, how does RecBole-CDR handle this case?
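
For reference, a common convention for a user with fewer than k relevant test items is to normalise NDCG@k by the ideal DCG over min(k, number of relevant items), so that a perfect ranking still scores 1.0, while HR@k and MRR@k need no adjustment. The sketch below shows that convention in plain Python. It is an illustration under that assumption, not a statement of how RecBole-CDR computes its metrics, and the function names are hypothetical.

```python
import math


def hit_at_k(ranked_items, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked_items[:k]) else 0.0


def mrr_at_k(ranked_items, relevant, k):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_items, relevant, k):
    """NDCG with the ideal DCG truncated at min(k, len(relevant))."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked_items[:k], start=1)
              if item in relevant)
    ideal_hits = min(k, len(relevant))  # a user may have fewer than k items
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # A user with only two test items, evaluated at k = 10.
    print(ndcg_at_k(["a", "c", "b"], {"a", "c"}, 10))
```

With this normalisation, a user who has two test items and sees both ranked first and second obtains NDCG@10 = 1.0 rather than being penalised for the eight positions that could never be hits.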

[🐛BUG] pip install recbole-cdr problem

Describe the bug
Running pip install recbole-cdr produces the following problem:
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [149 lines of output]
setup.py:461: UserWarning: Unrecognized setuptools command ('dist_info --egg-base C:\Users\Z\AppData\Local\Temp\pip-modern-metadata-5czwbrdp'), proceeding with generating Cython sources and expanding templates
warnings.warn("Unrecognized setuptools command ('{}'), proceeding with "
setup.py:563: DeprecationWarning:

    `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
    of the deprecation of `distutils` itself. It will be removed for
    Python >= 3.12. For older Python versions it will remain present.
    It is recommended to use `setuptools < 60.0` for those Python versions.
    For more details, see:
      https://numpy.org/devdocs/reference/distutils_status_migration.html


    from numpy.distutils.core import setup
  Running from SciPy source directory.
  INFO: lapack_opt_info:
  INFO: lapack_armpl_info:
  INFO: No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
  INFO: customize MSVCCompiler
  INFO:   libraries armpl_lp64_mp not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  INFO: lapack_mkl_info:
  INFO:   libraries mkl_rt not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  INFO: lapack_ssl2_info:
  INFO:   libraries fjlapackexsve not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  INFO: openblas_lapack_info:
  INFO:   libraries openblas not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO: get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
  INFO: customize GnuFCompiler
  WARN: Could not locate executable g77
  WARN: Could not locate executable f77
  INFO: customize IntelVisualFCompiler
  WARN: Could not locate executable ifort
  WARN: Could not locate executable ifl
  INFO: customize AbsoftFCompiler
  WARN: Could not locate executable f90
  INFO: customize CompaqVisualFCompiler
  INFO: Found executable D:\app\git\Git\usr\bin\DF.exe
  INFO: customize IntelItaniumVisualFCompiler
  WARN: Could not locate executable efl
  INFO: customize Gnu95FCompiler
  WARN: Could not locate executable gfortran
  WARN: Could not locate executable f95
  INFO: customize G95FCompiler
  WARN: Could not locate executable g95
  INFO: customize IntelEM64VisualFCompiler
  INFO: customize IntelEM64TFCompiler
  WARN: Could not locate executable efort
  WARN: Could not locate executable efc
  INFO: customize PGroupFlangCompiler
  WARN: Could not locate executable flang
  WARN: don't know how to compile Fortran code on platform 'nt'
  INFO:   NOT AVAILABLE
  INFO:
  INFO: openblas_clapack_info:
  INFO:   libraries openblas,lapack not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  INFO: flame_info:
  INFO:   libraries flame not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  INFO: accelerate_info:
  INFO:   NOT AVAILABLE
  INFO:
  INFO: atlas_3_10_threads_info:
  INFO: Setting PTATLAS=ATLAS
  INFO:   libraries tatlas,tatlas not found in D:\app\anaconda3\download\envs\mygpu\lib
  INFO:   libraries tatlas,tatlas not found in C:\
  INFO:   libraries tatlas,tatlas not found in D:\app\anaconda3\download\envs\mygpu\libs
  INFO:   libraries tatlas,tatlas not found in D:\app\anaconda3\download\Library\lib
  INFO: <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  INFO:   NOT AVAILABLE
  INFO:
  INFO: atlas_3_10_info:
  INFO:   libraries satlas,satlas not found in D:\app\anaconda3\download\envs\mygpu\lib
  INFO:   libraries satlas,satlas not found in C:\
  INFO:   libraries satlas,satlas not found in D:\app\anaconda3\download\envs\mygpu\libs
  INFO:   libraries satlas,satlas not found in D:\app\anaconda3\download\Library\lib
  INFO: <class 'numpy.distutils.system_info.atlas_3_10_info'>
  INFO:   NOT AVAILABLE
  INFO:
  INFO: atlas_threads_info:
  INFO: Setting PTATLAS=ATLAS
  INFO:   libraries ptf77blas,ptcblas,atlas not found in D:\app\anaconda3\download\envs\mygpu\lib
  INFO:   libraries ptf77blas,ptcblas,atlas not found in C:\
  INFO:   libraries ptf77blas,ptcblas,atlas not found in D:\app\anaconda3\download\envs\mygpu\libs
  INFO:   libraries ptf77blas,ptcblas,atlas not found in D:\app\anaconda3\download\Library\lib
  INFO: <class 'numpy.distutils.system_info.atlas_threads_info'>
  INFO:   NOT AVAILABLE
  INFO:
  INFO: atlas_info:
  INFO:   libraries f77blas,cblas,atlas not found in D:\app\anaconda3\download\envs\mygpu\lib
  INFO:   libraries f77blas,cblas,atlas not found in C:\
  INFO:   libraries f77blas,cblas,atlas not found in D:\app\anaconda3\download\envs\mygpu\libs
  INFO:   libraries f77blas,cblas,atlas not found in D:\app\anaconda3\download\Library\lib
  INFO: <class 'numpy.distutils.system_info.atlas_info'>
  INFO:   NOT AVAILABLE
  INFO:
  INFO: lapack_info:
  INFO:   libraries lapack not found in ['D:\\app\\anaconda3\\download\\envs\\mygpu\\lib', 'C:\\', 'D:\\app\\anaconda3\\download\\envs\\mygpu\\libs', 'D:\\app\\anaconda3\\download\\Library\\lib']
  INFO:   NOT AVAILABLE
  INFO:
  C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\numpy\distutils\system_info.py:1974: UserWarning:
      Lapack (http://www.netlib.org/lapack/) libraries not found.
      Directories to search for the libraries can be specified in the
      numpy/distutils/site.cfg file (section [lapack]) or by setting
      the LAPACK environment variable.
    return getattr(self, '_calc_info_{}'.format(name))()
  INFO: lapack_src_info:
  INFO:   NOT AVAILABLE
  INFO:
  C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\numpy\distutils\system_info.py:1974: UserWarning:
      Lapack (http://www.netlib.org/lapack/) sources not found.
      Directories to search for the sources can be specified in the
      numpy/distutils/site.cfg file (section [lapack_src]) or by setting
      the LAPACK_SRC environment variable.
    return getattr(self, '_calc_info_{}'.format(name))()
  INFO:   NOT AVAILABLE
  INFO:
  Traceback (most recent call last):
    File "D:\app\anaconda3\download\envs\mygpu\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
      main()
    File "D:\app\anaconda3\download\envs\mygpu\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "D:\app\anaconda3\download\envs\mygpu\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 149, in prepare_metadata_for_build_wheel       
      return hook(metadata_directory, config_settings)
    File "C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\setuptools\build_meta.py", line 161, in prepare_metadata_for_build_wheel
      self.run_setup()
    File "C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\setuptools\build_meta.py", line 253, in run_setup
      super(_BuildMetaLegacyBackend,
    File "C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\setuptools\build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 588, in <module>
      setup_package()
    File "setup.py", line 584, in setup_package
      setup(**metadata)          raise NotFoundError(msg)
    File "C:\Users\Z\AppData\Local\Temp\pip-build-env-i716z55i\overlay\Lib\site-packages\numpy\distutils\core.py", line 136, in setup
      config = configuration()
    File "setup.py", line 499, in configuration
      raise NotFoundError(msg)
  numpy.distutils.system_info.NotFoundError: No BLAS/LAPACK libraries found.
  To build Scipy from sources, BLAS & LAPACK libraries need to be installed.
  See site.cfg.example in the Scipy source directory and
  https://docs.scipy.org/doc/scipy/reference/building/index.html for details.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

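To Reproduce
I see that the authors' requirements list recbole==1.0.1, but that package also seems to no longer install properly. I can, however, install recbole==1.2 without problems. Has anyone else run into this issue?

Environment (please complete the following information):

  • OS: [Windows]
  • RecBole version [e.g. 1.2.0]
  • Python version [e.g. 3.10.13]
  • PyTorch version [e.g. 2.1.0]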

Hyperparameter tuning question

Hi, I am a bit new to hyperparameter tuning in RecBole-CDR. I ran the CoNet algorithm on my dataset, and the results seem very poor. On the test dataset I am using, a model that I coded myself reaches NDCG@10 values of around 0.9, so I believe the CoNet values should be in the same range, since CoNet is a strong baseline in cross-domain recommendation. Below are the results I am getting when running CoNet.

INFO test result: OrderedDict([('recall@10', 0.0179), ('mrr@10', 0.0063), ('ndcg@10', 0.0087), ('hit@10', 0.0183), ('precision@10', 0.0018)])

I believe I need to tune the parameters for the model for the numbers to be a lot better. I want to tune the batch size, embedding size, the number of dense layers, the learning rate, and any other parameter that can be tuned. After tuning the hyperparameters, I want to use the best model to make recommendations on the test set.

Please let me know how I can tune the different hyperparameters and use the best model on the test set to collect the metric values.

Right now, I am using the default values that come with the run_recbole_cdr.py file. Below are the default values that I am running CoNet with:

Evaluation Hyper Parameters:
eval_args = {'group_by': 'user', 'order': 'TO', 'split': {'RS': [0.7, 0.2, 0.1]}, 'mode': 'full'}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4
Other Hyper Parameters:
wandb_project = recbole_cdr
train_epochs = ['BOTH:300']
require_pow = False
embedding_size = 64
reg_weight = 0.01
mlp_hidden_size = [64, 32, 16, 8]
MODEL_TYPE = ModelType.CROSSDOMAIN
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
train_modes = ['BOTH']
epoch_num = ['300']
source_split = False
device = cuda
train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform', 'dynamic': 'none'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}
Source domain: ./comedy_data/comedy
The number of users: 2217
Average actions of users: 16.08528880866426
The number of items: 4977
Average actions of items: 7.16338424437299
The number of inters: 35645
The sparsity of the dataset: 99.6769533176926%
Remain Fields: ['source_user_id', 'source_item_id', 'source_rating', 'source_timestamp']
Target domain: ./action_data/action
The number of users: 2217
Average actions of users: 19.935469314079423
The number of items: 2927
Average actions of items: 15.098086124401913
The number of inters: 44177
The sparsity of the dataset: 99.31921840719268%
Remain Fields: ['target_user_id', 'target_item_id', 'target_rating', 'target_timestamp']
Num of overlapped user: 2217
Num of overlapped item: 1
INFO  [Training]: train_batch_size = [2048] negative sampling: [{'uniform': 1}]
INFO  [Evaluation]: eval_batch_size = [4096] eval_args: [{'group_by': 'user', 'order': 'TO', 'split': {'RS': [0.7, 0.2, 0.1]}, 'mode': 'full'}] 
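
A tuning loop of the kind asked about above can be sketched as an exhaustive grid search. The code below is a generic illustration, not RecBole-CDR's own tuning facility: grid_search and toy_evaluate are hypothetical names, and toy_evaluate is a stand-in for a function that would train the model with the given parameters and return the validation metric (for example MRR@10). How to read that score back from a RecBole-CDR run is an assumption left to verify against the library.

```python
import itertools


def grid_search(param_grid, evaluate, bigger_is_better=True):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # expected to be a validation metric
        if best_score is None:
            improved = True
        elif bigger_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Stand-in objective so the sketch runs on its own; a real version
    # would train the model and return its validation score.
    return (-abs(params["learning_rate"] - 0.001)
            - abs(params["embedding_size"] - 64) / 1000.0)


if __name__ == "__main__":
    grid = {
        "learning_rate": [0.01, 0.001, 0.0005],
        "embedding_size": [32, 64, 128],
    }
    print(grid_search(grid, toy_evaluate))
```

Selecting on the validation metric and evaluating the test set only once, with the chosen configuration, keeps the reported test numbers free of selection bias.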

How to add a new function to a particular model

I want to add a feature to the SSCDR model, but I am not able to call my function. I only need to call it once and then do some further mapping. I have tried every way I can think of, but because the call sits inside the iteration it raises an "index out of bound" error.

Please help me resolve this issue.

[🐛BUG] TypeError: expected Tensor as element 0 in argument 0, but got Interaction

Hi developers,
I am a new user of RecBole-CDR. I followed the tutorial to reproduce the DTCDR method but got an error.

I changed eval_args.mode from full to uni999, and I guess this change causes the error. Could you tell me how to fix it?

Here is the overall config:

# general
gpu_id: 0
use_gpu: True
seed: 2022
state: INFO
reproducibility: True
data_path: 'dataset/'
checkpoint_dir: 'saved'
show_progress: True
save_dataset: False
dataset_save_path: ~
save_dataloaders: False
dataloaders_save_path: ~
log_wandb: False
wandb_project: 'recbole_cdr'

# training settings
train_epochs: ["BOTH:300"]
train_batch_size: 4096
learner: adam
learning_rate: 0.0005 #0.001
neg_sampling:
  uniform: 1
eval_step: 1
stopping_step: 10
clip_grad_norm: ~
# clip_grad_norm:  {'max_norm': 5, 'norm_type': 2}
weight_decay: 0.0
loss_decimal_place: 4
require_pow: False

# evaluation settings
eval_args: 
  split: {'RS':[0.8,0.1,0.1]}
  split_valid: {'RS':[0.8,0.2]}
  group_by: user
  order: RO
  mode: uni999 # full
repeatable: False
metrics: ["Recall","MRR","NDCG","Hit","Precision"]
topk: [10]
valid_metric: MRR@10
valid_metric_bigger: True
eval_batch_size: 409600
metric_decimal_place: 4

Other configs:

DTCDR.yaml

embedding_size: 64
base_model: NeuMF
mlp_hidden_size: [64, 64]
dropout_prob: 0.3
alpha: 0.3

dataset config

# dataset config
gpu_id: 0
state: INFO
seed: 2022
field_separator: "\t"
source_domain:
  dataset: AmazonBooks
  data_path: '/data/home/work/projects/RecBole-CDR/recbole_cdr/dataset_example/'
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[10,inf)"
  item_inter_num_interval: "[10,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

target_domain:
  dataset: AmazonMov
  data_path: '/data/home/work/projects/RecBole-CDR/recbole_cdr/dataset_example/'
  USER_ID_FIELD: user_id
  ITEM_ID_FIELD: item_id
  RATING_FIELD: rating
  TIME_FIELD: timestamp
  NEG_PREFIX: neg_
  LABEL_FIELD: label
  load_col:
    inter: [user_id, item_id, rating]
  user_inter_num_interval: "[10,inf)"
  item_inter_num_interval: "[10,inf)"
  val_interval:
    rating: "[3,inf)"
  drop_filter_field: True

Getting metric value for each user

Hi, I am currently working on a project in which I compare different algorithms to see whether the differences in their results are statistically significant. How can I get the HR, NDCG and MRR values for each user instead of a single aggregated value?

For example, if my dataset has 10 users in the test set, how can I retrieve all 10 MRR, NDCG and HR values, one per user, instead of an average? Right now the model returns "INFO test result: OrderedDict([('recall@10', 0.0233), ('mrr@10', 0.0098), ('ndcg@10', 0.0125), ('hit@10', 0.025), ('precision@10', 0.0025)])". But I want each metric for every user in the dataset.

Please let me know how to get that.

Thank you!
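
One way to obtain per-user values is to compute the metrics outside the library from each user's top-k list and test items, and only then average. The sketch below assumes two hypothetical inputs, recommendations (user to ranked item list) and ground_truth (user to set of test items); how to export these from RecBole-CDR is not covered here, and the function names are illustrative only.

```python
import math


def per_user_metrics(recommendations, ground_truth, k=10):
    """Return {user: {"hit": ..., "mrr": ..., "ndcg": ...}} at cutoff k."""
    results = {}
    for user, relevant in ground_truth.items():
        top_k = recommendations.get(user, [])[:k]
        hits = [1 if item in relevant else 0 for item in top_k]
        hit = 1.0 if any(hits) else 0.0
        # Reciprocal rank of the first hit, or 0.0 when there is none.
        mrr = next((1.0 / rank for rank, h in enumerate(hits, start=1) if h), 0.0)
        dcg = sum(h / math.log2(rank + 1) for rank, h in enumerate(hits, start=1))
        idcg = sum(1.0 / math.log2(rank + 1)
                   for rank in range(1, min(k, len(relevant)) + 1))
        ndcg = dcg / idcg if idcg else 0.0
        results[user] = {"hit": hit, "mrr": mrr, "ndcg": ndcg}
    return results


def mean_metric(results, name):
    """Average one metric over all users (the usual single reported value)."""
    return sum(r[name] for r in results.values()) / len(results) if results else 0.0


if __name__ == "__main__":
    recs = {"u1": ["a", "b"], "u2": ["x", "y"]}
    truth = {"u1": {"b"}, "u2": {"z"}}
    per_user = per_user_metrics(recs, truth, k=10)
    print(per_user)
    print(mean_metric(per_user, "mrr"))
```

The per-user lists for two algorithms, aligned on the same users, can then be passed to a paired significance test.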

Question about the Bi-TGCF implementation

Hello. In the Bi-TGCF implementation, the class initialisation sets the following rows to 0 (presumably for users that are not in the source/target domain?). Is this necessary?

with torch.no_grad():
    self.source_user_embedding.weight[self.overlapped_num_users: self.target_num_users].fill_(0)
    self.source_item_embedding.weight[self.overlapped_num_items: self.target_num_items].fill_(0)
    self.target_user_embedding.weight[self.target_num_users:].fill_(0)
    self.target_item_embedding.weight[self.target_num_items:].fill_(0)

I ask because a parameter initialisation follows immediately afterwards:

self.apply(xavier_normal_initialization)

Neither [dataset/DoubanBook] exists in the devicenor [DoubanBook] a known dataset name.

Hello, when I tried to change the training datasets to DoubanBook and DoubanMovie, I got the following error (I modified the properties file according to https://github.com/RUCAIBox/RecBole-CDR/blob/main/results/Douban.md):

<module 'recbole_cdr.data.dataset' from '/data/guzeng/RecBole-CDR/recbole_cdr/data/dataset.py'>
Traceback (most recent call last):
  File "run_recbole_cdr.py", line 22, in <module>
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
  File "/data/guzeng/RecBole-CDR/recbole_cdr/quick_start/quick_start.py", line 41, in run_recbole_cdr
    dataset = create_dataset(config)
  File "/data/guzeng/RecBole-CDR/recbole_cdr/data/utils.py", line 72, in create_dataset
    dataset = dataset_class(config)
  File "/data/guzeng/RecBole-CDR/recbole_cdr/data/dataset.py", line 312, in __init__
    self.source_domain_dataset = CrossDomainSingleDataset(source_config, domain='source')
  File "/data/guzeng/RecBole-CDR/recbole_cdr/data/dataset.py", line 31, in __init__
    super().__init__(config)
  File "/data/guzeng/anaconda3/envs/py37/lib/python3.7/site-packages/recbole/data/dataset/dataset.py", line 96, in __init__
    self._from_scratch()
  File "/data/guzeng/anaconda3/envs/py37/lib/python3.7/site-packages/recbole/data/dataset/dataset.py", line 106, in _from_scratch
    self._load_data(self.dataset_name, self.dataset_path)
  File "/data/guzeng/anaconda3/envs/py37/lib/python3.7/site-packages/recbole/data/dataset/dataset.py", line 246, in _load_data
    self._download()
  File "/data/guzeng/anaconda3/envs/py37/lib/python3.7/site-packages/recbole/data/dataset/dataset.py", line 218, in _download
    url = self._get_download_url('url')
  File "/data/guzeng/anaconda3/envs/py37/lib/python3.7/site-packages/recbole/data/dataset/dataset.py", line 213, in _get_download_url
    f'Neither [{self.dataset_path}] exists in the device'
ValueError: Neither [dataset/DoubanBook] exists in the devicenor [DoubanBook] a known dataset name.

Is there any usage documentation for RecBole-CDR? Thank you.
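
The error text above says that the path dataset/DoubanBook was not found, so a quick check of the on-disk layout can help before rerunning. The sketch below assumes the usual RecBole layout of data_path/<dataset>/<dataset>.inter; that file-name convention is an assumption here rather than something the traceback states, and check_dataset is a hypothetical helper, not a RecBole-CDR function.

```python
import os
import tempfile


def check_dataset(data_path, dataset_name):
    """Report whether the assumed <data_path>/<name>/<name>.inter layout exists."""
    dataset_dir = os.path.join(data_path, dataset_name)
    inter_file = os.path.join(dataset_dir, dataset_name + ".inter")
    if not os.path.isdir(dataset_dir):
        return f"missing directory: {dataset_dir}"
    if not os.path.isfile(inter_file):
        return f"missing file: {inter_file}"
    return "ok"


if __name__ == "__main__":
    # Demonstration in a temporary directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        print(check_dataset(root, "DoubanBook"))   # directory not there yet
        os.makedirs(os.path.join(root, "DoubanBook"))
        print(check_dataset(root, "DoubanBook"))   # .inter file not there yet
        open(os.path.join(root, "DoubanBook", "DoubanBook.inter"), "w").close()
        print(check_dataset(root, "DoubanBook"))   # layout complete
```

Because data_path in the config is typically relative, the check should be run from the same working directory as the training script.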

[💡SUG] Upgrade the code to be compatible with the latest version (v1.2) of RecBole

Is your feature request related to a problem? Please describe.
I was unable to install RecBole version 1.0.1 via either conda or pip because of dependency (scipy/colorlog) version conflicts (macOS 14.4.1 (23E224), M2 Pro).

Describe the solution you'd like
Upgrade to the latest version of RecBole.

Describe alternatives you've considered
I'm trying to reproduce the results of a recently accepted paper that refers to Cross-Domain RecBole, a very powerful and focused framework.
I'm trying to do the upgrade myself, but I need help from the original authors or maintainers.

Additional context
My plan:

  • Review what has changed in the main classes/files that the code depends on and sync with those updates. Could you help review my changes?
  • Discuss a refactor to accommodate the new release. Could we schedule a time to talk?

[🐛BUG] overlap user id

Hi, I am confused about how user IDs are preprocessed.
Consider cross-domain recommendation where users partially overlap:
Source user IDs: [1,2,3,4], target user IDs: [1,2,3,5], and the ID list is shared.

How should I run conet.py in this case? I am very confused about the following lines:

            self.source_user_embedding.weight[self.overlapped_num_users: self.target_num_users].fill_(0)
            self.source_item_embedding.weight[self.overlapped_num_items: self.target_num_items].fill_(0)

            self.target_user_embedding.weight[self.target_num_users:].fill_(0)
            self.target_item_embedding.weight[self.target_num_items:].fill_(0)

[🐛BUG] No module named 'recbole_cdr.model.cross_domain_recommender'

Describe the bug
I got this error

To Reproduce
Steps to reproduce the behavior:
Create new env
execute:

!pip install recbole==1.0.1
!pip install recbole-cdr
from recbole_cdr.quick_start import run_recbole_cdr

parameter_dict={
       # dataset info
       'source_domain': {
              'dataset': 'ml-1m',
              'data_path': 'dataset/'},
       'target_domain': {
              'dataset': 'ml-100k',
              'data_path': 'dataset/target/',
              'user_inter_num_interval': '[5,inf)'},
       # other settings
       'train_epochs': ['SOURCE:300','TARGET:300','OVERLAP:300']
       }

run_recbole_cdr(model='EMCDR', config_dict=parameter_dict)

  • OS: Windows
  • RecBole Version [e.g. 0.1.0]
  • Python Version 3.8.9

Embedding Mismatch

Hi, I am running the CoNet algorithm on two datasets. On some datasets, the algorithm is outputting results, and is working fine. But, on some other cases, I am getting this error:

File "/home/akrish/test-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CoNet:
size mismatch for source_user_embedding.weight: copying a param with shape torch.Size([7572, 64]) from checkpoint, the shape in current model is torch.Size([8649, 64]).
size mismatch for target_user_embedding.weight: copying a param with shape torch.Size([7572, 64]) from checkpoint, the shape in current model is torch.Size([8649, 64]).
size mismatch for source_item_embedding.weight: copying a param with shape torch.Size([6843, 64]) from checkpoint, the shape in current model is torch.Size([4222, 64]).
size mismatch for target_item_embedding.weight: copying a param with shape torch.Size([6843, 64]) from checkpoint, the shape in current model is torch.Size([4222, 64]).

I am also getting this same error when running the DTCDR, CMF, and CLMF algorithms. I am also using a GPU, so I don't know if that may cause an issue.

My Yaml file looks like this, where the dataset points to the .inter files for each domain....

source_domain:

seed: 44
gpu_id: "0"
dataset: '/home/akrish/fall_2022/dataframes/action_data/action'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating

load_col:
    inter: [user_id, item_id, rating]

user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"

target_domain:

seed: 44
gpu_id: "0"
dataset: '/home/akrish/fall_2022/dataframes/adventure_data/adventure'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating

load_col:
    inter: [user_id, item_id, rating]

user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"

I would appreciate any assistance on this issue.

[💡SUG] Support for dual-target or multi-target cross-domain recommendation

Is your feature request related to a problem? Please describe.
Current RecBole-CDR only supports single-target CDR models. Will you add support for multi-target CDR problem mentioned in this survey in a future release? Multi-target CDR is to improve the recommendation accuracy in all domains simultaneously rather than just one target domain.

Describe the solution you'd like
New dataset, dataloader and trainer supporting multi-target CDR.

Describe alternatives you've considered
I find this part in dataset.py. But I still don't know how to set source_split_flag and change single-target CDR to multi-target CDR

if not source_split_flag:
    source_domain_train_dataset = self.source_domain_dataset
    source_domain_train_dataset._change_feat_format()
    return [source_domain_train_dataset, None, target_domain_train_dataset,
            target_domain_valid_dataset, target_domain_test_dataset]
else:
    source_domain_train_dataset, source_domain_valid_dataset = self.source_domain_dataset.split_train_valid()
    return [source_domain_train_dataset, source_domain_valid_dataset, target_domain_train_dataset,
            target_domain_valid_dataset, target_domain_test_dataset]

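[🐛BUG] CrossDomainSingleDataset

Describe the bug
field2id_token stores, for a given feature, the remap_id of each entry together with its original token. In dataset.py, _remap_fields() of CrossDomainSingleDataset does this with the following line:
self.field2id_token[field_name] = list(map_dict.keys())
However, map_dict is a ChainMap, and when the tokens are taken from it via .keys() they do not come out in the order in which the dicts are stored in the chain. For example, for user_id the order should be overlapped users first and then domain-specific users, whereas the code above yields exactly the opposite order. The way the tokens are extracted therefore needs to change.

Expected behavior
Change how the keys are extracted: define get_keys_from_chainmap_by_order() to take the ChainMap's keys (that is, the original tokens) in forward order: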

def get_keys_from_chainmap_by_order(map_dict):
    merged_dict = dict()
    for dict_item in map_dict.maps:
        merged_dict.update(dict_item)
    return list(merged_dict.keys())

self.field2id_token[field_name] = get_keys_from_chainmap_by_order(map_dict)
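
The ordering behaviour described in this report can be reproduced with the standard library alone. The sketch below uses two small made-up dicts, overlapped and domain_specific, standing in for the maps in the chain; it assumes Python 3.7 or later, where dicts preserve insertion order and ChainMap iterates its maps from last to first.

```python
from collections import ChainMap


def get_keys_from_chainmap_by_order(map_dict):
    """Collect keys map by map, first map first (the fix proposed above)."""
    merged_dict = dict()
    for dict_item in map_dict.maps:
        merged_dict.update(dict_item)
    return list(merged_dict.keys())


if __name__ == "__main__":
    # Illustrative tokens only; real maps come from the dataset remapping.
    overlapped = {"u1": 1, "u2": 2}
    domain_specific = {"u3": 3, "u4": 4}
    chain = ChainMap(overlapped, domain_specific)

    # Plain .keys() starts with the LAST map in the chain.
    print(list(chain.keys()))
    # The helper starts with the FIRST map in the chain.
    print(get_keys_from_chainmap_by_order(chain))
```

Lookups through the ChainMap are unaffected by this; only the iteration order of keys differs, which matters when list positions are used as remapped IDs.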

After setting up the environment as required, I get the following error. What is the cause?

Traceback (most recent call last):
  File "/home/baoyanghao/.pycharm_helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/baoyanghao/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/baoyanghao/ss/mycode/RecBole-CDR/run_recbole_cdr.py", line 18, in <module>
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
  File "/home/baoyanghao/ss/mycode/RecBole-CDR/recbole_cdr/quick_start/quick_start.py", line 43, in run_recbole_cdr
    train_data, valid_data, test_data = data_preparation(config, dataset)
  File "/home/baoyanghao/ss/mycode/RecBole-CDR/recbole_cdr/data/utils.py", line 87, in data_preparation
    dataloaders = load_split_dataloaders(config)
  File "/home/baoyanghao/anaconda3/envs/pytorch17/lib/python3.7/site-packages/recbole/data/utils.py", line 78, in load_split_dataloaders
    with open(saved_dataloaders_file, 'rb') as f:
TypeError: expected str, bytes or os.PathLike object, not CDRConfig

Process finished with exit code 1
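
One possible cause, not confirmed in this thread, is that the installed RecBole release differs from the recbole==1.0.1 pin in the README: the traceback shows the installed load_split_dataloaders passing its argument straight to open(), while RecBole-CDR hands it a config object. A quick way to rule this out is to compare the installed version with the pin. The sketch below is only an assumption about how one might check; the helper name check_pinned_version is hypothetical, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def check_pinned_version(package, required):
    """Return None if `package` is installed at exactly `required`, else a message."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed; expected {required}"
    if installed != required:
        return f"{package} {installed} is installed; expected {required}"
    return None


# The pin comes from the Requirements section of the README.
message = check_pinned_version("recbole", "1.0.1")
print(message or "recbole matches the pinned version")
```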

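[💡SUG] How can other atomic files be added?

Is your feature request related to a problem?
How can .kg, .link or .net files be added?
I wrote the yaml for a RecBole-CDR model by following the models implemented in RecBole, but ent_id cannot be read.

Describe the solution you'd like
Could you provide an example showing how to load the atomic files of the source domain and the target domain separately (e.g. .kg, .link, .ent_feature), and how to use preload_weight?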

[🐛BUG] Item feature assignment during remap

Describe the bug
[image]

Expected behavior

self.user_feat[field_name] = self.item_feat[field_name].map(lambda x: map_dict.get(x, x))
should be changed to:
self.item_feat[field_name] = self.item_feat[field_name].map(lambda x: map_dict.get(x, x))

Reported data statistics do not match

Hi, I downloaded the Amazon dataset from here: https://recbole.s3-accelerate.amazonaws.com/CrossDomain/Amazon.zip
The dataset statistics that you report here do not match what I compute from the original data.
I removed all rows with NaNs and computed the number of unique values in the user_id column of the original .inter files. This gives the following statistics:

Number of users in AmazonBooks: 687827
Number of users in AmazonMov: 66317
Number of overlapping users: 27516

Am I doing something wrong?
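
The counting procedure described above can be written down as a short script, which makes it easier to compare against however the reported statistics were produced. This is a minimal sketch using only the standard library; it assumes tab-separated .inter files whose header names the user column user_id:token, and the two in-memory files are made-up stand-ins for the real data.

```python
import csv
import io


def unique_users(inter_file, user_column="user_id:token"):
    """Collect distinct user ids, skipping rows that have a missing value."""
    users = set()
    for row in csv.DictReader(inter_file, delimiter="\t"):
        if any(value is None or value == "" for value in row.values()):
            continue  # same effect as dropping rows with NaNs
        users.add(row[user_column])
    return users


# Made-up stand-ins for the two .inter files.
books = io.StringIO(
    "user_id:token\titem_id:token\trating:float\n"
    "u1\tb1\t5.0\n"
    "u2\tb2\t4.0\n"
    "u2\tb3\t\n"
)
movies = io.StringIO(
    "user_id:token\titem_id:token\trating:float\n"
    "u2\tm1\t3.0\n"
    "u3\tm2\t4.5\n"
)

book_users = unique_users(books)
movie_users = unique_users(movies)
overlap = book_users & movie_users

print(len(book_users), len(movie_users), len(overlap))
```

Counts taken this way are computed before any interaction-count filtering, so they need not match statistics computed after such preprocessing.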

[🐛BUG] Config._set_eval_neg_sample_args() missing 1 required positional argument: 'phase'

Has anyone run into this bug? It appears when I run python run_recbole_cdr.py directly, without changing anything else. After debugging, I can see that test.yaml does contain the mode: full parameter, so why does it still fail?
Traceback (most recent call last):
  File "E:\pycharmcode\RecBole-CDR-main\run_recbole_cdr.py", line 22, in <module>
    run_recbole_cdr(model=args.model, config_file_list=config_file_list)
  File "E:\pycharmcode\RecBole-CDR-main\recbole_cdr\quick_start\quick_start.py", line 31, in run_recbole_cdr
    config = CDRConfig(model=model, config_file_list=config_file_list, config_dict=config_dict)
  File "E:\pycharmcode\RecBole-CDR-main\recbole_cdr\config\configurator.py", line 76, in __init__
    self._set_eval_neg_sample_args()
TypeError: Config._set_eval_neg_sample_args() missing 1 required positional argument: 'phase'

Environment (please complete the following information):

  • OS: [e.g. Linux, macOS or Windows]
  • RecBole version [e.g. 1.2.0]
  • Python version [e.g. 3.10]
  • PyTorch version [e.g. 2.0.0]

Splitting target domain dataset to be consistent, while using different source domains

I want the target domain dataset to be split in order without shuffling. So when I run the algorithm CoNet, for example using the different source domains but the same target domain, I want the train, valid, and test set for the target domain to be the same through the multiple runs. For example, let's say I have three datasets. I make dataset 1 the target domain, and dataset 2 and dataset 3 as the source domains. When I run CoNet on the domain pair of dataset 2 and dataset 1, I want the train, valid, and test set for dataset 1 to be the same as when I run CoNet on the domain pair of dataset 3 and dataset 1. How can I achieve this?

My current Yaml file is below. Is this the correct way to do this, or do I have to add anything else?

source_domain:

seed: 44
gpu_id: "0"
dataset: '../source/data'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
RATING_FIELD: rating

load_col:
    inter: [user_id, item_id, rating, timestamp]

embedding_size: 64
user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"

target_domain:

seed: 44
gpu_id: "0"
dataset: '../target/data'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
TIME_FIELD: timestamp

eval_args:
    group_by: user
    order: TO
    split: {'RS': [0.7,0.2,0.1]}
    mode: full

load_col:
    inter: [user_id, item_id, rating, timestamp]

embedding_size: 64
user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"
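
Whether the configuration above yields identical target-domain splits across different source domains is not settled in this thread. If identical splits are a hard requirement, one option outside the library is to split the target interactions once, deterministically, and reuse the result for every run. The sketch below is an illustration only, not RecBole-CDR's splitter: the function name split_by_time, the tuple layout and the per-user 0.7/0.2/0.1 time-ordered split are all assumptions.

```python
import random
from collections import defaultdict


def split_by_time(interactions, ratios=(0.7, 0.2, 0.1)):
    """Split (user_id, item_id, rating, timestamp) rows per user in time order.

    No randomness is involved, so the same rows always give the same split,
    regardless of input order.
    """
    per_user = defaultdict(list)
    for row in interactions:
        per_user[row[0]].append(row)

    train, valid, test = [], [], []
    for user in sorted(per_user):
        # Sort by timestamp, then item id, so ties are broken deterministically.
        rows = sorted(per_user[user], key=lambda r: (r[3], r[1]))
        n_train = int(len(rows) * ratios[0])
        n_valid = int(len(rows) * ratios[1])
        train.extend(rows[:n_train])
        valid.extend(rows[n_train:n_train + n_valid])
        test.extend(rows[n_train + n_valid:])  # remainder goes to test
    return train, valid, test


# Made-up target-domain interactions for one user.
rows = [("u1", f"i{t}", 4.0, t) for t in range(1, 11)]
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

first = split_by_time(rows)
second = split_by_time(shuffled)
print(first == second)
```

Because the split depends only on the target-domain rows, it cannot change when the source domain changes.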

Custom Datasets for Cross Domain Recommendation

Hi, I'd like to use 2 custom datasets for the source and target domain. I understand that RecBole-CDR builds off of the existing RecBole library, and I was able to use a custom dataset in RecBole to run general recommendation algorithms.

How can I use custom datasets with RecBole-CDR?
Specifically, how do I specify the source and target domains and choose the model I want to run?

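[💡SUG] Question about the recommendation task

First of all, thank you for open-sourcing such a powerful library. Your code implements the top-N task. Would it be possible to change the top-N task into a score prediction task? That would make the library even better.

Why are there so few training epochs but so many validation epochs? Is this normal?

After getting the model running, I found that training has only 1803 epochs while validation has 466232. Is this normal? It seems counterintuitive to me. Also, the data appears to be split 8:1:1, but once the program is running it looks as if the training data loader and the validation data loader have been swapped. I have barely changed the code. What is the problem?

[💡SUG] Hello, is there a provisional user manual?

Hello developers, I am a user coming over from RecBole, and some of the CDR features are very appealing to me.
I know the project is still at an early stage of development, but is there a provisional or preliminary manual that outlines the general usage?
Many thanks.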

Running CDR algorithms on the datasets provided

I would like to run all 10 CDR algorithms on the Amazon and Douban datasets.

I see that you provided the hyperparameters for each model, and I'd like to use them to get results.

Is it as simple as running:

python run_recbole_cdr.py --model=[model] --dataset=Amazon

If not, how can I specify the dataset and run the models with the same hyperparameter configurations that you mentioned?

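[🐛BUG] Robustness of the BiTGCF model

Describe the bug
The BiTGCF model contains a Feature Transfer part. When the user(item)-related weight factor is computed in Eqs. (15)-(16), some nodes may end up with zero neighbors after multi-hop propagation, so the denominator in Eq. (15) may be 0. This causes "nan" to appear during model training. The error is as follows:
[image]
[image]

Expected behavior
To avoid the division by zero, add a very small constant to user_laplace and item_laplace (the denominators) in the project code.
[image]

Screenshots
[image]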
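
The proposed fix amounts to the usual epsilon-stabilized division. The following is a minimal plain-Python sketch of the idea, not the BiTGCF code itself; the function name stabilized_ratio and the value of eps are assumptions, and in the actual model the same pattern would be applied to the user_laplace and item_laplace tensors.

```python
def stabilized_ratio(numerator, denominator, eps=1e-8):
    """Divide while keeping the denominator strictly positive.

    Assumes a non-negative denominator, such as a neighbor count.
    """
    return numerator / (denominator + eps)


# A node with no neighbors: plain division would fail (or give nan on tensors).
isolated = stabilized_ratio(0.0, 0.0)

# A node with neighbors: the result is practically unchanged.
regular = stabilized_ratio(3.0, 4.0)

print(isolated, regular)
```

The constant should be small relative to the smallest non-zero denominator so that the weights of ordinary nodes are not visibly distorted.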

Converting DataLoader object into pandas dataframe

Hi, in the quick_start file, I can see this line:

train_data, valid_data, test_data = data_preparation(config, dataset)

test_data is stored as a <recbole.data.dataloader.general_dataloader.FullSortEvalDataLoader> object. I am assuming that test_data has the user_id, item_id, and rating for the target domain. How can I read this object as a pandas DataFrame to perform my own evaluation?
