
pytorch-fm's Introduction

Factorization Machine models in PyTorch

This package provides a PyTorch implementation of factorization machine models and common datasets in CTR prediction.

Available Datasets

  • MovieLens Dataset
  • Criteo Display Advertising Challenge Dataset
  • Avazu Click-Through Rate Prediction Dataset

Available Models

  • Logistic Regression
  • Factorization Machine (S Rendle, Factorization Machines, 2010)
  • Field-aware Factorization Machine (Y Juan, et al. Field-aware Factorization Machines for CTR Prediction, 2015)
  • Higher-Order Factorization Machines (M Blondel, et al. Higher-Order Factorization Machines, 2016)
  • Factorization-Supported Neural Network (W Zhang, et al. Deep Learning over Multi-field Categorical Data - A Case Study on User Response Prediction, 2016)
  • Wide&Deep (HT Cheng, et al. Wide & Deep Learning for Recommender Systems, 2016)
  • Attentional Factorization Machine (J Xiao, et al. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks, 2017)
  • Neural Factorization Machine (X He and TS Chua, Neural Factorization Machines for Sparse Predictive Analytics, 2017)
  • Neural Collaborative Filtering (X He, et al. Neural Collaborative Filtering, 2017)
  • Field-aware Neural Factorization Machine (L Zhang, et al. Field-aware Neural Factorization Machine for Click-Through Rate Prediction, 2019)
  • Product Neural Network (Y Qu, et al. Product-based Neural Networks for User Response Prediction, 2016)
  • Deep Cross Network (R Wang, et al. Deep & Cross Network for Ad Click Predictions, 2017)
  • DeepFM (H Guo, et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017)
  • xDeepFM (J Lian, et al. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems, 2018)
  • AutoInt (W Song, et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks, 2018)
  • AFN (W Cheng, et al. Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions, AAAI 2020)

Each model reaches an AUC of about 0.80 on the Criteo dataset and about 0.78 on the Avazu dataset (see the example code).

Installation

pip install torchfm
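
A minimal usage sketch (hedged: it assumes the FactorizationMachineModel constructor from torchfm.model.fm and uses synthetic data instead of the bundled datasets):

import torch
from torchfm.model.fm import FactorizationMachineModel

# Three categorical fields with 10, 20, and 30 distinct values each.
field_dims = [10, 20, 30]
model = FactorizationMachineModel(field_dims, embed_dim=16)

# A batch of 4 samples; each column holds a per-field categorical index,
# and indices must be int64 ('Long') because they feed nn.Embedding.
x = torch.randint(0, 10, (4, 3), dtype=torch.long)
y = model(x)  # predicted click probabilities, shape (4,)

The other models follow the same pattern: construct with field_dims (plus model-specific sizes) and call on a Long tensor of shape (batch_size, num_fields).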

API Documentation

https://rixwew.github.io/pytorch-fm

Licence

MIT

pytorch-fm's People

Contributors

drone-banks, rixwew, yfreedomlithu


pytorch-fm's Issues

Cannot run example on my mac

I execute python main.py under the examples/ folder, and get the error:

% python main.py --dataset_path criteo/train.txt
Traceback (most recent call last):
  File "main.py", line 189, in <module>
    main(args.dataset_name,
  File "main.py", line 151, in main
    dataset = get_dataset(dataset_name, dataset_path)
  File "main.py", line 33, in get_dataset
    return CriteoDataset(path)
  File "/Users/erlebach/opt/anaconda3/envs/torch/lib/python3.8/site-packages/torchfm/dataset/criteo.py", line 40, in __init__
    self.__build_cache(dataset_path, cache_path)
  File "/Users/erlebach/opt/anaconda3/envs/torch/lib/python3.8/site-packages/torchfm/dataset/criteo.py", line 56, in __build_cache
    feat_mapper, defaults = self.__get_feat_mapper(path)
  File "/Users/erlebach/opt/anaconda3/envs/torch/lib/python3.8/site-packages/torchfm/dataset/criteo.py", line 70, in __get_feat_mapper
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'criteo/train.txt'

Could you please help me understand how to make this example work? Would it be possible to provide a single example that works out of the box, or command-line arguments that do? Thanks.

ModuleNotFoundError: No module named 'torchfm.model.hofm'

When I run main.py, I get the following error:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    from torchfm.model.hofm import HighOrderFactorizationMachineModel
ModuleNotFoundError: No module named 'torchfm.model.hofm'

FactorizationMachine implementation and paper are different

class FactorizationMachine(torch.nn.Module):

    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix

[screenshot: the pairwise-interaction formula from the FM paper]

What happened to v parameters?
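
For readers with the same question, a sketch of the algebra (Rendle's Lemma 3.1): the v parameters are not missing; the tensor x passed to this module is already the embedded product v_i x_i, so the code evaluates the right-hand side of

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j
    = \frac{1}{2} \left[ \left( \sum_{i=1}^{n} v_i x_i \right)^{2} - \sum_{i=1}^{n} \left( v_i x_i \right)^{2} \right]

where the squares are taken elementwise over the embedding dimension: square_of_sum and sum_of_square in the code are the two bracketed terms, and reduce_sum performs the remaining sum over that dimension.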

But in layer.py

Hi,

In layer.py, class FeaturesEmbedding, one finds the line

self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.long)

This line generates an error for me. I replaced np.array with torch.tensor and np.cumsum with torch.cumsum, and the code works.

I believe the code as it is has a bug, which my change fixes. Could you please confirm this? I am not sure how the code can work as-is. Thanks.

Why doesn't the FM model consider the data value?

In FeaturesLinear

def __init__(self, field_dims, output_dim=1):
    super().__init__()
    self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
    self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
    self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.long)

def forward(self, x):
    """
    :param x: Long tensor of size ``(batch_size, num_fields)``
    """
    x = x + x.new_tensor(self.offsets).unsqueeze(0)
    return torch.sum(self.fc(x), dim=1) + self.bias

The model only assigns an embedding weight to each field index, but it doesn't take the data value into account.
For example, for a sample [[1, 0]], the model computes embedding("1") + embedding("0"), whereas it should compute embedding("1")*1 + embedding("0")*0.

Why does the model drop the effect of the data value?

field_dims

What is field_dims used for? Is there a detailed example anywhere?
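
For anyone else wondering, a hedged illustration (the field names are hypothetical): field_dims[i] is the number of distinct categorical values in field i, and the models build one embedding table with sum(field_dims) rows.

# Hypothetical dataset with three categorical fields:
#   field 0: gender      -> 2 distinct values
#   field 1: device type -> 4 distinct values
#   field 2: city        -> 100 distinct values
field_dims = [2, 4, 100]

# The shared embedding table then has 2 + 4 + 100 = 106 rows, and every
# sample is a row of three per-field indices, e.g. [1, 3, 42].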

Trying to run fnfm on avazu dataset

Traceback (most recent call last):
  File "C:\Users\Faaiz-PC\Downloads\Avazu\main.py", line 189, in <module>
    main(args.dataset_name,
  File "C:\Users\Faaiz-PC\Downloads\Avazu\main.py", line 151, in main
    dataset = get_dataset(dataset_name, dataset_path)
  File "C:\Users\Faaiz-PC\Downloads\Avazu\main.py", line 35, in get_dataset
    return AvazuDataset(path)
  File "C:\Users\Faaiz-PC\anaconda3\envs\fnfm\lib\site-packages\torchfm\dataset\avazu.py", line 39, in __init__
    self.field_dims = np.frombuffer(txn.get(b'field_dims'), dtype=np.uint32)
TypeError: a bytes-like object is required, not 'NoneType'

I am trying to run fnfm on the avazu dataset, and this is the error I get. Can you help me fix it?

about "self.offsets" some questions

Hi,
What is the function of self.offsets?

self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.long)

def forward(self, x):
    x = x + x.new_tensor(self.offsets).unsqueeze(0)
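
A hedged sketch of the answer: all fields share a single embedding table, and self.offsets shifts each field's local index into that table's global index space. The field sizes below are hypothetical.

import numpy as np

field_dims = [2, 4, 100]                              # hypothetical field sizes
offsets = np.array((0, *np.cumsum(field_dims)[:-1]))  # [0, 2, 6]

# Local per-field indices (1, 3, 42) become rows (1, 5, 48)
# of the one concatenated embedding table:
x = np.array([1, 3, 42])
print(x + offsets)  # [ 1  5 48]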

Problems in PNN

Paper said :

The first hidden layer is fully connected with the product layer. The inputs to it consist of linear signals lz and quadratic signals lp. With respect to lz and lp inputs, separately, the formulation of l1 is: l1 = relu(lz + lp + b1).

But in your code, the linear signals and quadratic signals are concatenated:

embed_x = self.embedding(x)
cross_term = self.pn(embed_x)
x = torch.cat([embed_x.view(-1, self.embed_output_dim), cross_term], dim=1)
x = self.mlp(x)
return torch.sigmoid(x.squeeze(1))

rank hyper parameter

In the FM and HOFM papers, there is a hyperparameter k that controls the size of the latent factors. Where is k in the code? Is it the same as field_dims?
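
A hedged note based on the model constructors in this package: field_dims holds the vocabulary size of each field, while the latent size k from the papers corresponds to the embed_dim argument.

from torchfm.model.fm import FactorizationMachineModel

field_dims = [10, 20, 30]  # per-field vocabulary sizes, not k
model = FactorizationMachineModel(field_dims, embed_dim=16)  # k = 16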

some bug?

1. In the forward pass, the input x of every model has shape (batch_size, field_size); can I treat this field size as the feature size?

2. In fnfm.py, you forgot to use self.embed_layer in the forward method. Please check.

PaddingIdx

Is it necessary to set the padding_idx argument of torch.nn.Embedding, given that all features start from 1?
I think it's better to avoid the influence of index 0.

A question about F.dropout

import torch.nn.functional as F
If the code uses F.dropout, it should pass training=self.training: change F.dropout(attn_scores, p=0.1) to F.dropout(attn_scores, p=0.1, training=self.training).
That way dropout is applied under model.train() and disabled under model.eval().
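
To illustrate the suggested change (a minimal sketch, not the library's actual attention module):

import torch
import torch.nn.functional as F

class TinyAttention(torch.nn.Module):
    def forward(self, attn_scores):
        # training=self.training makes dropout active under model.train()
        # and a no-op under model.eval().
        return F.dropout(attn_scores, p=0.1, training=self.training)

m = TinyAttention()
m.eval()
x = torch.ones(2, 3)
assert torch.equal(m(x), x)  # eval mode: dropout disabled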

FeatureEmbedding Implementation

Hi, congrats on your great work!
I was wondering why you add an offset before the embedding lookup?

self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.long)

As I was looking at the implementation of xDeepFM, I didn't find corresponding code in the original TensorFlow implementation.

Real-valued features not supported?

Hi, thanks for the amazing library.
I don't quite understand how the data should be fed. If I create a feature vector composed of real values, I receive an error because the embedding function can only handle Long tensors, i.e., integers.
I would suggest adding a few lines explaining how the data should be prepared, as the documentation lacks this content.

Thanks and best regards
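
Until such documentation exists, a hedged sketch of the usual preparation: discretize each real-valued feature into buckets so it becomes a categorical index (the bundled Criteo loader applies a similar log-based discretization to its numeric columns). The bucketize helper below is hypothetical, not part of torchfm.

import numpy as np

def bucketize(values, num_buckets=32):
    # Map real values to integer bucket ids via empirical quantiles.
    edges = np.quantile(values, np.linspace(0, 1, num_buckets + 1)[1:-1])
    return np.digitize(values, edges)  # ints in [0, num_buckets - 1]

prices = np.array([0.99, 4.5, 12.0, 3.2, 7.7])
ids = bucketize(prices, num_buckets=4)  # now usable as one categorical field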

xDeepFM

Regarding the structure of the CIN: should the second and third dimensions be swapped before the convolution layer?

dimension issue

In the forward pass, the input x of every model has shape (batch_size, field_size); can I treat this field size as the feature size?

tensor for argument #1

RuntimeError Traceback (most recent call last)
in <module>
21 args.weight_decay,
22 args.device,
---> 23 args.save_dir)

in main(dataset_name, dataset_path, model_name, epoch, learning_rate, batch_size, weight_decay, device, save_dir)
23 early_stopper = EarlyStopper(num_trials=2, save_path=f'{save_dir}/{model_name}.pt')
24 for epoch_i in range(epoch):
---> 25 train(model, optimizer, train_data_loader, criterion, device)
26 auc = test(model, valid_data_loader, device)
27 print('epoch:', epoch_i, 'validation: auc:', auc)

in train(model, optimizer, data_loader, criterion, device, log_interval)
5 for i, (fields, target) in enumerate(tk0):
6 fields, target = fields.to(device), target.to(device)
----> 7 y = model(fields)
8 loss = criterion(y, target.float())
9 model.zero_grad()

H:\Anaconda\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~\Desktop\量化炒股\pytorch-fm-master\torchfm\model\fm.py in forward(self, x)
22 :param x: Long tensor of size (batch_size, num_fields)
23 """
---> 24 x = self.linear(x) + self.fm(self.embedding(x))
25 return torch.sigmoid(x.squeeze(1))

H:\Anaconda\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~\Desktop\量化炒股\pytorch-fm-master\torchfm\layer.py in forward(self, x)
17 """
18 x = x + x.new_tensor(self.offsets).unsqueeze(0)
---> 19 return torch.sum(self.fc(x), dim=1) + self.bias
20
21

H:\Anaconda\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

H:\Anaconda\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:

H:\Anaconda\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
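
The message itself points at the fix (a hedged note): nn.Embedding only accepts int64 ('Long') indices, so cast the fields before the forward pass.

import torch

fields = torch.randint(0, 10, (4, 3), dtype=torch.int32)  # e.g. loaded as int32
fields = fields.long()  # int32 -> int64, as nn.Embedding requires
# y = model(fields)     # now succeeds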

RuntimeError: CUDA error: device-side assert triggered

Hi, I googled this error, but nothing I found helped solve it.

Traceback (most recent call last):
  File "/root/pytorchfm/examples/main.py", line 157, in <module>
    args.save_dir)
  File "/root/pytorchfm/examples/main.py", line 127, in main
    train(model, optimizer, train_data_loader, criterion, device)
  File "/root/pytorchfm/examples/main.py", line 81, in train
    y = model(fields)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/pytorchfm/torchfm/model/xdfm.py", line 27, in forward
    x = self.linear(x) + self.cin(embed_x) + self.mlp(embed_x.view(-1, self.embed_output_dim))
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/pytorchfm/torchfm/layer.py", line 232, in forward
    x = F.relu(self.conv_layers[i](x))
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 943, in relu
    result = torch.relu(input)
RuntimeError: CUDA error: device-side assert triggered

Can you take a look at it? Thank you!
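
A hedged debugging note: CUDA device-side asserts are reported asynchronously, so the stack trace above may not show the real failing op. Re-running on CPU, or with CUDA_LAUNCH_BLOCKING=1, usually surfaces it; with embedding models, a common culprit is an out-of-range index.

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA initializes

# For embedding models, also verify the input first: every index must be
# non-negative and smaller than its own field's size in field_dims.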

Why don't you do pre-training in the FNN model?

According to the paper, the FNN model (i.e., FactorizationSupportedNeuralNetworkModel) should use the latent vectors obtained by pre-training an FM model as the input to the MLP, but I found that your FactorizationSupportedNeuralNetworkModel does not implement this. Moreover, the FNN implementation is exactly the same as the Deep part of your Wide&Deep model, which should not be the case.
I hope we can discuss this. Perhaps my understanding falls short, or perhaps the implementation has a problem. Thanks! (Apologies for writing in Chinese; I was afraid an English restatement would be unclear.)
