
buffalo's Introduction


Buffalo

Buffalo is a fast, scalable, production-ready open-source project for recommender systems. Buffalo uses system resources effectively, enabling high performance even on low-spec machines. The implementation is optimized for CPU and SSD, and it also performs well with GPU accelerators. Buffalo, developed by Kakao, has been used reliably in production across various Kakao services.

For more information, see the documentation.

Requirements

  • Python 3.8+
  • cmake 3.17+
  • gcc/g++ (with std=c++14)
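Given these requirements, installation typically goes through pip or a source build. The commands below are a sketch assembled from the installation steps quoted in the issues further down; they assume a PyPI release is available for your platform:

```shell
# From PyPI (reported to work on Ubuntu 18.04 in the issues below):
pip install buffalo

# Or from source:
git clone -b master https://github.com/kakao/buffalo
cd buffalo
git submodule update --init
pip install -r requirements.txt
python setup.py install
```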

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2020 Kakao Corp. http://www.kakaocorp.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

buffalo's People

Contributors

cclauss, dbgsprw, dependabot[bot], dkkim1005, gony-noreply, h4wldev, hanmanhui, heekyungyoon, ita9naiwa, js1010, kakao-tony-yoo, koorukuroo, lsh918, oss-kakao, pakhy2380, skyer9, snyk-bot, ummae, yupyub, ziminpark


buffalo's Issues

LightFM Warp Comparison

@ita9naiwa
Hi, I've added a WARP implementation and a benchmark. I'm sharing it here, and I opened this issue because one thing bothers me: in the benchmark results, the runtime and accuracy differences versus BPRMF are understandable, but LightFM's accuracy comes out suspiciously low. Does anything look wrong to you?

https://github.com/kakao/buffalo/blob/dev/benchmark/accuracy_warp.md
https://github.com/kakao/buffalo/blob/dev/benchmark/models.py#L337

I wondered whether it might be a problem in the validation code, but when LightFM is run with BPR optimization it does produce reasonable numbers.

`save_factors` parameter no longer functional in v2.0.2

Bug

The save_factors parameter, which was functional in version v2.0.1, seems to have no effect in the recent v2.0.2 release.

To Reproduce

Steps to reproduce the behavior:

  1. Set up the environment with v2.0.2
  2. Set save_factors in als_option to True and run:

    def example1():
        log.set_log_level(log.DEBUG)
        als_option = ALSOption().get_default_option()
        als_option.validation = aux.Option({"topk": 10})
        data_option = MatrixMarketOptions().get_default_option()
        data_option.input.main = "../tests/ext/ml-100k/main"
        data_option.input.iid = "../tests/ext/ml-100k/iid"
        als = ALS(als_option, data_opt=data_option)
        als.initialize()
        als.train()
        print("MovieLens 100k metrics for validations\n%s" % json.dumps(als.get_validation_results(), indent=2))
        print("Similar movies to Star_Wars_(1977)")
        for rank, (movie_name, score) in enumerate(als.most_similar("49.Star_Wars_(1977)")):
            print(f"{rank + 1:02d}. {score:.3f} {movie_name}")
  3. Observe that the ALS model is not saved during training, whereas in v2.0.1 it was working as expected.

Installing the buffalo library on Windows

Hello, I'm a graduate student participating in the Kakao Arena Melon Playlist Continuation competition.
A problem occurs because of the file aux.py in the misc folder of the buffalo library.
It seems to be caused by aux being a reserved name on Windows.

Because of this, the library cannot be installed properly with either pip install or git clone.
For compatibility with Windows, it would be good to rename aux.py.

Thank you.
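For background, Windows reserves the legacy DOS device names (CON, PRN, AUX, NUL, COM1–COM9, LPT1–LPT9) regardless of file extension, which is why a file named aux.py cannot even be checked out. A minimal illustration (the helper function is my own, not part of buffalo):

```python
# Why `aux.py` fails on Windows: legacy DOS device names are reserved
# no matter what extension follows them.
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}

def is_reserved_on_windows(filename: str) -> bool:
    # The name part before the first dot is compared case-insensitively.
    stem = filename.split(".")[0]
    return stem.upper() in RESERVED

print(is_reserved_on_windows("aux.py"))    # True -- git checkout fails on Windows
print(is_reserved_on_windows("misc.py"))   # False
```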

docker on Mac

Hello, I'm trying to use Buffalo on a Mac, but it isn't working.
I looked at the related issue.
I ran the following docker command in the example folder and got an error:
docker build -t buffalo.dev .
Did I do something wrong?

cmake --build .
Scanning dependencies of target cbuffalo
[ 11%] Building CXX object CMakeFiles/cbuffalo.dir/3rd/json11/json11.cpp.o
[ 22%] Building CXX object CMakeFiles/cbuffalo.dir/lib/algo.cc.o
CMakeFiles/cbuffalo.dir/build.make:86: recipe for target 'CMakeFiles/cbuffalo.dir/lib/algo.cc.o' failed
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
make[2]: *** [CMakeFiles/cbuffalo.dir/lib/algo.cc.o] Error 4
make[1]: *** [CMakeFiles/cbuffalo.dir/all] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/cbuffalo.dir/all' failed
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
error: command 'cmake' failed with exit status 2
The command '/bin/sh -c /bin/bash -c "source ./venv/bin/activate && git clone -b master https://github.com/kakao/buffalo.git buffalo.git &&    cd buffalo.git && git submodule update --init && python setup.py install && pip install -r requirements.txt"' returned a non-zero code: 1
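The line `c++: internal compiler error: Killed (program cc1plus)` usually means the kernel's OOM killer terminated the compiler, and Docker Desktop on Mac defaults to a fairly small memory limit for its VM. A sketch of the usual workarounds, not verified against this particular Dockerfile:

```shell
# Raise the memory limit in Docker Desktop > Preferences > Resources, and/or
# reduce build parallelism so cc1plus needs less peak memory:
cmake --build . -- -j1
```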

git clone fail on MS Windows

git clone https://github.com/kakao/buffalo.git c:\gitrepo\buffalo
Cloning into 'c:\gitrepo\buffalo'...
error: unable to create file buffalo/misc/aux.py: No such file or directory
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Installation in Mac environment is very difficult

Is this code Linux-only?
It is very hard to install in a Mac OSX environment. I tried many things and tweaked the code a little to complete the installation, but I couldn't succeed.
On my Mac, the default gcc does not work properly because this repository requires a newer gcc version than the default, so I used gcc-9 installed via brew and ran the build with the prefix CC="gcc-9". Even then, the install fails at the step of building the 'cbuffalo' extension.

Alternatively, could someone share a Dockerfile, please?

build failure

        [ 90%] Building CXX object CMakeFiles/cbuffalo.dir/lib/misc/log.cc.o
        [100%] Linking CXX static library libcbuffalo.a
        [100%] Built target cbuffalo
        INFO:root:building 'cbuffalo' extension
        INFO:root:gcc -pthread -shared -o /project/build/lib.linux-x86_64-cpython-39/cbuffalo.cpython-39-x86_64-linux-gnu.so
        gcc: fatal error: no input files
        compilation terminated.

It showed no problem locally, but fails in the GitHub Actions build.

Training fails

I downloaded the KakaoBrunch12M dataset and ran it by copying example_als.py,

but training seems to fail (NaN error).

Code

import json

from buffalo.algo.als import ALS
from buffalo.algo.options import ALSOption
from buffalo.data.mm import MatrixMarketOptions
from buffalo.misc import aux, log

def example1():
    log.set_log_level(log.DEBUG)
    als_option = ALSOption().get_default_option()
    als_option.validation = aux.Option({'topk': 10})
    data_option = MatrixMarketOptions().get_default_option()
    data_option.input.main = '../tests/data/ext/main'
    #data_option.input.iid = '../tests/data/iid'

    als = ALS(als_option, data_opt=data_option)
    als.initialize()
    als.train()
    print('metrics for validations\n%s' % json.dumps(als.get_validation_results(), indent=2))

    print('Run hyper parameter optimization for val_ndcg...')
    als.opt.num_workers = 4
    als.opt.evaluation_period = 10
    als.opt.optimize = aux.Option({
        'loss': 'val_ndcg',
        'max_trials': 100,
        'deployment': True,
        'start_with_default_parameters': True,
        'space': {
            'd': ['randint', ['d', 10, 128]],
            'reg_u': ['uniform', ['reg_u', 0.1, 1.0]],
            'reg_i': ['uniform', ['reg_i', 0.1, 1.0]],
            'alpha': ['randint', ['alpha', 1, 10]],
        }
    })
    log.set_log_level(log.INFO)

    als.opt.model_path = './example1.ml100k.als.optimize.bin'
    print(json.dumps({'alpha': als.opt.alpha, 'd': als.opt.d,
                      'reg_u': als.opt.reg_u, 'reg_i': als.opt.reg_i}, indent=2))
    als.optimize()
    als.load('./example1.ml100k.als.optimize.bin')

Logs

[ERROR   ] 2019-08-31 07:54:58 [als.py:22] ImportError CuALS, no cuda library exists. error message: No module named 'buffalo.algo.cuda'
[INFO    ] 2019-08-31 07:54:58 [mm.py:193] Create the database from matrix market file.
[DEBUG   ] 2019-08-31 07:54:58 [mm.py:198] Building meta part...
[DEBUG   ] 2019-08-31 07:54:58 progress:   0%| 00:00<?
[INFO    ] 2019-08-31 07:54:58 [mm.py:206] Creating working data...
[INFO    ] 2019-08-31 07:54:58 progress:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:08 [mm.py:210] Working data is created on /tmp/tmpi_15tjok
[INFO    ] 2019-08-31 07:55:08 [mm.py:211] Building data part...
[INFO    ] 2019-08-31 07:55:08 [base.py:346] Building compressed triplets for rowwise...
[INFO    ] 2019-08-31 07:55:08 [base.py:347] Preprocessing...
[INFO    ] 2019-08-31 07:55:08 [base.py:350] In-memory Compressing ...
[INFO    ] 2019-08-31 07:55:11 [base.py:249] Load triplet bin: b'/tmp//chunk.bin'
[INFO    ] 2019-08-31 07:55:12 [base.py:380] Finished
[INFO    ] 2019-08-31 07:55:12 [base.py:346] Building compressed triplets for colwise...
[INFO    ] 2019-08-31 07:55:12 [base.py:347] Preprocessing...
[INFO    ] 2019-08-31 07:55:12 [base.py:350] In-memory Compressing ...
[INFO    ] 2019-08-31 07:55:15 [base.py:249] Load triplet bin: b'/tmp//chunk.bin'
[INFO    ] 2019-08-31 07:55:15 [base.py:380] Finished
[INFO    ] 2019-08-31 07:55:16 [mm.py:225] DB built on ./mm.h5py
[INFO    ] 2019-08-31 07:55:16 [als.py:56] ALS({
  "evaluation_on_learning": true,
  "compute_loss_on_training": true,
  "early_stopping_rounds": 0,
  "save_best": false,
  "evaluation_period": 1,
  "save_period": 10,
  "random_seed": 0,
  "validation": {
    "topk": 10
  },
  "adaptive_reg": false,
  "save_factors": false,
  "accelerator": false,
  "d": 20,
  "num_iters": 10,
  "num_workers": 1,
  "hyper_threads": 256,
  "num_cg_max_iters": 3,
  "reg_u": 0.1,
  "reg_i": 0.1,
  "alpha": 8,
  "optimizer": "manual_cg",
  "cg_tolerance": 1e-10,
  "eps": 1e-10,
  "model_path": "",
  "data_opt": {}
})
[INFO    ] 2019-08-31 07:55:16 [als.py:58] MatrixMarket Header(306291, 505926, 12600038) Validation(500 samples)
[debug   ] 2019-08-31 07:55:16 [als.cc:72] P(306291 x 20) Q(505926 x 20) setted
[INFO    ] 2019-08-31 07:55:16 [buffered_data.py:71] Set data buffer size as 67108864(minimum required batch size is 35447).
[DEBUG   ] 2019-08-31 07:55:17 [base.py:342] Cannot find tensorboard configuration.
[DEBUG   ] 2019-08-31 07:55:17 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:18 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.065s update: 1.25s)
[DEBUG   ] 2019-08-31 07:55:18 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:20 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.063s update: 1.76s)
[INFO    ] 2019-08-31 07:55:23 [als.py:173] Validation: ndcg:0.00678 map:0.00393 accuracy:0.01616 rmse:2.49048 error:1.70700 Elapsed 3.212 secs
[INFO    ] 2019-08-31 07:55:23 [als.py:176] Iteration 1: RMSE 0.034 Elapsed 3.134 secs
[DEBUG   ] 2019-08-31 07:55:23 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:24 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s)
[DEBUG   ] 2019-08-31 07:55:24 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:26 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.84s)
[INFO    ] 2019-08-31 07:55:29 [als.py:173] Validation: ndcg:0.01974 map:0.01420 accuracy:0.03737 rmse:2.40212 error:1.58591 Elapsed 3.219 secs
[INFO    ] 2019-08-31 07:55:29 [als.py:176] Iteration 2: RMSE 0.030 Elapsed 3.122 secs
[DEBUG   ] 2019-08-31 07:55:29 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:31 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s)
[DEBUG   ] 2019-08-31 07:55:31 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:32 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s)
[INFO    ] 2019-08-31 07:55:36 [als.py:173] Validation: ndcg:0.01994 map:0.01383 accuracy:0.03939 rmse:2.39055 error:1.56148 Elapsed 3.256 secs
[INFO    ] 2019-08-31 07:55:36 [als.py:176] Iteration 3: RMSE 0.029 Elapsed 3.069 secs
[DEBUG   ] 2019-08-31 07:55:36 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:37 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s)
[DEBUG   ] 2019-08-31 07:55:37 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:39 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s)
[INFO    ] 2019-08-31 07:55:42 [als.py:173] Validation: ndcg:0.02089 map:0.01373 accuracy:0.04343 rmse:2.38696 error:1.55429 Elapsed 3.247 secs
[INFO    ] 2019-08-31 07:55:42 [als.py:176] Iteration 4: RMSE 0.029 Elapsed 3.080 secs
[DEBUG   ] 2019-08-31 07:55:42 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:43 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s)
[DEBUG   ] 2019-08-31 07:55:43 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:45 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s)
[INFO    ] 2019-08-31 07:55:48 [als.py:173] Validation: ndcg:0.02544 map:0.01794 accuracy:0.04949 rmse:2.38435 error:1.55086 Elapsed 3.222 secs
[INFO    ] 2019-08-31 07:55:48 [als.py:176] Iteration 5: RMSE 0.029 Elapsed 3.066 secs
[DEBUG   ] 2019-08-31 07:55:48 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:50 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.27s)
[DEBUG   ] 2019-08-31 07:55:50 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:51 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.84s)
[INFO    ] 2019-08-31 07:55:55 [als.py:173] Validation: ndcg:0.02673 map:0.01849 accuracy:0.05354 rmse:2.38284 error:1.54936 Elapsed 3.280 secs
[INFO    ] 2019-08-31 07:55:55 [als.py:176] Iteration 6: RMSE 0.029 Elapsed 3.113 secs
[DEBUG   ] 2019-08-31 07:55:55 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:56 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s)
[DEBUG   ] 2019-08-31 07:55:56 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:55:58 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.81s)
[INFO    ] 2019-08-31 07:56:00 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.877 secs
[INFO    ] 2019-08-31 07:56:00 [als.py:176] Iteration 7: RMSE nan Elapsed 3.095 secs
[DEBUG   ] 2019-08-31 07:56:00 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:01 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.26s)
[DEBUG   ] 2019-08-31 07:56:01 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:03 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.8s)
[INFO    ] 2019-08-31 07:56:05 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.882 secs
[INFO    ] 2019-08-31 07:56:05 [als.py:176] Iteration 8: RMSE nan Elapsed 3.060 secs
[DEBUG   ] 2019-08-31 07:56:05 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:06 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.3s)
[DEBUG   ] 2019-08-31 07:56:06 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:08 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.77s)
[INFO    ] 2019-08-31 07:56:10 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.863 secs
[INFO    ] 2019-08-31 07:56:10 [als.py:176] Iteration 9: RMSE nan Elapsed 3.076 secs
[DEBUG   ] 2019-08-31 07:56:10 rowwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:11 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.26s)
[DEBUG   ] 2019-08-31 07:56:11 colwise:   0%| 00:00<?
[DEBUG   ] 2019-08-31 07:56:13 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.78s)
[INFO    ] 2019-08-31 07:56:14 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.867 secs
[INFO    ] 2019-08-31 07:56:14 [als.py:176] Iteration 10: RMSE nan Elapsed 3.050 secs
[INFO    ] 2019-08-31 07:56:14 [als.py:182] elapsed for full epochs: 57.79 sec
metrics for validations
{
  "ndcg": 0.0,
  "map": 0.0,
  "accuracy": 0.0,
  "rmse": NaN,
  "error": NaN
}
Run hyper parameter optimization for val_ndcg...
{
  "alpha": 8,
  "d": 20,
  "reg_u": 0.1,
  "reg_i": 0.1
}
[INFO    ] 2019-08-31 07:56:16 optimizing... :   0%| 00:00<?
[INFO    ] 2019-08-31 07:56:29 [optimize.py:44] Starting with default parameter result: {'train_loss': nan, 'val_ndcg': 0.0, 'val_map': 0.0, 'val_accuracy': 0.0, 'val_rmse': nan, 'val_error': nan, 'eval_time': 1567205789.465719, 'loss': -0.0, 'status': 'ok'}
Segmentation fault (core dumped)

How can I solve this problem?

I'm using version 1.0.4 of Buffalo.
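The log above shows the metrics collapsing to NaN from iteration 7 onward. As a hedged sketch (my own helper, not buffalo's API), one can detect the blow-up early from the validation metrics dict and stop training; divergence in implicit ALS is often mitigated by raising reg_u/reg_i or lowering alpha, though the root cause here may differ:

```python
import math

# `metrics` mimics the dict printed by get_validation_results() in the log above.
metrics = {"ndcg": 0.0, "map": 0.0, "accuracy": 0.0,
           "rmse": float("nan"), "error": float("nan")}

def has_diverged(metrics: dict) -> bool:
    # NaN in any metric signals the factors have blown up.
    return any(isinstance(v, float) and math.isnan(v) for v in metrics.values())

print(has_diverged(metrics))  # True -- stop and retune regularization/alpha
```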

Multi-GPU Support

Do you support training models across multiple GPUs?

I'm asking because my sparse matrix is quite large:

  • shape: (80M, 60M)
  • nnz: 400M
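To see why a single GPU is tight here, a back-of-envelope estimate of the factor matrices alone (assuming float32 factors and d=20, the ALS default shown in the logs above):

```python
# Rough memory footprint of the user/item factor matrices P and Q
# for an 80M x 60M interaction matrix with d=20 float32 factors.
n_users, n_items, d = 80_000_000, 60_000_000, 20
bytes_per_float = 4

factor_bytes = (n_users + n_items) * d * bytes_per_float
print(f"{factor_bytes / 2**30:.1f} GiB")  # 10.4 GiB just for P and Q
```

Larger d, the CG work buffers, and the 400M-nnz data itself push this well beyond a typical single-GPU memory budget.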

Renew examples

The current dev branch involves breaking changes.

Update examples

  • Fix example_als.py
  • Fix jupyter examples
  • Add optimizing example
  • Add model training callback examples

"TypeError: can't convert complex to float" in train

I tried training ALS with the code below.

from buffalo.algo.als import ALS
from buffalo.algo.options import ALSOption
from buffalo.data.mm import MatrixMarketOptions

als_option = ALSOption().get_default_option()
data_option = MatrixMarketOptions().get_default_option()
data_option.input.main = file_path

als = ALS(als_option, data_opt=data_option)
als.initialize()
als.train()

It works fine most of the time, but sometimes it shows the following error.

/home/jungwoo/anaconda3/envs/tf-1.14/lib/python3.7/site-packages/buffalo-1.0.10-py3.7-linux-x86_64.egg/buffalo/algo/als.py", line 189, in train
    self.logger.info('Iteration %d: RMSE %.3f Elapsed %.3f secs' % (i + 1, rmse, train_t))
TypeError: can't convert complex to float

I checked types and values of the variables related to rmse.

loss_nume <class 'float'> -333616678.7680697
_loss_nume1 <class 'float'> 0.4414959544501471
_loss_nume2 <class 'float'> -333616679.20956564
loss_deno <class 'float'> 423116401.48691034
_loss_deno1 <class 'float'> 0.0
_loss_deno2 <class 'float'> 423116401.48691034
self.opt.eps <class 'float'> 1e-10
rmse <class 'complex'> (5.4371936760851956e-17+0.8879611133382754j)

rmse becomes complex
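The mechanism is visible from the values printed above: loss_nume is slightly negative, and in Python 3 a negative float raised to the power 0.5 returns a complex number instead of raising an error. A minimal reproduction with those values, plus one possible guard (clamping at zero is my suggestion for illustration, not Buffalo's actual fix):

```python
# Reproducing the complex rmse: Python's ** yields a complex result for a
# negative base with a fractional exponent rather than raising ValueError.
loss_nume, loss_deno, eps = -333616678.768, 423116401.487, 1e-10

rmse = (loss_nume / (loss_deno + eps)) ** 0.5
print(type(rmse))  # <class 'complex'>

# One possible guard: clamp the (numerically) negative numerator at zero.
safe_rmse = (max(loss_nume, 0.0) / (loss_deno + eps)) ** 0.5
print(safe_rmse)  # 0.0
```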

Error loading the buffalo library on Colab

Hello. I'm a graduate student studying recommender systems through this Kakao Arena.

For ease of server use, I'm trying to use buffalo on Google Colab.

pip install buffalo does not install on Colab, so I followed the installation-from-source instructions on the installation page:
https://buffalo-recsys.readthedocs.io/en/latest/intro.html#installation

The code I actually used is as follows:

!git clone -b master https://github.com/kakao/buffalo
%cd buffalo
!ls
!git submodule update --init
!pip install -r requirements.txt
!python setup.py install

Running these completes the installation, and version 1.1.2 is installed successfully.

After that,

running import buffalo loads the library normally,

but running !pytest ./tests/algo/test_als.py -v produces the following error:

(screenshot of the pytest error)

Importing the libraries as in the basic usage example code also produces the same error:

from buffalo.algo.als import ALS
from buffalo.algo.bpr import BPRMF
from buffalo.misc import aux, log
from buffalo.algo.options import ALSOption, BPRMFOption
import buffalo.data
from buffalo.data.mm import MatrixMarketOptions

log.set_log_level(1)

(screenshot of the import error)

On my lab's Ubuntu 18.04 server, pip install buffalo worked right away and I've been using it without problems, but I can't figure out why misc fails to install on Colab.

Thank you for reading, and I look forward to your reply.
Thank you!

`setup.py build` also runs the install step

I tried to build the source code, but cmake also attempts to install.

python3 setup.py build
......
[ 75%] Building CXX object CMakeFiles/cbuffalo.dir/lib/algo_impl/w2v/w2v.cc.o
[ 87%] Building CXX object CMakeFiles/cbuffalo.dir/lib/misc/log.cc.o
[100%] Linking CXX shared library ../lib.linux-x86_64-3.6/libcbuffalo.so
[100%] Built target cbuffalo
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/libcbuffalo.so.0.1.0
CMake Error at cmake_install.cmake:53 (file):
  file INSTALL cannot copy file
  "/home/skyer9/work/gitrepo/buffalo/build/lib.linux-x86_64-3.6/libcbuffalo.so.0.1.0"
  to "/usr/local/lib/libcbuffalo.so.0.1.0".


Makefile:117: recipe for target 'install' failed
make: *** [install] Error 1
error: command 'cmake' failed with exit status 2

The test code `tests/data/test_mm.py` does not work.

Bug

An OSError is raised when executing the test code tests/data/test_mm.py. All test cases fail with the same error.

$ nosetests ./data/test_mm.py -v

test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok                                                                                                                                                                                                                                
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok                                                                                                                                                                                                                                   
test2_create (data.test_mm.TestMatrixMarket) ... [INFO    ] 2023-12-19 04:03:30 [mm.py:247] Create the database from matrix market file.                                                                                                                                                       
[DEBUG   ] 2023-12-19 04:03:30 [mm.py:252] Building meta part...                                                                                                                                                                                                                               
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s[INFO    ] 2023-12-19 04:03:30 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[ERROR   ] 2023-12-19 04:03:30 [mm.py:162] Cannot create db: Can't write data (no appropriate function for conversion path)                                                                                                                                                                    
[ERROR   ] 2023-12-19 04:03:30 [mm.py:163] Traceback (most recent call last):                                                                                                                                                                                                                  
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create                                                                                                                                                                                            
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")                                                                                                                                                                                                                                
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper                                                                                                                                                                                                                        
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper                                                                                                                                                                                                                        
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__                                                                                                                                                                                       
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)                                                                                                                                                                                                                                 
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper                                                                                                                                                                                                                        
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper                                                                                                                                                                                                                        
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write                                                                                                                                                                                                                                   
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw                                                                                                                                                                                                                                     
OSError: Can't write data (no appropriate function for conversion path)

......(skip the middle lines)

MatrixMarketDataReader: DEBUG: creating temporary matrix-market data from numpy-kind array
MatrixMarket: INFO: Create the database from matrix market file.
MatrixMarket: DEBUG: Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
MatrixMarket: INFO: File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
MatrixMarket: ERROR: Cannot create db: Can't write data (no appropriate function for conversion path)
MatrixMarket: ERROR: Traceback (most recent call last):
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)

[PROGRESS] 100.00% 0.0/0.0secs 1,137.96it/s

--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 10 tests in 0.041s

FAILED (errors=5)

The cause is a mismatch between the HDF5 data type and the numpy object, as annotated in the error log above. The current version only supports "utf-8" encoding when creating idmap, which makes the MatrixMarket object fail to load both the user and item ID lists. To resolve the issue, converting the encoding from "utf-8" to "ascii" seems feasible. I tested with the following local patch (buffalo/data/base.py):

# Method in Data class
def _create_database(self, path, **kwargs):
    ......
    [ASIS]
    idmap.create_dataset("rows", (num_users,), dtype=h5py.string_dtype("utf-8", length=uid_max_col),
                         maxshape=(num_users,))
    idmap.create_dataset("cols", (num_items,), dtype=h5py.string_dtype("utf-8", length=iid_max_col),
                         maxshape=(num_items,))
    ......
    [TOBE]
    idmap.create_dataset("rows", (num_users,), dtype=h5py.string_dtype("ascii", length=uid_max_col),
                         maxshape=(num_users,))
    idmap.create_dataset("cols", (num_items,), dtype=h5py.string_dtype("ascii", length=iid_max_col),
                         maxshape=(num_items,))
    ......
test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok
test2_create (data.test_mm.TestMatrixMarket) ...
[INFO    ] 2023-12-19 04:54:58 [mm.py:247] Create the database from matrix market file.
[DEBUG   ] 2023-12-19 04:54:58 [mm.py:252] Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s[INFO    ] 2023-12-19 04:54:58 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[PROGRESS] 100.00% 0.0/0.0secs 742.35it/s
[INFO    ] 2023-12-19 04:54:58 [mm.py:260] Creating working data...
[PROGRESS] 100.00% 0.0/0.0secs 168,937.24it/s
[DEBUG   ] 2023-12-19 04:54:58 [mm.py:264] Working data is created on /tmp/tmpr5a6iwrk
[INFO    ] 2023-12-19 04:54:58 [mm.py:265] Building data part...
[INFO    ] 2023-12-19 04:54:58 [base.py:417] Building compressed triplets for rowwise...
[INFO    ] 2023-12-19 04:54:58 [base.py:418] Preprocessing...
[INFO    ] 2023-12-19 04:54:58 [base.py:421] In-memory Compressing ...
[INFO    ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO    ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO    ] 2023-12-19 04:54:59 [base.py:417] Building compressed triplets for colwise...
[INFO    ] 2023-12-19 04:54:59 [base.py:418] Preprocessing...
[INFO    ] 2023-12-19 04:54:59 [base.py:421] In-memory Compressing ...
[INFO    ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO    ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO    ] 2023-12-19 04:54:59 [mm.py:279] DB built on ./mm.h5py
ok
......(skip the middle lines)
test3_list (data.test_mm.TestMatrixMarketReader) ... [DEBUG   ] 2023-12-19 04:55:01 [mm.py:70] creating temporary matrix-market data from numpy-kind array
ok

----------------------------------------------------------------------
Ran 10 tests in 3.166s

OK

However, this patch breaks w2v training (PR), where "utf-8" characters are used to train Korean words. To reconcile the conflict, one feasible approach is to expose the appropriate encoding as an option for both loading a matrix-market file and a stream data file.
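The underlying tension can be shown without h5py at all: fixed-length string columns are sized in bytes, while UTF-8 Korean characters occupy three bytes each, so a buffer sized by character count is too small for multibyte IDs. A stdlib-only illustration (the IDs are made up for the example):

```python
# Fixed-length string storage is sized in *bytes*, not characters.
ascii_id = "user_00001"
korean_id = "사용자_00001"  # 3 Korean characters + "_00001"

print(len(ascii_id), len(ascii_id.encode("utf-8")))    # 10 10 -- sizes agree
print(len(korean_id), len(korean_id.encode("utf-8")))  # 9 15 -- bytes exceed chars
```

This is why an "ascii"-only fix works for matrix-market IDs but breaks w2v training on Korean tokens, and why a per-source encoding option is the more general resolution.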

Travis build error

Installed /usr/local/lib/python3.6/dist-packages/pytest-5.3.0-py3.6.egg
Searching for tensorflow==1.14.0
Reading https://pypi.org/simple/tensorflow/
No local packages or working download links found for tensorflow==1.14.0
error: Could not find suitable distribution for Requirement.parse('tensorflow==1.14.0')
The command "sudo python3 setup.py install" failed and exited with 1 during .

Error when running pip install n2

Collecting n2
Using cached https://files.pythonhosted.org/packages/ac/e7/1758cd0973aa2b1a46ad3556bf37c7a625eec7f603f3389a4908d0f70a14/n2-0.1.4.tar.gz
Requirement already satisfied: cython in /Users/seungwoo/PycharmProjects/flaskrestful/venv/lib/python3.7/site-packages (from n2) (0.29.14)
Building wheels for collected packages: n2
Building wheel for n2 (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/seungwoo/PycharmProjects/flaskrestful/venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"'; file='"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-wheel-0ob5uuf7 --python-tag cp37
cwd: /private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/
Complete output (12 lines):
running bdist_wheel
running build
running build_ext
building 'n2' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/bindings
creating build/temp.macosx-10.9-x86_64-3.7/bindings/python
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I./include/ -I./third_party/spdlog/include/ -I/Users/seungwoo/PycharmProjects/flaskrestful/venv/include -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c ./bindings/python/n2.cpp -o build/temp.macosx-10.9-x86_64-3.7/./bindings/python/n2.o -std=c++11 -O3 -fPIC -march=native -fopenmp
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for n2
Running setup.py clean for n2
Failed to build n2
Installing collected packages: n2
Running setup.py install for n2 ... error
ERROR: Command errored out with exit status 1:
command: /Users/seungwoo/PycharmProjects/flaskrestful/venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"'; file='"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-record-ui4nsmpl/install-record.txt --single-version-externally-managed --compile --install-headers /Users/seungwoo/PycharmProjects/flaskrestful/venv/include/site/python3.7/n2
cwd: /private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/
Complete output (12 lines):
running install
running build
running build_ext
building 'n2' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/bindings
creating build/temp.macosx-10.9-x86_64-3.7/bindings/python
creating build/temp.macosx-10.9-x86_64-3.7/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I./include/ -I./third_party/spdlog/include/ -I/Users/seungwoo/PycharmProjects/flaskrestful/venv/include -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c ./bindings/python/n2.cpp -o build/temp.macosx-10.9-x86_64-3.7/./bindings/python/n2.o -std=c++11 -O3 -fPIC -march=native -fopenmp
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/seungwoo/PycharmProjects/flaskrestful/venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"'; file='"'"'/private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-install-mq_r5hg2/n2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/pf/f_mv79jj0xn20ymmnjr923yr0000gn/T/pip-record-ui4nsmpl/install-record.txt --single-version-externally-managed --compile --install-headers /Users/seungwoo/PycharmProjects/flaskrestful/venv/include/site/python3.7/n2 Check the logs for full command output.

I get this error during installation and cannot see any other way to fix it.

Add Algorithm CML

Collaborative Metric Learning (CML) is very similar to WARP and often outperforms it.
It also works quite well for KNN tasks among items and users.

Implementing this model on top of the existing WARP module in buffalo does not look very tricky.
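For context, the core of CML is a WARP-like hinge loss over squared Euclidean distances rather than dot products. The following NumPy sketch shows only that loss; the margin value and array shapes are illustrative assumptions, and a real implementation would add WARP-style negative sampling and norm clipping:

```python
import numpy as np

def cml_loss(user, pos_item, neg_item, margin=1.0):
    """Pairwise CML hinge loss: pull the positive item to within
    `margin` of the user embedding, push the sampled negative away."""
    d_pos = np.sum((user - pos_item) ** 2, axis=-1)  # squared distance to positive
    d_neg = np.sum((user - neg_item) ** 2, axis=-1)  # squared distance to negative
    return np.maximum(0.0, margin + d_pos - d_neg)   # hinge on the distance gap
```

Because everything lives in one metric space, the learned embeddings can be fed directly to a nearest-neighbor index (e.g. n2) for the item-item and user-user KNN tasks mentioned above.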

Build fails in ubuntu 18.04

I tried pip3 install buffalo as well as building buffalo from source. The build always fails at -

Scanning dependencies of target cbuffalo
[ 12%] Building CXX object CMakeFiles/cbuffalo.dir/3rd/json11/json11.cpp.o
[ 25%] Building CXX object CMakeFiles/cbuffalo.dir/lib/algo.cc.o
virtual memory exhausted: Cannot allocate memory
CMakeFiles/cbuffalo.dir/build.make:86: recipe for target 'CMakeFiles/cbuffalo.dir/lib/algo.cc.o' failed
make[2]: *** [CMakeFiles/cbuffalo.dir/lib/algo.cc.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/cbuffalo.dir/all' failed
make[1]: *** [CMakeFiles/cbuffalo.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
error: command 'cmake' failed with exit status 2

I even tried building the Docker image from the Dockerfile and the instructions you mentioned; it also fails at the very same step.
