gnrs's Introduction

GreenRec: Green AI Benchmarking for News Recommendation

Environment

pip install -r requirements.txt

Data Processing

Set the path to the data in the Python file before running it:

cd process/mind
python processor.py

Configuration

Data

Please refer to config_v2/data/mind.yaml for the data configuration.

Model

We support the following models on both MIND small and large datasets:

             NAML        LSTUR        NRMS        DCN        DIN        BST
ID-based     ID-NAML     ID-LSTUR     ID-NRMS     DCN        DIN        BST
text-based   NAML        LSTUR        NRMS        text-DCN   text-DIN   text-BST
PLMNR        PLMNR-NAML  PLMNR-LSTUR  PLMNR-NRMS  PLMNR-DCN  PLMNR-DIN  PLMNR-BST
BERT         BERT-NAML   BERT-LSTUR   BERT-NRMS   BERT-DCN   BERT-DIN   BERT-BST
MFT          MFT-NAML    MFT-LSTUR    MFT-NRMS    MFT-DCN    MFT-DIN    MFT-BST

Training and Testing

python worker.py \
    --config config/data/mind.yaml \
    --model config/model/nrms.yaml \
    --exp config/exp/tt-nrms.yaml \
    --embed config/embed/null.yaml \
    --version small-v2

gnrs's People

Contributors

jyonn

gnrs's Issues

How to load bert embeddings

I tried to load BERT embeddings of the news texts with 'bert-token.yaml', using 'dcn.yaml' as the recommendation model. After preprocessing the data with bert_processor.py, I realized that it only tokenizes the text. When data.npy is loaded in embedding_loader.py, I printed the embedding and found that it contains only token IDs and no BERT embeddings. How can I extract the BERT embeddings and load them into the model?

Printing the embedding variable gives:
`{'nid': array([0, 1, 2, ..., 65235, 65236, 65237], dtype=object), 'cat': array([list([9580]), list([2740]), list([2739]), ..., list([2739]), dtype=object),
'title': array([list([1996, 9639, 3035, 3870, 1010, 3159, 2798, 1010, 1998, 3159, 5170, 8415, 2011]),..., list([3901, 1997, 4916, 2237, 5998, 2007, 3571, 2044, 9288]),dtype=object),
'abs': array([list([4497, 1996, 14960, 2015, 1010, 17764, 1010, 1998, 2062, 2008, 1996, 15426, 2064, 1005, 1056, 2444, 2302, 1012]), list([2122, 9428, 19741, 14243, 2024, 3173, 2017, 2067, 1998, 4363, 2017, 2013, 8328, 4667, 2008, 18162, 7579, 6638, 2005, 2204, 1012]),...,list([])], dtype=object)}

The error:
Traceback (most recent call last):
  File "/Users/chuanqijiao/GNRS-master/worker.py", line 395, in <module>
    worker = Worker(config=configuration)
  File "/Users/chuanqijiao/GNRS-master/worker.py", line 54, in __init__
    self.config_manager = ConfigManager(
  File "/Users/chuanqijiao/GNRS-master/loader/config_manager.py", line 196, in __init__
    self.embedding_manager.load_pretrained_embedding(**Obj.raw(embedding_info))
  File "/Users/chuanqijiao/GNRS-master/loader/embedding/embedding_manager.py", line 66, in load_pretrained_embedding
    self.pretrained[vocab_name] = EmbeddingInfo(**kwargs).load()
  File "/Users/chuanqijiao/GNRS-master/loader/embedding/embedding_loader.py", line 39, in load
    self.embedding = getter(self.path)
  File "/Users/chuanqijiao/GNRS-master/loader/embedding/embedding_loader.py", line 21, in get_numpy_embedding
    return torch.tensor(embedding, dtype=torch.float32)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.`

Besides, the configs look a little confusing to me. If I want to load BERT embeddings and not use image features, can I use the following config chain?
mind.yaml --> dcn/din/bst/pnn.yaml --> tt.yaml --> bert-token.yaml
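
A note on the error in this issue: the printed variable holds variable-length lists of token IDs, which NumPy stores with an object dtype, and torch.tensor cannot convert that dtype. The sketch below is only an illustration in plain Python of making such ragged rows rectangular so that a numeric conversion becomes possible. It does not produce BERT embeddings and is not taken from the repository; the function name pad_token_ids, the max_len value, and the choice of 0 as the padding ID are assumptions.

```python
def pad_token_ids(rows, max_len, pad_id=0):
    """Clip or right-pad every row of token IDs to exactly max_len entries.

    Hypothetical helper: pad_id=0 is an assumption, not a value read
    from the repository's configuration.
    """
    padded = []
    for row in rows:
        clipped = list(row[:max_len])
        padded.append(clipped + [pad_id] * (max_len - len(clipped)))
    return padded


# Ragged rows like the 'title' lists in the printed output (shortened here).
rows = [[1996, 9639, 3035], [3901, 1997], []]
padded = pad_token_ids(rows, max_len=4)

# Every row now has the same length, so the data is rectangular.
row_lengths = {len(row) for row in padded}
```

Rectangular integer rows can be turned into an integer array or tensor, but an embedding loader that expects a float matrix of vectors would still need real embedding vectors rather than token IDs.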

TypeError: model.operator.attention_operator.AttentionOperatorConfig() got multiple values for keyword argument 'hidden_size'

I followed the README exactly, but after running worker.py with the same configs, I got the TypeError shown below. Could you please tell me how to fix it? @Jyonn Thanks.
Traceback (most recent call last):
  File "/Users/chuanqijiao/GNRS-master/worker.py", line 395, in <module>
    worker = Worker(config=configuration)
  File "/Users/chuanqijiao/GNRS-master/worker.py", line 54, in __init__
    self.config_manager = ConfigManager(
  File "/Users/chuanqijiao/GNRS-master/loader/config_manager.py", line 219, in __init__
    self.recommender = self.recommender_class(
  File "/Users/chuanqijiao/GNRS-master/model/recommenders/base_neg_recommender.py", line 26, in __init__
    super().__init__(**kwargs)
  File "/Users/chuanqijiao/GNRS-master/model/recommenders/base_recommender.py", line 76, in __init__
    self.user_config = self.user_encoder_class.config_class(
TypeError: model.operator.attention_operator.AttentionOperatorConfig() got multiple values for keyword argument 'hidden_size'
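
For readers hitting the same message: Python raises this TypeError when a call receives the same keyword both explicitly and through an unpacked dictionary. The sketch below reproduces that general mechanism with a stand-in class. AttentionConfig, build, and build_deduplicated are hypothetical names, not the repository's code, and which value should win in the real config is not something the traceback tells us.

```python
class AttentionConfig:
    """Hypothetical stand-in for a config class; not the repository's class."""

    def __init__(self, hidden_size, num_heads=1):
        self.hidden_size = hidden_size
        self.num_heads = num_heads


def build(hidden_size, options):
    # Fails when options also contains 'hidden_size': the keyword is
    # supplied twice, once explicitly and once through **options.
    return AttentionConfig(hidden_size=hidden_size, **options)


def build_deduplicated(hidden_size, options):
    # Merge first so each keyword appears once. Here the explicit
    # argument overrides the dictionary entry (an arbitrary choice).
    merged = {**options, "hidden_size": hidden_size}
    return AttentionConfig(**merged)


options = {"hidden_size": 128, "num_heads": 4}

try:
    build(64, options)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

config = build_deduplicated(64, options)
```

If the same cause applies here, a likely place to look is a model YAML that sets hidden_size in a section whose contents are unpacked into a constructor that also receives hidden_size explicitly.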

How to get original text data in the training process

I noticed that the data in a batch is represented as Unitok objects, the result of tokenization with Unitok (after processor.py). Is there a way to map these tokenized results back to the original text data? For example, if the nid token 234 maps to N25648 in the original dataset, can the original title then be looked up using N25648?
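
One way such a lookup could work, sketched under assumptions: if the index-to-ID vocabulary for nid can be obtained as a mapping, the original title can be read from the MIND news.tsv file, whose tab-separated columns start with news ID, category, subcategory, and title. How Unitok stores its vocabulary is not shown in this thread, so index_to_nid is a hand-written placeholder, the helper names are hypothetical, and the sample row contains placeholder text rather than real dataset content.

```python
import csv
import io


def load_titles(tsv_file):
    """Map news ID -> title from a MIND-style news.tsv file object."""
    titles = {}
    # QUOTE_NONE keeps quote characters inside titles untouched.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip malformed lines
        titles[row[0]] = row[3]
    return titles


def index_to_title(index, index_to_nid, titles):
    """Resolve a tokenized nid index to the original title."""
    return titles[index_to_nid[index]]


# Placeholder row in news.tsv column order; the text is made up.
SAMPLE_NEWS_TSV = (
    "N25648\tnews\tnewsworld\tPlaceholder title\t"
    "Placeholder abstract\thttps://example.invalid\t[]\t[]\n"
)

# Assumed vocabulary: token index -> original news ID.
index_to_nid = {234: "N25648"}

titles = load_titles(io.StringIO(SAMPLE_NEWS_TSV))
recovered_title = index_to_title(234, index_to_nid, titles)
```

With a real dataset, the io.StringIO object would be replaced by an opened news.tsv file, and index_to_nid by whatever vocabulary the tokenization step saved.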
