
recsim's Introduction

RecSim: A Configurable Recommender Systems Simulation Platform

RecSim is a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users. RecSim allows the creation of new environments that reflect particular aspects of user behavior and item structure at a level of abstraction well-suited to pushing the limits of current reinforcement learning (RL) and RS techniques in sequential interactive recommendation problems. Environments can easily be configured to vary assumptions about: user preferences and item familiarity; user latent state and its dynamics; and choice models and other user response behavior. We outline how RecSim offers value to RL and RS researchers and practitioners, and how it can serve as a vehicle for academic-industrial collaboration. For a detailed description of the RecSim architecture, please read Ie et al. Please cite the paper if you use the code from this repository in your work.

Bibtex

@article{ie2019recsim,
    title={RecSim: A Configurable Simulation Platform for Recommender Systems},
    author={Eugene Ie and Chih-wei Hsu and Martin Mladenov and Vihan Jain and Sanmit Narvekar and Jing Wang and Rui Wu and Craig Boutilier},
    year={2019},
    eprint={1909.04847},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Disclaimer

This is not an officially supported Google product.

What's new

  • 12/13/2019: Added (abstract) classes for both multi-user environments and agents. Added bandit algorithms for generalized linear models.

Installation and Sample Usage

It is recommended to install RecSim from PyPI (https://pypi.org/project/recsim/):

pip install recsim

However, the latest version of Dopamine is not on PyPI as of December 2019, so install the latest version from Dopamine's repository as follows before installing RecSim. Note that Dopamine requires TensorFlow 1.15.0, the final 1.x release, which includes GPU support for Ubuntu and Windows.

pip install git+https://github.com/google/dopamine.git
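
To double-check that both packages were picked up, here is a quick sanity check of my own (not part of the official instructions); it simply imports the modules RecSim relies on and prints the TensorFlow version:

import tensorflow as tf
import dopamine.discrete_domains.run_experiment  # fails if Dopamine is missing
import recsim.simulator.runner_lib               # fails if RecSim is missing

print('TensorFlow version:', tf.__version__)  # expected 1.15.0 per the note above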

Here are some sample commands you could use for testing the installation:

git clone https://github.com/google-research/recsim
cd recsim/recsim
python main.py --logtostderr \
  --base_dir="/tmp/recsim/interest_exploration_full_slate_q" \
  --agent_name=full_slate_q \
  --environment_name=interest_exploration \
  --episode_log_file='episode_logs.tfrecord' \
  --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 \
  --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=10 \
  --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 \
  --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5

You can then start TensorBoard and view the output:

tensorboard --logdir=/tmp/recsim/interest_exploration_full_slate_q/ --port=2222

You can also find the simulated episode logs in /tmp/recsim/episode_logs.tfrecord.
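
If you only want to confirm that the episode log was written, here is a minimal sketch of my own (assuming the file is a plain TFRecord readable with TF 1.15's record iterator; parsing the individual records is omitted since their proto format is not described here):

import tensorflow.compat.v1 as tf

# Count the serialized records in the simulated episode log.
log_path = '/tmp/recsim/episode_logs.tfrecord'
num_records = sum(1 for _ in tf.io.tf_record_iterator(log_path))
print('records in episode log:', num_records)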

Tutorials

To get started, please check out our Colab tutorials. In RecSim: Overview, we give a brief overview of RecSim. We then cover each configurable component: the environment and the recommender agent.

Documentation

Please refer to the white paper for the high-level design.

recsim's People

Contributors

cwhsu-google


recsim's Issues

pip error

When I run pip install recsim to install RecSim, the following error occurs:

ERROR: Could not find a version that satisfies the requirement dopamine-rl>=2.0.6 (from recsim) (from versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5)

ERROR: No matching distribution found for dopamine-rl>=2.0.6 (from recsim)

Unable to setup

After downloading the zip and running setup.py in the recsim-master folder, the following error occurred:
//////////////////////////////////////////////////////////////////////////////////////
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

error: no commands supplied
//////////////////////////////////////////////////////////////////////////////////////
Then I tried to install it myself.

///////////////////////////////////
(tf_gpu) C:\Users\TR201\recsim>python -m pip install recsim
Requirement already satisfied: recsim in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (0.1.6)
Requirement already satisfied: gin-config in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (0.2.1)
Requirement already satisfied: gym in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (0.14.0)
Requirement already satisfied: atari-py in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (0.2.6)
Requirement already satisfied: dopamine-rl in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (2.0.5)
Requirement already satisfied: numpy in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (1.16.4)
Requirement already satisfied: tensorflow in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (1.13.1)
Requirement already satisfied: absl-py in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from recsim) (0.7.1)
Requirement already satisfied: six>=1.10.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from gin-config->recsim) (1.12.0)
Requirement already satisfied: scipy in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from gym->recsim) (1.3.0)
Requirement already satisfied: cloudpickle~=1.2.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from gym->recsim) (1.2.2)
Requirement already satisfied: pyglet<=1.3.2,>=1.2.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from gym->recsim) (1.3.2)
Requirement already satisfied: opencv-python>=3.4.1.15 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from dopamine-rl->recsim) (4.1.1.26)
Requirement already satisfied: tensorboard<1.14.0,>=1.13.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.13.1)
Requirement already satisfied: keras-applications>=1.0.6 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.0.8)
Requirement already satisfied: tensorflow-estimator<1.14.0rc0,>=1.13.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.13.0)
Requirement already satisfied: termcolor>=1.1.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.1.0)
Requirement already satisfied: wheel>=0.26 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (0.33.4)
Requirement already satisfied: gast>=0.2.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (0.2.2)
Requirement already satisfied: grpcio>=1.8.6 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.16.1)
Requirement already satisfied: keras-preprocessing>=1.0.5 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (1.1.0)
Requirement already satisfied: astor>=0.6.0 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (0.7.1)
Requirement already satisfied: protobuf>=3.6.1 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorflow->recsim) (3.8.0)
Requirement already satisfied: future in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from pyglet<=1.3.2,>=1.2.0->gym->recsim) (0.17.1)
Requirement already satisfied: werkzeug>=0.11.15 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow->recsim) (0.15.4)
Requirement already satisfied: markdown>=2.6.8 in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow->recsim) (3.1.1)
Requirement already satisfied: h5py in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from keras-applications>=1.0.6->tensorflow->recsim) (2.9.0)
Requirement already satisfied: setuptools in c:\users\tr201\anaconda3\envs\tf_gpu\lib\site-packages (from protobuf>=3.6.1->tensorflow->recsim) (41.0.1)

(tf_gpu) C:\Users\TR201\recsim>python setup.py
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

error: no commands supplied

(tf_gpu) C:\Users\TR201\recsim>cd recsim

(tf_gpu) C:\Users\TR201\recsim\recsim>python main.py
Traceback (most recent call last):
File "main.py", line 33, in
from recsim.agents import full_slate_q_agent
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\recsim\agents_init_.py", line 19, in
from recsim.agents import full_slate_q_agent
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\recsim\agents\full_slate_q_agent.py", line 22, in
from recsim.agents.dopamine import dqn_agent
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\recsim\agents\dopamine\dqn_agent.py", line 23, in
from dopamine.agents.dqn import dqn_agent
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\dopamine\agents\dqn\dqn_agent.py", line 28, in
from dopamine.discrete_domains import atari_lib
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\dopamine\discrete_domains\atari_lib.py", line 31, in
import atari_py
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\atari_py_init_.py", line 1, in
from .ale_python_interface import *
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\site-packages\atari_py\ale_python_interface.py", line 18, in
'ale_interface/ale_c.dll'))
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\ctypes_init_.py", line 434, in LoadLibrary
return self.dlltype(name)
File "C:\Users\TR201\Anaconda3\envs\tf_gpu\lib\ctypes_init
.py", line 356, in init
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] Can not find the module.
////////////////////////////////////////////////////////////////

please help

Comparison operator not supported for 'collections.OrderedDict'

I'm trying to implement the TabularQAgent and I am running into the following error (in Python 3.8):

{TypeError}'<' not supported between instances of 'collections.OrderedDict' and 'collections.OrderedDict'

The error is thrown when max_q_state_action is defined on line 208 of tabular_q_agent.py (due to the max function).
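
For context, here is a minimal reproduction of my own (not the actual tabular_q_agent code): max() compares tuples element-wise, and when the leading Q-values tie, it falls back to comparing the OrderedDict entries, which Python 3 cannot order. Passing an explicit key that ignores the dict is one possible workaround:

from collections import OrderedDict

candidates = [
    (0.5, OrderedDict(user=1)),
    (0.5, OrderedDict(user=2)),
]
# max(candidates)  # raises the TypeError above once the q-values tie

# Possible workaround: compare on the q-value only.
best = max(candidates, key=lambda pair: pair[0])
print(best)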

Simulations with delayed feedback

Is there a way to simulate delayed feedback with RecSim? For example, purchases that come a variable number of timesteps after the item was recommended?

Returning same slate after every iteration

Thanks for the great work @cwhsu-google. Our team is trying to use RecSim for slate recommendation.

After training the agent (slate_decomp_q_agent) for 300k steps, I tried loading different checkpoints and generating slates for the same user (to understand the convergence of Q-values), but the slates returned after every iteration are the same.

Here is my script that I used for prediction:

inference.py

import time

import numpy as np

import prediction
from recsim.agents import slate_decomp_q_agent
from recsim.environments import interest_evolution


def create_decomp_q_agent(sess, environment, eval_mode, summary_writer=None):
    """One variant of the agent featured in the SlateQ paper."""
    kwargs = {
        'observation_space': environment.observation_space,
        'action_space': environment.action_space,
        'summary_writer': summary_writer,
        'eval_mode': eval_mode,
    }
    return slate_decomp_q_agent.create_agent(
        agent_name='slate_topk_sarsa', sess=sess, **kwargs)


seed = 0
slate_size = 3
np.random.seed(seed)
env_config = {
    'num_candidates': 30,
    'slate_size': slate_size,
    'resample_documents': True,
    'seed': seed,
}

tmp_decomp_q_dir = '../results12/'

user_vec = [-0.00598616, 0.1760635, -0.0913329, 0.59239239, -0.90903912,
            -0.17019989, 0.00312255, -0.32639151, -0.5325127, -0.47683574,
            -0.86847277, 0.32046379, -0.56788602, -0.69480169, 0.071154,
            0.33922171, 0.04820297, 0.97037383, 0.04213649, -0.16748408]

user_obs = np.array(user_vec)
print('Shape of user observation:', user_obs.shape)
runner = prediction.PredRunner(
    base_dir=tmp_decomp_q_dir,
    create_agent_fn=create_decomp_q_agent,
    env=interest_evolution.create_environment(env_config))
print('Going to predict...')
start_time = time.time()
print(runner.predict(user_obs_features=user_obs))
print('Prediction time taken:', time.time() - start_time, 'seconds')

prediction.py

import os
import time
from dopamine.discrete_domains import checkpointer
from recsim.simulator.runner_lib import Runner
import tensorflow.compat.v1 as tf

class PredRunner(Runner):
    def __init__(self,
                 train_base_dir=None,
                 **kwargs):
        st = time.time()
        super(PredRunner, self).__init__(**kwargs)
        self._output_dir = os.path.join(self._base_dir, 'pred')
        tf.io.gfile.makedirs(self._output_dir)
        if train_base_dir is None:
            train_base_dir = self._base_dir
        self._checkpoint_dir = os.path.join(train_base_dir, 'train', 'checkpoints')
        self._set_up(eval_mode=True)
        # Use the checkpointer class.
        self._checkpointer = checkpointer.Checkpointer(
            self._checkpoint_dir, self._checkpoint_file_prefix)
        checkpoint_version = -1
        latest_checkpoint_version = checkpointer.get_latest_checkpoint_number(
            self._checkpoint_dir)
        latest_checkpoint_version = 100
        print('Checkpoint that would be read:', latest_checkpoint_version)
        # checkpoints_iterator already makes sure a new checkpoint exists.
        if latest_checkpoint_version <= checkpoint_version:
            time.sleep(self._min_interval_secs)
        experiment_data = self._checkpointer.load_checkpoint(
            latest_checkpoint_version)
        assert self._agent.unbundle(self._checkpoint_dir,
                                    latest_checkpoint_version, experiment_data)
        # Saving weights to file for debugging
        tvars = tf.trainable_variables()
        tvars_vals = self._sess.run(tvars)
        var_list = []
        tensor_list = []
        for var, val in zip(tvars, tvars_vals):
            var_list.append(var.name)
            tensor_list.append(val)
        import pandas as pd
        df = pd.DataFrame({'var': var_list, 'tensor': tensor_list})
        df.to_pickle('youtube-test-weights{}.pickle'.format(latest_checkpoint_version))
        print('Model loading time taken: {}'.format(time.time() - st))

    def predict(self, user_obs_features):
        st = time.time()
        self._env.reset_sampler()
        self._initialize_metrics()
        observation = self._env.reset()
        observation['user'] = user_obs_features
        start = time.time()
        action = self._agent.begin_episode(observation)
        print('Step time taken: {}'.format(time.time() - start))
        slate = [0] * len(action)
        doc_keys = list(observation['doc'].keys())
        for i in range(len(action)):
            slate[i] = doc_keys[action[i]]
        print('Time taken: {} ms'.format(1000*(time.time()-st)))
        return slate

These graphs were generated in TensorBoard (screenshots dated 2020-12-08 attached to the issue).

Most importantly, I am looking for answers to the following:

  • Why do the Q-values turn out to be the same across different epochs?
  • This in turn returns the same slates for all the checkpoints.
  • This raises the question of whether the model is training at all.
  • Also, we see the watch time for each video is 4 min; since Q-values reflect the cumulative reward over a state-action pair, how come their scale is 10^-2?

Any help would be appreciated
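
One quick sanity check I would suggest (reusing the weight pickles the prediction script above already writes; the second filename below is hypothetical) is to diff the weights of two checkpoints. If the tensors are numerically identical, the identical slates follow directly and the problem is on the training side:

import numpy as np
import pandas as pd

# Weight dumps written by PredRunner for two different checkpoints.
weights_a = pd.read_pickle('youtube-test-weights100.pickle')
weights_b = pd.read_pickle('youtube-test-weights200.pickle')  # hypothetical second checkpoint

for name, t_a, t_b in zip(weights_a['var'], weights_a['tensor'], weights_b['tensor']):
    if not np.allclose(t_a, t_b):
        print('changed:', name)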

Environment only release?

Hi,
Is it possible to have a release without the dopamine and tensorflow dependencies? We want to support this environment in our project, but we are using PyTorch, so we want to avoid the additional dependencies. I understand this is a Google project, so supporting PyTorch users might not be a priority. Please let me know if that is something that can be supported.

Thank you,
Kittipat

recsim requires dopamine-rl>=2.0.6, which is not in PyPI

Summary

Running pip install recsim fails due to dependency on dopamine-rl>=2.0.6, which is not in PyPI.

Description

Run the following:

pip install recsim

The following is then output:

Collecting dopamine-rl>=2.0.6 (from recsim)
  ERROR: Could not find a version that satisfies the requirement dopamine-rl>=2.0.6 (from recsim) (from versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5)
ERROR: No matching distribution found for dopamine-rl>=2.0.6 (from recsim)

I don't see dopamine-rl>=2.0.6 in PyPI.

Illegal instruction (core dumped)

ENV:

python: 3.6.8
tensorflow: 1.15.0

After I pip install recsim and run:

python main.py --logtostderr \
  --base_dir="/tmp/recsim/interest_exploration_full_slate_q" \
  --agent_name=full_slate_q \
  --environment_name=interest_exploration \
  --episode_log_file='episode_logs.tfrecord' \
  --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 \
  --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=10 \
  --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 \
  --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5

it returns:

Illegal instruction (core dumped)

Cannot install on macos: No module named 'tensorflow.contrib'

After following the installation instructions, I used pip3, as Python 2 was failing to install the atari lib.
I also upgraded Dopamine.

I am running in a Python 3.7 virtual env.

Successfully installed Pillow-6.2.1 dopamine-rl-3.0.1
mhstnsc-mac:~/personal/recsim/recsim [master ?] $ ./start.sh
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    from recsim.agents import full_slate_q_agent
  File "/usr/local/lib/python3.7/site-packages/recsim/agents/__init__.py", line 19, in <module>
    from recsim.agents import full_slate_q_agent
  File "/usr/local/lib/python3.7/site-packages/recsim/agents/full_slate_q_agent.py", line 22, in <module>
    from recsim.agents.dopamine import dqn_agent
  File "/usr/local/lib/python3.7/site-packages/recsim/agents/dopamine/dqn_agent.py", line 23, in <module>
    from dopamine.agents.dqn import dqn_agent
  File "/usr/local/lib/python3.7/site-packages/dopamine/agents/dqn/dqn_agent.py", line 27, in <module>
    from dopamine.discrete_domains import atari_lib
  File "/usr/local/lib/python3.7/site-packages/dopamine/discrete_domains/atari_lib.py", line 54, in <module>
    from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

TF2.0 Compatibility Bug

In dqn_agent.py, the tf.keras.layers.Dense layers (the TF 2.0 style) do not support graph/variable reuse in the way they are intended to be used here, so N separate copies of the graph end up being initialized.

Using tf.layers.dense in place of tf.keras.layers.Dense was able to overcome the issue for now.
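
For reference, a minimal sketch of my own (assuming TF 1.x graph mode) of the reuse behavior the issue refers to: tf.layers.dense participates in variable_scope reuse, so the second call below shares the first call's weights, whereas constructing a fresh tf.keras.layers.Dense object each time would create a new set of variables:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

x = tf.placeholder(tf.float32, [None, 4])
with tf.variable_scope('q_net'):
    y1 = tf.layers.dense(x, 8, name='fc')
with tf.variable_scope('q_net', reuse=True):
    y2 = tf.layers.dense(x, 8, name='fc')  # reuses the kernel/bias created for y1

print(len(tf.trainable_variables()))  # 2 variables (kernel + bias), shared by both calls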

Problem of Large action spaces in interest evolution

I am trying to use RecSim (decompose_q with interest_evolution) to train a recommender.
My action space (videos) is quite large, and one-hot encoding will not work the way it is done in the interest_evolution environment.
Can you please suggest another methodology by which we can simulate the environment?

Accuracy and Diversity

Hi, page 13 of the white paper states that:

RECSIM includes a number of typical metrics (e.g., average cumulative
reward and episode length) as well as some basic diversity metrics.

Is there an example for calculating such metrics (i.e., accuracy and diversity)?
Thanks.

Clarification of satisfaction/engagement tradeoff in choc-kale example

When running the chocolate/kale example, I select a slate following a deterministic kaleness-first selection policy, i.e. I order document observations in descending order of their kaleness and then define the action as the first slate_size items:

action = tuple(np.argsort([do[0] for do in document_observations]))[:slate_size]

document_observations is observation["doc"], so e.g. (array([0.57019677]), array([0.43860151]), ..., array([0.46631077])).

For comparison I run the environment with the reverse policy, which is to select the action by picking the documents with the lowest kaleness:

action = tuple(np.argsort([do[0] for do in document_observations]))[::-1][:slate_size]

If I compare both policies after running them for a couple hundred steps, the kaleness-first policy yields higher engagement and lower user satisfaction than the chocolateness-first policy. I would expect exactly the opposite, since kaleness is supposed to boost user satisfaction at the cost of lower engagement.

Why does selecting items with the highest kaleness yield a lower user satisfaction and higher engagement than selecting items with the lowest kaleness?
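
One thing worth checking (my observation, not confirmed by the issue author): np.argsort returns indices in ascending order, so the first snippet above actually picks the lowest-kaleness documents unless the order is reversed before slicing, which would explain the seemingly swapped results:

import numpy as np

kaleness = np.array([0.57, 0.44, 0.47])
lowest_first = np.argsort(kaleness)         # [1 2 0]: lowest kaleness first
highest_first = np.argsort(kaleness)[::-1]  # [0 2 1]: highest kaleness first
print(lowest_first[:2], highest_first[:2])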

Not able to get a converged reward for interest_evolution env and full_slate_q_agent

I am trying to get familiar with RecSim.
I want to test the full_slate_q_agent on the interest_evolution environment, but the agent does not seem to learn anything; random_agent performs the same as full_slate_q_agent.
The command I execute is the usage example from https://pypi.org/project/recsim/, with the only difference that num_iterations is set to 100 instead of 10.

python main.py --logtostderr --base_dir="/tmp/recsim/interest_evolution_full_slate_q" --agent_name=full_slate_q --environment_name=interest_evolution --episode_log_file='episode_logs.tfrecord' --gin_bindings=simulator.runner_lib.Runner.max_steps_per_episode=100 --gin_bindings=simulator.runner_lib.TrainRunner.num_iterations=100 --gin_bindings=simulator.runner_lib.TrainRunner.max_training_steps=100 --gin_bindings=simulator.runner_lib.EvalRunner.max_eval_episodes=5

Am I doing something wrong?
Which setup should I use to explore the performance of full_slate_q_agent?

I also attach the TensorBoard figure with the training results (screenshot dated 2020-01-31).

Thanks a lot for your time and help, and many congratulations on RecSim. It seems to be a very promising lib :-)

Support for interleaved user interaction

Hi,
Page 12 of the white paper states that "In the current design, the simulator sequentially simulates each user."
Is interleaved user interaction supported now?

Thanks,
Jeremy.

Tensorflow 2.0 and more complex agents examples

Hello, thank you for this great work!

As I understand, RecSim is made for use with TF 1 because, if I'm not mistaken, we must provide a session to an agent. Do you plan to adapt the interface for the new version of TF?

And it would be great to have some more examples of more complex deep RL agents, besides DQN.

Thanks!

User aware document sampler

Is there a way to create a document sampler that depends on the user, i.e. one that provides document sets d_A and d_B to users u_A and u_B respectively?

Recommender System step-by-step output

Hello,

I am working on a project, and we want to use RecSim to understand how RL agents for recommender systems work. We created a new environment based on our business case and applied the full_slate_q agent to it, but the only output we have is the reward graphs in TensorBoard.

Would it be possible to display the step-by-step output for a given user starting state, with the recommendations made by the agent and the choices made by the user, to better understand how everything works?
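
For what it's worth, here is a rough sketch of my own (not part of the repo) that steps an environment manually and prints each slate and the simulated user's responses; it uses the Gym-style wrapper returned by create_environment and the bundled RandomAgent for simplicity, but any agent exposing begin_episode/step should work the same way:

from recsim.agents import random_agent
from recsim.environments import interest_evolution

env_config = {'num_candidates': 10, 'slate_size': 2,
              'resample_documents': True, 'seed': 0}
env = interest_evolution.create_environment(env_config)
agent = random_agent.RandomAgent(env.action_space, random_seed=0)

observation = env.reset()
slate = agent.begin_episode(observation)
for step in range(5):
    observation, reward, done, _ = env.step(slate)
    print('step', step, 'slate', list(slate), 'reward', reward)
    print('  user responses:', observation['response'])
    if done:
        break
    slate = agent.step(reward, observation)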

Congrats for the great work and thanks in advance!

Théophile

Using slate_decomp_q_agent in production

I am new to RecSim and Reinforcement Learning. How can I use a slate_decomp_q_agent trained using RecSim in production or simply in the real world outside the simulation environment?

Q&A

How can I evaluate whether the simulator is valuable or not?
I mean, are there any metrics which can evaluate the performance of different simulators?

Minor bug in slate_decomp_q_agent

I was training a slate_decomp_q_agent and ran into this AttributeError related to tf.compat.v1 module. Please refer to this notebook for more details.

AttributeError: module 'tensorflow._api.v1.compat.v1.compat' has no attribute 'v1'

Line 22 of slate_decomp_q_agent.py imports TensorFlow as:
import tensorflow.compat.v1 as tf

Line 175 in the same file then calls:
output_slate = tf.compat.v1.where(tf.equal(mask, 0))

Since tf is already tensorflow.compat.v1, the extra tf.compat.v1 prefix resolves to a nested compat path that does not exist here, which produces the AttributeError above.

I changed line 175 (and other such instances) to:
output_slate = tf.where(tf.equal(mask, 0))

And it worked like a charm. If you think that's the right approach, I can submit a pull request for the same.

FYI, this is the first time I am opening an issue. If there's anything else you need from my side, feel free to comment.

My env details:

# OS: macOS Mojave v10.14.6
# used pip virtual environments with below libraries installed
absl-py==0.9.0
appnope==0.1.0
astor==0.8.1
astunparse==1.6.3
attrs==19.3.0
backcall==0.2.0
bleach==3.1.5
cachetools==4.1.0
certifi==2020.4.5.2
chardet==3.0.4
cloudpickle==1.3.0
decorator==4.4.2
defusedxml==0.6.0
dopamine-rl==3.0.1
entrypoints==0.3
future==0.18.2
gast==0.2.2
gin-config==0.3.0
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.29.0
gym==0.17.2
h5py==2.10.0
idna==2.9
importlib-metadata==1.6.1
ipykernel==5.3.0
ipython==7.15.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.3
jupyter-console==6.1.0
jupyter-core==4.6.3
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
Markdown==3.2.2
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.7
notebook==6.0.3
numpy==1.18.5
oauthlib==3.1.0
opencv-python==4.2.0.34
opt-einsum==3.2.1
packaging==20.4
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
prometheus-client==0.8.0
prompt-toolkit==3.0.5
protobuf==3.12.2
ptyprocess==0.6.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyglet==1.5.0
Pygments==2.6.1
pyparsing==2.4.7
pyrsistent==0.16.0
python-dateutil==2.8.1
pyzmq==19.0.1
qtconsole==4.7.4
QtPy==1.9.0
recsim==0.2.3
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scipy==1.4.1
Send2Trash==1.5.0
six==1.15.0
tensorboard==1.15.0
tensorboard-plugin-wit==1.6.0.post3
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
tornado==6.0.4
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.2.4
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.1.0
