Papers on the topic of reinforcement learning for recommendation
[1]. Top-K Off-Policy Correction for a REINFORCE Recommender System
This paper combines importance sampling with REINFORCE for recommendation. Looking at how they structure the neural networks for the offline (behavior) policy and the target policy, I think the setup is similar to domain adaptation: the offline policy can be tailored into the target policy by modifying the last layer of the network.
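To make the importance-sampling idea concrete, here is a minimal sketch of an importance-weighted REINFORCE gradient. It is not the paper's implementation: it assumes a context-free softmax policy over a handful of items, it leaves out the paper's top-K correction, and the function name, the `weight_cap` parameter, and the toy numbers are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def off_policy_reinforce_grad(logits, actions, rewards, behavior_probs, weight_cap=10.0):
    """Importance-weighted REINFORCE gradient with respect to the logits.

    logits         : logits of the target policy pi (one per item)
    actions        : logged item indices, chosen by the behavior policy beta
    rewards        : logged rewards, one per action
    behavior_probs : beta(a) for each logged action
    weight_cap     : upper bound on pi(a) / beta(a) to limit variance
                     (an assumption of this sketch)
    """
    logits = np.asarray(logits, dtype=float)
    pi = softmax(logits)
    grad = np.zeros_like(logits)
    for a, r, b in zip(actions, rewards, behavior_probs):
        # Importance weight corrects for sampling from beta instead of pi.
        w = min(pi[a] / b, weight_cap)
        # Gradient of log pi(a) w.r.t. the logits is onehot(a) - pi.
        score = -pi.copy()
        score[a] += 1.0
        grad += w * r * score
    return grad / len(actions)


# Toy usage: three items, two logged interactions.
toy_grad = off_policy_reinforce_grad(
    logits=np.zeros(3),
    actions=[0, 2],
    rewards=[1.0, 0.5],
    behavior_probs=[0.5, 0.25],
)
```

When the target and behavior policies agree on a logged action, its weight is 1 and the update reduces to plain REINFORCE. An action that the behavior policy chose more often than the target policy would is down-weighted, and one it chose less often is up-weighted.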
[2]. Generative Adversarial User Model for Reinforcement Learning Based Recommendation System