
pensieve's Introduction

Pensieve

Pensieve is a system that generates adaptive bitrate algorithms using reinforcement learning. http://web.mit.edu/pensieve/

Prerequisites

  • Install prerequisites (tested with Ubuntu 16.04, Tensorflow v1.1.0, TFLearn v0.3.1 and Selenium v2.39.0)
python setup.py

Training

  • To train a new model, put training data in sim/cooked_traces and testing data in sim/cooked_test_traces, then in sim/ run python get_video_sizes.py followed by
python multi_agent.py

The reward signal and the video meta-settings can be modified in multi_agent.py and env.py. More details can be found in sim/README.md.
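For reference, the linear QoE reward described in the Pensieve paper has the following shape. This is a minimal sketch, assuming the constants reported in the paper (4.3 rebuffering penalty, unit smoothness penalty); the variable names are illustrative rather than the repo's exact ones:

    M_IN_K = 1000.0
    REBUF_PENALTY = 4.3   # linear-QoE penalty per second of stall
    SMOOTH_PENALTY = 1.0  # penalty on bitrate switches

    def linear_qoe_reward(bit_rate_kbps, last_bit_rate_kbps, rebuf_sec):
        # reward = chunk bitrate (Mbps) - rebuffering penalty - smoothness penalty
        return (bit_rate_kbps / M_IN_K
                - REBUF_PENALTY * rebuf_sec
                - SMOOTH_PENALTY * abs(bit_rate_kbps - last_bit_rate_kbps) / M_IN_K)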

Testing

  • To test the trained model in the simulated environment, first copy the model to test/models and modify the NN_MODEL field of test/rl_no_training.py (see the example below), then in test/ run python get_video_sizes.py followed by
python rl_no_training.py
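For example, the NN_MODEL constant near the top of test/rl_no_training.py would be pointed at the copied checkpoint; the path below is hypothetical:

    NN_MODEL = './models/nn_model_ep_XXXXX.ckpt'  # hypothetical checkpoint name; use your own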

Similar testing can be performed in simulation for the buffer-based approach (bb.py), MPC (mpc.py), and the offline optimal (dp.cc). More details can be found in test/README.md.

Running experiments over Mahimahi

  • To run experiments over a Mahimahi-emulated network, first copy the trained model to rl_server/results and modify the NN_MODEL field of rl_server/rl_server_no_training.py, then in run_exp/ run
python run_all_traces.py

This script will run all schemes (buffer-based, rate-based, Festive, BOLA, fastMPC, robustMPC and Pensieve) over all network traces stored in cooked_traces/. The results will be saved to the run_exp/results folder. More details can be found in run_exp/README.md.

Real-world experiments

  • To run real-world experiments, first set up a server (setup.py automatically installs an Apache server and puts the needed files in /var/www/html). Then, copy the trained model to rl_server/results and modify the NN_MODEL field of rl_server/rl_server_no_training.py. Next, modify the url field in real_exp/run_video.py to point to the server URL (see the example below). Finally, in real_exp/ run
python run_exp.py
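For example, the url field would be changed to something like the following; the address and page name here are placeholders, and the exact variable layout in run_video.py may differ:

    url = 'http://<your-server-address>/myindex_RL.html'  # placeholder; point this at your Apache server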

The results will be saved to the real_exp/results folder. More details can be found in real_exp/README.md.

pensieve's People

Contributors

hongzimao


pensieve's Issues

Errors in plot_results.py

I'm trying to plot the results of the Pensieve (rl) and buffer-based (bb) algorithms, but encountered the following error:
Traceback (most recent call last):
  File "plot_results.py", line 227, in <module>
    main()
  File "plot_results.py", line 57, in main
    bit_rate.append(VIDEO_BIT_RATE[int(parse[6])])
ValueError: invalid literal for int() with base 10: '-10.714325277567443'

Another error is encountered when I try to plot the results of rl and dp:
Traceback (most recent call last):
  File "plot_results.py", line 227, in <module>
    main()
  File "plot_results.py", line 70, in main
    assert rebuff >= -1e-4
AssertionError

Note that I did change SCHEMES at line 19 to match the corresponding experiments in both cases. Any idea what would cause the errors above?
Thanks a lot!

A few questions when running multi_agent.py on CPU and GPU

Hi:
I tested with Ubuntu 14.04, Tensorflow v1.4.0, TFLearn v0.3.2 and Selenium v2.39.0. The data is your sample training/testing data (train_sim_traces and test_sim_traces).
I installed Pensieve and ran the testing environment with python rl_no_training.py; it works fine on both CPU and GPU.

But there are some problems when I run the training with python multi_agent.py.

Error info when running on GPU:

1. keep_dims is deprecated, use keepdims instead
   WARNING:tensorflow:Error encountered when serializing data_augmentation.
   Type is unsupported, or the types of the items don't match field type in CollectionDef.
   'NoneType' object has no attribute 'name'

2. W tensorflow/core/grappler/utils.cc:48] Node actor/FullyConnected/MatMul_fused is not in the graph.
   Process Process-1:
   Traceback (most recent call last):
     File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
       self.run()
     File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
       self._target(*self._args, **self._kwargs)
     File "multi_agent.py", line 143, in central_agent
       s_batch=np.stack(s_batch, axis=0),
     File "/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.py", line 350, in stack
       raise ValueError('need at least one array to stack')
   ValueError: need at least one array to stack

Error info when running on CPU:

1. Instructions for updating:
   Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
   WARNING:tensorflow:Error encountered when serializing data_augmentation.
   Type is unsupported, or the types of the items don't match field type in CollectionDef.
   'NoneType' object has no attribute 'name'

Am I calling the wrong script, or should I fix the environment? What do I need to do next to fix it?

Hope to get your reply, thanks!

Parameters for buffer-based algorithm from Huang et al.

Dear Hongzi Mao,

The buffer-based algorithm from [1], which Pensieve's performance is tested against, uses the following parameters:

Reservoir: 5 seconds
Cushion: 10 seconds

These were found in both test/bb.py and dash.js/src/streaming/controllers/AbrController.js.
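For context, Huang et al.'s rate map selects the lowest bitrate below the reservoir, the highest bitrate above reservoir + cushion, and interpolates in between. A rough sketch of that mapping with the 5 s / 10 s values above (the exact indexing and rounding in test/bb.py may differ):

    def bb_bitrate(buffer_s, bitrates, reservoir=5.0, cushion=10.0):
        # bitrates: available bitrates sorted from lowest to highest (kbps)
        if buffer_s < reservoir:
            return bitrates[0]
        if buffer_s >= reservoir + cushion:
            return bitrates[-1]
        # linear interpolation across the cushion region
        frac = (buffer_s - reservoir) / cushion
        return bitrates[int(frac * (len(bitrates) - 1))]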

Huang et al. [1] suggested calculating the reservoir dynamically (Section 5.1) and chose the upper reservoir to be around 10% of the buffer size (e.g. Section 4). Compared to the 60 seconds of buffer used across all tests in your paper [2], the reservoir and cushion sizes chosen seem much smaller than what Huang et al. suggested in [1].

I would really like to know the reasons behind your choice. This will help me replicate results presented in your paper [2] in a more informed manner.

[1] T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell and M. Watson, "A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service," in Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, Illinois, USA, 2014.

[2] H. Mao, R. Netravali and M. Alizadeh, "Neural Adaptive Video Streaming with Pensieve," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 2017.

Question about the CNN structure

Hi hongzi,

I have a question about the CNN structure used in the Pensieve system.

Based on my understanding, CNN layers are usually used to detect features or patterns in the data, since the convolution kernels can be seen as filters. For example, when the past 8 chunk throughputs are fed to a 1D CNN, if we want to detect the change pattern of the throughput history, then the input shape should be [?, 8, 1] (in TensorFlow). And if there are 128 filters of size 4, the kernel shape will be [4, 1, 128], and we get an output of shape [?, 8, 128].

However, in the code the input shape is [?, 1, 8], which means the 8 history records are treated as channels, not as samples in a single channel. So the filter shape is [4, 8, 128]: each filter has 8 channels and each channel has 4 coefficients. However, since there is only 1 input sample per channel, only one coefficient is actually used and the other three are multiplied by padded zeros. I am quite confused by this structure. Why is the network designed like this? Please correct me if I am wrong... Thanks!
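For reference, a minimal TFLearn sketch contrasting the two layouts discussed above (shapes only; the layer sizes follow the question, not necessarily the repo):

    import tflearn

    # Layout A: 8 history samples in one channel -> input [?, 8, 1], kernel slides over time
    x_a = tflearn.input_data(shape=[None, 8, 1])
    conv_a = tflearn.conv_1d(x_a, 128, 4, activation='relu')   # output [?, 8, 128] with 'same' padding

    # Layout B (as in a3c.py): 8 history values treated as channels -> input [?, 1, 8]
    x_b = tflearn.input_data(shape=[None, 1, 8])
    conv_b = tflearn.conv_1d(x_b, 128, 4, activation='relu')   # kernel [4, 8, 128], output [?, 1, 128]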

A question about the network in a3c.py

Hi, hongzi:
I found lines 290 to 291 in a3c.py as follows:

split_4 = tflearn.conv_1d(inputs[:, 4:5, :A_DIM], 128, 4, activation='relu')
split_5 = tflearn.fully_connected(inputs[:, 4:5, -1], 128, activation='relu')

, and I wonder why split_5 takes inputs[:, 4:5, -1] as its input rather than
inputs[:, 5:6, -1]? I think s_batch[:, 4:5, :A_DIM] stands for the next_video_chunk_size vector, and split_5 should take video_chunk_remain as its input, which corresponds to s_batch[:, 5:6, -1].
Looking forward to your reply.

Question about training time

Hi,
How long does the training process take? I am running the tensorflow cpu version on an i7-4720HQ CPU @ 2.60GHz. The training has been running for a couple of hours now..

~Sandip

Why is the result not better than MPC?

Hi Hongzi,

I tried to reproduce the results of Pensieve. After several attempts, I failed to get an ideal result (better performance than MPC). The following is the procedure I used. The code was downloaded from GitHub, and the trace files were obtained from Dropbox:

  1. Put training data (train_sim_traces) in sim/cooked_traces and testing data (test_sim_traces) in sim/cooked_test_traces;
  2. Run python multi_agent.py to train the model;
  3. Copy the generated model files to test/model, and modify the model name in test/rl_no_training.py;
  4. Run python rl_no_training.py in test/ folder to test the model, trace files in test_sim_traces are also used;
  5. Run python plot_results.py to compare the results with DP method & MPC method.

I put two figures of the total reward and the CDF here (figure attachments omitted). We can see that the performance of Pensieve is not better than MPC.

Here is a TensorBoard figure (omitted). The training step count is about 160,000.

I found the result is not very stable after long training (more than 10,000 steps), so the trained models give different performance when testing. For example, the model at 164,500 steps got a reward of 35.2, while the model at 164,600 steps got a reward of 33.7.

Did I do something wrong, so that I couldn't get the same result as you described in the paper? The pretrain_linear_reward model performs well; how did you obtain it? Could you give me a hand with these questions? Any answer is highly appreciated.

Thanks!

How to modify the loss to encourage exploration?

I'm trying to apply a3c.py to a different problem. However, it quickly converges to the dominant action and never moves again. My question is: are there any suggestions for modifying the parameters or the loss function to encourage exploration? Have you tried that in Pensieve? Thanks!

Playback restarted when all fragments downloaded

Hi all,

I have been testing the code in dash_client, and in turn the generated dash.all.js file, and I realised that playback restarts when all the fragments have been downloaded, without playing the whole video.

Why does the player behave this way? And would it be possible to modify the provided code in such a way that the complete video is played and playback finishes after that?

Thanks!

P.S.: I just checked the provided, compiled dash.all.min.js code and it presents the same behaviour

Question about reproducing paper performance

  1. In the Pensieve code, I guess the only difference between my setup and the paper is the 'entropy weight'.
    We tried many kinds of initial values and decay schedules, but I want to use the original Pensieve setting.
    Since I am trying to get the exact baseline performance (i.e. the paper's performance), could you share your detailed strategy for the 'entropy weight'?

  2. Are there other factors that might affect the performance of Pensieve?
    Anything to be careful about when reproducing the paper's performance from the GitHub code would be really helpful.

About cooked data

Hi,

I'm trying to use the raw FCC data to generate the cooked data, but I don't know how to do that. Can you provide some sample code or a hint about how to generate it? It seems that you use unit_id + target (URL) as a key to separate the traces. But in the raw data, for each unit_id + target, the interval between logs is about 2 hours. Can you tell us how to generate cooked traces (like the ones you provided) with a 5-second interval?

Thanks

a few questions about the math

  1. The training rule for the actor network (Eq. 2) uses the gradient of the network times the advantage A to update the neural network model.

Is the term ∇_θ log π_θ(s_t, a_t) the gradient of the network?

I am a little confused about what is in a3c.py: is actor_gradients π_θ(s_t, a_t)? Then what does tf.log(tf.reduce_sum(tf.multiply(self.out, self.acts), ...)) do? Which term is the function A?

In a3c.py, lines 46-59

  2. The value of A(s_t, a_t) is calculated by
    r_t + γ V^{π_θ}(s_{t+1}; θ_v) − V^{π_θ}(s_t; θ_v), the expression between Eq. 3 and 4 (a cleaned-up statement of these quantities appears after this list).

The first term r_t is the new reward based on the new action a. The second term is the future reward based on the new state s_{t+1}, but with the policy unchanged (no further new action)?

  3. The training of the critic network is based on "empirically observed rewards", which is the value of r_t (based on s_t and a_t) in Eq. 3.

In a3c.py line 144, where does td_target come from?

Suppose the "empirically observed rewards" are the ground truth; could supervised learning also work? What is the value of using reinforcement learning?

  4. A separate question about how sensitive the trained model is to different parameters. The state s_t contains a number of data fields; which one do you think brings the greatest improvement (e.g., knowing the sizes of the m future chunks might be very useful)? Also, the number of past chunks and the number of future chunks should affect the complexity and the accuracy of the neural network; what do you think is the sweet spot?

A related question: you use the "Envivo-Dash3" video set. Do you think a different transcoder stack with a different bitrate ladder, segment duration, or even bitrate fluctuation (within one bitrate level) may make a trained model perform less well?
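For reference (relating to items 1 and 2 above), the quantities being asked about are usually written in the standard actor-critic form, which the paper's Eq. 2-4 follow:

    \nabla_\theta J \approx \mathbb{E}\big[\, \nabla_\theta \log \pi_\theta(s_t, a_t)\; A(s_t, a_t) \,\big],
    \qquad
    A(s_t, a_t) = r_t + \gamma\, V^{\pi_\theta}(s_{t+1}; \theta_v) - V^{\pi_\theta}(s_t; \theta_v)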

Incorrect cooked traces?

Hi,

I'm trying to run the sim/multi_agent.py script, but there seems to be an issue. I downloaded your cooked_traces folder from Dropbox and put it into the sim folder as the README requested. The problem is, the cooked_traces files contain only one column, whereas it looks like your script load_traces.py requires two columns.

Am I calling the wrong script or should there be different data instead? Also, how do I "cook" traces myself?

Thanks!

Trace iteration in run_exp

I'm trying to run Pensieve's experimental setup in 'run_exp'. I have installed all dependencies using setup.py and put some traces in 'cooked_traces'. When I run the Python script 'run_all_traces.py' for the experiments, I only see one trace being tested. I verified that the experiment fetches video chunks for the trace and generates results for that specific trace, but it does not move on to the other traces.

I looked into 'chrome_retry_log' to see if there were any reports. The only thing I see there is in the form:

'BB_trace5.txt'
'Timeout'

'RL_trace5.txt
'Timeout'

Looking into the code, I can see that there is a handler for a SIGALRM signal which raises an exception with the 'Timeout' text.

Any help in making it iterate through traces properly is appreciated.

Question about the A3C "apply_gradients" function

Hi,

I am reading your a3c.py and multi_agent.py code, and the function apply_gradients(self, actor_gradients) really confuses me:

    def apply_gradients(self, actor_gradients):
        return self.sess.run(self.optimize, feed_dict={
            i: d for i, d in zip(self.actor_gradients, actor_gradients)
        })

which refers to:

    # Optimization op
    self.optimize = tf.train.RMSPropOptimizer(self.lr_rate).\
        apply_gradients(zip(self.actor_gradients, self.network_params))

The self.actor_gradients op is not even a placeholder, so why can you feed it through feed_dict with a fixed value (obtained from get_gradients())?

Thanks!
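For reference, in TF1 feed_dict can override the value of any fetchable tensor, not only placeholders, which is what apply_gradients relies on here. A minimal standalone sketch of that mechanism (unrelated to the repo's variables):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[])
    a = x + 1.0              # an ordinary intermediate tensor, not a placeholder
    b = a * 3.0

    with tf.Session() as sess:
        print(sess.run(b, feed_dict={x: 1.0}))   # 6.0, computed through a
        print(sess.run(b, feed_dict={a: 10.0}))  # 30.0, feeding a overrides it and x is not needed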

Some questions about the environment

Hi, hongzi. I tried to reproduce your Pensieve system. However, I found that whatever bitrate the agent chose, the rebuffering time was always 0, so the agent would just choose one action for every chunk because it always received a positive reward. I really do not understand how this happened. The data was downloaded from your Dropbox. Any help would be greatly appreciated.

A few questions about the real experiment of pensieve

Hi hongzi, I have trained a model and found it performs well on the test traces. I intend to apply it to a real experiment and implement a presentation effect like the one at the beginning of your video, namely two videos being played with the bitrates shown in the lower-right corner. However, I do not fully understand what I should do according to your README.md.

  1. According to the README.md, I need to set up a server; should the video files be placed locally, or can I open a specific website to download the video?
  2. What are files like 'myindex_BB.html' used for? Does each of them just contain an empty video player, or do I need them to play the video?

I would appreciate it if you could give some tips or more detailed steps. Thank you!

Avg_entropy in tensorboard

Hi, hongzi:
I'm confused about the actual meaning of avg_entropy in TensorBoard. I found that you just computed the entropy of the lowest bitrate choice (action_prob[0]), right? So what does it mean?
Hope to get your reply!

question about mpc and dp

Hi:
I want to reproduce your experiment. I now know how to train the RL model, and I just want to plot the figure to compare the different algorithms. When I put the trained model into ./test/model and run rl_no_training.py, it works, and bb.py also works. But when I try to run mpc.py and dp.py, I get the following errors:
Traceback (most recent call last):
  File "dp.py", line 226, in <module>
    main()
  File "dp.py", line 71, in main
    all_cooked_time, all_cooked_bw = load_trace.load_trace()
ValueError: too many values to unpack

Traceback (most recent call last):
  File "mpc.py", line 272, in <module>
    main()
  File "mpc.py", line 92, in main
    net_env.get_video_chunk(bit_rate)
ValueError: too many values to unpack

Did I do something wrong? Also, where are SIM_DP's results?
What's more, when I plot the results of sim_rl only (I changed SCHEMES), there is another error:
Traceback (most recent call last):
  File "plot_results.py", line 227, in <module>
    main()
  File "plot_results.py", line 99, in main
    time_ms -= time_ms[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

What's wrong with the code? And if I want to plot the results of all the algorithms, what should I do? For example, if I want to plot the results of sim_rl and bb, does it work if I change SCHEMES to SCHEMES=['sim_rl','BB']?
I would be very grateful for any suggestions. Looking forward to your reply.

Multi Video Training

Hi,
I wanted to try multi-video training for Pensieve. I see you have code for multi_video_sim, but couldn't find any documentation. Could you guide me on how to proceed with multi-video training? I also wanted to know your opinion on the advantages of training Pensieve on multiple videos. Given that we sample videos of different durations (say small < 5 min, medium ~5-30 min, large > 30 min), would the network be able to perform well on new videos, or are there other factors to consider while sampling the videos?

Difference between CHUNK_TIL_VIDEO_END_CAP and TOTAL_VIDEO_CHUNK

Dear Hongzi,

There are two variables (CHUNK_TIL_VIDEO_END_CAP and TOTAL_VIDEO_CHUNK) in Pensieve which have the same value (48), for example in test/mpc.py.
Could you explain the difference between these variables?
Since I use a different video, the total number of chunks is also different.
I actually modified both variables to the total number of chunks of my video.

Thank you.
Hyunho

Will two video streams affect each other in the Mahimahi environment?

Hi Hongzi,

Thanks for your great work; I set up real_exp successfully!
But I have one concern, as in the title: if I run two video streams in one Mahimahi environment, do they affect each other? I mean, the trace in Mahimahi limits the throughput, so if the two video streams share the same link bandwidth, I think they will affect each other. Am I right?

Mahimahi experiment consuming too much time.

Hello, Hongzimao,

I'm trying to reproduce the results (Figures 7, 8, 9 and 10) in your "Pensieve" paper and I have run into some problems.

According to my estimate, it will take about 13 days, assuming we use one server for the Mahimahi experiments.

  • 320 seconds of run time for each simulation (pensieve/run_exp/run_videos.py)
  • 455 trace files from your dropbox/cooked_traces
  • 8 algorithms (BB, RB, FIXED, FESTIVE, BOLA, fastMPC, robustMPC, and Pensieve RL)

So the total is 320 × 455 × 8 = 1,164,800 seconds, about 13.5 days.

Can you give me advice on how you ran the experiments for this part, to reduce the total time consumed?

I thought about running the experiments in parallel on a single machine, but I'm worried about bandwidth and have no idea how many experiments a single machine can handle.

Thank you

a few more questions if you don't mind :-)

Hi, Hongzi

If you don't mind, may I ask you a few more questions?

Firstly, I have two general questions:

  1. Is the data flow of the critic network completely separate from that of the actor network (I mean the actor network takes the output from the critic network, but the critic network is not affected by the actor network)? If so, would it be okay to train the critic network first, then use the mature critic network to train the actor network?

  2. Is the critic network harder to get to converge than the actor network (assuming there exists a stable A function for the actor network)? Intuitively, it might be hard to predict what will happen in the future based on the current player buffer and the download stats of the past several segments (but I guess that's one of the biggest contributions of your paper). Is there any way to verify the quality of a trained critic model?

I also have a few questions about your source code. I might have missed some explanation in your paper, so please pardon me if some of my questions below are silly.

  1. In env.py at lines 26-27,
    self.all_cooked_time = all_cooked_time
    self.all_cooked_bw = all_cooked_bw
    The input is a long array of network bandwidth at certain time intervals. If this is already the data structure in the FCC or 3G/HSDPA dataset, then what role does Mahimahi play in the training?

  2. In env.py at line 60,
    while True: # download video chunk over mahimahi
    the while loop calculates the time needed to download a video chunk, based on the network bandwidth at discrete time points. To my knowledge, bandwidth fluctuation can have multiple causes, and one inevitable one is the browser terminating the TCP connection after a few HTTP requests. Sometimes the reconnection time can be long (a few hundred milliseconds or even longer) and the bandwidth drops significantly for the first segment after the reconnection. Therefore, when the calculation in the while loop at line 60 encounters a low bandwidth value in cooked_bw, could that value have been caused by a new TCP connection when the trace log was collected in the first place? Could this way of calculating the chunk download time fail to reflect the actual network condition (on the other hand, I don't know what a better way would be)?

  3. In agent.py at lines 127-129,
    action_prob = actor.predict(np.reshape(state, (1, S_INFO, S_LEN)))
    action_cumsum = np.cumsum(action_prob)
    bit_rate = (action_cumsum > np.random.randint(1, RAND_RANGE) / float(RAND_RANGE)).argmax()
    This part I don't understand too well. What is the structure of action_prob: an array of probabilities over all bitrates? I don't quite follow what action_cumsum is, or how bit_rate is calculated from it with some randomness.

  4. In agent.py line 145 (TRAIN_SEQ_LEN is 100),
    if len(r_batch) >= TRAIN_SEQ_LEN or end_of_video: # do training once
    and line 173 (GRADIENT_BATCH_SIZE is 16),
    if len(actor_gradient_batch) >= GRADIENT_BATCH_SIZE:
    This is another part I don't quite understand. Do we only compute gradients every 100 chunks and then update the neural networks with 16 accumulated batches in one go (a generic sketch of this accumulate-then-apply pattern is shown after this list)? Why do we want to do that: to save computing power, or to help with convergence?
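Regarding item 4, the pattern in question is the common trick of accumulating several gradient estimates and applying them together. A generic, illustrative sketch of that pattern (not the repo's exact logic):

    import numpy as np

    GRADIENT_BATCH_SIZE = 16

    def apply_gradients(avg_grad):
        # stand-in for the optimizer step that would update the network parameters
        print("applying averaged gradient, mean =", avg_grad.mean())

    gradient_batch = []
    for step in range(100):
        grad = np.random.randn(4)          # stand-in for one computed gradient
        gradient_batch.append(grad)
        if len(gradient_batch) >= GRADIENT_BATCH_SIZE:
            apply_gradients(np.mean(gradient_batch, axis=0))
            gradient_batch = []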

Also a minor question:

  1. In a3c.py at line 83,
    def train(self, inputs, acts, act_grad_weights):
    This function is not used anywhere. Is that right?

Why does the training/test reward value fluctuate a lot?

Hello, Hongzimao

While training, in TensorBoard or /sim/results/log_test, the average reward value fluctuates a lot.

Is it supposed to be like this? Below are the test results of the models at every 100 epochs (I used 0.5 as the initial entropy and set the entropy decay to 0.99998).


Second, how did you select the best model? Was it based on how high the test result was?

Lastly, how did you come to the conclusion that the model generalizes well in the real world? I'm concerned that on a new real-world test set, the test result could decrease.

Thank you!

Version of dash.js used

Hi,
Could you indicate the exact version of dash.js you used for this work? The paper mentions v2.4; is that the exact version used? I need to add some code to limit the forward buffer size, which was implemented as a feature in later versions of dash.js.

Problems with Reproducing Figure 11

Hi,
As part of a class at Stanford, my team and I are attempting to reproduce several figures in the Pensieve paper, but we have run into some issues with Figure 11. In particular, we were wondering if you had network information for the last link. In our tests, the QoE values we computed were significantly higher than the paper, even when evaluating a link from San Francisco to China. We wanted to know if you had any bandwidth and latency numbers for the networks tested in Figure 11, or if you could at least qualitatively describe the networks that your host and client were connected to.

Thanks,
Paul Crews

network is unreachable

net.ipv4.ip_forward = 1
Traceback (most recent call last):
  File "run_all_traces.py", line 22, in <module>
    ip_data = json.loads(urllib.urlopen("http://ip.jsontest.com/").read())
  File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 859, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 836, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
IOError: [Errno socket error] [Errno 101] Network is unreachable
I wonder what causes this when I run run_all_traces.py.

View the results

Hello! Thank you for your software. Unfortunately, I cannot get the results. I use Ubuntu 16.04, Tensorflow 1.1.0, TFLearn 0.3.1. The figure does not show any results.

pensieve/test$ python plot_results.py
log_sim_rl_norway_tram_27
log_sim_rl_norway_metro_5
log_sim_rl_norway_metro_1
log_sim_rl_norway_tram_6
log_sim_rl_norway_tram_51
log_sim_rl_norway_car_8
log_sim_rl_norway_tram_50
log_sim_rl_norway_tram_5
log_sim_rl_norway_tram_28
log_sim_rl_norway_car_3
log_sim_rl_norway_car_1
log_sim_rl_norway_bus_19
log_sim_rl_norway_bus_13
log_sim_rl_norway_ferry_5
log_sim_rl_norway_bus_11
log_sim_rl_norway_train_18
log_sim_rl_norway_bus_14
log_sim_rl_norway_tram_10
log_sim_rl_norway_tram_17
log_sim_rl_norway_bus_17
log_sim_rl_norway_tram_25
log_sim_rl_norway_tram_2
log_sim_rl_norway_train_11
log_sim_rl_norway_train_1
log_sim_rl_norway_ferry_10
log_sim_rl_norway_bus_21
log_sim_rl_norway_bus_8
log_sim_rl_norway_tram_47
log_sim_rl_norway_bus_5
log_sim_rl_norway_ferry_8
log_sim_rl_norway_tram_21
log_sim_rl_norway_tram_55
log_sim_rl_norway_train_8
log_sim_rl_norway_train_21
log_sim_rl_norway_tram_41
log_sim_rl_norway_train_10
log_sim_rl_norway_tram_53
log_sim_rl_norway_train_3
log_sim_rl_norway_ferry_17
log_sim_rl_norway_bus_18
log_sim_rl_norway_tram_48
log_sim_rl_norway_metro_10
log_sim_rl_norway_ferry_7
log_sim_rl_norway_bus_2
log_sim_rl_norway_bus_16
log_sim_rl_norway_car_5
log_sim_rl_norway_ferry_19
log_sim_rl_norway_tram_39
log_sim_rl_norway_tram_29
log_sim_rl_norway_ferry_2
log_sim_rl_norway_tram_33
log_sim_rl_norway_train_14
log_sim_rl_norway_tram_7
log_sim_rl_norway_bus_7
log_sim_rl_norway_ferry_11
log_sim_rl_norway_tram_18
log_sim_rl_norway_ferry_14
log_sim_rl_norway_metro_7
log_sim_rl_norway_ferry_1
log_sim_rl_norway_bus_1
log_sim_rl_norway_ferry_16
log_sim_rl_norway_tram_4
log_sim_rl_norway_metro_9
log_sim_rl_norway_tram_43
log_sim_rl_norway_tram_14
log_sim_rl_norway_train_2
log_sim_rl_norway_bus_4
log_sim_rl_norway_ferry_18
log_sim_rl_norway_train_7
log_sim_rl_norway_bus_20
log_sim_rl_norway_car_12
log_sim_rl_norway_tram_31
log_sim_rl_norway_tram_34
log_sim_rl_norway_ferry_3
log_sim_rl_norway_bus_10
log_sim_rl_norway_tram_8
log_sim_rl_norway_tram_19
log_sim_rl_norway_tram_46
log_sim_rl_norway_tram_49
log_sim_rl_norway_tram_15
log_sim_rl_norway_bus_12
log_sim_rl_norway_car_6
log_sim_rl_norway_tram_22
log_sim_rl_norway_ferry_9
log_sim_rl_norway_tram_26
log_sim_rl_norway_tram_9
log_sim_rl_norway_car_4
log_sim_rl_norway_tram_37
log_sim_rl_norway_tram_54
log_sim_rl_norway_car_10
log_sim_rl_norway_tram_20
log_sim_rl_norway_ferry_20
log_sim_rl_norway_train_19
log_sim_rl_norway_bus_6
log_sim_rl_norway_ferry_6
log_sim_rl_norway_train_9
log_sim_rl_norway_car_2
log_sim_rl_norway_tram_23
log_sim_rl_norway_bus_3
log_sim_rl_norway_metro_4
log_sim_rl_norway_tram_32
log_sim_rl_norway_train_6
log_sim_rl_norway_tram_12
log_sim_rl_norway_train_5
log_sim_rl_norway_ferry_13
log_sim_rl_norway_tram_24
log_sim_rl_norway_ferry_4
log_sim_rl_norway_bus_22
log_sim_rl_norway_tram_40
log_sim_rl_norway_tram_1
log_sim_rl_norway_bus_15
log_sim_rl_norway_tram_36
log_sim_rl_norway_train_16
log_sim_rl_norway_train_17
log_sim_rl_norway_tram_42
log_sim_rl_norway_tram_45
log_sim_rl_norway_train_15
log_sim_rl_norway_tram_30
log_sim_rl_norway_metro_3
log_sim_rl_norway_ferry_12
log_sim_rl_norway_tram_11
log_sim_rl_norway_bus_9
log_sim_rl_norway_tram_56
log_sim_rl_norway_tram_38
log_sim_rl_norway_tram_35
log_sim_rl_norway_bus_23
log_sim_rl_norway_train_12
log_sim_rl_norway_train_4
log_sim_rl_norway_train_13
log_sim_rl_norway_tram_44
log_sim_rl_norway_car_9
log_sim_rl_norway_car_11
log_sim_rl_norway_metro_6
log_sim_rl_norway_metro_8
log_sim_rl_norway_tram_3
log_sim_rl_norway_tram_13
log_sim_rl_norway_train_20
log_sim_rl_norway_tram_52
log_sim_rl_norway_tram_16
log_sim_rl_norway_metro_2
log_sim_rl_norway_ferry_15
log_sim_rl_norway_car_7
/home/osboxes/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/osboxes/.local/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

Range of bandwidth the linear model is trained on?

Hey,

I noticed that in the bandwidth traces you provided, the average bandwidth is mostly less than 3000 Kbps. Is this the range of bandwidth the linear QoE model was trained for, or has it also been trained for higher bandwidths, such as 7000-9000 Kbps?

I ask this because I may need to evaluate Pensieve on slightly higher bandwidths and would like to know whether I need to retrain for that or not.

Thanks.

Question about pensieve

Dear Hongzi,

I have couples of questions about Pensieve.

[About reward]

  • Pensieve tests on 3 types of QoE, but each has a different parameter weighting the re-buffering penalty. Why did you set them differently, and how did you pick the actual values (4.3 for linear, 2.6 for log, 8 for HD)?

  • Pensieve uses a bitrate-to-HD-reward mapping to calculate QoE (HD) (e.g. 0.3 to 1, 0.75 to 2, ...). How did you pick the actual values (i.e. the parameters for each bitrate)?

[Minor]

  • In plot_results.py, the evaluation excludes the first chunk, as follows: reward_all[scheme].append(np.sum(raw_reward_all[scheme][l][1:VIDEO_LEN])).
    I agree it is a minor issue, but I want to know the reasons for this.

  • The default bitrate is set to 1, not the lowest quality 0. I think the latter is more common, so why does Pensieve choose 1 as the initial quality? It causes a fair amount of re-buffering in many of the traces you provided on Dropbox.

[Typo]

  • There is a mistake in calculating the log-scale reward. The original uses the highest bitrate, resulting in a negative value.
    original: # log_bit_rate = np.log(VIDEO_BIT_RATE[bit_rate] / float(VIDEO_BIT_RATE[-1]))
    modified: # log_last_bit_rate = np.log(VIDEO_BIT_RATE[last_bit_rate] / float(VIDEO_BIT_RATE[0]))

Thanks.
Hyunho

/pensieve/sim/rl_test.py Randomness in picking ACTION

Dear Hongzimao,

Hello, I succeeded in reproducing, training, and testing. But while training I found that the fluctuation in the test results is large (I put real-world data in /sim/cooked_test_traces and kept looking at /sim/results/log_test).
While looking into the code, there are some parts I don't clearly understand.

    action_prob = actor.predict(np.reshape(state, (1, S_INFO, S_LEN)))
    action_cumsum = np.cumsum(action_prob)
    bit_rate = (action_cumsum > np.random.randint(1, RAND_RANGE) / float(RAND_RANGE)).argmax()
    # Note: we need to discretize the probability into 1/RAND_RANGE steps,
    # because there is an intrinsic discrepancy in passing single state and batch states

In this part, which appears in both rl_test.py and multi_agent.py, as I understand it the code gives some chance of selecting actions that have small probability. But why don't we just pick the action with the highest probability?

Can you give a little help on this part.

By the way I had so much fun working with your paper.

Thank you
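For reference, the cumulative-sum construction quoted above draws an action in proportion to its predicted probability rather than always taking the argmax. A minimal NumPy sketch of the same idea (example values, not the repo's):

    import numpy as np

    action_prob = np.array([0.1, 0.6, 0.3])   # example policy output
    RAND_RANGE = 1000

    # Same construction as above: pick the first index whose cumulative
    # probability exceeds a uniform random threshold.
    threshold = np.random.randint(1, RAND_RANGE) / float(RAND_RANGE)
    bit_rate = (np.cumsum(action_prob) > threshold).argmax()

    # Equivalent, up to the 1/RAND_RANGE discretization, to sampling directly:
    bit_rate_alt = np.random.choice(len(action_prob), p=action_prob)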

About last_index MPC.py

Hi, Hongzi

Thank you for your great work.

I have a question about implementation in test/mpc.py

Shouldn't last_index in line 177 be the index of the last downloaded chunk?
last_index = int(CHUNK_TIL_VIDEO_END_CAP - video_chunk_remain)

If so, shouldn't last_index start at 0?

last_index is used while calculating the max QoE using the bandwidth prediction. But it seems the chunk sizes used in combo are shifted back by one index (last_index = index of the currently downloaded chunk). So I think it should run from 0 to 47, but in the code it starts from 1.

This causes a small bitrate difference in the middle of the video stream and usually matters for the last chunk: the original code on GitHub tries a higher bitrate at the end, but with my modification it does not try a higher bitrate, which results in less rebuffering. Actually, trying a higher bitrate does not give a benefit because of the smoothness term. The average reward changes by 2.6 (40.06 to 42.67 in simulation).

Changing lines 177-180 to the following:

    last_index = int(CHUNK_TIL_VIDEO_END_CAP - video_chunk_remain) - 1
    future_chunk_length = MPC_FUTURE_CHUNK_COUNT
    if (CHUNK_TIL_VIDEO_END_CAP - 1 - last_index < 5):
        future_chunk_length = CHUNK_TIL_VIDEO_END_CAP - last_index - 1
  • Also, in rl_server/mpc_server.py

In line 191-192

if ( TOTAL_VIDEO_CHUNKS - last_index < 5 ):
                    future_chunk_length = TOTAL_VIDEO_CHUNKS - last_index

shouldn't it be

if ( TOTAL_VIDEO_CHUNKS - last_index-1 < 5 ):
                    future_chunk_length = TOTAL_VIDEO_CHUNKS - last_index-1
  • Tested in the real world and on the 142-trace Dropbox test set, I found an average QoE increase of 1.3 with the fix, comparing the original robust_mpc_server.py code and the fixed code.

Thank you!

How did you generate dash.all.min.js in video_server?

I used grunt dist in the dash.js folder to generate a new dash.all.min.js, and found that the resulting dash.all.min.js differed from the one contained in the video_server folder. I am attempting to run Pensieve over some video files which require slight changes to the dash.js code, but I cannot "recompile" the dash.js code in video_server because I cannot find which dash.js code was used to generate the file actually used in the example testing.

Thanks in advance for any advice!

Pensieve and real experiments. How to use my own video sequences.

Hi!

I have integrated the Pensieve ABR algorithm (pensieve/real_exp/) into our AdViSE: Adaptive Video Streaming Evaluation Framework for the Automated Testing of Media Players (https://dl.acm.org/citation.cfm?id=3083221).
Now I use your pretrained model with the linear QoE (NN_MODEL = '../rl_server/results/pretrain_linear_reward.ckpt') and I get good results. Do you have any information about this model: how did you create it, and for what network conditions was it designed?

How can I conduct tests with other DASH content? What should I change in rl_server_no_training.py? I want to conduct some experiments with the Big Buck Bunny sequence including several representations.

Thank you!

Changing ENTROPY WEIGHT with training

Hi,

We are having a little trouble integrating the suggestion to adapt the ENTROPY_WEIGHT parameter as training proceeds. Could you describe, in a bit more detail, what implementation changes are needed for it?

Specifically, once training starts, how should we pass ENTROPY_WEIGHT as a parameter to the a3c.py module? As of now, it seems that ENTROPY_WEIGHT is only used once, when the ActorNetwork is initialized.

Further, as provided, the value of ENTROPY_WEIGHT in a3c.py is set to 0.5. Is this a static value that results in a reasonable model?

Thanks.
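One common way to make the weight tunable during training (not necessarily what the authors did) is to feed it as a placeholder instead of baking in a constant at graph-construction time. A rough sketch with illustrative names:

    import tensorflow as tf

    A_DIM = 6  # number of bitrate levels (illustrative)

    # Hypothetical sketch: expose the entropy weight so the training loop can decay it.
    entropy_weight = tf.placeholder(tf.float32, shape=[])
    logits = tf.placeholder(tf.float32, shape=[None, A_DIM])
    out = tf.nn.softmax(logits)                     # stands in for the actor's output

    entropy = -tf.reduce_sum(out * tf.log(out + 1e-6), axis=1)
    entropy_bonus = entropy_weight * tf.reduce_mean(entropy)
    # The actor objective would include entropy_bonus, and each training step
    # would feed the current value: sess.run(train_op, feed_dict={entropy_weight: w, ...})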

Some questions about running run_exp/run_all_traces.py

Of course, I can run ./testing/rl_no_training.py and plot a lot of pictures (there are also some problems with the other algorithms), but when executing ./run_exp/run_all_traces.py according to the README,

it stays on this line:
run_all_traces.py (67): proc_RL = subprocess.Popen(command_RL, stdout=subprocess.PIPE, shell=True)

It seems that this subprocess has some error, so I checked the called run_traces.py and ran just the BB algorithm using a new .py file. The error is that it stays on
py(29): proc_BB = subprocess.Popen(command_BB, stdout=subprocess.PIPE, shell=True)  # create a new subprocess

plot_results.py outputs nothing?

Hi,

I have a problem creating the charts. I followed your guide and generated test data in test/results via rl_no_training.py, but when I run plot_results.py, the charts turn out to be empty.

I tried inspecting the plot_results.py code, and I found that in the block starting on line 121, schemes_check is never true and thus reward_all never gets any data.

Is there any workaround or fix for that?

Thanks!!

About cooked_traces and mahimahi test

Dear Hongzi,

I have several questions about making traces.
How did you generate the traces from raw data to cooked traces, especially for Norway?

I generated Mahimahi traces from the raw HSDPA logs using the code you provided (convert_mahimahi_format.py, cut_mahimahi_chunks.py).
When we used these traces for the Mahimahi test, 3 out of 10 randomly selected traces failed, i.e. no video chunk was downloaded at all, or a couple of video chunks at the end were not downloaded because the runtime ran out.

When we use the test_norway_* traces from the cooked_traces on Dropbox that you provide, 1 out of 10 randomly selected traces failed.

It seems there can be really bad traces among the Mahimahi traces. I tried selecting traces that have a good average bandwidth, but this approach still has limitations.
For example, a trace could be really bad at the beginning but get better at the end, so the overall average bandwidth would still be high enough.
The failed mahimahi trace would look like this:
0
104
209
314
418
523
691
......

How did you manage to select testable Mahimahi traces?
And did you also have cases where the Mahimahi test failed for a specific trace?

Pretrain model policy

I have trained the model as in #30 by decaying the entropy from 1 to 0.1 over 150k iterations and obtained a model with a slightly inferior overall reward. On further inspection, it seemed like the only difference between the two policies was that Pensieve nearly always avoided video 2 (2850 kbps) and video 4 (1200 kbps).
Any intuition as to why this happens with the pretrained model?

Also, in #30 you said to choose the best-performing model from the validation-set results (using mean reward as the metric). Now, what if the highest reward among the first 20k iterations occurs at 2k iterations? Do you pick that model? Any insights would be appreciated.

env.py delay noise implementation

Hi,

Checking env.py I realised you added some noise to the total delay as follows:

 delay *= MILLISECONDS_IN_SECOND
 delay += LINK_RTT
 delay *= np.random.uniform(NOISE_LOW, NOISE_HIGH)

I would like to know why you added noise to the total delay.

I understand the noise for the propagation delay (LINK_RTT), in order to avoid overfitting to 80 ms, since it is constant for the whole training. However, the transmission delay always depends on the link's throughput, which is taken from the traces and changes every second. The throughput values are real values achieved by apps in the wild, and in principle adding noise to them wouldn't make sense to me.

Thank you so much!

How to run the script on the GPU?

Hi:
There are four GPU cards in the machine. I wanted to run the script on the GPU, so I modified os.environ['CUDA_VISIBLE_DEVICES']='0' in sim/multi_agent.py and rl_server/rl_server_no_training.py. But sim/multi_agent.py and run_exp/run_all_traces.py still ran on the CPU. Did I modify the wrong scripts? What should I do?
Thanks for your help.

Ensuring Latency

Dear Hongzi Mao,

While looking at the network log using Chrome, I noticed the following:

i. The video playback client was contacting the video server using the server's IP. This means that using run_exp/run_traces.py allows packets to go from inside the Mahimahi shell to the outside and back, which ensures the 80 ms latency that was set.

ii. The video playback client (i.e. dash.all.min.js) was contacting the ABR server using "localhost:8333". But contacting localhost bypasses Mahimahi's emulation, so the delay will only be due to computation. I tested this by pinging an Apache 2 server running on my own PC:
a. sudo sysctl net.ipv4.ip_forward=1
b. mm-delay 2000
c. ping -c 5 -n localhost
d. ping -c 5 -n 10.0.2.15

Ping on localhost took avg RTT of a few ms while that from 10.0.2.15 (my VM's IP address) took 4s plus a few ms.

Table 4 of the paper suggests a range of latencies was tested. Was that achieved by using a different version of dash.all.min.js than that in this repository?

readable dash.all.min.js

Hi,

I was wondering if you could share a readable version of //pensieve/video_server/dash.all.min.js so I could make some modifications to the Pensieve player.

Thanks!

Some question with the code

Dear sir:
I am a student at the Communication University of China, and I have read your paper about Pensieve. Your research is very appealing, and I have some questions about the code:
1. Traditional client-based ABR methods just pick the video bitrate inside dash.all.min.js; they should not need to connect to a local server to calculate the quality of the next chunk. In your approach, even the traditional client-based ABR algorithms (BB, BOLA) also use the local server, while the calculation could in fact be done inside dash.all.min.js. Why do it this way?

2. How was the provided video_server/dash.all.min.js generated? I directly used grunt to re-generate dash.js from your dash.js source, and the result is not the same as the provided file. Why is this?

3. Mahimahi is a tool used to limit the speed of the local network, right? And it should only be used as a non-root user, right? When running run_exp/run_all_traces.py, it reaches the child process mm-delay 40 mm-link 12mbps trace python simple_server.py, and Google Chrome reports that the connection to localhost:8333 was refused. If I run python simple_server.py directly, there is no error. Why is this?

QoE performance

Dear Hongzi:

I did many experiments based on Pensieve's source code, but I cannot get performance equivalent to that reported in the SIGCOMM paper (outperforming robustMPC by 12%-25%).

Below is the result:

At first, I used the pre-trained model (i.e. pretrain_linear_reward.ckpt) provided in the source code to test with two sets of trace data (train_sim_traces and test_sim_traces) and ENTROPY_WEIGHT=0.5:

[Fig. 1 and Fig. 2: test results with the authors' pre-trained model]

We can see that Pensieve outperformed robustMPC by about 6-7%.

Second, I did the training myself. I fixed the bug mentioned in #20 and followed the ENTROPY_WEIGHT tuning strategy in #11. I also selected the model based on a validation set (part of the trace data provided in the source code) to avoid the fluctuation issue in #28. The QoE function is the linear one:

[Fig. 3: test results with my own trained model]

We can see that the final testing performance is similar to that in #11, but much worse than the performance in the SIGCOMM paper.

Did I do something wrong, or is there something important I didn't do, such that I couldn't get the same result as you described in the paper?
Can you please give me a hand with these questions? Any answer is highly appreciated.

Thanks a lot

pensieve/run_exp/ : page reloaded while run_video.py

Hi, hongzi
While performing tests in run_exp I met some errors.
First, I set up the environment as described (Ubuntu 16.04, Tensorflow v1.1.0, TFLearn v0.3.1 and Selenium v2.39.0).
I found that run_traces.py alone works well (I checked the apache2 log file).
But run_all_traces.py failed to work (it seems nothing is working).
There is also an issue when running run_traces.py. As it waits for 320 seconds in the test, after downloading all of the video it seems to reload the page and download the video again. (I checked the apache2 log for video chunk requests: 84 requests in total; also, in ./results/log_trace there are 84 lines of log, 49 created while downloading the whole video and 35 created while downloading again until it is killed.) Can you give a little help with solving this?

Thank you
