
deeprm's People

Contributors

hongzimao


deeprm's Issues

License for the repository

Hi,

This is excellent work and I would like to carry out some experiments (non-commercial, of course) using this repository. Could you please provide a license for it?

Thanks

What is max_track_since_new?

Mr. Mao,
Could you explain what exactly the parameter below does in the parameters class? (I would appreciate it if you could give an example.)

self.max_track_since_new = 10 # track how many time steps since last new jobs
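For readers hitting the same question, here is a minimal sketch of how a "time steps since the last new job" counter could be maintained and clipped at max_track_since_new, assuming the normalized value ends up in the observation. The class and method names are illustrative, not taken from the repository.

class TimeSinceNewJobTracker:
    """Hypothetical helper; not the repository's implementation."""
    def __init__(self, max_track_since_new=10):
        self.max_track = max_track_since_new   # cap so the feature stays bounded
        self.steps_since_new = 0

    def step(self, new_job_arrived):
        if new_job_arrived:
            self.steps_since_new = 0
        elif self.steps_since_new < self.max_track:
            self.steps_since_new += 1

    def feature(self):
        # normalized value in [0, 1] that the agent could observe
        return self.steps_since_new / float(self.max_track)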

Generating new datasets

In the paper, there is a discussion of the synthetic workload. If I want to use a different dataset that reflects my own workload, I think I need to create my own version of generate_sequence_work. Is this the correct place to make the change?
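A minimal sketch of a drop-in replacement, assuming generate_sequence_work returns a job-length sequence and a per-resource job-size sequence (as the original appears to); parameter names such as pa.new_job_rate, pa.max_job_len, and pa.max_job_size follow the repository's parameters class but should be checked against your copy.

import numpy as np

def generate_sequence_work_custom(pa, seed=42):
    # Hedged sketch: keep the original function's return shape, but draw
    # lengths and sizes from your own trace statistics instead of these draws.
    np.random.seed(seed)
    total_len = pa.simu_len * pa.num_ex

    nw_len_seq = np.zeros(total_len, dtype=int)
    nw_size_seq = np.zeros((total_len, pa.num_res), dtype=int)

    for i in range(total_len):
        if np.random.rand() < pa.new_job_rate:  # a job arrives at this step
            nw_len_seq[i] = np.random.randint(1, pa.max_job_len + 1)
            nw_size_seq[i, :] = np.random.randint(1, pa.max_job_size + 1, size=pa.num_res)

    nw_len_seq = np.reshape(nw_len_seq, [pa.num_ex, pa.simu_len])
    nw_size_seq = np.reshape(nw_size_seq, [pa.num_ex, pa.simu_len, pa.num_res])
    return nw_len_seq, nw_size_seq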

Slowdown for cluster load 10% to 190%

Dear Hongzi,

I am trying to reproduce all the results reported in the paper. From the source code, it is unclear how to plot the slowdown for cluster loads from 10% to 190%. When I run run_script.py, I can see the generated logs, but nothing corresponds to Figure 4.

Could you please explain in detail how you plot the slowdown for cluster loads from 10% to 190%? From the source code, it appears that you rely on the job arrival rate from 0.1 to 1.0 to vary the load from 10% to 190%, but when I varied only the job rate from 0.1 to 1.0, the slowdown stayed constant from 100% up to 190%.

  1. It would be great if you could say a few words about how the load is varied from 10% to 190%.
  2. Could you also tell me how to reproduce Figure 4, or how to work with the generated logs to produce it?

Thank You.
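For anyone with the same reproduction question, here is a hypothetical harness (not code from this repository) for collecting per-load mean slowdowns once the mapping from "cluster load" to generator parameters is decided. pa is the repository's parameters object; set_cluster_load and evaluate_policy are placeholders that must be filled in.

import numpy as np

loads = np.arange(0.1, 2.0, 0.1)                  # target load 10% .. 190%
mean_slowdown = {}
for load in loads:
    set_cluster_load(pa, load)                    # placeholder: adjust new_job_rate and/or job sizes
    slowdowns = evaluate_policy(pa, net_file='data/pg_re_1600.pkl')  # placeholder evaluator
    mean_slowdown[load] = np.mean(slowdowns)

for load in sorted(mean_slowdown):
    print('load %.0f%%: mean slowdown %.2f' % (load * 100, mean_slowdown[load]))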

Modifications on Reward Signal

Following the recommendation to post an (adapted) e-mail conversation on the issues page so that others can also learn from it and discuss, this thread is about studying how inaccuracies in jobs' runtimes could affect the RL agent and the overall scheduler performance:

I'm very interested in investigating how well the reinforcement learning agent performs under different job models, starting with jobs of uncertain length, i.e., experimenting with some of the partial-observability discussion in Section 5 of the paper. Because casting the problem as a POMDP would require a set of conditional observation probabilities, I thought of a preliminary methodology that instead randomly chooses the reward as either the original one, which uses the true job length, or a modified one, which uses the true job length + 1 in its calculations. With this I plan to test the RL agent's robustness to some uncertainty in job length.
I saw that you were very responsive in answering questions in the GitHub issues section (reading your answers helped me a lot in understanding the code) and decided to write this e-mail to ask whether, if you have the time, you could point out any immediate methodological flaws you see in my approach. I really appreciate any thoughts you can provide.
Sincerely,
Vinícius [...]

Hi Vinicius,

I see what you are trying to do. The high-level goal of training a robust agent makes pretty decent sense. I wonder whether a consistent +1 in the reward will create enough disturbance. You might want to perturb the reward signal with noise sampled from some distribution (which can have some bias, as in your +1 case). You can vary the distribution and see how it affects the system.

It would be nice if you could post this on the GitHub issues page so that others can also learn from it.

Thanks,
Hongzi

Since then I've had some very interesting results creating disturbance in the reward using normal distributions, i.e., perturbing the reward with noise drawn from a normal distribution. My intention is to also check uniform and half-normal distributions, since it is known that users' runtime estimates are almost always overestimates, although some very interesting concerns and issues are appearing:

  1. Depending on the distribution and its parameters, the performance could be heavily influenced by the workload model, e.g. N(1,1²) is a higher uncertainty in percentage terms than N(15,1²). (Some carefully picked methods for introducing estimation errors can be found in Section II.A of this paper.)
  2. The original workload has a job distribution in which many jobs have duration 1 (80% of the jobs have a duration chosen uniformly between 1t and 3t); setting up the disturbances carelessly could make the estimated runtime negative (see the clipping sketch below).
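A minimal sketch of the perturbation idea, assuming the per-step reward is the paper's -sum(1/T_j) over jobs currently in the system and that T_j is replaced by a noisy estimate; clipping at 1 addresses the negative-runtime concern in point 2. The names (j.len, the noise parameters) are illustrative.

import numpy as np

def noisy_length(true_len, mu=0.0, sigma=1.0, rng=np.random):
    # perturb the job length and clip so short jobs never get a non-positive estimate
    return max(1.0, true_len + rng.normal(mu, sigma))

def perturbed_reward(jobs_in_system, mu=0.0, sigma=1.0):
    # paper-style reward, -sum(1 / T_j), with T_j replaced by its noisy estimate;
    # j.len stands for whatever field holds the job's true length
    return -sum(1.0 / noisy_length(j.len, mu, sigma) for j in jobs_in_system)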

It takes about 5 minutes to finish one epoch...

Hi, I have read your paper and want to give it a try, but when I run the demo command, the training process seems to be very slow.
The demo command:

# python launcher.py --exp_type=pg_su --simu_len=50 --num_ex=1000 --ofile=data/pg_su --out_freq=10

And here is the output log:

Epoch 1 of 10000 took 297.221s
  training loss:    		0.838964
  training accuracy:		78.30 %
  test loss:        		0.802981
  test accuracy:    		79.96 %
...

So if each epoch takes 5 minutes, the whole 10000 epochs will take about one month. Is this normal?
When I ran the command, I also found that the process used only one physical core of my server machine; this seems to be the key cause of the slow training.
Any suggestions to make the training process faster? Or is there anything wrong with my understanding of the training process?
Thanks in advance!
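One hedged thing to try (not an official fix): Theano and the underlying BLAS pick up their thread counts from environment variables, so raising them before theano is imported may use more cores; whether it actually helps depends on your BLAS build. Reducing --num_ex or --simu_len also shortens each epoch.

import os
os.environ.setdefault('OMP_NUM_THREADS', '8')                        # BLAS/OpenMP thread count
os.environ.setdefault('THEANO_FLAGS', 'openmp=True,floatX=float32')  # must be set before importing theano

import theano  # imported only after the flags above are in place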

Multiple Questions on Paper/Development

Dear Hongzi,

Following your suggestion, I have opened this issue to share the questions and answers we exchanged via email. Further to the information provided below, I just wanted to clarify whether running the supervised part of the experiment is optional. The way I see it, it is probably better to provide the agent with some kind of heuristic policy in order to "kick off" its learning process. Following this assumption, I can see that you used a specific pkl file generated during the supervised stage to feed into the reinforcement learning stage. How did you select it? Did you compare the accuracy and error of the training vs. test sets and use the one with the smallest difference? Similarly, did you choose a specific pkl file generated during the reinforcement learning process, i.e. the one at iteration 1600, because the algorithm had already converged after 1000 iterations? Last, I just wanted to clarify whether using a larger working space is expected to increase the complexity of the algorithm, and whether, once the backlog is full, any further incoming jobs are simply rejected.

I apologize for all the questions ;) I do hope they help further too. Thank you so much!

Kind regards,
A.

Suggested basic RL reading: https://docs.google.com/document/d/1H8lDmHlj5_BHwaQeGSXfyjwf4ball9f1VutNBXCOsJE/edit?usp=sharing

Q1: What is the difference between the 1st and the 2nd type of training, i.e. --exp_type=pg_su vs. pg_re? As far as I understand, the 1st one is used to create a set of num_ex examples, each consisting of a number of jobs that arrive within a given timeframe (episode_max_length) and are scheduled using the SJF algorithm. The results are then fed into the DeepRM algorithm to adjust the weights of the network/parameters. The 2nd one is used to train the RL algorithm, starting from the DNN weights learned above and using the defined penalties.

Answer: You are basically correct. The first type is supervised learning, where we generate state-action pairs from an existing heuristic (e.g., SJF) and ask the agent to mimic that policy. The second type is RL training—the agent explores, sees which policy is better, and automatically adjusts its policy parameters to obtain larger rewards.
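A minimal sketch of the supervised (pg_su) stage described in this answer, assuming states are labeled with the action a heuristic such as SJF would take and the network is then fit with cross-entropy; env, heuristic, and the method names are placeholders, not the repository's API.

def collect_supervised_batch(env, heuristic, batch_size):
    states, labels = [], []
    for _ in range(batch_size):
        s = env.observe()
        a = heuristic(env)        # "expert" action from the heuristic scheduler
        states.append(s)
        labels.append(a)
        env.step(a)
    return states, labels         # fed to a standard cross-entropy classification loss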

Q2: How did you decide what type of network to use? You have used a DNN with one dense, hidden layer of 20 neurons. Were there any particular reasons for these choices? Have you tried different variations of them?

Answer: We did some parameter search, but not too much. As long as the model is rich enough to express strong scheduling policies (e.g., it can learn existing heuristics with supervised learning), we use that network model for RL.
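For reference, a sketch in Lasagne of a network with the shape described here (one fully connected hidden layer of 20 units, softmax over the scheduling actions); input_height, input_width, and num_actions are placeholders, and the layer choices are illustrative rather than copied from the repository.

import lasagne
import theano.tensor as T

states = T.tensor4('states')   # batch of state images
l_in = lasagne.layers.InputLayer(shape=(None, 1, input_height, input_width), input_var=states)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=20, nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hid, num_units=num_actions, nonlinearity=lasagne.nonlinearities.softmax)
prob_act = lasagne.layers.get_output(l_out)   # action probabilities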

Q3: Was there any problem with overfitting the data and if so, would you have any further suggestions on this issue?

Answer: If the system dynamics change dramatically, there will be overfitting. In our paper, we evaluate on different job combinations but those jobs were generated from the same distribution. You might need to adapt (from a meta-learned policy) or learn a family of robust policies if you need the policy to work well with distribution shift.

Q4: I would expect the number of input neurons to be equal to (res_slot + max_job_slot * num_nw) * num_res. However, you also take into account the backlog_width and a bias. Could you please explain why you made that decision and what its purpose is? Also, what does backlog_width represent? I understand that the backlog is used to store jobs that arrive for service but cannot fit in the current working space, but I don't understand why it is just a number, why it is important to include it as an input to the DNN, and why the extra jobs are not stored in, e.g., a file for later use.

Answer: The backlogged jobs are represented only as a count. The DNN needs a rough number in order to know the current system load, so we provide just that count for the neural network to handle. The full job information is kept in the environment (it's just that the agent doesn't see it).
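A small sketch of the input-width arithmetic being discussed; the names follow the question above, and the exact formula should be checked against parameters.py.

def network_input_width(res_slot, max_job_slot, num_nw, num_res, backlog_width):
    # cluster image plus one image per visible job slot, per resource type,
    # then the backlog summary column and one extra indicator column
    per_resource = res_slot + max_job_slot * num_nw
    return per_resource * num_res + backlog_width + 1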

loss function (In Policy Gradient section), optimizer and entropy

Dear Mr. Hongzi,
I am interested in your resource scheduling method, but I am stuck in your network class. I can't understand why you used the function below:
loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N
Is this a special loss function you derived? If not, what is the name of this loss function?
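For readers with the same question: this is not a specially derived loss but the standard REINFORCE (policy gradient) surrogate objective, assuming values holds the (baseline-subtracted) cumulative rewards of the sampled actions:

\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \log \pi_\theta(a_i \mid s_i)\, v_i

Differentiating this expression with respect to the network parameters gives the usual policy gradient estimate, so maximizing it pushes probability toward actions that received larger returns.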

Discrepancy Between Graphs

Hey there,

I tried to regenerate the graphs depicted in the article, but Tetris shows significantly lower performance than expected based on those graphs.
Another issue is that Packer is absent from all of the generated graphs (why?).
DeepRM's (or PG's, in this case) average job slowdown is asymptotically around 2 in the article's graphs, while in mine it is around 4.
All training and testing was done using the default commands described in the README.md file.
Has anyone here been able to reproduce the exact results described in the article? I'd be thankful for any help with this issue.

pg_re_lr_curve.pdf

Regards

Issue running examples

Hi,

I'm trying to rerun your experiments in order to recreate the graphs. However, I'm getting errors when running the code from GitHub.

I'm running the command presented there:
python launcher.py --exp_type=pg_re --pg_re=data/pg_su_net_file_20.pkl --simu_len=50 --num_ex=10 --ofile=data/pg_re

But I'm getting an error:
Traceback (most recent call last):
File "launcher.py", line 10, in
import pg_su
File "/home/arik/deeprm/pg_su.py", line 12, in
np.set_printoptions(threshold='nan')
File "/home/arik/.local/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 246, in set_printoptions
floatmode, legacy)
File "/home/arik/.local/lib/python2.7/site-packages/numpy/core/arrayprint.py", line 93, in _make_options_dict
raise ValueError("threshold must be numeric and non-NAN, try "
ValueError: threshold must be numeric and non-NAN, try sys.maxsize for untruncated representation

The versions:

pip list
asn1crypto (0.24.0)
backports.functools-lru-cache (1.4)
cryptography (2.1.4)
cycler (0.10.0)
decorator (4.1.2)
enum34 (1.1.6)
idna (2.6)
ipaddress (1.0.17)
keyring (10.6.0)
keyrings.alt (3.0)
Lasagne (0.2.dev1)
matplotlib (2.1.1)
nose (1.3.7)
numpy (1.16.2)
olefile (0.45.1)
Pillow (5.1.0)
pip (9.0.1)
pycrypto (2.6.1)
pygobject (3.26.1)
pyparsing (2.2.0)
python-dateutil (2.6.1)
pytz (2018.3)
pyxdg (0.25)
scipy (1.2.1)
SecretStorage (2.3.1)
setuptools (39.0.1)
six (1.12.0)
subprocess32 (3.2.7)
Theano (1.0.4)
wheel (0.30.0)

Do you know how to fix it?

Thanks.
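A hedged note rather than an official patch: the ValueError comes from newer NumPy versions rejecting threshold='nan', and the traceback itself suggests the fix, so editing the call in pg_su.py along these lines should get past it:

import sys
import numpy as np

np.set_printoptions(threshold=sys.maxsize)   # instead of np.set_printoptions(threshold='nan')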

RL agent results

Hello sir,
I converted your code, but now I need to make sure my conversion is correct. Could you post a picture of the results (Monte Carlo) that your code prints (iteration, numtrajs, numtimesteps, loss, ...)?

Issue running example

Hi,
I'm running the example for the very first time, using the command presented there:

python launcher.py --exp_type=pg_re --pg_re=data/pg_su_net_file_20.pkl --simu_len=50 --num_ex=10 --ofile=data/pg_re

But I'm getting an error:
Traceback (most recent call last):
File "launcher.py", line 163, in
main()
File "launcher.py", line 149, in main
pg_re.launch(pa, pg_resume, render, repre='image', end='all_done')
File "/home/k8s-master/Desktop/Deeprm/deeprm-master/pg_re.py", line 315, in launch
for r in manager_result:
File "", line 2, in getitem
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 308, in SocketClient
s.connect(address)
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused

Do you know where I went wrong, and how to fix it?
Thanks.

Some unprofessional questions in my mind

Hello, I am a college student. I have read your paper and also run your code. At present I have some questions; they may not be professional, but I hope to get your answers.

  1. What is the difference between "launch supervised learning for policy estimation" and "launch policy gradient using network parameter just obtained"? What do the generated files such as pg_su_net_file_0.pkl and pg_re_10.pkl store? What are pg_su and pg_re short for?
  2. Which parameters in the paper do simu_len and num_ex in the command correspond to?
  3. When I run the first command "python launcher.py --exp_type=pg_su --simu_len=50 --num_ex=1000 --ofile=data/pg_su --out_freq=10", my computer has 8 GB of memory but can only run 8 epochs before reporting that it is out of memory. I saw in another Q&A that you suggested changing to --num_ex=10; I tried it and it could only run 640 epochs, but the gap between 640 and 10000 is too large. What did I do wrong? Will my results differ from the paper's results if I use --num_ex=10?
  4. What are the differences and relationships between an epoch in the code and the iteration, jobset, and episode in the paper? I'm confused by these concepts.
  5. The paper says "a fully connected hidden layer with 20 neurons, and a total of 89,451 parameters". Do you mean your neural network has only one hidden layer, with 20 neurons in that layer? Why are there 89,451 parameters? (A parameter-count sketch follows this post.)

Thanks for your answer. :)
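On question 5, a hedged sketch of the parameter-count arithmetic for a single fully connected hidden layer (weights plus biases); the input dimension comes from the flattened state image, which is why the total reaches tens of thousands, and the default values shown are illustrative rather than the repository's exact settings.

def num_parameters(input_dim, hidden=20, num_actions=6):
    # (input -> hidden weights and biases) + (hidden -> output weights and biases)
    return (input_dim + 1) * hidden + (hidden + 1) * num_actions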


question

Hello author,
What does time_horizon mean? Actually, I don't understand what the graph means.
Another question: how is time defined? You use current_time in place of the actual time, so how does time move forward (or increase)?
Thank you for open-sourcing this; it is the first code I could find for resource management using DRL. Thank you very much.
I am a student at Beijing Jiaotong University; my email is [email protected], so we can also talk by email. Looking forward to your reply.
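On the time question, a hedged sketch of the stepping behavior described in the paper: within one "real" timestep the agent may schedule several jobs, and the clock (current_time) only advances when it picks the void action or an action that cannot be scheduled. VOID_ACTION, env.schedule, and env.proceed are placeholders, not the repository's names.

def one_timestep(env, policy):
    while True:
        action = policy(env.observe())
        scheduled = env.schedule(action)          # placeholder: try to allocate the chosen job
        if action == VOID_ACTION or not scheduled:
            env.current_time += 1                 # time moves forward by one step
            env.proceed()                         # placeholder: admit arrivals, advance running jobs
            break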

Multiple machines

Hi,
I'd like to test some scheduling algorithms on multiple machines. How can I set the number of machines, ranging from 1 to 5, in the code?
Thanks,
