Create cloud simulators in Scala for evaluating executions of applications in cloud datacenters with different characteristics and deployment models.
The main goal of this homework is to create cloud simulations with multiple datacenters and different configurations, and analyze the simulations.
Tools: IntelliJ IDEA 2021.2.1 (Community Edition), JDK 11.0.12, Scala 2.13.6, sbt 1.5.2
- Open terminal
- Type the following command
git clone https://github.com/gnzeleven/CloudOrgSimulatorScala.git
- Open IntelliJ IDEA
- File -> Open Folder -> CloudOrgSimulatorScala
- Navigate to src/main/scala/Simulations/, open Simulation.scala and run it from the IDE
- Open terminal
- Type the following command
git clone https://github.com/gnzeleven/CloudOrgSimulatorScala.git
cd CloudOrgSimulatorScala
- Run the following commands
sbt clean
sbt compile
sbt run
- To run the tests
sbt test
There are two main sections: Main and Test.
Main has the following modules:
- HelperUtils
  - CreateLogger.scala - Used for logging
  - ObtainConfigReference.scala - Used for getting configuration parameters
  - Utils.scala - Utility functions
- Simulations
  - BasicCloudSimPlusExample.scala - Already existing file in the repository (naive datacenter implementation)
  - BasicFirstExample.scala - Created as a rough draft to learn Scala and play around with CloudSim Plus
  - Simulation1.scala - Simulates multiple simple datacenters vs datacenters with network topology
  - Simulation2.scala - Simulates various services (IaaS, PaaS, SaaS) with multiple datacenters
  - Simulation3.scala - Simulates cloudlet scheduler policies (time shared vs space shared) with multiple datacenters
  - Simulation.scala - Class with the main function. This is where the flow starts. It calls the start method of the simulation selected by the variable SIM_TITLE
Test has the following classes:
- BasicCloudSimPlusExampleTestSuite.scala - Example test class
- SimulationsTestSuite.scala - Contains test methods for various functionalities in the simulations, such as recursive VM creation, number of datacenters, number of hosts, number of VMs, configuration parameters, number of PEs in a host, length of cloudlets, whether the datacenter is created with a network topology, and so on
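The dispatch done in Simulation.scala can be pictured roughly as below. This is only a minimal sketch and not the repository's code: the title strings, the `runSimulation` helper and the stub start methods are assumptions made for illustration. Only the variable name `SIM_TITLE` and the idea of calling each simulation's start method come from the description above.

```scala
object SimulationDispatchSketch {
  // Hypothetical stand-ins for the start methods of the real simulations.
  def startSimulation1(isNetwork: Boolean): String = s"Simulation1(isNetwork=$isNetwork)"
  def startSimulation2(): String = "Simulation2"
  def startSimulation3(): String = "Simulation3"

  // Chooses which simulation to start from a title string.
  // The accepted title values are assumed, not taken from the repository.
  def runSimulation(simTitle: String): Either[String, String] = simTitle match {
    case "simulation1" => Right(startSimulation1(isNetwork = true))
    case "simulation2" => Right(startSimulation2())
    case "simulation3" => Right(startSimulation3())
    case other         => Left(s"Unknown simulation title: $other")
  }

  def main(args: Array[String]): Unit = {
    val SIM_TITLE = "simulation3"
    println(runSimulation(SIM_TITLE))
  }
}
```

Returning an `Either` keeps an unknown title from failing silently, which is handy when the title is edited by hand.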
To avoid hard-coded values in the code, the configuration parameters are provided in src/main/resources/application.conf and are read through the ObtainConfigReference helper object. This makes the code more reliable and easier to read. It is generally considered good practice to read parameters from a configuration file. Below are some of the parameters used in the simulations:
Simulation1
- Datacenters: 3
- Hosts in the first datacenter: 1 | Processing Entities: 4 | MIPS: 5000
- Hosts in the second datacenter: 2 | Processing Entities: 4 | MIPS: 5000
- Hosts in the third datacenter: 1 | Processing Entities: 4 | MIPS: 5000
- VM: 8 | Processing Entities: 1
- Cloudlets: 8 | Utilization Ratio: 0.5 | Length: 10000 | Processing Entities: 4
Simulation2
- Datacenters: 3
- Hosts in the first datacenter: 2 | Processing Entities: 6 | MIPS: 100000
- Hosts in the second datacenter: 2 | Processing Entities: 4 | MIPS: 12000
- Hosts in the third datacenter: 2 | Processing Entities: 8 | MIPS: 80000
- VM: 8 | Processing Entities: 4 | MIPS: 4000
- Cloudlets: [4, 8, 12] | Utilization Ratio: 0.5 | Length: [10000, 50000, 100000] | Processing Entities: [2, 3, 4]
Simulation3
- Datacenters: 3
- Hosts in the first datacenter: 1 | Processing Entities: 6 | MIPS: 5000
- Hosts in the second datacenter: 2 | Processing Entities: 4 | MIPS: 3000
- Hosts in the third datacenter: 1 | Processing Entities: 8 | MIPS: 10000
- VM: 6 | Processing Entities: 2 | MIPS: 2500
- Cloudlets: 10 | Utilization Ratio: 0.5 | Length: 100000 | Processing Entities: 2
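As a quick sanity check on the Simulation3 numbers listed above, the PEs on offer can be compared with the PEs requested. The sketch below assumes that "Processing Entities" and "MIPS" are per host and per PE respectively, which the list does not state explicitly. The `HostGroup` type and all other names are made up for this example.

```scala
object CapacitySketch {
  // A group of identical hosts in one datacenter.
  // Assumption: the listed PE count is per host and the listed MIPS is per PE.
  final case class HostGroup(hosts: Int, pesPerHost: Int, mipsPerPe: Long) {
    def totalPes: Int = hosts * pesPerHost
    def totalMips: Long = totalPes * mipsPerPe
  }

  // Simulation3 values copied from the parameter list.
  val simulation3Hosts: List[HostGroup] = List(
    HostGroup(hosts = 1, pesPerHost = 6, mipsPerPe = 5000L),
    HostGroup(hosts = 2, pesPerHost = 4, mipsPerPe = 3000L),
    HostGroup(hosts = 1, pesPerHost = 8, mipsPerPe = 10000L)
  )
  val vmCount = 6
  val pesPerVm = 2
  val cloudletCount = 10
  val pesPerCloudlet = 2

  val hostPes: Int = simulation3Hosts.map(_.totalPes).sum
  val hostMips: Long = simulation3Hosts.map(_.totalMips).sum
  val vmPes: Int = vmCount * pesPerVm
  val cloudletPes: Int = cloudletCount * pesPerCloudlet

  // True when the cloudlets ask for more PEs than all VMs together provide.
  val cloudletsOversubscribeVms: Boolean = cloudletPes > vmPes

  def main(args: Array[String]): Unit =
    println(s"host PEs=$hostPes, VM PEs=$vmPes, cloudlet PEs=$cloudletPes, oversubscribed=$cloudletsOversubscribeVms")
}
```

Under these assumptions the cloudlets request 20 PEs while the VMs provide 12, so Simulation3 is deliberately oversubscribed.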
Just like the configuration settings, the pricing settings are also written in the configuration file. The price varies from simulation to simulation and, within a simulation, from datacenter to datacenter. Note that the processing cost is charged per second and therefore varies with the execution time, while the RAM, storage and bandwidth costs remain constant.
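To make the pricing structure concrete, here is a minimal sketch of how such a bill could be added up. It is not the repository's cost code: the `Pricing` and `VmUsage` types, the units, and every number used are illustrative assumptions. The only thing taken from the text is the split into a per-second cost plus RAM, storage and bandwidth costs.

```scala
object CostSketch {
  // Prices of one datacenter. Field names and units are assumptions for this example.
  final case class Pricing(costPerSecond: Double, costPerRamMb: Double,
                           costPerStorageMb: Double, costPerBwMbps: Double)

  // Resources used by one VM. Only execSeconds changes from run to run.
  final case class VmUsage(execSeconds: Double, ramMb: Double, storageMb: Double, bwMbps: Double)

  // Processing cost grows with execution time; the other three terms are fixed
  // for a given VM configuration.
  def totalCost(p: Pricing, u: VmUsage): Double =
    p.costPerSecond * u.execSeconds +
      p.costPerRamMb * u.ramMb +
      p.costPerStorageMb * u.storageMb +
      p.costPerBwMbps * u.bwMbps

  def main(args: Array[String]): Unit = {
    val pricing = Pricing(0.5, 0.25, 0.125, 0.5) // made-up prices
    val usage = VmUsage(execSeconds = 40.0, ramMb = 512.0, storageMb = 1000.0, bwMbps = 100.0)
    println(totalCost(pricing, usage))
  }
}
```

In this model, doubling the execution time only raises the processing term, which is why a scheduling policy that stretches execution time shows up directly in the bill.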
The start method in Simulation1 expects a parameter isNetwork, which is set to true or false depending on whether a network topology is needed. Notice that this simulation uses recursion, unlike the other simulations.
If isNetwork is true, the datacenters are created with a network topology attached. Below is the piece of code that sets up the topology:
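def setUpNetworkTopology(dc: DatacenterSimple, broker0: DatacenterBrokerSimple): Unit = {
  // Link bandwidth and latency are read from the configuration file
  val bw = config.getDouble("simulation1.network.bw")
  val latency = config.getDouble("simulation1.network.latency")
  // Create an empty topology and attach it to the simulation
  val networkTopology: NetworkTopology = new BriteNetworkTopology()
  simulation.setNetworkTopology(networkTopology)
  // Connect the datacenter and the broker with the configured bandwidth and latency
  networkTopology.addLink(dc, broker0, bw, latency)
}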
The comparison is explained in the analysis section.
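The recursion mentioned for Simulation1 could take a shape like the one below. This is a sketch under assumptions, not the repository's implementation: `VmSpec` is a hypothetical stand-in for a CloudSim Plus VM, `createVms` is an invented name, and the MIPS value in the usage is a placeholder because the Simulation1 list does not give one.

```scala
import scala.annotation.tailrec

object RecursiveCreationSketch {
  // Hypothetical stand-in for a VM, holding only the fields needed here.
  final case class VmSpec(id: Int, pes: Int, mips: Long)

  // Builds `count` VM specs with ids 0 until count, without loops or mutable state.
  def createVms(count: Int, pes: Int, mips: Long): List[VmSpec] = {
    @tailrec
    def loop(remaining: Int, acc: List[VmSpec]): List[VmSpec] =
      if (remaining <= 0) acc
      else loop(remaining - 1, VmSpec(remaining - 1, pes, mips) :: acc)
    loop(count, Nil)
  }

  def main(args: Array[String]): Unit =
    // 8 VMs with 1 PE each, as in Simulation1; 1000 MIPS is a placeholder value.
    createVms(count = 8, pes = 1, mips = 1000L).foreach(println)
}
```

The `@tailrec` annotation makes the compiler check that the recursion is compiled into a loop, so large counts do not overflow the stack.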
The second simulation simulates the three major services IaaS/PaaS/SaaS offered by Cloud Service Providers.
credit: https://develux.com/blog/iaas-paas-saas-cloud-model-for-business
In IaaS, the customer chooses the os, vmm and architecture as well as the cloudlet configuration. This is simulated by a random function that picks one of the available options for each parameter.
In PaaS, the customer configures only the cloudlet-related parameters. As in the IaaS implementation, a random function picks one of the available options.
In SaaS, the customer has absolutely no control over any of the parameters (think Google Sheets). This is simulated by not offering any choice in the configuration.
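The difference between the three service models can be sketched in a few lines. The code below is an illustration under assumptions rather than the project's code: the `ServiceModel` type, the `configure` function and the os, architecture and vmm option names are invented. The cloudlet length and PE options are the Simulation2 values from the configuration list.

```scala
import scala.util.Random

object ServiceModelSketch {
  sealed trait ServiceModel
  case object IaaS extends ServiceModel
  case object PaaS extends ServiceModel
  case object SaaS extends ServiceModel

  final case class Choice(os: String, arch: String, vmm: String, cloudletLength: Int, cloudletPes: Int)

  // Option pools. The os/arch/vmm names are hypothetical examples.
  val osOptions: Vector[String] = Vector("Linux", "Windows")
  val archOptions: Vector[String] = Vector("x86", "x64")
  val vmmOptions: Vector[String] = Vector("Xen", "KVM")
  val lengthOptions: Vector[Int] = Vector(10000, 50000, 100000)
  val pesOptions: Vector[Int] = Vector(2, 3, 4)

  // Simulates a customer picking one of the available options.
  private def pick[A](options: Vector[A], rng: Random): A = options(rng.nextInt(options.length))

  def configure(model: ServiceModel, rng: Random): Choice = model match {
    // IaaS: everything is up to the customer.
    case IaaS => Choice(pick(osOptions, rng), pick(archOptions, rng), pick(vmmOptions, rng),
                        pick(lengthOptions, rng), pick(pesOptions, rng))
    // PaaS: only the cloudlet parameters are chosen; the platform is fixed.
    case PaaS => Choice(osOptions.head, archOptions.head, vmmOptions.head,
                        pick(lengthOptions, rng), pick(pesOptions, rng))
    // SaaS: nothing is chosen; every parameter is fixed by the provider.
    case SaaS => Choice(osOptions.head, archOptions.head, vmmOptions.head,
                        lengthOptions.head, pesOptions.head)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    List(IaaS, PaaS, SaaS).foreach(m => println(s"$m -> ${configure(m, rng)}"))
  }
}
```

Passing the random generator in as a parameter keeps the choice reproducible in tests when a fixed seed is used.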
Limitation: The current implementation has two major drawbacks.
- The service can be switched only by changing the service parameter in the configuration file.
- All the datacenters are either IaaS, PaaS or SaaS. Mixing services across datacenters is not done in this project.
The third simulation shows what happens when the cloudlets require more resources than are available at a given point, and how the cloudlet scheduling policies, time shared and space shared, react to such an influx of cloudlets.
CloudletSchedulerSpaceShared() - If more cloudlets are ready to run than there are available PEs, the cloudlets that arrive last wait until enough resources are free. There are 6 VMs and a total of 10 cloudlets. The first 6 cloudlets run in parallel, one on each VM, and the remaining 4 wait for resources to free up. At T = 51, they get the resources they need and start executing.
CloudletSchedulerTimeShared() - Here the available PEs are shared among the executing cloudlets, so all 10 cloudlets execute simultaneously and share the CPU resources available across the VMs.
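The two policies can be compared on a single VM with a small idealized model. Note the assumptions: this is not how CloudSim Plus computes its schedule, it ignores the number of PEs and the utilization ratio, and it will not reproduce the timings reported above (such as T = 51). The function names are invented for this sketch. Space shared runs the cloudlets one after another, while time shared splits the VM capacity equally among the cloudlets that are still running.

```scala
object SchedulerSketch {
  // Space shared: each cloudlet gets the whole VM and the next one waits for it to finish.
  def spaceSharedFinishTimes(lengthsMi: List[Double], vmMips: Double): List[Double] =
    lengthsMi.scanLeft(0.0)((clock, len) => clock + len / vmMips).tail

  // Time shared: all cloudlets start at once and share the VM capacity equally.
  // Finish times are returned in the same order as the input lengths.
  def timeSharedFinishTimes(lengthsMi: List[Double], vmMips: Double): List[Double] = {
    val n = lengthsMi.length
    val shortestFirst = lengthsMi.zipWithIndex.sortBy(_._1)
    // Walk from the shortest to the longest cloudlet, tracking the clock and the
    // amount of work every still-running cloudlet has completed so far.
    val (_, _, finished) =
      shortestFirst.zipWithIndex.foldLeft((0.0, 0.0, List.empty[(Int, Double)])) {
        case ((clock, done, acc), ((len, originalIdx), rank)) =>
          val active = n - rank // cloudlets still sharing the VM
          val finish = clock + (len - done) * active / vmMips
          (finish, len, (originalIdx, finish) :: acc)
      }
    finished.sortBy(_._1).map(_._2)
  }

  def main(args: Array[String]): Unit = {
    // Two Simulation3-sized cloudlets (100000 MI) on one 2500 MIPS VM.
    val lengths = List(100000.0, 100000.0)
    println(spaceSharedFinishTimes(lengths, 2500.0))
    println(timeSharedFinishTimes(lengths, 2500.0))
  }
}
```

In this model the last cloudlet finishes at the same time under both policies. What changes is when the earlier cloudlets finish: space shared completes the first one early, while time shared delays all of them.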
Comparing the two policies, the difference in execution time, and hence in cost, is insignificant. Which policy to use depends on the user and the application being run. One thing to consider when choosing the cloudlet scheduler policy is the size of the application (the length and the number of cloudlets in this simulation example).
When the number of cloudlets is increased (from 10 to 14) and there are not enough resources for some of the cloudlets (13 and 14 in this example), the time-shared policy seems to go, as wise Buzz Lightyear would say, to infinity and beyond, whereas the space-shared policy seems to be just right. Imagine spending tens of millions of dollars to run your cloud application because you used the time-shared policy versus paying a mere $121.77 with the space-shared policy.
A Cloudlet's UtilizationModel makes it possible to increase or decrease the utilization of resources over the simulation time. The Utilization Ratio is the parameter that controls how much of the resources a cloudlet uses during its execution. Below is the graph depicting how the cost increases or decreases as the Utilization Ratio goes from 0.0 to 1.0.
It can be clearly seen from the graph that it doesn't make sense to limit the CPU usage of the cloudlets when the space-shared cloudlet scheduler is used. There is no reason to withhold CPU resources from the cloudlets, as they compete for PEs in this case. In comparison, when the time-shared cloudlet scheduler is used, all the cloudlets run simultaneously and they all need equal resources, so limiting the resource utilization is justified. But how much of the resources should be held back? From the simulation, the sweet spot seems to be somewhere between 65% and 75%. Although the data is insufficient to single out a particular number with any degree of certainty, we can safely infer that the utilization ratio, when the time-shared cloudlet scheduler is used, must not be too small (<20%) or too large (>85%).
The various services (IaaS, PaaS, SaaS) offered by cloud service providers are simulated only for educational purposes. There is no comparison or metric involved here because, as one might guess, the cloudlet parameters are randomized within a set of available choices. Making a comparison in such a case would be like comparing apples, oranges and bananas.
IaaS is simulated by randomly choosing the os, architecture and vmm, and also by randomly choosing various parameters of the cloudlets.
In PaaS, the cloudlets' parameters can be chosen by the customer (randomized in the simulation), whereas all other configuration parameters are fixed.
In SaaS, the customer has absolutely no control over any of the configuration parameters, which is simulated by keeping them fixed.
Datacenters without network topology
With the naive implementation of datacenters, the default network configuration is used, and in this case that default is null. So, the delay that might occur due to routers and switches is not accounted for in the naive simulation. This can be executed by setting the argument isNetwork=false when Simulation1.start() is called.
Datacenters with network topology
When a network topology is included in the datacenters, the delay that could occur due to network-related factors can be simulated. As is evident in the image above, there is a small delay in the start time of the cloudlet execution. This might correspond to the delay in cloudlets reaching the datacenters from the customer through the internet.
This delay could be a problem, and the customer may incur extra cost when there is an SLA or when the pricing policy includes this delay. To reduce or avoid such costs, the customer needs to know the networking policies the cloud service provider is using and should make an informed decision.
In all the scenarios mentioned above, it can be clearly seen that the CPU (the combination of MIPS and number of Processing Entities) is always a bottleneck in the cloud ecosystem. The number of cloudlets, the length of the cloudlets, and the processing entities required by each cloudlet must fit within the combined resources that the VMs can offer. On top of that, a proper cloudlet scheduling policy has to be in place for all the cloudlets to run smoothly. And let us not forget the most important factor, the utilization ratio. Intuitively, the resources must be split equally among all the cloudlets, which means that a good heuristic for choosing the utilization ratio is 1/(number of cloudlets).
This assignment's goals can be roughly ELI5ed as follows:
- Learn to use version control
- Understand how cloudsimplus works
- Introduction to functional programming in scala
- Get a good grasp on cloud service and architecture
- Digging deeper into all the Classes and Interfaces
- How to structure the project/code
- Workload and time management
- How to approach open ended problems