
pmem-spill's People

Contributors

adrian-wang, carsonwang, chenghao-intel, eugene-mark, gfl94, haojinintel, happycherry, ivoson, jerrychenhf, jian-zhang, jikunshang, jkself, justdocoder, lee-lei, lidinghao, luciferyang, marin-ma, moonlit-sailor, rui-mo, shaowenzh, songzhan01, tigersong, xuanyuanking, xuechendi, xwu99, yao531441, yma11, zhixingheyi-tian, zhouyuan, zhztheplayer

pmem-spill's Issues

No performance benefit when running the SVM workload with PMEM-SPILL

The performance data is shown below:

| Component | Configuration | Status | Succeeded cases | Failed cases | Baseline time/s | Optimized time/s | Performance gain/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| oap-spark | SVM_600GB_DCPMM_RDD_Cache | SUCCEED | 1 | 0 | 865.184 | 1028.388 | -15.9 |

The cluster contains 3 workers, each with 384 GB of DRAM. The Spark configuration is shown below (a programmatic sketch of the same PMEM keys follows the listing):
spark.memory.pmem.extension.enabled true
hibench.streambench.spark.checkpointPath /var/tmp
spark.storage.unrollMemoryThreshold 1048576
hibench.streambench.spark.receiverNumber 4
spark.yarn.historyServer.address vsr219:18080
spark.memory.pmem.initial.size 450GB
spark.executor.extraJavaOptions -Xms50G -XX:InitialBootClassLoaderMetaspaceSize=128m -XX:MetaspaceSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 -XX:ParallelGCThreads=10 -XX:ConcGCThreads=10
hibench.yarn.executor.cores 45
spark.executor.memory 90g
hibench.streambench.spark.useDirectMode true
spark.eventLog.dir hdfs://vsr219:9000/spark-history-server
spark.driver.memory 10g
spark.eventLog.enabled true
spark.memory.spill.pmem.enabled false
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar:/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar
spark.kryo.unsafe true
hibench.yarn.executor.num 6
spark.history.fs.logDirectory hdfs://vsr219:9000/spark-history-server
spark.files /opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar,/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar
spark.executor.extraClassPath ./pmem-spill-1.1.0-with-spark-3.0.0.jar:./pmem-common-1.1.0-with-spark-3.0.0.jar
spark.history.fs.cleaner.enabled true
spark.default.parallelism ${hibench.default.map.parallelism}
spark.serializer.bufferedInputStreamSize 4096
hibench.streambench.spark.storageLevel 2
hibench.streambench.spark.batchInterval 100
hibench.spark.master yarn
spark.sql.shuffle.partitions 200
spark.history.ui.port 18080
hibench.spark.home /opt/Beaver/spark
spark.sql.warehouse.dir hdfs://vsr219:9000/spark-warehouse
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
hibench.streambench.spark.enableWAL false
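For reference, the PMEM-related keys from the listing above can also be applied programmatically. The following is a minimal Scala sketch, not part of the benchmark harness: it only repackages the keys and values already shown above (sizes, paths, and jar locations would need to match your own environment).

```scala
import org.apache.spark.sql.SparkSession

object PmemRddCacheConfigSketch {
  def main(args: Array[String]): Unit = {
    // Mirrors the PMEM-related keys from the spark-defaults listing above.
    val spark = SparkSession.builder()
      .appName("pmem-rdd-cache-sketch")
      // Enable the PMEM extension for the RDD cache.
      .config("spark.memory.pmem.extension.enabled", "true")
      // PMEM pool size and the AppDirect mount points backing it.
      .config("spark.memory.pmem.initial.size", "450GB")
      .config("spark.memory.pmem.initial.path", "/mnt/pmem0,/mnt/pmem1")
      // PMEM-backed spill is left disabled here, as in the report above.
      .config("spark.memory.spill.pmem.enabled", "false")
      // The pmem-spill and pmem-common jars must be visible to driver and executors.
      .config("spark.files",
        "/opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar," +
        "/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar")
      .config("spark.executor.extraClassPath",
        "./pmem-spill-1.1.0-with-spark-3.0.0.jar:./pmem-common-1.1.0-with-spark-3.0.0.jar")
      .getOrCreate()

    // ... run the SVM workload here ...
    spark.stop()
  }
}
```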

PMEM-SPILL of OAP-1.1 does not support Spark 3.0.1.

We tried to run the K-means algorithm with PMEM-SPILL on Spark 3.0.1 and hit the issue shown in the following screenshot:
[screenshot of the error]
PMEM-SPILL overrides a Spark source file (src/main/scala/org/apache/spark/internal/config/package.scala) to add some PMEM-related configs:
[screenshot of the added PMEM config entries in package.scala]
However, the package.scala of Spark 3.0.0 differs from that of Spark 3.0.1, and the issue is caused by the OAP-1.1 PMEM-SPILL copy of package.scala not containing EXECUTOR_ALLOW_SPARK_CONTEXT:
[screenshot of package.scala in Spark 3.0.1]
So how can we decouple package.scala from Spark? If we cannot decouple this file, we will have to adapt it to every Spark version we want to support. One possible direction is sketched below.
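A hypothetical sketch of such a decoupling, assuming Spark's internal ConfigBuilder API: the PMEM entries could live in their own object shipped with the plugin instead of a patched copy of package.scala, so the plugin no longer has to track additions such as EXECUTOR_ALLOW_SPARK_CONTEXT. The PmemConfig object, its defaults, and its doc strings below are not part of PMEM-SPILL; only the config key names come from the listings on this page.

```scala
package org.apache.spark.internal.config

// Hypothetical: PMEM config entries defined in the plugin rather than in a
// patched package.scala. ConfigBuilder is private[spark], so this file would
// have to live under the org.apache.spark package tree. Defaults are placeholders.
private[spark] object PmemConfig {

  val MEMORY_PMEM_EXTENSION_ENABLED =
    ConfigBuilder("spark.memory.pmem.extension.enabled")
      .doc("Whether to enable the PMEM extension for the RDD cache.")
      .booleanConf
      .createWithDefault(false)

  val MEMORY_PMEM_INITIAL_PATH =
    ConfigBuilder("spark.memory.pmem.initial.path")
      .doc("Comma-separated AppDirect mount points backing the PMEM pool.")
      .stringConf
      .createWithDefault("")

  val MEMORY_SPILL_PMEM_ENABLED =
    ConfigBuilder("spark.memory.spill.pmem.enabled")
      .doc("Whether to spill to PMEM instead of local disk.")
      .booleanConf
      .createWithDefault(false)
}
```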

Unable to initialize a single PMEM path for the RDD cache

Unable to initialize a single PMEM path. When only one namespace is available in the environment (for example, a virtual machine environment) and spark.memory.pmem.initial.path is set to a single path, AppDirect mode cannot be used.
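A minimal spark-shell style sketch of the failing single-path setup described above; the size value is a placeholder for a VM-sized namespace.

```scala
import org.apache.spark.SparkConf

// Single-namespace setup: only one AppDirect mount point is configured.
val conf = new SparkConf()
  .set("spark.memory.pmem.extension.enabled", "true")
  .set("spark.memory.pmem.initial.size", "100GB")      // placeholder size for a VM namespace
  .set("spark.memory.pmem.initial.path", "/mnt/pmem0") // a single path; AppDirect mode then fails to initialize
```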

PMEM-SPILL of OAP-1.1 has a performance regression compared with OAP-1.1.1

We used the same configuration to run the K-means and SVM algorithms. The cluster contains 3 workers, each with 1 TB of PMEM. Performance regressed by 12.9% when running SVM at the 1.2 TB scale and by 28.6% when running K-means at 500 GB (a sketch of the workload pattern follows the configuration listing).
The Spark configuration used when running SVM is shown below:

spark.memory.pmem.extension.enabled true
hibench.streambench.spark.checkpointPath /var/tmp
spark.storage.unrollMemoryThreshold 1048576
hibench.streambench.spark.receiverNumber 4
spark.yarn.historyServer.address vsr219:18080
spark.memory.pmem.initial.size 450GB
hibench.yarn.executor.cores 45
spark.executor.memory 90g
hibench.streambench.spark.useDirectMode true
spark.eventLog.dir hdfs://vsr219:9000/spark-history-server
spark.driver.memory 10g
spark.eventLog.enabled true
spark.memory.spill.pmem.enabled false
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.kryo.unsafe true
hibench.yarn.executor.num 6
spark.history.fs.logDirectory hdfs://vsr219:9000/spark-history-server
spark.files /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar,/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.executor.extraClassPath ./pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:./pmem-common-1.1.1-with-spark-3.1.1.jar
spark.history.fs.cleaner.enabled true
spark.default.parallelism ${hibench.default.map.parallelism}
spark.serializer.bufferedInputStreamSize 4096
hibench.streambench.spark.storageLevel 2
hibench.streambench.spark.batchInterval 100
hibench.spark.master yarn
spark.sql.shuffle.partitions 200
spark.history.ui.port 18080
hibench.spark.home /opt/Beaver/spark
spark.sql.warehouse.dir hdfs://vsr219:9000/spark-warehouse
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
hibench.streambench.spark.enableWAL false
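The regression surfaces in the iterative caching path of these MLlib workloads. Below is a minimal Scala sketch of that pattern, not the actual HiBench SVM driver: the input path, storage level, and iteration count are placeholders, and the persisted RDD is simply the part of the job that the PMEM RDD cache extension is meant to serve.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Sketch of an SVM job whose training RDD is persisted, so every iteration
// reads from the (PMEM-backed) RDD cache rather than recomputing the input.
object SvmCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("svm-cache-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; HiBench generates its own LIBSVM-format data.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs://vsr219:9000/hibench/SVM/input")
      .persist(StorageLevel.MEMORY_AND_DISK)

    val model = SVMWithSGD.train(data, numIterations = 100)
    println(s"weights: ${model.weights}")

    data.unpersist()
    spark.stop()
  }
}
```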

Error when using PMEM_AND_DISK storage level

Hello. I'm trying to cache an RDD in my persistent memory and I continuously get this error.

[screenshot of the error]

My system is a single NUMA node system and I have one PMEM path.
I installed Spark 3.0.0 and Hadoop 3.2.0 so that I can use pmem-spill and pmem-common v1.1.0-spark-3.0.0.

Because of the error message above, I first thought the problem was caused by the memkind library. However, I tested memkind with the provided test examples and confirmed that memkind_malloc worked fine.

Is there something I'm missing? Thank you.
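For reference, a minimal Scala sketch of the kind of caching call that produces this error. It assumes the pmem-spill patched build registers a "PMEM_AND_DISK" storage level under that name (vanilla Spark 3.0.0 does not know it, and the exact accessor may differ); the initial-size value and data set are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PmemAndDiskCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pmem-and-disk-cache")
      .config("spark.memory.pmem.extension.enabled", "true")
      .config("spark.memory.pmem.initial.size", "64GB")      // placeholder for a single-node setup
      .config("spark.memory.pmem.initial.path", "/mnt/pmem0") // single PMEM path, as described above
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000000)
    // Assumes the patched StorageLevel recognizes this name; this is where the
    // memkind-related error is reported in the issue above.
    rdd.persist(StorageLevel.fromString("PMEM_AND_DISK"))
    println(rdd.count())

    spark.stop()
  }
}
```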
