
pmem-spill's People

Contributors

adrian-wang, carsonwang, chenghao-intel, eugene-mark, gfl94, haojinintel, happycherry, ivoson, jerrychenhf, jian-zhang, jikunshang, jkself, justdocoder, lee-lei, lidinghao, luciferyang, marin-ma, moonlit-sailor, rui-mo, shaowenzh, songzhan01, tigersong, xuanyuanking, xuechendi, xwu99, yao531441, yma11, zhixingheyi-tian, zhouyuan, zhztheplayer

pmem-spill's Issues

No performance benefit when running the SVM workload with PMEM-SPILL

The performance data is shown below:

| Component | Configuration | Status | Succeeded cases | Failed cases | Baseline time/s | Optimized time/s | Performance gain/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| oap-spark | SVM_600GB_DCPMM_RDD_Cache | SUCCEED | 1 | 0 | 865.184 | 1028.388 | -15.9 |

The cluster contains 3 workers, each with 384 GB of DRAM. The Spark configuration is shown below (a programmatic sketch of the same PMEM keys follows the listing):
spark.memory.pmem.extension.enabled true
hibench.streambench.spark.checkpointPath /var/tmp
spark.storage.unrollMemoryThreshold 1048576
hibench.streambench.spark.receiverNumber 4
spark.yarn.historyServer.address vsr219:18080
spark.memory.pmem.initial.size 450GB
spark.executor.extraJavaOptions -Xms50G -XX:InitialBootClassLoaderMetaspaceSize=128m -XX:MetaspaceSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 -XX:ParallelGCThreads=10 -XX:ConcGCThreads=10
hibench.yarn.executor.cores 45
spark.executor.memory 90g
hibench.streambench.spark.useDirectMode true
spark.eventLog.dir hdfs://vsr219:9000/spark-history-server
spark.driver.memory 10g
spark.eventLog.enabled true
spark.memory.spill.pmem.enabled false
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar:/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar
spark.kryo.unsafe true
hibench.yarn.executor.num 6
spark.history.fs.logDirectory hdfs://vsr219:9000/spark-history-server
spark.files /opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar,/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar
spark.executor.extraClassPath ./pmem-spill-1.1.0-with-spark-3.0.0.jar:./pmem-common-1.1.0-with-spark-3.0.0.jar
spark.history.fs.cleaner.enabled true
spark.default.parallelism ${hibench.default.map.parallelism}
spark.serializer.bufferedInputStreamSize 4096
hibench.streambench.spark.storageLevel 2
hibench.streambench.spark.batchInterval 100
hibench.spark.master yarn
spark.sql.shuffle.partitions 200
spark.history.ui.port 18080
hibench.spark.home /opt/Beaver/spark
spark.sql.warehouse.dir hdfs://vsr219:9000/spark-warehouse
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
hibench.streambench.spark.enableWAL false
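For reference, the PMEM-related keys from the listing above can also be applied programmatically. The following is a minimal Scala sketch, not part of the benchmark harness: it only repackages the keys and values already shown above (sizes, paths, and jar locations would need to match your own environment).

```scala
import org.apache.spark.sql.SparkSession

object PmemRddCacheConfigSketch {
  def main(args: Array[String]): Unit = {
    // Mirrors the PMEM-related keys from the spark-defaults listing above.
    val spark = SparkSession.builder()
      .appName("pmem-rdd-cache-sketch")
      // Enable the PMEM extension for the RDD cache.
      .config("spark.memory.pmem.extension.enabled", "true")
      // PMEM pool size and the AppDirect mount points backing it.
      .config("spark.memory.pmem.initial.size", "450GB")
      .config("spark.memory.pmem.initial.path", "/mnt/pmem0,/mnt/pmem1")
      // PMEM-backed spill is left disabled here, as in the report above.
      .config("spark.memory.spill.pmem.enabled", "false")
      // The pmem-spill and pmem-common jars must be visible to driver and executors.
      .config("spark.files",
        "/opt/Beaver/OAP/oap_jar/pmem-spill-1.1.0-with-spark-3.0.0.jar," +
        "/opt/Beaver/OAP/oap_jar/pmem-common-1.1.0-with-spark-3.0.0.jar")
      .config("spark.executor.extraClassPath",
        "./pmem-spill-1.1.0-with-spark-3.0.0.jar:./pmem-common-1.1.0-with-spark-3.0.0.jar")
      .getOrCreate()

    // ... run the SVM workload here ...
    spark.stop()
  }
}
```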

PMEM-SPILL of OAP-1.1 does not support Spark 3.0.1.

We tried to run the K-means algorithm with PMEM-SPILL on Spark 3.0.1 and hit the issue shown in the following screenshot:
[screenshot of the error]
PMEM-SPILL overrides a Spark source file (src/main/scala/org/apache/spark/internal/config/package.scala) to add some PMEM-related configs:
[screenshot of the added PMEM config entries in package.scala]
However, the package.scala of Spark 3.0.0 differs from that of Spark 3.0.1, and the issue is caused by the OAP-1.1 PMEM-SPILL copy of package.scala not containing EXECUTOR_ALLOW_SPARK_CONTEXT:
[screenshot of package.scala in Spark 3.0.1]
So how can we decouple package.scala from Spark? If we cannot decouple this file, we will have to adapt it to every Spark version we want to support. One possible direction is sketched below.
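A hypothetical sketch of such a decoupling, assuming Spark's internal ConfigBuilder API: the PMEM entries could live in their own object shipped with the plugin instead of a patched copy of package.scala, so the plugin no longer has to track additions such as EXECUTOR_ALLOW_SPARK_CONTEXT. The PmemConfig object, its defaults, and its doc strings below are not part of PMEM-SPILL; only the config key names come from the listings on this page.

```scala
package org.apache.spark.internal.config

// Hypothetical: PMEM config entries defined in the plugin rather than in a
// patched package.scala. ConfigBuilder is private[spark], so this file would
// have to live under the org.apache.spark package tree. Defaults are placeholders.
private[spark] object PmemConfig {

  val MEMORY_PMEM_EXTENSION_ENABLED =
    ConfigBuilder("spark.memory.pmem.extension.enabled")
      .doc("Whether to enable the PMEM extension for the RDD cache.")
      .booleanConf
      .createWithDefault(false)

  val MEMORY_PMEM_INITIAL_PATH =
    ConfigBuilder("spark.memory.pmem.initial.path")
      .doc("Comma-separated AppDirect mount points backing the PMEM pool.")
      .stringConf
      .createWithDefault("")

  val MEMORY_SPILL_PMEM_ENABLED =
    ConfigBuilder("spark.memory.spill.pmem.enabled")
      .doc("Whether to spill to PMEM instead of local disk.")
      .booleanConf
      .createWithDefault(false)
}
```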

Unable to initialize a single PMEM path for the RDD cache

Unable to initialize a single PMEM path. When only one namespace is available in the environment (for example, a virtual machine environment) and spark.memory.pmem.initial.path is set to a single path, AppDirect mode cannot be used.
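A minimal spark-shell style sketch of the failing single-path setup described above; the size value is a placeholder for a VM-sized namespace.

```scala
import org.apache.spark.SparkConf

// Single-namespace setup: only one AppDirect mount point is configured.
val conf = new SparkConf()
  .set("spark.memory.pmem.extension.enabled", "true")
  .set("spark.memory.pmem.initial.size", "100GB")      // placeholder size for a VM namespace
  .set("spark.memory.pmem.initial.path", "/mnt/pmem0") // a single path; AppDirect mode then fails to initialize
```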

PMEM-SPILL of OAP-1.1 has a performance regression compared with OAP-1.1.1

We used the same configuration to run the K-means and SVM algorithms. The cluster contains 3 workers, each with 1 TB of PMEM. Performance regressed by 12.9% when running SVM at the 1.2 TB scale and by 28.6% when running K-means at 500 GB (a sketch of the workload pattern follows the configuration listing).
The Spark configuration used when running SVM is shown below:

spark.memory.pmem.extension.enabled true
hibench.streambench.spark.checkpointPath /var/tmp
spark.storage.unrollMemoryThreshold 1048576
hibench.streambench.spark.receiverNumber 4
spark.yarn.historyServer.address vsr219:18080
spark.memory.pmem.initial.size 450GB
hibench.yarn.executor.cores 45
spark.executor.memory 90g
hibench.streambench.spark.useDirectMode true
spark.eventLog.dir hdfs://vsr219:9000/spark-history-server
spark.driver.memory 10g
spark.eventLog.enabled true
spark.memory.spill.pmem.enabled false
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.kryo.unsafe true
hibench.yarn.executor.num 6
spark.history.fs.logDirectory hdfs://vsr219:9000/spark-history-server
spark.files /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar,/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.executor.extraClassPath ./pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:./pmem-common-1.1.1-with-spark-3.1.1.jar
spark.history.fs.cleaner.enabled true
spark.default.parallelism ${hibench.default.map.parallelism}
spark.serializer.bufferedInputStreamSize 4096
hibench.streambench.spark.storageLevel 2
hibench.streambench.spark.batchInterval 100
hibench.spark.master yarn
spark.sql.shuffle.partitions 200
spark.history.ui.port 18080
hibench.spark.home /opt/Beaver/spark
spark.sql.warehouse.dir hdfs://vsr219:9000/spark-warehouse
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
hibench.streambench.spark.enableWAL false
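The regression surfaces in the iterative caching path of these MLlib workloads. Below is a minimal Scala sketch of that pattern, not the actual HiBench SVM driver: the input path, storage level, and iteration count are placeholders, and the persisted RDD is simply the part of the job that the PMEM RDD cache extension is meant to serve.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Sketch of an SVM job whose training RDD is persisted, so every iteration
// reads from the (PMEM-backed) RDD cache rather than recomputing the input.
object SvmCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("svm-cache-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; HiBench generates its own LIBSVM-format data.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs://vsr219:9000/hibench/SVM/input")
      .persist(StorageLevel.MEMORY_AND_DISK)

    val model = SVMWithSGD.train(data, numIterations = 100)
    println(s"weights: ${model.weights}")

    data.unpersist()
    spark.stop()
  }
}
```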

Error when using PMEM_AND_DISK storage level

Hello. I'm trying to cache an RDD in my persistent memory and I continuously get this error.

[screenshot of the error]

My system is a single NUMA node system and I have one PMEM path.
I installed Spark 3.0.0 and Hadoop 3.2.0 so that I can use pmem-spill and pmem-common v1.1.0-spark-3.0.0.

Because of the error message above, I first thought the problem was caused by the memkind library. However, I tested memkind with the provided test examples and confirmed that memkind_malloc worked fine.

Is there something I'm missing? Thank you.
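For reference, a minimal Scala sketch of the kind of caching call that produces this error. It assumes the pmem-spill patched build registers a "PMEM_AND_DISK" storage level under that name (vanilla Spark 3.0.0 does not know it, and the exact accessor may differ); the initial-size value and data set are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PmemAndDiskCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pmem-and-disk-cache")
      .config("spark.memory.pmem.extension.enabled", "true")
      .config("spark.memory.pmem.initial.size", "64GB")      // placeholder for a single-node setup
      .config("spark.memory.pmem.initial.path", "/mnt/pmem0") // single PMEM path, as described above
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000000)
    // Assumes the patched StorageLevel recognizes this name; this is where the
    // memkind-related error is reported in the issue above.
    rdd.persist(StorageLevel.fromString("PMEM_AND_DISK"))
    println(rdd.count())

    spark.stop()
  }
}
```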
