Name and Version bitnamicharts/spark What archi

Problem with pyspark in kubernetes via bitnami helm chart about charts HOT 6 OPEN

kayvansol commented on August 16, 2024

Problem with pyspark in kubernetes via bitnami helm chart

Comments (6)

carrodher commented on August 16, 2024 1

The issue may not be directly related to the Bitnami container image or Helm chart, but rather to how the application is being utilized or configured in your specific environment.

Having said that, if you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application itself, customizing its content, or questions about technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

kayvansol commented on August 16, 2024 1

I also tested the code above with Docker Compose, using the Bitnami image, and got the same failure when creating the *.parquet file:

csv read success:

parquet file creation failure:

docker-compose.yml :

version: '3.6'

services:

  spark:
    container_name: spark
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - 127.0.0.1:8081:8080

  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=2G
      - SPARK_WORKER_CORES=2
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark

docker run :

docker-compose up --scale spark-worker=2

ctp.py :

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WritingParquet").getOrCreate()

df = spark.read.option("header", True).csv("csv/file.csv")

df.show()

df.write.mode('overwrite').parquet("a.parquet")
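A relative path such as "a.parquet" is resolved against the working directory and default filesystem of whichever process handles it, so it can be unclear where the output is expected to land. The sketch below shows one way to turn a relative path into an explicit absolute file:// URI before handing it to the writer. The helper name `to_file_uri` and the `base_dir` parameter are hypothetical, and the thread does not confirm that this resolves the failure; on a cluster with several workers the location would presumably also need to be reachable from every executor.

```python
from pathlib import Path


def to_file_uri(path: str, base_dir: str = ".") -> str:
    """Return an absolute file:// URI for `path`, resolved against `base_dir`.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # resolve() makes the path absolute (as_uri() requires an absolute path).
    return (Path(base_dir) / path).resolve().as_uri()


# Possible use in ctp.py (assumes the directory is writable and shared):
#   df.write.mode("overwrite").parquet(to_file_uri("a.parquet"))
```

Printing the URI before the write makes it easy to check the same location on the driver and on each worker container.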

spark submit :

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://35368355157f:7077 csv/ctp.py
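In spark-submit, `--class` names the entry point of a Java or Scala application; for a Python application the .py file is passed in place of the JAR. The sketch below assembles the argument list with that distinction in mind. The function `build_submit_command` and its parameters are hypothetical names for illustration, and dropping `--class` on its own is not confirmed in the thread to change the reported behaviour.

```python
from typing import List, Optional


def build_submit_command(
    master_url: str,
    app_path: str,
    main_class: Optional[str] = None,
    spark_submit: str = "./bin/spark-submit",
) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper).

    --class is only added for non-Python applications such as JARs.
    """
    cmd = [spark_submit, "--master", master_url]
    if main_class and not app_path.endswith(".py"):
        cmd += ["--class", main_class]
    cmd.append(app_path)
    return cmd


# The list can be passed to subprocess.run(cmd, check=True).
```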

please help me 👍

kayvansol commented on August 16, 2024 1

I also created a discussion in the Kubernetes community: link

kayvansol commented on August 16, 2024 1

I tested Python code that saves a DataFrame in JSON format, but the result was the same problem I mentioned before:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WritingJson").getOrCreate()

df2 = spark.createDataFrame([(1, "Alice", 10),
                            (2, "Bob", 20),
                            (3, "Charlie", 30)], 
                            ["id", "name", "age"])


df2.show()

df2.write.mode('overwrite').json('file_name.json')
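Note that a write like the one above produces a directory named file_name.json rather than a single file; with the default output committer it normally contains part-* files and, once the job commits, a _SUCCESS marker, while a leftover _temporary directory suggests the commit did not finish. The sketch below inspects such a directory with the standard library only. The function name `describe_spark_output` and the returned keys are hypothetical, and which container to run it in (driver or worker) is an assumption that depends on the setup.

```python
from pathlib import Path


def describe_spark_output(path: str) -> dict:
    """Summarise a Spark output directory (hypothetical diagnostic helper)."""
    out = Path(path)
    if not out.is_dir():
        # Either nothing was written here or the path is a plain file.
        return {
            "exists": out.exists(),
            "is_dir": False,
            "success_marker": False,
            "leftover_temporary": False,
            "part_files": [],
        }
    parts = sorted(p.name for p in out.iterdir() if p.name.startswith("part-"))
    return {
        "exists": True,
        "is_dir": True,
        "success_marker": (out / "_SUCCESS").is_file(),
        "leftover_temporary": (out / "_temporary").is_dir(),
        "part_files": parts,
    }
```

Running it against the same path on the driver and on each worker shows which side actually holds the part files.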

Please say something helpful.

kayvansol commented on August 16, 2024 1

With the Scala shell (spark-shell), everything works.

val df = spark.read.csv("csv/file.csv")

df.write.mode("overwrite").format("json").save("file_name.json")

But with PySpark, and with Python code run through spark-submit, I get a file-not-found error!

kayvansol commented on August 16, 2024 1

I tested Java code that saves a DataFrame in JSON format, but the result was the same problem I mentioned before:

package arka;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ctjson {

	public static void main(String[] args) {

		SparkSession SPARK_SESSION = SparkSession.builder().appName("Mahla ctjson")
				.master("spark://6fe9e36ddaa9:7077")
				.getOrCreate();

		Dataset<Row> df = SPARK_SESSION.read().option("inferSchema", "true")
				.option("header", "true")
				.csv("csv/file.csv");

		df.show();

		df.printSchema();
		
		df.write().mode("overwrite").format("json").save("file_name.json");
		
	}
}

pom.xml :

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.mahla</groupId>
	<artifactId>arka</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>csvtojson</name>

	<dependencies>

		<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.12</artifactId>
			<version>3.5.1</version>
		</dependency>

		<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-sql_2.12</artifactId>
			<version>3.5.1</version>
			<scope>provided</scope>
		</dependency>
		
	</dependencies>

</project>

jar file :
ctj.zip

submit command :

./bin/spark-submit --class arka.ctjson --master spark://6fe9e36ddaa9:7077 csv/ctj.jar

Could you please check the issue?
