Reading : New header row is added every time while new sheet data is appended about spark-hadoopoffice-ds HOT 6 CLOSED

zuinnote commented on June 13, 2024

Reading : New header row is added every time while new sheet data is appended

from spark-hadoopoffice-ds.

Comments (6)

jornfranke commented on June 13, 2024

I see. In the current version you can filter sheets, so you could create one dataframe per sheet, each with its own sheet filter. I could imagine adding a configuration option so that the header is read per sheet. This could be included in a future version.

…

On 5. Mar 2018, at 10:52, Jay Panchal ***@***.***> wrote: For instance, if I set useHeader to true and pass one or more files with multiple sheets, where all sheets in all files have exactly the same columns (same number, names, and data types), then the final dataset contains an additional row at the start of each new sheet's data. E.g. 3 files with 2 sheets each give 6 sheets in total, so the final dataset has 5 additional rows: the first row of every sheet after the first is read as a "data row".
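
To make the arithmetic in the quoted report concrete, here is a small plain-Python sketch. It does not use Spark or HadoopOffice; the column names, sheet contents, and function names are made up for illustration. It only shows why 6 sheets produce 5 stray rows when the header is dropped once instead of once per sheet.

```python
# Hypothetical header shared by every sheet in every file.
HEADER = ["id", "name"]


def make_sheet(rows):
    """Return one sheet as a list of rows, with the header as its first row."""
    return [HEADER] + rows


def read_header_once(sheets):
    """Drop only the very first row; header rows of later sheets leak in as data."""
    all_rows = [row for sheet in sheets for row in sheet]
    return all_rows[1:]


def read_header_per_sheet(sheets):
    """Drop the first row of every sheet, so no header row ends up in the data."""
    return [row for sheet in sheets for row in sheet[1:]]


# 3 files with 2 sheets each = 6 sheets, one data row per sheet.
sheets = [make_sheet([[i, f"row{i}"]]) for i in range(6)]

leaked = read_header_once(sheets)
clean = read_header_per_sheet(sheets)
extra_rows = len(leaked) - len(clean)
print(extra_rows)  # 5
```

The difference is always the number of sheets minus one, which matches the 5 additional rows described in the report.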

jornfranke commented on June 13, 2024

Multiple files: in this case the header should already be skipped, because it is skipped for each file. Can you confirm that this is not the case?
Multiple sheets in a single file: you can pass the HadoopOffice option "hadoopoffice.read.sheets" and specify just one sheet per dataframe. This way you can already skip the header for each sheet today, because you load each sheet individually. However, it also means that the file is loaded x times, where x is the number of sheets in the file.

For the latter case we can add an option read.spark.useHeader.skipHeaderInEachSheet.
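
The per-sheet workaround described above could look roughly like the following PySpark sketch. Several things here are assumptions and should be checked against the README of your version: the data source name `org.zuinnote.spark.office.excel`, the key `read.spark.useHeader` (inferred from the option name mentioned in this thread), and the exact prefix the data source expects for the sheet filter (the key is spelled as quoted in the comment). The helper names `per_sheet_options` and `load_sheets` are hypothetical.

```python
from functools import reduce

# Assumption: data source name as used by spark-hadoopoffice-ds; verify for your version.
EXCEL_FORMAT = "org.zuinnote.spark.office.excel"


def per_sheet_options(sheet_name):
    """Build reader options that restrict one read to a single sheet."""
    return {
        # Assumption: key inferred from the option name discussed in this thread.
        "read.spark.useHeader": "true",
        # Key spelled as quoted in the comment; your version may expect another prefix.
        "hadoopoffice.read.sheets": sheet_name,
    }


def load_sheets(spark, path, sheet_names):
    """Load each sheet as its own DataFrame, then union them into one.

    The file is read once per sheet, which is the cost mentioned in the comment.
    """
    frames = [
        spark.read.format(EXCEL_FORMAT).options(**per_sheet_options(name)).load(path)
        for name in sheet_names
    ]
    return reduce(lambda left, right: left.union(right), frames)
```

With an existing `SparkSession`, a call such as `load_sheets(spark, "/data/report.xlsx", ["Sheet1", "Sheet2"])` would return one DataFrame in which each sheet's header row has been consumed separately. The path and sheet names are placeholders.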

jaypanchal commented on June 13, 2024

Yes, for multiple files headers are not skipped...

I will check and let you know whether read.spark.useHeader.skipHeaderInEachSheet works fine or not. :) Thanks

jornfranke commented on June 13, 2024

OK, I looked into it a little. It is feasible, but I have to give it some thought to make this feature more general. The reason is that we will soon also release this library for Hive, the Flink DataSource/DataSink API, and the Flink TableSource/TableSink API.

jornfranke commented on June 13, 2024

Released in 1.1.0.

Please check readme.md and confirm.

jornfranke commented on June 13, 2024

Please create a new issue if it does not meet your use case.
