Code Monkey home page Code Monkey logo

fsspreadsheet's People

Contributors

freymaurer avatar hlweil avatar kmutagene avatar muehlhaus avatar omaus avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Forkers

freymaurer

fsspreadsheet's Issues

[Feature Request] Table api is very obscure

Without any documentation the table api seems impossible to navigate, as it does not use any common vocabularies from spreadsheet manipulation.

image

What is a field?

image

Why is this not AddCell (same for worksheet.Table)? Why is there no "GetCell", even if i need to pass an additional cellcollection.

image

Why can i access HeaderRow, but not body?

In addition #45

[BUG] RescanRows

Make this function internal only. Never force the user to call this!

[BUG] OpenXML does not correctly read DataType

In Xlsx DataType is written as t="s" attribute in cells (<c></c> tag), in this example DataType is string.

  • Default for DataType is Number NOT empty as previously implemented.

  • DateTime is also Number so not t attribute . Instead there will be an s attribute which references cellXfs in styles.xml.

  • The number of the s attribute is the index in the cellXfs tag.
    image
    image

  • numFmtId="14" is date time.

  • The number value in the sheet.xlm represent days since 1900 and can be converted with .NET System.DateTime.FromOADate (thank god)

[Feature Request] Make OpenXml Wrapper as separate .fsproj.

With the fable compatibility in FsSpreadsheet.ExcelIO we need to separate the OpenXml conversion stuff. A simple switch in the final file, such as shown below, is not enough to avoid compilation of the pure dotnet parts. Below i will list some solution which might work.

namespace FsSpreadsheet.ExcelIO

open FsSpreadsheet

[<AutoOpen>]
module FsExtensions =

    type FsWorkbook with

        /// <summary>
        /// Creates an FsWorkbook from a given Stream to an XlsxFile.
        /// </summary>
        static member fromXlsxStream (stream : System.IO.Stream) =
            #if FABLE_COMPILER_JAVASCRIPT
            JsIO.fromStream stream
            #else
            FsWorkbook.fromXlsxStream stream
            #endif

Possible solutions:

  1. Write the following around all code in dotnet files.
#if !FABLE_COMPILER
// ... all code
#endif
  1. Move dotnet part to its own project (no own nuget, just to reduce #if !FABLE_CMPILER around a single open statement
  2. Rename dotnet part namespace to not automatically open.

For now i will implement solution 1, as quick and dirty solution.

Use xml doc structure with tags

For example this XML doc:

https://github.com/CSBiology/FsSpreadsheet/blob/810424fadaf432587c12e24d550a30578ed7598c/src/FsSpreadsheet/FsAddress.fs#L7-L15

should be

/// <summary> 
/// Transforms excel column string indices (e.g. A, B, Z, AA, CD) to index number (starting with A = 1).
/// </summary> 
/// <param name="columnAdress">param description here</param>
let colAdressToIndex (columnAdress : string) =
    let length = columnAdress.Length
    let mutable sum = 0u
    for i=0 to length-1 do
        let c = columnAdress.[length-1-i] |> System.Char.ToUpper
        let factor = 26. ** (float i) |> uint
        sum <- sum + ((uint c - 64u) * factor)
    sum

Why?

while fsdocs is debatable for creating actual tutorial-style documentation (we might migrate to something that can convert notebooks for that), it's API doc generation is top-notch. XML docs using tags are significantly improving the output of API doc generation.

For a real-life example, take a look at plotly.net's API docs, for example here: https://plotly.net/reference/plotly-net-chart.html#Grid

[Feature Request] Add FsSpreadsheet.Exceljs for fable compatible read/write.

To allow seamless integration of FsSpreadsheet in ARCtrl we want to introduce fable compatibility on xlsx level. I already wrote basic f# bindings for javascript exceljs in Fablel.Exceljs. These are my proposed changes:

  • Add Fable.Exceljs dependency to FsSpreadsheet.ExcelIO.
  • Add Femto and fable support in .fsproj.
  • Add a converter from FsSpreadsheet data model to Fable.Exceljs datamodel (only compiled under fable!)
  • Add a switch to "fromFile"/"toFile" (stream, buffer) functions switching between OpenXml and Fable.Exceljs depending on compiler. This is not possible. as js read/write MUST be async, so instead we offer separate repo.
  • Test conversion with Fable.Mocha
  • Test read/write in native Mocha

ExcelIO Reader

graph LR;
    A(.xlsx file) -->|ExcelIO| B(FsSpreadsheet datamodel);

Must read .xlsx file and return a populated FsSpreadsheet.

MVP is needed for arc validation, meaning it must:

[BUG] FsSpreadsheet.Exceljs incorrect read-in for "headerRow" bool

Tables with headerRow:

image

are only recognized by exceljs if exceljs did write the table:

image

In our tests only table MyNewTable has true for headerRow. Especially for the bottom 4 i can guarantee the headerRow value should be true.

This is most likely a exceljs bug and needs to be fixed in @nfdi4plants/exceljs.

[BUG] Address custom equality

Describe the bug

FsRangeAddress("A1:A1") = FsRangeAddress("A1:A1")
FsAddress("A1") = FsAddress("A1") 

both return false.

Expected behavior
These should return true. This could be done by implementing custom equality for the address types.

[BUG] DSL: required and optional operators dont unpack value

To Reproduce

(+.) id (Option.Some 5)

returns

val it: SheetEntity<Value> = Some ((String, "Some(45)"), [])

But the internal value should be only 45.

Solution
use parseAny

let inline (+.) (f : 'T -> 'U) (v : 'T) : SheetEntity<Value> =
    let err = fun s -> NoneRequired([message s])
    try 
        f v 
        |> parseAny err
    with 
    | err -> NoneRequired([Exception err])

DataType is not set correctly when exported to xlsx

Is your feature request related to a problem? Please describe.
When a cell contains "MAR3" with DataType = string, in Excel the cell data type is set to General instead of Text. Therefore it is converted to March-03.

Describe the solution you'd like
The datatype encoded within the model should be correctly converted to Excel types.

Describe alternatives you've considered
None

[BUG] Critical bug when writing and reading again

Describe the bug
When writing a spreadsheet file and reading it again, it throws an error regardless of what has been done to the structure/file.

To Reproduce
Steps to reproduce the behavior:

  1. Do this:
#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let path = <yourPath>

let wb = new FsWorkbook()

wb.InitWorksheet "empty"

FsWorkbook.toFile path wb

let wbFromFile = FsWorkbook.fromXlsxFile path
  1. See error:
System.IO.IOException: Cannot modify readonly container
   at System.IO.Packaging.Package.ThrowIfReadOnly()
   at System.IO.Packaging.Package.CreatePart(Uri partUri, String contentType, CompressionOption compressionOption)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.CreateMetroPart(Uri partUri, String contentType)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.CreateInternal(OpenXmlPackage openXmlPackage, OpenXmlPart parent, String contentType, String targetExt)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType, String id)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPartInternal[T]()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPart[T]()
   at FsSpreadsheet.ExcelIO.Spreadsheet.getOrInitSharedStringTablePart(SpreadsheetDocument spreadsheetDocument)
   at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheet(Sheet sheet, SpreadsheetDocument spreadsheetDocument)
   at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheetID(String sheetID, SpreadsheetDocument spreadsheetDocument)
   at [email protected](Sheet xlsxSheet)
   at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 134
   at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 117
   at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 914
   at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
   at <StartupCode$FSI_0006>.$FSI_0006.main@() in C:\Users\olive\Untitled-2:line 15
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error

Expected behavior
Reading should work as expected.

Additional context
It seems to be an error produced by the writer, not the reader. When reading files that were created by Excel, no error happens.

[Feature Request] One-step creation and usage of Worksheets

Is your feature request related to a problem? Please describe.
Currently, adding a worksheet to a workbook retrieving it for editing is a two step process, as

myWorkbook.AddWorksheet("SheetName")

returns a unit and

FsWorkbook.addWorksheetWithName  "SheetName" myWorkbook

returns the FsWorkbook

Describe the solution you'd like
I would like a functionality that directly returns the newly created FsWorksheet. Maybe even implemented as an ".Item" like cells in rows and rows in sheets?

let myWorksheet = myWorkbook.["NewSheetName"] 

which either returns an exisiting sheet or creates a new one.

[BUG] Unable to write correct xlsx file.

Not sure if this is an error in usage or execution.

#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"

let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = new FsWorkbook()

let ws = wb.InitWorksheet("New Worksheet")

ws.AddCell(FsCell("My Column 1",address=FsAddress("B1")))
ws.AddCell(FsCell("My Column 2",address=FsAddress("C1")))
let t = FsTable("My New Table", FsRangeAddress("B1:C2"))

ws.AddTable(t)

wb.ToFile(path)

-------------->
image
image

Re-use functions/methods whenever possible

See for example here:

https://github.com/CSBiology/FsSpreadsheet/blob/810424fadaf432587c12e24d550a30578ed7598c/src/FsSpreadsheet/FsWorkbook.fs#L40-L55

You have high code duplication. You can refactor one of these methods to use the other (most likely the one using the name as string to use the one using the worksheet argument)

Why?

Maintainability and smaller surface for bugs. nearly 90% of the logic in these 2 functions are the same, in fact one of them is most likely an adapted copy of the other. If the original version has a bug, the copied one has it as well.

If you rewrite one of them using the other, most of the possible bugs will be catched by testing the inner function

[Docs] Bundling npm package from fable

Problem

Fable transpiled fsproj to include f# dependencies (&npm dependencies) in relative subfolder fable_modules.

image

  • Shipping npm dependencies with the npm package instead of them being a dependency is a rather dirty solution we should avoid
  • We could try a bundler minifying codespace with one output file. This will most likely destroy human readability of the "source code" downloaded by the api user.

[BUG] FsWorkbook can be saved to file with no worksheet present

Describe the bug
It is possible to save an FsWorkbook with no worksheets attached. This is critical since an XLSX spreadsheet file always needs to have at least 1 worksheet with a name given. Else it is not valid and, e.g., MS Excel will tell you it is corrupted when trying to open it.

To Reproduce
Steps to reproduce the behavior:
1.

#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let path = "test.xlsx"

let wb = new FsWorkbook()

FsWorkbook.toFile path wb
  1. Open in MS Excel

Expected behavior
This is up to debate.
Maybe throw an error when trying to save such a workbook? Or at least give a warning? Is this even a bug at all or do we want to give the library user the freedom to artificially produce invalid XLSX files?
@HLWeil

[Feature Request] Add additional functionality for `FsRow`

Is your feature request related to a problem? Please describe.
Atm., we miss some functionality for FsRow type that is needed in other projects.

Describe the solution you'd like

  • tryItem which returns Some item if present and None if not
  • getMaxColNumber and getMinColNumber
  • getDenseRow where all non-existent cells between min and max column number are created as empty cells

[BUG] Error reading file with table with no header row

#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"

let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = FsWorkbook.fromXlsxFile(path)

returns:

> let wb = FsWorkbook.fromXlsxFile(path);;
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at FsSpreadsheet.ExcelIO.Table.Area.toBoundaries(StringValue area)
   at FsSpreadsheet.ExcelIO.FsExtensions.FsTable.fromXlsxTable.Static(Table table)
   at FsSpreadsheet.ExcelIO.FsExtensions.sheets@186.Invoke(Sheet xlsxSheet)
   at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 131
   at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
   at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 911
   at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
   at <StartupCode$FSI_0004>.$FSI_0004.main@() in c:\Users\Kevin\Desktop\test.fsx:line 10
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error
>

Book1.xlsx

[Feature Request] Implement `Stylesheet`

Is your feature request related to a problem? Please describe.
In OpenXml's Stylesheet class, things like CellFormat are stored. We need these to correctly set a cell we want to save as pure text as it otherwise leads to problems like in #19.

Describe the solution you'd like
Implement styles and map them from the Stylesheet object.

[BUG] FsTable.HeadersRow syntax/function issues

Describe the bug

  • HeadersRow instead of HeaderRow/GetHeaderRow
  • returns null instead of option or fail
  • returns FsRangeRow instead of FsRow?!

Possible Solution

  • I will obsolete this function
  • Add TryGetHeaderRow and GetHeaderRow with correct output

[Feature Request] clean up io tests and plan a new set

Is your feature request related to a problem? Please describe.

Currently IO tests were created ad-hoc and not really planned. These should get cleaned up and planned out in a senseful way.

Describe the solution you'd like

We must cover:

  • Reading/Writing to/from different io libraries
  • Tables

So for FsSpreadsheet.ExcelIO we must test:

  • READ fom exceljs
  • READ from excel (the app)
  • READ from ClosedXML
  • READ from libre
  • READ from FsSpreadsheet
  • ROUNDABOUT FsSpreadsheet.ExcelIO
  • WRITE

ForFsSpreadsheet.Exceljs we must test:

  • READ fom FsSpreadsheet.ExcelIO
  • READ from excel (the app)
  • #77
  • READ from libre
  • READ from FsSpreadsheet
  • ROUNDABOUT FsSpreadsheet.Exceljs
  • WRITE

More

  • All files must contain a table with different cell value types. We propose the following:
Numbers Strings DateTime Boolean ARCtrl Column ARCtrl Column
1 Hello any true (A) This is part 1 of 2 (A) This is part 2 of 2
2 World any false Tests if column names with whitespace at end can be unique
3 Bye any true
4 Outer Space any false
  • Duplicate this Table to TWO worksheets to also include multiple worksheets.

  • Due to the nature of these tests, we can recycle most of the test files. We only need one file of any kind for read-in.

  • All files written by tests should be prefixed with "WRITE_" so they can be deleted at any point, while all other files MUST never be opened AND saved to avoid corrupting their origin.

[BUG] FsTable.Cell access is relative to table start

#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"

let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = new FsWorkbook()

let ws = wb.InitWorksheet("New Worksheet")

ws.AddCell(FsCell("My Column 1",address=FsAddress("B1")))
ws.AddCell(FsCell("My Column 2",address=FsAddress("C1")))
let t = FsTable("My New Table", FsRangeAddress("B1:C2"))

ws.AddTable(t)

// Cell access
t.Cell(FsAddress("B1"), ws.CellCollection).Value // "My Column 2"
t.Cell(FsAddress("A1"), ws.CellCollection).Value //  "My Column 1"
t.Cell(FsAddress("C1"), ws.CellCollection).Value // ""

Setup: As you can see i set two cells "B1" and "C1" and create a table over their range.

Now i would expect that the "Cell access" via FsAddress on "B1" should return "My Column 1", but instead it returns "My Column 2".

Testing the same access for "A1" and "C1" shows, that this cell access, even though it is done via address works relative to table start, which i can only assume is a bug.
I would understand and appreciate this behaviour in a function with index based access but not here.

The correct insert can be checked via:

let b1 = FsAddress("B1")

ws.GetCellAt(b1.RowNumber, b1.ColumnNumber).Value // "My Column 1"

[Feature Request] Read only xlsx reading

Is your feature request related to a problem? Please describe.

let inv = FsWorkbook.fromXlsxFile p

image

Describe the solution you'd like
A read only reader for xlsx files

[BUG] FsCells with row#/col# 0/0 are invalid, yet ctor creates them

Describe the bug
When creating FsCells with the constructor and not specifying the address, they are given the location row 0, column 0. This is not valid.

To Reproduce
Steps to reproduce the behavior:
1.

#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = new FsWorkbook()
FsWorkbook.initWorksheet "test" wb
let ws = wb.GetWorksheets().Head

let cell1 = FsCell("test")
ws.CellCollection.Add [cell1]

FsWorkbook.toFile "test.xlsx" wb
  1. Open in MS Excel, see error

Expected behavior
Should initiate with address row 1, col 1.

[BUG] FsTable and FsRangeBase inerhitance fable issues

image

ColumnCount is part of the inheritance of FsRangeBase. Using this language construct the transpiled javascript code i not easily accessible FsRangeBase__ColumnCount(fstable). Maybe this needs some testing or must be refactored

[BUG] `FsWorkbook.fromXlsxFile` throws IOException under unknown circumstances

Describe the bug
Some XLSX files let FsWorkbook.fromXlsxFile throw an IOException. The reason behind this (the properties of such files) are currently not known.

To Reproduce
Steps to reproduce the behavior:

  1. Download this file:
    errorInducing.xlsx
  2. Use FsWorkbook.fromXlsxFile <pathToThisFile> in a script environment
  3. See error:
    image

Expected behavior
At least a more precise error message.
Cases like these suggest that a readonly version of the function (see #25) is urgently needed.

[BUG] Worksheets have no rows when reading Xlsx file

Describe the bug
Worksheets have no rows when reading Xlsx file.

To Reproduce
Have an Xlsx file with some values in it.

let wb = FsWorkbook.fromXlsxFile filepath
let ws1 = FsWorkbook.getWorksheets wb |> Seq.head
ws1.Rows

Expected behavior
All FsRows are in place where they should be.

Additional context
The problem here is that you have to call ws1.RescanRows() before being able to access them. Maybe it would be best to have this called in the FsWorkbook.fromXlsxFile method? Or is there anything that would speak against it? @HLWeil

[BUG] Cannot read file

Tried reading file with FsSpreadsheet.ExcelIO.

#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"

let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = FsWorkbook.fromXlsxFile(path)

returns:

> let wb = FsWorkbook.fromXlsxFile(path);;
System.InvalidOperationException: Error in implicit conversion. Cannot convert null object.
   at DocumentFormat.OpenXml.BooleanValue.op_Implicit(BooleanValue xmlAttribute)
   at FsSpreadsheet.ExcelIO.FsExtensions.FsTable.fromXlsxTable.Static(Table table)
   at FsSpreadsheet.ExcelIO.FsExtensions.sheets@185.Invoke(Sheet xlsxSheet)
   at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 134
   at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 117
   at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 914
   at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
   at <StartupCode$FSI_0017>.$FSI_0017.main@() in c:\Users\Kevin\Desktop\test.fsx:line 37
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error

This is the file:
Book1.xlsx

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.