fslaborg / fsspreadsheet Goto Github PK
View Code? Open in Web Editor NEWSpreadsheet creation and manipulation in F#, in a functional and/or object-oriented way.
Home Page: http://fslab.org/FsSpreadsheet/
License: MIT License
Spreadsheet creation and manipulation in F#, in a functional and/or object-oriented way.
Home Page: http://fslab.org/FsSpreadsheet/
License: MIT License
Without any documentation the table api seems impossible to navigate, as it does not use any common vocabularies from spreadsheet manipulation.
What is a field?
Why is this not AddCell
(same for worksheet.Table
)? Why is there no "GetCell", even if i need to pass an additional cellcollection.
Why can i access HeaderRow
, but not body?
In addition #45
Make this function internal only. Never force the user to call this!
In Xlsx DataType is written as t="s"
attribute in cells (<c></c>
tag), in this example DataType is string.
Default for DataType is Number NOT empty as previously implemented.
DateTime is also Number so not t
attribute . Instead there will be an s
attribute which references cellXfs
in styles.xml.
The number of the s
attribute is the index in the cellXfs
tag.
numFmtId="14"
is date time.
The number value in the sheet.xlm represent days since 1900 and can be converted with .NET System.DateTime.FromOADate
(thank god)
README should contain minor information about scope of project
With the fable compatibility in FsSpreadsheet.ExcelIO we need to separate the OpenXml conversion stuff. A simple switch in the final file, such as shown below, is not enough to avoid compilation of the pure dotnet parts. Below i will list some solution which might work.
namespace FsSpreadsheet.ExcelIO
open FsSpreadsheet
[<AutoOpen>]
module FsExtensions =
type FsWorkbook with
/// <summary>
/// Creates an FsWorkbook from a given Stream to an XlsxFile.
/// </summary>
static member fromXlsxStream (stream : System.IO.Stream) =
#if FABLE_COMPILER_JAVASCRIPT
JsIO.fromStream stream
#else
FsWorkbook.fromXlsxStream stream
#endif
Possible solutions:
#if !FABLE_COMPILER
// ... all code
#endif
#if !FABLE_CMPILER
around a single open statementFor now i will implement solution 1, as quick and dirty solution.
For example this XML doc:
should be
/// <summary>
/// Transforms excel column string indices (e.g. A, B, Z, AA, CD) to index number (starting with A = 1).
/// </summary>
/// <param name="columnAdress">param description here</param>
let colAdressToIndex (columnAdress : string) =
let length = columnAdress.Length
let mutable sum = 0u
for i=0 to length-1 do
let c = columnAdress.[length-1-i] |> System.Char.ToUpper
let factor = 26. ** (float i) |> uint
sum <- sum + ((uint c - 64u) * factor)
sum
while fsdocs is debatable for creating actual tutorial-style documentation (we might migrate to something that can convert notebooks for that), it's API doc generation is top-notch. XML docs using tags are significantly improving the output of API doc generation.
For a real-life example, take a look at plotly.net's API docs, for example here: https://plotly.net/reference/plotly-net-chart.html#Grid
Functions like Row.getIndexedValues
and Row.mapCells
fail if cells in the row don`t contain a value, like here:
<row r="14" spans="1:5" x14ac:dyDescent="0.2">
<c r="A14" t="s">
<v>29</v>
</c>
<c r="B14" s="11" t="s">
<v>30</v>
</c>
<c r="D14" s="11"/>
<c r="E14" s="11"/>
</row>
Is your feature request related to a problem? Please describe.
There are some features added to FsRow
that are needed in FsColumn
, too.
Describe the solution you'd like
Add them.
Describe alternatives you've considered
Don't add them.
Additional context
Needed in ARC-Validation.
The current test project is empty (https://github.com/CSBiology/FsSpreadsheet/blob/datamodel/tests/FsSpreadsheet.Tests/Sample.fs)
I would suggest the following priorities for adding tests:
To allow seamless integration of FsSpreadsheet in ARCtrl we want to introduce fable compatibility on xlsx level. I already wrote basic f# bindings for javascript exceljs in Fablel.Exceljs. These are my proposed changes:
Add
FsWorkbook.fromSpreadsheetDocument: Packaging.SpreadsheetDocument
FsWorkbook.fromPackage: Packaging.Package
to allow for easy fixing of nfdi4plants/ARCtrl.NET#12
Tables with headerRow:
are only recognized by exceljs if exceljs did write the table:
In our tests only table MyNewTable
has true
for headerRow
. Especially for the bottom 4 i can guarantee the headerRow value should be true.
This is most likely a exceljs bug and needs to be fixed in @nfdi4plants/exceljs.
Describe the bug
FsRangeAddress("A1:A1") = FsRangeAddress("A1:A1")
FsAddress("A1") = FsAddress("A1")
both return false
.
Expected behavior
These should return true. This could be done by implementing custom equality for the address types.
Would be nice to have. A good starting point wouldfbe a formatter for a workbook, i'll look into a PR containing a POC for further development.
To Reproduce
(+.) id (Option.Some 5)
returns
val it: SheetEntity<Value> = Some ((String, "Some(45)"), [])
But the internal value should be only 45
.
Solution
use parseAny
let inline (+.) (f : 'T -> 'U) (v : 'T) : SheetEntity<Value> =
let err = fun s -> NoneRequired([message s])
try
f v
|> parseAny err
with
| err -> NoneRequired([Exception err])
Is your feature request related to a problem? Please describe.
When a cell contains "MAR3" with DataType = string
, in Excel the cell data type is set to General
instead of Text
. Therefore it is converted to March-03.
Describe the solution you'd like
The datatype encoded within the model should be correctly converted to Excel types.
Describe alternatives you've considered
None
Without source files in package, fable cannot use FsSpreadsheet.Exceljs
Describe the bug
When writing a spreadsheet file and reading it again, it throws an error regardless of what has been done to the structure/file.
To Reproduce
Steps to reproduce the behavior:
#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let path = <yourPath>
let wb = new FsWorkbook()
wb.InitWorksheet "empty"
FsWorkbook.toFile path wb
let wbFromFile = FsWorkbook.fromXlsxFile path
System.IO.IOException: Cannot modify readonly container
at System.IO.Packaging.Package.ThrowIfReadOnly()
at System.IO.Packaging.Package.CreatePart(Uri partUri, String contentType, CompressionOption compressionOption)
at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.CreateMetroPart(Uri partUri, String contentType)
at DocumentFormat.OpenXml.Packaging.OpenXmlPart.CreateInternal(OpenXmlPackage openXmlPackage, OpenXmlPart parent, String contentType, String targetExt)
at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType, String id)
at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart[T](T newPart, String contentType)
at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPartInternal[T]()
at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AddNewPart[T]()
at FsSpreadsheet.ExcelIO.Spreadsheet.getOrInitSharedStringTablePart(SpreadsheetDocument spreadsheetDocument)
at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheet(Sheet sheet, SpreadsheetDocument spreadsheetDocument)
at FsSpreadsheet.ExcelIO.Spreadsheet.getCellsBySheetID(String sheetID, SpreadsheetDocument spreadsheetDocument)
at [email protected](Sheet xlsxSheet)
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 134
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 117
at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 914
at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
at <StartupCode$FSI_0006>.$FSI_0006.main@() in C:\Users\olive\Untitled-2:line 15
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error
Expected behavior
Reading should work as expected.
Additional context
It seems to be an error produced by the writer, not the reader. When reading files that were created by Excel, no error happens.
Is your feature request related to a problem? Please describe.
Currently, adding a worksheet to a workbook retrieving it for editing is a two step process, as
myWorkbook.AddWorksheet("SheetName")
returns a unit and
FsWorkbook.addWorksheetWithName "SheetName" myWorkbook
returns the FsWorkbook
Describe the solution you'd like
I would like a functionality that directly returns the newly created FsWorksheet. Maybe even implemented as an ".Item" like cells in rows and rows in sheets?
let myWorksheet = myWorkbook.["NewSheetName"]
which either returns an exisiting sheet or creates a new one.
Not sure if this is an error in usage or execution.
#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"
let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let wb = new FsWorkbook()
let ws = wb.InitWorksheet("New Worksheet")
ws.AddCell(FsCell("My Column 1",address=FsAddress("B1")))
ws.AddCell(FsCell("My Column 2",address=FsAddress("C1")))
let t = FsTable("My New Table", FsRangeAddress("B1:C2"))
ws.AddTable(t)
wb.ToFile(path)
See for example here:
You have high code duplication. You can refactor one of these methods to use the other (most likely the one using the name as string to use the one using the worksheet argument)
Maintainability and smaller surface for bugs. nearly 90% of the logic in these 2 functions are the same, in fact one of them is most likely an adapted copy of the other. If the original version has a bug, the copied one has it as well.
If you rewrite one of them using the other, most of the possible bugs will be catched by testing the inner function
Fable transpiled fsproj to include f# dependencies (&npm dependencies) in relative subfolder fable_modules
.
Describe the bug
It is possible to save an FsWorkbook with no worksheets attached. This is critical since an XLSX spreadsheet file always needs to have at least 1 worksheet with a name given. Else it is not valid and, e.g., MS Excel will tell you it is corrupted when trying to open it.
To Reproduce
Steps to reproduce the behavior:
1.
#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let path = "test.xlsx"
let wb = new FsWorkbook()
FsWorkbook.toFile path wb
Expected behavior
This is up to debate.
Maybe throw an error when trying to save such a workbook? Or at least give a warning? Is this even a bug at all or do we want to give the library user the freedom to artificially produce invalid XLSX files?
@HLWeil
Is your feature request related to a problem? Please describe.
Atm., we miss some functionality for FsRow
type that is needed in other projects.
Describe the solution you'd like
tryItem
which returns Some item
if present and None
if notgetMaxColNumber
and getMinColNumber
getDenseRow
where all non-existent cells between min and max column number are created as empty cells#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"
let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let wb = FsWorkbook.fromXlsxFile(path)
returns:
> let wb = FsWorkbook.fromXlsxFile(path);;
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at FsSpreadsheet.ExcelIO.Table.Area.toBoundaries(StringValue area)
at FsSpreadsheet.ExcelIO.FsExtensions.FsTable.fromXlsxTable.Static(Table table)
at FsSpreadsheet.ExcelIO.FsExtensions.sheets@186.Invoke(Sheet xlsxSheet)
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 131
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 113
at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 911
at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
at <StartupCode$FSI_0004>.$FSI_0004.main@() in c:\Users\Kevin\Desktop\test.fsx:line 10
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error
>
Today i killed the whole read/write functions because i did a mistake in Sort(), so i should write a test for this on monday!
Is your feature request related to a problem? Please describe.
In OpenXml's Stylesheet
class, things like CellFormat
are stored. We need these to correctly set a cell we want to save as pure text as it otherwise leads to problems like in #19.
Describe the solution you'd like
Implement styles and map them from the Stylesheet
object.
Describe the bug
Possible Solution
TryGetHeaderRow
and GetHeaderRow
with correct outputCurrently IO tests were created ad-hoc and not really planned. These should get cleaned up and planned out in a senseful way.
We must cover:
So for FsSpreadsheet.ExcelIO we must test:
ForFsSpreadsheet.Exceljs we must test:
Numbers | Strings | DateTime | Boolean | ARCtrl Column | ARCtrl Column |
---|---|---|---|---|---|
1 | Hello | any | true | (A) This is part 1 of 2 | (A) This is part 2 of 2 |
2 | World | any | false | Tests if column names with whitespace at end can be unique | |
3 | Bye | any | true | ||
4 | Outer Space | any | false |
Duplicate this Table to TWO worksheets to also include multiple worksheets.
Due to the nature of these tests, we can recycle most of the test files. We only need one file of any kind for read-in.
All files written by tests should be prefixed with "WRITE_" so they can be deleted at any point, while all other files MUST never be opened AND saved to avoid corrupting their origin.
#r "nuget: FsSpreadsheet, 3.1.1"
#r "nuget: FsSpreadsheet.ExcelIO, 3.1.1"
let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let wb = new FsWorkbook()
let ws = wb.InitWorksheet("New Worksheet")
ws.AddCell(FsCell("My Column 1",address=FsAddress("B1")))
ws.AddCell(FsCell("My Column 2",address=FsAddress("C1")))
let t = FsTable("My New Table", FsRangeAddress("B1:C2"))
ws.AddTable(t)
// Cell access
t.Cell(FsAddress("B1"), ws.CellCollection).Value // "My Column 2"
t.Cell(FsAddress("A1"), ws.CellCollection).Value // "My Column 1"
t.Cell(FsAddress("C1"), ws.CellCollection).Value // ""
Setup: As you can see i set two cells "B1" and "C1" and create a table over their range.
Now i would expect that the "Cell access" via FsAddress on "B1" should return "My Column 1"
, but instead it returns "My Column 2"
.
Testing the same access for "A1" and "C1" shows, that this cell access, even though it is done via address works relative to table start, which i can only assume is a bug.
I would understand and appreciate this behaviour in a function with index based access but not here.
The correct insert can be checked via:
let b1 = FsAddress("B1")
ws.GetCellAt(b1.RowNumber, b1.ColumnNumber).Value // "My Column 1"
As discussed in last meeting, we need a functionality that updates the annotationTable
in an existing openXML data structure. This would lead to us being able of updating metadata tables while keeping user-defined stuff outside of the tables intact, without the need of implementing #21
writer for #73
This will be postponed for now and pushed to backlog
Describe the bug
When creating FsCells with the constructor and not specifying the address, they are given the location row 0, column 0. This is not valid.
To Reproduce
Steps to reproduce the behavior:
1.
#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let wb = new FsWorkbook()
FsWorkbook.initWorksheet "test" wb
let ws = wb.GetWorksheets().Head
let cell1 = FsCell("test")
ws.CellCollection.Add [cell1]
FsWorkbook.toFile "test.xlsx" wb
Expected behavior
Should initiate with address row 1, col 1.
A lot of other xlsx readers/writers use row based input for table body, would be nice to have one uniform access method for rows (headerRow, bodyRows) on FsTable.
Describe the bug
Some XLSX files let FsWorkbook.fromXlsxFile
throw an IOException. The reason behind this (the properties of such files) are currently not known.
To Reproduce
Steps to reproduce the behavior:
FsWorkbook.fromXlsxFile <pathToThisFile>
in a script environmentExpected behavior
At least a more precise error message.
Cases like these suggest that a readonly version of the function (see #25) is urgently needed.
Describe the bug
Worksheets have no rows when reading Xlsx file.
To Reproduce
Have an Xlsx file with some values in it.
let wb = FsWorkbook.fromXlsxFile filepath
let ws1 = FsWorkbook.getWorksheets wb |> Seq.head
ws1.Rows
Expected behavior
All FsRows are in place where they should be.
Additional context
The problem here is that you have to call ws1.RescanRows()
before being able to access them. Maybe it would be best to have this called in the FsWorkbook.fromXlsxFile
method? Or is there anything that would speak against it? @HLWeil
Tried reading file with FsSpreadsheet.ExcelIO.
#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"
let path = @"C:/Users/Kevin/Desktop/Book1.xlsx"
open FsSpreadsheet
open FsSpreadsheet.ExcelIO
let wb = FsWorkbook.fromXlsxFile(path)
returns:
> let wb = FsWorkbook.fromXlsxFile(path);;
System.InvalidOperationException: Error in implicit conversion. Cannot convert null object.
at DocumentFormat.OpenXml.BooleanValue.op_Implicit(BooleanValue xmlAttribute)
at FsSpreadsheet.ExcelIO.FsExtensions.FsTable.fromXlsxTable.Static(Table table)
at FsSpreadsheet.ExcelIO.FsExtensions.sheets@185.Invoke(Sheet xlsxSheet)
at Microsoft.FSharp.Collections.Internal.IEnumerator.map@128.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 134
at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 117
at Microsoft.FSharp.Collections.SeqModule.Fold[T,TState](FSharpFunc`2 folder, TState state, IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 914
at FsSpreadsheet.ExcelIO.FsExtensions.FsWorkbook.fromXlsxFile.Static(String filePath)
at <StartupCode$FSI_0017>.$FSI_0017.main@() in c:\Users\Kevin\Desktop\test.fsx:line 37
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error
This is the file:
Book1.xlsx
Currently, the FsSpreadsheet codebase is transpilable to js via Fable, but it is not optimized for it. This results in terrible js code.
We should implement the design principles defined here:
https://github.com/nfdi4plants/ISADotNet/tree/arctrl#fable-compatibility-as-top-priority
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.