Code Monkey home page Code Monkey logo

xlsxio's Introduction

XLSX I/O

Mentioned in Awesome C

Cross-platform C library for reading values from and writing values to .xlsx files.

Description

XLSX I/O aims to provide a C library for reading and writing .xlsx files. The .xlsx file format is the native format used by Microsoft(R) Excel(TM) since version 2007.

Goal

The library was written with the following goals in mind:

  • written in standard C, but allows being used by C++
  • simple interface
  • small footprint
  • portable across different platforms (Windows, *nix)
  • minimal dependencies: only depends on expat (only for reading) and minizip or libzip (which in turn depend on zlib)
  • separate library for reading and writing .xlsx files
  • does not require Microsoft(R) Excel(TM) to be installed

Reading .xlsx files:

  • intended to process .xlsx files as a data table, which assumes the following:
    • assumes the first row contains header names
    • assumes the next rows contain values in the same columns as where the header names are supplied
    • only values are processed, anything else is ignored (formulas, layout, graphics, charts, ...)
  • the entire shared string table is loaded in memory (warning: could be large for big spreadsheets with a lot of different values)
  • supports .xlsx files without shared string table
  • worksheet data itself is read on the fly without the need to buffer data in memory
  • 2 methods are provided
    • a simple method that allows the application to iterate through rows and cells
    • an advanced method (with less overhead) which calls callback functions for each cell and after each row

Writing .xlsx files:

  • intended for writing data tables as .xlsx files, which assumes the following:
    • only support for writing data (no support for formulas, layout, graphics, charts, ...)
    • no support for multiple worksheets (only one worksheet per file)
  • on the fly file generation without the need to buffer data in memory
  • no support for shared strings (all values are written as inline strings)

Libraries

The following libraries are provided:

  • -lxlsxio_read - library for reading .xlsx files, requires #include <xlsxio_read.h>
  • -lxlsxio_write - library for writing .xlsx files, requires #include <xlsxio_write.h>
  • -lxlsxio_readw - experimental library for reading .xlsx files, linked with -lexpatw, requires #define XML_UNICODE before #include <xlsxio_read.h>

Command line utilities

Some command line utilities are included:

  • xlsxio_xlsx2csv - converts all sheets in all specified .xlsx files to individual CSV (Comma Separated Values) files.
  • xlsxio_csv2xlsx - converts all specified CSV (Comma Separated Values) files to .xlsx files.

Dependencies

This project has the following dependencies:

Note that minizip is preferred, as there have been reports that .xlsx files generated with XLSX I/O built against libzip can't be opened with LibreOffice.

There is no dependency on Microsoft(R) Excel(TM).

XLSX I/O was written with cross-platform portability in mind and works on multiple operating systems, including Windows, macOS and Linux.

Building from source

Requirements:

  • a C compiler like gcc or clang, on Windows MinGW and MinGW-w64 are supported
  • the dependency libraries (see Dependencies)
  • a shell environment, on Windows MSYS is supported
  • the make command
  • CMake version 2.6 or higher (optional, but preferred)

There are 2 methods to build XLSX I/O:

  • using the basic Makefile included
  • using CMake (preferred)

Building with make

  • build and install by running make install optionally followed by:
    • PREFIX=<path> - Base path were files will be installed (defaults to /usr/local)
    • WITH_LIBZIP=1 - Use libzip instead of minizip
    • WIDE=1 - Also build UTF-16 library (xlsxio_readw)
    • STATICDLL=1 - Build a static DLL (= doesn't depend on any other DLLs) - only supported on Windows

Building with CMake (preferred method)

  • configure by running cmake -G"Unix Makefiles" (or cmake -G"MSYS Makefiles" on Windows) optionally followed by:
    • -DCMAKE_INSTALL_PREFIX:PATH=<path> Base path were files will be installed
    • -DBUILD_STATIC:BOOL=OFF - Don't build static libraries
    • -DBUILD_SHARED:BOOL=OFF - Don't build shared libraries
    • -DBUILD_TOOLS:BOOL=OFF - Don't build tools (only libraries)
    • -DBUILD_EXAMPLES:BOOL=OFF - Don't build examples
    • -DWITH_LIBZIP:BOOL=ON - Use libzip instead of Minizip
    • -DLIBZIP_DIR:PATH=<path> - Location of libzip library
    • -DMINIZIP_DIR:PATH=<path> - Location of Minizip library
    • -DWITH_WIDE:BOOL=ON - Also build UTF-16 library (libxlsxio_readw)
  • build and install by running make install (or make install/strip to strip symbols)

Prebuilt binaries

Prebuilt binaries are also available for download for the following platforms:

  • Windows 32-bit
  • Windows 64-bit

Both Windows versions were built using the MinGW-w64 under an MSYS2 shell. To link with the .dll libraries from Microsoft Visual C++ you need a .lib file for each .dll. This file can be generated using the lib tool that comes with Microsoft Visual C++.

For 32-bit Windows:

cd lib
lib /def:libxlsxio_write.def /out:libxlsxio_write.lib /machine:x86
lib /def:libxlsxio_read.def /out:libxlsxio_read.lib /machine:x86
lib /def:libxlsxio_readw.def /out:libxlsxio_readw.lib /machine:x86

For 64-bit Windows:

cd lib
lib /def:libxlsxio_write.def /out:libxlsxio_write.lib /machine:x64
lib /def:libxlsxio_read.def /out:libxlsxio_read.lib /machine:x64
lib /def:libxlsxio_readw.def /out:libxlsxio_readw.lib /machine:x64

Example C programs

Reading from an .xlsx file

#include <xlsxio_read.h>
//open .xlsx file for reading
xlsxioreader xlsxioread;
if ((xlsxioread = xlsxioread_open(filename)) == NULL) {
  fprintf(stderr, "Error opening .xlsx file\n");
  return 1;
}

//read values from first sheet
char* value;
xlsxioreadersheet sheet;
const char* sheetname = NULL;
printf("Contents of first sheet:\n");
if ((sheet = xlsxioread_sheet_open(xlsxioread, sheetname, XLSXIOREAD_SKIP_EMPTY_ROWS)) != NULL) {
  //read all rows
  while (xlsxioread_sheet_next_row(sheet)) {
    //read all columns
    while ((value = xlsxioread_sheet_next_cell(sheet)) != NULL) {
      printf("%s\t", value);
      xlsxioread_free(value);
    }
    printf("\n");
  }
  xlsxioread_sheet_close(sheet);
}

//clean up
xlsxioread_close(xlsxioread);

Listing all worksheets in an .xlsx file

#include <xlsxio_read.h>
//open .xlsx file for reading
xlsxioreader xlsxioread;
if ((xlsxioread = xlsxioread_open(filename)) == NULL) {
  fprintf(stderr, "Error opening .xlsx file\n");
  return 1;
}

//list available sheets
xlsxioreadersheetlist sheetlist;
const char* sheetname;
printf("Available sheets:\n");
if ((sheetlist = xlsxioread_sheetlist_open(xlsxioread)) != NULL) {
  while ((sheetname = xlsxioread_sheetlist_next(sheetlist)) != NULL) {
    printf(" - %s\n", sheetname);
  }
  xlsxioread_sheetlist_close(sheetlist);
}

//clean up
xlsxioread_close(xlsxioread);

Writing to an .xlsx file

#include <xlsxio_write.h>
//open .xlsx file for writing (will overwrite if it already exists)
xlsxiowriter handle;
if ((handle = xlsxiowrite_open(filename, "MySheet")) == NULL) {
  fprintf(stderr, "Error creating .xlsx file\n");
  return 1;
}

//write column names
xlsxiowrite_add_column(handle, "Col1", 16);
xlsxiowrite_add_column(handle, "Col2", 0);
xlsxiowrite_next_row(handle);

//write data
int i;
for (i = 0; i < 1000; i++) {
  xlsxiowrite_add_cell_string(handle, "Test");
  xlsxiowrite_add_cell_int(handle, i);
  xlsxiowrite_next_row(handle);
}

//close .xlsx file
xlsxiowrite_close(handle);

License

XLSX I/O is released under the terms of the MIT License (MIT), see LICENSE.txt.

This means you are free to use XLSX I/O in any of your projects, from open source to commercial.

This library does not require Microsoft(R) Excel(TM) to be installed.

xlsxio's People

Contributors

brechtsanders avatar danarnold avatar jorbakk avatar pospelove avatar remicollet avatar salvor-hardin avatar taozuhong avatar ujifman avatar viest avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xlsxio's Issues

Mid cell font change - screws up sharedstrings

When a font change occurs mid cell (in my case some of the characters were in italics), there are more than one records in an record. However excel refers back to this as a single record, so the record in question is truncated and it throws off any record whose sharedstring is in the table after it.

I made an adjustment to append the strings together, by adding the following code:

BOOL bStringStarted=FALSE;

void shared_strings_callback_find_shared_stringitem_start (void* callbackdata, const XML_Char* name, const XML_Char** atts)
{
struct shared_strings_callback_data* data = (struct shared_strings_callback_data*)callbackdata;
bStringStarted=FALSE;

/*****/

if (stricmp(name, "t") == 0) {
if (bStringStarted){
char* sTemp=data->sharedstrings->strings[data->sharedstrings->count-1];
sTemp=(char*)realloc(sTemp, strlen(sTemp)+data->textlen+1);
strncat(sTemp,data->text,data->textlen);
data->sharedstrings->strings[data->sharedstrings->count-1]=sTemp;
}
else{
sharedstringlist_add_buffer(data->sharedstrings, data->text, data->textlen);
bStringStarted=TRUE;
}

Current locale can corrupt the xlsx-file

If the application uses the static library and set the locale to use a comma instead of a dot for decimals, then the formatting in the xlsxio will produce corrupt xlsx-files since for example the width of the columns will be "19,5" instead of "19.5".

So formatting should use the a locale that gives a decimal dot. Either set the locale and restore the old one when done saving or use a format function that takes a locale as a parameter.

Could NOT find Minizip (missing: MINIZIP_LIBRARIES MINIZIP_INCLUDE_DIRS)

OS: Linux Ubuntu

This is my log when i run cmake .

CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find Minizip (missing: MINIZIP_LIBRARIES MINIZIP_INCLUDE_DIRS)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
CMake/FindMinizip.cmake:18 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
CMakeLists.txt:38 (FIND_PACKAGE)

Could any one fix this error ?

Building with Zlib from Source

I am having a hard time compiling because I want to use Zlib (or miniz) from source instead of relying on the library to be installed on the host.

Can anyone post any pointers on how to alter xlsxio's cmake file so that, instead of FIND_PACKAGE for zlib (or miniz) we are compiling the zlin/miniz from a subdirectory?

I've spent a few hours on this without success. I keep coming up with zip.h header not found. I am not a Cmake expert and xlsxio's cmake file is more complicated than what I'm used to.

Thank you.

No support for /usr/local/ install Prefix

I believe the FindLibZip.cmake should also include the following changes to find libzip with the /usr/local/ prefix. You already support /opt/local/include, and /usr/local/include is frequently used by non-system packages.

From:
FIND_PATH(LIBZIP_INCLUDE_DIR_zh NAMES zip.h PATHS /include /usr/include /opt/local/include) FIND_PATH(LIBZIP_INCLUDE_DIR_zch NAMES zipconf.h PATHS /lib/libzip/include /usr/lib/libzip/include /opt/local/lib/libzip/include)
To:
FIND_PATH(LIBZIP_INCLUDE_DIR_zh NAMES zip.h PATHS/include /usr/include /usr/local/include /opt/local/include) FIND_PATH(LIBZIP_INCLUDE_DIR_zch NAMES zipconf.h PATHS /lib/libzip/include /usr/lib/libzip/include /usr/local/lib/libzip/include /opt/local/lib/libzip/include)

xlsxio fails if rels paths are 'absolute'

I'm not sure whether or not this is allowed in the xlsx spec, but my same difficult file from #28 specified it's paths like this

<?xml version="1.0" encoding="utf-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/sheet1.xml" Id="R4f1c94c378d64409" />
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="/xl/styles.xml" Id="R48c49c2535d74bcc" />
    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="/xl/sharedStrings.xml" Id="R5c29e56ada2e432c" />
</Relationships>

This failed in xlsxio and was fixed like this:

//join basepath and filename (caller must free result)
XML_Char* join_basepath_filename (const XML_Char* basepath, const XML_Char* filename)
{
  XML_Char* result = NULL;
  if (filename && *filename) {
    size_t basepathlen = (basepath ? XML_Char_len(basepath) : 0);
    size_t filenamelen = XML_Char_len(filename);
    if ((result = XML_Char_malloc(basepathlen + filenamelen + 1)) != NULL) {

        if(filename[0] == '/') {
            // the filename is written as an 'absolute' path, we need to
            // remove the leading slash, but otherwise return as-is
            XML_Char_poscpy(result, 0, filename + 1, filenamelen - 1);
            result[filenamelen - 1] = 0;
            return result;
        }
        
        if (basepathlen > 0)
        XML_Char_poscpy(result, 0, basepath, basepathlen);
      XML_Char_poscpy(result, basepathlen, filename, filenamelen);
      result[basepathlen + filenamelen] = 0;
    }
  }
  return result;
}

Get max row/column

It would be good with at function that returns number of columns and rows in a sheet, so you could reserve space for the date before reading the sheet.

There is an optional dimension attribute to read.

<worksheet>
 <dimension ref="A1:B20"/>
 <sheetViews>
  <sheetView ...

If the dimension attribute is not available, a rough estimate is to look a cells for the first row and just call while(next_row) ++rows; for rows if there is not precalculated value in the xlsx file.

can not use both read/write dll in one exe?

I wirte one test.exe agains libxlsxio dll, it's working well respectively on libxlsxio_read.dll and libxlsxio_write.dll
but if I combine write and read in one application, error will report

environment
windows 7
vc2012
lib: libxlsxio_read.dll.a/libxlsxio_write.dll.a

sample code:

#include <xlsxio_read.h>
#include <xlsxio_write.h>
#include <stdio.h>
#include <string>
#include <Windows.h>
using namespace std;

int read_xlsx() {
  //open .xlsx file for reading
xlsxioreader xlsxioread;
if ((xlsxioread = xlsxioread_open("d:\\0801_test.xlsx")) == NULL) {
  fprintf(stderr, "Error opening .xlsx file\n");
  return 1;
}

//read values from first sheet
char* value;
xlsxioreadersheet sheet;
const char* sheetname = NULL;
printf("Contents of first sheet:\n");
if ((sheet = xlsxioread_sheet_open(xlsxioread, sheetname, XLSXIOREAD_SKIP_EMPTY_ROWS)) != NULL) {
  //read all rows
  while (xlsxioread_sheet_next_row(sheet)) {
    //read all columns
    while ((value = xlsxioread_sheet_next_cell(sheet)) != NULL) {
      printf("%s\t", value);
      //free(value);
    }
    printf("\n");
  }
  xlsxioread_sheet_close(sheet);
}

//clean up
xlsxioread_close(xlsxioread);

return 0;
}

int write_xlsx() {
//open .xlsx file for writing (will overwrite if it already exists)
xlsxiowriter handle;
if ((handle = xlsxiowrite_open("d:\\test_xlsx.xlsx", "MySheet")) == NULL) {
  fprintf(stderr, "Error creating .xlsx file\n");
  return 1;
}

//write column names
xlsxiowrite_add_column(handle, "Col1", 16);
xlsxiowrite_add_column(handle, "Col2", 0);
xlsxiowrite_next_row(handle);

//write data
int i;
for (i = 0; i < 1000; i++) {
  xlsxiowrite_add_cell_string(handle, "test");
  xlsxiowrite_add_cell_int(handle, i);
  xlsxiowrite_next_row(handle);
}

//close .xlsx file
xlsxiowrite_close(handle);

return 0;
}

int main() {
  read_xlsx();
  write_xlsx();

return 0;
}

Crashes opening this file

For me, xlsxio crashes on the attached file. I have a highly customized build, though, so I would be interested to know if you can open this file with xlsxio. (My build is wonky because I'm using xlsxio in a Node.js addon, here.)

* thread #10, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff65e1a232 libsystem_c.dylib`strlen + 18
    frame #1: 0x000000010aadf086 xlsx.node`unzLocateFile(file=0x0000000104c13140, szFileName=0x0000000000000000, iCaseSensitivity=0) at unzip.c:1253
    frame #2: 0x000000010aaa6fef xlsx.node`XML_Char_openzip(archive=0x0000000104c13140, filename=0x0000000000000000, flags=0) at xlsxio_read.c:43
    frame #3: 0x000000010aaa6e52 xlsx.node`expat_process_zip_file(zip=0x0000000104c13140, filename=0x0000000000000000, start_handler=(xlsx.node`shared_strings_callback_find_sharedstringtable_start at xlsxio_read_sharedstrings.c:125), end_handler=0x0000000000000000, data_handler=0x0000000000000000, callbackdata=0x0000700007a30ec0, xmlparser=0x0000700007a30ec0) at xlsxio_read.c:139
    frame #4: 0x000000010aaa9ade xlsx.node`xlsxioread_process(handle=0x0000000104c058b0, sheetname=0x0000000000000000, flags=129, cell_callback=0x0000000000000000, row_callback=0x0000000000000000, callbackdata=0x0000000104c1cb50) at xlsxio_read.c:1223
  * frame #5: 0x000000010aaaa05f xlsx.node`xlsxioread_sheet_open(handle=0x0000000104c058b0, sheetname=0x0000000000000000, flags=1) at xlsxio_read.c:1350
    frame #6: 0x000000010aa75ff8 xlsx.node`xlsx::extractAllRows(sheetname=0x0000000000000000, xreader=0x0000700007a349e8, hasHeaders=false, headerTransformMap=size=0, deletes=size=0, doPascalCase=false, pascalWords=size=0) at XlsxReaderFunctions.h:73
    frame #7: 0x000000010aa73674 xlsx.node`xlsx::extractAllData(filename="/Users/mjb/Documents/repos/xlsx-util/tests/sci-bug.xlsx", hasHeaders=false, headerTransformMap=size=0, deletes=size=0, doPascalCase=false, pascalWords=size=0) at XlsxReaderFunctions.h:52
    frame #8: 0x000000010aa72b7c xlsx.node`xlsx::AsyncReader::Execute(this=0x00000001061139f0) at AsyncReader.cpp:34
    frame #9: 0x000000010aa700c9 xlsx.node`Napi::AsyncWorker::OnExecute(env=0x0000000104d2bbb0, this_pointer=0x00000001061139f0) at napi-inl.h:3170
    frame #10: 0x00000001000937ff node`(anonymous namespace)::uvimpl::Work::ExecuteCallback(req=0x000000010610dd60) at node_api.cc:3363
    frame #11: 0x000000010171c201 node`uv__queue_work(w=0x000000010610ddb8) at threadpool.c:259
    frame #12: 0x000000010171c8e6 node`worker(arg=0x0000000000000000) at threadpool.c:83
    frame #13: 0x0000000104b65665 libsystem_pthread.dylib`_pthread_body + 340
    frame #14: 0x0000000104b65511 libsystem_pthread.dylib`_pthread_start + 377
    frame #15: 0x0000000104b64bfd libsystem_pthread.dylib`thread_start + 13

image

sci-bug.xlsx

Tutorial x64 application failed to start.

Tried with the tutorial code to read xlsx file with VC 2017 in x64 debug mode. Libs were generated from defs and the x64 dlls( including dependencies) were also downloaded . Compiled and built without any error on VC 2017 in 64 bit, but application won't start. Error dialog displays - " The application was unable to start correctly (0xc000007b).". Any help will be appreciated.

#include <xlsxio_read.h>

int main() {

	//open .xlsx file for reading
	xlsxioreader xlsxioread;
	if ((xlsxioread = xlsxioread_open("Transmission_Log.xlsx")) == NULL) {
		fprintf(stderr, "Error opening .xlsx file\n");
		return 1;
	}

	//read values from first sheet
	char* value;
	xlsxioreadersheet sheet;
	const char* sheetname = NULL;
	printf("Contents of first sheet:\n");
	if ((sheet = xlsxioread_sheet_open(xlsxioread, sheetname, XLSXIOREAD_SKIP_EMPTY_ROWS)) != NULL) {
		//read all rows
		while (xlsxioread_sheet_next_row(sheet)) {
			//read all columns
			while ((value = xlsxioread_sheet_next_cell(sheet)) != NULL) {
				printf("%s\t", value);
				free(value);
			}
			printf("\n");
		}
		xlsxioread_sheet_close(sheet);
	}

	//clean up
	xlsxioread_close(xlsxioread);

}

CMakeLists modification

First: Thank you very much for the CMakeLists.txt - it makes live a lot easier.
I have a little optimization. Can you Replace line 95 - 109 with:

# add a target to generate API documentation with Doxygen
OPTION(BUILD_DOCUMENTATION "Create and install API documentation (requires Doxygen)" ${DOXYGEN_FOUND})

IF(BUILD_DOCUMENTATION)
  IF(NOT DOXYGEN_FOUND)
    MESSAGE(FATAL_ERROR "Doxygen is needed to build the documentation.")
  ENDIF()

  ADD_CUSTOM_TARGET(doc ALL
    COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile
    #WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
    COMMENT "Generating API documentation with Doxygen"
    VERBATIM
  )
  INSTALL(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/doc/man
    DESTINATION .
  )
ENDIF()

What does it do? One can select BUILD_DOCUMENTATION as an option. It is preselected, if Doxygen was found and unselected, if not.

xlsxio fails in the presence of xmlnamespace qualification

The same proprietary file that I mentioned in #28 also includes namespace qualification on the elements in several of its xml files. For example

<?xml version="1.0" encoding="utf-8"?>
<x:sst xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:si>
<x:t>NPANXX</x:t></x:si><x:si>
<x:t>JUR</x:t></x:si><x:si>
<x:t>CARRIERGRP</x:t></x:si><x:si>
<x:t>DETAILED_CARRIERGRP</x:t></x:si><x:si>
<x:t>ST</x:t></x:si><x:si>
<x:t>LATA</x:t></x:si><x:si>
...

xlsxio assumes the author of an xlsx file will not do this, but it is valid for them to do so.

I hack-fixed this in my fork by changing every comparison like this:

if ((XML_Char_icmp(name, X("sst")) == 0) || (XML_Char_icmp(name, X("x:sst")) == 0)) {

But this is just a quick workaround, since any namespace is possible, I just needed to solve for a particular vendor that was using x.

A proper solution would be to only compare the element name's ending substring, i.e. (pseudocode)

int len = "sst'.length();
if( element.name.last(len) == "sst")

This way xlsxio could just ignore namespaces all-together without failing in their presence.

Feature Request: Support Compiling Entirely from Source

Embed minizip and expat sourcecode (licenses are permissive) and build everything from source using Cmake. Use a flag in Cmake for the user to choose between building entirely from source vs. using FIND_PACKAGES. For those use-cases where the runtime environment is not controllable or known, it should be possible to create a binary that does not rely on the installation and runtime linking of these installed packages.

CMakeLists support

Hey,
I think your code has some great potential and I see you develop it actively.
Would you mind to add a CMakeLists.txt instead (or in addition) of a Makefile? It's much more Cross-platform friendly as I still have troubles compiling it under Windows.

Thank you very much in advance

xlsxiowrite_add_column,xlsxiowrite_add_cell_string issue

I do a simple test using below code snippet. The test.xlsx once I opened has problem after column Z. Column AA is blank. 26 is at column AB. 27 at column BB, 28 at column CB, 29 at column DB. It appear to insert randomly after column Z although code wise I just keep calling xlsxiowrite_add_column,xlsxiowrite_add_cell_string in a sequential for loop. I am using xlsxio-0.2.16-binary-win64.zip running on Windows.

xlsxiowriter xlsxiowrite = xlsxiowrite_open("test.xlsx","Sheet1");
char value[3];
for(int i=0; i<30; i++) {
	sprintf(value, "%d", i);
	//printf("%s\n",str);
	xlsxiowrite_add_column(xlsxiowrite, value, 0);
	//xlsxiowrite_add_cell_string(xlsxiowrite,value);
}
xlsxiowrite_close(xlsxiowrite);

test.xlsx

LIBREOFFICE

The resulting xlsx on the demo doesn't open on libreoffice

Reading unicode?

Will your library correctly read unicode strings out of an XLSX file? I notice that the functions are declared as returning a char pointer <char* xlsxioread_sheet_next_cell()>

Procedure entry point xlsxiowrite_* could not be located in the dll

Seems all exports from xlsxiowrite result in this same error. Tried x86 / x64 and various versions of each.

Full error is:
The procedure entry point xlsxiowrite_* could not be located in the dynamic link library C:\path\to\exe\app.exe

I believe everything is linked fine, xlsxio_read works fine. Any ideas what may cause this? depends seems to show that the exports are there.

Support binding.gyp compilation for Node.js Addon Use

Sadly, Node.js does not use cmake for native code addons, instead it uses gyp and the equivilant of a CmakeLists.txt file is binding.gyp. Including a binding.gyp and testing its validity in continuous integration could boost popularity of the library.

unix only headers

Hey,
your project uses two unix only includes:

  • inttypes.h
  • unistd.h

For the unistd.h in lib/xlsxio_write.c there is an easy solution, as there is an equivalent header. Replace unistd.h (line 8) with:

#ifdef _WIN32
    #include <io.h>
#else
    #include <unistd.h>
#endif

For inttypes.h in lib/xlsxio_write.c and lib/xlsxio_read.c I'm still searching for a solution, as it is not officially in windows, there is a port here, which I'm not a big fan of. It would be better to find the used defines and define them with an #ifdef like the code snipped above.
e.g. (probably did not get all defines)

#ifdef _WIN32
    #define PRIi64 "I64i"
#else
    #include <inttypes.h>
#endif

datetime cell recognize as integer value

Dear brechtsanders:
when using yours projects, the cell value actually is 2013/3/24, but after transferring into csv file , the value becames an integer value "41357",the project version is xlsxio-0.2.8, are there any solutions?

date-time conversion using xlsx to csv converter

First of all, thanks for developing this application. It is very useful.
I compiled the code in Ubuntu 14.04 and ran the xlsxio_xlsx2csv program on an xlsx file. The first column of the file consists of a date string in the format mm/dd/yyyy hh:mm:ss. The csv file appears to contain Microsoft timestamps. See attached zip file containing xlsx and csv files. For example, 1/1/2014 00:00:00 gets converted to 41640.
It would be useful to dump out the time stamp as a string or have the option of getting a linux time stamp.
Thanks.
0.zip

no supper xlsm file type

filter :application/vnd.ms-excel.sheet.macroEnabled.main+xml
vs application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml
can you supper with this?

how to use with visual studio

Hi,

I am new to C/C++ and I am using visual studio. I find your project very cool, but I have no clue of how to include this library into my project. I was wondering if I could have some step by step guidance. Any links to external learning resource would also be appreciated!

Many thanks,

Jason

xlsxioread_sheet_next_cell_datetime issue

When the Excel cell is datetime type and it is 1/1/1900, the time_t returned is 1-Jan-1970. Even if the cell is blank or empty still return 1-Jan-1970.

cell blank/empty xlsxioread_sheet_next_cell_datetime return time_t which is 1-Jan-1970
cell is 1/1/1900 xlsxioread_sheet_next_cell_datetime return time_t which is 1-Jan-1970

Can we have a prev version of xlsxioread_sheet_next_cell_datetime? e.g xlsxioread_sheet_prev_cell_datetime <- this allow us to reference the previous cell datetime

Create Integration Tests and Set-up Continuous Integration

This project would benefit greatly from an automated test suite running on travis or circleci. Users of the library should be encouraged to submit files they have which fail in xlsxio, and these should be added to the test suite.

p.s. I have some experience with CircleCI and might be able to get it building there. I am a C++ programmer (not a C programmer), but I could probably get testing working with Catch

I am considering writing my own xlsx solution using C++ (pugixml), but maybe it would be better to stick with this project and get it in shape. I wouldn't mind becoming a contributor here. Right now I have xlsxio running in production and it's been a painful couple of days, so I'd like to sleep better at night with a well tested and code-covered xlsx library.

De-Suckify Scientific Notation

It would be great if the library had an option to convert the scientific notation into decimal notation. I have a reasonably performant algorithm in C++ here:

https://github.com/bitflip-software/xlsx-util/blob/efab41e9b6d62845dffc09db333f26e8d05ece61/cpp/Val.cpp

    bool
    Val::isScientificOrDecimal( const std::string& inVal, double& outVal )
    {
        outVal = 0.0;
        SciPart part = SciPart::start;
        std::stringstream base;
        std::stringstream exp;

        for( const auto c : inVal )
        {
            if( part == SciPart::start )
            {
                if( c == '-' )
                {
                    base << c;
                    part = SciPart::baseSign;
                }
                else if( std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    part = SciPart::baseInt;
                    base << c;
                }
                else
                {
                    return false;
                }
            }
            else if( part == SciPart::baseSign )
            {
                if( !std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    return false;
                }
                else
                {
                    part = SciPart::baseInt;
                    base << c;
                }
            }
            else if( part == SciPart::baseInt )
            {
                if( c == 'e' || c == 'E' )
                {
                    part = SciPart::e;
                }
                else if( c == '.' )
                {
                    base << c;
                    part = SciPart::basePoint;
                }
                else if( std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    base << c;
                }
                else
                {
                    return false;
                }
            }
            else if( part == SciPart::basePoint )
            {
                if( !std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    return false;
                }
                else
                {
                    part = SciPart::baseDec;
                    base << c;
                }
            }
            else if( part == SciPart::baseDec )
            {
                if( c == 'E' || c == 'e' )
                {
                    part = SciPart::e;
                }
                else if( std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    base << c;
                }
                else
                {
                    return false;
                }
            }
            else if( part == SciPart::e )
            {
                if( c == '-' )
                {
                    exp << c;
                    part = SciPart::expSign;
                }
                else if( std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    part = SciPart::expInt;
                    exp << c;
                }
                else
                {
                    return false;
                }
            }
            else if( part == SciPart::expSign )
            {
                if( !std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    return false;
                }
                else
                {
                    part = SciPart::expInt;
                    exp << c;
                }
            }
            else if( part == SciPart::expInt )
            {
                if( !std::isdigit( static_cast<unsigned char>( c ) ) )
                {
                    return false;
                }
                else
                {
                    exp << c;
                }
            }
            else
            {
                return false;
            }
        }

        const std::string b = base.str();
        const std::string e = exp.str();
        
        if( b.empty() )
        {
            return false;
        }

        if( e.empty() )
        {
            // optimization, this is probably a decimal number so we will handle it
            const double shortcutDouble = std::stod( b );
            outVal = shortcutDouble;
            return true;
        }
        
        const double baseDouble = std::stod( b );
        const double expDouble = std::stod( e );
        const double multDouble = std::pow( 10.0, expDouble );
        const double valueDouble = baseDouble * multDouble;
        outVal = valueDouble;
        return true;
    }

Determine cell type

It would be very useful to be able to question what cell type something is, if it should be used as a string, number or date, for example to know if xlsxioread_sheet_next_cell_float or some other function should be called.

Issues when using library compiled with Unicode support.

I've noticed a few issues when compiling on windows with XML_UNICODE_WCHAR_T defined. In this case, the "XML_Char" typedef refers to a wchar_t rather than char. The xlsxio code then calls stricmp, passing in a const char* and a const wchar_t*, and so the input is truncated and the results are incorrect. Has this been noticed before, or am I missing a #define that would fix it?

Error loading workbook with multiple sheets

Hi,

Thanks for a your hard work!

I did find a small glitch and I hope you will be able to find and fix the problem.

If I traverse through a file with 3 sheets, I get strange and unpredictable behavior.

Sometimes (not on every run), in one or more of the sheets (different with every run), the first call to xlsxioread_sheet_next_row returns returns NULL, indicating that the sheet is empty.
It shouldn't be hard to reproduce. I have create a workbook (attached as f.xlsx) with 3 sheets, with "a" in the first cell of the first sheet, "b" in the first cell of the second sheet and "c" in the first cell of the third sheet.

Use the function from the attached file (f.txt) to reproduce the problem.
Run it a couple of times.
Sometimes, one or more of the sheets will show 0 rows....

Kind regards,
Goswinus

f.xlsx
f.txt

Memory leak

Hi,
Thanks for your library.
I found a memory leak in the function below:

void sharedstringlist_destroy (struct sharedstringlist* sharedstrings)
{
  if (sharedstrings) {
    size_t i;
    for (i = 0; i < sharedstrings->numstrings; i++)
      free(sharedstrings->strings[i]);
    free(sharedstrings->strings); // this should be added otherwise memory leaked
    free(sharedstrings);
  }
}

Problem with filters

Hi,

it seems that the filters in the column headers occurs error. (xlsxio_xlsx2csv.exe)

core dump when open a sheet named in unicode(Chinese)

my xlsx file has a only sheet with a chinese name('批量导入'), and my code core-dumped at 'xlsxioreadersheet sheet = xlsxioread_sheet_open(xlsxioread, NULL, XLSXIOREAD_SKIP_EMPTY_ROWS);', but it runs ok on an ascii-code named sheet.

what does 75B9F350 error code mean?

I'm using slightly modified example code

std::string analyze(path filename)
{
	int c = 0;
	xlsxioreader reader;
	std::string result;

	reader = xlsxioread_open(filename.string().c_str());
	if (reader == NULL)
	{
		std::cout << stderr << " error opening file " << filename << std::endl;
		return result;
	}

	xlsxioreadersheetlist sheetlist;
	if ((sheetlist = xlsxioread_sheetlist_open(reader)) != NULL)
	{
		while ((xlsxioread_sheetlist_next(sheetlist)) != NULL)
		{
			c++;
		}
		xlsxioread_sheetlist_close(sheetlist);
		std::cout << filename.string() << " contains " << c << " sheets" << std::endl;
	}
	return result;
}

Is there a way to discard phonetic string ?

Hi,
First, thank you for this library ! Among the several libraries I tested in order to parse excel spreadsheets, yours is the best one.

However, I'm experiencing issue when parsing Japanese spreadsheet, with xlsxio_read returning both text string and phonetic string appended as cell content.

Example : following XML code in xl\sharedStrings.xml :

<si>
    <t>共通</t>
    <rPh sb="0" eb="2"><t>キョウツウ</t></rPh>
    <phoneticPr fontId="4"/>
</si>

Would be displayed as "共通" in excel spreadsheet, but is retrieved as "共通キョウツウ" as value string in sheet_cell_callback function by xlsxio.
Majority of users manipulating Asian spreadsheets would prefer to retrieve text string only, so is it possible to modify xlsxio in order to do this ?

Thank you.

Segfault in iterate_files_by_contenttype_expat_callback_element_start

Given a certain file (which is unfortunately contains proprietary data), created by a client's business intelligence systems (i.e. not created by Microsoft Excel)...

During sheet_open, I ran into serious issues with the else if (XML_Char_icmp(name, X("Default")) == 0) branch of iterate_files_by_contenttype_expat_callback_element_start.

Foremost was that you are explicitly passing NULL as the second argument to unzGetCurrentFileInfo, which, at least in my version of minizip (remember that my build is wonky), crashes because the minizip function dereferences this without checking for NULL. I fixed this by allocating and then freeing a temporary struct of the correct type even though we ignore its values.

Once that crash was fixed, I found that this while loop was endless because this call set the minizip file pointer back to file index 0 each time data->filecallbackfn(data->zip, filename, contenttype, data->filecallbackdata);

I hack-fixed this by maintaining my own counter of which file index we should be on and re-incrementing the minizip file pointer each pass through the loop like this:

              // prevent the endless loop
              int looperx = 0;
              for( looperx = 0; looperx < fileindex && status == UNZ_OK; ++looperx ) {
                  status = unzGoToNextFile(data->zip);
              }

The complete solution is a mess, but is given below. Ignore doFindRelsFirst that was a wild goose chase that I haven't deleted out yet.

I'll try to figure out how to create a test file that demonstrates these issues without the proprietary data.

//callback function definition for use with iterate_files_by_contenttype
typedef void (*contenttype_file_callback_fn)(ZIPFILETYPE* zip, const XML_Char* filename, const XML_Char* contenttype, void* callbackdata);

struct iterate_files_by_contenttype_callback_data {
  ZIPFILETYPE* zip;
  const XML_Char* contenttype;
  contenttype_file_callback_fn filecallbackfn;
  void* filecallbackdata;
    bool doFindRelsFirst;
};

//expat callback function for element start used by iterate_files_by_contenttype
void iterate_files_by_contenttype_expat_callback_element_start (void* callbackdata, const XML_Char* name, const XML_Char** atts)
{
  struct iterate_files_by_contenttype_callback_data* data = (struct iterate_files_by_contenttype_callback_data*)callbackdata;
  if (XML_Char_icmp(name, X("Override")) == 0) {
    //explicitly specified file
    const XML_Char* contenttype;
    const XML_Char* partname;
    if ((contenttype = get_expat_attr_by_name(atts, X("ContentType"))) != NULL && XML_Char_icmp(contenttype, data->contenttype) == 0) {
      if ((partname = get_expat_attr_by_name(atts, X("PartName"))) != NULL) {
        if (partname[0] == '/')
          partname++;
        data->filecallbackfn(data->zip, partname, contenttype, data->filecallbackdata);
      }
    }
  } else if (XML_Char_icmp(name, X("Default")) == 0) {
    //by extension
    const XML_Char* contenttype;
    const XML_Char* extension;
    if ((contenttype = get_expat_attr_by_name(atts, X("ContentType"))) != NULL && XML_Char_icmp(contenttype, data->contenttype) == 0) {
      if ((extension = get_expat_attr_by_name(atts, X("Extension"))) != NULL) {
//          if( data->doFindRelsFirst && XML_Char_icmp( X( "rels" ), extension ) != 0 )
//          {
//              extension = X("rels");
//              
//          }
        XML_Char* filename;
        size_t filenamelen;
        size_t extensionlen = XML_Char_len(extension);
#ifdef USE_MINIZIP
#define UNZIP_FILENAME_BUFFER_STEP 32
        char* buf;
        size_t buflen;
        int status;
unz_global_info zipglobalinfo;
unzGetGlobalInfo(data->zip, &zipglobalinfo);
        buf = (char*)malloc(buflen = UNZIP_FILENAME_BUFFER_STEP);
        status = unzGoToFirstFile(data->zip);
          
          // this loop is problematic
          
          int fileindex = 0;
        while (status == UNZ_OK) {
          buf[buflen - 1] = 0;
        unz_file_info* finfo = malloc(sizeof(unz_file_info));
            
            // this was previously crashing because we were passing NULL as the second argument
        while ((status = unzGetCurrentFileInfo(data->zip, finfo, buf, buflen, NULL, 0, NULL, 0)) == UNZ_OK && buf[buflen - 1] != 0) {
            buflen += UNZIP_FILENAME_BUFFER_STEP;
            buf = (char*)realloc(buf, buflen);
            buf[buflen - 1] = 0;
          }
            free(finfo);
            finfo = NULL;

            if (status != UNZ_OK) {
            break;
            }
          filename = XML_Char_dupchar(buf);
          status = unzGoToNextFile(data->zip);
            ++fileindex;
//            unz64_s* s = (unz64_s*)data->zip;
//            ZPOS64_T currnumfile = s->num_file;
#else
        zip_int64_t i;
        zip_int64_t zipnumfiles = zip_get_num_entries(data->zip, 0);
        for (i = 0; i < zipnumfiles; i++) {
          filename = XML_Char_dupchar(zip_get_name(data->zip, i, ZIP_FL_ENC_GUESS));
#endif
          filenamelen = XML_Char_len(filename);
            
            // if the extension is a match
          if (filenamelen > extensionlen && filename[filenamelen - extensionlen - 1] == '.' && XML_Char_icmp(filename + filenamelen - extensionlen, extension) == 0) {
            
              // this call causes data->zip->num_file to revert to zero causing an endless loop?
              data->filecallbackfn(data->zip, filename, contenttype, data->filecallbackdata);
              
              // prevent the endless loop
              int looperx = 0;
              for( looperx = 0; looperx < fileindex && status == UNZ_OK; ++looperx ) {
                  status = unzGoToNextFile(data->zip);
              }
          }
          free(filename);
        }
#ifdef USE_MINIZIP
        free(buf);
#endif
      } // while (status == UNZ_OK)
    }
  }
}

Opening an existing file

It would be great if you could tear off an existing xlsx-file, for example, ready-made templates, for later recording. Is this possible in the future?

Windows Binaries.

Hi,
I am looking to try your program on a Windows 7 + Visual Studio system. I saw in the readme file a note:
"For Windows prebuilt binaries are also available for download (both 32-bit and 64-bit)" but I was not sure where in gitHub those files would be. I looked in the Lib folder but I did not see any dll's or lib files there. I downloaded the project files but do not see them there either. The 'docs' folder is fairly empty as well. I am probably asking a dumb question and that the answer is right in front of me.

Thanks
XeniCoder

DLL_EXPORT defined in *.c instead of header file

When compiling with VS2015 I get errors like 'xlsxiowrite_set_row_height': definition of dllimport function not allowed in xlsxio_read.c and xlsxio_write.c.
The issue: You have to define DLL_EXPORT_XLSXIO in the header only (it's a stupid windows thing). There you can set the flag, if you want to import or export the functions.
The user does not have access to the inside of the *.lib or *.dll, but only to the header file and the compiled *.lib and *.dll. So the solution is to remove DLL_EXPORT_XLSXIO in xlsxio_read.c and xlsxio_write.c and keep them in the header files

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.