asperagmbh / xlsx-reader

xlsx-reader is a PHP library for fast and efficient reading of XLSX spreadsheet files. Its focus is on reading the data contained within XLSX files, disregarding all document styling beyond what is strictly necessary for data type recognition. It is built to be usable for very big XLSX files on the order of multiple gigabytes.

License: Other

PHP 100.00%
xlsx xlsx-files xlsxreader xlsx-lib xlsx-spreadsheet xlsx-parser excel excelreader excel-import excelparser

xlsx-reader's People

Contributors

adirfische, groberts84

xlsx-reader's Issues

Tests throwing errors

Hi there,

I'm currently writing a PR to add a feature, but running the tests against the unmodified master code already throws errors:

25) Aspera\Spreadsheet\XLSX\Tests\CustomNumberFormatTest::testFormat with data set "scientific notation - exponent larger than 1 digit" ('0.00000000005', '0.00E+0', '5.00E-11')
Undefined array key 0

/xlsx-reader/lib/NumberFormat.php:499
/xlsx-reader/lib/NumberFormat.php:273
/xlsx-reader/lib/NumberFormat.php:189
/xlsx-reader/tests/CustomNumberFormatTest.php:71

When debugging the error, I found that 123.00 is passed to the test as a value, but it is then converted to 123. by the following code:

// Remove insignificant zeroes for now, we will (re-)add them based on format_info next.
if (strpos($number, '.') !== false) {
    $number = rtrim($number, '0');
}

As a result, the following code fails, because str_split() returns an empty array and index 0 does not exist:

$right_side_chars = str_split($number_parts[1]);
if ($right_side_chars[0] === '') { // Side-effect of str_split('')
    $right_side_chars = array();
}
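A minimal sketch of a guard for this case, assuming the goal is only to get the fractional digits as an array. The function name splitFractionDigits is hypothetical and not part of the library. As of PHP 8.2, str_split('') returns an empty array rather than an array holding one empty string, so checking the string before splitting avoids depending on either behavior.

```php
<?php
// Hypothetical helper, not part of xlsx-reader: returns the digits after the
// decimal point as an array, or an empty array if there are none.
function splitFractionDigits(string $number): array
{
    $parts = explode('.', $number, 2);
    $fraction = $parts[1] ?? '';

    // Check the string itself instead of inspecting index 0 of the result,
    // which does not exist when str_split() returns an empty array.
    return $fraction === '' ? [] : str_split($fraction);
}
```

With this guard, an input such as "123." yields an empty array on any PHP version.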

No option to return percentage values as unformatted fractional number

XLSX files internally represent values such as "20%" as a floating point string like "0.2".
The NumberFormat class is currently hardcoded to multiply such values by 100, implicitly casting them to floats.
The "ReturnUnformatted" option does not affect this behavior as it is only checked later in the formatValue() method.

Applications may find it useful to read percentage values in their more semantically native fractional representation.
This is also particularly relevant when migrating from the akeneo-labs/spreadsheet-parser library, which returns percentages this way. This represents one of the few breaking changes between the akeneo library and this one that cannot be overcome by configuration.
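To illustrate the requested behavior, here is a rough sketch under the assumption that the check for an "unformatted" option happens before any multiplication. The function readPercentage and its boolean parameter are hypothetical and do not reflect the library's actual code.

```php
<?php
// Hypothetical sketch, not the library's NumberFormat class.
// $raw is the value as stored in the XLSX file, e.g. "0.2" for 20%.
function readPercentage(string $raw, bool $returnUnformatted): string
{
    if ($returnUnformatted) {
        // Hand back the fractional representation untouched, with no float cast.
        return $raw;
    }

    // Formatted path: scale to a percentage and append the percent sign.
    return ((float) $raw * 100) . '%';
}
```

Returning the raw string in the unformatted case also avoids the implicit float cast mentioned above.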

Reader incorrectly handles empty <row> elements

We encountered an XLSX file with sheet data in the following form:

<row r="1" spans="1:9" ht="15.75" thickBot="1" x14ac:dyDescent="0.3"/>
<row r="2" spans="1:9" x14ac:dyDescent="0.25">
	<c r="A2" s="3" t="s">
		<v>0</v>
	</c>
	...
</row>

Note how the first <row> element merely consists of a self-closing tag.

This is currently not handled correctly by the Reader, which only checks $this->worksheet_reader->isClosingTag() to detect the end of a row. The worksheet_reader would also have to consider the state of $this->isEmptyElement.

The result is that the Reader keeps reading forward until it reaches the </row> tag of row 2, incorrectly reading and outputting the cells of row 2 as the content of row 1. Then it returns empty values for row 2, since there are no further cells to be read.
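The following sketch shows the idea using PHP's XMLReader directly rather than the library's own wrapper. The function readRows is hypothetical, and the sample markup is simplified (no namespaces, no shared strings). A self-closing element produces no END_ELEMENT node, so the row has to be closed as soon as isEmptyElement is seen.

```php
<?php
// Hypothetical sketch, not the library's Reader: collects <v> values per row.
function readRows(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $rows = [];
    $currentRow = null;
    while ($reader->read()) {
        $isElement = $reader->nodeType === XMLReader::ELEMENT;
        if ($isElement && $reader->name === 'row') {
            $selfClosing = $reader->isEmptyElement;
            $currentRow = (int) $reader->getAttribute('r');
            $rows[$currentRow] = [];
            if ($selfClosing) {
                // <row .../> has no closing tag, so the row ends right here.
                $currentRow = null;
            }
        } elseif ($isElement && $reader->name === 'v' && $currentRow !== null) {
            $rows[$currentRow][] = $reader->readString();
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->name === 'row') {
            $currentRow = null;
        }
    }
    $reader->close();

    return $rows;
}
```

For a sheet whose first row is self-closing, row 1 stays empty and the cells of row 2 are attributed to row 2.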

Handling invalid row spans values?

Hi there,

I am using xlsx-reader to parse a customer-provided spreadsheet, and it has handled their horror of a document with little drama. However, when trying to parse one particular sheet, it reaches a seemingly standard row and then throws an exception.

A non well formed numeric value encountered 
  at vendor/aspera/xlsx-reader/lib/Reader.php:296

After tracking this down, I found that the spans value of that row was 1:7167 7169:14335 14337:16379, i.e. several ranges instead of a single one.

I'm unable to modify the spreadsheet, and visually inspecting it in Excel doesn't reveal any reason for this.

Is there anything that can be done via the code to gracefully handle this and move on to the next row?
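One possible way to tolerate such values is sketched below, assuming the reader only needs the overall first and last column of the row. The function parseSpans is hypothetical and not part of the library. It collects every "first:last" pair in the attribute instead of expecting exactly one.

```php
<?php
// Hypothetical helper, not part of xlsx-reader: reduces a spans attribute
// with one or more "first:last" ranges to a single [first, last] pair.
function parseSpans(string $spans): ?array
{
    if (!preg_match_all('/(\d+):(\d+)/', $spans, $matches)) {
        // No usable range found; the caller can fall back to reading cells as they come.
        return null;
    }

    $first = min(array_map('intval', $matches[1]));
    $last = max(array_map('intval', $matches[2]));

    return [$first, $last];
}
```

For the value reported above, this yields the range 1 to 16379; for an unparsable value it returns null rather than raising a notice.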

Thanks

Error "P-1D" when reading an incorrect DateTime

Hi,

I have a document in which a date value appears to be -1, probably as a result of an export.

When the Excel file is opened in LibreOffice, the date appears as "29/12/1899".

But the reader throws an exception when it encounters this value:
Exception with message 'DateInterval::__construct(): Unknown or bad format (P-1D)'.
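DateInterval's constructor does not accept a signed duration such as "P-1D". A minimal sketch of a workaround is to build the interval from the absolute value and subtract it for negative inputs. The function excelSerialToDate is hypothetical, and the base date of 1899-12-30 is an assumption for the 1900 date system; it is consistent with LibreOffice showing 29/12/1899 for -1.

```php
<?php
// Hypothetical sketch, not the library's code: converts a whole-day serial
// number to a date, including negative values.
function excelSerialToDate(int $days): DateTimeImmutable
{
    // Assumed base date for the 1900 date system.
    $base = new DateTimeImmutable('1899-12-30', new DateTimeZone('UTC'));

    // DateInterval only accepts unsigned durations, so use the absolute value
    // and pick add() or sub() depending on the sign.
    $interval = new DateInterval('P' . abs($days) . 'D');

    return $days < 0 ? $base->sub($interval) : $base->add($interval);
}
```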

Incompatible return type (Deprecated)

With PHP 8.1.7 I'm getting the following deprecation messages:

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::current() should either be compatible with Iterator::current(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 244

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::next() should either be compatible with Iterator::next(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 263

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::key() should either be compatible with Iterator::key(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 485

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::valid() should either be compatible with Iterator::valid(): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 496

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::rewind() should either be compatible with Iterator::rewind(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 219

Deprecated:  Return type of Aspera\Spreadsheet\XLSX\Reader::count() should either be compatible with Countable::count(): int, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in dist\vendor\aspera\xlsx-reader\lib\Reader.php on line 507

I'm loading xlsx-reader via:

use Aspera\Spreadsheet\XLSX\Reader;
use Aspera\Spreadsheet\XLSX\ReaderConfiguration;
use Aspera\Spreadsheet\XLSX\ReaderSkipConfiguration;

$options = (new ReaderConfiguration())
	->setSkipEmptyCells(ReaderSkipConfiguration::SKIP_EMPTY)
	->setReturnDateTimeObjects(true);
$reader = new Reader($options);
$reader->open($sheet);

Using the most current package:
aspera/xlsx-reader v0.10.1 Spreadsheet reader library for Microsoft Excel XLSX files

Is there something I'm missing?

Thanks, oNdsen
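For reference, the two remedies named in the notices can be sketched on a small stand-alone class. RowList is hypothetical and only illustrates the signatures; it is not the library's Reader. Methods whose declared interface type is void, bool or int can carry that return type directly, while current() and key() use the attribute so the code still parses on PHP 7, where "mixed" is not available.

```php
<?php
// Hypothetical class for illustration only; not part of xlsx-reader.
final class RowList implements Iterator, Countable
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    // Suppresses the PHP 8.1 notice without requiring the "mixed" type.
    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->rows[$this->position];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return isset($this->rows[$this->position]);
    }

    public function count(): int
    {
        return count($this->rows);
    }
}
```

The notices are deprecations only, so the reader should keep working on PHP 8.1 until the signatures are updated.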

LibreOffice xlsx document overload

When I read an xlsx file created by LibreOffice, I get a lot of null array entries. Also, how can I see the column name for each value, for example ['A' => 'text data']?
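A possible approach on the caller's side is sketched here, assuming each row arrives as a zero-indexed array of cell values. Both functions are hypothetical helpers and not part of the library. They convert the numeric index to a column letter and drop null or empty cells.

```php
<?php
// Hypothetical helper: 0 => "A", 25 => "Z", 26 => "AA", and so on.
function columnLetter(int $index): string
{
    $letters = '';
    for ($n = $index + 1; $n > 0; $n = intdiv($n - 1, 26)) {
        $letters = chr(65 + ($n - 1) % 26) . $letters;
    }

    return $letters;
}

// Hypothetical helper: re-keys a row by column letter and skips empty cells.
function keyByColumn(array $row): array
{
    $result = [];
    foreach ($row as $index => $value) {
        if ($value !== null && $value !== '') {
            $result[columnLetter($index)] = $value;
        }
    }

    return $result;
}
```

Calling keyByColumn() on each row inside the foreach loop over the reader would then give arrays of the form ['A' => 'text data'].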

No option to skip empty rows

Excel sadly often saves xlsx files which end in a large number of explicit empty row elements (for example if the cells were formatted but then never used), which may take a form such as the following:

<row r="659" spans="1:10" x14ac:dyDescent="0.2">
	<c r="D659" s="3"/>
	<c r="E659" s="3"/>
</row>

Note that not a single <v> element is contained.

It would be nice to have a native option to ignore empty rows, ideally only at the end of the file. However, since a sequential reader cannot look ahead to know whether there will be further non-empty rows, this would probably be difficult to implement.

I would once again suggest creating feature parity with akeneo-labs/spreadsheet-parser, which skips any empty rows by default, but still updates the return value of Iterator::key() to reflect the actual Excel row and therefore exposes to the caller that an empty row was skipped.
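Until a native option exists, the described behavior can be approximated in calling code. The sketch below assumes that the iterated keys are the row numbers and that empty cells are returned as null or an empty string. The function skipEmptyRows is hypothetical and not part of either library.

```php
<?php
// Hypothetical wrapper: yields only rows with at least one non-empty cell,
// keeping the original key so skipped rows remain visible to the caller.
function skipEmptyRows(iterable $rows): Generator
{
    foreach ($rows as $rowNumber => $row) {
        $hasValue = false;
        foreach ($row as $cell) {
            if ($cell !== null && $cell !== '') {
                $hasValue = true;
                break;
            }
        }

        if ($hasValue) {
            yield $rowNumber => $row;
        }
    }
}
```

Usage would look like foreach (skipEmptyRows($reader) as $rowNumber => $row) { ... }. Because the wrapper simply never yields the trailing empty rows, it needs no look-ahead.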

How to disable the removal of repeated cell values?

My Excel data is:

A,B,C,D,E
1,london,london,DEC,12345

When I read it with the reader:

        $reader = new Reader($options);
        $reader->open($tempFile);

        $data = [];
        foreach ($reader as $row) {
            var_dump($row);
            if (empty(implode('', $row))) {
                continue;
            } else {
                $data[] = $row;
            }
        }

The dump is:

['A', 'B', 'C', 'D', 'E']
['1', 'london', 'DEC', '', 12345]

How can I get the full data, like this?

['1', 'london', 'london', 'DEC', 12345]

Thanks

Bug: multi-sectioned formats are exploded incorrectly

Your code:
$sections = explode(';', $format['Code']);

does not handle the case where the ";" is quoted. I suggest:
$sections = preg_split('/(;)(?=(?:[^"]|"[^"]*")*$)/u', $format['Code']); // up to four sections, separated by an (unquoted) semicolon
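As an alternative to a lookahead regex, the split can be done with a small character scanner. This is a sketch under the assumption that a semicolon is literal when it sits inside double quotes or directly follows a backslash. The function splitFormatSections is hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: splits a number format code into its sections,
// ignoring semicolons that are quoted or backslash-escaped.
function splitFormatSections(string $code): array
{
    $sections = [''];
    $inQuotes = false;
    $length = strlen($code);

    for ($i = 0; $i < $length; $i++) {
        $char = $code[$i];
        $last = count($sections) - 1;

        if ($char === '\\' && !$inQuotes && $i + 1 < $length) {
            // Copy the backslash and the escaped character verbatim.
            $sections[$last] .= $char . $code[++$i];
            continue;
        }
        if ($char === '"') {
            $inQuotes = !$inQuotes;
        }
        if ($char === ';' && !$inQuotes) {
            // Unquoted semicolon: start the next section.
            $sections[] = '';
            continue;
        }
        $sections[$last] .= $char;
    }

    return $sections;
}
```

The scan works byte by byte, which is safe for UTF-8 input because the three special characters are ASCII and never occur inside a multi-byte sequence.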

Enhancement to allow access to meta data read?

I need to read metadata from an xlsx file to help when I re-output a new, adjusted file, so I was thinking that you could make the data you have already read (such as $cell_type and the format string) available.
I was thinking of something along the lines of an "include_meta_data" configuration option which could populate a separate "current_row_meta_data" array (for backward compatibility), which could look like this:

current_row_meta_data[0]=[
  'type'=>'d',          // 'd'=datetime | 's'=shared string/'inlineStr'/'str' | 'b'=boolean | 'n'=numeric | 'e'=error
]

Maybe also include the "format" string for the cell, as in:
  'format'=>'dd/mm/yy',

Please support number formats [$-F400] and [$-F800]

Hi, I have received multiple XLSX files that contain the number format code [$-F400]h:mm:ss\ AM/PM. According to the Ecma Office Open XML specification the F400 code is used for "System time format". There is also an F800 code for "System long date format".

Trying to load the file with this number format fails in NumberFormatTokenizer.php:753. The regex in line 751 doesn't match this format so the next call that tries to access $matches[1] results in Undefined array key 1.
Extending the regex to include an optional F (\$([^-]*)-[fF]?\d+) would fix it, and the unit tests would still pass, although you might want to handle it differently in order to apply the correct format.
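The effect of the extended pattern can be checked in isolation. The sketch below wraps the fragment from this report in a hypothetical function, parseLocaleTag, which is not the library's tokenizer. The surrounding brackets and the second capture group are additions made for this illustration. It also guards the match result, so a non-matching format no longer leads to an undefined array key.

```php
<?php
// Hypothetical helper: extracts the currency string and the code from a
// "[$...-...]" tag, or returns null if the format contains no such tag.
function parseLocaleTag(string $format): ?array
{
    if (preg_match('/\[\$([^-\]]*)-([fF]?\d+)\]/', $format, $matches) !== 1) {
        return null;
    }

    return ['currency' => $matches[1], 'code' => $matches[2]];
}
```

With the optional F in place, both [$-F400] and [$-F800] are recognized, while a format without such a tag yields null.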

It would be nice if you could add support for those format codes.

Thank you for providing this helpful library!
