
xlsxgrep's Introduction

Overview

xlsxgrep is a CLI tool for searching text in XLSX, XLS, CSV, TSV and ODS files. It works similarly to grep on Unix and GNU/Linux.

Features

  • Grep compatible: xlsxgrep tries to be compatible with Unix/Linux grep where it makes sense. Some of grep's options are supported (such as -r, -i or -c).

  • Search many XLSX, XLS, CSV, TSV and ODS files at once, even recursively in directories.

  • Regular expressions: patterns use Python regex syntax.

  • Supported file types: csv, ods, tsv, xls, xlsx
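As an illustration of the recursive search over supported file types, here is a minimal Python sketch of how candidate files could be collected. It is not taken from xlsxgrep's source: the function name find_spreadsheets and the constant SUPPORTED_EXTENSIONS are hypothetical, and only the extension list comes from the feature list above.

```python
from pathlib import Path

# Extensions taken from the feature list above.
SUPPORTED_EXTENSIONS = {".csv", ".ods", ".tsv", ".xls", ".xlsx"}


def find_spreadsheets(root, recursive=False):
    """Return supported spreadsheet files under root, sorted by path.

    Hypothetical helper: xlsxgrep's real file discovery may differ.
    """
    root = Path(root)
    if root.is_file():
        candidates = [root]
    else:
        # rglob descends into subdirectories, glob stays at the top level.
        candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Comparing the lower-cased suffix means that a file named REPORT.XLSX would also be picked up; whether xlsxgrep itself is case-insensitive here is an assumption.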

Usage:


usage: xlsxgrep [-h] [-V] [-P] [-F] [-i] [-w] [-c] [-r] [-H] [-N] [-l] [-L] [-S SEPARATOR] [-Z]
                pattern path [path ...]

positional arguments:
  pattern               use PATTERN as the pattern to search for.
  path                  file or folder location

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         display version information and exit.
  -P, --python-regex    PATTERN is a Python regular expression. This is the default.
  -F, --fixed-strings   interpret PATTERN as fixed strings, not regular expressions.
  -i, --ignore-case     ignore case distinctions.
  -w, --word-regexp     force PATTERN to match only whole words.
  -c, --count           print only a count of matches per file.
  -r, --recursive       search directories recursively.
  -H, --with-filename   print the file name for each match.
  -N, --with-sheetname  print the sheet name for each match.
  -l, --files-with-match
                        print only names of FILEs with match pattern.
  -L, --files-without-match
                        print only names of FILEs with no match pattern.
  -S SEPARATOR, --separator SEPARATOR
                        define custom list separator for output, the default is TAB
  -Z, --null            output a zero byte (the ASCII NUL character) instead of the usual newline.
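
To make the effect of -S and -Z concrete, the following Python sketch shows one plausible way a matching row could be turned into an output record. The function format_row is hypothetical, and the assumption that the separator is placed between the cells of a row is ours, not something confirmed by the help text.

```python
def format_row(cells, separator="\t", null_terminated=False):
    """Join the cells of one matching row into a single output record.

    Hypothetical helper. separator mirrors -S (default TAB) and
    null_terminated mirrors -Z (NUL byte instead of a newline).
    """
    terminator = "\0" if null_terminated else "\n"
    return separator.join(str(cell) for cell in cells) + terminator
```

With the defaults, format_row(["a", 1, "c"]) yields the three values separated by tabs and followed by a newline.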
 

Examples:

xlsxgrep -i "foo" foobar.xlsx
xlsxgrep -c -H "(?i)foo|bar" /folder
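
The pattern options can be read in terms of Python's re module. The sketch below is an assumption about how -F, -i and -w might map onto re, not xlsxgrep's actual implementation; compile_pattern is a hypothetical name.

```python
import re


def compile_pattern(pattern, fixed_strings=False, ignore_case=False,
                    word_regexp=False):
    """Build a compiled regex from grep-like options (hypothetical helper)."""
    if fixed_strings:
        # -F: treat the pattern as literal text.
        pattern = re.escape(pattern)
    if word_regexp:
        # -w: require word boundaries around the whole pattern.
        # Note: a pattern that starts with an inline global flag such as
        # (?i) cannot be wrapped this way on Python 3.11 and later.
        pattern = r"\b(?:" + pattern + r")\b"
    # -i: case-insensitive matching.
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(pattern, flags)
```

In the second example above, the inline flag (?i) makes the whole pattern case-insensitive, so compile_pattern("(?i)foo|bar") would also find "BAR" without -i being passed.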

Installation

     pip install xlsxgrep  

xlsxgrep's People

Contributors

majuwa, zazuum

xlsxgrep's Issues

Using python3.9 and xlsxgrep results in errors

With python3.9 and xlsxgrep installed via pip3, a command that previously worked, such as:

xlsxgrep -i <pattern> *.xlsx

results in:

Error:    Unsupported format, password protected or corrupted file:

I suspect that python3.9 might be the issue, because reverting to python3.8 and installing xlsxgrep in the same manner works as expected. I confirmed this in a python virtualenv.

Support xlsm?

Hi,

Do you plan to support searching in xlsm files?

Thx

A few warnings are written to stdout instead of being ignored

OS: Ubuntu 22.04
python: 3.10
pip: 22.3.1
xlsxgrep: 0.0.28

I ran xlsxgrep -riH and appended the output to a file with >>. I wanted to build a pipeline of commands, but the warnings in the output break it.

The stdout of xlsxgrep omits all errors except for:

WARNING *** file size (36373) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero

These messages are piped or appended to the file I write the output to. I don't know exactly where they come from, but the only command I ran is xlsxgrep -riH PATTERN PATH >> output.txt.
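
Until the messages are routed elsewhere, one possible workaround is to filter them out of the stream before it reaches the file. The Python sketch below assumes that every unwanted line starts with the literal text "WARNING ***", as in the two samples above; drop_warning_lines and main are hypothetical names.

```python
import sys


def drop_warning_lines(lines, prefix="WARNING ***"):
    """Yield only the lines that do not start with the given prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


def main(stdin=None, stdout=None):
    """Copy stdin to stdout, skipping the warning lines."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stdout.writelines(drop_warning_lines(stdin))
```

Saved as a script that calls main(), this could sit between xlsxgrep and the output file in the pipeline. A real match line that happens to begin with the same prefix would be dropped too.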

UserWarning on searches

Is there a way to suppress the UserWarning?

#python3 -V
Python 3.9.9

#xlsxgrep -V
xlsxgrep  0.0.25


#xlsxgrep -H -r 123456789 *
/usr/local/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Data Validation extension is not supported and will be removed
  warn(msg)
/usr/local/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
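
For code that calls openpyxl directly, Python's standard warnings module can silence UserWarning for the duration of a call. The sketch below shows that general technique; run_quietly is a hypothetical helper, and xlsxgrep itself is not known to expose such an option.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func with UserWarning suppressed, then restore the filters."""
    with warnings.catch_warnings():
        # Ignore UserWarning only inside this block.
        warnings.simplefilter("ignore", category=UserWarning)
        return func(*args, **kwargs)
```

When running the command-line tool, setting the environment variable PYTHONWARNINGS to ignore::UserWarning before invoking it may have a similar effect, provided the tool does not reset the warning filters itself.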

Unsupported format, password protected or corrupted file

This looks like a fantastic tool for hunting down poorly named/organized Excel files (a task I wish I didn't have to do so often)!

I am frequently encountering Error: Unsupported format, password protected or corrupted file, even when the file is not password protected and opens in Excel with no problem.

Here is an example file: https://www.dropbox.com/s/5wtpuhxjzbkfxr7/table.xlsx?dl=0

xlsxgrep 0.0.24, Python 3.8.3, running on macOS 10.15.7

Thanks!

Increase max sessions

OS: Ubuntu 22.04
python: 3.10
pip: 22.3.1
xlsxgrep: 0.0.28
RAM: 32GB
CPU: Intel Xeon with 18 cores

I'm searching through 3000 Excel files for different terms, but this tool is much slower than pdfgrep or similar tools. A single search takes a whole day, and there are many files I need to check. Because of that, I wanted to run multiple sessions with tmux. But xlsxgrep seems to support only 3 sessions? When I start a 4th session with xlsxgrep, the xlsxgrep process in the previously started session is reported as "Killed". Please increase the maximum number of sessions and, preferably, the speed.
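
One way to experiment with parallel searches without opening many terminal sessions is to split the file list into batches and run a limited number of xlsxgrep processes at a time. The Python sketch below is only an illustration: make_batches, run_xlsxgrep and search_in_batches are hypothetical names, the batch size and worker count are arbitrary, and the guess that "Killed" points to the system running out of memory is an assumption that this report does not confirm.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def make_batches(paths, batch_size):
    """Split paths into consecutive lists of at most batch_size items."""
    paths = list(paths)
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def run_xlsxgrep(pattern, batch):
    """Run one xlsxgrep process on a batch of files and return its stdout."""
    result = subprocess.run(
        ["xlsxgrep", "-i", "-H", pattern, *batch],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def search_in_batches(pattern, paths, batch_size=50, max_workers=2,
                      runner=run_xlsxgrep):
    """Search all paths, running at most max_workers batches concurrently."""
    batches = make_batches(paths, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in batch order, so output order is stable.
        outputs = pool.map(lambda batch: runner(pattern, batch), batches)
        return "".join(outputs)
```

Keeping max_workers small limits how many spreadsheets are loaded at once, which may matter if memory is the bottleneck.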

Tag releases

Could you please consider tagging the releases?

That would make it easier to build distribution packages, since a tagged source tree is complete, whereas the latest release is available on PyPI only as a wheel. Thanks

Support for searching ods (LibreOffice) files

Hi,
it might be useful to also search through LibreOffice Calc (ods) files. There are some Python libraries, such as pyexcel, which might help in this regard. I applied this in a fork of your project, and it works, at least for my use cases. If you are interested in the changes, I can create a pull request.

-l doesn't work when searching through a large number of files

OS: Ubuntu 22.04
python: 3.10
pip: 22.3.1
xlsxgrep: 0.0.28

I'm searching through a very large number of files. Some files are encrypted with a password and therefore produce an error in the output, but 99% are not. When I search with xlsxgrep -ri "PATTERN" . or xlsxgrep -riH "PATTERN" . the output contains matches. When I search with xlsxgrep -ril "PATTERN" . through the exact same files, the output contains no matches. I tried it 5 times in a row.

New feature: support for -l and -L

Super nice tool, THANK you. It would be nice if it supported -l and -L like grep. I have many Excel files and sometimes want to find all the files without X text in them.
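
The intended behavior of the two options can be sketched in Python, assuming the number of matches per file is already known. The names files_with_match and files_without_match are hypothetical and only mirror grep's -l and -L semantics.

```python
def files_with_match(match_counts):
    """Like grep -l: names of files that contain at least one match."""
    return [path for path, count in match_counts.items() if count > 0]


def files_without_match(match_counts):
    """Like grep -L: names of files that contain no match at all."""
    return [path for path, count in match_counts.items() if count == 0]
```

For the use case described here, files_without_match applied to the counts for X would list exactly the files that lack the text.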

Request: Print to stderr?

I really do appreciate this little script. It really comes in handy.

Can I request that error messages be printed to stderr so that xlsxgrep 'blah' 2> /dev/null properly suppresses them? It should be a simple matter of replacing the appropriate print() lines with print("Error: Something went wrong...", file=sys.stderr).

I'm considering creating a pull request when I have some time to look at this, but if someone else gets around to it before me, please do!
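
A minimal Python sketch of the requested change follows. The helper name report_error and its stream parameter are hypothetical; only the print(..., file=sys.stderr) call comes from the request itself.

```python
import sys


def report_error(message, stream=None):
    """Write an error message to stderr, or to the given stream if set."""
    # sys.stderr is looked up at call time so that redirection still works.
    target = sys.stderr if stream is None else stream
    print(f"Error: {message}", file=target)
```

With errors sent through such a helper, 2> /dev/null would hide them while matches on stdout stay intact.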
