📺 (tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.

License: The Unlicense

Rust 96.26% Shell 3.74%
cli terminal csv pretty-printer pretty-print command-line-tool data-science rust command-line tabular-data tibble dataframe datatable csv-viewer csv-visualization csv-pretty-print csv-cat column csv-column

tv's Introduction

Tidy Viewer (tv)

Tidy Viewer (tv) is a cross-platform csv pretty printer that uses column styling to maximize viewer enjoyment.

Pretty Printing

example

Features

  1. Nice colors out of the box
  2. Significant digit printing (no more decimal dust taking valuable terminal space)
  3. NA comprehension and coloring (no more misaligned data cells due to missing data)
  4. Dimensions printed first (no more guessing how many rows and columns are in the data)
  5. Column overflow logic (no more misalignment due to terminal dimensions)
  6. Long string/Unicode truncation (no more long strings pushing other data around)
  7. Customizable with a dotfile config (bring your own theme)

Installation

The following install options are available via package managers: Cargo, Debian (.deb), AUR, Snap, and Homebrew. Each has its own section further down.

We currently cut releases for the following architectures. Download from the release page.

  • MacOS
  • ARM
  • Windows
  • Build from source (Most general)

The instructions for all of the above are very similar with the following general steps.

  1. Download your desired release from the release page
  2. tar -xvzf <RELEASE_FILE_NAME>
  3. cd into uncompressed folder
  4. Find binary tidy-viewer

After the above steps I would highly recommend you make an alias for tidy-viewer as shown for other builds.

Cargo

The following will install from the crates.io source. For convenience, add the alias tv='tidy-viewer' to .bashrc.

cargo install tidy-viewer
sudo cp /home/$USER/.cargo/bin/tidy-viewer /usr/local/bin/.
echo "alias tv='tidy-viewer'" >> ~/.bashrc
source ~/.bashrc

Debian

The instructions below work with the most recent release. Replace <VERSION> with the version number listed on the release page.

wget https://github.com/alexhallam/tv/releases/download/<VERSION>/tidy-viewer_<VERSION>_amd64.deb
sudo dpkg -i tidy-viewer_<VERSION>_amd64.deb
echo "alias tv='tidy-viewer'" >> ~/.bashrc
source ~/.bashrc

AUR

Kindly maintained by @yigitsever

paru -S tidy-viewer

Snap

sudo snap install --edge tidy-viewer

Homebrew

brew install tidy-viewer

Examples

Have some fun with the following datasets!

Diamonds

# Download the diamonds data
wget https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/diamonds.csv

# pipe to tv
cat diamonds.csv | tv

Starwars

wget https://raw.githubusercontent.com/tidyverse/dplyr/master/data-raw/starwars.csv

# Pass as argument
tv starwars.csv

Pigeon Racing

wget https://raw.githubusercontent.com/joanby/python-ml-course/master/datasets/pigeon-race/pigeon-racing.csv
cat pigeon-racing.csv | tv

Titanic

wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
# send to pager with color
# less 
tv titanic.csv -ea | less -R
# bat
tv titanic.csv -a -n 1000 | bat -p

Significant Figure Definitions And Rules

The first three digits represent > 99.9% the value of a number. -- GNU-R Pillar

Choosing the number of sigfigs amounts to choosing how much of a number's value you want to see. The table below shows an example calculation for different sigfig settings.

sigfigs  value   sigfiged_value  %value_of_the_number_explained_by_sigfiged_value
1        0.1119  0.1             >89%
2        0.1119  0.11            >98%
3        0.1119  0.111           >99%
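
The percentages in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `share_explained` is made up for this example, and the rows are copied from the table rather than computed by tv.

```python
# Rows copied from the table above: (sigfigs, value, sigfiged_value).
ROWS = [(1, 0.1119, 0.1), (2, 0.1119, 0.11), (3, 0.1119, 0.111)]


def share_explained(value: float, shown: float) -> float:
    """Percentage of `value` that the shortened number `shown` accounts for."""
    return 100.0 * shown / value


for sigfigs, value, shown in ROWS:
    pct = share_explained(value, shown)
    print(f"{sigfigs} sigfig(s): {shown} explains {pct:.1f}% of {value}")
```

Each extra digit adds less than the one before it, which is why three digits are usually enough to guide the eye.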

tv uses the same significant figure (sigfig) rules that the R package pillar uses.

The purpose of the sigfig rules in tv is to guide the eye to the most important information in a number. This section defines terms and the decision tree used in the calculation of the final value displayed.

Definitions

     ┌─────┐      ┌─────┐     ─┐
     │     │      │     │      │
     │     │      │     │      │
     │     │      │     │      │
     │     │      │     │      │
     │     │  ┌┐  │     │      │
     └─────┘  └┘  └─────┘    ──┴─
   │        │    │                │
   └────────┘  ▲ └────────────────┘
left hand side │  right hand side
     (lhs)     │       (rhs)

            decimal

left hand side (lhs): digits on the left hand side of the decimal.

right hand side (rhs): digits on the right hand side of the decimal.


 ┌─────┐      ┌─────┐     ─┐     ┌─────┐
 │     │      │     │      │     │     │
 │     │      │     │      │     │     │
 │     │      │     │      │     │     │
 │     │      │     │      │     │     │
 │     │  ┌┐  │     │      │     │     │
 └─────┘  └┘  └─────┘    ──┴─    └─────┘

│                     │         │       │
└─────────────────────┘         └───────┘
       leading 0s              trailing 0s

leading 0s: 0s to the left of a non-zero.

trailing 0s: 0s to the right of a non-zero. The zeros in 500m are trailing as well as the 0s in 0.500km.

 ─┐     ┌─────┐       ─┐
  │     │     │        │
  │     │     │        │
  │     │     │        │
  │     │     │        │
  │     │     │  ┌┐    │
──┴─    └─────┘  └┘  ──┴─

                   │        │
                   └────────┘
              fractional digit(s)

fractional digits: Digits on the rhs of the decimal. They represent the non-integer part of a number.

Rules

There are only 4 outputs possible. The significant figures to display are set by the user. Assume sigfig = 3:

  1. lhs only (12345.0 -> 12345): If no fractional digits are present and lhs >= sigfig then return lhs
  2. lhs + point (1234.5 -> 1234.): If fractional digits are present and lhs >= sigfig then return lhs with point. This is to let the user know that some decimal dust is beyond the main mass of the number.
  3. lhs + point + rhs (1.2345 -> 1.23): If fractional digits are present and lhs < sigfig return the first three digits of the number.
  4. long rhs (0.00001 -> 0.00001): This is reserved for values with leading 0s in the rhs.

# Pseudo code: sigfig logic assuming sigfig = 3
if lhs == 0:
    n = floor(log10(abs(x))) + 1 - sigfig
    r = (10^n) * round(x / (10^n))
    return r
    # (0.12345 -> 0.123)
else:
    if floor(log10(lhs)) + 1 >= sigfig:
        if rhs > 0:
            # concatenate:
            #   (lhs)
            #   (point)
            # (123.45 -> 123.)
        else:
            # concatenate:
            #   (lhs)
            # (1234.0 -> 1234)
            # (100.0 -> 100)
    else:
        # concatenate:
        #   (lhs)
        #   (point)
        #   (sigfig - number of lhs digits) digits taken from rhs
        # (12.345 -> 12.3)
        # (1.2345 -> 1.23)
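
For readers who want to experiment with the rules, here is a minimal runnable sketch in Python. It is not tv's Rust implementation. The function name `format_sigfig` is hypothetical, and the handling of whole numbers with fewer digits than sigfig (for example 12.0) is an assumption, since the rules above do not cover that case.

```python
from decimal import Decimal


def format_sigfig(x: float, sigfig: int = 3) -> str:
    """Format x following the four display rules (a sketch, not tv's code)."""
    d = Decimal(str(x))            # str() keeps the digits as typed
    if d == 0:
        return "0"
    sign = "-" if d < 0 else ""
    d = abs(d)
    lhs = int(d)                   # digits left of the decimal point
    frac = d - lhs                 # fractional part, e.g. Decimal("0.345")
    if lhs == 0:
        # Values below 1: round to `sigfig` significant digits.
        n = d.adjusted() + 1 - sigfig          # exponent of the last digit kept
        rounded = d.quantize(Decimal(1).scaleb(n))
        return sign + format(rounded, "f").rstrip("0")
    if frac == 0:
        # Rule 1. Assumption: also applied when lhs has fewer digits than sigfig.
        return f"{sign}{lhs}"
    lhs_digits = len(str(lhs))
    if lhs_digits >= sigfig:
        # Rule 2: keep the point to signal that decimal dust was dropped.
        return f"{sign}{lhs}."
    # Rule 3: fill the remaining significant digits from the rhs.
    keep = sigfig - lhs_digits
    rhs = format(frac, "f")[2:2 + keep]        # skip the leading "0."
    return f"{sign}{lhs}.{rhs}"


for value in (12345.0, 1234.5, 1.2345, 0.00001, 0.12345):
    print(value, "->", format_sigfig(value))
```

The sketch uses `Decimal` rather than `log10` on floats so that digit counts are not affected by floating-point rounding. Edge cases such as values that round up to the next power of ten are not handled.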

Tools to pair with tv

tv is a good complement to command line data manipulation tools. I have listed some tools that I like to use with tv.

qsv - Fork of xsv. Has more commands/subcommands and allows users to evaluate lua/python on data. [Rust | CLI]

xsv - Command line csv data manipulation. [Rust | CLI]

SQLite - Database engine with CLI, shell, and library interfaces. [C | CLI/shell/lib]

DuckDB - Database engine with CLI, shell, and library interfaces. [C++ | CLI/shell/lib]

csvtk - Command line csv data manipulation. [Go | CLI]

tsv-utils - Command line csv data manipulation toolkit. [D | CLI]

q - Command line csv data manipulation query-like. [Python | CLI]

miller - Command line data manipulation, statistics, and more. [C | CLI]

VisiData - An interactive terminal user interface that is built to explore and wrangle data. [Python | TUI]

Tools similar to tv

column - Comes standard with Linux. To get similar functionality, run: column file.csv -ts,

Though column is similar I do think there are some reasons tv is a better tool.

1. NA comprehension

NA values are very important! Viewers should have their attention drawn to these empty cells. In the image below, NA values are not only invisible, they also seem to cause incorrect alignment in other columns.

na_comp

There are many ways that programs will designate missing values. Some use none, others use NaN, and many more "", NA, null, n/a etc. tv searches for these strings and replaces them with NA. This is similar in spirit to the significant digit calculations and the truncation of columns with long strings. The purpose of tv is not to show the complete literal value, but to guide the eye.

na_comp
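
The idea of NA comprehension can be sketched in a few lines of Python. This is only an illustration: the token list is taken from the paragraph above, while the case-insensitive matching, the whitespace stripping, and the name `normalize_missing` are assumptions rather than a description of tv's source.

```python
# Strings treated as missing. Taken from the prose above, not from tv's source.
MISSING_TOKENS = {"", "na", "nan", "none", "null", "n/a"}


def normalize_missing(cell: str) -> str:
    """Return "NA" for cells that look like a missing value, else the cell."""
    return "NA" if cell.strip().lower() in MISSING_TOKENS else cell


row = ["C-3PO", "167", "", "n/a", "NaN", "gold"]
print([normalize_missing(cell) for cell in row])
```

Once every flavor of missing value is mapped to the same token, it can be colored and aligned consistently.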

2. Column Overflow Logic

In cases where the terminal width can't fit all of the columns in a dataframe, column will try to smush data on the rows below. This results in an unpleasant viewing experience.

tv can automatically tell when there will be too many columns to print. When this occurs it will only print the columns that fit in the terminal and mention the extras in the footer below the table.

overflow
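
A rough sketch of this kind of overflow logic, in Python, is shown here. The function `split_columns`, the one-character gap, the column widths, and the footer wording are all hypothetical. tv's real layout code may differ.

```python
import shutil


def split_columns(widths, term_width, gap=1):
    """Return (shown, hidden) lists of column indices for a given terminal width."""
    shown, used = [], 0
    for i, width in enumerate(widths):
        needed = width + (gap if shown else 0)   # one gap between neighbours
        if used + needed > term_width:
            return shown, list(range(i, len(widths)))
        shown.append(i)
        used += needed
    return shown, []


names = ["carat", "cut", "color", "clarity", "depth"]
widths = [5, 9, 5, 7, 5]                         # hypothetical rendered widths
shown, hidden = split_columns(widths, shutil.get_terminal_size().columns)
print("shown:", [names[i] for i in shown])
if hidden:
    print(f"# {len(hidden)} more columns:", ", ".join(names[i] for i in hidden))
```

Stopping at the first column that does not fit keeps the original column order intact, at the cost of possibly hiding a narrow column that would have fit.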

Configuration Dotfile

For information on dotfile configuration see tv --help. This allows users to set their own color palette, rows to print, max column width, etc.

FAQ

  • Does tv have a light theme?

Yes, solarized light is included out of the box as of version 1.4.6. You may also define your own themes in the config.

  • The ~/.config/tv.toml file is having no effect on the output. What am I doing wrong?

Every key/value pair must exist or the toml will not be read. If even one key/value is missing then the config will not work.

  • It would be nice to be able to scroll vertically/horizontally through tall/wide csv file. Does tv allow for this functionality?

Yes, pipe the output to less or bat. tv allows for this with the -e flag.

To extend to the full csv width and length and keep color, try the following:

tv diamonds.csv -ea | less -SR

To extend to the full csv width and length and remove all color, try the following:

tv diamonds.csv -e | less -S

Help

tv --help

tv 1.5.2
Tidy Viewer (tv) is a csv pretty printer that uses column styling to maximize viewer enjoyment.✨✨📺✨✨

    Example Usage:
    wget https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/diamonds.csv
    cat diamonds.csv | head -n 35 | tv
    tv diamonds.csv

    Configuration File Support:
    An example config is printed to make it easy to copy/paste to `tv.toml`.
    Check the parameters you have changed with `tv --config-details`.
    The config (tv.toml) location is dependent on OS:
        * Linux: $XDG_CONFIG_HOME or $HOME/.config/tv.toml
        * macOS: $HOME/Library/Application Support/tv.toml
        * Windows: {FOLDERID_RoamingAppData}\tv.toml

        ## ==Tidy-Viewer Config Example==
        ## Remove the first column of comments for valid toml file
        ## All fields must be defined. Partial files will not be read.
        ## The delimiter separating the columns. [default: ,]
        #delimiter = ","
        ## Add a title to your tv. Example 'Test Data' [default: NA ("")]
        #title = ""
        ## Add a footer to your tv. Example 'footer info' [default: NA ("")]
        #footer = ""
        ## The upper (maximum) width of columns. [default: 20]
        #upper_column_width = 20
        ## The minimum width of columns. Must be 2 or larger. [default: 2]
        #lower_column_width = 2
        ## head number of rows to output <row-display> [default: 25]
        #number = 35
        ## extend width and length in terms of the number of rows and columns displayed beyond term width [default: false]
        # extend_width_length = true
        ## meta_color = [R,G,B] color for row index and "tv dim: rows x cols"
        #meta_color = [64, 179, 162]
        ## header_color = [R,G,B] color for column headers
        #header_color = [232, 168, 124]
        ## std_color = [R,G,B] color for standard cell data values
        #std_color = [133, 205, 202]
        ## na_color = [R,G,B] color for NA values
        #na_color = [226, 125, 95]
        ## neg_num_color = [R,G,B] color for negative values
        #neg_num_color = [226, 125, 95]

USAGE:
    tidy-viewer [FLAGS] [OPTIONS] [FILE]

FLAGS:
    -C, --config-details             Show the current config details
    -d, --debug-mode                 Print object details to make it easier for the maintainer to find and resolve bugs.
    -e, --extend-width-and-length    Extended width beyond term width (do not truncate). Useful with `less -S`.
    -f, --force-all-rows             Print all rows in file. May be piped to 'less -S'. Example `tidy-viewer
                                     data/diamonds.csv -f -a | less -R`
    -a, --color-always               Always force color output. Example `tv -a starwars.csv | less -R` or `tv -a
                                     starwars.csv | bat -p`. The `less` cli has the `-R` flag to parse colored output.
    -h, --help                       Prints help information
    -D, --no-dimensions              Turns off dimensions of the data
    -R, --no-row-numbering           Turns off row numbering
    -V, --version                    Prints version information

OPTIONS:
    -c, --color <color>
            There are 5 preconfigured color palettes (Defaults to nord):
                            (1)nord
                            (2)one_dark
                            (3)gruvbox
                            (4)dracula
                            (5)solarized light [default: 0]
    -s, --delimiter <delimiter>                      The delimiter separating the columns.
    -F, --footer <footer>                            Add a footer to your tv. Example 'footer info' [default: NA]
    -l, --lower-column-width <lower-column-width>
            The lower (minimum) width of columns. Must be 2 or larger. [default: 2]

    -n, --number-of-rows-to-output <row-display>     Show how many rows to display. [default: 25]
    -g, --sigfig <sigfig>                            Significant Digits. Default 3. Max is 7 [default: 3]
    -t, --title <title>                              Add a title to your tv. Example 'Test Data' [default: NA]
    -u, --upper-column-width <upper-column-width>    The upper (maximum) width of columns. [default: 20]

ARGS:
    <FILE>    File to process
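
Because partial config files are not read, it can help to check a tv.toml for missing keys before wondering why it has no effect. The Python sketch below is an illustration, not part of tv. The key list is copied from the example config in the help text above, the function names are hypothetical, and `tomllib` requires Python 3.11 or newer.

```python
# Keys copied from the example config printed by `tv --help`.
REQUIRED_KEYS = [
    "delimiter", "title", "footer",
    "upper_column_width", "lower_column_width",
    "number", "extend_width_length",
    "meta_color", "header_color", "std_color", "na_color", "neg_num_color",
]


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent from a parsed config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def check_file(path: str) -> list:
    """Parse a TOML file and report which required keys it lacks."""
    import tomllib  # standard library from Python 3.11 onward

    with open(path, "rb") as handle:
        return missing_keys(tomllib.load(handle))


print(missing_keys({"delimiter": ",", "number": 35}))
```

An empty list means every field from the example is present. Running `tv --config-details` remains the authoritative way to see what tv actually picked up.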

Use With Database Engines

Here I show how to use tv with a couple of database engines (SQLite, DuckDB).

Use With SQLite

SQLite is a fantastic program! If it is not the most deployed software, it is probably close to it. For more info on SQLite, see their Executive Summary.

For this example you will need to download and uncompress taxi data

wget https://github.com/multiprocessio/dsq/blob/43e72ff1d2c871082fed0ae401dd59e2ff9f6cfe/testdata/taxi.csv.7z?raw=true -O taxi.csv.7z
7z x taxi.csv.7z
cd testdata
ls -l --block-size=M # the data is fairly large at 192MB

SQLite One-liner

sqlite3 :memory: -csv -header -cmd '.import taxi.csv taxi' 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' | tv

The above one-liner queries a csv as an in-memory database. It is also possible to query an existing sqlite database and pipe the output as a csv for tv to pick up. A one-liner is shown below.

sqlite3 -csv -header <file_name.sqlite> 'select * from <table>;' | tv
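
The same pattern works from a script: anything that writes csv with a header row to stdout can be piped into tv. The following Python sketch uses only the standard library. The helper `query_to_csv`, the column aliases, and the three toy rows are made up for illustration and are not taken from the taxi data.

```python
import csv
import sqlite3
import sys


def query_to_csv(conn, sql, out):
    """Run `sql` on `conn` and write the result to `out` as csv with a header row."""
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxi (passenger_count INTEGER, total_amount REAL)")
    # Toy rows for illustration only.
    conn.executemany("INSERT INTO taxi VALUES (?, ?)", [(1, 10.0), (1, 14.0), (2, 30.5)])
    query_to_csv(
        conn,
        "SELECT passenger_count, COUNT(*) AS trips, AVG(total_amount) AS avg_total "
        "FROM taxi GROUP BY passenger_count ORDER BY passenger_count",
        sys.stdout,
    )
```

Saved as, say, query.py (a hypothetical file name), it could be used as: python query.py | tv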

Use With DuckDB

DuckDB has a lot in common with SQLite. Anecdotally, I like that fewer CLI flags are needed to run it on csvs. I also like the speed. Though it is not as universal as SQLite, I think that it is a good fit for command line data manipulation.

For this example you will need to download and uncompress taxi data

wget https://github.com/multiprocessio/dsq/blob/43e72ff1d2c871082fed0ae401dd59e2ff9f6cfe/testdata/taxi.csv.7z?raw=true -O taxi.csv.7z
7z x taxi.csv.7z
cd testdata
ls -l --block-size=M # the data is fairly large at 192MB

Fun With tv

Using duckdb with tv and less to manipulate data with SQL grammar and view results in a scrolling window.

duckdb --csv -c "select norm1 from norms.csv" | ../target/release/tidy-viewer -f -a | less -R

DuckDB One-liner

duckdb --csv -c "SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi.csv GROUP BY passenger_count ORDER BY passenger_count" | tv

Inspiration

pillar - R's tibble like formatting. Fantastic original work by Kirill Müller and Hadley Wickham. tv makes an attempt to port their ideas to the terminal.

tv's People

Contributors

38, 5tefan, alexhallam, atakurt, bwagner, chasing-freedom, domoritz, frisoft, harharlinks, iamjameswalters, jacobmischka, lireer, namitaarya, rlewicki, shirobachi, tomshafer, ulazkamateusz, yigitsever

tv's Issues

Experiment with approaches to view Record-heterogeneity in csv files. (ragged csvs)

Here I am using the term "ragged csv" like miller.

In a standard csv, if a cell is missing ,NA, it is common to just omit the value, but retain the commas ,,. @lithiumfrost uploaded a "ragged csv" where the omitted data came with no commas --- or any delimiter. See line 361 in the below image

image

It was mentioned that miller has an option to work with ragged csvs.

#75 (comment)

This is an open issue to think about how tv should work with these types of files.
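
As a starting point for experiments, ragged records can be located before any pretty printing happens. This Python sketch is only an illustration of the detection step. The helper name `find_ragged` is hypothetical and nothing here reflects how tv or miller handle such files.

```python
import csv
import io


def find_ragged(text, delimiter=","):
    """Return (line_number, field_count) for records that do not match the header length."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    ragged = []
    for record in reader:
        if len(record) != len(header):
            ragged.append((reader.line_num, len(record)))
    return ragged


sample = "a,b,c\n1,2,3\n4\n5,6,7\n"
print(find_ragged(sample))
```

With the offending line numbers in hand, a viewer could warn about them, pad them with NA, or skip them.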

Proposal

Since tv is based on pillar I lean on the shoulders of giants and see what the creators of the fantastic GNU-R pillar library decided to do. There are two components:

  1. Truncated warnings when parsing
  2. A pretty print of readable records

Here is the output.

image

Original Test file:
en_climate_hourly_AB_3012209_05-2021_P1H.csv

More complicated missingness

  1. Long spaces should probably be NA. See the name column.
generated_on,adif_ver,programid,programversion,band,call,country,freq,mode,my_gridsquare,my_sig,my_sig_info,my_state,name,operator,qso_date,qso_date_off,qth,rst_rcvd,rst_sent,state,time_on,tx_pwr
2022-04-07T17:42:27.048Z,3.0.5,HAMRS," 1.0.5"," 40m"," KC4TVZ","United States",7.295,SSB,EM73vx," POTA"," K-0662"," GA"," Todd Burnette"," WQ8R"," 20220407"," 20220407"," Flowery Branch",59,59," GA"," 1146"," 10"
2022-04-07T17:42:27.048Z,3.0.5,HAMRS," 1.0.5"," 40m"," ED4BDG","United States",7.295,SSB,EM73vx," POTA"," K-0662"," GA","              "," WQ8R"," 20220407"," 20220407"," Flowery Branch",59,59," GA"," 1146"," 10"
2022-04-07T17:42:27.048Z,3.0.5,HAMRS," 1.0.5"," 40m"," WF4I","United States",7.295,SSB,EM73vx," POTA"," K-0662"," GA"," DEREK S BROWN"," WQ8R"," 20220407"," 20220407"," Flowery Branch",59,59," GA"," 1146"," 10"

image

  2. Where is the footer?

string with float-likes fail

I had a column entry with the following entry in my csv and the program failed.
2/ 2.5 Gallon

The error was

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseFloatError { kind: Invalid }', /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/tidy-viewer-0.0.5/src/datatype.rs:160:50
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

FWIW

96/3 Oz does not fail. So it must be the decimal in the string

thread 'main' panicked at 'a csv record: Error(UnequalLengths ... )'

Using the 0.0.20 Homebrew package:

# Execution with error and full backtrace
RUST_BACKTRACE=full tidy-viewer Chase9989_Activity_20211008.CSV
thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 69, line: 1, record: 1 }), expected_len: 7, len: 8 })', src/main.rs:185:20
stack backtrace:
   0:        0x1100652b1 - __mh_execute_header
   1:        0x110081b7b - __mh_execute_header
   2:        0x110061f7a - __mh_execute_header
   3:        0x1100668c5 - __mh_execute_header
   4:        0x1100664af - __mh_execute_header
   5:        0x110066fb0 - __mh_execute_header
   6:        0x110066a4e - __mh_execute_header
   7:        0x110065737 - __mh_execute_header
   8:        0x1100669ba - __mh_execute_header
   9:        0x11008eacf - __mh_execute_header
  10:        0x11008ebb5 - __mh_execute_header
  11:        0x10ff78137 - __mh_execute_header
  12:        0x10ff6dc54 - __mh_execute_header
  13:        0x10ff714d6 - __mh_execute_header
  14:        0x10ff714ec - __mh_execute_header
  15:        0x110064b54 - __mh_execute_header
  16:        0x10ff70bd9 - __mh_execute_header

# Offending line - plain ASCII
head -n 1 Chase9989_Activity_20211008.CSV
Details,Posting Date,Description,Amount,Type,Balance,Check or Slip #

# Offending line - hexdump (error happens with both Windows and UNIX line endings)
head -n 1 Chase9989_Activity_20211008.CSV | hexdump -C
00000000  44 65 74 61 69 6c 73 2c  50 6f 73 74 69 6e 67 20  |Details,Posting |
00000010  44 61 74 65 2c 44 65 73  63 72 69 70 74 69 6f 6e  |Date,Description|
00000020  2c 41 6d 6f 75 6e 74 2c  54 79 70 65 2c 42 61 6c  |,Amount,Type,Bal|
00000030  61 6e 63 65 2c 43 68 65  63 6b 20 6f 72 20 53 6c  |ance,Check or Sl|
00000040  69 70 20 23 0d 0a                                 |ip #..|
00000046

Error trying to open a tab-separated file

Hi, I'm trying to visualize a TSV file, and I'm getting this error:

thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 513, line: 2, record: 1 }), expected_len: 37, len: 8 })', src/main.rs:312:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'm using the latest Linux release.

atty?

Use the atty crate to know whether stdout is being piped or is attached to a tty, and add conditions to error gracefully.

[Github Actions] - realease.yml

The current release is 0.0.19, but the version downloaded from the current releases will say 0.0.14. These two versions have the exact same features. Work is needed to align the version numbers. This is not urgent now, but would be nice to have for the next release.

Where is the .deb?

Hi,

i try to download tv.
In readme.md : "installation" "debian"
wget https://github.com/alexhallam/tv/releases/download/<VERSION>/tidy-viewer_<VERSION>_amd64.deb
ok, version is 1.4.3, then:
wget https://github.com/alexhallam/tv/releases/download/1.4.3/tidy-viewer_1.4.3_amd64.deb
not found.
where is my mistake?

file with semi-colon and lines with different number of cols not work

Hi,

Your tools look awesome, unfortunately, I work with embedded devices that generate non-standard csv files, you can find a sample of the content below :

MYVALUE;126;001
MYOTHERVALUE;THISALSOAVALUE
37;1;2;3;4;6;7;8;9;10;11;13;15;17;19;21;23;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;45;46
17/03/20-14:00:04;1901346120;146;146;147;65535;65535;65535;2341;2335;2338;1027;4998;1027;0;32768;1224125;129;561;725;64;505;323;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;300;4;0
YOURVALUE;127;002
MYOTHERVALUE;THISANICEVALUE
37;1;2;3;4;6;7;8;9;10;11;13;15;17;19;21;23;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;45;46
17/03/20-14:00:04;1901346338;147;147;147;65535;65535;65535;2338;2332;2334;1028;4998;1028;0;32768;1222219;131;556;726;64;507;327;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;65535;300;4;0

To open them in tidy-viewer, I ran this command:

$ tidy-viewer CHALE1_MODBUS_000102_070000.csv  -s ";"

And I get this crash:

thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 16, line: 2, record: 1 }), expected_len: 3, len: 2 })', src/main.rs:353:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Should individual toml config file fields be optional?

In experimenting with the tv.toml config file, it seems to fail to parse and take effect if only some options are present. They must all be present. This was a bit confusing at first!

I think making each field in tv.toml optional would improve the user experience and make the tool more powerful/flexible.

Fails on titanic example

Something is off here. Just installed 0.0.13 as a .deb yet:

edd@rob:/tmp$ wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
--2022-03-03 07:42:13--  https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60302 (59K) [text/plain]
Saving to: ‘titanic.csv’

titanic.csv                100%[=====================================>]  58.89K  --.-KB/s    in 0.002s  

2022-03-03 07:42:13 (26.0 MB/s) - ‘titanic.csv’ saved [60302/60302]

edd@rob:/tmp$ head titanic.csv 
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
edd@rob:/tmp$ 
edd@rob:/tmp$ tidy-viewer --version
tv 0.0.13
edd@rob:/tmp$ tidy-viewer titanic.csv 
error: Found argument 'titanic.csv' which wasn't expected, or isn't valid in this context

USAGE:
    tidy-viewer [FLAGS] [OPTIONS]

For more information try --help
edd@rob:/tmp$ tidy-viewer ./titanic.csv 
error: Found argument './titanic.csv' which wasn't expected, or isn't valid in this context

USAGE:
    tidy-viewer [FLAGS] [OPTIONS]

For more information try --help
edd@rob:/tmp$ 

Publish as a Snap

Hi there! Saw this tool on Show HN, and thought it was worth snapping. I've opened #56 with a working snapcraft.yaml.

Would the upstream project be interested in putting this utility on the Snap Store? If not, I would be interested in doing so myself.

Align decimals

For the below example csv the decimals are not centered with each-other. The desired alignment is shown in the last image.

cat align_dec.csv

value
12345.
1234.5
123.45
12.345
1.2345
.12345

The following results are incorrect:
image

We should see the following:
image
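
One way to line up the decimal points is to pad the integer part on the left and let the fractional part run to the right. The Python sketch below illustrates that idea only. The name `align_decimals` is hypothetical, and the output is not guaranteed to match the desired image exactly.

```python
def align_decimals(values):
    """Pad each string so that all decimal points share one column."""
    parts = [value.partition(".") for value in values]
    left = max(len(lhs) for lhs, _, _ in parts)
    # Right-justify the integer part; values without a point get a space instead.
    return [(lhs.rjust(left) + (sep or " ") + rhs).rstrip() for lhs, sep, rhs in parts]


values = ["12345.", "1234.5", "123.45", "12.345", "1.2345", ".12345"]
for line in align_decimals(values):
    print(line)
```

The cost of this layout is a wider column, since it needs room for the longest integer part plus the longest fractional part.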

Consider adding config file support

It would be nice if a .tv.toml existed. The keys would match the command line options. Mostly this would be a way for users to specify their color theme. On Linux, the program would look for $HOME/.tv.toml. I am not sure how dotfiles are set up on other OSes.

This issue is open for discussion.

Lower required libc6 version if possible

Using Ubuntu 16.04 LTS, the version of libc6 installed is 2.23-0ubuntu11.3. Attempting to install tidy-viewer via dpkg gives the following error:

dpkg: dependency problems prevent configuration of tidy-viewer:
 tidy-viewer depends on libc6 (>= 2.31); however:
  Version of libc6:amd64 on system is 2.23-0ubuntu11.3.

If nothing between versions 2.23 and 2.31 is needed it would be nice to lower the required version to increase the number of systems tidy-viewer can be installed on.

Cut space from really long doubles.

I am surprised this bug came up, but the fix should be simple.

I want to make sure a test checks for this on each PR. Getting this cleaned up is central to the purpose of tv.

Example file:

cat long_sigfig.csv

text,col1,col2,col3
row1,3.333333333333333,3.333333333333333,3.333333333333333
row2,1.11111111111111111,1.11111111111111111,1.11111111111111111

Problem

cat long_sigfig.csv | tv

      tv dim: 2 x 4
      text col1                col2                col3
1     row1 3.33                3.33                3.33
2     row2 1.11                1.11                1.11

Note the huge space between columns.

Desired Output

      tv dim: 2 x 4
      text col1  col2  col3
1     row1 3.33  3.33  3.33
2     row2 1.11  1.11  1.11

Ellipsis then space, not space then ellipsis

We have word …number, but the proper result should be word… number. Ellipsis then space, not space then ellipsis.

cat ellipsis.csv

long_text,double
jaywalker-swear-popcorn-opacity-nuttiness-roster,5.89
unspoiled-treachery-yearning-deputize-stimuli-coexist,2.34
conflict-yield-sulphate-grumpily-vagrantly-subsiding,7.89

Current results

> cat ellipsis.csv | tv

      tv dim: 3 x 2
      long_text            double
1     jaywalker-swear-pop …5.89
2     unspoiled-treachery …2.34
3     conflict-yield-sulp …7.89

Desired results

> cat ellipsis.csv | tv

      tv dim: 3 x 2
      long_text            double
1     jaywalker-swear-pop… 5.89
2     unspoiled-treachery… 2.34
3     conflict-yield-sulp… 7.89
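
The desired behavior can be sketched in Python as follows. This is an illustration rather than tv's Rust code: `truncate` and `render_row` are hypothetical names, the width of 20 mirrors the default upper column width, and `len` counts code points rather than terminal display width.

```python
def truncate(text, width):
    """Shorten `text` to at most `width` characters, ending in an ellipsis if cut."""
    if len(text) <= width:
        return text
    return text[: width - 1] + "…"


def render_row(cells, widths):
    """Truncate and pad each cell, then join the cells with a single space."""
    return " ".join(truncate(cell, w).ljust(w) for cell, w in zip(cells, widths)).rstrip()


row = ["jaywalker-swear-popcorn-opacity-nuttiness-roster", "5.89"]
print(render_row(row, [20, 4]))
```

Making the ellipsis part of the truncated cell, and adding the separator only when joining, is what keeps the ellipsis attached to the word rather than to the next column.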

Revisit sigfigs, write tests, make minor corrections

I need to revisit the sigfig calculation. I will be doing a lot of comparisons with pillar output. The original output was tailored to a sigfig of 3. I just want to make sure that sigfigs are correct for many different sigfigs. It may also be good to put limits on how many sigfigs are allowed. We probably do not want users putting in sigfig=100.

Homebrew update/install fails with SHA mismatch

Not urgent, but FYI:

~ $ brew update && brew upgrade tidy-viewer
Already up-to-date.
==> Upgrading 1 outdated package:
alexhallam/tidy-viewer/tidy-viewer 1.4.3 -> 1.4.5
==> Downloading https://github.com/alexhallam/tv/releases/download/1.4.3/tidy-viewer--x86_64-apple-darwin
Already downloaded: [...]/Caches/Homebrew/downloads/def23c2288ff52abd6d39369986336a169881983a0d8d22d33e351f53db4e9b2--tidy-viewer--x86_64-apple-darwin.tar.gz
Error: SHA256 mismatch
Expected: 63233fbd215293e50edcb47d36bbab05975f19000e74bf25a8e6d161886492c0
  Actual: f79e7481add98af9a52a83bbc23e075e06be1af09f6b8f94315bc58d8216d90a
    File: [...]/Caches/Homebrew/downloads/def23c2288ff52abd6d39369986336a169881983a0d8d22d33e351f53db4e9b2--tidy-viewer--x86_64-apple-darwin.tar.gz
To retry an incomplete download, remove the file above.

Add `cargo-deb` to GHA

I need to add cargo-deb to GHA to make sure that I don't forget to add this binary to the release.

println! is a debug tool

image

what happens is command's stdout is closed and you end up with a broken pipe error. usually you want to handle that error by exiting the process gracefully. (search for "BrokenPipe" in ripgrep:crates/core/main.rs for examples.)

NA alignment is off

NA alignment should be as follows for the different data types:

type                         format
double                       right aligned, but not past the decimal
int                          right aligned
char (and every other type)  left aligned

This is an example of the correct formatting of NAs.

   name                  height   mass hair_color   
   <chr>                  <int>  <dbl> <chr>        
 1 Luke Skywalker           172   77   blond        
 2 C-3PO                    167   75   NA           
 3 R2-D2                     96   32   NA           
 4 Darth Vader              202  136   none         
 5 Leia Organa              150   49   brown        
 6 Owen Lars                178  120   brown, grey  
 7 Beru Whitesun lars       165   75   brown        
 8 R5-D4                     97   32   NA           
 9 Biggs Darklighter        183   84   black        
10 Obi-Wan Kenobi           182   77   auburn, white
11 Anakin Skywalker         188   84   blond        
12 Wilhuff Tarkin           180   NA   auburn, grey 
13 Chewbacca                228  112   brown        
14 Han Solo                 180   80   brown        
15 Greedo                   173   74   NA           
16 Jabba Desilijic Tiure    175 1358   NA           
17 Wedge Antilles           170   77   brown        
18 Jek Tono Porkins         180  110   brown        
19 Yoda                      66   17   white        
20 Palpatine                170   75   grey         
21 Boba Fett                183   78.2 black        
22 IG-88                    200  140   none         
23 Bossk                    190  113   none         
24 Lando Calrissian         177   79   black        
25 Lobot                    175   79   none         
26 Ackbar                   180   83   none         
27 Mon Mothma               150   NA   auburn       
28 Arvel Crynyd              NA   NA   brown 

This is the current NA formatting.

[image: current NA formatting]

Note that NA should end where a float would end (right before the decimal).
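
The alignment rules above can be sketched as follows. This is an illustration under assumptions, not tv's implementation: `ColType`, `format_na`, and the `frac_width` parameter (the number of characters the widest value in a double column uses for the decimal point and fraction digits) are names introduced here.

```rust
/// Column types that matter for NA alignment (from the table in this issue).
#[derive(Clone, Copy, Debug)]
enum ColType {
    Double,
    Int,
    Char,
}

/// Pad "NA" to `width` characters according to the column type.
/// `frac_width` is only used for doubles.
fn format_na(col: ColType, width: usize, frac_width: usize) -> String {
    match col {
        ColType::Double => {
            // Right-align within the integer part, then leave the
            // fractional part of the column blank.
            let int_width = width.saturating_sub(frac_width);
            format!("{:>w$}{}", "NA", " ".repeat(frac_width), w = int_width)
        }
        ColType::Int => format!("{:>w$}", "NA", w = width),
        ColType::Char => format!("{:<w$}", "NA", w = width),
    }
}
```

For the mass column in the example, which is 6 characters wide with ".2" as the widest fraction, `format_na(ColType::Double, 6, 2)` yields two spaces, `NA`, then two spaces, which matches the Wilhuff Tarkin row.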

[Looking for MacOS User Input] - Check background values on NAs

Just pulled tv on MacOS and saw that the NA values, which should only be colored, seem to have a background effect. I am not sure if this is an issue with the default terminal, the light theme, or tv.

I just want to open this issue to verify that tv is not doing any additional background effects for NAs.

I would love feedback from Mac Users who may use different terminal emulators or different themes. I am using the default terminal app.

To reproduce do the following:

# Download and untar (tar -xvzf) the apple-darwin release.
curl https://raw.githubusercontent.com/alexhallam/tv/main/data/a.csv -o a.csv
cat a.csv | ./tidy-viewer

[image: Screen Shot 2021-09-30 at 7 28 06 AM]

Catch UnequalLengths

I've come across this csv:

$ curl -s https://opendata.dwd.de/weather/local_forecasts/swsmos/swsmos_LATEST_opendata.csv.bz2 | bzcat | head
ID;Lat;Lon;YYYYMMDDHHmm;TL;TLSTA;RRL1c;RRS1c;RR6;WWL6;WWS3;RRS3c;R650;RC;TS;TD
202207031100
A006;54.88920;8.90870;202207031200;22.8;0.8;0.0;0.0;0.0;6.0;0.0;0.0;0.0;1;45.27;18.8
A006;54.88920;8.90870;202207031300;23.3;2.1;0.0;0.0;0.0;4.0;0.0;0.0;0.0;1;44.33;18.0
A006;54.88920;8.90870;202207031400;23.1;3.0;0.0;0.0;0.0;8.0;0.0;0.0;0.0;1;42.62;17.5
[...]

tv does not like it due to the second line being just an ISO-ish date string with missing data:

$ curl -s https://opendata.dwd.de/weather/local_forecasts/swsmos/swsmos_LATEST_opendata.csv.bz2 | bzcat | head | tidy-viewer -s ';'
thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 79, line: 2, record: 1 }), expected_len: 16, len: 1 })', src/main.rs:354:20

While the csv is clearly at fault, I expect this isn't all that unusual. I would like tv to be able to

  • at a minimum, have an option to just ignore (skip) faulty lines and continue
  • better, note the error in the line and leave it unformatted or similar, potentially highlighting it in some way, e.g. make the line red with a ⚠ symbol.

For now, I've added | awk 'NR != 2' into my pipe to skip the 2nd line explicitly.
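
The "skip faulty lines" option could look roughly like the sketch below. It is deliberately simplified and uses only the standard library: it assumes no quoted fields, so a plain `split` on the delimiter is enough, which a real CSV parser cannot assume. The function name `partition_rows` is hypothetical.

```rust
/// Split delimiter-separated text into well-formed rows and the
/// 1-based line numbers of rows that were skipped.
/// Assumption: fields are never quoted, so `split` is sufficient here.
fn partition_rows(input: &str, delimiter: char) -> (Vec<Vec<String>>, Vec<usize>) {
    let mut good: Vec<Vec<String>> = Vec::new();
    let mut skipped: Vec<usize> = Vec::new();
    let mut expected_len: Option<usize> = None;

    for (idx, line) in input.lines().enumerate() {
        let fields: Vec<String> = line.split(delimiter).map(str::to_string).collect();
        // The first line (the header) fixes the expected number of fields.
        let expected = *expected_len.get_or_insert(fields.len());
        if fields.len() == expected {
            good.push(fields);
        } else {
            // Record the line number so the caller can warn about it.
            skipped.push(idx + 1);
        }
    }
    (good, skipped)
}
```

With the file above, line 2 would land in the skipped list and the remaining rows would be formatted as usual. Returning the skipped line numbers also leaves room for the second suggestion, printing those lines highlighted instead of dropping them.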

Homebrew install

It would be great to see this installable on Mac via Homebrew!

-e broken

I have -e, but I think that -n is overriding this parameter.

tv titanic.csv -e

only prints 25 rows

--input-file or similar

Any program that reads files as input should let the user decide where that input comes from. Currently, tv seems to function by always opening and reading from stdin. This is inefficient and unconventional, and it promotes UUOC (useless use of cat).

Compare the current method:

$ cat file | tv

                v write 1         v write 2
file --> (cat) --> pipe --> (tv) --> stdout
      ^ read 1           ^ read 2

Reads:  2
Writes: 2

to a better method:

$ tv -i file

               v write 1
file --> (tv) --> stdout
      ^ read 1

Reads:  1
Writes: 1

It's less memory intensive, it's quicker on older machines, and it follows convention.

Alternatively, you can still do this on the shell without abusing a concatenation program as a file-reading one. The shell itself is capable of redirecting stdin to be the file itself. It would be at least better for the examples to show the "proper", or at least, better way of reading a file into a program:

$ tv < file

              v write 1
file --> (tv) --> stdout
     ^ read 1

Reads:  1
Writes: 1

Instead of giving you a pipe or the console input buffer, the shell now gives you the file's file descriptor when you ask for stdin. Readability and the clean "flow" of information through a pipeline isn't an issue here as tv is definitely not meant to be piped into other programs, though you could if you wanted to.

TL;DR: Reading from stdin promotes UUOC, which makes using the program more resource-intensive, noticeably so for large files on older machines. Allowing the user to choose the input file, or at least promoting redirects over cat, can speed up reading by however long it takes cat to read and write your file, and it's less memory intensive.

Stop cat abuse, and stop promoting cat abuse!
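
As a sketch of what the request amounts to on the Rust side, the reader can be chosen at startup: a file when a path is given, stdin otherwise. The function name `open_input` is an assumption, and argument parsing for the proposed -i option is left out.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Open the named file, or fall back to stdin when no path is given.
/// Both branches return the same trait object, so the rest of the
/// program does not need to know where the bytes come from.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match path {
        Some(p) => Ok(Box::new(BufReader::new(File::open(p)?))),
        None => Ok(Box::new(BufReader::new(io::stdin()))),
    }
}
```

With this shape, `tv -i file`, `tv < file`, and `cat file | tv` would all reach the same parsing code, and only the last one pays for the extra process and pipe.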

Package for Red Hat

Many AWS servers default to Red Hat distributions that use yum for package installation. It would be nice to have a build for this distribution.

Can tv be snap installed on AWS machines?

Why not read gzip and snappy parquet

It is common for production pipelines to dump intermediate data. The problem occurs when this data is in parquet format. Wouldn't it be nice if tv could read parquet?

Add pager support

It would be useful to have a pager mode that does not cut the number of rows but lets you explore the data (including the colorization). This support could be built into the tool or provided through external pagers such as less or bat.

I have tried to use less in a pipe but I lose the colors:

tidy-viewer data.csv -n 10000 -c 1 | less -R

Thanks for this awesome tool, great job here!
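
A common reason for colors disappearing in a pipe is that a tool only emits ANSI codes when stdout is a terminal. Whether tv does this is an assumption here, not something the issue confirms. Under that assumption, external pager support mostly needs a way to force color on. The sketch below uses `std::io::IsTerminal` (stable since Rust 1.70); `force_color` stands in for a hypothetical "always color" option, and all function names are made up for illustration.

```rust
use std::io::{self, IsTerminal};

/// Decide whether to emit ANSI colors.
/// `force_color` represents a hypothetical "always color" option.
fn should_color(force_color: bool, stdout_is_terminal: bool) -> bool {
    force_color || stdout_is_terminal
}

/// Wrap `text` in an ANSI color code only when coloring is enabled.
fn paint(text: &str, ansi_code: u8, enabled: bool) -> String {
    if enabled {
        format!("\x1b[{}m{}\x1b[0m", ansi_code, text)
    } else {
        text.to_string()
    }
}

/// Query the real stdout; this is false when output is piped into `less`.
fn stdout_is_terminal() -> bool {
    io::stdout().is_terminal()
}
```

With colors forced on, `less -R` passes the escape sequences through and the styling is kept while paging.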

Parsing booleans

Regex::new(r"^true$|^false$|^t$|^f$|TRUE$|^FALSE$|^T$|^F$|^True|^False").unwrap();

Not sure if this is a mistake, but the function comment mentions scanning for "0" and "1", yet they seem to be omitted from the regex.
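
For comparison, here is a standard-library sketch of an exact-match check that includes "0" and "1". Whether those two values should count as logical is exactly the open question in this issue, so treating them as booleans is an assumption, and the function name `is_logical` is hypothetical. Note also that in the regex as quoted, `TRUE$` has no leading `^` and `^True` and `^False` have no trailing `$`, so inputs such as "xTRUE" or "Truest" would match it; the exact-match version below rejects them.

```rust
/// Return true when `text` is exactly one of the accepted boolean spellings.
/// Assumption: "0" and "1" are accepted, as the function comment suggests.
fn is_logical(text: &str) -> bool {
    matches!(
        text,
        "true" | "false" | "t" | "f"
            | "TRUE" | "FALSE" | "T" | "F"
            | "True" | "False"
            | "0" | "1"
    )
}
```

One consequence of accepting "0" and "1" is that an integer column containing only zeros and ones would be classified as logical, so the order in which column types are inferred matters.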
