qsv's Issues

RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

Details
Package: chrono
Version: 0.4.19
URL: chronotope/chrono#499
Date: 2020-11-10

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

Workarounds

No workarounds are known.

References

See advisory page for additional details.

RUSTSEC-2020-0036: failure is officially deprecated/unmaintained

Details
Status: unmaintained
Package: failure
Version: 0.1.8
URL: rust-lang-deprecated/failure#347
Date: 2020-05-02

The failure crate is officially end-of-life: it has been marked as deprecated
by the former maintainer, who has announced that there will be no updates or
maintenance work on it going forward.

The following are some suggested actively developed alternatives to switch to:

See advisory page for additional details.

Add `validate` command

Checks a CSV against a jsonschema file.

The jsonschema file can be located on the filesystem or at a URL.
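
As a rough illustration of what checking a CSV against a schema involves, here is a minimal sketch in Python. It is not the proposed implementation: it assumes a schema that uses only a small subset of JSON Schema keywords (`required`, `properties`, `type`, `minimum`, `maximum`, `enum`), and the function names `validate_row` and `validate_csv` are hypothetical. A real implementation would rely on a full JSON Schema validator.

```python
import csv
import io


def validate_row(row, schema):
    """Return a list of error messages for one CSV record.

    Assumption: only a tiny subset of JSON Schema is handled here
    (required, type, minimum, maximum, enum).
    """
    errors = []
    for name in schema.get("required", []):
        if not row.get(name):
            errors.append(f"{name}: required value is missing")
    for name, rule in schema.get("properties", {}).items():
        raw = row.get(name)
        if not raw:
            continue  # empty values are only an error when required
        value = raw
        kind = rule.get("type", "string")
        if kind in ("integer", "number"):
            # CSV fields are always text, so convert before comparing
            try:
                value = int(raw) if kind == "integer" else float(raw)
            except ValueError:
                errors.append(f"{name}: {raw!r} is not a valid {kind}")
                continue
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{name}: {value} is below {rule['minimum']}")
            if "maximum" in rule and value > rule["maximum"]:
                errors.append(f"{name}: {value} is above {rule['maximum']}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} is not an allowed value")
    return errors


def validate_csv(text, schema):
    """Map record number (1-based, header excluded) to its errors."""
    report = {}
    reader = csv.DictReader(io.StringIO(text))
    for record_no, row in enumerate(reader, start=1):
        errors = validate_row(row, schema)
        if errors:
            report[record_no] = errors
    return report
```

An empty report means every record passed, so a pipeline step could exit with a non-zero status whenever the report is non-empty.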

Feature Request: deduplicate columns/extract unique columns

Cross-reference: BurntSushi/xsv#283

We can use qsv dedup, or the Unix command-line tools sort and uniq, to remove duplicate rows in a plain-text table, but I find myself wanting to do something similar with duplicated columns.

For example, after doing qsv join ... there will be at least one pair of duplicated columns (the values used for the join).

I am hoping for something like a column-based version of the row-based qsv dedup command (see #26).

I suspect I could work around this via the qsv transpose command (see #3).
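
To make the request concrete, here is a minimal sketch of the transpose-based idea in Python rather than qsv itself. The function names `dedup_columns` and `dedup_csv_columns` are hypothetical, and the sketch assumes a rectangular CSV (every row has the same number of fields) in which a column counts as a duplicate only if its header and all of its values match an earlier column.

```python
import csv
import io


def dedup_columns(rows):
    """Drop columns that exactly repeat an earlier column.

    Assumption: rows are rectangular; zip() would silently truncate
    ragged rows to the shortest one.
    """
    columns = list(zip(*rows))           # transpose: rows -> columns
    seen = set()
    kept = []
    for column in columns:
        if column not in seen:           # keep only the first occurrence
            seen.add(column)
            kept.append(column)
    return [list(row) for row in zip(*kept)]   # transpose back to rows


def dedup_csv_columns(text):
    """Read CSV text and return CSV text without duplicated columns."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(dedup_columns(rows))
    return out.getvalue()
```

Because the header is compared along with the values, two join-key columns with different names would both be kept; matching on values alone would be a separate design choice.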

"Heavy-duty" configurable `geocode` command

qsv bundles reverse-geocoder - a "lightweight" static, nearest city geonames geocoder.

But for real, street-level geocoding, we need a configurable geocoder that can use the user's geocoder backend of choice.

For the initial implementation of the heavy-duty geocoder, we'll support these backends, in order of implementation:

  • pelias (because it's open-source, and users can stand up their own customizable pelias geocoder instance; no ToS prohibiting caching results, etc.)
  • google geocoder

Other geocoder backends in the backlog:

This geocoder will be its own qsv command, geocode, unlike the current lightweight one, which is just one of many apply operations.

Add "normalize" command

To normalize data inside the CSV:

  • convert dates to ISO-8601 format
  • optionally add additional date-based columns to the CSV
    • weekday
    • week number
    • year
    • month
    • day
    • hour
    • minute
    • second
    • timezone
  • convert null fields of specified columns to a specified value (e.g. "N/A", "None", "0", "Not specified", etc.)
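
A minimal sketch of these operations in Python, for illustration only. The accepted input formats in `DATE_FORMATS`, the derived column names (`<column>_year` and so on), and the function names are all assumptions, not part of the proposal. Timezone handling is left out because the parsed values here are naive datetimes.

```python
from datetime import datetime

# Hypothetical set of input formats to try, in order.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y")


def parse_date(value, formats=DATE_FORMATS):
    """Return a datetime for the first matching format, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def normalize_row(row, date_column, null_columns, null_value="N/A"):
    """Return a copy of row with an ISO-8601 date, derived date
    columns, and empty fields in null_columns replaced."""
    out = dict(row)
    parsed = parse_date(out.get(date_column) or "")
    if parsed is not None:
        out[date_column] = parsed.isoformat()
        # Derived columns; the naming scheme is an assumption.
        out[date_column + "_weekday"] = str(parsed.isoweekday())  # 1 = Monday
        out[date_column + "_week"] = str(parsed.isocalendar()[1])
        for part in ("year", "month", "day", "hour", "minute", "second"):
            out[f"{date_column}_{part}"] = str(getattr(parsed, part))
    for column in null_columns:
        value = out.get(column)
        if value is None or not value.strip():
            out[column] = null_value
    return out
```

Values that match none of the formats are left unchanged, so a caller can decide whether an unparseable date should be reported or passed through.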

Create `schema` command

stats does a great job: it not only computes descriptive statistics about a CSV, it also infers the data types.
frequency compiles a frequency table.

The schema command will use the output of stats, and optionally frequency (to specify the valid range of a field), to create a JSON schema file that can be used with the validate command (#46) to validate a CSV against the generated schema.

With the combo addition of schema and validate, qsv can be used in a more bullet-proof automated data pipeline that can fail gracefully when there are data quality issues:

  • use schema to create a json schema from a representative CSV file for a feed
  • adjust the schema to fine-tune the validation rules
  • use validate at the beginning of a data pipeline and fail gracefully when validate fails
  • for extra large files, use sample to validate against a sample
  • or alternatively, partition the CSV to break down the pipeline into smaller jobs
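
The first step, deriving a schema from a representative CSV, could look roughly like the Python sketch below. It is an illustration under assumptions, not the planned design: it scans the CSV directly instead of reusing stats and frequency output, the `enum_limit` threshold and the function names are hypothetical, and it emits only `type`, `enum`, and `required`.

```python
import csv
import io
import json


def infer_type(values):
    """Return 'integer', 'number', or 'string' for a column's values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for caster, name in ((int, "integer"), (float, "number")):
        try:
            for value in non_empty:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(text, enum_limit=5):
    """Build a JSON-Schema-like dict from CSV text.

    Assumption: a string column with at most enum_limit distinct
    values is treated as an enumeration, standing in for what a
    frequency table would provide.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row.get(name) or "")
    properties = {}
    required = []
    for name, values in columns.items():
        rule = {"type": infer_type(values)}
        distinct = sorted({v for v in values if v != ""})
        if rule["type"] == "string" and 0 < len(distinct) <= enum_limit:
            rule["enum"] = distinct
        properties[name] = rule
        if all(v != "" for v in values):
            required.append(name)   # never empty in the sample
    return {"type": "object", "properties": properties, "required": required}


def schema_to_json(schema):
    """Serialize the schema so it can be reviewed and hand-tuned."""
    return json.dumps(schema, indent=2)
```

Writing the result out as JSON matches the second step of the workflow: the generated file is a starting point that is expected to be adjusted by hand before it is used for validation.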

Remove py command

The py command makes building qsv more difficult because of its various version, platform, and architecture dependencies.

Lua should be more than good enough: it has no external dependencies, as it's meant to be embeddable, and you can even call Lua scripts.

Closes #55.

RUSTSEC-2020-0071: Potential segfault in the time crate

Details
Package: time
Version: 0.1.43
URL: time-rs/time#293
Date: 2020-11-18
Patched versions: >=0.2.23
Unaffected versions: =0.2.0, =0.2.1, =0.2.2, =0.2.3, =0.2.4, =0.2.5, =0.2.6

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

The affected functions from time 0.2.7 through 0.2.22 are:

  • time::UtcOffset::local_offset_at
  • time::UtcOffset::try_local_offset_at
  • time::UtcOffset::current_local_offset
  • time::UtcOffset::try_current_local_offset
  • time::OffsetDateTime::now_local
  • time::OffsetDateTime::try_now_local

The affected functions in time 0.1 (all versions) are:

  • at
  • at_utc

Non-Unix targets (including Windows and wasm) are unaffected.

Patches

Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.

Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.

Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater, or the 0.3 series.

Workarounds

No workarounds are known.

References

time-rs/time#293

See advisory page for additional details.

Apply rustfmt

With the release of 0.16.1, all the major pending pull requests from xsv have been merged into qsv.

IMHO, we can now apply rustfmt to the whole project and standardize the code to rustfmt standards.

This will put us in a better position to directly accept PRs.

Package qsv in conda-forge (as done for xsv)

I currently use xsv installed from conda via the conda-forge community package collection:

https://anaconda.org/conda-forge/xsv
https://github.com/conda-forge/xsv-feedstock/tree/master/recipe

I would like to do the same for qsv, since I increasingly find myself wanting to use functionality only available in this more up-to-date fork (thank you!).

I am not familiar with Rust, but it ought to be straightforward to package qsv with an almost cut-and-paste copy of that recipe, so I am willing to attempt this, having previously contributed recipes for other packages to conda-forge.
