Code Monkey home page Code Monkey logo

data_compression_course's Introduction

A Crash Course on Data Compression

The need for storing data in compressed form is becoming more and more important due to the ever-growing rate of data produced on a daily basis. To keep up with this data deluge, data compression is a mandatory step to deliver good quality of service in concrete applications.

In this introductory course you will learn about fundamental data compression algorithms that are all widely adopted by tools that we use every day, like filesystems, computer networks, search engines, and databases. These algorithms have now become indispensable knowledge across many fields in computing, including Information Retrieval, Machine Learning, Natural Language Processing, Applied Physics, and Bioinformatics.

Syllabus

  1. Introduction

    • What is and Why Data Compression?
    • Basic Model and Limit
    • Space vs. Time Trade-offs
    • Fundamental Question(s), Undecidability
    • Some Applications
    • Data, Information, Redundancy, Technological Limitations
    • Warmup: Bit-Packing, Bit-Wise Operations, Read/Write Binary Data, Run-Length Encoding
  2. Integer Codes

    • The Static Integer Coding Problem
    • Unique Decodability
    • Unary, Binary, Minimal Binary, Gamma, Delta, Golomb, Rice, Exponential Golomb, Fibonacci, Variable-Byte
    • Effectiveness
    • Information Content, Entropy, Distributions
    • Zero- Minimum-Redundancy Codes, Kraft-McMillan Inequality
  3. List Compressors

    • The Sorted List Coding Problem
    • Motivations
    • Combinatorial Lower Bound
    • Folklore Strategies
    • Blocking, PForDelta, Simple, Elias-Fano, partitioned Elias-Fano, Rank & Select on Bit-Vectors, Interpolative, Directly-Addressable, Hybrid Approaches
    • Performance
  4. Statistical Compressors

    • The Statistical and Minimum-Redundancy Coding Problem
    • Assigning Canonical Prefix-Free Codewords
    • Shannon-Fano
    • Huffman, Canonical Huffman
    • Arithmetic Coding
    • Asymmetric Numeral Systems
  5. Dictionary-based Compressors

    • The Dictionary-based Coding Problem
    • LZ77, LZSS, gzip, LZ78, LZW
    • Variants
    • Overview of Compressors

Books

  1. Robert Sedgewick and Kevin Wayne. 2011. Algorithms. Four-th edition. Addison-Wesley Professional, ISBN 0-321-57351-X.

  2. David Salomon. 2007. Variable-Length Codes for Data Compression. Springer Science & Business Media, ISBN 978-1-84628-959-0.

  3. Alistair Moffat and Andrew Turpin. 2002. Compression and Coding Algorithms. Springer Science & Business Media, ISBN 978-1-4615-0935-6.

  4. Gonzalo Navarro. 2016. Compact Data Structures. Cambridge University Press, ISBN 978-1-107-15238-0.

Survey Papers

  1. G. E. P. and Rossano Venturini. 2020. Techniques for Inverted Index Compression. ACM Computing Surveys. 53, 6, Article 125 (November 2021), 36 pages. https://doi.org/10.1145/3415148

  2. Alistair Moffat. 2019. Huffman Coding. ACM Computing Surveys. 52, 4, Article 85 (July 2020), 35 pages. https://doi.org/10.1145/3342555

Creator

Giulio Ermanno Pibiri (jermp)

A special thank to

data_compression_course's People

Contributors

jermp avatar lisifra96 avatar tansy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.