agg's Introduction

Agg

A Python library to aggregate files and data. This release supports merging two or more CSV files.

Documentation

merge_csv(files_to_merge: tuple,
          output_file: Union[str, pathlib.Path],
          first_line_is_header: Optional[bool] = None) -> dict:

The function merge_csv merges multiple CSV files in the order they are specified. It overwrites any existing file that has the same name as output_file.

Parameters:

  • files_to_merge: A tuple containing the paths to the files, in the order they are to be merged.
  • output_file: The path to the result file. The folder must already exist. An existing file with the same name will be overwritten.
  • first_line_is_header: If True, agg removes the first line of every CSV file except the first. If not set, agg guesses whether the first line is a header.
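How agg guesses the header is not described here. As a rough illustration of what such a guess can look like, the sketch below uses csv.Sniffer from the Python standard library. The helper name guess_has_header and the sample data are hypothetical, and this is an assumption about the general technique, not agg's actual implementation.

```python
import csv


def guess_has_header(sample_text):
    """Return True if the CSV sample appears to start with a header row.

    Hypothetical helper: relies on the heuristic in csv.Sniffer, which
    compares the first row against the types of the rows that follow.
    """
    return csv.Sniffer().has_header(sample_text)


# First row is text, the second column below it is numeric.
with_header = "name,age\nalice,30\nbob,25\ncarol,41\n"
# Every row, including the first, is numeric.
without_header = "1,2\n3,4\n5,6\n"

print(guess_has_header(with_header))
print(guess_has_header(without_header))
```

The heuristic can be wrong for files whose columns are all text, which is one reason to pass first_line_is_header explicitly when the layout of the files is known.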

merge_csv returns a dictionary containing:

  • a SHA256 hash of the result file,
  • the name of the result file,
  • its absolute path,
  • a boolean indicating whether the first line is a header or not,
  • its size in bytes,
  • its number of lines (including the header),
  • a list of the files merged (absolute path).

Example

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import agg

# tuples are ordered:
my_files = ('file_01.csv', 'file_02.csv')

# Merge the CSV files, in the order given by the tuple, into a new file
# called "merged_file". The header (first line) is copied only once, from
# the first file.
merged_file = agg.merge_csv(my_files, 'merged_file', True)
# The return value is a dictionary!


print(merged_file)

# {'sha256hash': 'fff30942d3d042c5128062d1a29b2c50494c3d1d033749a58268d2e687fc98c6',
#  'file_name': 'merged_file',
#  'file_path': '/home/exampleuser/merged_file',
#  'first_line_is_header': True,
#  'file_size_bytes': 76,
#  'line_count': 8,
#  'merged_files': ['/home/exampleuser/file_01.csv',
#                  '/home/exampleuser/file_02.csv']
# }

print(merged_file['file_path'])
# '/home/exampleuser/merged_file'
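
The hash and size in the returned dictionary can be used to check the result file later, for example after copying it to another machine. The following is a minimal sketch using only the standard library. The helper names sha256_of_file and result_matches_file are hypothetical, and the dictionary is built by hand so the sketch runs without agg; only the keys sha256hash, file_path and file_size_bytes are taken from the output shown above.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def result_matches_file(result):
    """Check a merge_csv-style result dictionary against the file on disk."""
    path = result['file_path']
    return (sha256_of_file(path) == result['sha256hash']
            and os.path.getsize(path) == result['file_size_bytes'])


# Stand-in for a real agg.merge_csv() result (hand-built for this sketch).
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'merged_file')
    with open(demo_path, 'wb') as handle:
        handle.write(b'id,name\n1,a\n2,b\n')
    demo_result = {
        'sha256hash': sha256_of_file(demo_path),
        'file_path': demo_path,
        'file_size_bytes': os.path.getsize(demo_path),
    }
    demo_ok = result_matches_file(demo_result)

print(demo_ok)
```

With a real result, result_matches_file(merged_file) would perform the same check on the dictionary returned by agg.merge_csv.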

agg's People

Contributors

ruedigervoigt


agg's Issues

Different line endings

Agg produces identical result files under Linux and macOS. Under MS Windows, the line count, the file size and, as a result, the SHA256 hash do not match.

If agg is run under Windows with the current test files, the line endings in the result file are CR CR LF. This does not make sense: the expected line ending is CR LF or just LF.
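
The issue does not identify the cause. One common way to end up with CR CR LF in Python is writing rows with csv.writer, which terminates lines with CR LF by default, to a text file that also translates LF to CR LF, as Windows does when the file is opened without newline=''. Whether this is what happens inside agg is an assumption. The sketch below reproduces the effect on any platform by simulating the Windows translation with io.TextIOWrapper; the helper name write_rows is hypothetical.

```python
import csv
import io


def write_rows(rows, newline):
    """Write rows with csv.writer and return the raw bytes produced.

    newline='\r\n' simulates the default text-mode translation on Windows;
    newline='' disables translation, as the csv documentation recommends.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding='utf-8', newline=newline)
    csv.writer(text).writerows(rows)
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave the BytesIO untouched when the wrapper goes away
    return data


rows = [['id', 'name'], ['1', 'a']]

translated = write_rows(rows, newline='\r\n')
untranslated = write_rows(rows, newline='')

print(translated)    # each line ends in CR CR LF
print(untranslated)  # each line ends in CR LF
```

If this is the cause, opening the output file with newline='' should give the same bytes on every platform, so the line count, size and SHA256 hash would match again.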
