
inplace's Introduction

Project Status: Active. The project has reached a stable, usable state and is being actively developed. License: MIT.

GitHub | PyPI | Issues | Changelog

The in_place module provides an InPlace class for reading & writing a file "in-place": data that you write ends up at the same filepath that you read from, and in_place takes care of all the necessary mucking about with temporary files for you.

For example, given the file somefile.txt:

'Twas brillig, and the slithy toves
    Did gyre and gimble in the wabe;
All mimsy were the borogoves,
    And the mome raths outgrabe.

and the program disemvowel.py:

import in_place

with in_place.InPlace("somefile.txt") as fp:
    for line in fp:
        fp.write("".join(c for c in line if c not in "AEIOUaeiou"))

after running the program, somefile.txt will have been edited in place, reducing it to just:

'Tws brllg, nd th slthy tvs
    Dd gyr nd gmbl n th wb;
ll mmsy wr th brgvs,
    nd th mm rths tgrb.

and no sign of those pesky vowels remains! If you want a sign of those pesky vowels to remain, you can instead save the file's original contents in, say, somefile.txt~ by constructing the filehandle with:

in_place.InPlace("somefile.txt", backup_ext="~")

or save to someotherfile.txt with:

in_place.InPlace("somefile.txt", backup="someotherfile.txt")

Compared to the in-place filtering implemented by the Python standard library's fileinput module, in_place offers the following benefits:

  • Instead of hijacking sys.stdout, a new filehandle is returned for writing.
  • The filehandle supports all of the standard I/O methods, not just readline().
  • There are options for setting the encoding, encoding error handling, and newline policy for opening the file, along with support for opening files in binary mode, and these options apply to both input and output.
  • The complete filename of the backup file can be specified; you aren't constrained to just adding an extension.
  • When used as a context manager, in_place will restore the original file if an exception occurs.
  • The creation of temporary files won't silently clobber innocent bystander files.
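For comparison, here is a minimal sketch of the standard-library approach that the list above contrasts with. It uses fileinput.input() with inplace=True, which redirects sys.stdout into the file being rewritten, so output goes through print() rather than a dedicated filehandle. The helper name disemvowel_with_fileinput and the throwaway demo file are illustrative only.

```python
import fileinput
import os
import tempfile

VOWELS = "AEIOUaeiou"


def disemvowel_with_fileinput(path):
    """Strip vowels from `path` using the stdlib fileinput module."""
    # inplace=True redirects sys.stdout to the file while iterating,
    # so print() writes into the file rather than to the terminal.
    with fileinput.input(files=[path], inplace=True, backup="~") as f:
        for line in f:
            print("".join(c for c in line if c not in VOWELS), end="")


# Demo on a throwaway file (hypothetical location, not from the README).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "somefile.txt")
with open(demo_path, "w") as fh:
    fh.write("All mimsy were the borogoves,\n")

disemvowel_with_fileinput(demo_path)

with open(demo_path) as fh:
    result = fh.read()
```

Because a non-empty backup extension was given, fileinput leaves the original contents in somefile.txt~ next to the rewritten file.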

Installation

in_place requires Python 3.8 or higher. Just use pip for Python 3 (You have pip, right?) to install it:

python3 -m pip install in_place

Basic Usage

in_place provides a single class, InPlace. Its constructor takes the following arguments:

name=<PATH> (required)

The path to the file to open & edit in-place

mode=<"b"|"t"|None>

Whether to operate on the file in binary or text mode. If mode is "b", the file will be opened in binary mode, and data will be read & written as bytes objects. If mode is "t" or None (the default), the file will be opened in text mode, and data will be read & written as str objects.

backup=<PATH>

If set, the original contents of the file will be saved to the given path when the instance is closed. backup cannot be set to the empty string.

backup_ext=<EXTENSION>

If set, the path to the backup file will be created by appending backup_ext to the original file path.

backup and backup_ext are mutually exclusive. backup_ext cannot be set to the empty string.

**kwargs

Any additional keyword arguments (such as encoding, errors, and newline) will be forwarded to open() when opening both the input and output file streams.

name, backup, and backup_ext can be str, filesystem-encoded bytes, or path-like objects.
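As a sketch of how these arguments combine, the function below passes backup_ext together with an encoding keyword, which per the description above is forwarded to open() for both streams. The names disemvowel and disemvowel_file are hypothetical helpers, and in_place is imported inside the function only so that the snippet can be loaded where the package is not installed.

```python
VOWELS = "AEIOUaeiou"


def disemvowel(line: str) -> str:
    """Return `line` with all ASCII vowels removed."""
    return "".join(c for c in line if c not in VOWELS)


def disemvowel_file(path: str) -> None:
    """Rewrite `path` in place, keeping the original as `path` + "~"."""
    # Imported lazily so this snippet loads even where in_place is absent.
    import in_place

    # encoding is forwarded to open() for both the input and output streams.
    with in_place.InPlace(path, backup_ext="~", encoding="utf-8") as fp:
        for line in fp:
            fp.write(disemvowel(line))
```

Calling disemvowel_file("somefile.txt") should then behave like the disemvowel.py example above, with the original saved to somefile.txt~.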

InPlace instances act as read-write filehandles with the usual filehandle attributes, specifically:

__iter__()              __next__()              closed
flush()                 name                    read()
read1() *               readinto() *            readinto1() *
readline()              readlines()             write()
writelines()

* binary mode only

InPlace instances also feature the following new or modified attributes:

close()

Close filehandles and move files to their final destinations. If called after the filehandle has already been closed, close() does nothing.

Be sure to always close your instances when you're done with them by calling close() or rollback() either explicitly or implicitly (i.e., via use as a context manager).

rollback()

Like close(), but discard the output data (keeping the original file intact) instead of replacing the original file with it.

__enter__(), __exit__()

When an InPlace instance is used as a context manager, on exiting the context, the instance will be either closed (if all went well) or rolled back (if an exception occurred). InPlace context managers are not reusable but are reentrant (as long as no further operations are performed after the innermost context ends).
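The close-or-rollback behaviour described above can be illustrated with a small standard-library sketch. This is not in_place's implementation: atomic_rewrite is a hypothetical helper that writes to a temporary file in the same directory, moves it over the original on success, and discards it if the body raises.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_rewrite(path):
    """Yield (reader, writer); commit on success, roll back on error."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace() stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as writer, open(path) as reader:
            yield reader, writer
    except BaseException:
        # Roll back: drop the temporary output, leave the original alone.
        os.unlink(tmp_path)
        raise
    else:
        # Commit: move the new contents over the original path.
        os.replace(tmp_path, path)


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")
with open(demo_path, "w") as fh:
    fh.write("hello\n")

# Successful block: the new contents replace the original.
with atomic_rewrite(demo_path) as (src, dst):
    for line in src:
        dst.write(line.upper())
with open(demo_path) as fh:
    after_commit = fh.read()

# Failing block: the original is left untouched.
try:
    with atomic_rewrite(demo_path) as (src, dst):
        dst.write("partial output\n")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
with open(demo_path) as fh:
    after_rollback = fh.read()
```

The sketch ignores details a real library has to handle, such as preserving file permissions, backups, and binary mode.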

input

The actual filehandle that data is read from, in case you need to access it directly

output

The actual filehandle that data is written to, in case you need to access it directly

inplace's People

Contributors

dependabot[bot], jwodder


inplace's Issues

Mypy type compliance

Currently, when code that uses this package (for example, the InPlace class) is checked with mypy, a "missing library stubs or py.typed marker" error is reported. Would you consider adding stub files to make this package mypy compatible?

To reproduce, create a small file that uses in_place:

import in_place

with in_place.InPlace('example.txt') as example_file:
    for line in example_file:
        example_file.write(line)

Run mypy file_name.py.

Upcoming breaking changes in v1.0

I am currently working on v1.0 of in_place, which will bring with it a number of breaking changes. If your project's requirements limit in_place to less than v1.0 (e.g., in_place < 1.0 or in_place ~= 0.4), your code will continue working until you decide to upgrade. A search of GitHub indicates that nearly all users of in_place will be unaffected by these changes (i.e., your code will continue working as-is with v1.0), but it's better to be safe than sorry.

The planned breaking changes are:

  • Withdrawn: The InPlace class will be split (again) into a text-only InPlace class and a bytes-only InPlaceBytes class. The mode argument will no longer be supported; instead, construct an instance of the appropriate class for the data type you want to work on. This change is necessary in order to add type annotations.
    • After some experimentation, I've managed to add type annotations without having to split up the InPlace class again.
  • InPlaceText and InPlaceBytes will be removed.
  • The delay_open argument and the open() method will be removed. Filehandles will now always be created at the moment when an in-place instance is constructed, just like with calling the standard library's open().
  • The move_first argument will be removed. Only the move_first=False semantics will be retained.
  • The readall() method will be removed. It doesn't apply to the underlying I/O types, and I don't think it ever worked.
  • When the input path points to a symlink and backup_ext is given, the backup extension will now be appended to the resolved path rather than to the pre-resolved path.
  • Support for Python 3.7 will be dropped.

Add a `follow_symlinks: bool` option

InPlace should gain a follow_symlinks: bool option for controlling whether to resolve any symbolic links in the input file path before opening the file for reading. Setting this to True (the default) will result in the same behavior as now. Setting this to False disables symlink resolution.

  • Note that this option only controls resolution of symlinks in the name argument; symlinks in the backup argument are never resolved by in-place.
  • Note that backup_ext is applied to the input file path after any symlink resolution is performed.
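One reading of the proposal, sketched with the standard library only: backup_path_for is a hypothetical helper (not part of in_place) that shows where a backup_ext backup would land with and without symlink resolution. The demo creates a symlink, so it assumes a platform where os.symlink is permitted.

```python
import os
import tempfile


def backup_path_for(name, backup_ext, follow_symlinks=True):
    """Return the backup path implied by `name` and `backup_ext`."""
    # With follow_symlinks=True, resolve links first and then append the
    # extension, so the backup lands next to the real file.
    base = os.path.realpath(name) if follow_symlinks else os.fspath(name)
    return base + backup_ext


demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "real.txt")
link = os.path.join(demo_dir, "link.txt")
with open(target, "w") as fh:
    fh.write("data\n")
os.symlink(target, link)

resolved = backup_path_for(link, "~")
unresolved = backup_path_for(link, "~", follow_symlinks=False)
```

Here resolved ends in real.txt~, while unresolved ends in link.txt~.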

Note to self: Make use of the test cases from the Rust port for this.

OSError: [Errno 28] No space left on device

Hi, thank you for the nice module!

My reason for using in_place would be to edit very large files without needing to make copies (which would exceed the amount of space on my system). However, when I use in_place to edit a file line by line, after a while the amount of space on my computer runs out. Here's my code:

import in_place


def strip_each_line(dataset_path):
    with in_place.InPlace(dataset_path) as dataset_file:
        for i, line in enumerate(dataset_file):
            # Print a progress marker every 10,000 lines.
            if i % 10000 == 0:
                print(i)
            # Note: strip() also removes the trailing newline.
            stripped_line = line.strip()
            dataset_file.write(stripped_line)


strip_each_line('main_data/main_dataset.txt')

Would there be a way to edit the file in place without taking up extra memory? Thanks in advance!

Write back regularly

Hey there,
how is your package supposed to work with big files and interruptions?

I want to build a script that can process a big text file line by line.
As the process may take quite a long time, it should be possible to terminate the script partway, leaving a partly modified file that the script can later continue with. Any ideas? Best!

write with a buffer size

Hello there,

Is there any way to specify a buffer size for writing, instead of writing on every line?

Add a `create: bool` option

InPlace should gain a create: bool option that defaults to False. If this option is set to True and the specified input file doesn't exist, in-place should create the file with empty contents.
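A caller-side approximation of the proposed behaviour, as a minimal sketch: ensure_file_exists is a hypothetical helper (not part of in_place) that could be called before constructing an InPlace instance so that a missing input file starts out empty.

```python
import os
import tempfile
from pathlib import Path


def ensure_file_exists(path):
    """Create `path` with empty contents if it does not already exist."""
    # touch() leaves an existing file's contents untouched.
    Path(path).touch(exist_ok=True)


demo_dir = tempfile.mkdtemp()

# A missing file is created empty.
new_path = os.path.join(demo_dir, "new.txt")
ensure_file_exists(new_path)
new_size = os.path.getsize(new_path)

# An existing file keeps its contents.
existing_path = os.path.join(demo_dir, "existing.txt")
with open(existing_path, "w") as fh:
    fh.write("keep me\n")
ensure_file_exists(existing_path)
with open(existing_path) as fh:
    existing_contents = fh.read()
```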

Counterintuitive and unuseful results when attempting to edit a symlink

I tried doing an in-place edit on a symbolic link foo.txt in the current directory, requesting a backup copy with an extension of ~.

Expected result of an “in-place edit”: A backup copy of the file, foo.txt~, is made either in the current directory or next to the destination file (I think you could argue for either behavior). The destination file pointed to by the symlink foo.txt is updated.

Actual result: A new file foo.txt is created next to the symlink, with the updates. The backup file foo.txt~ is now a symlink to the destination. The destination file is unchanged.

Though the semantic meaning of dealing with symlinks can get complicated at times, I can't think of any situation in which this would be the desired behavior.
