


dliswriter

Welcome to dliswriter - an open-source Python package for writing DLIS files.

The package allows you to specify the structure, data, and metadata of your DLIS file in a simple and flexible fashion. A minimal example is shown below.

import numpy as np  # for creating mock datasets
from dliswriter import DLISFile, enums

df = DLISFile()

df.add_origin("MY-ORIGIN")  # required; can contain metadata about the well, scan procedure, etc.

# define channels with numerical data and additional information
n_rows = 100  # all datasets must have the same number of rows
ch1 = df.add_channel('DEPTH', data=np.linspace(0, 10, n_rows), units=enums.Unit.METER)
ch2 = df.add_channel("RPM", data=np.arange(n_rows) % 10)
ch3 = df.add_channel("AMPLITUDE", data=np.random.rand(n_rows, 5))

# define frame, referencing the above defined channels
main_frame = df.add_frame("MAIN-FRAME", channels=(ch1, ch2, ch3), index_type=enums.FrameIndexType.BOREHOLE_DEPTH)

# write the data and metadata to a physical DLIS file
df.write('./new_dlis_file.DLIS')

For more details about the DLIS file format and using dliswriter, please see the documentation.

Performance

According to our rough measurements, the file writing time scales approximately linearly with the amount of data, in particular with the number of rows. There is also some dependency on the dimensionality of the data: e.g. a single image (2D dataset) with 1000 columns writes about 20% faster than 10 images of 100 columns each. A rough estimate of the writing speed is about 20M float64 values per second (measured on an x64-based PC with an Intel Core i9-8950HK, running MS Windows 11 and Python 3.9.19).
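The quoted throughput can be turned into a back-of-envelope time estimate. The constant below is the rough 20M values/s figure measured above, not a guarantee:

```python
# Rough write-time estimate based on the ~20M float64 values/s figure above.
# The throughput constant is a ballpark from one machine, not a guarantee.
VALUES_PER_SECOND = 20_000_000

def estimated_write_seconds(n_rows: int, n_cols: int) -> float:
    """Estimate DLIS write time for a dataset of the given shape."""
    return (n_rows * n_cols) / VALUES_PER_SECOND

# e.g. 1M rows x 10 columns of float64 values
print(estimated_write_seconds(1_000_000, 10))  # -> 0.5 (seconds)
```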

The performance may be further tuned by adjusting the input_chunk_size and output_chunk_size of the writer (see this example). The former limits how much of the input data are loaded into memory at a time; the latter denotes the number of output bytes kept in memory before each partial file write. The optimal values depend on the hardware/software configuration and the characteristics of the data (the number and dimensionality of the datasets), but the defaults should in general be a good starting point.
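The row-based input chunk size can be reasoned about with simple arithmetic. The sketch below is plain Python, not part of the dliswriter API: it derives a row count from a memory budget, assuming float64 data:

```python
# Hypothetical helper (not part of dliswriter) that picks an input chunk
# size, in rows, so that one chunk of float64 data stays under a byte budget.
def rows_per_chunk(total_row_width: int, budget_bytes: int = 64 * 1024 * 1024,
                   bytes_per_value: int = 8) -> int:
    """total_row_width: total number of values in one row across all channels."""
    row_bytes = total_row_width * bytes_per_value
    return max(1, budget_bytes // row_bytes)

# e.g. rows of 1 depth + 1 RPM + 5 amplitude values = 7 float64s each
print(rows_per_chunk(7))
```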

Compatibility notes

Please note that some DLIS viewer applications are not fully compliant with the DLIS standard. If a DLIS file produced by dliswriter causes issues in some of the viewers, it might not necessarily be a dliswriter bug. Some of the known compatibility issues - and ways of dealing with them - are described in a dedicated section of the documentation. If you run into problems not covered by the documentation, please open a new issue.

Installation

dliswriter can be installed from PyPI:

pip install dliswriter

or Anaconda (in a conda environment):

conda install dliswriter -c conda-forge

For developers

Setting up dliswriter for development purposes requires git and conda (the development environment is defined in environment.yaml).

Once these requirements are fulfilled, follow the steps below:

  1. Clone the repository and enter it. From a console:

    git clone https://github.com/well-id/dliswriter.git
    cd dliswriter
    
  2. Create the dlis-writer environment from the environment.yaml file and activate it:

    conda env create -f environment.yaml
    conda activate dlis-writer
    
  3. Install dliswriter in editable mode using pip:

    pip install --no-build-isolation --no-deps -e .
    

    For an explanation of the required flags, see this issue.

  4. You're good to go! For verification, you can run the tests for the package (still from inside the dliswriter directory):

    pytest .
    

Contributing

To contribute to dliswriter, please follow this procedure:

  1. Fork the repository
  2. Clone the fork to your computer
  3. Check out the devel branch: git checkout devel
  4. Create a new branch from devel: git checkout -b <your branch name>
  5. Make your changes, commit them, and push them
  6. From the GitHub page of your fork, create a pull request to the original repository.

You can find some more detailed instructions about the fork-and-pull request workflow in the GitHub Docs.

You might also want to have a look at our issues log.


Authors

dliswriter has been developed at Well ID by:

  • Dominika Dlugosz
  • Magne Lauritzen
  • Kamil Grunwald
  • Omer Faruk Sari

Based on the definition of the RP66 v1 standard.


dliswriter's Issues

Test EFLR set names

Each EFLRSet has a set name.

Test that different EFLRSet instances of the same type (e.g. ChannelSet) can be correctly added to the file if they have differing set names. Check that the child EFLRItem (e.g. ChannelItem) instances are added, referenced, and retrieved correctly.

`copy_number` of EFLR items

Each EFLRItem has an attribute copy_number. It is meant to distinguish between two EFLR items of the same type and name.

The copy_number is computed automatically at the level of an EFLRItem by counting how many EFLRItem instances of the given name are already defined in the parent EFLRSet, so that the first EFLRItem of a given type and name gets copy_number 0.

However, this is only checked at the level of the parent EFLRSet of the given EFLRItem. In most cases this is sufficient, because normally a single EFLRSet of a given type is defined in a file, but it is possible to define more (see this issue). If multiple EFLRSets are added to the same file, it is likely that multiple EFLRItem instances will share the same type, name, and copy_number, making them indistinguishable in DLIS references. As a result, references might point to the wrong items (e.g. a FrameItem displaying the wrong channel data).
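The assignment logic, and the collision it can lead to, can be sketched with a simplified stand-in (not the actual EFLRSet implementation):

```python
# Simplified stand-in for the copy_number assignment described above:
# within one parent set, an item's copy_number equals the number of
# same-named items already registered there.
class SimpleSet:
    def __init__(self):
        self._items = []  # list of (name, copy_number) tuples

    def add_item(self, name: str) -> int:
        copy_number = sum(1 for n, _ in self._items if n == name)
        self._items.append((name, copy_number))
        return copy_number

s = SimpleSet()
print(s.add_item("DEPTH"))  # first "DEPTH" -> 0
print(s.add_item("DEPTH"))  # second "DEPTH" -> 1

# Two independent sets illustrate the issue: each restarts counting,
# so items in different sets can share both name and copy_number.
t = SimpleSet()
print(t.add_item("DEPTH"))  # also 0: indistinguishable from s's first item
```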

`Path` attributes as `Channel` objects

Several attributes of PathItem (borehole_depth, vertical_depth, etc.; check RP66 docs) can be either (arrays of) numbers or ChannelItem instances. Now only the numerical version is supported.

Add support for ChannelItem instances being used as values of these attributes.

Marking as 'enhancement' because it's not critical and as 'bug' because this should be accepted according to the standard.

Performance

Use line profiling to speed up code execution.
Consider different sizes and shapes (number of rows vs number of columns) of the datasets.
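As a first pass before line-level profiling, function-level timings can be collected with the standard library's cProfile. The workload below is a generic stand-in, not dliswriter code:

```python
import cProfile
import io
import pstats

def build_rows(n_rows: int, n_cols: int) -> list:
    """Stand-in workload: build a table of values row by row."""
    return [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]

profiler = cProfile.Profile()
profiler.enable()
build_rows(1000, 50)
profiler.disable()

# Print the most expensive calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```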

Express input chunk size in bytes

When writing the file, data are loaded in chunks.

The size of those chunks is currently specified by the user as the number of rows of the data table loaded at a time.

Consider expressing this in bytes instead.

Test representation code determination

Test that representation_code of different Attributes is correctly determined from the value.
In particular, check the Attributes for which multivalued=True.

Optimise input and output chunk sizes

When writing a file, data are loaded in chunks and the output bytes are saved in chunks.

The sizes of the input and output chunks can be specified by the user.
The size of an input chunk is currently expressed as a number of rows of a dataset.
The size of an output chunk is expressed in bytes.

Both should be optimized for different shapes of datasets - combinations of different lengths (numbers of rows) and widths (number of columns, number of images, number of columns in images).

`Path` object

Whenever a Path object is added to the file, DeepView refuses to open it.
Sometimes issues appear also in dlisio.

Note that in RP66, Path is similar to frame and should probably be treated as such.

`Path`: `time`

Attribute time of PathItem can be numerical or of datetime type. For now, only the numerical option is accepted.
Add support for datetime values.
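A possible shape for the extended validation (a hypothetical helper, not the existing dliswriter API) could accept both numbers and datetime objects:

```python
from datetime import datetime
from typing import Union

def normalize_time_value(value: Union[int, float, datetime]) -> Union[float, datetime]:
    """Accept a numeric or datetime time value; reject anything else.

    Hypothetical sketch of the datetime support proposed above.
    """
    if isinstance(value, datetime):
        return value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    raise TypeError(f"time must be a number or datetime, got {type(value).__name__}")

print(normalize_time_value(12.5))                  # numeric values pass through as floats
print(normalize_time_value(datetime(2024, 1, 1)))  # datetime values pass through unchanged
```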
