This was also raised in <a class="issue-link js-issue-link" data-error-text="Failed to

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Harmonize pixel data representations,about pydicom/pydicom

Comments (19)

darcymason commented on May 25, 2024

From [email protected] on September 17, 2009 16:37:11

The secound lazy approach sounds better to me too, however...

The modification of pixel_array may also effect; Rows, Columns, Bits Stored, Smallest
Image Pixel Value, Largest Image Pixel Value. Perhaps pixel_array should be a class
corresponding to an Image Pixel Module (PS 3.3 - 2008 C.7.6.3).

If the pixel_aray were requested, then the workflow could be to attempt to extract a
representation of an Image Pixel Module from a DICOM file, which could raise an
exception if there are problems with the tags.

To get around editing data in two places, and having it crash. In the case that
attempts were made to read one of these tags and the module had been changed, then
the module could write itself back to the tags. In the case that any attempts were
made to write, then the module could update itself or return to a state that it will
lazily evaluate itself again if required.

This concept could potentially be extended to other DICOM modules. With the various
modules extending from a common base.

from pydicom.

darcymason commented on May 25, 2024

From [email protected] on September 23, 2009 06:09:46

Martin, you've given some good ideas here -- the idea of a Module object is very
interesting. Conceptually I've always thought of pydicom as a low-level library, just
reading files and giving back a bag of data elements, requiring the user code to
structure their meaning and make sure they are self-consistent. However, perhaps
there is room to add a Module layer on top. I'd like to see it be explicit (as per
the usual "Explicit is better than Implicit" python philosophy) -- in other words,
the user calls code that is clearly operating at the Module level. This could mean a
new "SmartDataset" object, say, or a dataset.modules object that calls are made to
when you want self-consistency to be enforced. To do it in a general way (not just
for Pixel data) would require some thought and a significant amount of new code (and
IOD definitions files). I've added a new issue ( issue 57 ) for this.

from pydicom.

darcymason commented on May 25, 2024

From [email protected] on December 23, 2011 19:30:33

Changing status as this is not a priority for next release.

Labels: -Milestone-NextRelease Milestone-Release1.5

from pydicom.

darcymason commented on May 25, 2024

This was also raised in #178. As noted there, we need a setter for pixel_array (unless actually use a new backward-incompatible PixelData directly). Question is whether the setter also keeps track of and sets other data elements through the Module concept.

from pydicom.

jrkerns commented on May 25, 2024

So, having read this discussion now, maybe we can work toward a solution. Darcy, I like your original ideas, particularly the second. I guess an executive decision needs to be made whether to link PixelData and pixel_array.

(P.S. I retract this solution but keep it for discussion's sake)
Perhaps we can make a solution that accommodates both...
We could leave the pixel_array property alone for backwards-compatibility, but push a getter and setter (set_pixel_array(), get_pixel_array(); an internal function already exists for the getter). With an explicit setter we could also pass a parameter, something like propagate_changes for lack of a better term. This boolean flag, if true, would write to PixelData and update the Rows, etc tags to be consistent. False would simply be current behavior. This all assumes that pixel_array and PixelData are separate entities.

As I reread this it seems like things might get complicated. What if a user sets the pixel array with the setter but without propagating changes. What would the user expect? Given that a user can change any other tag directly and it's saved, e.g. with .save_as(), upon writing, shouldn't the user expect the same from pixel_array? The pixel_array setter and getter would require Numpy, but PixelData could still be edited directly. Given that there is no current setter for pixel_array there's no backwards-compatible issues, yes?

"Could see which one was modified most recently and use that as the definitive final value" If they are linked (pixel_array assignments (the setter) write to PixelData and pixel_array (the getter) reads from PixelData) then the first issue seems to be negated. IDK how fast .tostring() is, so maybe it's a performance hit.

"or could throw an error or warning of possible inconsistent pixel data changes." This seems like a good idea no matter what and could be incorporated into the setter. It seems like a good idea still though to set other relevant tags with the setter as well.

My hat goes to linking them, but my hat's pretty small ;-)

from pydicom.

cancan101 commented on May 25, 2024

I think the settable property is a no-go if you ever want to support any form of data compression. In order to map from pixel_array to PixelData, you will need to know the transfer syntax (ie the compression scheme), cols, rows, # bit used per pixel, etc.

Assuming you don't want to require the user to have set those on the data set in advance, something like this makes sense:
dataset.set_image(pixel_array, rows, cols, transfer_syntax,...) where those values are then all set on the underlying dataset.

from pydicom.

darcymason commented on May 25, 2024

@jrkerns, yes, there are a lot of complications to figure out. However, I think I may have found a potential way through all of it. See the iod branch for a proof of principle start on this. The idea is that a "Module" defines a list of tags that it "owns". The modules register themselves with Dataset -- If any of the tags is called in the dataset, it is hooked into the module class to get or set them. This can preserve backwards compatibilty, because user code can simply turn off the module hooks to run old code. Still lots to work out here in terms of pixels. For example I'm fairly sure a numpy array of pixels by itself does not uniquely determine number of planes and colour planes vs gray scale etc. So we might be back to a set_image() type function as Alex has mentioned.

@cancan101, you make a good point about transfer syntax. However, I read that with the emphasis on the word "transfer", meaning the syntax can be whatever we want until communicating it to someone else. So it should be possible to leave the pixel data in whatever representation we want internally, until the point where it is written somewhere else. I would argue that after reading a file, regardless of transfer syntax, if someone tried to access the pixels they would want them decompressed or else they cannot be sensibly interpreted anyway. So the ImagePixel module would keep track of the original syntax, convert when pixels requested, and when writing, reconvert if pixels have been changed or if the transfer syntax was changed. It would require some internal flags, but i think it wouldn't be too bad.

from pydicom.

darcymason commented on May 25, 2024

There was a lot of discussion previously, now I'm trying to tidy this up for v1.0 release (or not).

Here is my 'gun to the head' gotta make a decision, quick solution:

make PixelData gettable or settable as a numpy array
pixel_array can stay as a 'getter', perhaps with DeprecationWarning, but is simply an alias for PixelData
as now, setting pixel data is done without enforcing connection to Rows, Columns, planes and all that. Like all other data elements, consistency across data elements is up to the user code.
notwithstanding the point above, we could warn on obvious inconsistencies

Explanation: Pydicom makes dicom file data element values available as the natural python type for that value. This proposal extends that to PixelData. The natural python type for that is a numpy array.

If that is the path, here are the implementation details:

on writing PixelData, if it is a numpy array, it is converted to bytes for the file (i.e. tostring() is done for the user transparently).
wherever PixelData has been used internally in pydicom (including unit tests), it will likely be need to be changed to pixel_bytes() or some name like that, that actually returns the raw bytes pydicom has previously expected.
pixel_array coded as alias for PixelData getter, as mentioned above
on setting PixelData, bytes are acceptable (for backwards compatibility), or a numpy array.

Compressed pixel data complicates this. For now we could just raise NotImplemented if the Dataset transfer syntax is a compressed one, i.e. the user is free to decompress and then write the file, but it is an error unless they change the transfer syntax to an uncompressed version.

Backwards-incompatibilty: with the above solution, PixelData no longer returns 1-D bytes, but a correctly dimensioned numpy array. If any old user code actually used the bytes directly, that code will probably break. I suspect this is very rare, as pixel_array would have been the natural way to
use pixel data.

I think this is a good solution all around. It minimizes backwards incompatibility, and is consistent with pydicom philosophy for other data types. I've also coded much of this before in test branches, so it can be implemented fairly quickly.

from pydicom.

cancan101 commented on May 25, 2024

I disagree with the suggested changes. I think that PixelData should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords. Further the idea of transparently calling tostring on the array when assigning is problematic. This works when creating uncompressed dicoms, but currently I am creating some compressed dicoms. this involves me creating the compressed binary representation and then assigning it to PixelData.

Further the "what" that is returned when accessing either of pixel_array or your new PixelData still seems somewhat ambiguous. There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).

from pydicom.

vsoch commented on May 25, 2024

I think having variables that are named clearly based on what they return makes a lot of sense. pixel_data, as a naive person like me would understand it, is the data from the pixels in some format I can work pixel_bytes is the same data in bytes. I can think of use cases for wanting both -
the one that is strongest for my groups are just wanting to extract pixel data for image processing, in which case the numpy array is much preferable. I'm not sure about using tostring (when moving from bytes to string I usually use decode('utf0-8') or something like that, but I suspect this is different).

from pydicom.

darcymason commented on May 25, 2024

I think that PixelData should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords

Where are raw bytes returned for other DICOM keywords? Binary numbers, for example, are changed to int, float, etc. Multi-valued items are split by the slash and returned as python data types. RawDataElement's carry the raw bytes, but only until they are accessed, then they are converted.

There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).

Yes, this is an issue. But in any case we have a numpy array to return, whether it is called PixelData or pixel_array. Similarly if the user wants to modify it and write back to file, then we need a way to convert them to some binary representation. In any case, with my proposal, users would still be able to get the raw bytes (through pixel_bytes), and are able to write a bytes array to PixelData to have that control if they needed it.

from pydicom.

cancan101 commented on May 25, 2024

I don't love the idea that depending on whether it is bytes or an array that is assigned to PixelData, the behavior is different.

from pydicom.

darcymason commented on May 25, 2024

There is some concern there, I suppose, but in line with python's 'We are all consenting adults' philosophy, I see it that if someone assigns bytes, then they have full control, we write what they set. If they assign a numpy array, we are going to convert it to bytes when it is written. If that isn't what they wanted then they must have assigned the wrong array anyway, or have not set the other image-related data elements for resolution, transfer syntax, etc. correctly.

One concern is that if the user sets PixelData with bytes, then accesses the value, it will trigger a conversion to a numpy array, which will depend on the other data elements. But we could also use @cancan101's convenience function from a comment further up this thread:

dataset.set_image(pixel_array, rows, cols, transfer_syntax,...)

to help guide that down the right path. Maybe that should be the way that is documented as the 'correct' way to set a pixel array.

from pydicom.

rhaxton commented on May 25, 2024

I would also prefer not to break existing code (meaning mine). I am used to the pixel_array/PixelData duality and I don't find it any worse than the horror that is character encodings in python 2.7. I like the idea that pydicom can read any tag right out of the box without needing numpy/scipy/pillow/etc.
Any change that breaks client code is a big step.

from pydicom.

darcymason commented on May 25, 2024

I appreciate everyone's comments. I'm certainly willing to keep things as they are, which is obviously much easier.

Interestingly, though, I just tried something:

ds = pydicom.read_file("CT_small.dcm")
pix = ds.pixel_array
ds.PixelData = pix
ds.save_as("del_me.dcm")

And compared the output file with the original. Binary identical. So it seems that we can already set the PixelData to a numpy array and it seems to work fine. Reading on python's write() method, it can accept any "bytes-like object" supporting the 'Buffer Protocol' and a C-contiguous buffer. numpy seems to do C order by default, but there is also a ascontiguousarray() method, which makes me think that my example got lucky, and for larger arrays that are not contiguous in memory, it might not work. Could try some kind of slicing example with a step, I guess, to try to show that. Even if it were needed, ascontiguousarray() could be called on writing the file.

Meanwhile, numpy's tostring() doc says it is a compatibility alias for tobytes(). It would make a lot of sense for us to adopt the latter for all future documentation.

Anyway, I'm wondering about putting together a dev branch to at least get rid of the 'tostring' requirement, let people run it and see if it breaks backwards compatibility. I think it can be done so that it doesn't. And I think it can be done so that numpy is only required if the bytes are accessed. As I said, I'm okay to drop this, but v1.0 is our best time to make a change like this so I want to make sure before I drop it.

from pydicom.

mrbean-bremen commented on May 25, 2024

@darcymason - is this still something that needs to be done, or is the current state sufficient? I got a bit lost in the discussion (vs the changes made lately in the implementation), not sure if this is still a valid issue.

from pydicom.

darcymason commented on May 25, 2024

is this still something that needs to be done, or is the current state sufficient

Hmmm, I'd like to think about it a bit longer. Re-reading the discussion, my take-home is that at least part of this can be done without breaking backwards compatibility -- for example, assigning a numpy array to PixelData can just be handled the way the user currently does -- by converting it via tobytes(). That could save one step that is a little confusing for newcomers.

from pydicom.

cancan101 commented on May 25, 2024

I would vote to avoid that sort of magic and keep pixeldata as the bytes for assignment and reading and use pixel array for handling numpy arrays.

from pydicom.

darcymason commented on May 25, 2024

I still dream of resolving this but have pushed back to v3.0. Reading through quickly again, there was lots of concern about breaking code, but I believe my branch code was showing that it can be done without breaking. But assigning v3.0 just in case...

from pydicom.

Harmonize pixel data representations about pydicom HOT 19 OPEN

Comments (19)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent