Comments (19)
From [email protected] on September 17, 2009 16:37:11
The secound lazy approach sounds better to me too, however...
The modification of pixel_array may also effect; Rows, Columns, Bits Stored, Smallest
Image Pixel Value, Largest Image Pixel Value. Perhaps pixel_array should be a class
corresponding to an Image Pixel Module (PS 3.3 - 2008 C.7.6.3).
If the pixel_aray were requested, then the workflow could be to attempt to extract a
representation of an Image Pixel Module from a DICOM file, which could raise an
exception if there are problems with the tags.
To get around editing data in two places, and having it crash. In the case that
attempts were made to read one of these tags and the module had been changed, then
the module could write itself back to the tags. In the case that any attempts were
made to write, then the module could update itself or return to a state that it will
lazily evaluate itself again if required.
This concept could potentially be extended to other DICOM modules. With the various
modules extending from a common base.
from pydicom.
From [email protected] on September 23, 2009 06:09:46
Martin, you've given some good ideas here -- the idea of a Module object is very
interesting. Conceptually I've always thought of pydicom as a low-level library, just
reading files and giving back a bag of data elements, requiring the user code to
structure their meaning and make sure they are self-consistent. However, perhaps
there is room to add a Module layer on top. I'd like to see it be explicit (as per
the usual "Explicit is better than Implicit" python philosophy) -- in other words,
the user calls code that is clearly operating at the Module level. This could mean a
new "SmartDataset" object, say, or a dataset.modules object that calls are made to
when you want self-consistency to be enforced. To do it in a general way (not just
for Pixel data) would require some thought and a significant amount of new code (and
IOD definitions files). I've added a new issue ( issue 57 ) for this.
from pydicom.
From [email protected] on December 23, 2011 19:30:33
Changing status as this is not a priority for next release.
Labels: -Milestone-NextRelease Milestone-Release1.5
from pydicom.
This was also raised in #178. As noted there, we need a setter for pixel_array (unless actually use a new backward-incompatible PixelData directly). Question is whether the setter also keeps track of and sets other data elements through the Module concept.
from pydicom.
So, having read this discussion now, maybe we can work toward a solution. Darcy, I like your original ideas, particularly the second. I guess an executive decision needs to be made whether to link PixelData
and pixel_array
.
(P.S. I retract this solution but keep it for discussion's sake)
Perhaps we can make a solution that accommodates both...
We could leave the pixel_array
property alone for backwards-compatibility, but push a getter and setter (set_pixel_array()
, get_pixel_array()
; an internal function already exists for the getter). With an explicit setter we could also pass a parameter, something like propagate_changes
for lack of a better term. This boolean flag, if true, would write to PixelData
and update the Rows
, etc tags to be consistent. False would simply be current behavior. This all assumes that pixel_array
and PixelData
are separate entities.
As I reread this it seems like things might get complicated. What if a user sets the pixel array with the setter but without propagating changes. What would the user expect? Given that a user can change any other tag directly and it's saved, e.g. with .save_as()
, upon writing, shouldn't the user expect the same from pixel_array
? The pixel_array
setter and getter would require Numpy, but PixelData
could still be edited directly. Given that there is no current setter for pixel_array
there's no backwards-compatible issues, yes?
"Could see which one was modified most recently and use that as the definitive final value" If they are linked (pixel_array
assignments (the setter) write to PixelData
and pixel_array
(the getter) reads from PixelData
) then the first issue seems to be negated. IDK how fast .tostring()
is, so maybe it's a performance hit.
"or could throw an error or warning of possible inconsistent pixel data changes." This seems like a good idea no matter what and could be incorporated into the setter. It seems like a good idea still though to set other relevant tags with the setter as well.
My hat goes to linking them, but my hat's pretty small ;-)
from pydicom.
I think the settable property is a no-go if you ever want to support any form of data compression. In order to map from pixel_array
to PixelData
, you will need to know the transfer syntax (ie the compression scheme), cols, rows, # bit used per pixel, etc.
Assuming you don't want to require the user to have set those on the data set in advance, something like this makes sense:
dataset.set_image(pixel_array, rows, cols, transfer_syntax,...)
where those values are then all set on the underlying dataset.
from pydicom.
@jrkerns, yes, there are a lot of complications to figure out. However, I think I may have found a potential way through all of it. See the iod branch for a proof of principle start on this. The idea is that a "Module" defines a list of tags that it "owns". The modules register themselves with Dataset -- If any of the tags is called in the dataset, it is hooked into the module class to get or set them. This can preserve backwards compatibilty, because user code can simply turn off the module hooks to run old code. Still lots to work out here in terms of pixels. For example I'm fairly sure a numpy array of pixels by itself does not uniquely determine number of planes and colour planes vs gray scale etc. So we might be back to a set_image() type function as Alex has mentioned.
@cancan101, you make a good point about transfer syntax. However, I read that with the emphasis on the word "transfer", meaning the syntax can be whatever we want until communicating it to someone else. So it should be possible to leave the pixel data in whatever representation we want internally, until the point where it is written somewhere else. I would argue that after reading a file, regardless of transfer syntax, if someone tried to access the pixels they would want them decompressed or else they cannot be sensibly interpreted anyway. So the ImagePixel module would keep track of the original syntax, convert when pixels requested, and when writing, reconvert if pixels have been changed or if the transfer syntax was changed. It would require some internal flags, but i think it wouldn't be too bad.
from pydicom.
There was a lot of discussion previously, now I'm trying to tidy this up for v1.0 release (or not).
Here is my 'gun to the head' gotta make a decision, quick solution:
- make
PixelData
gettable or settable as a numpy array pixel_array
can stay as a 'getter', perhaps with DeprecationWarning, but is simply an alias forPixelData
- as now, setting pixel data is done without enforcing connection to Rows, Columns, planes and all that. Like all other data elements, consistency across data elements is up to the user code.
- notwithstanding the point above, we could warn on obvious inconsistencies
Explanation: Pydicom makes dicom file data element values available as the natural python type for that value. This proposal extends that to PixelData
. The natural python type for that is a numpy array.
If that is the path, here are the implementation details:
- on writing
PixelData
, if it is a numpy array, it is converted to bytes for the file (i.e.tostring()
is done for the user transparently). - wherever
PixelData
has been used internally in pydicom (including unit tests), it will likely be need to be changed topixel_bytes()
or some name like that, that actually returns the raw bytes pydicom has previously expected. pixel_array
coded as alias forPixelData
getter, as mentioned above- on setting
PixelData
, bytes are acceptable (for backwards compatibility), or a numpy array.
Compressed pixel data complicates this. For now we could just raise NotImplemented if the Dataset
transfer syntax is a compressed one, i.e. the user is free to decompress and then write the file, but it is an error unless they change the transfer syntax to an uncompressed version.
Backwards-incompatibilty: with the above solution, PixelData
no longer returns 1-D bytes, but a correctly dimensioned numpy array. If any old user code actually used the bytes directly, that code will probably break. I suspect this is very rare, as pixel_array
would have been the natural way to
use pixel data.
I think this is a good solution all around. It minimizes backwards incompatibility, and is consistent with pydicom philosophy for other data types. I've also coded much of this before in test branches, so it can be implemented fairly quickly.
from pydicom.
I disagree with the suggested changes. I think that PixelData
should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords. Further the idea of transparently calling tostring
on the array when assigning is problematic. This works when creating uncompressed dicoms, but currently I am creating some compressed dicoms. this involves me creating the compressed binary representation and then assigning it to PixelData
.
Further the "what" that is returned when accessing either of pixel_array
or your new PixelData
still seems somewhat ambiguous. There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).
from pydicom.
I think having variables that are named clearly based on what they return makes a lot of sense. pixel_data, as a naive person like me would understand it, is the data from the pixels in some format I can work pixel_bytes is the same data in bytes. I can think of use cases for wanting both -
the one that is strongest for my groups are just wanting to extract pixel data for image processing, in which case the numpy array is much preferable. I'm not sure about using tostring (when moving from bytes to string I usually use decode('utf0-8') or something like that, but I suspect this is different).
from pydicom.
I think that PixelData should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords
Where are raw bytes returned for other DICOM keywords? Binary numbers, for example, are changed to int, float, etc. Multi-valued items are split by the slash and returned as python data types. RawDataElement's carry the raw bytes, but only until they are accessed, then they are converted.
There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).
Yes, this is an issue. But in any case we have a numpy array to return, whether it is called PixelData
or pixel_array
. Similarly if the user wants to modify it and write back to file, then we need a way to convert them to some binary representation. In any case, with my proposal, users would still be able to get the raw bytes (through pixel_bytes
), and are able to write a bytes array to PixelData to have that control if they needed it.
from pydicom.
I don't love the idea that depending on whether it is bytes or an array that is assigned to PixelData
, the behavior is different.
from pydicom.
There is some concern there, I suppose, but in line with python's 'We are all consenting adults' philosophy, I see it that if someone assigns bytes, then they have full control, we write what they set. If they assign a numpy array, we are going to convert it to bytes when it is written. If that isn't what they wanted then they must have assigned the wrong array anyway, or have not set the other image-related data elements for resolution, transfer syntax, etc. correctly.
One concern is that if the user sets PixelData
with bytes, then accesses the value, it will trigger a conversion to a numpy array, which will depend on the other data elements. But we could also use @cancan101's convenience function from a comment further up this thread:
dataset.set_image(pixel_array, rows, cols, transfer_syntax,...)
to help guide that down the right path. Maybe that should be the way that is documented as the 'correct' way to set a pixel array.
from pydicom.
I would also prefer not to break existing code (meaning mine). I am used to the pixel_array/PixelData duality and I don't find it any worse than the horror that is character encodings in python 2.7. I like the idea that pydicom can read any tag right out of the box without needing numpy/scipy/pillow/etc.
Any change that breaks client code is a big step.
from pydicom.
I appreciate everyone's comments. I'm certainly willing to keep things as they are, which is obviously much easier.
Interestingly, though, I just tried something:
ds = pydicom.read_file("CT_small.dcm")
pix = ds.pixel_array
ds.PixelData = pix
ds.save_as("del_me.dcm")
And compared the output file with the original. Binary identical. So it seems that we can already set the PixelData to a numpy array and it seems to work fine. Reading on python's write() method, it can accept any "bytes-like object" supporting the 'Buffer Protocol' and a C-contiguous buffer. numpy seems to do C order by default, but there is also a ascontiguousarray()
method, which makes me think that my example got lucky, and for larger arrays that are not contiguous in memory, it might not work. Could try some kind of slicing example with a step, I guess, to try to show that. Even if it were needed, ascontiguousarray()
could be called on writing the file.
Meanwhile, numpy's tostring()
doc says it is a compatibility alias for tobytes()
. It would make a lot of sense for us to adopt the latter for all future documentation.
Anyway, I'm wondering about putting together a dev branch to at least get rid of the 'tostring' requirement, let people run it and see if it breaks backwards compatibility. I think it can be done so that it doesn't. And I think it can be done so that numpy is only required if the bytes are accessed. As I said, I'm okay to drop this, but v1.0 is our best time to make a change like this so I want to make sure before I drop it.
from pydicom.
@darcymason - is this still something that needs to be done, or is the current state sufficient? I got a bit lost in the discussion (vs the changes made lately in the implementation), not sure if this is still a valid issue.
from pydicom.
is this still something that needs to be done, or is the current state sufficient
Hmmm, I'd like to think about it a bit longer. Re-reading the discussion, my take-home is that at least part of this can be done without breaking backwards compatibility -- for example, assigning a numpy array to PixelData can just be handled the way the user currently does -- by converting it via tobytes()
. That could save one step that is a little confusing for newcomers.
from pydicom.
I would vote to avoid that sort of magic and keep pixeldata as the bytes for assignment and reading and use pixel array for handling numpy arrays.
from pydicom.
I still dream of resolving this but have pushed back to v3.0. Reading through quickly again, there was lots of concern about breaking code, but I believe my branch code was showing that it can be done without breaking. But assigning v3.0 just in case...
from pydicom.
Related Issues (20)
- Cannot control/except processing tag error in latest version HOT 6
- Write PixelData to DICOM Dataset on disk without loading the PixelData into memory HOT 6
- Update concept dicts HOT 1
- Private tag sequence reads as 'UN' array HOT 4
- Non-relative patterns are unsupported HOT 14
- Switch FileSet's temporary file names over to using UUIDs HOT 5
- Setting the mode on JpegImageFile fails with Pillow 10.1.0 HOT 2
- dataset.filename is a BytesIO for Deflated Explicit VR Little Endian objects, breaking codify and other uses HOT 5
- (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required HOT 18
- Dataset encoded incorrectly after change from explicit to implicit VR HOT 5
- Add FileMetaDataset.parent HOT 3
- defer_size is not having any performance impact with many tags HOT 3
- FileSet and DICOMDIR usage documentation HOT 5
- FileSet.__str__ add `SeriesDescription` if present
- Documentation search is broken. HOT 2
- Unexpected result from encapsulate_extended if non-even length frame
- convert_color_space should return copy of array HOT 4
- RGB dicoms "AttributeError: can't set attribute" after upgrading pillow to 10.1.0 HOT 1
- Change Python formatting: black → ruff
- Intermittent test failures HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pydicom.