dicognito's Introduction

Dicognito is a Python module and command-line utility that anonymizes DICOM files.

Use it to anonymize one or more DICOM files belonging to one or any number of patients. Objects will remain grouped in their original patients, studies, and series.

Anonymization causes significant elements, such as identifiers, names, and addresses, to be replaced by new values. Dates and times will be shifted into the past, but their order will remain consistent within and across the files.
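The ordering guarantee is easiest to see when every value shares one offset. The sketch below assumes one random whole-day offset per run, applied only to DA values. That is an illustrative simplification, not dicognito's actual date logic, which also handles DT and TM values. The helper name is hypothetical.

```python
import random
from datetime import datetime, timedelta


def make_da_shifter(rng: random.Random, max_days: int = 3650):
    # Hypothetical policy: draw ONE offset per run and reuse it for every value.
    # Because every date moves back by the same amount, their order is unchanged.
    offset = timedelta(days=rng.randint(1, max_days))

    def shift(da: str) -> str:
        # DICOM DA values are formatted YYYYMMDD.
        shifted = datetime.strptime(da, "%Y%m%d") - offset
        return shifted.strftime("%Y%m%d")

    return shift


shift = make_da_shifter(random.Random(0))
dates = ["20200101", "20200215", "20211231"]
shifted = [shift(d) for d in dates]
print(all(s < d for s, d in zip(shifted, dates)))  # True: every date moved into the past
print(shifted == sorted(shifted))                  # True: original order preserved
```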

The package is available on PyPI and can be installed from the command line by typing

pip install dicognito

Anonymizing from the command line

Once installed, a dicognito command will be added to your Python scripts directory. You can run it on entire filesystem trees or a collection of files specified by glob like so:

# Recurse down the filesystem, anonymizing all found DICOM files.
# Anonymized files will be placed in out-dir, named by new SOP
# instance UID.
dicognito --output-directory out-dir .

# Anonymize all files in the current directory with the dcm extension
# (-o is an alias for --output-directory).
dicognito -o out-dir *.dcm

# Anonymize all files in the current directory with the dcm extension
# but overwrite the original files.
# Note: repeatedly anonymizing the same files will cause date elements
# to move farther into the past.
dicognito --in-place *.dcm

Get more help via dicognito --help.

Anonymizing from within Python

To anonymize a bunch of DICOM objects from within a Python program, read the objects using pydicom and use the Anonymizer class:

import pydicom
import dicognito.anonymizer

anonymizer = dicognito.anonymizer.Anonymizer()

for original_filename in ("original1.dcm", "original2.dcm"):
    with pydicom.dcmread(original_filename) as dataset:
        anonymizer.anonymize(dataset)
        dataset.save_as("clean-" + original_filename)

Use a single Anonymizer on datasets that might be part of the same series, or the identifiers will not be consistent across objects.

Additional (even custom) element handlers can be added to the Anonymizer via add_element_handler to augment or override built-in behavior.
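The tracebacks in the issues below show dicognito calling each handler as handler(dataset, data_element) and acting on a truthy return value. The sketch below assumes that shape. The handler is hypothetical. It blanks DeviceSerialNumber (0018,1000), which does not appear in dicognito's default list. A SimpleNamespace stands in for a pydicom DataElement, which also exposes .keyword and .value.

```python
from types import SimpleNamespace


def blank_device_serial_number(dataset, data_element) -> bool:
    """Hypothetical custom handler: empty out DeviceSerialNumber.

    Returns True when it handled the element, False to let other
    handlers look at it (the convention assumed from the tracebacks).
    """
    if data_element.keyword == "DeviceSerialNumber":
        data_element.value = ""
        return True
    return False


# Stand-in element so the sketch runs without pydicom.
element = SimpleNamespace(keyword="DeviceSerialNumber", value="SN-12345")
print(blank_device_serial_number(None, element), repr(element.value))  # True ''
```

With a real Anonymizer, the handler would be registered with anonymizer.add_element_handler(blank_device_serial_number). Check the dicognito source to see where new handlers are placed relative to the built-in ones.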

Exactly what does dicognito do?

Using the default settings, dicognito will

  • Add "DICOGNITO" to DeidentificationMethod
  • Remove BranchOfService
  • Remove MedicalRecordLocator
  • Remove MilitaryRank
  • Remove Occupation
  • Remove PatientInsurancePlanCodeSequence
  • Remove PatientReligiousPreference
  • Remove PatientTelecomInformation
  • Remove PatientTelephoneNumbers
  • Remove ReferencedPatientPhotoSequence
  • Remove ResponsibleOrganization
  • Replace AccessionNumber with anonymized values
  • Replace CountryOfResidence with anonymized values
  • Replace CurrentPatientLocation with ""
  • Replace FillerOrderNumberImagingServiceRequest with anonymized values
  • Replace FillerOrderNumberImagingServiceRequestRetired with anonymized values
  • Replace FillerOrderNumberProcedure with anonymized values
  • Replace InstitutionAddress with anonymized values (only if replacing matching InstitutionName element)
  • Replace InstitutionName with anonymized values
  • Replace InstitutionalDepartmentName with "RADIOLOGY"
  • Replace IssuerOfPatientID with "DICOGNITO"
  • Replace OtherPatientIDs with anonymized values
  • Replace PatientAddress with anonymized values
  • Replace PatientID with anonymized values
  • Replace PerformedProcedureStepID with anonymized values
  • Replace PlacerOrderNumberImagingServiceRequest with anonymized values
  • Replace PlacerOrderNumberImagingServiceRequestRetired with anonymized values
  • Replace PlacerOrderNumberProcedure with anonymized values
  • Replace RegionOfResidence with anonymized values
  • Replace RequestedProcedureID with anonymized values
  • Replace RequestingService with ""
  • Replace ScheduledProcedureStepID with anonymized values
  • Replace StationName with anonymized values
  • Replace StudyID with anonymized values
  • Replace all DA elements with anonymized values that precede the originals
  • Replace all DT elements with anonymized values that precede the originals
  • Replace all PN elements with anonymized values
  • Replace all TM elements with anonymized values that precede the originals (only if replacing matching DA element)
  • Replace all UI elements with anonymized values
  • Replace private "MITRA LINKED ATTRIBUTES 1.0" element "Global Patient ID" with anonymized values
  • Set PatientIdentityRemoved to "YES" if BurnedInAnnotation is "NO"
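The last rule is a small conditional. The sketch below uses plain attribute access, which pydicom's Dataset allows, with a SimpleNamespace standing in for a dataset. The function name is hypothetical.

```python
from types import SimpleNamespace


def mark_identity_removed(dataset) -> None:
    # Only claim identity removal when the pixel data is declared free of
    # burned-in annotation; an absent BurnedInAnnotation changes nothing.
    if getattr(dataset, "BurnedInAnnotation", None) == "NO":
        dataset.PatientIdentityRemoved = "YES"


ds = SimpleNamespace(BurnedInAnnotation="NO")
mark_identity_removed(ds)
print(ds.PatientIdentityRemoved)  # YES
```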

Logo: Remixed from Radiology by priyanka and Incognito by d͡ʒɛrmi Good from the Noun Project.

dicognito's People

Contributors

blairconrad, justineclin

dicognito's Issues

InstitutionName collisions break tests too often

InstitutionName is currently picked just from the InstitutionAddress with "CLINIC" appended. So there's essentially a 1 in 40 chance there will be a collision, and we run a lot of tests. I'm seeing failures.
Either make the name more unique or stop testing it for uniqueness. Maybe just test to see that it's the same as in the address.
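The failure rate follows from the 1-in-40 figure. The sketch assumes, as the issue describes, that each name comes from one of about 40 equally likely addresses, and that every uniqueness check compares two independent picks.

```python
def chance_of_any_collision(n_checks: int, n_choices: int = 40) -> float:
    # Two independent, uniform picks from n_choices collide with probability 1/n_choices.
    p_no_collision = 1 - 1 / n_choices
    # Probability that at least one of n independent checks sees a collision.
    return 1 - p_no_collision ** n_checks


for n in (1, 10, 100):
    print(n, round(chance_of_any_collision(n), 3))
```

Under these assumptions, 10 runs of such a test fail at least once about 22% of the time, and 100 runs about 92% of the time.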

Add option to anonymize directory trees

DICOM store tools generally store everything in the current directory and its subdirectories, so we should be able to anonymize everything in the subfolders too. Maybe this should include the 'overwrite' flag by default (Issue #21).
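One way to find DICOM files in a directory tree is sketched below. It assumes the files are DICOM Part 10 files, which start with a 128-byte preamble followed by the bytes DICM. The function name is hypothetical. pydicom can read some files that lack this header when called with force=True, but this check would skip them.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def find_dicom_files(root: Path) -> Iterator[Path]:
    """Yield files under root (recursively) that carry the Part 10 'DICM' marker."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            f.seek(128)  # skip the preamble; short files simply read as b""
            if f.read(4) == b"DICM":
                yield path


# Usage: a throwaway tree with one fake Part 10 file and one text file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "series1").mkdir()
    (root / "series1" / "IM0001").write_bytes(b"\x00" * 128 + b"DICM")
    (root / "README.txt").write_text("not DICOM")
    found = [p.relative_to(root).as_posix() for p in find_dicom_files(root)]
print(found)  # ['series1/IM0001']
```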

Anonymizing according to DICOM Standard

Hey guys, I am currently working on an anonymizing tool for DICOM files in TypeScript and really appreciate your great work. As I was thinking about what to anonymize and how, I found this table that defines a standardized way to anonymize DICOM files, and I was wondering: how did you decide which tags to alter with dicognito, and in which way? Did you follow the NEMA guidelines in some way?

Add option to add a prefix or suffix for some IDs

It would be useful for searching on the PACS system if all the Patient IDs and AccessionNumbers, and probably StudyIDs had a prefix or suffix that was supplied when running dicognito.

I'm unsure about a separator character between the prefix/suffix and the rest of the ID. I think I'm tending towards 'leave it off' as it could be passed in as part of the prefix/suffix. Something like:

dicognito -prefix "PD-" some.dcm
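A prefix or suffix has to fit the value representation of the element it is attached to. PatientID is LO (at most 64 characters), while AccessionNumber and StudyID are SH (at most 16 characters). The hypothetical helper below applies the affixes without adding a separator, as proposed, and rejects results that would be too long.

```python
# Maximum value lengths from DICOM PS3.5: PatientID is LO (64 characters),
# AccessionNumber and StudyID are SH (16 characters).
MAX_LENGTH = {"PatientID": 64, "AccessionNumber": 16, "StudyID": 16}


def add_affixes(keyword: str, value: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: wrap an anonymized ID in a user-supplied prefix/suffix.

    No separator is added; any separator can be part of the prefix itself.
    """
    result = prefix + value + suffix
    limit = MAX_LENGTH[keyword]
    if len(result) > limit:
        raise ValueError(f"{keyword} would be {len(result)} characters; the limit is {limit}")
    return result


print(add_affixes("PatientID", "ABCD1234", prefix="PD-"))  # PD-ABCD1234
```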

Anonymize files in place instead of creating new ones

Sometimes I want to anonymize a whole directory, throwing out what was there. This makes it easier to then store everything in this directory without worrying about file names, and it gets rid of possible confidential data from your system.
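Overwriting in place risks leaving a half-written file if the process is interrupted. A common pattern is to write to a temporary file in the same directory and then rename it over the original with os.replace, which is atomic on POSIX filesystems. The sketch below uses a hypothetical helper that takes raw bytes. With pydicom, those bytes could come from saving the dataset into an io.BytesIO.

```python
import os
import tempfile
from pathlib import Path


def overwrite_atomically(path: Path, data: bytes) -> None:
    """Replace path's contents via a temporary sibling file and os.replace."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, path)  # swap the new file in under the original name
    except BaseException:
        os.unlink(tmp_name)  # don't leave the temporary file behind on failure
        raise


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "image.dcm"
    target.write_bytes(b"original")
    overwrite_atomically(target, b"anonymized")
    result = target.read_bytes()
    leftovers = sorted(p.name for p in Path(d).iterdir())
print(result, leftovers)  # b'anonymized' ['image.dcm']
```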

Rename salt to seed

From the outside, it should be billed as a seed for randomness, even though inside Randomizer, it's used as a salt when hashing values.
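The seed/salt distinction can be shown with a keyed hash. A per-run value (a "seed" to users, a salt internally) is mixed into a hash of each original identifier. The same input then maps to the same output within a run and across runs that share the seed, while a different seed gives unrelated IDs. This sketch uses hashlib.sha256 and an ID alphabet chosen for illustration. It is not dicognito's Randomizer, and the class name is hypothetical.

```python
import hashlib
import os
import string
from typing import Optional

ALPHABET = string.ascii_uppercase + string.digits  # illustrative ID alphabet


class SaltedIdGenerator:
    """Hypothetical stand-in for a salted ID generator."""

    def __init__(self, seed: Optional[str] = None):
        # A user-facing "seed" becomes the internal salt; without one, use random
        # bytes so that separate runs produce unrelated IDs.
        self._salt = seed.encode() if seed is not None else os.urandom(16)

    def new_id(self, original: str, length: int = 10) -> str:
        # Same salt + same original value -> same digest -> same new ID.
        digest = hashlib.sha256(self._salt + b"\x00" + original.encode()).digest()
        return "".join(ALPHABET[byte % len(ALPHABET)] for byte in digest[:length])


first = SaltedIdGenerator(seed="cohort-a")
second = SaltedIdGenerator(seed="cohort-a")
print(first.new_id("MRN-001") == first.new_id("MRN-001"))   # True: stable within a run
print(first.new_id("MRN-001") == second.new_id("MRN-001"))  # True: same seed, same ID
```

Because each new ID is a pure function of the seed and the input, no lookup table has to be kept. The trade-off is that anyone holding the seed and a candidate identifier can recompute the mapping.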

Add API docs

They are woefully lacking, and it would be nice to upgrade them to just "insufficient".

Add -taco option

When I run dicognito with the -taco (or -:taco:) option, it should provide me with a delicious free taco.

Acceptable alternative: -sandwich

Support Python 3

Inspired by #30.

We should support at least the latest minor, but as many as are practical, I suppose.
If it's possible to support earlier versions of Python 2 while we're at it, why not?

Consider warning if Burned In Annotation is YES

If Burned In Annotation is "YES", our anonymization won't be sufficient.

We could emit a WARN level log message to indicate that there are images with burned-in demographics.

If the attribute is absent, our anonymization still might not be sufficient, so we have to decide what to do about that. I suspect it's often absent, so we might want to take no action.

We could add a flag so users can customize the behaviour. For example:

  • never warn
  • warn if Burned In Annotation is "YES"
  • warn unless Burned In Annotation is "NO"
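The three proposed behaviours map onto a small policy check. In the sketch below, the enum, its values and the logger name are hypothetical and are not existing dicognito options. None stands for an absent BurnedInAnnotation.

```python
import enum
import logging

log = logging.getLogger("anonymization-sketch")  # hypothetical logger name


class BurnedInWarning(enum.Enum):
    NEVER = "never"
    IF_YES = "if-yes"
    UNLESS_NO = "unless-no"


def should_warn(burned_in_annotation, policy: BurnedInWarning) -> bool:
    """burned_in_annotation is the element's value, or None when absent."""
    if policy is BurnedInWarning.NEVER:
        return False
    if policy is BurnedInWarning.IF_YES:
        return burned_in_annotation == "YES"
    # UNLESS_NO: an absent attribute also triggers the warning.
    return burned_in_annotation != "NO"


def warn_if_needed(name: str, burned_in_annotation, policy: BurnedInWarning) -> None:
    if should_warn(burned_in_annotation, policy):
        log.warning("%s may contain burned-in demographics (BurnedInAnnotation=%r)",
                    name, burned_in_annotation)
```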

Anonymize Mitra Global Patient ID

e.g.

0031 0011       28 | private_creator                      | LO |   1 | "MITRA LINKED ATTRIBUTES 1.0"
0031 1120       10 | Unknown element                      | Unkn |  ?  | "GPIAPCB136"
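The dump reflects how DICOM private tags are laid out (PS3.5, section 7.8.1). The Private Creator element (0031,0011) reserves element block 0x11 in group 0x0031, so that creator's element with offset 0x20 lives at (0031,1120). Below is a pydicom-independent sketch of the arithmetic. The function names are hypothetical.

```python
def private_element_tag(group: int, creator_element: int, offset: int) -> int:
    """Combine an odd group, a creator slot (0x10-0xFF) and an offset (0x00-0xFF)."""
    if group % 2 == 0:
        raise ValueError("private groups are odd-numbered")
    if not 0x10 <= creator_element <= 0xFF or not 0x00 <= offset <= 0xFF:
        raise ValueError("creator slot or offset out of range")
    return (group << 16) | (creator_element << 8) | offset


def creator_tag_for(tag: int) -> int:
    """Return the Private Creator tag reserving the block that contains `tag`."""
    group, element = tag >> 16, tag & 0xFFFF
    return (group << 16) | (element >> 8)


gpid = private_element_tag(0x0031, 0x11, 0x20)
print(f"{gpid:08X}", f"{creator_tag_for(gpid):08X}")  # 00311120 00310011
```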

Fails to anonymize LEI file with Mitra global patient ID when using pydicom 2.2.x

module version
platform Windows-10-10.0.18363-SP0
Python 3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]
dicognito 0.12.0
pydicom 2.2.2

The error looks like this:

Traceback (most recent call last):
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 525, in _convert_value
    val.append
AttributeError: 'str' object has no attribute 'append'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 752, in __new__
    newval = super().__new__(cls, val)
ValueError: invalid literal for int() with base 10: 'K5JR4D4YWZN7'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
    yield
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataset.py", line 2382, in walk
    callback(self, data_element)  # self = this Dataset
  File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 127, in _anonymize_element
    if handler(dataset, data_element):
  File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 63, in __call__
    if self._anonymize_mitra_global_patient_id(dataset, data_element):
  File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 78, in _anonymize_mitra_global_patient_id
    self._replace_id(data_element)
  File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 87, in _replace_id
    data_element.value = self._new_id(data_element.value)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 463, in value
    self._value = self._convert_value(val)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 527, in _convert_value
    return self._convert(val)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 541, in _convert
    return pydicom.valuerep.IS(val)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 755, in __new__
    newval = super().__new__(cls, float(val))
ValueError: could not convert string to float: 'K5JR4D4YWZN7'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 206, in <module>
    main()
  File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 182, in main
    anonymizer.anonymize(dataset)
  File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 121, in anonymize
    dataset.walk(self._anonymize_element)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataset.py", line 2380, in walk
    with tag_in_exception(tag):
  File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\tag.py", line 32, in tag_in_exception
    raise type(exc)(msg) from exc
ValueError: With tag (0031, 1020) got exception: could not convert string to float: 'K5JR4D4YWZN7'

Same patient names anonymize differently when formatted differently

I had a study with multiple DICOM files. In some of the files the patient name was like 'LAST^FIRST^MIDDLE' and in others 'LAST^FIRST^MIDDLE^'. These are obviously the same patient name, but because of the trailing '^' dicognito assumed they were different and anonymized them differently.
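The DICOM standard allows trailing empty components of a PN value, and their ^ delimiters, to be omitted. So 'LAST^FIRST^MIDDLE' and 'LAST^FIRST^MIDDLE^' denote the same name. One possible fix is to normalize names before they are anonymized. The sketch below uses a hypothetical helper that trims trailing ^ and padding spaces in each component group and drops trailing empty groups.

```python
def normalize_person_name(value: str) -> str:
    """Drop trailing '^' separators and padding spaces from each PN component group.

    Component groups (alphabetic, ideographic, phonetic) are separated by '=';
    trailing empty groups are dropped as well.
    """
    groups = [group.rstrip("^ ") for group in value.split("=")]
    while groups and groups[-1] == "":
        groups.pop()
    return "=".join(groups)


print(normalize_person_name("LAST^FIRST^MIDDLE^"))  # LAST^FIRST^MIDDLE
```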

Display some basic info after anonymizing file

The first thing I generally want to know after anonymizing a file is how to find it. Consider adding output like this:

dicognito some*.dcm

Created 'anon-some1.dcm' with PatientID: ABCD1234 and AccessionNumber: EFAB5678
Created 'anon-some2.dcm' with PatientID: ABCD1234 and AccessionNumber: CDEF9012
...

This might be too verbose when running against a large dataset, so you could have a -q flag to suppress it.
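The proposed summary and quiet flag could be sketched with argparse. The -q/--quiet flag and the positional sources argument here are hypothetical. The real options are listed by dicognito --help.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options for illustration only.
    parser = argparse.ArgumentParser(prog="dicognito")
    parser.add_argument("sources", nargs="+")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress the per-file summary")
    return parser


def summary_line(filename: str, patient_id: str, accession_number: str) -> str:
    return (f"Created '{filename}' with PatientID: {patient_id} "
            f"and AccessionNumber: {accession_number}")


args = build_parser().parse_args(["some1.dcm", "some2.dcm"])
if not args.quiet:
    print(summary_line("anon-some1.dcm", "ABCD1234", "EFAB5678"))
```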

Explicitly exclude DICOM tags from anonymization

Is there a way to explicitly exclude DICOM tags from anonymization? We use dicognito to anonymize exams for a trial, which works great, but we must retain some tags like StudyDate, SeriesDate or AcquisitionDate. Currently, I reset them programmatically after the anonymization, but I wonder if there is an option to leave those tags untouched by dicognito.
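Until such an option exists, the approach described (restoring values after anonymization) can be wrapped in a small context manager. The sketch below uses a hypothetical preserved helper. A SimpleNamespace and a fake anonymize function stand in for a pydicom Dataset and anonymizer.anonymize. Restored dates no longer line up with shifted values, so matching TM elements such as StudyTime may need preserving too.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def preserved(dataset, keywords):
    """Snapshot the named elements, run the body, then write the originals back.

    Works with any object that supports getattr/setattr by keyword, as pydicom's
    Dataset does. Elements absent beforehand are left as the body leaves them,
    and nothing is restored if the body raises.
    """
    saved = {kw: getattr(dataset, kw) for kw in keywords if hasattr(dataset, kw)}
    yield dataset
    for kw, value in saved.items():
        setattr(dataset, kw, value)


def fake_anonymize(ds):  # stands in for anonymizer.anonymize(ds)
    ds.StudyDate = "19990101"
    ds.PatientName = "ANON^PATIENT"


ds = SimpleNamespace(StudyDate="20230405", PatientName="DOE^JANE")
with preserved(ds, ["StudyDate", "SeriesDate"]):
    fake_anonymize(ds)
print(ds.StudyDate, ds.PatientName)  # 20230405 ANON^PATIENT
```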

Unable to anonymize dataset with encapsulated pixel data that contains embedded sequence delimiter

Describe the bug
When anonymizing a DICOM file that includes encapsulated pixel data as described in A.4 Transfer Syntaxes For Encapsulation of Encoded Pixel Data, that is

  1. (7FE0,0010) has VR OB with length 0xFFFFFFFF and
  2. there's at least one data stream fragment beginning with a tag (FFFE,E000) with an explicit length

if the data fragment contains the 4 consecutive bytes FE FF DD E0, which is how the terminating Sequence Delimiter Item tag would appear, the fileutil.read_undefined_length_value method considers the fragment to end at that point.

After the read, the pixel data element's value is too short: it is cut off where the apparent Sequence Delimiter Item appears, and the remaining bytes are assumed to be further tags. Then, as the anonymization walks the dataset, we see the following error:

Expected 4 zero bytes after undefined length delimiter at pos 0bf4
Traceback (most recent call last):
  File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 735, in DataElement_from_raw
    value = convert_value(VR, raw, encoding)
  File "D:\Sandbox\pydicom\pydicom\values.py", line 623, in convert_value
    raise NotImplementedError(message)
NotImplementedError: Unknown Value Representation '0x01 0x00'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Sandbox\pydicom\pydicom\tag.py", line 27, in tag_in_exception
    yield
  File "D:\Sandbox\pydicom\pydicom\dataset.py", line 2032, in walk
    data_element = self[tag]
  File "D:\Sandbox\pydicom\pydicom\dataset.py", line 861, in __getitem__
    self[tag] = DataElement_from_raw(data_elem, character_set)
  File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 737, in DataElement_from_raw
    raise NotImplementedError("{0:s} in tag {1!r}".format(str(e), raw.tag))
NotImplementedError: Unknown Value Representation '0x01 0x00' in tag (0000, 0000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files (x86)\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files (x86)\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 142, in <module>
    main()
  File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 119, in main
    anonymizer.anonymize(dataset)
  File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 119, in anonymize
    dataset.walk(self._anonymize_element)
  File "D:\Sandbox\pydicom\pydicom\dataset.py", line 2039, in walk
    dataset.walk(callback)
  File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\Sandbox\pydicom\pydicom\tag.py", line 34, in tag_in_exception
    raise type(ex)(msg)
NotImplementedError: With tag (0000, 0000) got exception: Unknown Value Representation '0x01 0x00' in tag (0000, 0000)

Expected behavior
The dataset would be anonymized.

Steps To Reproduce
Discovered when opening a patient's Video Endoscopic Image (1.2.840.10008.5.1.4.1.1.77.1.1.1), which I can't share because of PHI concerns, and also it's quite large. See JPEG2000-embedded-sequence-delimiter.zip for a constructed dataset. After extracting, attempt to anonymize it.

Environment

module version
platform Windows-10-10.0.18362-SP0
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)]
pydicom 2.1.0.dev0 (b9fb05c177b685bf683f7f57b2d57374eb7d882d)
dicognito any version, including current master (7e9b068)

Cause

This is caused by pydicom/pydicom#1140.
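The underlying idea is that encapsulated pixel data should be walked item by item, using each item's explicit length, rather than by scanning for the delimiter's byte pattern. Below is a standalone sketch of that length-driven parsing for little-endian data. It is not pydicom's implementation, and the function names are hypothetical.

```python
import struct
from typing import List

ITEM = (0xFFFE, 0xE000)
SEQUENCE_DELIMITER = (0xFFFE, 0xE0DD)


def read_encapsulated_items(value: bytes) -> List[bytes]:
    """Split an undefined-length, encapsulated (7FE0,0010) value into its items.

    Each item header is an 8-byte little-endian (group, element, length). The
    length says how many bytes to skip, so payload bytes that happen to look
    like FE FF DD E0 are never mistaken for the Sequence Delimiter Item.
    The first item returned is the Basic Offset Table (possibly empty).
    """
    items = []
    pos = 0
    while pos + 8 <= len(value):
        group, element, length = struct.unpack_from("<HHI", value, pos)
        pos += 8
        if (group, element) == SEQUENCE_DELIMITER:
            if length != 0:
                raise ValueError("Sequence Delimiter Item must have length 0")
            return items
        if (group, element) != ITEM:
            raise ValueError(f"unexpected tag ({group:04X},{element:04X}) at offset {pos - 8}")
        if pos + length > len(value):
            raise ValueError("item runs past the end of the value")
        items.append(value[pos:pos + length])
        pos += length
    raise ValueError("no Sequence Delimiter Item found")


def item(payload: bytes) -> bytes:
    # Build an item header followed by its payload.
    return struct.pack("<HHI", *ITEM, len(payload)) + payload


tricky = b"\x01\x02\xfe\xff\xdd\xe0\x03\x04"  # payload containing the delimiter's bytes
value = item(b"") + item(tricky) + struct.pack("<HHI", *SEQUENCE_DELIMITER, 0)
print(read_encapsulated_items(value) == [b"", tricky])  # True
```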

Anonymizing dataset with StationName but no Modality fails

Anonymizing a dataset with a supplied StationName attribute but no sibling Modality fails with this output:

Traceback (most recent call last):
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
    yield
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
    callback(self, data_element)  # self = this Dataset
  File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
    if handler(dataset, data_element):
  File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
    element_anonymizer(dataset, data_element)
  File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
    data_element.value = dataset.Modality + "01"
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
    yield
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
    dataset.walk(callback)
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
    dataset.walk(callback)
  File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 34, in tag_in_exception
    raise type(ex)(msg)
AttributeError: With tag (0008, 1010) got exception: 'Dataset' object has no attribute 'Modality'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\program files (x86)\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files (x86)\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files (x86)\Python38-32\Scripts\dicognito.exe\__main__.py", line 7, in <module>
  File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\__main__.py", line 119, in main
    anonymizer.anonymize(dataset)
  File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 114, in anonymize
    dataset.walk(self._anonymize_element)
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
    dataset.walk(callback)
  File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 34, in tag_in_exception
    raise type(ex)(msg)
AttributeError: With tag (0018, 9506) got exception: With tag (0008, 1010) got exception: 'Dataset' object has no attribute 'Modality'
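The failing line builds StationName from dataset.Modality. The traceback shows the StationName sat inside the sequence (0018,9506), and pydicom's walk passed that sequence item, which has no Modality, as the dataset. Below is a sketch of a more tolerant variant. The fallback value is a hypothetical choice ("OT" is the DICOM modality code for Other). A real fix might instead look up Modality on the top-level dataset.

```python
from types import SimpleNamespace

FALLBACK_MODALITY = "OT"  # hypothetical choice; "OT" is DICOM's code for "Other"


def anonymize_station_name(dataset, data_element) -> None:
    # getattr with a default avoids the AttributeError when the dataset being
    # walked (here, a sequence item) has no Modality element of its own.
    modality = getattr(dataset, "Modality", None) or FALLBACK_MODALITY
    data_element.value = modality + "01"


sequence_item = SimpleNamespace()                  # no Modality, as in the report
station_name = SimpleNamespace(value="CT-ROOM-7")
anonymize_station_name(sequence_item, station_name)
print(station_name.value)  # OT01
```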

How to map original to anonymized data in recursive mode?

Many thanks for this great tool!

I need to map the original patient name to the anonymized name in order to include clinical data in my analysis. I am running in recursive mode with one input directory containing exam data from multiple patients. The output (Accession Number, Patient ID, Patient Name) only includes the anonymized data. Is there an existing way to obtain this mapping when running in recursive mode with multiple patients per session? I'm happy to implement this and open a PR if not; I just wanted to check that I'm not missing anything obvious. Thanks for your help!
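One way to build the mapping outside dicognito is to read the identifying values before and after anonymizing each dataset and write both to a CSV file. The helper and column names below are hypothetical. A SimpleNamespace and a fake anonymize function stand in for a pydicom Dataset and anonymizer.anonymize. The resulting file re-links anonymized data to patients, so it needs the same protection as the originals.

```python
import csv
import io
from types import SimpleNamespace

KEYWORDS = ("PatientName", "PatientID", "AccessionNumber")


def identifying_values(dataset):
    # str() also turns pydicom PersonName values into plain text.
    return [str(getattr(dataset, keyword, "")) for keyword in KEYWORDS]


def anonymize_and_record(dataset, anonymize, writer) -> None:
    """Hypothetical wrapper: write original and anonymized values side by side."""
    before = identifying_values(dataset)
    anonymize(dataset)
    writer.writerow(before + identifying_values(dataset))


def fake_anonymize(dataset):  # stands in for anonymizer.anonymize
    dataset.PatientName, dataset.PatientID, dataset.AccessionNumber = "ANON^A", "X1", "Y1"


mapping_file = io.StringIO()  # a real run would open a protected CSV file instead
writer = csv.writer(mapping_file)
writer.writerow([f"original {k}" for k in KEYWORDS] + [f"anonymized {k}" for k in KEYWORDS])
anonymize_and_record(
    SimpleNamespace(PatientName="DOE^JANE", PatientID="123", AccessionNumber="A9"),
    fake_anonymize,
    writer,
)
```

In recursive mode, several files from one patient produce repeated rows, which can be de-duplicated afterwards.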

Private creator 0031,0020 breaks anonymization

Anonymizing a dataset containing a value for 0031,0020, which would typically be a Private Creator Data Element, results in dicognito erroring out with

Error occurred while converting <_io.BytesIO object at 0x0000022DD4CEA160>. Aborting.
Traceback (most recent call last):
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
    yield
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 2474, in walk
    callback(self, data_element)  # self = this Dataset
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Dev\dicognito\src\dicognito\anonymizer.py", line 151, in _anonymize_element
    if handler(dataset, data_element):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Dev\dicognito\src\dicognito\idanonymizer.py", line 67, in __call__
    if self._anonymize_mitra_global_patient_id(dataset, data_element):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Dev\dicognito\src\dicognito\idanonymizer.py", line 97, in _anonymize_mitra_global_patient_id
    dataset[(mitra_linked_attributes_group << 16) + private_tag_group].value
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 988, in __getitem__
    elem = self._dict[tag]
           ~~~~~~~~~~^^^^^
KeyError: (0031, 0000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Dev\dicognito\src\dicognito\__main__.py", line 76, in main
    anonymizer.anonymize(dataset)
  File "E:\Dev\dicognito\src\dicognito\anonymizer.py", line 134, in anonymize
    dataset.walk(self._anonymize_element)
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 2472, in walk
    with tag_in_exception(tag):
  File "D:\Users\amidu\AppData\Local\Programs\Python\Python311\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 32, in tag_in_exception
    raise type(exc)(msg) from exc
KeyError: 'With tag (0031, 0020) got exception: (0031, 0000)\nTraceback (most recent call last):\n  File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\tag.py", line 28, in tag_in_exception\n    yield\n  File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\dataset.py", line 2474, in walk\n    callback(self, data_element)  # self = this Dataset\n    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "E:\\Dev\\dicognito\\src\\dicognito\\anonymizer.py", line 151, in _anonymize_element\n    if handler(dataset, data_element):\n       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "E:\\Dev\\dicognito\\src\\dicognito\\idanonymizer.py", line 67, in __call__\n    if self._anonymize_mitra_global_patient_id(dataset, data_element):\n       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "E:\\Dev\\dicognito\\src\\dicognito\\idanonymizer.py", line 97, in _anonymize_mitra_global_patient_id\n    dataset[(mitra_linked_attributes_group << 16) + private_tag_group].value\n    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\dataset.py", line 988, in __getitem__\n    elem = self._dict[tag]\n           ~~~~~~~~~~^^^^^\nKeyError: (0031, 0000)\n'

It shouldn't error out.
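The traceback suggests the handler treated (0031,0020) as a private *data* element and derived its private creator tag from the high byte of the element number, which gives (0031,0000), a group-length slot. Under DICOM PS3.5, private creator elements occupy (gggg,0010)-(gggg,00FF), and private data elements occupy (gggg,1000)-(gggg,FFFF), with data element (gggg,xxyy) reserved by the creator at (gggg,00xx). So (0031,0020) is itself a creator slot. Below is a minimal sketch of a guard. It uses a plain dict keyed by integer tag in place of a pydicom Dataset, and hypothetical helper names. The creator string `MITRA LINKED ATTRIBUTES 1.0` is an assumption.

```python
from typing import Dict, Optional

# Hypothetical constants: the group matches the traceback; the creator
# string is assumed for illustration.
MITRA_GROUP = 0x0031
MITRA_CREATOR = "MITRA LINKED ATTRIBUTES 1.0"


def private_creator_tag(tag: int) -> Optional[int]:
    """Return the private creator tag for a private data element tag,
    or None if the tag is not a private data element."""
    group, element = tag >> 16, tag & 0xFFFF
    if group % 2 == 0:
        return None  # even group: standard, not private
    if element < 0x1000:
        return None  # (gggg,0000-0FFF): group length, creators, reserved
    block = element >> 8  # xx in (gggg,xxyy)
    return (group << 16) | block


def is_mitra_global_patient_id(dataset: Dict[int, str], tag: int) -> bool:
    """Hypothetical check: is `tag` a MITRA Global Patient ID (gggg,xx20)?"""
    if tag >> 16 != MITRA_GROUP or tag & 0xFF != 0x20:
        return False
    creator = private_creator_tag(tag)
    # Look the creator up defensively instead of indexing, which raised KeyError.
    return creator is not None and dataset.get(creator) == MITRA_CREATOR
```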

Support fully-reproducible deidentification

When data on a given cohort is accumulated over long periods, users may wish to run dicognito in multiple passes in order to perform preliminary analyses on the partial dataset. It would be convenient to be able to checkpoint the Anonymizer state so that patients seen in previous dicognito runs over the same cohort would have matching anonymized IDs.

Two options occur to me:

  1. Use the anonymization map proposed in #124 as a simple checkpoint. I haven't looked at the code yet, so I'm not sure exactly what drawbacks this would have. I think there are some guarantees about the order of dates that might be broken in this case.
  2. Serialize everything in the Anonymizer and save it to a pickle file. I think this would make starting from a checkpoint 'equivalent' to running in a single pass. It would have the disadvantage of adding another file with sensitive data to manage.
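Option 2 could be sketched with the standard `pickle` module. The `ToyAnonymizer` class is hypothetical and stands in for dicognito's Anonymizer. Its real internal state, and whether it pickles cleanly, are not confirmed here. The sketch only shows that restoring pickled state lets a second run reuse the IDs issued in the first. The checkpoint holds the original-to-anonymized mapping, so it is exactly the extra sensitive file mentioned in option 2.

```python
import pickle
import secrets
from pathlib import Path
from typing import Dict


class ToyAnonymizer:
    """Hypothetical stand-in: maps original patient IDs to random new IDs."""

    def __init__(self) -> None:
        self._id_map: Dict[str, str] = {}

    def anonymize_patient_id(self, original: str) -> str:
        # Reuse the ID issued earlier for this patient, if any.
        if original not in self._id_map:
            self._id_map[original] = secrets.token_hex(8).upper()
        return self._id_map[original]


def save_checkpoint(anonymizer: ToyAnonymizer, path: Path) -> None:
    # The pickle contains original IDs: protect it like the source data.
    path.write_bytes(pickle.dumps(anonymizer))


def load_checkpoint(path: Path) -> ToyAnonymizer:
    # Only unpickle files you wrote yourself; pickle can execute code.
    return pickle.loads(path.read_bytes())
```

A later pass would call `load_checkpoint` instead of constructing a fresh anonymizer, then save again when it finishes.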

datetime loading and storage

Hey @blairconrad,
I noticed that in datetimeanonymizer.py, both _anonymize_date_and_time and _anonymize_datetime copy the elements of a MultiValue into a list:
    if isinstance(data_element.value, pydicom.multival.MultiValue):
        datetimes = list([v for v in data_element.value])
But isn't the value of a MultiValue already a list?

Later, you store the altered values back by joining them into a single backslash-separated string:
    new_dates_string = "\\".join(new_dates)
    data_element.value = new_dates_string  (lines 98-100)

    data_element.value = "\\".join(new_datetimes)  (line 121)

Why? Shouldn't they be stored back as a list?
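DICOM encodes a multi-valued element as one string with the values separated by backslashes. pydicom should accept either that string or a list when assigning to `data_element.value`, so both approaches ought to store the same values. Here is a stdlib-only sketch of the two representations round-tripping. The helper names `split_values` and `join_values` are hypothetical.

```python
from typing import List, Sequence, Union


def split_values(value: Union[str, Sequence[str]]) -> List[str]:
    """Return the individual values of a (possibly multi-valued) element."""
    if isinstance(value, str):
        # One backslash-delimited string, as encoded in the file.
        return value.split("\\")
    # Already a sequence (e.g. pydicom's MultiValue): copy into a list.
    return list(value)


def join_values(values: Sequence[str]) -> str:
    """Encode values as a single backslash-delimited string."""
    return "\\".join(values)
```

Storing a list is arguably easier to read, but under this assumption either form yields the same stored values.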

Anonymization failure cites BytesIO object instead of input filename

Anonymizing a Deflated Explicit VR Little Endian (1.2.840.10008.1.2.1.99) file results in dicognito erroring out with

Error occurred while converting <_io.BytesIO object at 0x0000022DD4CEA160>. Aborting.
Traceback (most recent call last):
  File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
    yield
…

The "<_io.BytesIO object at 0x0000022DD4CEA160>" should be the input filename.

Occurs whether using --output-directory or --in-place.
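The message suggests the error handler reports an in-memory `io.BytesIO` stream rather than the path it was read from. One way to keep the path available is to carry the original filename alongside the stream and use it in the message. In the sketch below, `convert_file` and `ConversionError` are hypothetical names, and a caller-supplied `process` callback stands in for dicognito's real read/anonymize/write pipeline.

```python
import io
from pathlib import Path
from typing import BinaryIO, Callable


class ConversionError(Exception):
    """Hypothetical error type whose message names the input file."""


def convert_file(path: Path, process: Callable[[BinaryIO], None]) -> None:
    # Read into memory, as the BytesIO in the message suggests happens,
    # but keep `path` around for error reporting.
    stream = io.BytesIO(path.read_bytes())
    try:
        process(stream)
    except Exception as exc:
        # Name the file, not repr(stream); chain the original exception.
        raise ConversionError(
            f"Error occurred while converting {path}. Aborting."
        ) from exc
```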

Update setup.py

It's wrong (at least) about which Python versions we support: we only support Python 3.

Issuer of Patient ID added when there wasn't one

While adding an issuer may not be technically incorrect, an issuer unknown to the receiving system can cause problems with study validation. For now, do not add an Issuer of Patient ID if the original object had none.
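Here is a minimal sketch of the proposed behaviour. It uses a plain dict keyed by attribute keyword in place of a pydicom Dataset. The function name and the `new_issuer` parameter are hypothetical.

```python
from typing import Dict


def anonymize_issuer(dataset: Dict[str, str], new_issuer: str) -> None:
    """Replace IssuerOfPatientID only if the original object had one."""
    if "IssuerOfPatientID" in dataset:
        dataset["IssuerOfPatientID"] = new_issuer
    # Otherwise leave it absent: a receiving system may reject an issuer
    # it doesn't recognize.
```

With pydicom the same membership test should work, since a Dataset accepts keyword strings in `in` checks.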

Fails to anonymize object with Issue Date of Imaging Service Request

… because we only look for "Date" at the beginning or end of the element name, and here it's in the middle:

UnboundLocalError: With tag (200b, 102b) got exception: local variable 'time_name' referenced before assignment
Traceback (most recent call last):
  File "c:\program files\python37\lib\site-packages\pydicom\tag.py", line 30, in tag_in_exception
    yield
  File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 1773, in walk
    callback(self, data_element)  # self = this Dataset
  File "c:\program files\python37\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
    if handler(dataset, data_element):
  File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 43, in __call__
    self._anonymize_date_and_time(dataset, data_element)
  File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 61, in _anonymize_date_and_time
    if time_name in dataset:
UnboundLocalError: local variable 'time_name' referenced before assignment
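The traceback shows that `time_name` is assigned only when "Date" starts or ends the keyword, so a keyword such as `IssueDateOfImagingServiceRequest` falls through. The sketch below looks up a DA element's companion TM keyword more generally. It assumes, hypothetically, that the companion is formed by replacing the first "Date" with "Time", and it returns `None` when there is nothing to replace.

```python
from typing import Optional


def companion_time_keyword(date_keyword: str) -> Optional[str]:
    """Guess the TM keyword paired with a DA keyword, or None.

    Assumes it is called only for DA elements; a DT keyword such as
    AcquisitionDateTime would produce a wrong guess.
    """
    if "Date" not in date_keyword:
        return None  # caller should skip the time lookup, not crash
    return date_keyword.replace("Date", "Time", 1)
```

The caller can then test `time_name is not None and time_name in dataset` before touching the time element. An empty keyword, as for an unrecognized private element, also yields `None`.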

Add option to write anonymized files to another directory

Often we want to preserve the original files. One option is to copy them elsewhere and anonymize the copies, but that can be tedious. An alternative is a mode in which the dicognito command-line tool writes anonymized files to another directory.

Proposal:

  • add an -o/--output-directory option to specify an output directory
  • if the directory does not exist, it will be created
  • if it does exist, files will be added to it, in a flat list (no subdirectories, even if the input files are in a complicated hierarchy) named by their (new) SOP Instance UID plus .dcm
  • existing files will be overwritten in the case of (very unlikely) name collision
  • other files already present in the directory will not be touched
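The proposed naming rules are sketched below. `output_path` is a hypothetical helper, and the new SOP Instance UID is passed in as a string rather than read from a dataset.

```python
import os


def output_path(output_directory: str, sop_instance_uid: str) -> str:
    """Return where an anonymized object should be written.

    Creates the directory (and any parents) if needed and leaves existing
    files alone. Output is flat: the new SOP Instance UID plus ".dcm".
    """
    os.makedirs(output_directory, exist_ok=True)
    return os.path.join(output_directory, sop_instance_uid + ".dcm")
```

Writing to the returned path in binary write mode replaces a same-named file, which matches the proposed handling of name collisions.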

Fails on multi-valued dates and times

Example:

(0018, 1200) Date of Last Calibration            DA: ['19900101', '19900101']
(0018, 1201) Time of Last Calibration            TM: ['010000.000000', '010000.000000']

gives

Traceback (most recent call last):
  File "c:\program files\python37\lib\site-packages\pydicom\tag.py", line 30, in tag_in_exception
    yield
  File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 1354, in walk
    callback(self, data_element)  # self = this Dataset
  File "c:\program files\python37\lib\site-packages\dicognito\anonymizer.py", line 114, in _anonymize_element
    if handler(dataset, data_element):
  File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 42, in __call__
    self._anonymize_date_and_time(dataset, data_element)
  File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 51, in _anonymize_date_and_time
    old_date = datetime.datetime.strptime(date_value, date_format).date()
TypeError: strptime() argument 1 must be str, not MultiValue
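`strptime` expects a single string, but the value of a multi-valued DA element is a sequence (pydicom's `MultiValue`). Below is a stdlib sketch that shifts each value separately. It assumes a hypothetical fixed offset in days and covers DA values only; the real anonymizer's offset and TM handling may differ.

```python
import datetime
from typing import List, Sequence, Union

DA_FORMAT = "%Y%m%d"  # DICOM DA values are YYYYMMDD


def shift_dates(
    value: Union[str, Sequence[str]], offset_days: int
) -> Union[str, List[str]]:
    """Shift one DA value, or each value of a multi-valued DA element."""

    def shift(one: str) -> str:
        old = datetime.datetime.strptime(one, DA_FORMAT).date()
        return (old + datetime.timedelta(days=offset_days)).strftime(DA_FORMAT)

    if isinstance(value, str):
        return shift(value)
    # MultiValue or any other sequence: shift each entry, keeping the order.
    return [shift(v) for v in value]
```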

Date/time offset is not always the same for a given seed

When anonymizing with a fixed seed, the date and time offsets sometimes vary.
To see this, anonymize an object a few times with the same seed.
This is why test_multivalued_date_and_time_pair_gets_anonymized fails intermittently.
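The cause isn't identified here. One common source of run-to-run variation with a fixed seed is Python's per-process hash randomization: anything derived from the built-in `hash()` of a string changes between interpreter runs unless `PYTHONHASHSEED` is set. Whether this applies to dicognito is an assumption. The sketch below derives an offset deterministically from the seed and a patient identifier using `hashlib`. The function name, the 3650-day default range, and the key format are all hypothetical.

```python
import hashlib


def date_offset_days(seed: str, patient_key: str, max_days: int = 3650) -> int:
    """Return a negative offset between -max_days and -1, inclusive.

    Uses SHA-256 rather than hash(), whose value for str can differ
    between interpreter runs unless PYTHONHASHSEED is fixed.
    """
    digest = hashlib.sha256(f"{seed}:{patient_key}".encode("utf-8")).digest()
    return -(1 + int.from_bytes(digest[:8], "big") % max_days)
```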

Command-line anonymizer fails if an object doesn't have an AccessionNumber

Try to anonymize (from the command line) a file that doesn't have an accession number. The anonymization works, but printing the summary fails:

λ  dicognito .
Traceback (most recent call last):
  File "C:\Program Files\Python37\Scripts\dicognito-script.py", line 11, in <module>
    load_entry_point('dicognito==0.7.0', 'console_scripts', 'dicognito')()
  File "c:\program files\python37\lib\site-packages\dicognito\__main__.py", line 102, in main
    ConvertedStudy(dataset.AccessionNumber, dataset.PatientID, str(dataset.PatientName))
  File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 556, in __getattr__
    return super(Dataset, self).__getattribute__(name)
AttributeError: 'FileDataset' object has no attribute 'AccessionNumber'
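The summary code reads `dataset.AccessionNumber` unconditionally. The sketch below is a tolerant version. `ConvertedStudy` here is a hypothetical namedtuple that mirrors the call in the traceback. `getattr` with a default should also work on pydicom datasets, since, as the traceback shows, they raise `AttributeError` for missing keywords.

```python
from collections import namedtuple

# Hypothetical stand-in for the summary record built in __main__.py.
ConvertedStudy = namedtuple(
    "ConvertedStudy", ["AccessionNumber", "PatientID", "PatientName"]
)


def summarize(dataset) -> ConvertedStudy:
    # Missing attributes become empty strings instead of raising AttributeError.
    return ConvertedStudy(
        getattr(dataset, "AccessionNumber", ""),
        getattr(dataset, "PatientID", ""),
        str(getattr(dataset, "PatientName", "")),
    )
```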
