
Self Explainable Transformers for Flow Cytometry Cell Classification

Official implementation of our work: Towards Self-Explainable Transformers for Cell Classification in Flow Cytometry Data by Florian Kowarsch, Lisa Weijler, Matthias Wödlinger, Michael Reiter, Margarita Maurer-Granofszky, Angela Schumich, Elisa O. Sajaroff, Stefanie Groeneveld-Krentz, Jorge G. Rossi, Leonid Karawajew, Richard Ratei, Michael N. Dworzak

Abstract

Decisions of automated systems in healthcare can have far-reaching consequences such as delayed or incorrect treatment and thus must be explainable and comprehensible for medical experts. This also applies to the field of automated Flow Cytometry (FCM) data analysis. In leukemic cancer therapy, FCM samples are obtained from the patient’s bone marrow to determine the number of remaining leukemic cells. In a manual process, called gating, medical experts draw several polygons among different cell populations on 2D plots in order to hierarchically sub-select and track down cancer cell populations in an FCM sample. Several approaches exist that aim at automating this task. However, predictions of state-of-the-art models for automatic cell-wise classification act as black-boxes and lack the explainability of human-created gating hierarchies. We propose a novel transformer-based approach that classifies cells in FCM data by mimicking the decision process of medical experts. Our network considers all events of a sample at once and predicts the corresponding polygons of the gating hierarchy, thus, producing a verifiable visualization in the same way a human operator does. The proposed model has been evaluated on three publicly available datasets for acute lymphoblastic leukemia (ALL). In experimental comparison, it reaches state-of-the-art performance for automated blast cell identification while providing transparent results and explainable visualizations for human experts.

Installation

All dependencies are listed in requirements.txt. Install them with:

pip install -r requirements.txt

This repo requires flowmepy, a Python package for FCM data loading. For more information see https://pypi.org/project/flowmepy/. Install it with:

pip install flowmepy

(If you run into issues with newer versions of dependencies, check the requirements.txt file: it lists the exact package versions of the environment at the time of testing.)

IMPORTANT: As of now the flowmepy package is only supported on Windows. If you are running a Unix-based system and want to try out our method, you will need to preload the data (for example into a pandas DataFrame) on a Windows machine and then adapt the lines in the code where the flowme Python package is called: load your preloaded event matrices (DataFrames or CSV) instead of the `events = sample.events()` lines, and your gate label matrices (DataFrames or CSV) instead of the `labels = sample.gate_labels()` lines. Sorry for the inconvenience; we are working on a solution.
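On Unix, the adaptation can be as small as a helper like the following. The file layout and column names here are hypothetical; it only assumes you exported each sample's events and gate labels to CSV on a Windows machine beforehand:

```python
import pandas as pd

def load_preloaded_sample(events_csv: str, labels_csv: str):
    """Stand-in for the flowmepy calls `sample.events()` and
    `sample.gate_labels()` on non-Windows systems.

    Assumes each FCM sample was previously exported to two CSV files
    (the exact export format is up to you)."""
    events = pd.read_csv(events_csv)   # one row per event, one column per marker
    labels = pd.read_csv(labels_csv)   # one row per event, one column per gate
    return events, labels
```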

Usage

To reproduce the experiments from the paper, follow these steps:

  1. Create preprocessed cache files from the FCM files (these include event data as well as polygons).
  2. Train the model with the created cache files.
  3. Test the trained model.

Creating Cache

createcache.py generates one cache file per FCM sample containing the data needed to train the model. Preprocessing steps are applied as well, such as computing the convex hull for each specified gate definition and determining the ground-truth class of every cell.
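The convex-hull step can be sketched as follows (the function name and array shapes are illustrative, not the repository's API):

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_gate(events_2d: np.ndarray) -> np.ndarray:
    """Return the convex hull of a cell population as a polygon.

    `events_2d` is an (n, 2) array of a population's events projected
    onto the two markers of one gate. The returned vertices are in
    counter-clockwise order (see createcache.py for the actual
    preprocessing)."""
    hull = ConvexHull(events_2d)
    # hull.vertices indexes the points that form the hull boundary
    return events_2d[hull.vertices]
```

createcache.py is configured with a JSON file of the following form: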

{
   "type_name": "src.datastructures.configs.cachedatacreationconfig.CacheDataCreationConfig",
   "output_location": "path to folder where cached files should be stored",
   "blacklist_path": "",   //optional text file that specfies FCM-files that should be skipped
   "ignore_blacklist": true,
   "outlier_handler_config": {
       "n_events_threshold": 300, // min number of events needed bevore outlier removal is executed
       "alpha": 0.00001 // alpha value for Mahalanobis outlier removal
   },
   "source_datasets": [] // list of datasets that should be used
   "gate_defintions": [] // definition of gates from which the convex hull should be created
}
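The `outlier_handler_config` above controls a Mahalanobis-distance outlier filter. A minimal sketch of what such a filter does, assuming a chi-squared cutoff at `1 - alpha` (the repository's exact implementation may differ):

```python
import numpy as np
from scipy.stats import chi2

def remove_outliers(events: np.ndarray, alpha: float = 1e-5,
                    n_events_threshold: int = 300) -> np.ndarray:
    """Drop events whose squared Mahalanobis distance to the sample mean
    exceeds the chi-squared quantile at 1 - alpha.

    Populations with fewer than `n_events_threshold` events are
    returned unchanged, mirroring the config option above."""
    if len(events) < n_events_threshold:
        return events
    mean = events.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(events, rowvar=False))
    diff = events - mean
    # squared Mahalanobis distance per event
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    cutoff = chi2.ppf(1 - alpha, df=events.shape[1])
    return events[d2 <= cutoff]
```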

Train Model

train.py serves as entrypoint for model training.

{
    "type_name": "src.datastructures.configs.trainconfig.TrainConfig",
    "name": "train_vie14_val_bln",
    "default_retrieve_options": {
        "shuffle": true,
        "use_convex_gates": true, //wheter actual human gt polygons or generated convex gates are used
        "filter_gate" : "Intact", //Gate after which events are considered in training
        "polygon_min": -0.1,
        "polygon_max": 1.7,
        "always_keep_blasts": true, //wheter blast should be favored when sampling events
        "gate_polygon_interpolation_length" : 120, // number of points per polygon that are interpolated
        "gate_polygon_seq_length": 20, //number of points per polygon
        "events_seq_length": 50000, //number of events per sample used for training
        "used_markers": [],      // names of used markers
        "used_gates": [],        // names of used gates
        "gate_definitions" : [], //used Gate Definitions
        "events_mean": [
            1.2207406759262085,
            1.245536208152771,
            1.413953185081482,
            0.9958911538124084,
            2.1471059322357178,
            2.0955066680908203,
            1.4734785556793213,
            0.42288827896118164,
            0.8758889436721802,
            2.472586154937744
        ],
        "events_sd": [
            0.3791336715221405,
            0.3249945342540741,
            0.3320983946323395,
            0.18722516298294067,
            0.35046276450157166,
            0.25932908058166504,
            0.4003131091594696,
            0.014336027204990387,
            0.5279378294944763,
            0.34515222907066345
        ],
        "augmentation_config": {
            "shift_propability": 0.7,
            "shift_percent": 0.25,
            "polygon_scale_range" : {
                "Syto" : 0.01,
                "Singlets" : 0.01,
                "Intact" : 0.05,
                "CD19" : 0.15,
                "Blasts_CD45CD10" : 0.3,
                "Blasts_CD20CD10" : 0.3,
                "Blasts_CD38CD10" : 0.3
            },
            "scale_propability" : 0.7,
            "scale_propability_2nd_marker" : 0.3
        }
    },
    "train_data":  {}, //dataset for training
    "validation_data": {}, //dataset for validation
    "model_storage": {
        "file_path": "./data/saved_models/train_vie14_val_bln",
        "load_stats_from_file": false,
        "gpu_name" :"cuda"
    },
    "train_params": {
        "learning_rate": 0.001,
        "weight_decay": 0.00000000000001,
        "validation_interval": 50,
        "n_training_epochs": 1500,
        "training_batchsize": 2,
        "clip_norm": 4.0,
        "random_seed": 42,
        "polygon_loss_weight": 1.0,
        "saving_interval": 50,
        "use_auxiliary_loss" : true
    },
    "model_factory": {
        "model_type": "src.model.FlowGATR.FlowGATR",
        "params_type": "src.datastructures.configs.modelparams.ModelParams",
        "params": {
            "dim_input": 10,
            "n_hidden_layers_ISAB": 2,
            "n_hidden_layers_decoder": 2,
            "n_obj_queries": 7,
            "points_per_query" : 5,
            "dim_latent": 36,
            "n_polygon_out": 20,
            "n_decoder_cross_att_heads": 6,
            "n_hidden_layers_polygon_out": 2,
            "n_perciever_blocks_decoder" : 4
        }
    },
    "wandb_config": {
        "entity": "your_wandb_username",
        "prj_name": "fcm-polygon-pred",
        "notes": "",
        "tags": [],
        "enabled": true //wheter training is logged to wandb or not
    },
    "gpu_name": "cuda",
    "n_workers": 0
}
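The polygon settings in the config (`gate_polygon_interpolation_length`, `gate_polygon_seq_length`) map polygons of varying vertex count onto fixed-length point sequences. One way to do that is even resampling along the perimeter; this sketch is illustrative and may differ from the repository's interpolation code:

```python
import numpy as np

def resample_polygon(poly: np.ndarray, n_points: int) -> np.ndarray:
    """Resample a closed polygon to `n_points` vertices, evenly spaced
    along its perimeter.

    `poly` is an (n, 2) array of vertices; the last vertex is implicitly
    connected back to the first."""
    closed = np.vstack([poly, poly[:1]])              # close the ring
    seg = np.diff(closed, axis=0)
    seg_len = np.hypot(seg[:, 0], seg[:, 1])
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    # sample arc lengths evenly, excluding the duplicated closing point
    target = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    x = np.interp(target, cum, closed[:, 0])
    y = np.interp(target, cum, closed[:, 1])
    return np.stack([x, y], axis=1)
```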

Test Model

test.py evaluates the performance of a trained model on a given test set.
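To turn predicted polygons into cell-wise labels for evaluation, an event can be assigned to a population if it lies inside the corresponding gate polygon, just as in manual gating. A self-contained ray-casting sketch (the actual evaluation code lives in test.py):

```python
import numpy as np

def events_in_gate(events_2d: np.ndarray, polygon: np.ndarray) -> np.ndarray:
    """Boolean mask of events falling inside a (predicted) gate polygon,
    via the standard ray-casting point-in-polygon test.

    `events_2d` is (n, 2); `polygon` is (m, 2) with an implicitly
    closed boundary."""
    x, y = events_2d[:, 0], events_2d[:, 1]
    inside = np.zeros(len(events_2d), dtype=bool)
    m = len(polygon)
    for i in range(m):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % m]
        # does this edge straddle the horizontal line through each event?
        crosses = (y1 > y) != (y2 > y)
        # x-coordinate where the edge intersects that line
        denom = np.where(y2 != y1, y2 - y1, 1.0)  # avoid division by zero
        xint = x1 + (y - y1) * (x2 - x1) / denom
        inside ^= crosses & (x < xint)
    return inside
```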

Data

The vie14, bln, and bue datasets from our work can be downloaded here: https://flowrepository.org/id/FR-FCM-ZYVT

Cite

If you use this project, please consider citing our work:

@inproceedings{kowarsch2022towards,
  title={Towards Self-explainable Transformers for Cell Classification in Flow Cytometry Data},
  author={Kowarsch, Florian and Weijler, Lisa and W{\"o}dlinger, Matthias and Reiter, Michael and Maurer-Granofszky, Margarita and Schumich, Angela and Sajaroff, Elisa O and Groeneveld-Krentz, Stefanie and Rossi, Jorge G and Karawajew, Leonid and others},
  booktitle={Interpretability of Machine Intelligence in Medical Image Computing: 5th International Workshop, iMIMIC 2022, Held in Conjunction with MICCAI 2022, Singapore, Singapore, September 22, 2022, Proceedings},
  pages={22--32},
  year={2022},
  organization={Springer}
}
