
staging-client-java

A cancer staging client library for Java applications.

Supported staging algorithms

Pediatric Data Collection System (PDCS or Pediatric)

Pediatric Data Collection System (Pediatric) is a set of three data items that describe how far a cancer has spread at the time of diagnosis for Pediatric cancers. PDCS can be collected for cases diagnosed in 2018 and later.

In each Pediatric schema, valid values, definitions, and registrar notes are provided for

  • Pediatric Primary Tumor
  • Pediatric Lymph Nodes
  • Pediatric Mets
  • Site-Specific Data Items (SSDIs)

For cancer cases diagnosed January 1, 2024 and later, the NCI SEER program will collect the Pediatric Data Collection System fields. The schemas have been developed to be compatible with the Toronto Staging v1.1 definitions.

To get started using the Pediatric staging algorithm, instantiate a Staging instance:

Staging staging = Staging.getInstance(PediatricDataProvider.getInstance(PediatricVersion.LATEST));

If a specific version is needed, the algorithm zip file can be downloaded and initialized using an ExternalStagingFileDataProvider.

Version  Release  Algorithm ZIP
1.1      11.2.2   pediatric-1.1.zip
1.0      11.0.0   pediatric-1.0.zip
0.5      10.2.0   toronto-0.5.zip
0.4      10.1.0   toronto-0.4.zip
0.3      10.0.0   toronto-0.3.zip
0.2      9.1      toronto-0.2.zip

EOD

Extent of Disease (EOD) is a set of three data items that describe how far a cancer has spread at the time of diagnosis. EOD 2018 is effective for cases diagnosed in 2018 and later.

In each EOD schema, valid values, definitions, and registrar notes are provided for

  • EOD Primary Tumor
  • EOD Lymph Nodes
  • EOD Mets
  • Summary Stage 2018
  • Site-Specific Data Items (SSDIs), including grade, pertinent to the schema

For cancer cases diagnosed January 1, 2018 and later, the NCI SEER program will collect Extent of Disease (EOD) revised for 2018 and Summary Stage 2018. The schemas have been developed to be compatible with the AJCC 8th Edition chapter definitions.

All the standard setting organizations will collect the predictive and prognostic factors through Site Specific Data Items (SSDIs). Unlike the SSFs, these data items have formats and code structures specific to the data item.

To get started using the EOD algorithm, instantiate a Staging instance:

Staging staging = Staging.getInstance(EodDataProvider.getInstance(EodVersion.LATEST));

If a specific version is needed, the algorithm zip file can be downloaded and initialized using an ExternalStagingFileDataProvider.

Version  Release  Algorithm ZIP
3.1      10.3.0   eod_public-3.1.zip
3.0      10.0.0   eod_public-3.0.zip
2.1      8.0      eod_public-2.1.zip

TNM

TNM is a widely accepted system of cancer staging. TNM stands for Tumor, Nodes, and Metastasis. T is assigned based on the extent of involvement at the primary tumor site, N for the extent of involvement in regional lymph nodes, and M for distant spread. Clinical TNM is assigned prior to treatment and pathologic TNM is assigned based on clinical information plus information from surgery. The clinical TNM and the pathologic TNM values are summarized as clinical stage group or pathologic stage group.

For each cancer site, or schema, valid values, definitions, and registrar notes are provided for clinical TNM and stage group, pathologic TNM and stage group, and relevant Site-Specific Factors (SSFs).

TNM categories, stage groups, and definitions are based on the Union for International Cancer Control (UICC) TNM 7th edition classification. UICC 7th edition and AJCC 7th edition TNM categories and stage groups are very similar; however, there are some differences.

For diagnosis years 2016-2017, SEER Summary Stage 2000 is required. SEER Summary Stage 2000 should be collected manually unless the registry is collecting Collaborative Stage, which would derive Summary Stage 2000.

To get started using the TNM algorithm, instantiate a Staging instance:

Staging staging = Staging.getInstance(TnmDataProvider.getInstance(TnmVersion.LATEST));

If a specific version is needed, the algorithm zip file can be downloaded and initialized using an ExternalStagingFileDataProvider.

Version  Release  Algorithm ZIP
2.0      10.0.0   tnm-2.0.zip
1.9      8.0      tnm-1.9.zip

Collaborative Staging

Collaborative Stage is a unified data collection system designed to provide a common data set to meet the needs of all three staging systems (TNM, SEER EOD, and SEER SS). It provides a comprehensive system to improve data quality by standardizing rules for timing, clinical and pathologic assessments, and compatibility across all the systems for all cancer sites.

To get started using the CS algorithm, instantiate a Staging instance:

Staging staging = Staging.getInstance(CsDataProvider.getInstance(CsVersion.LATEST));

Version   Release  Algorithm ZIP
02.05.50  10.0.0   cs-02.05.50.zip

Download

Java 8 is the minimum version required to use the library.

If you are interested in just the library without any bundled algorithm, it can be included with the following.

Maven

<dependency>
    <groupId>com.imsweb</groupId>
    <artifactId>staging-client-java</artifactId>
    <version>x.x.x</version>
</dependency>

Gradle

implementation 'com.imsweb:staging-client-java:x.x.x'

If you are interested in a specific algorithm, you can include them using their specific artifact.

Maven

<dependency>
    <groupId>com.imsweb</groupId>
    <artifactId>staging-client-java-cs</artifactId>
    <version>x.x.x</version>
</dependency>
<dependency>
    <groupId>com.imsweb</groupId>
    <artifactId>staging-client-java-eod-public</artifactId>
    <version>x.x.x</version>
</dependency>
<dependency>
    <groupId>com.imsweb</groupId>
    <artifactId>staging-client-java-tnm</artifactId>
    <version>x.x.x</version>
</dependency>
<dependency>
    <groupId>com.imsweb</groupId>
    <artifactId>staging-client-java-pediatric</artifactId>
    <version>x.x.x</version>
</dependency>

Gradle

implementation 'com.imsweb:staging-client-java-cs:x.x.x'
implementation 'com.imsweb:staging-client-java-eod-public:x.x.x'
implementation 'com.imsweb:staging-client-java-tnm:x.x.x'
implementation 'com.imsweb:staging-client-java-pediatric:x.x.x'

Usage

More detailed documentation can be found in the Wiki

Get a Staging instance

Everything starts with getting an instance of the Staging object. There are DataProvider objects for each staging algorithm and version. The Staging object is thread safe and cached so subsequent calls to Staging.getInstance() will return the same object.

For example, for the Collaborative Staging algorithm, the call will look like this:

Staging staging = Staging.getInstance(CsDataProvider.getInstance(CsVersion.LATEST));

There could be times when you want to load either a private algorithm or even an older version of an existing algorithm. You can get the algorithm zip file from the release page and load it using ExternalStagingFileDataProvider.

Path path = Paths.get("C:/path/to/algorithm", "tnm-1.9.zip");
try (InputStream is = Files.newInputStream(path)) {
    Staging staging = Staging.getInstance(new ExternalStagingFileDataProvider(is));

    // use staging instance
}

Schemas

Schemas represent sets of specific staging instructions. Determining the schema to use for staging is based on primary site, histology and sometimes additional discriminator values. Schemas include the following information:

  • schema identifier (e.g. "prostate")
  • algorithm identifier (e.g. "cs")
  • algorithm version (e.g. "02.05.50")
  • name
  • title, subtitle, description and notes
  • schema selection criteria
  • input definitions describing the data needed for staging
  • list of table identifiers involved in the schema
  • a list of initial output values set at the start of staging
  • a list of mappings which represent the logic used to calculate staging output

To get a list of all schema identifiers,

Set<String> schemaIds = staging.getSchemaIds();

To get a single schema by identifier,

Schema schema = staging.getSchema("prostate");

Tables

Tables represent the building blocks of the staging instructions specified in schemas. Tables are used to define schema selection criteria, input validation and staging logic. Tables include the following information:

  • table identifier (e.g. "ajcc7_stage")
  • algorithm identifier (e.g. "cs")
  • algorithm version (e.g. "02.05.50")
  • name
  • title, subtitle, description, notes and footnotes
  • list of column definitions
  • list of table data

To get a list of all table identifiers,

Set<String> tableIds = staging.getTableIds();

That list will be quite large. To get a list of table identifiers involved in a particular schema,

Set<String> tableIds = staging.getInvolvedTables("prostate");

To get a single table by identifier,

Table table = staging.getTable("ajcc7_stage");

Lookup a schema

A common operation is to look up a schema based on primary site, histology and optionally one or more discriminators. Each staging algorithm has a SchemaLookup object customized for the specific inputs needed to lookup a schema.

For Collaborative Staging, use the CsSchemaLookup object (each algorithm has its own lookup class). Here is a lookup based on site and histology.

List<Schema> lookup = staging.lookupSchema(new CsSchemaLookup("C629", "9231"));
assertEquals(1, lookup.size());
assertEquals("testis", lookup.get(0).getId());

If the call returns a single result, then it was successful. If it returns more than one result, then it needs a discriminator. Information about the required discriminator is included in the list of results. In the Collaborative Staging example, the field ssf25 is always used as the discriminator. Other staging algorithms may use different sets of discriminators that can be determined based on the result.

// do not supply a discriminator
List<Schema> lookup = staging.lookupSchema(new CsSchemaLookup("C111", "8200"));
assertEquals(2, lookup.size());
for (Schema schema : lookup)
    assertTrue(schema.getSchemaDiscriminators().contains(CsStagingData.SSF25_KEY));

// supply a discriminator
lookup = staging.lookupSchema(new CsSchemaLookup("C111", "8200", "010"));
assertEquals(1, lookup.size());
assertEquals("nasopharynx", lookup.get(0).getId());
assertEquals(Integer.valueOf(34), lookup.get(0).getSchemaNum());

Calculate stage

Staging a case requires first knowing which schema you are working with. Once you have the schema, you can tell which fields (keys) need to be collected and supplied to the stage method call.

A StagingData object is used to make staging calls. All inputs to staging should be set on the StagingData object and the staging call will add the results. The results include:

  • output - all output values resulting from the calculation
  • errors - a list of errors and their descriptions
  • path - an ordered list of the tables that were used in the calculation

Each algorithm has a specific StagingData entity which helps with preparing and evaluating staging calls. For Collaborative Staging, use CsStagingData. One difference between this library and the original Collaborative Stage library is that you no longer have to pass all 25 site-specific factors for every staging call. Only include the ones that are used in the schema being staged.

CsStagingData data = new CsStagingData();
data.setInput(CsInput.PRIMARY_SITE, "C680");
data.setInput(CsInput.HISTOLOGY, "8000");
data.setInput(CsInput.BEHAVIOR, "3");
data.setInput(CsInput.GRADE, "9");
data.setInput(CsInput.DX_YEAR, "2013");
data.setInput(CsInput.CS_VERSION_ORIGINAL, "020550");
data.setInput(CsInput.TUMOR_SIZE, "075");
data.setInput(CsInput.EXTENSION, "100");
data.setInput(CsInput.EXTENSION_EVAL, "9");
data.setInput(CsInput.LYMPH_NODES, "100");
data.setInput(CsInput.LYMPH_NODES_EVAL, "9");
data.setInput(CsInput.REGIONAL_NODES_POSITIVE, "99");
data.setInput(CsInput.REGIONAL_NODES_EXAMINED, "99");
data.setInput(CsInput.METS_AT_DX, "10");
data.setInput(CsInput.METS_EVAL, "9");
data.setInput(CsInput.LVI, "9");
data.setInput(CsInput.AGE_AT_DX, "060");
data.setSsf(1, "020");

// perform the staging
staging.stage(data);

assertEquals(Result.STAGED, data.getResult());
assertEquals("urethra", data.getSchemaId());
assertEquals(0, data.getErrors().size());
assertEquals(37, data.getPath().size());

// check output
assertEquals("129", data.getOutput(CsOutput.SCHEMA_NUMBER));
assertEquals("020550", data.getOutput(CsOutput.CSVER_DERIVED));

// AJCC 6
assertEquals("T1", data.getOutput(CsOutput.AJCC6_T));
assertEquals("c", data.getOutput(CsOutput.AJCC6_TDESCRIPTOR));
assertEquals("N1", data.getOutput(CsOutput.AJCC6_N));
assertEquals("c", data.getOutput(CsOutput.AJCC6_NDESCRIPTOR));
assertEquals("M1", data.getOutput(CsOutput.AJCC6_M));
assertEquals("c", data.getOutput(CsOutput.AJCC6_MDESCRIPTOR));
assertEquals("IV", data.getOutput(CsOutput.AJCC6_STAGE));
assertEquals("10", data.getOutput(CsOutput.STOR_AJCC6_T));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC6_TDESCRIPTOR));
assertEquals("10", data.getOutput(CsOutput.STOR_AJCC6_N));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC6_NDESCRIPTOR));
assertEquals("10", data.getOutput(CsOutput.STOR_AJCC6_M));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC6_MDESCRIPTOR));
assertEquals("70", data.getOutput(CsOutput.STOR_AJCC6_STAGE));

// AJCC 7
assertEquals("T1", data.getOutput(CsOutput.AJCC7_T));
assertEquals("c", data.getOutput(CsOutput.AJCC7_TDESCRIPTOR));
assertEquals("N1", data.getOutput(CsOutput.AJCC7_N));
assertEquals("c", data.getOutput(CsOutput.AJCC7_NDESCRIPTOR));
assertEquals("M1", data.getOutput(CsOutput.AJCC7_M));
assertEquals("c", data.getOutput(CsOutput.AJCC7_MDESCRIPTOR));
assertEquals("IV", data.getOutput(CsOutput.AJCC7_STAGE));
assertEquals("100", data.getOutput(CsOutput.STOR_AJCC7_T));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC7_TDESCRIPTOR));
assertEquals("100", data.getOutput(CsOutput.STOR_AJCC7_N));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC7_NDESCRIPTOR));
assertEquals("100", data.getOutput(CsOutput.STOR_AJCC7_M));
assertEquals("c", data.getOutput(CsOutput.STOR_AJCC7_MDESCRIPTOR));
assertEquals("700", data.getOutput(CsOutput.STOR_AJCC7_STAGE));

// Summary Stage
assertEquals("L", data.getOutput(CsOutput.SS1977_T));
assertEquals("RN", data.getOutput(CsOutput.SS1977_N));
assertEquals("D", data.getOutput(CsOutput.SS1977_M));
assertEquals("D", data.getOutput(CsOutput.SS1977_STAGE));
assertEquals("L", data.getOutput(CsOutput.SS2000_T));
assertEquals("RN", data.getOutput(CsOutput.SS2000_N));
assertEquals("D", data.getOutput(CsOutput.SS2000_M));
assertEquals("D", data.getOutput(CsOutput.SS2000_STAGE));
assertEquals("7", data.getOutput(CsOutput.STOR_SS1977_STAGE));
assertEquals("7", data.getOutput(CsOutput.STOR_SS2000_STAGE));

About SEER

The Surveillance, Epidemiology and End Results (SEER) Program is a premier source for cancer statistics in the United States. The SEER Program collects information on incidence, prevalence and survival from specific geographic areas representing 28 percent of the US population and reports on all these data plus cancer mortality data for the entire country.

staging-client-java's People

Contributors

ctmay4, depryf


staging-client-java's Issues

Add TNM 1.1

TNM 1.1 was released. Need to add it to the library. The library will only support the latest TNM 1.x version so 1.0 will be removed as well.

EOD Algorithm lookup input issue

When building a SchemaLookup object for the EOD library, some schemas require the "behavior" field as a discriminator, but the EodSchemaLookup class does not accept this field as an input key. When "behavior" is the deciding factor between two or more schemas, how are we supposed to narrow the lookup to a single EOD schema? The lookup currently cannot return a single result in these scenarios.

One such scenario is: site=C717, histology=9591, and any value for discriminator 1.

I noticed that the "sex" field is an allowed key, but not "behavior".

Allow invalid inputs to fail staging

In the INPUT declaration, we need the ability to say that if a field has an invalid value (not in specified table or outside of specified value range), then execution of staging should not occur.

I am going to add an option on the input:

"fail_on_invalid" = true

If not specified, staging will continue.
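
A minimal sketch of how such a flag might gate staging. The InputDef and InvalidInputGate classes are hypothetical and are not part of the library; only the "fail_on_invalid" option name comes from the proposal above. Inputs that are not supplied are treated as blank in this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical input definition: key, allowed values and the proposed flag
final class InputDef {
    final String key;
    final Set<String> allowed;
    final boolean failOnInvalid;

    InputDef(String key, boolean failOnInvalid, String... allowed) {
        this.key = key;
        this.failOnInvalid = failOnInvalid;
        this.allowed = new HashSet<>(Arrays.asList(allowed));
    }
}

final class InvalidInputGate {
    // Returns false when an input flagged fail_on_invalid holds a value outside its allowed set
    static boolean shouldStage(List<InputDef> defs, Map<String, String> input) {
        for (InputDef def : defs) {
            String value = input.getOrDefault(def.key, "");
            if (def.failOnInvalid && !def.allowed.contains(value))
                return false;
        }
        return true;
    }
}
```

Inputs without the flag never block staging here, which matches the "if not specified, staging will continue" default.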

Split algorithms from library

I don't think the library releases should be bound to the algorithm releases which happen way more often. My proposal is to remove the algorithms from the library and create a new repository for each one. We would have:

  • staging-client-java which will be this repository but with no data
  • staging-client-java-cs will contain only the CS algorithm
  • staging-client-java-tnm will contain only the TNM algorithm
  • staging-client-java-eod will contain only the EOD algorithm

The algorithm projects will depend on the main project so if you only want CS you can include the dependency staging-client-java-cs and get the algorithm and the library.

Should null discriminators be ignored for schema lookup

In version 2.11 the schema selection process was changed to only use the properties you passed in. It was not intended to affect CS and TNM, but instead be an improvement for EOD. Here are the changes.

66f145f#diff-24f9a47518db825893ecd24e071f8a36

The difference is the following lookup on CS:

{
	"hist": "8000",
	"site": "C240",
	"ssf25": null
}

will return no matches. However, not including "ssf25"

{
	"hist": "8000",
	"site": "C240"
}

returns 3 possible schemas and is how it used to work.

If we want to fix this the solution is simple. Don't include keys that have a NULL value in the lookup. Otherwise the callers will just need to know to stop passing in NULL values for any field involved in schema lookup. They won't affect staging after the lookup step.
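
As a rough illustration of the "don't include keys that have a NULL value" fix, the lookup inputs could be copied without their null entries before matching. LookupKeys and withoutNulls are hypothetical names for this sketch, not library methods.

```java
import java.util.HashMap;
import java.util.Map;

final class LookupKeys {
    // Copies the lookup inputs, dropping any key whose value is null
    static Map<String, String> withoutNulls(Map<String, String> inputs) {
        Map<String, String> cleaned = new HashMap<>();
        for (Map.Entry<String, String> entry : inputs.entrySet()) {
            if (entry.getValue() != null)
                cleaned.put(entry.getKey(), entry.getValue());
        }
        return cleaned;
    }
}
```

With this in place, the first lookup above (with "ssf25" set to null) would be reduced to the second one before schema selection runs.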

How should INPUT values with dashes that are not ranges be handled

This issue began with the CS breast schema. In the "lymph_nodes_positive_axillary_node_xcy.json" table, there are some invalid ranges. For example:

 ["720", "000", "VALUE:N1c", "VALUE:N1c"],
 ["748", "000", "VALUE:N2b", "VALUE:N2b"],
 ["250", "001-0003", "VALUE:N1a", "VALUE:N1a"],     <--- left and right sides of ranges do not match length
 ["258", "004-009", "VALUE:N2a", "VALUE:N2a"],
 ["260", "004-009", "VALUE:N2a", "VALUE:N2a"],

The rule of ranges is that they have to be the same length on both sides. In the best case scenario, I should be throwing an exception when I encounter this. The reason I don't is because we have tables like this:

  "id": "ajcc6_n_codes",
  "last_modified": "2014-11-07T18:15:47.798Z",
  "version": "02.05.50",
  "algorithm": "cs",
  "rows": [
    ["N0", "VALUE:00", "N0"],
    ["N0(i-)", "VALUE:01", "N0(i-)"],  <--- this is NOT a range, but has a dash in the value

Where the "real" values have a "-" in them. I have no easy way of telling whether it is an error or if it is a value with a dash. So if the two sides do not match in size, I just make the assumption that the value is not a range. So in your case above I assumed it to be a single value of "001-0003".

It seems we need to come up with a way to "escape" dashes in INPUT values so I can throw an exception when the sides of the range do not match.
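
The heuristic described above can be sketched as follows. This is an illustration under the stated rule (a cell is a range only when it has a single dash and both sides have the same length), not the library's actual parser. InputCell, isRange and matches are hypothetical names.

```java
final class InputCell {
    // A cell is treated as a range only if it has exactly one dash and equal-length sides
    static boolean isRange(String cell) {
        int dash = cell.indexOf('-');
        if (dash <= 0 || dash != cell.lastIndexOf('-'))
            return false;
        return dash == cell.length() - dash - 1;
    }

    // Range cells match values between low and high; anything else is an exact match
    static boolean matches(String cell, String value) {
        if (isRange(cell)) {
            int dash = cell.indexOf('-');
            String low = cell.substring(0, dash);
            String high = cell.substring(dash + 1);
            // String comparison is enough for equal-length, zero-padded codes
            return value.length() == low.length()
                    && value.compareTo(low) >= 0
                    && value.compareTo(high) <= 0;
        }
        return cell.equals(value);
    }
}
```

Under this rule "001-0003" silently becomes a literal value, which is exactly the ambiguity that an escape syntax for dashes would remove.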

Will this library and the SEER API support the UICC 8th edition of TNM?

If I understand correctly, this library and the SEER API currently output UICC 7th edition TNM for the following NAACCR variables:

TNM CLIN T NAACCR 940
TNM CLIN N NAACCR 950
TNM CLIN M NAACCR 960

TNM PATH T NAACCR 880
TNM PATH N NAACCR 890
TNM PATH M NAACCR 900

Will this library and the SEER API support the UICC 8th edition of TNM? If so, will the 8th edition be bound only to the following variables:

AJCC TNM CLIN T NAACCR 1001
AJCC TNM CLIN N NAACCR 1002
AJCC TNM CLIN M NAACCR 1003

AJCC TNM PATH T NAACCR 1011
AJCC TNM PATH N NAACCR 1012
AJCC TNM PATH M NAACCR 1013

Add support for external algorithm versions

The library currently only supports the algorithm versions that are bundled with the release. It also needs to support algorithm versions maintained outside the release. There needs to be a registerAlgorithmVersion method which takes a set of staging schemas and tables and adds them for use within the library. I'm not sure what the best format would be. Options include:

  1. A directory which matches how the files are stored internally. There would be two sub-directories: schemas and tables.
  2. A zip file with the same information and directory structure as the first option. Performance constraints would probably require the zip file to be exploded so maybe the first option is just easier.

There should also be a deregisterAlgorithmVersion method.

Change how missing context keys are matched

Currently, if an INPUT cell is blank, it only matches a blank input value. For regular input values this is not an issue, since an input that is not supplied is defaulted to blank at the start of staging. However, intermediate values used during the staging process are not handled the same way. If the input is not in the context, it does not match a table that is looking for a blank value for that input.

This will change that and now missing keys will match blank INPUT cells since keys not in the context are handled exactly the same as blank.
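
A small sketch of the new rule, assuming the context is a plain map of key to value. ContextMatcher is a hypothetical class and only handles exact-match cells.

```java
import java.util.Map;

final class ContextMatcher {
    // A key missing from the context is read as blank (empty string)
    static String valueOf(Map<String, String> context, String key) {
        String value = context.get(key);
        return value == null ? "" : value;
    }

    // True when the INPUT cell equals the (possibly defaulted) context value
    static boolean matches(String cell, Map<String, String> context, String key) {
        return cell.equals(valueOf(context, key));
    }
}
```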

Change <blank> validation

Currently, if an input does not have a default value and it is either:

  • not supplied
  • supplied as blank

then it will fail validation unless the associated table has a matching blank row. Some inputs are not even used for staging and will produce errors when not supplied. I think there should be a difference between not supplying a value and supplying a value that is not in the table. I propose the following changes:

  1. By default, once the default values have been set, do not run validation on fields that are blank.
  2. Add an override to an input that specifies that it should not always allow blank. Perhaps allow_blank = 'false'. By default, allow_blank would be assumed to be true.
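
The two proposed changes could combine along these lines. BlankValidation is a hypothetical helper; the allowBlank parameter stands in for the proposed allow_blank option, and tableValues for the values listed in the input's table.

```java
import java.util.Set;

final class BlankValidation {
    // Blank values skip validation unless allow_blank is false;
    // in that case a blank is only valid if the table has a blank row
    static boolean isValid(String value, Set<String> tableValues, boolean allowBlank) {
        boolean blank = value == null || value.trim().isEmpty();
        if (blank)
            return allowBlank || tableValues.contains("");
        return tableValues.contains(value);
    }
}
```

A non-blank value that is not in the table still fails, so the distinction between "not supplied" and "supplied but wrong" is preserved.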

Should default output be included for invalid input?

There are times when there is enough input to calculate a schema, however we don't process anything in staging because the inputs are not valid. The question is should the library return the outputs with default values in that case?

For example, EOD input that looks like this:

{
	"site": "C713",
	"hist": "8020",
	"behavior": "3",
	"eod_primary_tumor": "200",
	"eod_regional_nodes": "300",
	"eod_mets": "00",
	"year_dx": "2018"
}

returns the following:

{
	"result": "FAILED_INVALID_INPUT",
	"schema_id": "brain",
	"input": {
		"site": "C713",
		"hist": "8020",
		"behavior": "3",
		"eod_primary_tumor": "200",
		"eod_regional_nodes": "300",
		"eod_mets": "00",
		"year_dx": "2018"
	},
	"errors": [
		{
			"type": "INVALID_REQUIRED_INPUT",
			"table": "extension_bcc",
			"key": "eod_primary_tumor",
			"message": "Invalid 'eod_primary_tumor' value (200)"
		},
		{
			"type": "INVALID_REQUIRED_INPUT",
			"table": "nodes_dna",
			"key": "eod_regional_nodes",
			"message": "Invalid 'eod_regional_nodes' value (300)"
		}
	]
}

There is no output. However the brain schema does have default output values that could be returned.

Should they be returned?

Improve error messages

When an ERROR endpoint is reached, we display a very generic message if the endpoint itself does not specify one:

Processing resulted in an error
Processing resulted in an error
Processing resulted in an error
The input mapping 'ss77' does not exist for ss_codes
Processing resulted in an error
The input mapping 'ss2000' does not exist for ss_codes

The old C DLL displayed more detailed error messages

Lookup of codes (Tispu, N1, M1) in AJCC TNM 6 Stage returns "ERROR".
AJCC stage lookup failed.
Lookup of codes (Tispu, N1, M1) in AJCC TNM 7 Stage returns "ERROR".
AJCC stage lookup failed.
Lookup of codes (IS, RN, D) in Summary Stage returns "ERROR".
SEER Summary Stage 1977 lookup failed.
Lookup of codes (IS, RN, D) in Summary Stage returns "ERROR".
SEER Summary Stage 2000 lookup failed.

The library should improve this by adding the table and inputs like the old library provided.

Add column to the information returned when there is a staging error

From Nicki:

For the diagnosis years 2004-2009, AJCC 7 was not in effect, so values for these fields are not returned. However ERRORs for these fields ARE returned. They show up in SEER*DMS on the staging page in the red 'error' tag. They do not get stored as polisher errors, nor do they show up on the Edits tab on the right pane.
But they do look like something is wrong when it really isn't.

It appears that in the CS staging DLL from CDC, they didn't even attempt 7th for those diagnosis years. But the way we structured the API, all calculations occur at one time and we blank out the unnecessary data at the end. I'd like to have the ability to ignore the AJCC 7th related errors, but I don't have enough information to identify them.

The failures fall into two main types of errors.

First, match not found in table with AJCC 7 in name, examples are:

  • Error: Match not found in table ajcctnm7_stage_adenocarcinoma_xib
  • Error: Match not found in table ajcc7_stage_ubq

I can identify which of these can be ignored based on the table name containing ajcctnm7 or ajcc7. The output field is ajcc7_stage.

Second, matching resulted in an error in table in the AJCC 7th column, examples are:

  • Error: Matching resulted in an error in table extension_baj
  • Error: Matching resulted in an error in table nodes_dev
  • Error: Matching resulted in an error in table mets_hae
  • Error: Matching resulted in an error in table extension_size_high_risk_xfy
  • Error: Matching resulted in an error in table lymph_nodes_pathologic_evaluation_xcw

I can't identify which of these can be ignored and which are real problems solely on table name. The output fields I care about are ajcc7_t, ajcc7_n, ajcc7_m; but one would also see ajcc6_t, ajcc6_n, ajcc6_m, t2000, n2000, m2000, t77, n77, m77.

I would like a way to identify all the errors that can be ignored for 2004-2009. Current options seem to be:

  1. Have Katherine write a script to put AJCC 7, AJCC 6, SS2000 or SS1977 after ERROR: so that the final messages would be like:

    • Error: AJCC 7 Match not found in table ajcctnm7_stage_adenocarcinoma_xib
    • Error: AJCC 7 Matching resulted in an error in table mets_hae

    However, I'm not keen on having to change the version (020550).

  2. Have Chuck change the returned items so that the data item the error relates to is included. That would be more like (I don't know everything being returned)

    • Error: Match not found in table ajcctnm7_stage_adenocarcinoma_xib | ajcc7_stage
    • Error: Matching resulted in an error in table mets_hae | ajcc7_m

We don't want to break anything else doing this. While it isn't necessary for TNM or EOD, because they do the same calculations for all diagnosis years they apply to, it shouldn't hurt to have the information. But it would be one more element than before.

How much work would it be, and how dangerous would it be, to implement the 2nd solution?
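
To make the second option concrete, here is a sketch of what a caller could do once each error carries the output key it relates to. StageError and ErrorFilter are hypothetical classes for illustration, and the "ajcc7_" prefix check is an assumption based on the output field names mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error record with the extra "key" element from option 2
final class StageError {
    final String table;
    final String key;
    final String message;

    StageError(String table, String key, String message) {
        this.table = table;
        this.key = key;
        this.message = message;
    }
}

final class ErrorFilter {
    // Drops AJCC 7 errors for diagnosis years 2004-2009; keeps everything else
    static List<StageError> relevant(List<StageError> errors, int dxYear) {
        boolean ajcc7NotInEffect = dxYear >= 2004 && dxYear <= 2009;
        List<StageError> kept = new ArrayList<>();
        for (StageError error : errors) {
            if (ajcc7NotInEffect && error.key != null && error.key.startsWith("ajcc7_"))
                continue;
            kept.add(error);
        }
        return kept;
    }
}
```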

TNM8

Hi Team,

Great API and tool that gives a simplified way to access and assess different systems and versions...

I was wondering, if there is a place we can download the new algorithms like TNM 8 or TNM 9...

Reason for the ask: TNM 7 supports the DX year till 2017 and the latest DX Years are not supported ...

It will be a great help if we can explore for TNM 8 and TNM 9

More structured output specifications

Schema definitions currently give detailed information about all inputs. The output that comes back from staging calls is not currently documented in any way. There are a few issues with this:

  1. Users of the staging call may not understand all the outputs that are produced.
  2. There are no tables associated with outputs to describe the values.
  3. Schemas may produce temporary outputs needed in the staging calculation that do not need to be returned as output.

I think the solution is to add a list of outputs to the schema. Each output would contain:

  • key
  • name
  • table (optional)

There are other possibilities we may want to consider:

  • metadata (this might be useful at some point, but I'm not sure)
  • agency requirement (we have this for inputs, but I'm not sure we want to try to maintain this on the output side)
  • description (some fields may not have a lookup and we might want a longer set of text in addition to the name)

At the end of the staging call only keys listed in the outputs would be included in the results. Some intermediate values might be interesting and could be added to the outputs if needed (like the "s" value in the "Testis" schema).
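
The final filtering step might look something like this, assuming the staging context is a map and the schema exposes its declared output keys as a set. OutputFilter is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class OutputFilter {
    // Keeps only the context entries whose key is declared as a schema output
    static Map<String, String> declaredOnly(Map<String, String> context, Set<String> outputKeys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : context.entrySet()) {
            if (outputKeys.contains(entry.getKey()))
                result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Exposing an intermediate value would then only require adding its key to the schema's output list.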

Support metadata on output fields

The StagingSchemaInput supports metadata. That is used, among other things, to classify a field as a site-specific data item (SSDI). Upcoming versions of EOD need to support the same designation for output fields. StagingSchemaOutput does not currently support metadata. We should add it.

Consider adding support for NAACCR XML IDs

The staging framework uses its own field identifiers.

But it also exposes NAACCR numbers in StagingSchemaInput.naaccrItem for standard NAACCR fields.

Yet the new official way to identify NAACCR items is by NAACCR XML ID, not by NAACCR numbers anymore. The numbers still work as identifier of course, and it's great to have them. But it would be very useful if that input class could also expose the XML IDs since many projects (like the validation library that runs edits) directly use and reference XML IDs, not numbers.

That "naaccrItem" field name is a bit unfortunate; it would be nice to rename it to "naaccrItemNum" or "naaccrItemNumber".

And then maybe add a "naaccrItemId"?

There is a direct mapping between numbers and IDs and so a simple unit test could make sure the IDs are valid and correspond to the numbers.

Collapse rows for internal table structures

Some schemas, like CS prostate, have 1000's of raw rows that could be collapsed to a fraction of that size in the internal representation. This is done by collapsing INPUTs that go to the same ENDPOINT. This would slow down the initialization but potentially give large speed gains in processing. I'm not sure if this optimization is worth the effort, but it would definitely help in certain situations.
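
A simplified sketch of the idea for a table with one INPUT column and one ENDPOINT column, where each row is a two-element array. It assumes rows do not overlap, so grouping by endpoint does not change which endpoint a value reaches. RowCollapser is a hypothetical class, not the library's internal representation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowCollapser {
    // Groups INPUT values (row[0]) by the ENDPOINT they lead to (row[1])
    static Map<String, List<String>> collapse(List<String[]> rows) {
        Map<String, List<String>> byEndpoint = new LinkedHashMap<>();
        for (String[] row : rows)
            byEndpoint.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        return byEndpoint;
    }
}
```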

Add TNM algorithm

The actual algorithm is still being developed, but the work needs to start on supporting in this library.

  1. The updater code already works so getting a copy of the algorithm from SEER*API will be trivial.
  2. Need to add some classes and constants to support it.

This will be released as a SNAPSHOT build for a few months while we wait for the algorithm to be finalized and have the library tested in various systems.

Which of the libraries CS, TNM, and/or EOD supplies the NAACCR 2018 SSDI values?

Which of the 3 staging libraries needs to be staged in order to properly extract the allowable values for the schema applicable SSDI fields? There is some documentation suggesting that a number of the SSDI fields were historically recorded as CS fields. But the length of the older CS fields does not appear to match the length of the associated SSDI field.

Handle case of key being an input and an output field

When staging a case, one of the last steps is to remove any fields from the results that are part of the input.

// remove the original input keys from the resulting context;  in addition, we want to remove any input keys
// from the resulting context that were set with a default value; to accomplish this remove all keys that are
// defined as input in the selected schema
for (Entry<String, String> entry : data.getInput().entrySet())
    context.remove(entry.getKey());
for (StagingSchemaInput input : schemas.get(0).getInputs())
    context.remove(input.getKey());

This needs to be adjusted so that it will not remove input fields if they are ALSO defined as an output field. There is a real-world case of this in lymphoma_cll_sll for the upcoming EOD 2.1.
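One possible shape for the adjustment, sketched with plain collections instead of the library's own types. The method name and the idea of passing the input and output key sets explicitly are assumptions; in the library the keys would come from the staging data and the selected schema.

```java
import java.util.Map;
import java.util.Set;

class ContextCleanup {

    // Removes input keys from the context unless the same key is also defined as an output.
    static void removeInputOnlyKeys(Map<String, String> context, Set<String> inputKeys, Set<String> outputKeys) {
        for (String key : inputKeys)
            if (!outputKeys.contains(key))
                context.remove(key);
    }
}
```

With this guard, a key that is both an input and an output stays in the resulting context, while pure inputs are still removed.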

Adding @schusslern

Support multiple field lookups based on year

From @schusslern

So right now there are 4 grade fields defined as SSDIs
(Grade Clin, Grade Path, Grade Post Tx Clin, Grade Post Tx Path)

They are discussing changing the lookups in 2025 and adding a new grade (Grade 2025).

They will then have:

| Field | 2018-2024 lookup | 2025+ lookup |
| --- | --- | --- |
| Grade Clin | All cases, all standard setters | CoC (SEER RC?), AJCC ID != XX |
| Grade Path | All cases, all standard setters | CoC (SEER RC?), AJCC ID != XX |
| Grade Post Tx Clin | All cases, all standard setters | CoC (SEER RC?), AJCC ID != XX |
| Grade Post Tx Path | All cases, all standard setters | CoC (SEER RC?), AJCC ID != XX |
| Grade 2025 | Not collected | All cases, all standard setters |

So while for any given field I can say any one of these things:

  • Grade Clin Year start = 2018, Year end = 2024, Standard setters = SEER, CoC, NPCR, Canada
  • Grade Clin Year start = 2025, Year end = , Standard setters = SEER, CoC, NPCR, Canada
  • Grade Clin Year start = 2025, Year end = , Standard setters = CoC, SEER (RC)

What I can’t do currently is

  • Grade Clin Year start = 2018, Year end = 2024, Standard setters = SEER, CoC, NPCR, Canada with 2018-2024 lookup
  • Grade Clin Year start = 2025, Year end = , Standard setters = CoC, SEER (RC) with 2025+ lookup

Because these are the same field.

I don’t want to make an entirely new EOD library (Where 2018-2024 is in EOD 2018; and the 2025+ is in EOD 2018b) and I don’t want to make a second set of schemas for all schemas so these 4 fields can have 2 different lookups.

As you can see, we have a year before we need to be able to do this, if they do in fact choose to NOT convert the old values into the new coding system.

I think they should convert it, personally, but I’m not sure I’ll be able to convince them. So I want to think about the best way to do this now, so that if we need to make changes to Helios, the API, etc., we have a plan.
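One way this could be modeled is to let a single field carry several lookups, each valid for a range of diagnosis years. The sketch below is only an illustration of that idea: the `LookupRange` class, the `resolveTable` method, and any table IDs passed to it are hypothetical and not part of the current library or API.

```java
import java.util.List;

class YearBasedLookups {

    // Hypothetical: a lookup table reference valid for a range of diagnosis years.
    static final class LookupRange {
        final int yearStart;
        final Integer yearEnd; // null means open-ended
        final String tableId;

        LookupRange(int yearStart, Integer yearEnd, String tableId) {
            this.yearStart = yearStart;
            this.yearEnd = yearEnd;
            this.tableId = tableId;
        }

        boolean covers(int year) {
            return year >= yearStart && (yearEnd == null || year <= yearEnd);
        }
    }

    // Returns the table ID of the first range covering the diagnosis year, or null if none does.
    static String resolveTable(List<LookupRange> ranges, int diagnosisYear) {
        for (LookupRange range : ranges)
            if (range.covers(diagnosisYear))
                return range.tableId;
        return null;
    }
}
```

The same structure could also carry the standard setters for each range, so that year start, year end, standard setters, and lookup travel together for one field.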

Optimize duplicate strings

Some recent profiling has pointed out we have many duplicate String values when algorithm versions are loaded.

In this case, 195MB of memory is wasted when we have many versions loaded. I think this can be fixed by interning the strings when they are loaded. I will do some testing and verify.
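For reference, a small sketch of the deduplication idea using an application-level pool. This is one alternative to calling `String.intern()`; the `StringPool` class is hypothetical, and which approach fits the loading code best would still need the testing mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StringPool {

    private final Map<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for an equal string, registering it on first sight.
    String dedupe(String value) {
        if (value == null)
            return null;
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

Passing every loaded value through `dedupe` means equal strings from different algorithm versions share one instance, and the pool itself can be discarded once loading is complete.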

Support constants in input mapping

Input mapping currently looks like this:

"input_mapping": [
    {  "from": "ajcc7_t", "to": "t" }
]

This example temporarily puts the context value contained in "ajcc7_t" into "t" before processing a table. This allows tables to be reused with different inputs. The need has arisen to use constant values instead of existing keys. This is not difficult; however, a good syntax needs to be decided on. Static strings could be quoted. That would look like this:

"input_mapping": [
    {  "from": "\"constant\"", "to": "t" }
]
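Under that quoted-string syntax, resolving the "from" side could be as simple as the following sketch. The `resolve` method is hypothetical and only illustrates the proposed convention; it is not how the engine currently processes input mappings.

```java
import java.util.Map;

class InputMappingSource {

    // Resolves a "from" entry: a double-quoted string is a constant, anything else is a context key.
    static String resolve(String from, Map<String, String> context) {
        if (from.length() >= 2 && from.startsWith("\"") && from.endsWith("\""))
            return from.substring(1, from.length() - 1);
        return context.get(from);
    }
}
```

One consequence of this design is that a context key can never itself start and end with a double quote, which seems an acceptable restriction.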

Allow context variables in defaults

Every time a new release of an algorithm is made, we need to change the default value for the derived version. For example, when TNM goes from 1.1 to 1.2, the derived_version default needs to change in every schema. Instead of setting it to 1.2, it would be better to be able to set it to a context reference like {{version}}. That way it would automatically use the algorithm version and would never have to be changed. Context references are already used in table processing.

The code that uses the output defaults is in DecisionEngine.

// add all output keys to the context; if no default is supplied, use an empty string
for (Entry<String, ? extends Output> entry : definition.getOutputMap().entrySet())
    context.put(entry.getValue().getKey(), entry.getValue().getDefault() != null ? entry.getValue().getDefault() : "");

I think this fix is as easy as making sure version is added to the available context and using the existing translateValue method in DecisionEngine.

We should probably support this for input defaults as well.
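To make the intended behavior concrete, here is a standalone sketch of resolving {{key}} references in a default value. It does not reproduce the existing translateValue method; the `DefaultResolver` class is hypothetical, and treating unknown keys as empty strings is an assumption.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefaultResolver {

    private static final Pattern REFERENCE = Pattern.compile("\\{\\{(.+?)\\}\\}");

    // Replaces each {{key}} in the default with the context value; unknown keys become empty strings.
    static String resolve(String defaultValue, Map<String, String> context) {
        if (defaultValue == null)
            return "";
        Matcher matcher = REFERENCE.matcher(defaultValue);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = context.get(matcher.group(1).trim());
            if (value == null)
                value = "";
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

With "version" present in the context, a default of {{version}} resolves to the loaded algorithm version, while literal defaults pass through unchanged.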

Null values

Initial context is designed to set initial values in a context at the start of staging or at the start of a mapping. It looks like:

"initial_context": [
   {
      "key": "test",
      "value": "something"
   }
]

What would it mean if the value above was null?

I see two options.

  1. Assume null means empty string.
  2. Assume null means to remove the key from the context.

I prefer the second option since it gives the ability to do something that is not possible right now. If you want to set a value to blank, you can easily do that. However, there is no other way to remove a key from the context while processing a staging call.
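The second option could behave like this sketch, which uses plain maps in place of the engine's context and initial_context types. The class and method names are hypothetical.

```java
import java.util.Map;

class InitialContextApplier {

    // Applies initial_context entries: a null value removes the key, anything else sets it.
    static void apply(Map<String, String> context, Map<String, String> initialContext) {
        for (Map.Entry<String, String> entry : initialContext.entrySet()) {
            if (entry.getValue() == null)
                context.remove(entry.getKey());
            else
                context.put(entry.getKey(), entry.getValue());
        }
    }
}
```

Setting a value to an empty string still blanks the key, so both behaviors remain available.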

Missing EodInput constants

EodInput is supposed to have constants for all inputs with the exception of SSDI fields. There are a few missing as reported by SEER*DMS.

018-03-26 08:15:20,776 INFO  [ValidatorApiSessionBean] Initialized validation engine in 167472ms (total edits loaded: 1893; pre-compiled: 947; compil
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_t is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_t_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_n is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_n_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_m is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] clin_stage_group_direct is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] path_t is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] path_t_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] path_n is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,605 ERROR [EodStageInputDto] path_n_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] path_m is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] path_stage_group_direct is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_t is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_t_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_n is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_n_suffix is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_m is required for TNM Staging but isn't mapped in DTO!
018-03-26 08:16:46,620 ERROR [EodStageInputDto] ypath_stage_group_direct is required for TNM Staging but isn't mapped in DTO!

EOD 1.1

Here is a summary of the changes:

  • Bladder:

    • EOD Primary Tumor codes 130, 200 corrected (added Subserosa to 130, Superficial muscle - inner half indented in 200)
    • Summary Stage 2018 code 1 corrected (added Subserosa)
  • Breast

    • EOD Regional Nodes Note 7 corrected to move last two entries into the list
    • EOD Regional Nodes Note 5 added (Codes 100-200 and 350)
    • EOD Mets Note and Summary Stage 2018 Note 9 corrected (… less than or equal to 0.2 mm are negative …)
    • ER Allred Score[#3828] and PR Allred Score[#3916] Note 3 was modified (bullet added)
    • Lymph Nodes Positive Axillary Level I-II[#3882]: default changed to 8
    • Oncotype DX Recurrence Score - DCIS[#3903]: Note 4 and Note 5 added
    • Oncotype DX Recurrence Score[#3904] Note 4 added
    • Oncotype DX Risk Level-DCIS[#3905]: Note 3 added
  • Cervical Lymph Nodes and Unknown Primary Tumor of Head and Neck: Summary Stage 2018 site list limited to C760. This is the only site code in the schema and is to be used, by agreement, for unknown primaries of the head and neck.

  • Colon and Rectum:

    • EOD Primary Tumor Note 2 revised, Note 3 added
    • Summary Stage 2018 Note 2 revised, Note 3 added
  • Corpus Adenosarcoma; Corpus Sarcoma: Summary Stage 2018 code 0 removed

  • Corpus Sarcoma: FIGO Stage[#3836] given a default of 98

  • Cutaneous Squamous Cell Carcinoma of Head and Neck; Melanoma Skin; Skin Other: Summary Stage 2018 code 3 corrected (C006 added whenever C000-C002 are listed)

  • Genital Male Other:

    • Schema Note 3 corrected for Merkel Cell Skin and Kaposi Sarcoma
    • Summary Stage 2018 Note 3 about C632 was corrected
  • GIST: Summary Stage 2018 Note 2 corrected (C54 changed to C540)

  • HemeRetic: Summary Stage 2018 Note 4 had 9740 removed

  • Kaposi Sarcoma:

    • EOD Primary Tumor Note 2 corrected (Choice of EOD Primary Tumor for Kaposi sarcoma...)
    • Summary Stage 2018 code 3's first bullet corrected (...pathologically positive lymph node(s))
  • Larynx Other: AJCC Chapter Calculation table had behavior added; /2 is not AJCC eligible

  • Lung:

    • EOD Primary Tumor code 550 corrected (Chest wall (thoracic wall) (separate lesion-see SEER mets) removed)
    • EOD Regional Nodes code 700 corrected (Cervical added)
  • Lymphoma and Lymphoma CLL/SLL: B Symptoms[#3812] default value changed to 8

  • Melanoma Conjunctiva: EOD Primary Tumor code 100 corrected (Less than or equal to one quadrant)

  • Melanoma Head and Neck:

    • EOD Primary Tumor and Summary Stage 2018 site code lists revised to match schema selection
    • Summary Stage 2018 had C000-C002, C006 removed
  • Melanoma Skin:

    • Summary Stage 2018 C500 labeled "Nipple & areola (Breast)"
    • Breslow Thickness[#3817] default changed to XX.8
    • LDH Pretreatment Level[#3932] default set to XXXXX.8
    • Ulceration[#3936] default changed to 8
  • Melanoma Skin; Merkel Cell Skin: EOD Regional Nodes and Summary Stage 2018 had note added about ITCs

  • Melanoma Skin; Merkel Cell Skin; Skin Other: EOD Regional Nodes Notes and Summary Stage 2018 code 7 were revised to be compatible

  • Merkel Cell Skin

    • EOD Regional Nodes code 700 corrected (Cervical added)
    • Summary Stage 2018 chapter title changed to Merkel Cell Skin
  • Mouth Other; Palate Hard; Tongue Anterior: EOD Primary Tumor code 400 corrected (less than or equal to 10 mm)

  • Nasopharynx: EOD Primary Tumor code 300 corrected (2 bullets collapsed into single phrase)

  • NET Adrenal Gland: Summary Stage 2018 site list now limits C755 to 8680, 8690, 8692-8693, 8700

  • NET Colon and Rectum: EOD Regional Nodes code 300 corrected (C183 associated with Hepatic Flexure)

  • Oral Cavity: EOD Primary Tumor Note 3's 2nd bullet changed to "greater than 5 mm to less than or equal to 10 mm"

    • Oral Cavity schemas are Buccal Mucosa, Floor of Mouth, Gum, Lip, Mouth Other, Palate Hard, Tongue Anterior
  • Parathyroid: Summary Stage 2018 Note 3 corrected (code 0 instead of 000)

  • Penis; Vulva: Schema Note and Summary Stage 2018 Note for Merkel Cell corrected (8041, 8190, 8247)

  • Plasma Cell Disorder: EOD Regional Nodes code 987 corrected (Single plasmacytoma occurring in bone (osseous or medullary) (9731) added)

  • Plasma Cell Myeloma: Schema Discriminator 1 default changed to 0

  • Pleural Mesothelioma: Summary Stage 2018: Note 3 corrected (to code EOD Mets removed)

  • Primary Peritoneal Carcinoma: EOD Regional Nodes Note 2 corrected (See EOD Mets added)

  • Skin Eyelid

    • EOD Regional Nodes Note 3 corrected (Infra-auricular added)
    • Summary Stage 2018 code 3 lymph node list aligned with EOD Regional Nodes Note 3
  • Small Intestine: Summary Stage 2018 code 1 bullets reordered (subcategories of Invasion of)

  • Soft Tissue Abdomen: C151-C152, C154-C155, C159 moved from Soft Tissue Other into Soft Tissue Abdomen

  • Urethra; Urethra Prostatic: Schema Discriminator 1 default set to 1

  • General:

    • Various typos corrected, including issues with spacing, formatting, spelling, parentheses, numbering, etc.
    • Various formatting issues corrected for consistency that did not change the meaning, such as
      • lists in text box or text strings converted to standard lists
      • titles and site lists added to Summary Stage 2018
      • Summary Stage 2018 Note 1 modified to use the word chapter instead of schema
    • NAACCR item numbers were added for SSDIs referenced in other data item notes
    • Notes added to the schema and Summary Stage 2018 to indicate where other histologies for the primary sites can be found (8935-8936 -> GIST for example)
    • Various Summary Stage 2018 chapter name corrections
    • Derived Summary Stage 2018 [#762] starting value changed to blank. Derived EOD T, N and M starting values changed to 90.
      • The starting value is returned if a failure prevents the algorithm from executing

—————————————-
EOD Master Additional Changes:

  • Breast: Post Therapy Stage Group value of 99 removed (only 88 is valid)

  • Lung: Stage Group table corrected (M1b results in stage group IVA; M1c results in stage group IVB)

Bad type for some input columns in prostate

The JSON table referenced for SSF3 in Prostate (can't remember the table ID, sorry) has its last two inputs (summary stage I think) defined as DESCRIPTION instead of ENDPOINT.

This is a surprising bug because I would have thought the engine would use a column as an end-point only if it's defined as such; and if this is already true, then I would have thought some tests would have found that all the end-points defined in those columns weren't actually used by the engine...

It might be worth adding a unit test to make sure no description column "looks like" an end-point in cstage...

Also, out of curiosity, is there any table in cstage that has more than one description column?
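A sketch of what such a check might test for. It assumes endpoint cells consist of a type keyword (VALUE, JUMP, ERROR, MATCH or STOP), optionally followed by a colon and a value; that syntax and the `DescriptionColumnCheck` class are assumptions and should be confirmed against the actual table JSON.

```java
import java.util.List;
import java.util.regex.Pattern;

class DescriptionColumnCheck {

    // Assumed endpoint cell syntax: a type keyword, optionally followed by ":" and a value.
    private static final Pattern ENDPOINT_LIKE = Pattern.compile("^(VALUE|JUMP|ERROR|MATCH|STOP)(:.*)?$");

    static boolean looksLikeEndpoint(String cell) {
        return cell != null && ENDPOINT_LIKE.matcher(cell.trim()).matches();
    }

    // True if any cell of a DESCRIPTION column looks like an endpoint.
    static boolean hasSuspiciousCells(List<String> descriptionCells) {
        for (String cell : descriptionCells)
            if (looksLikeEndpoint(cell))
                return true;
        return false;
    }
}
```

A test could run `hasSuspiciousCells` over every DESCRIPTION column of every table and fail on the first hit.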
