
cttso-ica-to-pieriandx's People

Contributors

alexiswl, dependabot[bot], williamputraintan


cttso-ica-to-pieriandx's Issues

Prevent duplication of pieriandx accession numbers

If a lambda is accidentally launched twice, both batch jobs are invoked at the exact same time.

Three options to resolve this:

  1. Launch batch jobs one at a time.
  2. Before launching a batch job, the lambda ensures that no batch job exists in the queue.
  3. When running, a batch job checks that there are not multiple instances of the same job running.
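Option 2 could be sketched roughly as follows. This is a minimal illustration, not the project's code: `find_active_job_ids` and the job-name convention are hypothetical, and `batch_client` is assumed to behave like `boto3.client("batch")`, whose `list_jobs` call filters by one job status at a time.

```python
# Job states in which an AWS Batch job has not yet finished.
ACTIVE_STATUSES = ("SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING")


def find_active_job_ids(batch_client, job_queue, job_name):
    """Return the ids of unfinished jobs called `job_name` in `job_queue`.

    `batch_client` is assumed to expose list_jobs() like boto3's Batch client.
    `job_name` is a hypothetical naming convention, e.g. one name per
    subject id / library id combination.
    """
    active_job_ids = []
    for status in ACTIVE_STATUSES:
        request = {"jobQueue": job_queue, "jobStatus": status}
        while True:
            response = batch_client.list_jobs(**request)
            for summary in response.get("jobSummaryList", []):
                if summary["jobName"] == job_name:
                    active_job_ids.append(summary["jobId"])
            # Follow pagination until the service stops returning a token.
            next_token = response.get("nextToken")
            if not next_token:
                break
            request["nextToken"] = next_token
    return active_job_ids
```

The lambda would skip the submission whenever the returned list is non-empty. Note that check-then-submit is not atomic: two lambdas started at the same instant can both see an empty queue, so this narrows the window rather than closing it.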

redcap_is_complete column is required but not available in processing_df

[ERROR] AttributeError: 'Series' object has no attribute 'redcap_is_complete'
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1450, in lambda_handler
    processing_df = submit_libraries_to_pieriandx(processing_df)
  File "/var/task/lambda_code.py", line 327, in submit_libraries_to_pieriandx
    processing_df["submission_arn"] = processing_df.apply(
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/frame.py", line 8845, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 733, in apply
    return self.apply_standard()
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 857, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 873, in apply_series_generator
    results[i] = self.f(v)
  File "/var/task/lambda_code.py", line 329, in <lambda>
    if x.is_validation_sample or (x.is_research_sample and not x.redcap_is_complete)
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
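One way to surface this earlier is to check for the required columns before calling `apply`, so the failure names the missing column instead of raising an `AttributeError` from inside the lambda. A minimal sketch, assuming the three column names seen in the traceback; `require_columns` is a hypothetical helper, not part of `lambda_code.py`.

```python
import pandas as pd


def require_columns(df: pd.DataFrame, required: list) -> None:
    """Raise a descriptive error if any required column is absent from df."""
    missing = [column for column in required if column not in df.columns]
    if missing:
        raise ValueError(f"processing_df is missing required columns: {missing}")


# Illustrative frame that lacks 'redcap_is_complete', as in the traceback above.
processing_df = pd.DataFrame(
    {"is_validation_sample": [True], "is_research_sample": [False]}
)

try:
    require_columns(
        processing_df,
        ["is_validation_sample", "is_research_sample", "redcap_is_complete"],
    )
except ValueError as error:
    print(error)
```

Failing fast is only one choice here; filling the column with a default value is another, but the right default depends on how the submission logic should treat samples without RedCap data.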

Invalid indices check fails when portal workflow finishes in early hours of morning

The Portal workflow run timestamp is in UTC.

The PierianDx case creation date is a few hours behind (US / Eastern time).

When matching the correct workflow id to the PierianDx submission id, we use the Portal UTC date, but the PierianDx date may be the day before.

As a result, the LIMS lambda script dismisses the join between the Portal data and the PierianDx submission, because a Portal workflow run cannot be dated the day after the PierianDx submission, and the job is then resubmitted.

Since we only get the date (not the time) from PierianDx, we cannot convert it from US Eastern to UTC. Instead, take the Portal timestamp, convert it to US Eastern, and then perform the check.
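The proposed conversion might look like the sketch below. The function names are hypothetical, and `America/New_York` is assumed to be the zone PierianDx uses for "US / Eastern".

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: PierianDx case creation dates are recorded in this zone.
US_EASTERN = ZoneInfo("America/New_York")


def portal_date_in_us_eastern(portal_run_end_utc: datetime) -> date:
    """Convert a Portal UTC timestamp to the calendar date in US Eastern."""
    if portal_run_end_utc.tzinfo is None:
        # Treat naive Portal timestamps as UTC.
        portal_run_end_utc = portal_run_end_utc.replace(tzinfo=timezone.utc)
    return portal_run_end_utc.astimezone(US_EASTERN).date()


def workflow_not_after_case(portal_run_end_utc: datetime, case_creation_date: date) -> bool:
    """True if the workflow finished on or before the case creation date."""
    return portal_date_in_us_eastern(portal_run_end_utc) <= case_creation_date


# A workflow finishing at 02:30 UTC on 16 January is still 15 January in US Eastern.
finished = datetime(2023, 1, 16, 2, 30, tzinfo=timezone.utc)
print(portal_date_in_us_eastern(finished))                   # 2023-01-15
print(workflow_not_after_case(finished, date(2023, 1, 15)))  # True
```

Comparing the raw UTC date (16 January) against the PierianDx date (15 January) would reject this pair, which is the failure described above.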

CDK Build Deployment fails

This CDK CLI is not compatible with the CDK library used by your application. Please upgrade the CLI to the latest version.
(Cloud assembly schema version mismatch: Maximum schema version supported is 20.0.0, but found 31.0.0)

PierianDx ctTSO LIMS data miner not always taking the latest job for a given sample

Taking the last element in the array has worked so far; however, we have one case with the following content in the cttso lims:

| pieriandx_case_id | pieriandx_case_accession_number | pieriandx_case_creation_date | pieriandx_case_identified | pieriandx_panel_type | pieriandx_workflow_id | pieriandx_workflow_status | pieriandx_report_status |
|-------------------|---------------------------------|------------------------------|---------------------------|----------------------|-----------------------|---------------------------|-------------------------|
|            227147 | SBJ03034_L2300015_001           |           2023-01-15 0:00:00 |            TRUE           | MAIN                 |                190454 | canceled                  | complete                |

However, this case has a newer job that was successful (190459).

Rather than taking the last element, it may be worth sorting based on ID, knowing that the IDs are created chronologically.
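A minimal sketch of that idea, assuming each job is a dict with an `id` key (the dict layout and the `latest_job` name are illustrative, not the actual PierianDx response shape):

```python
def latest_job(jobs):
    """Return the job with the highest numeric id, or None for an empty list."""
    if not jobs:
        return None
    # Ids are assumed to be assigned chronologically, so the max id is the newest job.
    return max(jobs, key=lambda job: int(job["id"]))


# Illustrative ordering where the canceled job happens to be listed last.
jobs = [
    {"id": 190459, "status": "complete"},  # status string is illustrative
    {"id": 190454, "status": "canceled"},
]
print(latest_job(jobs)["id"])  # 190459
```

Taking `jobs[-1]` here would return the canceled job 190454, whereas selecting by id does not depend on the order in which the jobs are returned.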

Could not get pieriandx case id from job df to collect missing jobs



2023-07-13 03:14:53,348 - INFO     - lambda_code               - lambda_handler                           : LineNo. 1566 - Got 2 rows to replace

[ERROR] KeyError: "['pieriandx_case_id'] not in index"
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1570, in lambda_handler
    pieriandx_job_status_missing_df = update_pieriandx_job_status_missing_df(
      pieriandx_job_status_missing_df, merged_df
    )
  File "/var/task/lambda_code.py", line 759, in update_pieriandx_job_status_missing_df
    merged_df = merged_df[[
  File "/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/frame.py", line 3813, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/indexes/base.py", line 6070, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/indexes/base.py", line 6133, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")

[ERROR] AttributeError: 'Series' object has no attribute 'glims_is_validation'

Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1788, in lambda_handler
    processing_df = submit_libraries_to_pieriandx(processing_df)
  File "/var/task/lambda_code.py", line 348, in submit_libraries_to_pieriandx
    processing_df["submission_arn"] = processing_df.apply(
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/frame.py", line 9568, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 764, in apply
    return self.apply_standard()
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 891, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 907, in apply_series_generator
    results[i] = self.f(v)
  File "/var/task/lambda_code.py", line 350, in <lambda>
    if x.glims_is_validation is True
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/generic.py", line 5902, in __getattr__
    return object.__getattribute__(self, name)

Circular Dependency for LIMS Maker Stack

Error message
Circular dependency between resources: [cttsoicatopieriandxprodlimsmakerlambdastacklfBF1A5A5E, cttsoicatopieriandxprodlimsmakerlambdastacklftrigAllowEventRuleCttsoIcaToPieriandxPipelineStackProdcttsoicatopieriandxprodLimsMakerLambdaStagecttsoicatopieriandxprodlimsmakerlambdastackcttsoicatopieriandxprodlimsmakerlambdastacklfA1060767738057BF, cttsoicatopieriandxprodlimsmakerlambdastacklftrigCA91B158, cttsoicatopieriandxprodlimsmakerlambdastacklfEventInvokeConfig2D261CF4, cttsoicatopieriandxprodlimsmakerlambdastackLambdaExecutionRoleDefaultPolicy9FF70FF7, cttsoicatopieriandxprodlimsmakerlambdastackssmcdklambdaeventruleparameterC1114276, cttsoicatopieriandxprodlimsmakerlambdastackssmcdklambdaparameter157B45B6

PierianDx Case Submission Time not found in row when cleaning duplicate rows

2023-07-30 09:30:40,719 - INFO - lambda_code - lambda_handler : LineNo. 1690 - Updating lims

[ERROR] KeyError: 'pieriandx_submission_time'
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1703, in lambda_handler
    cleanup_duplicate_rows(merged_df, cttso_lims_df, excel_row_number_mapping_df)
  File "/var/task/lambda_code.py", line 1366, in cleanup_duplicate_rows
    merged_df_dedup = bind_pieriandx_case_submission_time_to_merged_df(merged_df_dedup, cttso_lims_df)
  File "/var/task/lambda_code.py", line 1464, in bind_pieriandx_case_submission_time_to_merged_df
    pieriandx_case_submission_time = row['pieriandx_submission_time']
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err

Got case 'None' for pending analysis SBJ00595 L2300857

The batch submission was successful:

2023-07-08 01:08:47,906 - INFO     - cttso-ica-to-pieriandx    - main                                     : LineNo. 103  - Creating case object on PierianDx for case SBJ00595_L2300857_001

However, the next iteration of the LIMS lambda (about 10 mins later) got case 'None' for this subject / library combination:

[INFO]	2023-07-08T01:56:26.123Z	c32b6ae0-2b70-4d5a-822e-625e6695be6e	Got case 'None' for pending analysis SBJ00595 L2300857

Samples relaunched over a period of three hours

Over the weekend we had the cttso lims launch 16 samples via 33 analyses.

Here is the table of launches from the lims script by subject id / library id combination, with each cell showing the pieriandx accession id generated:

18 March 1pm 18 March 2pm 18 March 3pm
SBJ03127_L2300341 SBJ03127_L2300341_001 SBJ03127_L2300341_002 SBJ03127_L2300341_003
SBJ03130_L2300344 SBJ03130_L2300344_001 SBJ03130_L2300344_002 SBJ03130_L2300344_003
SBJ03132_L2300346 SBJ03132_L2300346_001 SBJ03132_L2300346_002 SBJ03132_L2300346_003
SBJ03133_L2300347 SBJ03133_L2300347_001 SBJ03133_L2300347_002 SBJ03133_L2300347_003
SBJ03128_L2300342 SBJ03128_L2300342_001 SBJ03128_L2300342_002 SBJ03128_L2300342_003
SBJ03129_L2300343 SBJ03129_L2300343_001 SBJ03129_L2300343_002 SBJ03129_L2300343_003
SBJ03137_L2300351 SBJ03137_L2300351_001 SBJ03137_L2300351_002 SBJ03137_L2300351_003
SBJ00596_L2300355 SBJ00596_L2300355_001 SBJ00596_L2300355_002
SBJ03140_L2300354 SBJ03140_L2300354_001 SBJ03140_L2300354_002
SBJ03141_L2300356 SBJ03141_L2300356_001 SBJ03141_L2300356_002
SBJ03136_L2300350 SBJ03136_L2300350_001 SBJ03136_L2300350_002
SBJ03138_L2300352 SBJ03138_L2300352_001 SBJ03138_L2300352_002
SBJ03139_L2300353 SBJ03139_L2300353_001 SBJ03139_L2300353_002
SBJ03135_L2300349 SBJ03135_L2300349_001
SBJ03134_L2300348 SBJ03134_L2300348_001
SBJ03131_L2300345 SBJ03131_L2300345_001

It may not be possible to determine why this occurred. Could it have been something like GSheets not returning the full dataframe to pandas, or likewise the pieriandx case endpoint returning incomplete data? Is there a better way to ensure that pending samples are updated? Should these be updated before other sources are merged?

GSheets Service Down Temporarily

APIError: {'code': 503, 'message': 'The service is currently unavailable.', 'status': 'UNAVAILABLE'}
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1464, in lambda_handler
    glims_df: pd.DataFrame = get_cttso_samples_from_glims()
  File "/var/lang/lib/python3.10/site-packages/lambda_utils-0.0.1-py3.10.egg/lambda_utils/gspread_helpers.py", line 93, in get_cttso_samples_from_glims
    glims_df: pd.DataFrame = Spread(spread=get_glims_sheet_id(), sheet="Sheet1").sheet_to_df(index=0)
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/spread.py", line 387, in sheet_to_df
    vals = self.sheet.get_all_values()
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 452, in get_all_values
    return self.get_values(**kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 425, in get_values
    return fill_gaps(self.get(range_name, **kwargs))
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 818, in get
    response = self.spreadsheet.values_get(range_name, params=params)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/spreadsheet.py", line 182, in values_get
    r = self.client.request("get", url, params=params)
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/util.py", line 305, in request
    raise error
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/util.py", line 292, in request
    return ClientV4.request(client, *args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/client.py", line 92, in request
    raise APIError(response)

Upgrade cttso-ica-to-pieriandx Docker container base

Old base image is causing some errors in deployment

Installing into a conda env

Traceback (most recent call last):
  File "/opt/conda/bin/mamba", line 7, in <module>
    from mamba.mamba import main
  File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 51, in <module>
    from mamba import repoquery as repoquery_api
  File "/opt/conda/lib/python3.9/site-packages/mamba/repoquery.py", line 9, in <module>
    from mamba.utils import init_api_context, load_channels
  File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 17, in <module>
    from conda.core.index import _supplement_index_with_system, check_allowlist
ImportError: cannot import name 'check_allowlist' from 'conda.core.index' (/opt/conda/lib/python3.9/site-packages/conda/core/index.py)

Update lambda decision tree and documentation

Separate 'is_validation_sample' out of the decision tree that chooses which lambda path (validation or RedCap/Clinical) to go down.

Update the cttso deploy readme to include the following diagrams, and update the decision logic to reflect them.

cttso-ica-to-pieriandx-Overview drawio (2)

cttso-ica-to-pieriandx-Choose Launch Pathway drawio (2)

Update CDK to 2.85.0

Currently sitting on version 2.39. The upgrade may require a bit of work for beta modules like aws-batch-alpha.

Cloudwatch logs not starting

CannotStartContainerError: Error response from daemon: failed to initialize logging driver: failed to create Cloudwatch log stream: AccessDeniedException: User: arn:aws:sts::843407916570:assumed-role/cttso-ica-to-pieriandx-de-cttsoicatopieriandxdevba-E60G
