Code Monkey home page Code Monkey logo

Comments (9)

suriyan3 avatar suriyan3 commented on July 24, 2024

I have attached the sample data and queries for reference.

from fhir-data-pipes.

bashir2 avatar bashir2 commented on July 24, 2024

A few points:

  • Does that make sense to write FHIRPath statements for specific questions/answers that you need from the QR? I mean, instead of extracting all items into a single column, separate items, e.g., based on their linkId using constraints on the FHIRPath.
  • In general, loading most data in all resources of a specific type into a Pandas dataframe is not a great idea as you may hit memory limit issues if the data gets too big. It is probably better to filter-out as much data as possible using the FHIRPath constraints when creating FHIR-views.
  • Have you tried writing direct SQL queries against the Parquet files, similar to the pattern suggested here? That doesn't directly solve the specific problem you mentioned but I am curious if the flattening ideas suggested there suit your needs.

from fhir-data-pipes.

suriyan3 avatar suriyan3 commented on July 24, 2024
  1. Actually this QR table will be created with the necessary columns and then it will be joined with Encounter table. On the final table we will be running expressions to get the values. Can I run FHIRPATH statements directly on the Parquet files? I have tried running FHIRPath statements on parquet files and it was not working.
  2. Yes I have tried writing direct SQL queries against the Parquet files using DBeaver but facing issues to create columns for nested items.

from fhir-data-pipes.

bashir2 avatar bashir2 commented on July 24, 2024

Re 1: Well a subset of FHIRPath functions are supported, that's the way that we define each "column" in a view. For example, take a look at examples here where the first() function of FHIRPath is used to choose one member of a collection. Are you saying that even simple examples like first() does not work on Parquet+Spark OR is it more advanced FHIRPaths that don't work? If it is the latter, can you provide some examples?

from fhir-data-pipes.

suriyan3 avatar suriyan3 commented on July 24, 2024

Ok, So when I create a dataframe using

QuestionnaireResponse_df = runner.to_dataframe(
qrs.select(
{
"qrs_id": qrs.id,
"qrs_tag": qrs.meta.tag.code,
"qrs_patient_id":qrs.subject.reference,
"qrs_encounter_id": qrs.encounter.reference,
"qrs_questionnaire": qrs.questionnaire,
"qrs_item":qrs.item
}
)
)

when I do QuestionnaireResponse_df.iloc[0]['item']
it gives me this nested item as a string value

'[{"id":null,"linkId":"vitals","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"3.1","definition":null,"text":"ANC service provided at *"},{"id":null,"linkId":"service-type-details","definition":null,"text":null},{"id":null,"linkId":"machine-available","definition":null,"text":"Is BP Machine Available? *"},{"id":null,"linkId":"3.2","definition":null,"text":null},{"id":null,"linkId":"3.3","definition":null,"text":null},{"id":null,"linkId":"3.4","definition":null,"text":null},{"id":null,"linkId":"3.5","definition":null,"text":null},{"id":null,"linkId":"3.6","definition":null,"text":"Blood Group *"},{"id":null,"linkId":"3.10","definition":null,"text":null},{"id":null,"linkId":"3.11","definition":null,"text":"Rh factor *"},{"id":null,"linkId":"3.12","definition":null,"text":null}]},{"id":null,"linkId":"counselling","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"4.1","definition":null,"text":"Was counselling done for the following"},{"id":null,"linkId":"4.2","definition":null,"text":null},{"id":null,"linkId":"4.4","definition":null,"text":null},{"id":null,"linkId":"4.6","definition":null,"text":null}]},{"id":null,"linkId":"syphilis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"5.1.0","definition":null,"text":null},{"id":null,"linkId":"5.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-B","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"6.1.0","definition":null,"text":null},{"id":null,"linkId":"6.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-C","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"7.1.0","definition":null,"text":null},{"id":null,"linkId":"7.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hiv","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"8.1.0","definition":null,"text":null}]},{"id":null,"linkId":"malaria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"malaria-1.0","definition":null,"text":null},{"id":null,"linkId":"malaria-3.0","definition":null,"text":null}]},{"id":null,"linkId":"SickleCellTest","definition":null,"text":"Sickle Cell and HbsAg test","answer":null,"item":[{"id":null,"linkId":"9.1.0","definition":null,"text":null},{"id":null,"linkId":"9.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hematology","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"10.1.0","definition":null,"text":null}]},{"id":null,"linkId":"gestational-diabetes","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"11.1","definition":null,"text":null}]},{"id":null,"linkId":"urinalysis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"12.1.0","definition":null,"text":null}]},{"id":null,"linkId":"Foetal-Examination","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"13.1","definition":null,"text":"Foetal Movement"},{"id":null,"linkId":"13.3","definition":null,"text":null},{"id":null,"linkId":"13.5","definition":null,"text":null},{"id":null,"linkId":"13.6","definition":null,"text":null},{"id":null,"linkId":"13.7","definition":null,"text":"Foetal Position"},{"id":null,"linkId":"13.11","definition":null,"text":null}]},{"id":null,"linkId":"hemanitic-calcium-supplements","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"15.1.0","definition":null,"text":null},{"id":null,"linkId":"15.2.0","definition":null,"text":null}]},{"id":null,"linkId":"llin-and-ipt-test","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"16.1.0","definition":null,"text":null},{"id":null,"linkId":"16.2.0","definition":null,"text":"Was IPT given?"}]},{"id":null,"linkId":"tetanus-diptheria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"17.1.0","definition":null,"text":"Was Td given?"}]},{"id":null,"linkId":"associated-problems","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"18.1.0","definition":null,"text":"Associated problems"}]},{"id":null,"linkId":"previous-medical-history","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"19.1.0","definition":null,"text":"Previous medical history"}]},{"id":null,"linkId":"outcome-of-visit","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"20.1","definition":null,"text":null}]}]'

while creating the dataframe itself when I change to list or json system restarts or system goes unresponsive
QuestionnaireResponse_df = runner.to_dataframe(
qrs.select(
{
"qrs_id": qrs.id,
"qrs_tag": qrs.meta.tag.code,
"qrs_patient_id":qrs.subject.reference,
"qrs_encounter_id": qrs.encounter.reference,
"qrs_questionnaire": qrs.questionnaire,
"qrs_item":list(qrs.item) // json.loads(qrs.item)
}
)
)

from fhir-data-pipes.

bashir2 avatar bashir2 commented on July 24, 2024

Update: what I meant by FHIRPath constraints was, for example, like this:

qr.select(
        {
            "id": qr.id,
            "answer1_0": qr.item.where(qr.item.linkId=='1.0').answer.first().value.ofType('coding').code,
        }
    )

This works and the advantage is that it picks a specific answer and assigns it to a single "column" so you don't need to deal with json/arrays, etc. However, when I tried this with qr.item.item it failed. It seems to be a bug in the fhir-views code. I have informed the team and they are looking into it.

In the meantime doing the same with SQL is probably your best bet. It also avoids loading everything into a dataframe in memory.

from fhir-data-pipes.

bashir2 avatar bashir2 commented on July 24, 2024

Update: The mentioned bug with qr.item.item was fixed by this commit (thanks to @rbrush) but then the current Spark runner implementation failed at generating proper SQL for constraints on qr.item.item. This is another issue which we need to look into. However, I like to suggest to reconsider doing direct SQL in the meantime. Here are some examples that hopefully should help:

The basic idea is to EXPLODE repeated columns to get a flat view and then apply required constraints. In your case, because you have answer fields at different levels, (e.g., item.item.answer vs item.item.item.answer), this is a little bit trickier but still doable. First, if you try this query (please pay attention to whether you have a _ in your table name or not):

SELECT QR.id,
  (CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_1_1,
  (CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_3_1,
  (CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END) AS answer_1_2_5,
  (CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END) AS answer_v_value
FROM questionnaire_response AS QR
  LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
  LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
  LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
  LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4

you will see that for each QR.id, you get many rows. In the four answer columns, there are a lot of NULL values but there are also the corresponding answers. So we just need to aggregate these as follows:

SELECT QR.id,
  FIRST((CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_1,
  FIRST((CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_3_1,
  FIRST((CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_2_5,
  FIRST((CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END), true) AS answer_v_value
FROM questionnaire_response AS QR
  LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
  LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
  LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
  LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4
GROUP BY QR.id;

We are using the FIRST aggregation function, dropping NULL values. I am not sure about the performance of this approach but it at least unblocks you.

from fhir-data-pipes.

bashir2 avatar bashir2 commented on July 24, 2024

Just to clarify re performance: A major concern is the GROUP BY part which is not really required to achieve what we are doing here; i.e., all answer elements should be extracted in the context of a single row without requiring a table-wide GROUP BY. I believe that's how fhir-views work but its generated SQL is too complex, hence this simpler approach.

from fhir-data-pipes.

suriyan3 avatar suriyan3 commented on July 24, 2024

Thanks @bashir I have tested this SQL queries it is working fine, we can capture the columns we need and create a flat table. Is it possible to write a query that can loop through all the objects and in each objects there might be nested items and answer so if answer is not null then can we create a column name from the value of text key and its column value be taken from answer array.

from fhir-data-pipes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.