Comments (9)
I have attached the sample data and queries for reference.
from fhir-data-pipes.
A few points:
- Does that make sense to write FHIRPath statements for specific questions/answers that you need from the QR? I mean, instead of extracting all
item
s into a single column, separate items, e.g., based on theirlinkId
using constraints on the FHIRPath. - In general, loading most data in all resources of a specific type into a Pandas dataframe is not a great idea as you may hit memory limit issues if the data gets too big. It is probably better to filter-out as much data as possible using the FHIRPath constraints when creating FHIR-views.
- Have you tried writing direct SQL queries against the Parquet files, similar to the pattern suggested here? That doesn't directly solve the specific problem you mentioned but I am curious if the flattening ideas suggested there suit your needs.
from fhir-data-pipes.
- Actually this QR table will be created with the necessary columns and then it will be joined with Encounter table. On the final table we will be running expressions to get the values. Can I run FHIRPATH statements directly on the Parquet files? I have tried running FHIRPath statements on parquet files and it was not working.
- Yes I have tried writing direct SQL queries against the Parquet files using DBeaver but facing issues to create columns for nested items.
from fhir-data-pipes.
Re 1: Well a subset of FHIRPath functions are supported, that's the way that we define each "column" in a view. For example, take a look at examples here where the first()
function of FHIRPath is used to choose one member of a collection. Are you saying that even simple examples like first()
does not work on Parquet+Spark OR is it more advanced FHIRPaths that don't work? If it is the latter, can you provide some examples?
from fhir-data-pipes.
Ok, So when I create a dataframe using
QuestionnaireResponse_df = runner.to_dataframe(
qrs.select(
{
"qrs_id": qrs.id,
"qrs_tag": qrs.meta.tag.code,
"qrs_patient_id":qrs.subject.reference,
"qrs_encounter_id": qrs.encounter.reference,
"qrs_questionnaire": qrs.questionnaire,
"qrs_item":qrs.item
}
)
)
when I do QuestionnaireResponse_df.iloc[0]['item']
it gives me this nested item as a string value
'[{"id":null,"linkId":"vitals","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"3.1","definition":null,"text":"ANC service provided at *"},{"id":null,"linkId":"service-type-details","definition":null,"text":null},{"id":null,"linkId":"machine-available","definition":null,"text":"Is BP Machine Available? *"},{"id":null,"linkId":"3.2","definition":null,"text":null},{"id":null,"linkId":"3.3","definition":null,"text":null},{"id":null,"linkId":"3.4","definition":null,"text":null},{"id":null,"linkId":"3.5","definition":null,"text":null},{"id":null,"linkId":"3.6","definition":null,"text":"Blood Group *"},{"id":null,"linkId":"3.10","definition":null,"text":null},{"id":null,"linkId":"3.11","definition":null,"text":"Rh factor *"},{"id":null,"linkId":"3.12","definition":null,"text":null}]},{"id":null,"linkId":"counselling","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"4.1","definition":null,"text":"Was counselling done for the following"},{"id":null,"linkId":"4.2","definition":null,"text":null},{"id":null,"linkId":"4.4","definition":null,"text":null},{"id":null,"linkId":"4.6","definition":null,"text":null}]},{"id":null,"linkId":"syphilis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"5.1.0","definition":null,"text":null},{"id":null,"linkId":"5.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-B","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"6.1.0","definition":null,"text":null},{"id":null,"linkId":"6.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-C","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"7.1.0","definition":null,"text":null},{"id":null,"linkId":"7.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hiv","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"8.1.0","definition":null,"text":null}]},{"id":null,"linkId":"malaria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"malaria-1.0","definition":null,"text":null},{"id":null,"linkId":"malaria-3.0","definition":null,"text":null}]},{"id":null,"linkId":"SickleCellTest","definition":null,"text":"Sickle Cell and HbsAg test","answer":null,"item":[{"id":null,"linkId":"9.1.0","definition":null,"text":null},{"id":null,"linkId":"9.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hematology","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"10.1.0","definition":null,"text":null}]},{"id":null,"linkId":"gestational-diabetes","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"11.1","definition":null,"text":null}]},{"id":null,"linkId":"urinalysis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"12.1.0","definition":null,"text":null}]},{"id":null,"linkId":"Foetal-Examination","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"13.1","definition":null,"text":"Foetal Movement"},{"id":null,"linkId":"13.3","definition":null,"text":null},{"id":null,"linkId":"13.5","definition":null,"text":null},{"id":null,"linkId":"13.6","definition":null,"text":null},{"id":null,"linkId":"13.7","definition":null,"text":"Foetal Position"},{"id":null,"linkId":"13.11","definition":null,"text":null}]},{"id":null,"linkId":"hemanitic-calcium-supplements","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"15.1.0","definition":null,"text":null},{"id":null,"linkId":"15.2.0","definition":null,"text":null}]},{"id":null,"linkId":"llin-and-ipt-test","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"16.1.0","definition":null,"text":null},{"id":null,"linkId":"16.2.0","definition":null,"text":"Was IPT given?"}]},{"id":null,"linkId":"tetanus-diptheria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"17.1.0","definition":null,"text":"Was Td given?"}]},{"id":null,"linkId":"associated-problems","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"18.1.0","definition":null,"text":"Associated problems"}]},{"id":null,"linkId":"previous-medical-history","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"19.1.0","definition":null,"text":"Previous medical history"}]},{"id":null,"linkId":"outcome-of-visit","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"20.1","definition":null,"text":null}]}]'
while creating the dataframe itself when I change to list or json system restarts or system goes unresponsive
QuestionnaireResponse_df = runner.to_dataframe(
qrs.select(
{
"qrs_id": qrs.id,
"qrs_tag": qrs.meta.tag.code,
"qrs_patient_id":qrs.subject.reference,
"qrs_encounter_id": qrs.encounter.reference,
"qrs_questionnaire": qrs.questionnaire,
"qrs_item":list(qrs.item) // json.loads(qrs.item)
}
)
)
from fhir-data-pipes.
Update: what I meant by FHIRPath constraints was, for example, like this:
qr.select(
{
"id": qr.id,
"answer1_0": qr.item.where(qr.item.linkId=='1.0').answer.first().value.ofType('coding').code,
}
)
This works and the advantage is that it picks a specific answer and assigns it to a single "column" so you don't need to deal with json/arrays, etc. However, when I tried this with qr.item.item
it failed. It seems to be a bug in the fhir-views code. I have informed the team and they are looking into it.
In the meantime doing the same with SQL is probably your best bet. It also avoids loading everything into a dataframe in memory.
from fhir-data-pipes.
Update: The mentioned bug with qr.item.item
was fixed by this commit (thanks to @rbrush) but then the current Spark runner implementation failed at generating proper SQL for constraints on qr.item.item
. This is another issue which we need to look into. However, I like to suggest to reconsider doing direct SQL in the meantime. Here are some examples that hopefully should help:
The basic idea is to EXPLODE
repeated columns to get a flat view and then apply required constraints. In your case, because you have answer
fields at different levels, (e.g., item.item.answer
vs item.item.item.answer
), this is a little bit trickier but still doable. First, if you try this query (please pay attention to whether you have a _
in your table name or not):
SELECT QR.id,
(CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_1_1,
(CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_3_1,
(CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END) AS answer_1_2_5,
(CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END) AS answer_v_value
FROM questionnaire_response AS QR
LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4
you will see that for each QR.id, you get many rows. In the four answer columns, there are a lot of NULL values but there are also the corresponding answers. So we just need to aggregate these as follows:
SELECT QR.id,
FIRST((CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_1,
FIRST((CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_3_1,
FIRST((CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_2_5,
FIRST((CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END), true) AS answer_v_value
FROM questionnaire_response AS QR
LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4
GROUP BY QR.id;
We are using the FIRST
aggregation function, dropping NULL values. I am not sure about the performance of this approach but it at least unblocks you.
from fhir-data-pipes.
Just to clarify re performance: A major concern is the GROUP BY
part which is not really required to achieve what we are doing here; i.e., all answer
elements should be extracted in the context of a single row without requiring a table-wide GROUP BY
. I believe that's how fhir-views work but its generated SQL is too complex, hence this simpler approach.
from fhir-data-pipes.
Thanks @bashir I have tested this SQL queries it is working fine, we can capture the columns we need and create a flat table. Is it possible to write a query that can loop through all the objects and in each objects there might be nested items and answer so if answer is not null then can we create a column name from the value of text key and its column value be taken from answer array.
from fhir-data-pipes.
Related Issues (20)
- Setup Sonar for developers to identify bugs through static code analysis
- Sonar detected bugs on metrics package of pipeline-controller HOT 1
- Fix Code smells in pipeline-controller
- Investigate Cloud Build flakiness because of dockerised pipeline runs HOT 4
- compose-controller-spark-sql-single.yaml fails to launch HOT 2
- Remove `hiveJdbcDriver` configuration property and unify driver loading
- New recurring Thrift server errors HOT 3
- Upgrade HAPI FHIR version
- Fix issues with `compose-controller-spark-sql-single.yaml` HOT 6
- Do not display the `View Raw Logs` button in case of no logs
- In the HAPI JDBC mode, when resources are created with PUT, resource Id's mismatch between the original FHIR resource and the corresponding resource in the parquet file. HOT 4
- Missing extension in the parquet file compared to source FHIR resource. HOT 7
- How to handle DataFormatException while parsing JSON encoded FHIR content HOT 1
- Make recursive depth of Bunsen a configuration parameter. HOT 1
- The `answer` fields are dropped in QuestionnaireResponse due to recursive structure.
- Investigate high memory utilisation for the pipeline controller and provide configurations to control them HOT 6
- Cannot override fhirServerUserName or fhirServerPassword as it keeps defaulting HOT 6
- Help needed running a fhri pipeline controller HOT 6
- Gracefully handle resource types with no instances in the FHIR server. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from fhir-data-pipes.