
CART

Data source: Data were analysed from the Pathways of Care Longitudinal Study (POCLS), a prospective longitudinal cohort of 4126 children aged 0-17 years who entered the out-of-home care (OOHC) system (i.e., on interim care and protection orders) in NSW, Australia, between 2010 and 2011 (the POCLS population cohort).

Statistical analysis: CART analyses were conducted to identify subgroups of children with similar levels of socio-emotional difficulties. First, we calculated frequencies and percentages to describe potential factors related to socio-emotional difficulties of children in OOHC in four groups: demographic characteristics, pre-care maltreatment, placement experiences and caregiver-related characteristics. Second, we conducted bivariable logistic regression analyses to assess the association, measured as crude odds ratios (ORs) with 95% confidence intervals (CIs), between children's socio-emotional difficulties (i.e., "clinical" vs. "non-clinical") and individual risk factors in these four groups.

Finally, we conducted the CART analyses (Breiman, Friedman, Stone, & Olshen, 1984) to identify subgroups of children with similar levels of socio-emotional difficulties. We retained age at the interview and removed age at first entry into OOHC and duration in care from the model to avoid multicollinearity. The CART method applies if-then splitting rules, partitioning a sample into progressively smaller but increasingly homogeneous subsets with regard to a given outcome, in this case socio-emotional difficulties (Morgan, 2014). The Gini index (Gini impurity) was used to measure node impurity, that is, the probability that a randomly chosen observation in a node would be incorrectly classified if it were labelled according to the class distribution in that node (Timofeev, 2004). The algorithm starts by selecting the variable (root node) that provides the most efficient pathway to a decision (i.e., minimises the error measures) and then splits the dataset into sub-groups. This process is applied recursively until the sub-groups reach a minimum size (a minimum node size of 10 is commonly recommended) or until no further improvement in explaining the variation in the outcome can be achieved by adding more factors (Therneau & Atkinson, 1997). At each partitioning, the node where the sample is split is referred to as the "parent node", and the nodes into which the sample is further divided are called the "child nodes". The final node, referred to as a "leaf", gives the prevalence of the outcome (Morgan, 2014; Timofeev, 2004). For each split, the CART process accommodates missing values and retains variables based on the impurity measures (Hayes, Usami, Jacobucci, & McArdle, 2015; Therneau & Atkinson, 1997).

We randomly split the sample into a training set (80% of the full sample; n = 1230) and a validation set (the remaining 20%) (Vabalas, Gowen, Poliakoff, & Casson, 2019). A cross-validation procedure was used to assess the classification accuracy of the model by repeatedly refitting the tree on subsets of the training sample. The accuracy rate, measured as the area under the curve (AUC), i.e., the probability of correctly distinguishing a child with clinical socio-emotional difficulties from a child without clinical socio-emotional difficulties in a randomly selected pair of children, together with sensitivity (true positive rate: the ability to correctly identify children who had clinical socio-emotional difficulties) and specificity (true negative rate: the ability to correctly identify children who had no clinical socio-emotional difficulties), was calculated to evaluate the model's classification performance.
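As an illustration of the second step, the sketch below shows how crude ORs and 95% CIs for a single candidate factor could be obtained from a bivariable logistic regression. It is a minimal sketch in Python using statsmodels, not the study code; the variable names (df, clinical, placement_stability) are hypothetical placeholders rather than the actual POCLS variables.

```python
# Minimal sketch (not the study code): crude ORs with 95% CIs from a
# bivariable logistic regression, one candidate risk factor at a time.
# Variable names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def crude_or(df: pd.DataFrame, outcome: str, factor: str) -> pd.DataFrame:
    """Fit outcome ~ factor and return the crude OR with its 95% CI."""
    fit = smf.logit(f"{outcome} ~ {factor}", data=df).fit(disp=0)
    params = fit.params.drop("Intercept")
    ci = fit.conf_int().drop("Intercept")
    return pd.DataFrame({
        "OR": np.exp(params),
        "CI_lower": np.exp(ci[0]),
        "CI_upper": np.exp(ci[1]),
    })

# Example call with a hypothetical categorical factor:
# crude_or(df, "clinical", "C(placement_stability)")
```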

Due to the sensitivity of the data, this script does not contain the data.
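Because the data cannot be shared, the following sketch illustrates the CART workflow described above (80/20 train-validation split, Gini-based splitting with a minimum leaf size of 10, cross-validation, and AUC, sensitivity and specificity on the held-out sample) on placeholder inputs. It uses scikit-learn's DecisionTreeClassifier as a stand-in for the CART implementation; the feature matrix X and binary outcome y are assumed, and, unlike the procedure described above, the sketch assumes complete cases rather than handling missing values within the splits.

```python
# Minimal sketch (not the study code) of the CART workflow described above.
# X: table of candidate factors, y: binary outcome (1 = "clinical").
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

def run_cart(X, y, seed=2010):
    # 80% training / 20% validation split, stratified on the outcome.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # CART-style classification tree: Gini impurity, minimum leaf size of 10.
    tree = DecisionTreeClassifier(
        criterion="gini", min_samples_leaf=10, random_state=seed
    )

    # Cross-validated AUC on the training set (repeated tree fitting).
    cv_auc = cross_val_score(tree, X_train, y_train, cv=10, scoring="roc_auc")

    # Refit on the full training set and evaluate on the held-out 20%.
    tree.fit(X_train, y_train)
    prob_valid = tree.predict_proba(X_valid)[:, 1]
    auc = roc_auc_score(y_valid, prob_valid)

    tn, fp, fn, tp = confusion_matrix(y_valid, tree.predict(X_valid)).ravel()
    sensitivity = tp / (tp + fn)  # correctly identified "clinical" children
    specificity = tn / (tn + fp)  # correctly identified "non-clinical" children

    return {"cv_auc": cv_auc.mean(), "auc": auc,
            "sensitivity": sensitivity, "specificity": specificity}
```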

