Code Monkey home page Code Monkey logo

daedalus's People

Contributors

michaelkotrous avatar

Stargazers

 avatar

Watchers

 avatar

daedalus's Issues

NULL does not import into Stata nicely

Adding the NULL values corrected problems importing the data into R, but Stata treats it as a string instead of a missing value, which results in variables that need to be treated as integers being treated as strings.

Potential fixes:

  1. Add option to aircra.sh that allows user to specify if empty fields should be labeled NULL or left empty.
  2. Post Stata snippet in README.md for do file that converts NULL to missing and variables to the correct format for analysis.

Empty values throw errors with import into R

The shell script is currently stripping the NULL values generated by the csv export from MySQL. This is causing an error in R because it is detecting the end of a row at column 203 instead of 204 when the final value is NULL in the MySQL table.

ERROR 1292 (22007) at line 225: Truncated incorrect date value: '12/16/1998 0:00:00'

Hi, thanks for creating this repository, I'm looking forward to sifting through this data.

Aircraft table is created, but getting an error while importing the csv file on the first datetime value it encounters in aircraft.csv.

It seems like the csv has an incorrect date format (12/16/1998 0:00:00 instead of 1998-12-16 00:00:00) although I tried hardcoding different date formats and still got the error.

MacOS, MySQL 8

Aircraft Keys do not reflect number of aircraft in event

I've assumed thus far that an event involving one aircraft would identify the aircraft with key equal to 1, two aircraft events would have aircraft keys 1 and 2, and so on. This is not always the case! I work a fix for my working dataset with R code, but a fix in the dataset build scripts in MySQL is needed.

My R fix is pasted below.

# occdata is a subset of the dataset produced by daedalus I'm currently working with
# accordingly, the next line is needed to recount the 'levels' of the event id variable present
occdata$ev_id <- factor(occdata$ev_id)

# create data frame of event ids by their aircraft keys
# note: 3 is the highest aircraft_key present in my dataset. Tweak this code to fit your slice of the dataset as necessary
aircraft_keys <- as.data.frame.matrix(table(occdata$ev_id,occdata$aircraft_key))
names(aircraft_keys)[1] <- "aircraft1"
names(aircraft_keys)[2] <- "aircraft2"
names(aircraft_keys)[3] <- "aircraft3"
aircraft_keys$ev_id <- row.names(aircraft_keys)
aircraft_keys <- aircraft_keys[c("ev_id", "aircraft1", "aircraft2", "aircraft3")]

# denote classes of errors in aircraft keys that will need to be fixed
aircraft_keys$error1 <- 0
aircraft_keys$error1[aircraft_keys$aircraft1 == 0 & aircraft_keys$aircraft2 == 1 & aircraft_keys$aircraft3 == 0] <- 1

aircraft_keys$error2 <- 0
aircraft_keys$error2[aircraft_keys$aircraft1 == 0 & aircraft_keys$aircraft2 == 1 & aircraft_keys$aircraft3 == 1] <- 1
aircraft_keys$error2[aircraft_keys$aircraft1 == 0 & aircraft_keys$aircraft2 == 0 & aircraft_keys$aircraft3 == 1] <- 1

aircraft_keys$error3 <- 0
aircraft_keys$error3[aircraft_keys$aircraft1 == 1 & aircraft_keys$aircraft2 == 0 & aircraft_keys$aircraft3 == 1] <- 1

aircraft_key_errors <- subset(aircraft_keys, error1 == 1 | error2 == 1 | error3 == 1)
aircraft_key_errors$ev_id <- factor(aircraft_key_errors$ev_id)

error1_ids <- aircraft_key_errors$ev_id[aircraft_key_errors$error1 == 1]
error2_ids <- aircraft_key_errors$ev_id[aircraft_key_errors$error2 == 1]
error3_ids <- aircraft_key_errors$ev_id[aircraft_key_errors$error3 == 1]

for(id in error1_ids) {
    occdata$aircraft_key[occdata$aircraft_key == 2 & occdata$ev_id == id] <- 1
}

for(id in error2_ids) {
    occdata$aircraft_key[occdata$aircraft_key == 3 & occdata$ev_id == id] <- 1
}

for(id in error3_ids) {
    occdata$aircraft_key[occdata$aircraft_key == 3 & occdata$ev_id == id] <- 2
}

Occurrences & Sequence Events data recoded in 2008

According to the NTSB data dictionary, cause factors were moved from the seq_of_events table to the findings table beginning in 2008. This explains the discontinuity in occurrence codes, phase of flight codes, etc. from Jan. 1, 2008 to present (see chart below)

It would be nice to integrate the finding table into the final dataset. Careful consideration will have to be made as to how the differences in the table structures will present themselves to analysis across the entire time series and conclusions that can be drawn.

occurrences_seq_events-vars

Exported dataset csv includes ASCII char SUB

This appears to have no affects on MacOS, but it prevents R on Windows from importing the entire database. Refer to line 187 of aircraft-GAaccidents-final.csv for first occurrence of this character, and to Wikipedia for explanation of meaning behind the character.

Not sure at which point in the conversion process this problem is being introduced. The character does not appear in the raw csv files included in directory ntsb_mdb_export. Perhaps explicitly setting the encoding of the exported csv will strip this character from the dataset.

Consistent Use of `-p` flag

My custom mysql script defaults to requiring password for user, whereas mysql commands, like mysqldump, default to not requiring a password unless the -p flag is specified. Modifying the script to default to no password and using the -p flag to prompt for a password would make my script consistent with mysql commands, which will make the tool more user friendly.

Two variables not concatenating correctly

Columns modifier_code and person_code need to concatenate multiple values when more than one is present for a given occurrence in the dataset. This is caused by not encoding the variables correctly in the aircraft and occurrences tables.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.