hey! my name is John. nice to meet u!!
I'm a SWE and social influencer passionate about teaching others to code! Check out my socials here. Use this template to make your own social site!
I love teaching people to code! Learn with me through Ladderly and my educational TikTok page!
Are we missing academic years for the "second entry" of this guy because he's missing a degree year for his MA? Or is something else going on? In the source we have academic years 1958-1959 and 1959-1960.
Big picture: Genderize has worked very well! Below are 2 suggested edits.
A couple hundred entries (both fellows and sponsors) have no predicted gender because the name we've genderized is a middle initial, or the initial of a two-part first name, instead of a full first name.
Suggested solution: take the text after the comma that follows the last name and split it by space into separate names to be genderized. Example: for Entry#318_4 "BRADY, J. Mark" we should genderize both "J." and "Mark". This has the added benefit of solving problem 2) below.
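A minimal sketch of the split described above, assuming names arrive in the "LASTNAME, First Middle" form seen in the entries. The helper names `given_names`, `is_initial` and `names_to_genderize` are hypothetical, not anything in the existing code, and dropping bare initials before the lookup is my assumption about how to avoid the empty predictions.

```python
def given_names(entry_name: str) -> list[str]:
    """Return the space-separated name tokens that follow the surname comma."""
    # Everything before the first comma is treated as the last name.
    _, sep, rest = entry_name.partition(",")
    if not sep:
        return []  # no comma, so no given-name part we can identify
    return rest.split()


def is_initial(token: str) -> bool:
    """True for tokens such as "J." or "J" that carry no usable first name."""
    return len(token.rstrip(".")) <= 1


def names_to_genderize(entry_name: str) -> list[str]:
    """Given-name tokens worth sending to the gender lookup (initials dropped)."""
    return [t for t in given_names(entry_name) if not is_initial(t)]
```

For "BRADY, J. Mark" this keeps only "Mark" for the lookup, while "J." is still available from `given_names` if we want to record it.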
Occasionally there are multiple first names that predict differently. See sponsor "Lee Robert Johnston": Lee yields a 75% chance of male, but Robert yields 100%. Taken jointly we can accept the name as male, although depending on the threshold we may not be doing so currently. More importantly, I believe there are cases where the wrong gender is assigned because of this.
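One possible way to combine several per-name predictions, sketched under assumptions: each prediction is a (gender, probability) pair with gender "male" or "female", the combination rule is a plain average of the male probability, and the 0.8 threshold is a placeholder rather than the threshold we actually use. The function name `combine_predictions` is hypothetical.

```python
from typing import Optional


def combine_predictions(
    predictions: list[tuple[str, float]], threshold: float = 0.8
) -> Optional[str]:
    """Average per-name predictions and return a gender only if it clears the threshold."""
    if not predictions:
        return None
    # Express every prediction as a probability of "male" so they can be averaged.
    p_male = [p if gender == "male" else 1.0 - p for gender, p in predictions]
    mean_male = sum(p_male) / len(p_male)
    if mean_male >= threshold:
        return "male"
    if 1.0 - mean_male >= threshold:
        return "female"
    return None  # the names disagree too much, or neither is confident enough
```

With the example above, Lee (male, 0.75) alone falls short of a 0.8 threshold, but Lee and Robert (male, 1.0) together average 0.875 and are accepted as male. Other rules (for instance trusting the first name over the middle name) may fit the data better; averaging is only the simplest option.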
P.S. I completely understand if you wish we'd done this in Stata so I could contribute more to the legwork with this... (I plan on "starting" the Stata and analysis bit later today.)
P.P.S. Is the difference between the output and ordered-output files that the latter includes the non-adjacent entries?
Big picture: The vast differences on my side leading up to our conversation yesterday were caused by different line endings. Running "git config --global core.autocrlf true" solved that problem. Could it be the cause of the above?
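A quick way to check whether two copies of a file differ only in their line endings, as a sketch. The function names are hypothetical, and it assumes the files are small enough to read whole as bytes.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def differs_only_in_line_endings(a: bytes, b: bytes) -> bool:
    """True when the raw bytes differ but the content matches after normalization."""
    return a != b and normalize_newlines(a) == normalize_newlines(b)
```

Reading both copies with `open(path, "rb").read()` and passing the results in tells us whether a large diff is purely a CRLF/LF difference or whether the content really differs.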
See ANDREWS, John J. (Entry 81_5)
ARESHIDZE, Giorgi (89_5) etc.
I see some of these are institutions with a "sub-campus", e.g. University of Texas, Austin. Maybe that's the problem? I will edit the source code for these anyway (replacing the comma with a period).
Not sure why, but I'm missing some entries even though they're in your ordered_output, and I think I've done everything to keep it up to date. One instance is SEELEY, Luke. Not super urgent, but since I'll have to double-check all the cleaning I've done once the output is complete, I'll hold off on the cleaning until we find out what's going on.