
data-science-practice's Introduction

hey! my name is John. nice to meet u!!

  • I'm a SWE and social influencer passionate about teaching others to code! Check out my socials here. Use this template to make your own social site!
  • I love teaching people to code! Learn with me through Ladderly and my educational TikTok page!
  • Check out my portfolio here. Use my portfolio as a template to make your own!
  • Get your fortune told using my current recreational project called Futurecaster
  • Twitter is an easy way to reach out to me.

data-science-practice's People

Contributors

kpn703, mbjoerkh, vandivier

Forkers

kpn703

data-science-practice's Issues

Not urgent: Genderize minor edits

Big picture: Genderize has worked very well! Below are 2 suggested edits.

  1. A couple hundred entries (both fellows and sponsors) have no predicted gender because the name we've genderized is a middle initial, or the initial of a two-part first name, instead of a full first name.

Suggested solution: after the comma that follows the last name, split the text by spaces into separate names and genderize each one. Example: for Entry #318_4, "BRADY, J. Mark", we should genderize both "J." and "Mark". This has the added benefit of solving problem 2.

  2. Occasionally there are multiple first names that predict differently. See sponsor "Lee Robert Johnston": Lee yields a 75% chance of male, but Robert yields 100%. Jointly we can accept the name as male, though depending on the threshold we may not do so currently. More importantly, I believe there are cases where the wrong gender is assigned because of this.
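A minimal Python sketch of the suggestion above, in case it helps the discussion. It is not the repo's actual code: the function names, the `lookup` callable (a stand-in for whatever Genderize wrapper the project uses), and the averaging rule for combining predictions are all assumptions made for illustration. Only the Lee/Robert probabilities come from this issue.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (gender, probability), e.g. ("male", 0.75); None means "no prediction".
Prediction = Optional[Tuple[str, float]]


def given_name_tokens(entry: str) -> List[str]:
    """Split the text after the first comma into space-separated names."""
    _, comma, given = entry.partition(",")
    return given.split() if comma else []


def combine_predictions(
    tokens: List[str], lookup: Callable[[str], Prediction]
) -> Prediction:
    """Genderize every token and merge the results.

    Assumed rule: sum the probabilities per gender, then divide the
    winning sum by the number of tokens that got a prediction.
    """
    totals: Dict[str, float] = {}
    counted = 0
    for token in tokens:
        prediction = lookup(token.rstrip(".").lower())
        if prediction is None:  # e.g. a bare initial such as "J."
            continue
        gender, probability = prediction
        totals[gender] = totals.get(gender, 0.0) + probability
        counted += 1
    if counted == 0:
        return None
    best = max(totals, key=lambda g: totals[g])
    return best, totals[best] / counted


# Hypothetical lookup table standing in for real Genderize responses.
EXAMPLE_RESPONSES = {"lee": ("male", 0.75), "robert": ("male", 1.0)}


def example_lookup(name: str) -> Prediction:
    return EXAMPLE_RESPONSES.get(name)
```

With this sketch, `given_name_tokens("BRADY, J. Mark")` gives `["J.", "Mark"]`, so the initial is simply skipped when it has no prediction, and `combine_predictions(["Lee", "Robert"], example_lookup)` gives `("male", 0.875)`. Other combining rules (for example, taking the most confident single name) would also fit the suggestion.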

P.S. I completely understand if you wish we'd done this in Stata so I could contribute more to the legwork. (I plan on "starting" the Stata and analysis bit later today.)

P.P.S. Is the difference between the output and ordered-output files that the latter includes the non-adjacent entries?

Output: Missing Academic Year(s) for 62 Entries

See ANDREWS, John J. (Entry 81_5)
ARESHIDZE, Giorgi (89_5) etc.

I see that some of these are institutions with a "sub-campus", e.g. University of Texas, Austin. Maybe the comma is the problem? I will edit the source code for these anyway (replacing the comma with a period).

Some missing entries

Not sure why, but I'm missing some entries even though they're in your ordered_output, and I think I've done everything to keep my copy up to date. One instance is SEELEY, Luke. Not super urgent, but because I'll have to double-check all the cleaning I've done once the output is complete, I'll hold off on the cleaning until we find out what's going on.
