in-connect

LinkedIn provides a lot of useful information about how you are connected, but it does not offer every view a typical user might want of their contacts. I wanted to see which people in my network worked at the same company or held similar job titles. This project provides a way to group contacts by company or position.

I initially started with this article in order to create a tree map of my data. That was super easy to do, but the drawbacks quickly limited my view into the data:

  • People do not always enter the company name in the same format, for instance: ABC Corp, ABC Corporation, ABC Corp.
  • Names vary in punctuation and capitalization.
  • Job positions have the same problem: Sr. Systems Engineer, Senior Systems Engineer, and Sr. Sys Engineer are all the same title, yet not the same text.

This task requires that we compare the different companies / titles with some kind of similarity test, and then use a clustering algorithm (dbscan or similar) to cluster the results using that distance metric.
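As a rough illustration of what such a similarity test could look like, here is a minimal sketch of a cosine distance over word tokens. The function names `tokens` and `cosine_distance` are hypothetical and are not taken from the project's scripts; the tokenization rule (lowercase, keep only letters and digits) is also an assumption.

```python
import math
import re
from collections import Counter


def tokens(name):
    """Lowercase a name and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of the two names' token-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    # Dot product over shared tokens (Counter returns 0 for missing keys).
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(
        sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    )
    # Treat an empty name as maximally distant from everything.
    return 1.0 if norm == 0 else 1.0 - dot / norm


if __name__ == "__main__":
    for other in ("ABC Corp.", "ABC Corporation", "XYZ Inc"):
        print("ABC Corp", "vs", other, "->", cosine_distance("ABC Corp", other))
```

With whole-word tokens, "ABC Corp" and "ABC Corp." come out identical, while "ABC Corp" and "ABC Corporation" share only one of their two tokens, so they land at a distance of about 0.5. That gap is what the clustering threshold has to absorb.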

Approach

The goal here was not to find the best algorithm for determining similarity, but to get something that would work reasonably well, with some potential for contamination.

  • First, filter each company name to remove punctuation and special symbols, and lowercase each word (token).
  • Cluster the companies using dbscan with the cosine similarity metric.
  • Group and plot the results.
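The steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the small `dbscan` function below is a hand-written stand-in for a library implementation such as scikit-learn's DBSCAN, the helper names are hypothetical, the sample company names are made up, and the `eps=0.6` / `min_samples=2` settings are illustrative rather than the values the project uses. Plotting is left out.

```python
import math
import re
from collections import Counter, defaultdict


def normalize(name):
    """Step 1: lowercase and drop punctuation/symbols, keeping word tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def cosine_distance(ta, tb):
    """1 - cosine similarity between two token lists (as count vectors)."""
    va, vb = Counter(ta), Counter(tb)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(
        sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def dbscan(items, eps, min_samples):
    """Step 2: minimal DBSCAN. Returns one label per item; -1 means noise."""
    n = len(items)
    # Brute-force neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if cosine_distance(items[i], items[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def group_by_label(names, labels):
    """Step 3 (grouping only): collect the original names per cluster label."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["ABC Corp", "ABC Corporation", "ABC Corp.",
             "Initech", "initech, Inc.", "Globex"]
    labels = dbscan([normalize(n) for n in names], eps=0.6, min_samples=2)
    for label, members in group_by_label(names, labels).items():
        print(label, members)
```

The brute-force neighborhood search is quadratic in the number of names, which is acceptable for a few thousand contacts but is one reason to prefer a library implementation for larger inputs.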

While weighing the trade-offs, I also tried bigrams and Levenshtein distance. I read up on a few other methods, but ultimately settled on this approach because it is simple and works for my use case.
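For reference, the two alternatives mentioned above can be sketched as follows. This is not the code that was tried in the project. The function names are hypothetical, and reading "bigrams" as character bigrams compared with a Dice coefficient is an assumption; word bigrams would be an equally plausible reading.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(text):
    """Set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, in the range 0.0 to 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        return 1.0  # two strings too short to have bigrams count as equal
    return 2 * len(ba & bb) / (len(ba) + len(bb))


if __name__ == "__main__":
    print(levenshtein("sr", "senior"))
    print(bigram_similarity("abc corp", "abc corporation"))
```

Both measures work on characters rather than whole words, so they can tolerate abbreviations such as "Corp" versus "Corporation" that a word-token comparison treats as entirely different tokens.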

Requirements:

  • pip install plotly
  • pip install Faker
  • pip install pandas
  • pip install numpy
  • pip install scikit-learn

Usage

python python/cluster_companies.py Connections.csv

Generate Fake Connections

python python/generate_data.py mycontact.csv --num_contacts=1000 --num_companies=25

Running cluster_companies.py produces a bubble chart that shows all of your contacts within a company when you mouse over each bubble.

You can see a "live" version of the plot here

Generate a Tree Map

python python/gen_treemap.py mycontact.csv --network_name="Fake Network"

You can view my "fake network" I generated to get an idea of how you might drill down into your data.
