in-connect

LinkedIn provides a lot of useful information about how you are connected, but it does not offer every view a typical user might want of their contacts. I wanted to see which people in my network worked at the same company or held similar job titles. This project provides a way to group your contacts by company or position.

I initially started with this article in order to create a tree map of my data. That was super easy to do, but its drawbacks quickly limited my view into the data:

People do not always enter the company name in the same format, for instance: ABC Corp, ABC Corporation, ABC Corp.
Names vary in punctuation and capitalization.
Job positions have the same problem: Sr. Systems Engineer, Senior Systems Engineer, and Sr. Sys Engineer are all the same title, yet not the same text.

Grouping these variants requires a similarity measure between company names or titles, plus a clustering algorithm (dbscan or similar) that uses the resulting distance metric to group them.

Approach

The goal here was not to find the best algorithm for determining similarity, but to get something that works reasonably well, accepting that some clusters may be contaminated with unrelated entries.

First, filter each company name to remove punctuation and special symbols, and lowercase each word (token).
Cluster the companies using dbscan with the cosine similarity metric.
Group and plot the results.
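
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in cluster_companies.py: the helper names normalize and cluster_companies, the use of CountVectorizer for token counts, and the eps and min_samples values are all assumptions chosen for the example.

```python
import re
from collections import defaultdict

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer


def normalize(name):
    """Lowercase a name and replace punctuation/special symbols with spaces."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    return " ".join(cleaned.split())


def cluster_companies(names, eps=0.6, min_samples=1):
    """Group raw company names whose token vectors are close in cosine distance.

    eps and min_samples are illustrative defaults (assumptions), not tuned values.
    """
    cleaned = [normalize(n) for n in names]
    # One column per distinct token, one row per company name.
    vectors = CountVectorizer().fit_transform(cleaned)
    # metric="cosine" makes DBSCAN compare rows by cosine distance (1 - similarity).
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(vectors)
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return list(groups.values())


if __name__ == "__main__":
    for group in cluster_companies(["ABC Corp", "ABC Corporation", "ABC Corp.", "Globex Inc"]):
        print(group)
```

With min_samples=1 every name lands in some cluster, so nothing is discarded as noise. The cost is that loosely related names can chain together, which is the kind of contamination mentioned above.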

While weighing the trade-offs, I tried bigrams and Levenshtein distance. I read up on a few other measures, but ultimately settled on this approach because it is simple and works for my use case.
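
For reference, the two alternatives mentioned can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, and "bigrams" is interpreted here as character bigrams compared with Jaccard overlap, which may not match what was actually tried.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def bigram_similarity(a, b):
    """Jaccard overlap of character bigrams, between 0.0 and 1.0."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have bigrams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(levenshtein("abc corp", "abc corporation"))
    print(bigram_similarity("abc corp", "abc corporation"))
```

Both measures work on characters rather than whole tokens, so they tolerate abbreviations such as Sr. versus Senior differently than the token-based cosine approach does.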

Requirements:

pip install plotly
pip install Faker
pip install pandas
pip install numpy
pip install scikit-learn

Usage

python python/cluster_companies.py Connections.csv

Generate Fake Connections

python python/generate_data.py mycontact.csv --num_contacts=1000 --num_companies=25

Running cluster_companies.py produces a bubble chart. Mousing over a bubble lists all of your contacts at that company.

You can see a "live" version of the plot here

Generate a Tree Map

python python/gen_treemap.py mycontact.csv --network_name="Fake Network"

You can view the "fake network" I generated to get an idea of how you might drill down into your data.
