

Network Clustering - Lab

Introduction

In this lab you'll practice your clustering and visualization skills to investigate Stack Overflow! Specifically, the dataset you'll be investigating is a network of tags used on Stack Overflow. With it, you should be able to explore related technologies currently in use by developers.

Objectives

In this lab you will:

  • Make visualizations of clusters and gain insights about how the clusters have formed

Load the Dataset

Load the data from the 'stack-overflow-tag-network/stack_network_links.csv' file. For now, simply load the file as a standard pandas DataFrame.
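A minimal sketch of the loading step, assuming the CSV is a plain edge list with a header row. The column names source, target and value in the toy rows below are an assumption about the file's layout (check df.columns on the real data), and the helper name load_links is hypothetical.

```python
import io

import pandas as pd


def load_links(path_or_buffer):
    """Read the tag edge list into a DataFrame, one row per link."""
    return pd.read_csv(path_or_buffer)


# In the lab you would pass the real file path:
# df = load_links("stack-overflow-tag-network/stack_network_links.csv")

# Hypothetical toy rows so the sketch runs on its own; they are NOT from the dataset.
toy_csv = io.StringIO(
    "source,target,value\n"
    "tagA,tagB,10.0\n"
    "tagB,tagC,5.5\n"
    "tagC,tagA,7.25\n"
)
df = load_links(toy_csv)
print(df.shape)  # (3, 3)
print(df.head())
```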

# Your code here

Transform the Dataset into a Network Graph using NetworkX

Transform the dataset from a pandas DataFrame into a NetworkX graph.
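One way to do this is nx.from_pandas_edgelist, which adds one edge per row. The sketch below assumes the source/target/value column names from the previous step and keeps value as an edge attribute; links_to_graph is a hypothetical helper. It builds an undirected nx.Graph, since the clustering functions used later in this lab work on undirected graphs.

```python
import networkx as nx
import pandas as pd


def links_to_graph(df, source="source", target="target", weight="value"):
    """Build an undirected graph with one edge per row; `weight` becomes an edge attribute."""
    return nx.from_pandas_edgelist(df, source=source, target=target, edge_attr=weight)


# Hypothetical toy edge list standing in for the real DataFrame.
toy = pd.DataFrame({"source": ["tagA", "tagB", "tagC"],
                    "target": ["tagB", "tagC", "tagA"],
                    "value": [10.0, 5.5, 7.25]})
G = links_to_graph(toy)
print(G.number_of_nodes(), G.number_of_edges())  # 3 3
print(G["tagA"]["tagB"])  # {'value': 10.0}
```

Looping over df.itertuples() and calling G.add_edge(...) gives the same result; from_pandas_edgelist is simply more concise.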

# Your code here

Create an Initial Graph Visualization

Next, create an initial visualization of the network.
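A minimal drawing sketch, assuming matplotlib is available. NetworkX's built-in karate club graph stands in for the Stack Overflow graph so the block runs on its own, and draw_network is a hypothetical helper. Fixing the seed of nx.spring_layout makes the layout reproducible between runs.

```python
import matplotlib.pyplot as plt
import networkx as nx


def draw_network(G, seed=10, ax=None):
    """Draw G with a seeded spring layout and return the node positions."""
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 12))
    pos = nx.spring_layout(G, seed=seed)  # fixed seed -> reproducible layout
    nx.draw(G, pos=pos, ax=ax, with_labels=True, node_color="lightblue",
            node_size=300, alpha=0.8, font_size=8)
    return pos


# Built-in stand-in graph; pass your Stack Overflow graph in the lab.
G = nx.karate_club_graph()
pos = draw_network(G)
# Outside a notebook, call plt.show() to display the figure.
```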

# Your code here

Perform an Initial Clustering using k-clique Clustering

Begin to explore the impact of using different values of k.
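NetworkX provides k-clique percolation as k_clique_communities(G, k) in networkx.algorithms.community: two k-cliques belong to the same community when they share k - 1 nodes. The sketch below (kclique_summary is a hypothetical helper) runs it for several k on a toy graph of two triangles joined by a bridge. Larger k gives fewer, tighter communities, and nodes that sit in no k-clique are left out of every community.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities


def kclique_summary(G, k_values):
    """Map each k (must be >= 2) to the list of k-clique communities as node sets."""
    return {k: [set(c) for c in k_clique_communities(G, k)] for k in k_values}


# Toy graph: two triangles joined by a single bridge edge c-d.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"),
              ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f")])
summary = kclique_summary(G, range(2, 5))
for k, communities in summary.items():
    print(k, len(communities), communities)
```

Here k=2 merges everything into one community, k=3 separates the two triangles, and k=4 finds nothing because the graph has no 4-cliques.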

# Your code here

Visualize the Clusters Produced from the k-Clique Algorithm

Level-Up: Experiment with different nx.draw() settings. See the NetworkX draw documentation for a full list. Some recommended settings that you've previewed include the position parameter pos, with_labels=True, node_color, alpha, node_size, font_weight and font_size. Note that nx.spring_layout(G) is particularly useful for laying out a well-formed network. You can pass the optimal distance between nodes via k and set a random seed via seed to get reproducible results, as in nx.spring_layout(G, k=2.66, seed=10). For more details, see the spring_layout documentation.
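A sketch of one way to color the k-clique communities, assuming matplotlib's "tab20" colormap is an acceptable palette (its 20 colors repeat beyond that). plot_kclique is a hypothetical helper. Because k-clique communities can overlap and can leave nodes unassigned, this sketch keeps the first community a node is found in and draws unassigned nodes light gray; other tie-breaking rules are equally valid.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms.community import k_clique_communities


def plot_kclique(G, k, seed=10, ax=None):
    """Draw G with one color per k-clique community and return the node-to-color map."""
    cmap = plt.get_cmap("tab20")
    node_to_color = {}
    for i, community in enumerate(k_clique_communities(G, k)):
        for node in community:
            # Overlapping nodes keep the color of the first community seen.
            node_to_color.setdefault(node, cmap(i % cmap.N))
    unassigned = mcolors.to_rgba("lightgray")  # nodes in no k-clique
    colors = [node_to_color.get(node, unassigned) for node in G.nodes()]

    if ax is None:
        _, ax = plt.subplots(figsize=(12, 12))
    pos = nx.spring_layout(G, k=2.66, seed=seed)  # fixed seed -> same layout each run
    nx.draw(G, pos=pos, ax=ax, node_color=colors, with_labels=True,
            alpha=0.8, node_size=300, font_size=8, font_weight="bold")
    ax.set_title(f"k-clique communities, k={k}")
    return node_to_color


# Toy graph: two triangles joined by a bridge, plus a pendant node "g".
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")])
node_to_color = plot_kclique(G, k=3)
```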

# Your code here
# Your code here

Perform an Alternative Clustering Using the Girvan-Newman Algorithm

Recluster the network using the Girvan-Newman algorithm. Remember that this will give you a sequence of clusterings (in NetworkX, an iterator of tuples of node sets), each corresponding to the clusters that form after removing the top $n$ edges according to some metric, typically edge betweenness.
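In NetworkX, girvan_newman() (in networkx.algorithms.community) by default removes the edge with the highest edge betweenness, recomputes betweenness, and yields a new tuple of node sets each time the graph splits into one more connected component. The sketch below (girvan_newman_levels is a hypothetical helper) collects those levels on a toy graph; the optional cap matters on larger graphs, because each level requires repeated betweenness computations.

```python
from itertools import islice

import networkx as nx
from networkx.algorithms.community import girvan_newman


def girvan_newman_levels(G, max_levels=None):
    """Return successive Girvan-Newman partitions as lists of node sets.

    girvan_newman() is lazy: edges are only removed as you iterate, so
    capping max_levels skips the work for levels you never inspect.
    """
    levels = girvan_newman(G)
    if max_levels is not None:
        levels = islice(levels, max_levels)
    return [[set(c) for c in partition] for partition in levels]


# Toy graph: two triangles joined by a bridge edge c-d.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f")])
partitions = girvan_newman_levels(G)
for partition in partitions:
    print(len(partition), partition)
```

The bridge c-d has the highest betweenness, so the first level separates the two triangles; later levels keep splitting until every node is on its own.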

# Your code here

Create a Visualization Wrapper

Now that you have an idea of how splintered the network becomes based on the number of edges removed, you'll want to examine some of the intermediate groupings as the network gradually breaks apart. Since the network is quite complex to start with, subplots are not a great option: each subplot would be too small to read. Create a visualization function plot_girvan_newman(G, clusters) which takes a NetworkX graph object and one clustering from the Girvan-Newman output above (a collection of node sets), and plots the network with a unique color for each cluster.

Level-Up: As with the k-clique visualization, experiment with nx.draw() settings such as pos, with_labels=True, node_color, alpha, node_size, font_weight and font_size, and use a seeded layout such as nx.spring_layout(G, k=2.66, seed=10) so node positions stay reproducible across plots.

def plot_girvan_newman(G, clusters):
    # Your code here 
    pass
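A possible sketch of plot_girvan_newman, assuming clusters is one Girvan-Newman level (an iterable of node sets covering every node of G) and that matplotlib's "tab20" palette is acceptable. The extra seed and ax parameters are additions to the lab's signature. Computing the layout from the full graph with a fixed seed keeps node positions identical across levels, so only the colors change from one plot to the next.

```python
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms.community import girvan_newman


def plot_girvan_newman(G, clusters, seed=10, ax=None):
    """Draw the full graph G, coloring each node by its cluster; return node -> cluster index."""
    clusters = [set(c) for c in clusters]
    node_to_cluster = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    cmap = plt.get_cmap("tab20")  # colors repeat after 20 clusters
    colors = [cmap(node_to_cluster[node] % cmap.N) for node in G.nodes()]

    if ax is None:
        _, ax = plt.subplots(figsize=(12, 12))
    # Same graph + same seed -> identical node positions for every level.
    pos = nx.spring_layout(G, k=2.66, seed=seed)
    nx.draw(G, pos=pos, ax=ax, node_color=colors, with_labels=True,
            alpha=0.8, node_size=300, font_size=8, font_weight="bold")
    ax.set_title(f"{len(clusters)} clusters")
    return node_to_cluster


# Toy graph: two triangles joined by a bridge edge c-d.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f")])
first_level = next(girvan_newman(G))
node_to_cluster = plot_girvan_newman(G, first_level)
```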

Visualize the Various Clusters that Form Throughout the Girvan-Newman Algorithm

Use your function to visualize the various clusters that form throughout the Girvan-Newman algorithm as you remove more and more edges from the network.
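Rather than plotting every level, you can pick a few cluster counts and stop iterating once the largest is reached. The sketch below (select_levels is a hypothetical helper) does this on NetworkX's built-in karate club graph as a stand-in. One caveat: if G starts out with several connected components, the first level already has more than two clusters, so small requested counts may be missing from the result.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman


def select_levels(G, wanted_counts):
    """Return {n_clusters: partition} for the requested cluster counts.

    Iteration stops once the largest requested count is reached, so no more
    edges are removed than needed.
    """
    wanted = set(wanted_counts)
    if not wanted:
        return {}
    selected = {}
    for partition in girvan_newman(G):
        n_clusters = len(partition)
        if n_clusters in wanted:
            selected[n_clusters] = [set(c) for c in partition]
        if n_clusters >= max(wanted):
            break
    return selected


G = nx.karate_club_graph()  # stand-in; use your Stack Overflow graph in the lab
for n_clusters, partition in sorted(select_levels(G, [2, 4, 8]).items()):
    print(n_clusters, sorted(len(c) for c in partition))
    # plot_girvan_newman(G, partition)  # the function you wrote above
```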

# Your code here

Cluster Decay Rate

Create a visualization to help you understand how quickly clusters form in this network as a function of the number of edges removed.
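NetworkX's girvan_newman() only yields when the graph splits, so it does not report how many edges were removed to reach each level. The sketch below therefore re-implements the edge-removal loop by hand (an assumption, not the lab's reference method): remove the highest-betweenness edge, recompute, and record the number of connected components after every removal. cluster_decay and plot_cluster_decay are hypothetical helpers. Because betweenness is recomputed at every step, this can be slow on large graphs.

```python
import matplotlib.pyplot as plt
import networkx as nx


def cluster_decay(G):
    """Return (edges_removed, n_clusters) pairs for Girvan-Newman-style edge removal."""
    g = G.copy()
    history = [(0, nx.number_connected_components(g))]
    while g.number_of_edges() > 0:
        betweenness = nx.edge_betweenness_centrality(g)  # recomputed every step
        g.remove_edge(*max(betweenness, key=betweenness.get))
        history.append((len(history), nx.number_connected_components(g)))
    return history


def plot_cluster_decay(history, ax=None):
    """Line plot of the number of clusters against the number of edges removed."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 5))
    removed, n_clusters = zip(*history)
    ax.plot(removed, n_clusters)
    ax.set_xlabel("Number of edges removed")
    ax.set_ylabel("Number of clusters")
    return ax


# Toy graph: two triangles joined by a bridge edge c-d.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f")])
history = cluster_decay(G)
ax = plot_cluster_decay(history)
print(history[:2])  # [(0, 1), (1, 2)]
```

Flat stretches in the curve mean many edges had to be removed before the next split, which hints at tightly knit clusters.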

Level-Up: Based on your graphic, what would you predict is an appropriate number of clusters?

# Your code here

Choose a Clustering

Now that you have generated various clusters within the overall network, which do you think is the most appropriate or informative?
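Choosing a clustering is a judgment call, but one common quantitative aid is modularity, available as modularity(G, communities) in networkx.algorithms.community. The sketch below scores each Girvan-Newman level against the original graph and keeps the highest-scoring one; best_level_by_modularity is a hypothetical helper, and using modularity as the selection criterion is an assumption, not the lab's prescribed method. Note that modularity reads the "weight" edge attribute by default, so a graph built with a value attribute is scored unweighted unless you pass weight="value".

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity


def best_level_by_modularity(G, max_levels=None):
    """Scan Girvan-Newman levels; return (partition, score) with the highest modularity."""
    best_partition, best_score = None, float("-inf")
    for i, partition in enumerate(girvan_newman(G)):
        if max_levels is not None and i >= max_levels:
            break
        score = modularity(G, partition)  # scored against the original graph
        if score > best_score:
            best_partition, best_score = [set(c) for c in partition], score
    return best_partition, best_score


# Toy graph: two triangles joined by a bridge edge c-d.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
              ("d", "e"), ("d", "f"), ("e", "f")])
best_partition, best_score = best_level_by_modularity(G)
print(len(best_partition), round(best_score, 4))  # 2 0.3571
```

A high modularity score only says the clusters are dense internally relative to chance; whether they are informative still depends on what the tags in each cluster have in common.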

# Your code/response here

Summary

In this lab you practiced using the k-clique and Girvan-Newman methods for clustering. You may also have gotten a better sense of the current technology landscape. As you can start to see, network clustering gives you powerful tools for splitting large networks into smaller communities, allowing you to dig deeper into their particular characteristics.

Contributors

mathymitchell, loredirick, sumedh10
