twitter-graph

I spend a lot of time on Twitter, and over the years I have been following a wide variety of people: old friends, work colleagues, funny accounts, etc. Gradually, my timeline has become this messy mix that makes Twitter so enjoyable.

Wouldn't it be nice, though, to have some hindsight and perspective on what's actually going on?

Contents | Example | Usage | References | Credits

Behold: the graph of my Twitter friends

Friends

My Twitter world literally looks like a world map, which is fantastic!

Download

Original 1080p 2160p 4320p 8640p pdf
Labeled 1080p 2160p 4320p 8640p pdf
Hubs 1080p 2160p 4320p 8640p pdf

Clusters

By running a clustering algorithm [1, 2], several communities are automatically discovered:

  • #f00 the Machine Learning research community;
  • #00f French academia;
  • #0ff software engineers, mainly from my internship at Twitter, and Silicon Valley startups;
  • #0f0 the Drone community, from my time at Parrot;
  • #ff0 entertainment accounts: youtubers, cartoonists, video games.

As we zoom in closer, we can find additional smaller clusters:

  • #f90 the SequeL lab, where I am doing my PhD, and French researchers in theoretical ML;
  • #b0b Anglo-Saxon academia;
  • #09f students and staff of Mines ParisTech, my university;
  • #cf5 French tech, startups and entrepreneurs.
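The communities above come from a modularity-based algorithm: a partition scores well when edges fall inside communities more often than expected from the node degrees alone. As a rough sketch of that score only (not of the optimization itself), the snippet below computes the modularity of a given partition. It treats the graph as undirected and unweighted, and the `modularity` helper and the toy graph are hypothetical illustrations, not part of this repository.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge share) - (degree share)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(toy_edges, two_clusters)
q_merged = modularity(toy_edges, one_cluster)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the behaviour a clustering algorithm exploits when it searches for the best partition.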

The size of the nodes represents which accounts are the most popular according to this graph.

Intuitively, a popular account is followed by many other popular accounts. Popularity is also related to the probability of reaching a node by walking randomly in the graph. The PageRank algorithm [3], used in search engines, provides such a metric.

On this graph the results are reasonable: accounts with many followers such as @elonmusk, @ylecun and @snowden end up with a high PageRank. But the structure of the network also plays an important part, since by only relying on the number of followers, accounts such as @TheRealJimCarey, @RobertDowneyJr, @tomhanks would be very salient while they barely stand out in terms of PageRank.
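To make the random-walk intuition concrete, here is a minimal power-iteration sketch of PageRank on a directed "follows" graph. It is an illustration under stated assumptions, not the implementation used to produce the figures: the `pagerank` helper, the damping factor of 0.85 (a conventional choice), the fixed iteration count, and the toy accounts are all hypothetical.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """PageRank by power iteration.

    follows: dict mapping each account to the set of accounts it follows.
    Returns a dict of scores that sum to 1.
    """
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by accounts that follow nobody is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every account shares its rank equally among the accounts it follows.
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy data (hypothetical accounts): everyone follows "hub", carol also follows alice.
toy_follows = {
    "alice": {"hub"},
    "bob": {"hub"},
    "carol": {"hub", "alice"},
    "hub": set(),
}
ranks = pagerank(toy_follows)
most_popular = max(ranks, key=ranks.get)
```

In this toy graph "hub" ends up with the highest score, and "alice" outranks "bob" only because she has one extra follower, which shows how the link structure matters beyond raw counts.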

Hubs

Instead of scaling the nodes by popularity, we can also look for nodes that sit between several communities and connect them together. This is measured by Betweenness Centrality, which counts how often a node appears on the shortest paths between other nodes of the network.

For instance, accounts that belong to both the AI/ML research and French academia communities stand out, like @freakonometrics and @bguedj, as does @chr1sa, who sits between the drone and Silicon Valley clusters.
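The idea of counting shortest paths can be sketched in a few lines. The snippet below uses Brandes' algorithm on a directed, unweighted graph and returns unnormalized scores. It is only an illustration: the `betweenness` helper and the toy graph are hypothetical, and Gephi's own computation may normalize or treat edge direction differently.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality of a directed, unweighted graph.

    adjacency: dict mapping every node to the set of its successors.
    """
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in nodes}
        paths = {v: 0 for v in nodes}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency.get(v, ()):
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inward.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score


# Toy graph (hypothetical): two pairs of accounts connected only through "bridge".
toy_graph = {
    "a1": {"a2", "bridge"},
    "a2": {"a1"},
    "bridge": {"a1", "b1"},
    "b1": {"bridge", "b2"},
    "b2": {"b1"},
}
scores = betweenness(toy_graph)
top_hub = max(scores, key=scores.get)
```

Every shortest path between the two sides passes through "bridge", so it receives the highest score even though it has no more connections than its neighbours.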

Statistics

Statistics Value
Nodes 2406
Edges 107697
Diameter 9
Average path length 3.1
Average degree 44.65
Average clustering coefficient 0.234

The graph of my followers

Friends

The first thing we can notice is that this graph looks more clustered than the previous one, which is confirmed by a higher average clustering coefficient. Some clusters also seem to have disappeared, namely the #ff0 entertainment and #b0b Anglo-Saxon academia clusters.

Downloads

Original 1080p 2160p 4320p 8640p pdf svg
Labeled 1080p 2160p 4320p 8640p pdf svg
Hubs 1080p 2160p 4320p 8640p pdf svg

Statistics

Statistics Value
Nodes 839
Edges 10614
Diameter 8
Average path length 3.4
Average degree 12.65
Average clustering coefficient 0.316
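Some of these figures can be checked by hand. For the followers graph, the reported average degree matches the number of edges divided by the number of nodes (10614 / 839 ≈ 12.65). The sketch below computes that ratio together with the diameter and average path length of a small directed graph using breadth-first search. The `graph_statistics` helper and the toy graph are hypothetical, and path lengths are taken over reachable ordered pairs only, which may differ from the settings used in Gephi.

```python
from collections import deque


def graph_statistics(adjacency):
    """Basic statistics of a directed, unweighted graph.

    adjacency: dict mapping every node to the set of its successors.
    """
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)
    edge_count = sum(len(targets) for targets in adjacency.values())
    diameter, total_length, pair_count = 0, 0, 0
    for source in nodes:
        # Breadth-first search gives shortest distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency.get(v, ()):
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        for target, d in distance.items():
            if target != source:
                diameter = max(diameter, d)
                total_length += d
                pair_count += 1
    return {
        "nodes": len(nodes),
        "edges": edge_count,
        "average_degree": edge_count / len(nodes),
        "diameter": diameter,
        "average_path_length": total_length / pair_count if pair_count else 0.0,
    }


# Average degree of the followers graph, from the table above.
followers_average_degree = round(10614 / 839, 2)  # 12.65

# Toy graph (hypothetical): a directed cycle a -> b -> c -> a.
toy_stats = graph_statistics({"a": {"b"}, "b": {"c"}, "c": {"a"}})
```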

Usage

Step 1. Get the data

To get access to the Twitter API, you must first register on the Twitter Developer Portal. Then, create an app (with read permissions) and record your authentication keys in credentials.json.

Then, install the requirements:

pip3 install -r requirements.txt

and finally run the script:

python3 fetch_data.py

Usage: fetch_data [options]

Options:
  -h --help              Show this screen.
  --screen-name <name>   Screen name of the user. By default, the account used for authentication to the API.
  --graph-nodes <type>   Nodes to consider in the graph: friends, followers or all. [default: followers].
  --edges-ratio <ratio>  Ratio of edges to export in the graph (sampled among non-mutuals). [default: 1].
  --credentials <file>   Path of the credentials for Twitter API [default: credentials.json].
  --cache <path>         Path of the user's friends cache [default: cache].
  --out <path>           Path of the graph files [default: out/graph].
  --stop-on-rate-limit   Stop fetching data and export the graph when reaching the rate limit of Twitter API.
  --run-http-server      Run an HTTP server to visualize the graph in your browser with d3.js.

The script will start by getting the list of your friends and followers, before going through these accounts one by one in order to build the edges of the graph.

Found 841 followers.
Found 2406 friends.
[1/2406] Fetching friends of @Mehdi_Moussaid
[2/2406] Fetching friends of @Inria_Lille
[3/2406] Fetching friends of @Limericking

Since Twitter limits the rate of its API to 15 requests per 15-minute window, this is going to take a while. So that the requests can be interrupted and resumed at any time, a very simple caching system immediately exports each request's results to a local JSON file.
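To get a feel for "a while", here is a back-of-the-envelope estimate. It assumes exactly one request per account (accounts with very long friend lists may need more) and that the first window is available immediately. The `estimated_hours` helper is hypothetical.

```python
import math


def estimated_hours(accounts, requests_per_window=15, window_minutes=15):
    """Lower bound on the waiting time, assuming one request per account."""
    windows = math.ceil(accounts / requests_per_window)
    # The first window starts right away; every later one costs a full wait.
    return max(windows - 1, 0) * window_minutes / 60


hours_for_friends_graph = estimated_hours(2406)  # 40.0
```

For the 2406 friends of the example above, this gives about 40 hours of waiting, which is why being able to interrupt and resume matters.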

KeyboardInterrupt

python3 fetch_data.py
[1/2406] @Mehdi_Moussaid found in cache.
[2/2406] @Inria_Lille found in cache.
[3/2406] @Limericking found in cache.
[4/2406] Fetching friends of @Ariane_lis
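The resume behaviour shown in this log can be approximated with a small file-based cache. The sketch below is an assumption about how such a cache could work, not the code of fetch_data.py: the `cached_friends` helper, the one-file-per-account layout, and the stand-in `fake_fetch` function are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def cached_friends(screen_name, fetch, cache_dir="cache"):
    """Return the friends of `screen_name`, calling `fetch` only on a cache miss."""
    path = Path(cache_dir) / f"{screen_name}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    friends = fetch(screen_name)
    # Write the result immediately so an interruption never loses it.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(friends), encoding="utf-8")
    return friends


# Usage with a stand-in fetcher (hypothetical; the real script calls the Twitter API).
calls = []


def fake_fetch(screen_name):
    calls.append(screen_name)
    return [12, 34]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_friends("example_user", fake_fetch, cache_dir=tmp)
    second = cached_friends("example_user", fake_fetch, cache_dir=tmp)  # read from disk
```

The second call never reaches the fetcher, which is what lets a restarted run skip straight to the first account that is not yet in the cache.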

If you are too impatient and want to preview the graph with the data downloaded so far, use the --stop-on-rate-limit option.

The resulting graph will be exported to two .csv files containing the nodes and edges.

[4/2406] Fetching friends of @Ariane_lis
...but it failed. Error: [{'message': 'Rate limit exceeded', 'code': 88}]
You reached the rate limit. Disable --stop-on-rate-limit or try again later.
Successfully exported 2406 nodes to out\graph.nodes.csv.
Successfully exported 128 edges to out\graph.edges.csv.
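Once exported, the edge list can also be explored directly in Python. The sketch below reads an edge table into an adjacency dictionary with the standard csv module. The column names `Source` and `Target` are an assumption (they are the names Gephi expects for an edges table), so check the header of your own out/graph.edges.csv. The `read_edges` helper and the in-memory sample are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_edges(csv_file, source_column="Source", target_column="Target"):
    """Build {source: set(targets)} from an open CSV file with a header row."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(csv_file):
        adjacency[row[source_column]].add(row[target_column])
    return adjacency


# In-memory sample standing in for the exported file (assumed column names).
sample = io.StringIO("Source,Target\nalice,bob\nalice,carol\nbob,alice\n")
adjacency = read_edges(sample)
out_degree = {node: len(targets) for node, targets in adjacency.items()}

# With a real export, open the file instead:
# with open("out/graph.edges.csv", newline="", encoding="utf-8") as f:
#     adjacency = read_edges(f)
```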

Finally, note that you can skip an account by adding it to the exclude.json file.

Step 2. (optional) Visualize with d3.js

Once exported, the graph can be visualized directly in your browser with d3-force.

To that end, use the --run-http-server option to automatically spawn an HTTP server at the end of the script.

[2406/2406] Fetching friends of @AdrienRahier
Successfully exported 2406 nodes to out\graph.nodes.csv.
Successfully exported 107697 edges to out\graph.edges.csv.
Serving HTTP at http://localhost:8000?nodes=out/graph.nodes.csv&edges=out/graph.edges.csv

Open the URL in your browser to see the results. While d3-force is lightweight and convenient, it can be a bit slow when the graph becomes too large (about 2000 nodes on my computer), and will only handle the graph layout. For more advanced customization options, you can turn to Gephi.

Step 2. (bis) Visualize with Gephi

Gephi is the leading visualization and exploration software for all kinds of graphs and networks. Gephi is open-source and free.

The User Guide contains all the information that you need, and I recommend that you read the Quick Start Guide. I will simply recall the main steps involved.

1. Import nodes

  • Start a new project;
  • go to the Data Laboratory tab;
  • select Import Spreadsheet in the toolbar, and choose out/graph.nodes.csv;
  • in the General Options pane, select Import as: Nodes table, then click Next and Finish;
  • in the Import report window, select Append to existing workspace, and click OK.

A table of nodes should appear in the Data Laboratory.

2. Import edges

  • select again Import Spreadsheet in the toolbar, and choose out/graph.edges.csv;
  • in the General Options pane, select Import as: Edges table (not Matrix), then click Next and Finish;
  • in the Import report window, click on More options, uncheck Create missing nodes, and choose Edges merge strategy: Last;
  • select Append to existing workspace, and click OK.

A table of edges should appear in the Data Laboratory.

3. Choose a layout

  • Go back to the Overview tab. You should see the graph with a random square layout;
  • In the Layout window, select a force-based layout, and click Run. I use ForceAtlas2 [4];
  • You can tinker with the layout parameters, such as strength, Dissuade Hubs or Prevent Overlap.

The graph will reorganise so that connected nodes are closer, and you should see the emergence of clusters. Once the graph has converged, stop the simulation.

4. Set the node sizes

As mentioned above, I use PageRank [3] to set the node sizes.

  • First, the PageRank of nodes must be computed. In the Statistics window, locate Network Overview/PageRank and click Run. Keep default parameters and close the report;
  • In the Appearance window, select Nodes and Size in the toolbar. Then, select Ranking, PageRank. Select the range of sizes (I use 10-50), and click Apply.

Node labels can be enabled by clicking the black T icon in the bottom Overview toolbar. The labels can then be scaled with node size by selecting the A icon (Size mode) and choosing Node size.
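The Ranking step maps each PageRank value to a size inside the chosen range. As a rough sketch, assuming a plain linear interpolation between the smallest and largest score (Gephi also offers other interpolation curves), the mapping looks like this. The `scale_sizes` helper and the toy scores are hypothetical.

```python
def scale_sizes(scores, min_size=10.0, max_size=50.0):
    """Linearly map scores to node sizes in [min_size, max_size]."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    if span == 0:
        # All scores are equal: give every node the middle size.
        return {node: (min_size + max_size) / 2 for node in scores}
    return {node: min_size + (score - low) / span * (max_size - min_size)
            for node, score in scores.items()}


# Toy scores (hypothetical accounts and values).
sizes = scale_sizes({"small": 0.1, "medium": 0.3, "large": 0.5})
```

With the 10-50 range used here, the lowest-ranked node is drawn at size 10, the highest at size 50, and everything else falls proportionally in between.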

5. Set the node colors

The nodes can be colored automatically in the Appearance/Nodes/Color tab, by either a Partition of attributes (e.g. verified or location), or by a Ranking of attributes (e.g. Degree, In-Degree, Out-Degree, followers_count, etc.).

In order to identify clusters, we must first run the Modularity algorithm from the Statistics window. Use the Resolution parameter to tune the desired number of clusters. Then, set the node colors in the Appearance window by Ranking of Modularity.

6. Render

Go to the Preview window, select the desired options, and Export to png, pdf or svg.

7. (optional) Exclude the very far nodes

To improve the final render, it can be useful to exclude the nodes that lie very far from the rest of the graph. To do this, go to the Overview tab and click on Filters (right panel). Open Topology and double-click on Giant Component. Just below, in the Queries panel, activate this filter by pressing the Filter button. The Context panel at the top should now show the percentage of nodes that remain visible.

8. (optional) Display Twitter handles rather than names on the graph

Go to the Data Laboratory tab and click the Copy data to other column button at the bottom. Select screen_name, then choose Label as the column to copy to. When you return to the Overview tab, the labels will show the Twitter handles.

References

Credits

This project was more than inspired by this excellent video by Mehdi Moussaïd 📺.

Contributors

aegiz, eleurent, jilljenn
