Instagram-Network_scraping_and_analysis

A Python script that scrapes the "Followers" and "Following" lists of an Instagram account, along with the same lists for every account in that "Following" list.

The script can scrape the accounts in small batches. It builds an adjacency list of the connections that is directly compatible with the NetworkX library. The adjList.txt file represents nodes in the following form:

Node1 Node2 Node3 Node4
Node2 Node1 Node5

where Node1 follows Node2, Node3 and Node4. Similarly, Node2 follows Node1 and Node5. Refer to the NetworkX adjacency list representation for details.
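As a minimal sketch of how this format can be read, the snippet below parses adjacency-list lines into a plain dictionary and, optionally, into a directed NetworkX graph. The function names `parse_adjacency_lines` and `to_digraph` are hypothetical and not part of the repository. The sketch also assumes that account names contain no whitespace and that `#` starts a comment, as in NetworkX's own adjacency-list reader.

```python
def parse_adjacency_lines(lines):
    """Parse 'source target1 target2 ...' lines into {source: [targets]}."""
    adjacency = {}
    for raw in lines:
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        source, *targets = line.split()
        adjacency.setdefault(source, []).extend(targets)
    return adjacency


def to_digraph(adjacency):
    """Build a directed NetworkX graph (requires networkx to be installed)."""
    import networkx as nx  # imported lazily so the parser works without it

    graph = nx.DiGraph()
    for source, targets in adjacency.items():
        graph.add_node(source)
        graph.add_edges_from((source, target) for target in targets)
    return graph


example = ["Node1 Node2 Node3 Node4", "Node2 Node1 Node5"]
adjacency = parse_adjacency_lines(example)
print(adjacency["Node1"])  # the accounts that Node1 follows
```

With NetworkX installed, `nx.read_adjlist("adjList.txt", create_using=nx.DiGraph)` should load the file directly; the manual parser is only meant to make the line format explicit.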

Note: The script for scraping the Instagram web pages was written two years ago, and the CSS tags may have changed since then.

Requirements

  1. Python 3.x
  2. Mozilla Firefox browser
  3. Geckodriver
  4. Libraries
    1. Selenium WebDriver
    2. NetworkX
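A quick way to confirm these requirements before running the scripts is sketched below. It is not part of the repository. The helper name `check_requirements` is hypothetical, and the sketch assumes that Firefox and Geckodriver are discoverable on `PATH` under the executable names `firefox` and `geckodriver`, which may not hold on every platform.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a mapping of requirement name to whether it was found."""
    return {
        "python3": sys.version_info.major == 3,
        # find_spec returns None when a top-level package is not installed.
        "selenium": importlib.util.find_spec("selenium") is not None,
        "networkx": importlib.util.find_spec("networkx") is not None,
        # Assumed executable names; adjust them for your platform.
        "firefox": shutil.which("firefox") is not None,
        "geckodriver": shutil.which("geckodriver") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```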

Directions for usage

  1. Run the scrapeMyAccount.py file first to scrape the lists of followers and following for your account.
  2. Once followingLinks.txt is generated, run the scrapingFollowing.py file to scrape the "Following" and "Followers" lists of the accounts you follow.

Note

  1. Run the cells in the correct order. Run the "Scraping in batches" cell in scrapingFollowing.py only after the login succeeds.
  2. Scrape in small batches.
  3. Instagram will temporarily disable your account if you log in frequently. Check that the account is not disabled before scraping.
  4. If something goes wrong, disable headless mode in scrapingFollowing.py to troubleshoot.
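The batching advice above can be sketched roughly as follows. This is an illustration under assumptions, not the code in scrapingFollowing.py: `iter_batches`, `scrape_in_batches`, the `scrape_one` callback, the batch size and the pause length are all hypothetical, and the actual Selenium scraping is left to the callback.

```python
import time


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def scrape_in_batches(links, scrape_one, batch_size=5, pause_seconds=60.0):
    """Call `scrape_one(link)` for every link, pausing between batches.

    `scrape_one` is a hypothetical callback standing in for the real
    scraping of one account page.
    """
    results = {}
    batches = list(iter_batches(links, batch_size))
    for index, batch in enumerate(batches):
        for link in batch:
            results[link] = scrape_one(link)
        # Pause between batches, but not after the final one.
        if index < len(batches) - 1:
            time.sleep(pause_seconds)
    return results
```

Keeping the batch size and pause as parameters makes it easy to slow down if the account starts getting rate-limited or temporarily disabled.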

Example network graph

[Figure: graoh1_yifan_communities]

The directed graph above portrays my Instagram network of some 80,000 nodes in the Yifan Hu layout and was generated in Gephi. The nodes in this graph include the "Followers" and "Following" of all the accounts I follow (except accounts with more than 2000 followers). Communities present in the graph are marked by different colors.

[Figure: Graph2_Atlas]

The directed graph above is a subset of the previous graph, consisting only of the accounts I follow and the accounts that follow me. There are noticeable demarcations among my high school friends circle, middle school friends circle, college friends circle and meme pages. The graph is in the ForceAtlas2 layout and was generated in Gephi.

The Instagram-Network-Analysis.ipynb notebook contains an analysis of my network graph using a few off-the-shelf functions from the NetworkX library.
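The notebook's exact function calls are not reproduced here. As a dependency-free illustration of the kind of measures such an analysis typically starts with, the sketch below counts followers (in-degree) and finds mutual follows from an adjacency dictionary. The function names `in_degree_counts` and `mutual_pairs` are hypothetical. In NetworkX, `G.in_degree()` on a `DiGraph` gives the same follower counts.

```python
from collections import Counter


def edge_set(adjacency):
    """Flatten {source: [targets]} into a set of (source, target) edges."""
    return {(source, target)
            for source, targets in adjacency.items()
            for target in targets}


def in_degree_counts(adjacency):
    """Count, for each account, how many accounts follow it."""
    return Counter(target for _, target in edge_set(adjacency))


def mutual_pairs(adjacency):
    """Return the unordered pairs of accounts that follow each other."""
    edges = edge_set(adjacency)
    return {tuple(sorted((source, target)))
            for source, target in edges
            if source != target and (target, source) in edges}


adjacency = {
    "Node1": ["Node2", "Node3", "Node4"],
    "Node2": ["Node1", "Node5"],
}
print(in_degree_counts(adjacency).most_common(3))
print(mutual_pairs(adjacency))
```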

instagram-network_scraping_and_analysis's People

Contributors

arjun-siva
