
instamancer's Introduction

Instamancer


Scrape Instagram's API with Puppeteer.


Notice: Instagram's Web UI and API now require users to be logged in to access hashtag and account endpoints through a browser. Because Instamancer is designed to access publicly available data, it currently does not work as intended. Given that this change is unlikely to be reversed, Instamancer will remain unsupported and unmaintained indefinitely. Please use this pinned issue to discuss.


Instamancer is a new type of scraping tool that leverages Puppeteer's ability to intercept requests made by a webpage to an API.

Read more about how Instamancer works here.

Features

  • Scrape hashtags, users' posts, and individual posts
  • Download images, albums, and videos
  • Output JSON, CSV
  • Batch scraping
  • Search hashtags, users, and locations
  • API response validation
  • Upload files to S3 and depot
  • Plugins

Data

Metadata that Instamancer is able to gather from posts:

  • Text
  • Timestamps
  • Tagged users
  • Accessibility captions
  • Like counts
  • Comment counts
  • Images (Thumbnails, Dimensions, URLs)
  • Videos (URL, View count, Duration)
  • Comments (Timestamp, Text, Like count, User)
  • User (Username, Full name, Profile picture, Profile privacy)
  • Location (Name, Street, Zip code, City, Region, Country)
  • Sponsored status
  • Gating information
  • Fact checking information

Install

Linux

Enable user namespace cloning:

sysctl -w kernel.unprivileged_userns_clone=1

Or run without a sandbox:

# WARNING: unsafe
export NO_SANDBOX=true

See Puppeteer troubleshooting

Without downloading Chromium

If you wish to install Instamancer without downloading Chromium, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installing:

export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

From NPM

npm install -g instamancer

If you're installing globally as root, use the following command so that the Puppeteer dependency installs correctly:

sudo npm install -g instamancer --unsafe-perm=true

From NPX

npx instamancer

From this repository

git clone https://github.com/ScriptSmith/instamancer.git
cd instamancer
npm install
npm run build
npm install -g

Usage

Command Line

$ instamancer
Usage: instamancer <command> [options]

Commands:
  instamancer hashtag [id]       Scrape a hashtag
  instamancer user [id]          Scrape a users posts
  instamancer post [ids]         Scrape a comma-separated list of posts
  instamancer search [query]     Perform a search of users, tags and places
  instamancer batch [batchfile]  Read newline-separated arguments from a file

Configuration
  --count, -c    Number of posts to download (0 for all)   [number] [default: 0]
  --full, -f     Retrieve full post data              [boolean] [default: false]
  --sleep, -s    Seconds to sleep between interactions     [number] [default: 2]
  --graft, -g    Enable grafting                       [boolean] [default: true]
  --browser, -b  Browser path. Defaults to the puppeteer version        [string]
  --sameBrowser  Use a single browser when grafting   [boolean] [default: false]

Download
  --download, -d      Save images from posts          [boolean] [default: false]
  --downdir           Download path       [default: "downloads/[endpoint]/[id]"]
  --video, -v         Download videos (requires full) [boolean] [default: false]
  --sync              Force download between requests [boolean] [default: false]
  --threads, -k       Parallel download / depot threads    [number] [default: 4]
  --waitDownload, -w  Download media after scraping   [boolean] [default: false]

Upload
  --bucket  Upload files to an AWS S3 bucket                            [string]
  --depot   Upload files to a URL with a PUT request (depot)            [string]

Output
  --file, -o       Output filename. '-' for stdout    [string] [default: "[id]"]
  --type, -t       Filetype   [choices: "csv", "json", "both"] [default: "json"]
  --mediaPath, -m  Add filepaths to _mediaPath        [boolean] [default: false]

Display
  --visible    Show browser on the screen             [boolean] [default: false]
  --quiet, -q  Disable progress output                [boolean] [default: false]

Logging
  --logging, -l    [choices: "none", "error", "info", "debug"] [default: "none"]
  --logfile      Log file name             [string] [default: "instamancer.log"]

Validation
  --strict  Throw an error on response type mismatch  [boolean] [default: false]

Plugins
  --plugin, -p  Use a plugin from the plugins directory    [array] [default: []]

Options:
  --help     Show help                                                 [boolean]
  --version  Show version number                                       [boolean]

Examples:
  instamancer hashtag instagood -fvd        Download all the available posts,
                                            and their media from #instagood
  instamancer user arianagrande --type=csv  Download Ariana Grande's posts to a
  --logging=info --visible                  CSV file with a non-headless
                                            browser, and log all events

Source code available at https://github.com/ScriptSmith/instamancer

Module

ES2018 TypeScript example:

import {createApi, IOptions} from "instamancer";

const options: IOptions = {
    total: 10
};
const hashtag = createApi("hashtag", "beach", options);

(async () => {
    for await (const post of hashtag.generator()) {
        console.log(post);
    }
})();

Generator functions

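import {createApi} from "instamancer";

createApi("hashtag", id, options);   // id: a hashtag name, without the #
createApi("user", id, options);      // id: a username
createApi("post", ids, options);     // ids: the posts to scrape
createApi("search", query, options); // query: text to search users, tags and places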

Options

const options: Instamancer.IOptions = {
    // Total posts to download. 0 for unlimited
    total: number,

    // Run Chrome in headless mode
    headless: boolean,

    // Logging events
    logger: winston.Logger,

    // Run without output to stdout
    silent: boolean,

    // Time to sleep between interactions with the page
    sleepTime: number,

    // Throw an error if type validation fails
    strict: boolean,

    // Time to sleep when rate-limited
    hibernationTime: number,

    // Enable the grafting process
    enableGrafting: boolean,

    // Extract the full amount of information from the API
    fullAPI: boolean,

    // Use a proxy in Chrome to connect to Instagram
    proxyURL: string,

    // Location of the chromium / chrome binary executable
    executablePath: string,

    // Custom io-ts validator
    validator: Type<unknown>,

    // Custom plugins
    plugins: IPlugin[]
}

Comparison

A comparison of Instagram scraping tools. Please suggest more tools and criteria through a pull request.

To see a speed comparison, visit this page

Tool Hashtags Users Tagged posts Locations Posts Stories Login not required Private feeds Batch mode Plugins Command-line Library/Module Download media Download metadata Scraping method Daily builds Main language Speed License Last commit Open Issues Closed Issues Build status Test coverage Code quality
Instamancer ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API request interception ✔️ Typescript
Instaphyte ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API simulation ✔️ Python
Instaloader ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API simulation Python
Instalooter ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API simulation Python
Instagram crawler ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web DOM reading Python
Instagram Scraper ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API simulation Python
Instagram Private API ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ App and Web API simulation Python
Instagram PHP Scraper ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Web API simulation PHP

instamancer's People

Contributors

dependabot[bot], goooseman, karimkairo, scriptsmith


instamancer's Issues

[BUG] Locations are not supported anymore

Describe the bug
Instagram has made /explore/locations available only to logged-in users.

https://www.***.com/explore/locations/212988663 is the route Instamancer uses. If you open this route in your browser, it asks you to log in.

To Reproduce
instamancer location 212988663

OR

npm test -- -t "Library Classes"

Expected behavior
Location photos should be found

Output

 ● Library Classes › location

    expect(received).toBe(expected) // Object.is equality

    Expected: 10
    Received: 0

[BUG] Cannot Install Using Repository

Describe the bug
Instamancer installation fails because it is unable to find the cli.js file. When I looked in the folder, I found a cli.ts file instead.

Setup

  • OS: [Ubuntu 19.04]
  • Instamancer version [master]
  • Node version [v10.15.3]
  • NPM version (if applicable) [ 6.4.1]


I'm not getting the latest posts

I want to get all the comments for the latest x posts for a user.

My sample code...

const { createApi } = require('instamancer');

const go = async () => {
  const posts = createApi('user', 'loveabrahamhicks', {
    total: 8
  });

  for await (const { node } of posts.generator()) {
    console.log(`\nPost: ${node.edge_media_to_caption.edges[0].node.text}`);
    console.log(`Thumbnail: ${node.thumbnail_src}`);
    console.log(`Posted on: ${new Date(node.taken_at_timestamp * 1000)}`);
    console.log(`Total comments: ${node.edge_media_to_comment.count}`);
    for (const edge of node.edge_media_to_comment.edges) {
      const comment = edge.node;
      console.log(
        `Comment "${comment.text}" made by ${comment.owner.username}`
      );
    }
    console.log('\n');
  }
};

go();

This generates:

Post: 💖
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/115992768_285551976032020_8589796504624099624_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=2L61bDz9emkAX_vQ672&oh=a8ac720f59ed942b4dbf31e33f0ba96c&oe=5F5BBE00
Posted on: Wed Jul 29 2020 03:55:43 GMT+1200 (New Zealand Standard Time)
Total comments: 26
Comment "Gratitude and appreciation are where it’s at 😍" made by luxestyle_ls
Comment "Did u know that Abraham Hicks is a woman?" made by paeson19
Comment "✨💖" made by marisol_eisden
Comment "🕯🕯🕯🕯🕯" made by joannakatiee

Post: 💓
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/115932254_388410075472267_4214569155118894660_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=107&_nc_ohc=Gf9CeKdkSlIAX9E93Bf&oh=b0593c37835c2c153a4097298631ec1b&oe=5F5AC190
Posted on: Tue Jul 28 2020 03:49:02 GMT+1200 (New Zealand Standard Time)
Total comments: 36
Comment "😍😍" made by arameshedell2525
Comment "Always keep the focus on what your heart desires and work towards them. The beautiful part is knowing you silenced the limiting beliefs and overcame any obstacles. Your unstoppable 💫💖" made by iamshaleensami
Comment "❤️❤️❤️" made by nonnagabbana
Comment "Yesssssss.⭐️" made by candyl_light

Post: 💝
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/115750073_926134387882322_8316503760834510050_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=102&_nc_ohc=PUhjczw7y14AX9PL4UZ&oh=6d0ea3eb89d3cdef94287ac08150c437&oe=5F5A4D66
Posted on: Mon Jul 27 2020 03:53:08 GMT+1200 (New Zealand Standard Time)
Total comments: 37
Comment "✨💝✨" made by salemsaintseven
Comment "💝" made by edithwright875
Comment "I don't understand" made by hardknocktexan
Comment "Practice makes perfect! Practicing self appreciation everyday! 💓🙌" made by the_wellthylife

Post: 💖
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/109107103_3029376757130872_1986054905794966674_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=78CzTkyURlMAX_pwcSX&oh=ba0ec99ce1786f968dbbe7de5749569d&oe=5F59FA3D
Posted on: Sun Jul 26 2020 03:51:49 GMT+1200 (New Zealand Standard Time)
Total comments: 42
Comment "😍😍😍" made by ms_chellemybelle
Comment "BS, such thing as destiny doesn't exist. You create your own life path." made by wisemencave
Comment "WORD!!!!!" made by career_recruitment_advisor
Comment "♥️♥️♥️♥️♥️♥️♥️" made by jennyobrienxx

Post: 💕
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/116153782_189953942507814_3711265029020900151_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=106&_nc_ohc=UW8y2JnM8pIAX_GhJGA&oh=eb5acb825d52536271fb7de0ec1d427c&oe=5F58CBC2
Posted on: Sat Jul 25 2020 03:08:59 GMT+1200 (New Zealand Standard Time)
Total comments: 68
Comment "❤️❤️❤️❤️❤️" made by marylcelizpaz
Comment "💯❤️" made by abrahamliciouss
Comment "😍😍😍" made by ms_chellemybelle
Comment "Soooo.....yep. gn." made by luminasun

Post: 💗
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/111544129_160981182157638_9167337507903501009_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=106&_nc_ohc=0aIYgorzv8UAX9WzOve&oh=3d25ee0a5531c70910cdaf598451f1e2&oe=5F592308
Posted on: Fri Jul 24 2020 03:15:25 GMT+1200 (New Zealand Standard Time)
Total comments: 85
Comment "🙏" made by abrahamliciouss
Comment "🙏🏼💫" made by theurbanindigo
Comment "Love that, fav post❤️🙏" made by elevatelifecoach
Comment "💜💜💜" made by limitless.manifesting

Post: ❤️
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/109797266_723849751726473_976177502836491438_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=109&_nc_ohc=4zjAcLKrqngAX_BleAU&oh=da0c3bbf627a998c9e93a25fc73ede3e&oe=5F5B9057
Posted on: Thu Jul 23 2020 03:12:32 GMT+1200 (New Zealand Standard Time)
Total comments: 60
Comment "❤️" made by friedlandlorie
Comment "@thenancymitchell inbox me for reading" made by ogundele__balogun
Comment "Hello I'm Dr balogun do you need help in improving your business, inbox me for help with spell to make your business move faster 
Spell to bring back your ex
spell to make a lover love only you 
Spell to help win court case 
Spell for protection and guidance 
Spell to remove curse and bad luck 
Spell for good luck
inbox me to get yours" made by ogundele__balogun

Post: 💗
Thumbnail: https://scontent-syd2-1.cdninstagram.com/v/t51.2885-15/sh0.08/e35/s640x640/115905932_277940600166450_3936075643612791626_n.jpg?_nc_ht=scontent-syd2-1.cdninstagram.com&_nc_cat=106&_nc_ohc=BGMt66PS60sAX_ZbsHb&oh=7adb15d2b394c215152d11fa887eb019&oe=5F5855E1
Posted on: Wed Jul 22 2020 03:42:33 GMT+1200 (New Zealand Standard Time)
Total comments: 50
Comment "🙏🏾🧡" made by ninasnaturalclinic
Comment "@gabikovalenko1 We are doing a 21 day gratitude challenge on my page if you want to join us ❤️ when we focus on what we want, it expands. When we are in a state of gratitude we start to see the beauty in everything, and abundance follows ." made by lilly_blue_lmao
Comment "@lilly_blue_lmao beautiful, thank you! 💖" made by gabikovalenko

Issue 1
The most recent post returned here is from July 29th, but the account has published 12 posts since then.

Issue 2
The total comment count is correct, but the array of comments returned is much shorter in each case.
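Both observations can be checked directly on the yielded data. The following is a minimal TypeScript sketch. It assumes each yielded item has the { node } shape used in the sample code above, and that node.id is a unique post identifier; the id field is an assumption, since it is not shown in this thread. The sketch orders posts newest first by taken_at_timestamp, then lists the posts whose embedded comment edges are fewer than the reported count.

```typescript
// Assumed minimal shape of an item yielded by posts.generator(), based only on the
// fields used in the sample code above; node.id is an assumption.
interface PostItem {
  node: {
    id: string;
    taken_at_timestamp: number; // Unix time in seconds
    edge_media_to_comment: { count: number; edges: unknown[] };
  };
}

// Return a copy of the posts ordered from newest to oldest (input is not mutated).
function newestFirst(posts: readonly PostItem[]): PostItem[] {
  return [...posts].sort((a, b) => b.node.taken_at_timestamp - a.node.taken_at_timestamp);
}

// List posts whose embedded comment array is shorter than the reported total,
// i.e. posts where only a subset of the comments came back with the post.
function truncatedComments(
  posts: readonly PostItem[],
): { id: string; reported: number; included: number }[] {
  return posts
    .filter((p) => p.node.edge_media_to_comment.edges.length < p.node.edge_media_to_comment.count)
    .map((p) => ({
      id: p.node.id,
      reported: p.node.edge_media_to_comment.count,
      included: p.node.edge_media_to_comment.edges.length,
    }));
}
```

In the sample code above, you could push each item into an array inside the for await loop and pass that array to both functions once the loop finishes.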

[BUG] - Symbol.asyncIterator is not defined.

Describe the bug
I'm getting the following error when trying to run instamancer:

TypeError: Symbol.asyncIterator is not defined.
    at __asyncGenerator (/instamancer/src/api/instagram.js:4:38)
    at Hashtag.generator (/instamancer/src/api/instagram.js:145:16)
    at spawn (/instamancer/src/cli.js:238:41)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

To Reproduce
I ran instamancer hashtag instagood -d after running through the installation steps on the README.

Setup (please complete the following information):
Node version - 6.9.0
Typescript version - 3.3.4000
Instamancer version - 1.3.1
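Node.js 6.9.0 predates native async iteration. Symbol.asyncIterator and for await...of are only available without flags from Node.js 10 onward, so code that relies on them fails on older runtimes. The minimal guard below could run at startup to fail with a clearer message. It assumes Node.js 10 as the lower bound; Instamancer's actual minimum supported version may be higher.

```typescript
// Assumed lower bound: Node.js 10 is the first release with unflagged async iteration.
// Instamancer's actual minimum supported version may be higher.
const MIN_NODE_MAJOR = 10;

// Feature detection: true when the runtime defines Symbol.asyncIterator.
function hasAsyncIterator(): boolean {
  return typeof (Symbol as { asyncIterator?: symbol }).asyncIterator === "symbol";
}

// Version check on a "major.minor.patch" string such as process.versions.node.
function meetsMinimumNode(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const major = Number.parseInt(version.split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}

// Exit early with a readable message instead of a TypeError deep inside a generator.
if (!hasAsyncIterator() || !meetsMinimumNode(process.versions.node)) {
  console.error(
    `Node.js ${process.versions.node} is too old; please upgrade to v${MIN_NODE_MAJOR} or later.`,
  );
  process.exit(1);
}
```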

[BUG]

['node']['owner']['profile_pic_url'] no longer exists. Is this a bug, or did you remove it? Why?

Instagram login pops up and scraping freezes [BUG - possibly...?]

Instamancer was working like a dream for me until last week. Now, after it has scraped roughly 21 images, Chromium opens the 'please login to instagram' window and scraping stops.

Left alone, it just sleeps and re-sleeps, and I have to exit the process manually. Logging in through the modal in Chromium does nothing either.

Have there been any changes, or am I doing something wrong?

This is the command I have been using:
instamancer hashtag castlesemple --full --logging=info --visible

Mac Mojave, npm version 6.12.1

Enhanced logging on failed requests for new insta users

I'm running into a weird issue where newly created users don't seem to return any results. I'm posting here to ask whether anyone has seen something similar. I set logging to debug, but I'm not getting much detail:

{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Scraping \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 0 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Branching \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 0 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Scraping \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 2 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Scraping \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 1 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Scraping \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 0 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Branching \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 0 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
{"message":"\u001b[43m\u001b[30m tvswaynetemple \u001b[39m\u001b[49m\u001b[40m Total: Unlimited \u001b[49m\u001b[47m\u001b[30m State: Scraping \u001b[39m\u001b[49m\u001b[47m\u001b[30m Sleeping: 2 \u001b[39m\u001b[49m\u001b[47m\u001b[30m Scraped: 0 \u001b[39m\u001b[49m","level":"debug"}
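Each debug message embeds the ANSI color codes of the terminal progress line (the \u001b[...m sequences), which makes the log hard to read. The minimal sketch below parses each JSON line and strips those codes. It assumes one JSON object per line, as above, and the default log file name instamancer.log from the CLI options.

```typescript
import { readFileSync } from "fs";

// SGR escape sequences such as "\u001b[43m" (background color) or "\u001b[39m" (reset).
const ANSI_SGR = /\u001b\[[0-9;]*m/g;

interface LogEntry {
  message: string;
  level: string;
}

// Turn one raw JSON log line into "[level] plain text" with colors and extra spaces removed.
function readableLogLine(raw: string): string {
  const entry = JSON.parse(raw) as LogEntry;
  const plain = entry.message.replace(ANSI_SGR, "").replace(/\s+/g, " ").trim();
  return `[${entry.level}] ${plain}`;
}

// Print every non-empty line of a log file in readable form.
function printReadableLog(path: string): void {
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (line.trim() !== "") console.log(readableLogLine(line));
  }
}
```

Applied to the first line above, readableLogLine yields [debug] tvswaynetemple Total: Unlimited State: Scraping Sleeping: 0 Scraped: 0.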

[BUG]

Describe the bug
The instamancer command collects different numbers of posts depending on its parameters

To Reproduce
I noticed a discrepancy in how many posts I can collect with the command instamancer user science.sam --type=csv versus instamancer user science.sam -fvd. I ran both commands on July 21st.

The first command collects 501 posts and misses around 10 of this user's recent posts, while the second command collects all 512 of the user's posts. I really like the CSV format, and I thought the first command should also collect all of this user's posts, right?

Expected behavior
I expected both commands to collect the same number of posts.


Setup (please complete the following information):

  • MacOS
  • Instamancer version [v1.1.4]
  • Node version [e.g. v11.6.0]
  • NPM version (if applicable) [eg. 6.5.0]


Write to data file on the fly?

As I understand it, Instamancer first runs an entire scrape (downloading media as it goes, depending on settings). It writes to JSON, CSV, or similar only once the full scrape is done.

So, if running a scrape to get unlimited posts for a hashtag, when will the data file be written? And where does the json/csv data go if the job breaks? Is it only stored in RAM until the end of the run?
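This thread does not answer how Instamancer buffers results before writing them. If you use the module API instead of the CLI, you can avoid depending on that behavior by persisting each post as soon as the generator yields it. The sketch below writes newline-delimited JSON (one object per line), so a partially written file stays usable up to its last complete line. The file name posts.jsonl is illustrative.

```typescript
import { appendFileSync } from "fs";

// Destination for one serialized line; kept abstract so it can be a file or a test buffer.
type LineSink = (line: string) => void;

// Append-to-file sink: each call writes synchronously, so lines written before a
// later crash are not lost with the process.
function fileSink(path: string): LineSink {
  return (line) => appendFileSync(path, line);
}

// Drain any async iterable (for example, an Instamancer generator()) and write each
// item as one JSON line the moment it arrives. Returns the number of items written.
async function writeJsonLines<T>(items: AsyncIterable<T>, sink: LineSink): Promise<number> {
  let written = 0;
  for await (const item of items) {
    sink(JSON.stringify(item) + "\n");
    written++;
  }
  return written;
}
```

A possible call is writeJsonLines(createApi("hashtag", "beach", options).generator(), fileSink("posts.jsonl")). This assumes the generator can be passed straight through, which the README example's for await loop suggests.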

[BUG] Instagram requires login

Instagram's API is now much more aggressive in requiring users to be logged in, which effectively renders Instamancer useless.

Instamancer is designed to be used for gathering publicly available data, and will not add the ability to log in to retrieve data.

[FEATURE]

Is there a video version of this guide? If not, would you consider making a simple video? Thank you, sir.

Getting a user's posts does not work correctly

Hi, I love this library, but when I tried this:

import * as Instamancer from "instamancer";

const options: Instamancer.IOptions = {
    total: 0
};

const getPost = async () => {
    const userData = Instamancer.user("global_66", options);
    for await (const user of userData) {
        console.log(user);
    }
};

getPost();

it only returns 15 scraped posts, but the account has 27 posts on Instagram.

node v10.15.3

[FEATURE] Serverless Framework Support

Is your feature request related to a problem? Please describe.
Many of the API endpoints for instamancer could in theory be ported to a Serverless function that relies on either AWS Lambda (with puppeteer layer) or Google Cloud Functions (that automatically has access to puppeteer by default). This would increase the scalability of the solution and also allow lower level / starter users to take advantage of their free Lambda / function executions on a monthly basis.

Describe the solution you'd like
Add Serverless Framework as a dependency and create a serverless config file to handle configuration when deploying.

Describe alternatives you've considered
Serverless Framework would help abstract away the differences between platforms for anyone who wants to run this serverlessly. I have not considered alternatives.

Additional context
The biggest issue will be data persistence (where to deposit photos / which db to insert records into).

[BUG] Cannot use tagged

Instagram has removed access to tagged posts when not logged in.

This functionality will be removed from instamancer in the upcoming major release.

[FEATURE] Parallel Batch Processing?

Love this project. I can see this method being used more often. One thing I'm starting to think about is the ability to parallel process a batch on the command line. Is this something that is possible by opening multiple tabs in puppeteer?

At the moment, I'm running the script in a loop to process multiple scrapes. Launching multiple instamancer processes at the same time might work for now.

Any plans for parallel batch processing? Cheers.
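This thread does not say whether Instamancer will support this natively. From the module API, a caller could cap how many scrapes run at once with a small worker pool, as in the sketch below. Each job would be a hypothetical scrapeOne(id) function, for example one that drains createApi("user", id, options).generator(). The concurrency limit is the caller's choice. Running more scrapes in parallel also means more requests to Instagram, which makes rate limiting more likely.

```typescript
// Run `worker` over every input with at most `limit` calls in flight at once.
// Results are returned in the same order as the inputs.
async function mapWithConcurrency<T, R>(
  inputs: readonly T[],
  limit: number,
  worker: (input: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(inputs.length);
  let next = 0;

  // Each lane repeatedly claims the next unclaimed index. JavaScript runs this code on a
  // single thread, so `next++` cannot be interleaved between two lanes.
  async function lane(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await worker(inputs[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, inputs.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

For instance, mapWithConcurrency(["instagood", "beach", "puppies"], 2, scrapeOne) would keep at most two of the hypothetical scrapes running at any moment.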

[BUG] After scraping around 800 posts from a hashtag, Instamancer reloads the browser

Describe the bug
When scraping hashtags, Instamancer has recently started failing after around 800 posts (this is fairly consistent). At around 800 posts, Instamancer restarts the browser and tries again from scratch.

It seems to be related to this line of code:

if (this.foundGraft && newApiRequest) {

Specifically, it seems to be the this.start() call, which causes the browser to reload.

Looking at the network logs in Chrome, I can see that one of the GraphQL requests returns an error around the 800-post mark. Every request after that one seems to work fine.

To Reproduce
Search for any hashtag, and make sure the limit is higher than 800.

Setup (please complete the following information):

  • OS: [e.g. MacOS Catalina]
  • Instamancer version [e.g. v3.0.1]

I will add more info here as I debug the issue further.

[ERROR] RESERVED KEYWORD ERROR

/usr/local/lib/node_modules/instamancer/src/cli.js:268
for await (const post of obj.generator()) {
^^^^^

SyntaxError: Unexpected reserved word
at createScript (vm.js:80:10)
at Object.runInThisContext (vm.js:139:10)
at Module._compile (module.js:616:28)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:188:16)
at bootstrap_node.js:609:3

[FEATURE] Tags/users search

Describe the solution you'd like
For my use case a feature to search for users or hashtags is needed to build an autocomplete field.

A function should receive a search query and return an array of the users and hashtags found, with the avatar and description for user profiles and the number of posts for hashtags.

The data would be taken from the search input on the Instagram page.

Proposal

A public method with the following signature:

search: async (query: string, options: Instamancer.IOptions): Promise<Response>

I can implement this feature if you will accept it. However, it may conflict with the project's approach, because we would need to fire DOM events first to capture the response. Do you have an idea of how we could implement this?

[BUG] Basic API does not work

Describe the bug
Instamancer instantly goes to sleep when trying to use the API in non-full mode. It finishes after a minute with no result.

To Reproduce
instamancer hashtag puppies

Instamancer 3.3.1
Node 12.18.3
npm 6.14.4

I've tried this in multiple environments. The result is the same if I use the RegularAPI from JavaScript.

Unable to use instamancer in Angular CLI project

I am trying to use instamancer as part of an Angular project. When I try to compile angular and run the website with the instamancer node module, it gives me the below error:

ERROR in ./node_modules/instamancer/src/api/api.ts Module build failed (from ./node_modules/@ngtools/webpack/src/index.js): Error: /Volumes/SD_CACHE/PHPStorm/dmsolutions-angular/node_modules/instamancer/src/api/api.ts is missing from the TypeScript compilation. Please make sure it is in your tsconfig via the 'files' or 'include' property. The missing file seems to be part of a third party library. TS files in published libraries are often a sign of a badly packaged library. Please open an issue in the library repository to alert its author and ask them to package the library using the Angular Package Format (https://goo.gl/jB3GVv). at AngularCompilerPlugin.getCompiledFile (/Volumes/SD_CACHE/PHPStorm/dmsolutions-angular/node_modules/@ngtools/webpack/src/angular_compiler_plugin.js:912:23) at plugin.done.then (/Volumes/SD_CACHE/PHPStorm/dmsolutions-angular/node_modules/@ngtools/webpack/src/loader.js:41:31) at process._tickCallback (internal/process/next_tick.js:68:7)

I tried editing my tsconfig.json like the error recommended, but it remains unresolved as it asks me to include more and more files from Instamancer, which results in more errors reported.

My machine information:

  • OS: MacOS 10.11 El Capitan
  • Instamancer version 1.4.2
  • Node version 10.16.2
  • NPM version 6.10.3
  • Angular CLI: 8.2.2
  • Angular: 8.2.2

To reproduce this error, I tried replicating the ES2018 Typescript example in an angular service, but it is unable to compile with "ng build".

Is Instamancer incompatible with Angular?

Scraped: 0 in production server

Hi there, the script worked perfectly well on the production server until yesterday. It still works on my development machine, but in production it currently scrapes nothing. Is it possible that Instagram has blocked the server? Is there a recommended production configuration to avoid being blocked by Instagram? Thank you.

[FEATURE] Top/recent hashtag flag

Is your feature request related to a problem? Please describe.
No, this tool is amazing. Thank you.

Describe the solution you'd like
Have a flag that allows scraping hashtags from either top posts, or most recent.

[BUG] Scraping is not working anymore because Instagram requres authorization

Describe the bug
Scraping is not working anymore. The issue is caused by Instagram itself: you have to log in to your account in order to use it. To make sure I'm right, I turned off headless mode and started instamancer. As expected, it shows the login page, and instamancer is not able to do its work.

To Reproduce

  1. Scrape something

Setup (please complete the following information):

  • OS: POP!_OS 20.04
  • Instamancer version [e.g. v1.1.4]
  • Node version: v12.18.1
  • NPM version: 6.14.5

Additional context
I tried to fix this by writing a function that runs Puppeteer and logs me in, but the browser wasn't saving my data. According to the Puppeteer docs, you have to set the userDataDir property to the path of the browser's user data directory. What I'm struggling with is how to set this property in instamancer.

[BUG] Redirecting to Instagram login page

I'm trying to run Instamancer's CLI on my VPS, but it keeps looping, sleeping for 60s at a time, and doesn't scrape anything.

instamancer user therock --count=3 --l="info"

And this was the log result:

{"message":"Starting API at 1580676295309","level":"info"}
{"level":"error","message":"ErrorNavigation failed because browser has disconnected!","stack":"Error: Navigation failed because browser has disconnected!\n    at CDPSession.<anonymous> (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/LifecycleWatcher.js:46:107)\n    at CDPSession.emit (events.js:321:20)\n    at CDPSession._onClosed (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/Connection.js:215:10)\n    at Connection._onClose (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/Connection.js:138:15)\n    at WebSocket.<anonymous> (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/WebSocketTransport.js:48:22)\n    at WebSocket.onClose (/usr/lib/node_modules/instamancer/node_modules/ws/lib/event-target.js:124:16)\n    at WebSocket.emit (events.js:321:20)\n    at WebSocket.emitClose (/usr/lib/node_modules/instamancer/node_modules/ws/lib/websocket.js:191:10)\n    at Socket.socketOnClose (/usr/lib/node_modules/instamancer/node_modules/ws/lib/websocket.js:850:15)\n    at Socket.emit (events.js:321:20)\n  -- ASYNC --\n    at Frame.<anonymous> (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/helper.js:111:15)\n    at Page.goto (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/Page.js:670:49)\n    at Page.<anonymous> (/usr/lib/node_modules/instamancer/node_modules/puppeteer/lib/helper.js:112:23)\n    at User.constructPage (/usr/lib/node_modules/instamancer/src/api/instagram.js:587:29)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)\n    at async User.start (/usr/lib/node_modules/instamancer/src/api/instagram.js:222:9)\n    at async spawn (/usr/lib/node_modules/instamancer/src/cli.js:343:5)\n    at async Object.handler (/usr/lib/node_modules/instamancer/src/cli.js:72:9)"}
{"message":"https://instagram.com/therock","level":"error"}

I've tried to use the module but it looks like it's redirecting me to Instagram's login page and I've also tried to get on a username's page with Puppeteer but it was also redirecting.

Is this problem IP related maybe? Do you have any experience with these events on Instamancer?

  • OS: Linux - Ubuntu 18.04.4 LTS
  • Instamancer version 3.1.0
  • Node version v13.7.0
  • NPM version 6.13..6

Omitting fullAPI skips first 12 posts

Currently having an issue when running the user command. The resulting data that is generated skips the first 12 posts on the user's timeline.

For example, the posts that are retrieved from the following commands are different...

# skips first 12 results
npx instamancer user nintendo  --count 24 --type csv


# successfully gets first 12 results
npx instamancer user nintendo  --count 24 --type csv --full
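To pin down exactly which posts one run omits, you can compare the two outputs by post identifier. The minimal sketch below assumes the posts from both runs have been loaded into arrays, and that the caller supplies a function extracting a unique id from each post. Which field holds that id in the CSV and JSON outputs is not shown here.

```typescript
// Compare two runs by post id and report the ids that appear in only one of them.
// `idOf` extracts a unique identifier; which field to use is left to the caller.
function diffRuns<T>(
  runA: readonly T[],
  runB: readonly T[],
  idOf: (post: T) => string,
): { onlyInA: string[]; onlyInB: string[] } {
  const idsA = new Set(runA.map((post) => idOf(post)));
  const idsB = new Set(runB.map((post) => idOf(post)));
  return {
    onlyInA: Array.from(idsA).filter((id) => !idsB.has(id)),
    onlyInB: Array.from(idsB).filter((id) => !idsA.has(id)),
  };
}
```

With the --full run as runA and the plain run as runB, onlyInA would list the posts the plain run skipped.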

One file for all scrapes

I'm using a batch file to scrape multiple accounts. Is it possible to get one JSON file at the end of all the scrapes instead of one file per username?
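This thread does not say whether the CLI can do this directly. As a post-processing step, the sketch below merges the per-account files into one. It assumes that each file produced with --type=json contains a single JSON array of posts; that layout is an assumption about the output. The paths are illustrative.

```typescript
import { readFileSync, writeFileSync } from "fs";

// Parse several JSON documents, each expected to be an array, and concatenate them.
function mergeJsonArrayTexts(texts: readonly string[]): unknown[] {
  const merged: unknown[] = [];
  texts.forEach((text, i) => {
    const data: unknown = JSON.parse(text);
    if (!Array.isArray(data)) {
      throw new Error(`input ${i} is not a JSON array`);
    }
    // Push item by item rather than spreading, which avoids argument-count limits on huge arrays.
    for (const item of data) merged.push(item);
  });
  return merged;
}

// Read each input file, merge their arrays, and write the result to outputPath.
// Returns the number of posts written.
function mergeJsonFiles(inputPaths: readonly string[], outputPath: string): number {
  const merged = mergeJsonArrayTexts(inputPaths.map((p) => readFileSync(p, "utf8")));
  writeFileSync(outputPath, JSON.stringify(merged, null, 2));
  return merged.length;
}
```

For example, mergeJsonFiles(["user1.json", "user2.json"], "all.json") would combine two per-account outputs into all.json.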

Alert when a # is used in a post

Is it possible to get an alert when a post has a #something in it?

Or is that already possible? From the README, it is very unclear to me whether it is.

Regards!
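One way to do this with the module API is to check the caption of each yielded post and raise an alert when a given tag appears. The minimal sketch below reads the caption from the node.edge_media_to_caption path used in the sample code earlier on this page; that shape is assumed from that sample. How the alert is delivered is up to the caller.

```typescript
// Assumed minimal shape: the caption path used in the sample code earlier on this page.
interface CaptionNode {
  edge_media_to_caption: { edges: { node: { text: string } }[] };
}

// Extract hashtags from the caption, lower-cased and without the leading '#'.
// Letters and digits from any script, plus '_', count as part of a tag.
function captionHashtags(node: CaptionNode): string[] {
  const text = node.edge_media_to_caption.edges[0]?.node.text ?? "";
  const tags: string[] = text.match(/#[\p{L}\p{N}_]+/gu) ?? [];
  return tags.map((tag) => tag.slice(1).toLowerCase());
}

// True if the caption contains `tag` (given with or without '#', case-insensitive).
function hasHashtag(node: CaptionNode, tag: string): boolean {
  return captionHashtags(node).includes(tag.replace(/^#/, "").toLowerCase());
}
```

Inside a for await loop over generator(), a check such as if (hasHashtag(node, "something")) could trigger whatever notification the caller prefers.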

[FEATURE] Need a step-by-step example

Hi, I am a newbie Python web scraper. I really want to use instamancer to scrape data from Instagram, but I find it hard to get started.

Is there a step-by-step example of how to use the package from scratch? For instance, a code snippet that scrapes posts from the tag "skateboard" in 2018.

A step-by-step example like this will certainly help newbies like me and make instamancer more popular.

Thank you very much for your help.
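One piece of such a script is a date filter. The minimal sketch below keeps only posts taken in a given calendar year, measured in UTC. It assumes that yielded items carry node.taken_at_timestamp in Unix seconds, as in the sample code earlier on this page. The order of a hashtag feed is not documented here, so the sketch checks every post rather than stopping at the first post outside the year.

```typescript
// Assumed minimal shape of a yielded item; only the timestamp is used.
interface TimedItem {
  node: { taken_at_timestamp: number }; // Unix time in seconds
}

// Yield only the items whose post was taken during `year`, measured in UTC.
async function* takenInYear<T extends TimedItem>(
  items: AsyncIterable<T>,
  year: number,
): AsyncGenerator<T> {
  const start = Date.UTC(year, 0, 1) / 1000; // first second of the year
  const end = Date.UTC(year + 1, 0, 1) / 1000; // first second of the following year
  for await (const item of items) {
    const t = item.node.taken_at_timestamp;
    if (t >= start && t < end) {
      yield item;
    }
  }
}
```

For the request above, you could wrap createApi("hashtag", "skateboard", options).generator() in takenInYear(..., 2018) and iterate the result with for await. With total set to 0, the scrape covers every available post, which may take a long time.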

[FEATURE]

Is it possible to get the user's most recent media only?

[Question] 'Fake' Streaming

Can I use this library to make a 'fake' stream of a user account?

For example, fetching data every X amount of time and, if there is a new post, getting all of its photos and videos?

Thank you.
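A caller could approximate this by polling on a timer and comparing post ids between polls. The minimal sketch below relies on a hypothetical fetchRecent() supplied by the caller, for example one that drains a small createApi("user", id, { total: 12 }).generator() scrape. It also assumes each post has a stable id; that field is an assumption. Polling too often is likely to run into Instagram's rate limiting, which the hibernationTime option exists to handle.

```typescript
// Remembers every post id it has seen and reports only unseen posts.
class NewPostTracker<T> {
  private readonly seen = new Set<string>();

  constructor(private readonly idOf: (post: T) => string) {}

  // Record a batch and return the posts that were not in any earlier batch
  // (duplicates within one batch are reported once).
  diff(posts: readonly T[]): T[] {
    const fresh: T[] = [];
    for (const post of posts) {
      const id = this.idOf(post);
      if (!this.seen.has(id)) {
        this.seen.add(id);
        fresh.push(post);
      }
    }
    return fresh;
  }
}

// Poll fetchRecent (hypothetical, caller-supplied) and hand new posts to onNew.
// The first successful poll only records existing posts. Returns a stop function;
// stopping takes effect after the current wait ends.
function watch<T>(
  fetchRecent: () => Promise<T[]>,
  idOf: (post: T) => string,
  intervalMs: number,
  onNew: (posts: T[]) => void,
): () => void {
  const tracker = new NewPostTracker<T>(idOf);
  let stopped = false;
  let primed = false;
  const loop = async (): Promise<void> => {
    while (!stopped) {
      try {
        const fresh = tracker.diff(await fetchRecent());
        if (primed && fresh.length > 0) onNew(fresh);
        primed = true;
      } catch (err) {
        console.error("poll failed:", err);
      }
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  };
  void loop();
  return () => {
    stopped = true;
  };
}
```

The polling loop waits for each fetch to finish before sleeping, so slow scrapes never overlap. In onNew, the caller could then download the media of each new post.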
