
fast-instagram-scraper's People

Contributors

do-me

fast-instagram-scraper's Issues

Login required for locations as of 04/2022

After recent changes, the old endpoints, e.g.:

https://instagram.com/graphql/query/?query_hash=ac38b90f0f3981c42092016a37c59bf7&variables={"id":"<some_location_id>","first":50,"after":""}

no longer seem to work.

However, location IDs can still be mined successfully via a different endpoint:

https://www.instagram.com/explore/locations/<some_location_id>/?__a=1&max_id=<last_cursor>

I currently don't have time to investigate further, but this quick fix works for me:

  1. Change lines 47-49 to:
    if location_or_hashtag == "location":
        instalink = 'https://www.instagram.com/explore/locations/' + str(object_id_or_string) + '/?__a=1&max_id=' + cursor
        return instalink
  2. Because the JSON response has a different structure, replace all idata["data"] with idata["graphql"].

The same applies to hashtags. I just updated the code so hashtags can be mined again.
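The location quick fix can be sketched as standalone Python. This is a minimal sketch, not the scraper's actual code: the variable names `location_or_hashtag`, `object_id_or_string` and `cursor` come from the issue, but the function names `build_instalink` and `get_payload` are hypothetical, and falling back to `"data"` for old-style responses is an assumption.

```python
from typing import Any, Dict, Optional


def build_instalink(location_or_hashtag: str, object_id_or_string, cursor: str = "") -> Optional[str]:
    """Hypothetical URL builder mirroring the quick fix for locations.

    ?__a=1 requests JSON instead of HTML; max_id carries the pagination
    cursor (an empty string requests the first page).
    """
    if location_or_hashtag == "location":
        return (
            "https://www.instagram.com/explore/locations/"
            + str(object_id_or_string)
            + "/?__a=1&max_id="
            + cursor
        )
    # Hashtag URLs are not covered by this sketch.
    return None


def get_payload(idata: Dict[str, Any]) -> Dict[str, Any]:
    """Return the top-level object of a parsed JSON response.

    The new endpoint nests results under "graphql" instead of "data";
    the fallback to "data" is an assumption for old-style responses.
    """
    if "graphql" in idata:
        return idata["graphql"]
    return idata["data"]
```

For example, `build_instalink("location", 123456)` yields `https://www.instagram.com/explore/locations/123456/?__a=1&max_id=`, and subsequent pages pass the last cursor as the third argument.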

Downloading images

Hi! I tried using this scraper after having issues with arc298's scraper, and I was wondering whether it's possible to download not just the data but also the images.

Edit: Perhaps it is possible to save only the image URLs in the .csv file? Then it should be pretty easy to download them.
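If the .csv contained one image URL per row, downloading would indeed be straightforward. Below is a minimal sketch using only the standard library. It assumes a hypothetical column named `image_url` (the scraper's real column names may differ) and names files by row index with an assumed `.jpg` extension.

```python
import csv
import os
import urllib.request
from typing import Iterable, List


def read_image_urls(lines: Iterable[str], column: str = "image_url") -> List[str]:
    """Collect non-empty URLs from the given CSV column (column name is hypothetical)."""
    urls = []
    for row in csv.DictReader(lines):
        # A missing column or short row yields None, treated as empty.
        url = (row.get(column) or "").strip()
        if url:
            urls.append(url)
    return urls


def download_images(urls: Iterable[str], out_dir: str = "images") -> List[str]:
    """Download each URL to out_dir/<index>.jpg and return the written paths.

    File names are not derived from the URL because image URLs may carry
    query strings; the .jpg extension is an assumption.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        path = os.path.join(out_dir, f"{i}.jpg")
        urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

Usage would look like `download_images(read_image_urls(f))`, with `f` being the exported .csv opened via `open(..., newline="", encoding="utf-8")`.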

Problems with Torpy?

Any ideas on this behaviour? There seem to be problems with torpy. Have you experienced this before, and do you have any idea how to solve it?

[...]

Initiating tor session 233
                  Circuit built.
Start iteration 0: 2022-10-06 14:40:11.079573
Tor end node blocked. Last response: <Response [404]>
0it [01:16, ?it/s]
Initiating tor session 234
                  Circuit built.
Start iteration 0: 2022-10-06 14:41:28.718957
Tor end node blocked. Last response: <Response [404]>
0it [00:07, ?it/s]
Initiating tor session 235
                  Circuit built.
Start iteration 0: 2022-10-06 14:41:37.347591
ERROR:torpy.cell_socket:_ssl.c:1112: The handshake operation timed out
ERROR:root:[ignored]
Traceback (most recent call last):
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\cell_socket.py", line 63, in connect
    self._socket.connect((self._router.ip, self._router.or_port))
  File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1343, in connect
    self._real_connect(addr, False)
  File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1334, in _real_connect
    self.do_handshake()
  File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1310, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1112: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\utils.py", line 79, in newfn
    return func(*args, **kwargs)
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 183, in newfn
    return func(*args, **kwargs)
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 426, in get_descriptor
    with self._get_dir_client() as dir_client:
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 375, in _get_dir_client
    self._dir_guard, self._dir_circuit = self._create_dir_circuit(purpose='Internal dir client')
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 365, in _create_dir_circuit
    guard = TorGuard(router, purpose=purpose)
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\guard.py", line 66, in __init__
    self.__tor_socket.connect()
  File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\cell_socket.py", line 69, in connect
    raise TorSocketConnectError(e)
torpy.cell_socket.TorSocketConnectError: _ssl.c:1112: The handshake operation timed out
WARNING:torpy.utils:Retry with another router...
0it [00:31, ?it/s]
'graphql'
Initiating tor session 236
                  Circuit built.
Start iteration 0: 2022-10-06 14:42:09.078514
Tor end node blocked. Last response: <Response [404]>
0it [00:06, ?it/s]
Initiating tor session 237
                  Circuit built.
Start iteration 0: 2022-10-06 14:42:16.572684
WARNING:torpy.circuit:#80000242 circuit: has been destroyed already
ERROR:torpy.utils:[ignored] torpy.circuit.CellTimeoutError: Timeout wait for CellRelayExtended2 or CellRelayTruncated
WARNING:torpy.utils:Retry circuit creation
Tor end node blocked. Last response: <Response [404]>
0it [00:52, ?it/s]
Initiating tor session 238
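The bare `'graphql'` line in the log matches what Python prints for `str(KeyError('graphql'))`. This suggests that a response without that key reached the parsing code. Below is a minimal defensive sketch. It assumes the response is available as a status code plus raw body text, and the function name `parse_graphql` is hypothetical. The function returns `None` for non-200 responses, non-JSON bodies, or JSON lacking `"graphql"`, so the caller can open a new Tor session instead of raising.

```python
import json
from typing import Any, Dict, Optional


def parse_graphql(status_code: int, body: str) -> Optional[Dict[str, Any]]:
    """Return the "graphql" object of a response, or None if it is unusable.

    None signals the caller to retry (e.g. with a fresh Tor circuit)
    instead of letting a KeyError('graphql') escape.
    """
    if status_code != 200:
        return None  # e.g. the 404s logged as "Tor end node blocked"
    try:
        idata = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. an HTML page returned instead of JSON
    if not isinstance(idata, dict):
        return None
    return idata.get("graphql")
```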
