do-me / fast-instagram-scraper
A fast Instagram Scraper based on Torpy.
When --save_media is True, once --max_posts is reached, the program exits without downloading the media.
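One way to avoid this behaviour is to stop *collecting* once the limit is hit and then run the media step, rather than exiting as soon as --max_posts is reached. The function and parameter names below are hypothetical and are not the project's actual code. This is a minimal sketch assuming posts arrive from an iterator and a separate `download` callable saves one post's media:

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List

Post = Dict[str, object]  # hypothetical: one scraped post record


def collect_then_download(
    posts: Iterable[Post],
    max_posts: int,
    save_media: bool,
    download: Callable[[Post], None],
) -> List[Post]:
    """Collect at most max_posts posts, then download their media if requested."""
    # islice stops pulling from the iterator once the limit is reached,
    # so the program continues to the media step instead of exiting.
    collected: List[Post] = list(islice(posts, max_posts))
    if save_media:
        for post in collected:
            download(post)
    return collected
```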
After recent changes, the old endpoints, e.g.:
https://instagram.com/graphql/query/?query_hash=ac38b90f0f3981c42092016a37c59bf7&variables={"id":"<some_location_id>","first":50,"after":""}
no longer seem to work.
However, location IDs can still be mined successfully via a different endpoint:
https://www.instagram.com/explore/locations/<some_location_id>/?__a=1&max_id=<last_cursor>
I currently don't have time to investigate further, but this quick fix works for me:
if location_or_hashtag == "location":
    instalink = 'https://www.instagram.com/explore/locations/' + str(object_id_or_string) + '/?__a=1&max_id=' + cursor
    return instalink
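The same URL construction can be written with the standard library so the cursor is percent-encoded (cursors may contain characters such as `=`). A minimal sketch: the `explore/tags/<tag>/` path for hashtags is an assumption, not something confirmed in this thread, and `build_instalink` is a hypothetical name.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.instagram.com/explore/"


def build_instalink(location_or_hashtag: str, object_id_or_string, cursor: str = "") -> str:
    """Build the ?__a=1 URL for one page of a location (or, assumed, a hashtag)."""
    if location_or_hashtag == "location":
        path = "locations/" + quote(str(object_id_or_string)) + "/"
    elif location_or_hashtag == "hashtag":
        # Assumed path for hashtag pages; verify against the current site.
        path = "tags/" + quote(str(object_id_or_string)) + "/"
    else:
        raise ValueError("expected 'location' or 'hashtag'")
    # urlencode escapes the cursor, unlike plain string concatenation.
    return BASE + path + "?" + urlencode({"__a": 1, "max_id": cursor})
```

With an empty cursor this yields the same URL as the quick fix, e.g. `https://www.instagram.com/explore/locations/123/?__a=1&max_id=`.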
Then replace idata["data"] with idata["graphql"].
The same applies to hashtags. I've just updated the code so hashtags can be mined again.
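To illustrate the idata["graphql"] change, here is a minimal sketch of reading one page of the ?__a=1 response. The nested key names (`location`/`edge_location_to_media`, `hashtag`/`edge_hashtag_to_media`, `page_info`, `end_cursor`) are assumptions about the response shape, not confirmed in this thread, and `parse_page` is a hypothetical helper:

```python
from typing import Any, Dict, List, Tuple

# Assumed key names inside idata["graphql"]; check against a real response.
EDGE_KEYS = {
    "location": ("location", "edge_location_to_media"),
    "hashtag": ("hashtag", "edge_hashtag_to_media"),
}


def parse_page(idata: Dict[str, Any], location_or_hashtag: str) -> Tuple[List[Dict[str, Any]], str, bool]:
    """Return (posts, next_cursor, has_next_page) from one ?__a=1 response."""
    if "graphql" not in idata:
        # A blocked or changed response has no "graphql" key; fail with a clear message.
        raise KeyError("response has no 'graphql' key; top-level keys: %s" % sorted(idata))
    entity_key, edge_key = EDGE_KEYS[location_or_hashtag]
    media = idata["graphql"][entity_key][edge_key]
    posts = [edge["node"] for edge in media["edges"]]
    page_info = media.get("page_info", {})
    return posts, page_info.get("end_cursor") or "", bool(page_info.get("has_next_page"))
```

The returned cursor is what goes into `max_id` on the next request.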
Hi! I tried using this scraper after having issues with arc298's scraper, and I was wondering whether it's possible to download not just the data but also the images.
Edit: Perhaps it is possible to save just the URLs in the .csv file? Then it should be pretty easy to download the images.
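If the .csv did contain a column of image URLs, downloading them could look like the minimal sketch below. The column name `display_url`, the `media` output folder, and both function names are hypothetical; naming files by row index is just one simple choice.

```python
import csv
import os
import urllib.request
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def plan_downloads(lines: Iterable[str], url_column: str = "display_url",
                   out_dir: str = "media") -> List[Tuple[str, str]]:
    """Turn CSV rows into (url, destination_path) pairs, skipping empty URLs."""
    jobs = []
    for i, row in enumerate(csv.DictReader(lines)):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue
        # Keep the file extension from the URL path; fall back to .jpg.
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        jobs.append((url, os.path.join(out_dir, f"{i}{ext}")))
    return jobs


def download_all(jobs: List[Tuple[str, str]]) -> None:
    """Fetch every URL to its destination path (network access required)."""
    for url, dest in jobs:
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest)
```

Usage: `with open("posts.csv", newline="", encoding="utf-8") as f: download_all(plan_downloads(f))`.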
Any ideas on this behaviour? There seem to be problems with torpy. Have you experienced this before, and do you have any idea how to solve it?
[...]
Initiating tor session 233
Circuit built.
Start iteration 0: 2022-10-06 14:40:11.079573
Tor end node blocked. Last response: <Response [404]>
0it [01:16, ?it/s]
Initiating tor session 234
Circuit built.
Start iteration 0: 2022-10-06 14:41:28.718957
Tor end node blocked. Last response: <Response [404]>
0it [00:07, ?it/s]
Initiating tor session 235
Circuit built.
Start iteration 0: 2022-10-06 14:41:37.347591
ERROR:torpy.cell_socket:_ssl.c:1112: The handshake operation timed out
ERROR:root:[ignored]
Traceback (most recent call last):
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\cell_socket.py", line 63, in connect
self._socket.connect((self._router.ip, self._router.or_port))
File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1343, in connect
self._real_connect(addr, False)
File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1334, in _real_connect
self.do_handshake()
File "C:\Users\...\anaconda3\envs\scrape\lib\ssl.py", line 1310, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1112: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\utils.py", line 79, in newfn
return func(*args, **kwargs)
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 183, in newfn
return func(*args, **kwargs)
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 426, in get_descriptor
with self._get_dir_client() as dir_client:
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 375, in _get_dir_client
self._dir_guard, self._dir_circuit = self._create_dir_circuit(purpose='Internal dir client')
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\consesus.py", line 365, in _create_dir_circuit
guard = TorGuard(router, purpose=purpose)
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\guard.py", line 66, in __init__
self.__tor_socket.connect()
File "C:\Users\...\anaconda3\envs\scrape\lib\site-packages\torpy\cell_socket.py", line 69, in connect
raise TorSocketConnectError(e)
torpy.cell_socket.TorSocketConnectError: _ssl.c:1112: The handshake operation timed out
WARNING:torpy.utils:Retry with another router...
0it [00:31, ?it/s]
'graphql'
Initiating tor session 236
Circuit built.
Start iteration 0: 2022-10-06 14:42:09.078514
Tor end node blocked. Last response: <Response [404]>
0it [00:06, ?it/s]
Initiating tor session 237
Circuit built.
Start iteration 0: 2022-10-06 14:42:16.572684
WARNING:torpy.circuit:#80000242 circuit: has been destroyed already
ERROR:torpy.utils:[ignored] torpy.circuit.CellTimeoutError: Timeout wait for CellRelayExtended2 or CellRelayTruncated
WARNING:torpy.utils:Retry circuit creation
Tor end node blocked. Last response: <Response [404]>
0it [00:52, ?it/s]
Initiating tor session 238
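This is not a diagnosis, but one mitigation worth trying: the log shows a new Tor session being started only seconds after each failure. A minimal sketch of spacing attempts out with capped exponential backoff, assuming a hypothetical `attempt(n)` callable that builds a fresh Tor session (for example with torpy) and makes one request, raising on a 404 or a timeout. It will not help if most exit nodes are blocked.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_fresh_sessions(
    attempt: Callable[[int], T],
    max_sessions: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt(n) with a new session number until it succeeds or attempts run out."""
    last_exc: Optional[BaseException] = None
    for n in range(max_sessions):
        try:
            return attempt(n)
        except Exception as exc:  # e.g. blocked exit node (404) or TLS handshake timeout
            last_exc = exc
            if n + 1 < max_sessions:
                # Capped exponential backoff with jitter before building the next circuit.
                delay = min(max_delay, base_delay * 2 ** n)
                sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"all {max_sessions} sessions failed") from last_exc
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.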