shaikhsajid1111 / twitter-scraper-selenium
Python package to scrape Twitter's front-end easily
Home Page: https://pypi.org/project/twitter-scraper-selenium
License: MIT License
scrape_profile()
- Its alternative is scrape_profile_with_api()
scrape_keyword()
- Its alternative is scrape_keyword_with_api()
scrape_topic()
- Its alternative is scrape_topic_with_api()
I found that we spend 90% of the time in wait_until_completion(), because the delay time.sleep(randint(3, 5)) is 3 to 5 seconds, which seems very high. Why is that?
time.sleep(random.uniform(0.1, 0.2)) seems more than enough for my simple tests, but maybe I'm missing something?
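To put numbers on the comparison, the expected cost of each delay choice is simple arithmetic; the call count of 100 below is illustrative, not measured from the package:

```python
def expected_total_delay(calls: int, low: float, high: float) -> float:
    """Mean total sleep time if each wait draws uniformly from [low, high]."""
    return calls * (low + high) / 2

# time.sleep(randint(3, 5)) averages 4 s per wait; over 100 waits, ~400 s.
current = expected_total_delay(100, 3, 5)
# time.sleep(random.uniform(0.1, 0.2)) averages 0.15 s; over 100 waits, ~15 s.
proposed = expected_total_delay(100, 0.1, 0.2)
```

So over a long scrape the difference is minutes versus seconds, which matches the 90% figure above.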
Code
scrap_keyword(keyword="nft", browser="chrome", tweets_count=999999, until="2022-06-30", since="2022-06-29", output_format="csv", filename="nft")
Error Message
[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST driver version for 103.0.5060
[WDM] - Driver [./.wdm/drivers/chromedriver/mac64/103.0.5060.53/chromedriver] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Username = input("Account: ")
tweets = int(input("How many tweets: "))
path = "C:/Users/HP Probook/PycharmProjects/scrap-scripts/"
data = scrape_keyword_with_api(f"(from:{Username})")
print(data)
and got this error:
2023-07-26 15:03:21,739 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
Hello! Is it possible to get tweets in order of date (recent to older)?
Hello,
I got this installation error with
pip install twitter-scraper-selenium
Error:
Collecting twitter-scraper-selenium
Using cached twitter_scraper_selenium-0.1.3.tar.gz (14 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-sw0o3qs2/twitter-scraper-selenium_e1bf4012aa9340ebaba22e04e0ba72df/setup.py", line 3, in
with open("README.MD", "r") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
It would be fine if you checked the encoding and packaging of README.MD. Also, the standard name is README.md (lowercase .md extension).
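A defensive way to write that part of setup.py, so a missing or differently-cased README no longer aborts metadata generation (a sketch, not the package's actual code):

```python
import os

def read_long_description(path="README.MD"):
    """Return the README contents, or an empty string if the file is
    missing from the sdist (the failure mode in the traceback above)."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    return ""
```

`setup(long_description=read_long_description(), ...)` would then succeed whether or not the README made it into the source distribution.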
First of all, thanks for the package.
I'm getting this error:
ValueError: {'message': "API rate limit exceeded for (my IP address) (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)", 'documentation_url': 'https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting'}
Any way to get around this?
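Note that this rate-limit error comes from webdriver-manager querying the GitHub releases API to locate a driver binary, not from Twitter. Recent webdriver-manager versions read a GitHub token from the environment, which raises the limit; the variable name below should be checked against your installed version, and the token value is obviously a placeholder:

```python
import os

# Hypothetical token: create a GitHub personal access token and export it
# before the scraper (and thus webdriver-manager) runs.
os.environ["GH_TOKEN"] = "ghp_your_personal_access_token"
```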
These lines of code worked fine just a few weeks ago and now I'm getting TypeError: object of type 'NoneType' has no len()
This is the code I'm using:
from twitter_scraper_selenium import get_profile_details
twitter_username = "tim_cook"
filename = "twitter_dummy_ceo"
get_profile_details(twitter_username=twitter_username, filename=filename)
I can't get any tweets T T
I don't know why
You can inject JavaScript to get the media URL.
This can fix issue #72
As mentioned in the title, I'm getting only 4 tweets using this code
from twitter_scraper_selenium import scrape_profile
import os
import json
account = input("Account: ")
tweets = int(input("How many tweets: "))
parent_dir = os.getcwd()  # was undefined in the original snippet; assume the current directory
path = os.path.join(parent_dir, account)
if not os.path.exists(path):
    os.mkdir(path)
data = scrape_profile(twitter_username=account, output_format="json", browser="firefox", tweets_count=tweets)
print(data)
parsed = json.loads(data)
json_data = json.dumps(parsed, indent=4)
with open(os.path.join(path, account + ".json"), "w") as outfile:
    outfile.write(json_data)
And this is the print output: https://pastebin.com/p1UuxZFa
Thank you.
import os
from twitter_scraper_selenium import scrape_topic_with_api
scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
if not os.path.isfile("./tweets.json"):
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
if len(open("./tweets.json").read()) < 100:
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
I'm repeatedly getting the warning "Failed to make request!" even though I can see Firefox opening and I can see the tweets.
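The duplicated calls above can be folded into one retry helper; `scrape_fn` below stands in for any zero-argument wrapper around the scrape_topic_with_api call (a sketch, not part of the package):

```python
import os

def scrape_with_retry(scrape_fn, output_path, attempts=3):
    """Call scrape_fn until output_path exists and holds at least 100 bytes,
    mirroring the manual checks above. Returns True on success."""
    for _ in range(attempts):
        scrape_fn()
        if os.path.isfile(output_path) and os.path.getsize(output_path) >= 100:
            return True
    return False
```

Usage would be `scrape_with_retry(lambda: scrape_topic_with_api(...), "./tweets.json")`, which keeps the scraper arguments in one place.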
Hello,
I would like to use the lib to do some dev with it, but I'm facing issues even launching a project with it.
My project is built with Poetry, so I'm handling the libs through this tool. My config file looks like this:
[tool.poetry]
name = "poetrytestproject"
version = "0.1.0"
description = "Test project"
authors = ["Kamigaku <[email protected]>"]
[tool.poetry.dependencies]
python = "^3.10"
matplotlib = "^3.5.3"
Unidecode = "^1.3.4"
numpy = "^1.23.4"
scipy = "^1.9.3"
tweepy = "^4.12.1"
Pillow = "^9.2.0"
fonttools = "^4.38.0"
twitter-scraper-selenium = "^4.1.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
When I launch the project with a simple import of your solution, I get this error:
Traceback (most recent call last):
File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
from twitter_scraper_selenium import scrape_profile
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
from .keyword import scrape_keyword
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
from .driver_initialization import Initializer
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
from seleniumwire import webdriver
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
from seleniumwire import backend, utils
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
from seleniumwire.server import MitmProxy
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
from seleniumwire.handler import InterceptRequestHandler
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
from seleniumwire import har
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
from seleniumwire.thirdparty.mitmproxy import connections
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 10, in <module>
from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 43, in <module>
"SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?
Process finished with exit code 1
I've read in multiple spots on the internet (and even on your Facebook scraper project) that the issue might come from the "pyOpenSSL" package, so I downgraded it to version "21.0.0" using Poetry, and the error changes to:
Traceback (most recent call last):
File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
from twitter_scraper_selenium import scrape_profile
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
from .keyword import scrape_keyword
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
from .driver_initialization import Initializer
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
from seleniumwire import webdriver
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
from seleniumwire import backend, utils
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
from seleniumwire.server import MitmProxy
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
from seleniumwire.handler import InterceptRequestHandler
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
from seleniumwire import har
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
from seleniumwire.thirdparty.mitmproxy import connections
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 9, in <module>
from seleniumwire.thirdparty.mitmproxy import certs, exceptions, stateobject
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\certs.py", line 10, in <module>
import OpenSSL
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\__init__.py", line 8, in <module>
from OpenSSL import crypto, SSL
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\crypto.py", line 3279, in <module>
_lib.OpenSSL_add_all_algorithms()
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
Process finished with exit code 1
Any idea what might cause it?
I've got a friend who is also having the same issue.
I'm using Python version 3.10.0
Thanks!
The authenticated proxy doesn't load correctly; if I check "whatismyipaddress" in the driver, I get my real IP.
Hello,
Thank you very much for your work.
Is it possible to change the function scrape_keyword in order to search multiple keywords within the same session, instead of a loop of authenticate, search, close?
Thank you very much
from twitter_scraper_selenium import scrap_keyword
# scrape 10 posts by searching keyword "movie" from 29th August till 31st August
india = scrap_keyword(keyword="movie", browser="chrome",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-29")
print(india)
is the code that I am trying to run.
At first, the readout begins with:
Current google-chrome version is 103.0.5060
Get LATEST driver version for 103.0.5060
[WDM] - Driver found in cache
However, I believe the following is where the error is occurring:
create_client_context
param = SSL._lib.SSL_CTX_get0_param(context._context)
AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param'
Message: unknown error: net::ERR_CONNECTION_CLOSED
(Session info: headless chrome=103.0.5060.114)
Hello, I keep getting this error. I am new to this package, so I do not know what I am missing.
"Tweets did not appear!, Try setting headless=False to see what is happening"
When I execute the code example in the README file, it gives the attribute error mentioned in the title.
code:
from twitter_scraper_selenium import scrap_keyword
# scrape 10 posts by searching keyword "india" from 30th August till 31st August
india = scrap_keyword(keyword="india", browser="firefox",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-30")
print(india)
I just tried this program to get a Twitter topic.
Here is the final result:
import json
import textwrap
from twitter_scraper_selenium.keyword import Keyword

URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'
keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()
with open('steamdeck.json', 'w') as f:
    json.dump(json.loads(data), f, indent=2)
# print result
width = 120
for item in sorted(json.loads(data).values(), key=lambda x: x['posted_time']):
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-' * width)
Some notes on this:
pip install 'pyOpenSSL==22.0.0'
should fix it, per the linked issue.
Example 1:
try:
    # assume an error on this line because importing webdriver failed
    from inspect import currentframe
except Exception as ex:
    print(ex)

# the error happens again because currentframe was never imported
frameinfo = currentframe()
OS Version:
Edition Windows 10 Pro
Version 21H2
Installed on 2022.04.10
OS build 19044.1706
Experience Windows Feature Experience Pack 120.2212.4170.0
Python Version:
PS C:\Users\xxx> python -V
Python 3.10.5
Twitter scraper selenium Version:
0.1.6
requirements.txt is already installed.
from twitter_scraper_selenium import scrap_profile
microsoft = scrap_profile(twitter_username="Microsoft", output_format="json", browser="firefox", tweets_count=10)
print(microsoft)
C:\Users\xxx\PycharmProjects\venv\Scripts\python.exe
C:/Users/xxx/PycharmProjects/Twitter_Selenium.py
[WDM] - Driver [C:\Users\xxx\.wdm\drivers\geckodriver\win64\v0.31.0\geckodriver.exe] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
{}
Process finished with exit code 0
geckodriver.log:
1655093631147 geckodriver INFO Listening on 127.0.0.1:23023
1655093631171 mozrunner::runner INFO Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "--marionette" "--headless" "--no-sandbox" "--disable-dev-shm-usage" "--ignore-certificate-errors" "--disable-gpu" "--log-level=3" "--disable-notifications" "--disable-popup-blocking" "-no-remote" "-profile" "C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM"
*** You are running in headless mode.
1655093631395 Marionette INFO Marionette enabled
[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM\\search.json.mozlz4", (void 0)))
1655093632149 Marionette INFO Listening on port 23060
Read port: 23060
1655093632179 RemoteAgent WARN TLS certificate errors will be ignored for this session
1655093632180 RemoteAgent INFO Proxy settings initialised: {"proxyType":"manual","httpProxy":"127.0.0.1:23022","noProxy":[],"sslProxy":"127.0.0.1:23022"}
1655093781898 Marionette INFO Stopped listening on port 23060
###!!! [Parent][PImageBridgeParent] Error: RunMessage(msgname=PImageBridge::Msg_WillClose) Channel closing: too late to send/recv, messages will be lost
Hi there team,
Thanks for this amazing lib; I've used it a few times already and had no problem, but today, trying to make it work again, I got stuck on a geckodriver error.
...
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
...
I created a Docker image to have a fresh install and make sure it's not my setup; I use it on macOS and get the same error.
Here are the files and how to reproduce
Dockerfile
# Use the official Python 3.9.16 image as the base image
FROM python:3.9.16
# Set the working directory to /app
WORKDIR /app
# Install the necessary dependencies
RUN pip install twitter-scraper-selenium
# Set up the shared volume
VOLUME ["/app"]
# Set the default command to run your script
CMD [ "python", "scrapper.py" ]
scrapper.py
from twitter_scraper_selenium import scrape_profile
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)
To run, after having docker installed and setup do:
docker build -t twitter-scraper .
docker run -v $(pwd):/app twitter-scraper
The below error happens on docker run, and I couldn't find anything useful on the internet to help me fix it. Can you help me understand what is happening? It seems to be with geckodriver, not exactly with twitter-scraper-selenium, but I'm not sure where else to look.
Full logs below:
[WDM] - There is no [linux64] geckodriver for browser in cache
[WDM] - Getting latest mozilla release info for v0.33.0
[WDM] - Trying to download new driver from https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
[WDM] - Driver has been saved in cache [/root/.wdm/drivers/geckodriver/linux64/v0.33.0]
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
self.__start_driver()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 39, in __start_driver
self.__driver = Initializer(
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.9/site-packages/seleniumwire/webdriver.py", line 179, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
super().__init__(command_executor=executor, options=options, keep_alive=True)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/scrapper.py", line 21, in <module>
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
data = profile_bot.scrap()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
self.__close_driver()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
self.__driver.close()
AttributeError: 'str' object has no attribute 'close'
Appreciate any help
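One likely cause: the python:3.9.16 base image ships without any browser, and geckodriver only drives an existing Firefox; it does not bundle one, which matches the "unable to find binary in default location" message. A hedged sketch of a Dockerfile that also installs Firefox ESR (package name assumed for the Debian base of that image):

```dockerfile
FROM python:3.9.16
# geckodriver needs a Firefox binary on the PATH; the plain Python image has none.
RUN apt-get update \
    && apt-get install -y --no-install-recommends firefox-esr \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN pip install twitter-scraper-selenium
VOLUME ["/app"]
CMD ["python", "scrapper.py"]
```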
ValueError: not enough values to unpack (expected 2, got 0)
Traceback (most recent call last):
File "/home/arch/Code/Commisions/CryptoGuys/./src/util/scrape.py", line 3, in
scrap_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=10, directory='./src/util')
File "/usr/lib/python3.10/site-packages/twitter_scraper_selenium/topic.py", line 60, in scrap_topic
output_path = directory / "{}.json".format(filename)
TypeError: unsupported operand type(s) for /: 'str' and 'str'
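The traceback points at `directory / "{}.json".format(filename)`: the `/` operator there is pathlib's path-join, which fails when `directory` is a plain string. Coercing to `Path` first would accept both call styles (a sketch of the fix, not the package's code):

```python
from pathlib import Path

def build_output_path(directory, filename):
    """Join directory and '<filename>.json' whether directory is a str or a Path."""
    return Path(directory) / "{}.json".format(filename)
```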
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 128, in scrap
self.start_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 52, in start_driver
self.browser, self.headless, self.proxy, self.browser_profile).init()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 178, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 177, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
scrape_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=25)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic.py", line 53, in scrape_topic
data = keyword_bot.scrap()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 140, in scrap
self.close_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 55, in close_driver
self.driver.close()
AttributeError: 'str' object has no attribute 'close'
from twitter_scraper_selenium.topic import scrap_topic
scrap_topic(
filename='linux',
url='https://twitter.com/i/topics/848959431836487680',
headless=False,
browser_profile='/home/r3r/Documents/selenium_profile'
)
output
> python steamdeck.py
INFO:root:Loading Profile from /home/r3r/Documents/selenium_profile
[WDM] - Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:WDM:Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:seleniumwire.storage:Using default request storage
INFO:seleniumwire.backend:Created proxy listening on 127.0.0.1:44371
This freezes after that text output.
Looking at the selenium source code, setting a profile is deprecated: https://github.com/SeleniumHQ/selenium/blob/a4995e2c096239b42c373f26498a6c9bb4f2b3e7/py/selenium/webdriver/firefox/options.py#L101-L105
I have better success using the profile argument, just like Chrome: https://github.com/rachmadaniHaryono/twitter-scraper-selenium/tree/bugfix/profile
But I have only tested it on Linux.
Add support for logging into Twitter using email/username/password.
Once logged in, can more data be accessed?
Is the session going to be more stable?
Can I scrape people search results using this tool?
Hi there. The address returned by calling scrape_profile(twitter_username="blvckledge", output_format="json", browser="chrome", tweets_count=10) for videos is blob:https://twitter.com/facd7bdb-73ec-49ff-9492-993a165a3585, but the actual address is https://video.twimg.com/ext_tw_video/1673328923311058944/pu/vid/848x464/KNnl7Rqk_MfiX4X9.mp4?tag=12. Why is an incorrect address being returned? Additionally, I receive errors when calling scrape_topic_with_api() and scrape_profile_with_api(). What could be causing this?
Does using hashtags work?
I want data on people who use a specific hashtag, and to find out who uses it the most.
It's working, I just needed to wait. My fault, sorry.
I am still learning to use the package. It is VERY nice in terms of functionality.
The delays before seeing anything can be 2-3 minutes or longer, depending on how many tweets are being fetched.
Is there an easy way to monitor progress without slowing down the scraping process?
I see that logging is built in. Where are the log files stored?
Can the log be streamed to another terminal in my IDE? I am using PyCharm Pro.
Running on my VPS: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64)
[WDM] - Driver [/root/.wdm/drivers/geckodriver/linux64/v0.32.2/geckodriver] found in cache
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
self.__start_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 40, in __start_driver
self.browser, self.headless, self.proxy, self.browser_profile).init()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 179, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
super().__init__(command_executor=executor, options=options, keep_alive=True)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/crypto-ordinals/twitter_feeds.py", line 67, in <module>
tweets = scrape_profile(twitter_username="ordswapbot",output_format="json",browser="firefox",tweets_count=10,headless=False)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
data = profile_bot.scrap()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
self.__close_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
self.__driver.close()
AttributeError: 'str' object has no attribute 'close'
Hey, I was wondering if there is a way to scrape all the followers of a profile using the package; I could not figure it out.
Any help will be appreciated!
I'm using this awesome lib to get the newest tweets containing certain keywords. If you don't supply values for "since" and "until", you will get the same tweets every time - the first tweets of the current day. I managed to get around that by deleting the since and until part from the search URL, but that is certainly not intended 😄
Could you maybe add a live feature so your code doesn't get corrupted by me? 😆
When scraping a profile, retweets get an incorrect 'tweet_url'.
In profile.py:
tweet_url = "https://twitter.com/{}/status/{}".format(username, status)
status is the same as username when it is a retweet.
Hi, thanks for putting this together. I've been looking around for active scraping projects since twint was archived and was happy to find this!
I want to request getting the conversation_id. I didn't see that in this project's docs. More info here: https://developer.twitter.com/en/docs/twitter-api/conversation-id
Hello, I am trying to scrape every tweet from a user. From the Twitter page, I can see that they have tweeted more than 5000 times. However, even when I set my tweets_count to 5000, I am getting fewer than 1000 tweets from that user.
My code is below:
scrape_profile(twitter_username="elonmusk", output_format="csv", tweets_count=6000, browser="chrome", filename="elonmusk")
(Note that @ElonMusk is just a stand-in example)
More examples and documentation are needed.
I am new to using this package. I see a lot of functionality, but not enough examples of how to use it.
It would be nice if there were a ReadTheDocs.io website for it.
I am interested in pulling images and metadata associated with tweets. The element_finder.py file appears to pull images, but it is unclear how to call it separately, or whether that is the best way to use it in this package.
@staticmethod
def find_images(tweet) -> Union[list, None]:
    """finds all images of the tweet"""
    try:
        image_element = tweet.find_elements(By.CSS_SELECTOR,
                                            'div[data-testid="tweetPhoto"]')
        images = []
        for image_div in image_element:
            href = image_div.find_element(By.TAG_NAME,
                                          "img").get_attribute("src")
            images.append(href)
        return images
    except Exception as ex:
        logger.exception("Error at method find_images : {}".format(ex))
        return []
Many of the functions do not have docstrings. I am going through them now to try to add descriptions for my own benefit.
I also want to add things to the outputs to automate data capture. For example, I want to time-stamp output files, because during testing I will have to do multiple runs to make sure that I got all the information I was looking for.
For JSON files, I would like to pretty-print them before saving them, because they come out in a one-line format in the file.
I am not sure where to insert these functions. I will keep browsing the package structure to see if I can figure these things out, and add helpful docstrings where they are missing.
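The time-stamping and pretty-printing described above can live in a small post-processing helper, independent of the package internals (the names are mine, a sketch):

```python
import json
from datetime import datetime

def save_pretty_json(data, stem):
    """Write data to '<stem>_<timestamp>.json' with 4-space indentation and
    return the path, so repeated test runs never overwrite each other."""
    path = "{}_{}.json".format(stem, datetime.now().strftime("%Y%m%d_%H%M%S"))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=4, ensure_ascii=False)
    return path
```

You could feed it the dict parsed from scrape_profile's JSON string output, rather than modifying the package's own writers.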
[WDM] - Driver [C:\Users\HP Probook.wdm\drivers\geckodriver\win64\v0.33.0\geckodriver.exe] found in cache
2023-07-13 13:12:40,345 - twitter_scraper_selenium.driver_utils - ERROR - Tweets did not appear!, Try setting headless=False to see what is happening
Traceback (most recent call last):
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\twitter_scraper_selenium\driver_utils.py", line 35, in wait_until_tweets_appear
WebDriverWait(driver, 80).until(EC.presence_of_element_located(
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:183:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:395:5
element.find/</<@chrome://remote/content/marionette/element.sys.mjs:134:16
I'm trying to make an API, but this error keeps troubling me. This snippet works on my local machine but breaks when I ship it to Render using Docker.
import json

import flask
from twitter_scraper_selenium import get_profile_details

def init(app: flask.app.Flask):
    @app.route("/user/<string:username>")
    def user(username):
        filename = "get_profile_details"
        get_profile_details(
            twitter_username=username,
            filename=filename,
        )
        with open(filename + ".json") as fh:  # close the handle instead of leaking it
            data = json.load(fh)
        return data
Please add a reply/comment scraper; there is no comment scraper for Twitter that I could find.