shaikhsajid1111 / twitter-scraper-selenium
Python package to scrape Twitter's front-end easily
Home Page: https://pypi.org/project/twitter-scraper-selenium
License: MIT License
scrape_profile()
- Its alternative is scrape_profile_with_api()
scrape_keyword()
- Its alternative is scrape_keyword_with_api()
scrape_topic()
- Its alternative is scrape_topic_with_api()
I found that we spend 90% of the time in wait_until_completion(), because the delay time.sleep(randint(3, 5)) is 3 to 5 seconds, which seems very high. Why is that?
time.sleep(random.uniform(0.1, 0.2)) seems more than enough for my simple tests, but maybe I'm missing something?
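To put numbers on the comparison, the expected cost of each delay choice is simple arithmetic; the call count of 100 below is illustrative, not measured from the package:

```python
def expected_total_delay(calls: int, low: float, high: float) -> float:
    """Mean total sleep time if each wait draws uniformly from [low, high]."""
    return calls * (low + high) / 2

# time.sleep(randint(3, 5)) averages 4 s per wait; over 100 waits, ~400 s.
current = expected_total_delay(100, 3, 5)
# time.sleep(random.uniform(0.1, 0.2)) averages 0.15 s; over 100 waits, ~15 s.
proposed = expected_total_delay(100, 0.1, 0.2)
```

So over a long scrape the difference is minutes versus seconds, which matches the 90% figure above.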
Code
scrap_keyword(keyword="nft", browser="chrome", tweets_count=999999, until="2022-06-30", since="2022-06-29", output_format="csv", filename="nft")
Error Message
[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST driver version for 103.0.5060
[WDM] - Driver [./.wdm/drivers/chromedriver/mac64/103.0.5060.53/chromedriver] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Username = input("Account: ")
tweets = int(input("How many tweets: "))
path = "C:/Users/HP Probook/PycharmProjects/scrap-scripts/"
data = scrape_keyword_with_api(f"(from:{Username})")
print(data)
and got this error:
2023-07-26 15:03:21,739 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
Hello! Is it possible to get tweets in order of date (recent to older)?
Hello,
I got this installation error with
pip install twitter-scraper-selenium
Error:
Collecting twitter-scraper-selenium
Using cached twitter_scraper_selenium-0.1.3.tar.gz (14 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-sw0o3qs2/twitter-scraper-selenium_e1bf4012aa9340ebaba22e04e0ba72df/setup.py", line 3, in
with open("README.MD", "r") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
It would be fine if you checked the encoding and packaging of README.MD. Also, the standard name is README.md (lowercase .md extension).
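A defensive way to write that part of setup.py, so a missing or differently-cased README no longer aborts metadata generation (a sketch, not the package's actual code):

```python
import os

def read_long_description(path="README.MD"):
    """Return the README contents, or an empty string if the file is
    missing from the sdist (the failure mode in the traceback above)."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    return ""
```

`setup(long_description=read_long_description(), ...)` would then succeed whether or not the README made it into the source distribution.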
First of all, thanks for the package.
I'm getting this error:
ValueError: {'message': "API rate limit exceeded for (my IP address) (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)", 'documentation_url': 'https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting'}
Any way to get around this?
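Note that this rate-limit error comes from webdriver-manager querying the GitHub releases API to locate a driver binary, not from Twitter. Recent webdriver-manager versions read a GitHub token from the environment, which raises the limit; the variable name below should be checked against your installed version, and the token value is obviously a placeholder:

```python
import os

# Hypothetical token: create a GitHub personal access token and export it
# before the scraper (and thus webdriver-manager) runs.
os.environ["GH_TOKEN"] = "ghp_your_personal_access_token"
```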
These lines of code worked fine just a few weeks ago and now I'm getting TypeError: object of type 'NoneType' has no len()
This is the code I'm using:
from twitter_scraper_selenium import get_profile_details
twitter_username = "tim_cook"
filename = "twitter_dummy_ceo"
get_profile_details(twitter_username=twitter_username, filename=filename)
I can't get any tweets T T
I don't know why
You can inject JavaScript to get the media URL.
This can fix issue #72
As mentioned in the title, I'm getting only 4 tweets using this code
from twitter_scraper_selenium import scrape_profile
import os
import json
account = input("Account: ")
tweets = int(input("How many tweets: "))
parent_dir = os.getcwd()  # was undefined in the original snippet; assume the current directory
path = os.path.join(parent_dir, account)
if not os.path.exists(path):
    os.mkdir(path)
data = scrape_profile(twitter_username=account, output_format="json", browser="firefox", tweets_count=tweets)
print(data)
parsed = json.loads(data)
json_data = json.dumps(parsed, indent=4)
with open(os.path.join(path, account + ".json"), "w") as outfile:
    outfile.write(json_data)
And this is the print output: https://pastebin.com/p1UuxZFa
Thank you.
import os
from twitter_scraper_selenium import scrape_topic_with_api
scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
if not os.path.isfile("./tweets.json"):
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
if len(open("./tweets.json").read()) < 100:
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
I'm repeatedly getting the warning "Failed to make request!" even though I can see Firefox opening and I can see the tweets.
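The duplicated calls above can be folded into one retry helper; `scrape_fn` below stands in for any zero-argument wrapper around the scrape_topic_with_api call (a sketch, not part of the package):

```python
import os

def scrape_with_retry(scrape_fn, output_path, attempts=3):
    """Call scrape_fn until output_path exists and holds at least 100 bytes,
    mirroring the manual checks above. Returns True on success."""
    for _ in range(attempts):
        scrape_fn()
        if os.path.isfile(output_path) and os.path.getsize(output_path) >= 100:
            return True
    return False
```

Usage would be `scrape_with_retry(lambda: scrape_topic_with_api(...), "./tweets.json")`, which keeps the scraper arguments in one place.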
Hello,
I would like to use the lib to do some dev with it, but I'm facing issues even launching a project with it.
My project is built with Poetry, so I'm handling the libs through this tool. My config file looks like this:
[tool.poetry]
name = "poetrytestproject"
version = "0.1.0"
description = "Test project"
authors = ["Kamigaku <[email protected]>"]
[tool.poetry.dependencies]
python = "^3.10"
matplotlib = "^3.5.3"
Unidecode = "^1.3.4"
numpy = "^1.23.4"
scipy = "^1.9.3"
tweepy = "^4.12.1"
Pillow = "^9.2.0"
fonttools = "^4.38.0"
twitter-scraper-selenium = "^4.1.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
When I launch the project with a simple import of your solution, I get this error:
Traceback (most recent call last):
File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
from twitter_scraper_selenium import scrape_profile
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
from .keyword import scrape_keyword
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
from .driver_initialization import Initializer
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
from seleniumwire import webdriver
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
from seleniumwire import backend, utils
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
from seleniumwire.server import MitmProxy
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
from seleniumwire.handler import InterceptRequestHandler
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
from seleniumwire import har
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
from seleniumwire.thirdparty.mitmproxy import connections
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 10, in <module>
from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 43, in <module>
"SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?
Process finished with exit code 1
I've read in multiple spots on the internet (and even on your Facebook scraper project) that the issue might come from the "pyOpenSSL" package, so I downgraded it to version "21.0.0" using Poetry, and the error changes to:
Traceback (most recent call last):
File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
from twitter_scraper_selenium import scrape_profile
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
from .keyword import scrape_keyword
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
from .driver_initialization import Initializer
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
from seleniumwire import webdriver
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
from seleniumwire import backend, utils
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
from seleniumwire.server import MitmProxy
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
from seleniumwire.handler import InterceptRequestHandler
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
from seleniumwire import har
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
from seleniumwire.thirdparty.mitmproxy import connections
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 9, in <module>
from seleniumwire.thirdparty.mitmproxy import certs, exceptions, stateobject
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\certs.py", line 10, in <module>
import OpenSSL
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\__init__.py", line 8, in <module>
from OpenSSL import crypto, SSL
File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\crypto.py", line 3279, in <module>
_lib.OpenSSL_add_all_algorithms()
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
Process finished with exit code 1
Any idea what might cause it?
I've got a friend who is also having the same issue.
I'm using Python version 3.10.0
Thanks!
The authenticated proxy doesn't load correctly; if I check "whatismyipaddress" in the driver, I get my real IP.
Hello,
Thank you very much for your work.
Is it possible to change the function scrape_keyword in order to search multiple keywords within the same session, instead of a loop of authenticate, search, close?
Thank you very much
from twitter_scraper_selenium import scrap_keyword
# scrape 10 posts by searching keyword "movie" from 29th August till 31st August
india = scrap_keyword(keyword="movie", browser="chrome",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-29")
print(india)
is the code that I am trying to run.
At first, the readout begins with:
Current google-chrome version is 103.0.5060
Get LATEST driver version for 103.0.5060
[WDM] - Driver found in cache
However, I believe the following is where the error is occurring:
create_client_context
param = SSL._lib.SSL_CTX_get0_param(context._context)
AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param'
Message: unknown error: net::ERR_CONNECTION_CLOSED
(Session info: headless chrome=103.0.5060.114)
Hello, I keep getting this error. I am new to this package, so I do not know what I am missing.
"Tweets did not appear!, Try setting headless=False to see what is happening"
When I execute the code example in the README file, it gives the attribute error mentioned in the title.
code:
from twitter_scraper_selenium import scrap_keyword
# scrape 10 posts by searching keyword "india" from 30th August till 31st August
india = scrap_keyword(keyword="india", browser="firefox",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-30")
print(india)
I just tried this program to get a Twitter topic.
Here is the final result:
import json
import textwrap
from twitter_scraper_selenium.keyword import Keyword

URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'
keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()
with open('steamdeck.json', 'w') as f:
    json.dump(json.loads(data), f, indent=2)
# print result
width = 120
for item in sorted(json.loads(data).values(), key=lambda x: x['posted_time']):
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-' * width)
Some notes on this:
pip install 'pyOpenSSL==22.0.0'
should fix it, per the linked issue.
Example 1:
try:
    # assume an error on this line because importing webdriver failed
    from inspect import currentframe
except Exception as ex:
    print(ex)

# the error happens again because currentframe was never imported
frameinfo = currentframe()
OS Version:
Edition Windows 10 Pro
Version 21H2
Installed on 2022.04.10
OS build 19044.1706
Experience Windows Feature Experience Pack 120.2212.4170.0
Python Version:
PS C:\Users\xxx> python -V
Python 3.10.5
Twitter scraper selenium Version:
0.1.6
requirements.txt is already installed.
from twitter_scraper_selenium import scrap_profile
microsoft = scrap_profile(twitter_username="Microsoft", output_format="json", browser="firefox", tweets_count=10)
print(microsoft)
C:\Users\xxx\PycharmProjects\venv\Scripts\python.exe
C:/Users/xxx/PycharmProjects/Twitter_Selenium.py
[WDM] - Driver [C:\Users\xxx\.wdm\drivers\geckodriver\win64\v0.31.0\geckodriver.exe] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
{}
Process finished with exit code 0
geckodriver.log:
1655093631147 geckodriver INFO Listening on 127.0.0.1:23023
1655093631171 mozrunner::runner INFO Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "--marionette" "--headless" "--no-sandbox" "--disable-dev-shm-usage" "--ignore-certificate-errors" "--disable-gpu" "--log-level=3" "--disable-notifications" "--disable-popup-blocking" "-no-remote" "-profile" "C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM"
*** You are running in headless mode.
1655093631395 Marionette INFO Marionette enabled
[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM\\search.json.mozlz4", (void 0)))
1655093632149 Marionette INFO Listening on port 23060
Read port: 23060
1655093632179 RemoteAgent WARN TLS certificate errors will be ignored for this session
1655093632180 RemoteAgent INFO Proxy settings initialised: {"proxyType":"manual","httpProxy":"127.0.0.1:23022","noProxy":[],"sslProxy":"127.0.0.1:23022"}
1655093781898 Marionette INFO Stopped listening on port 23060
###!!! [Parent][PImageBridgeParent] Error: RunMessage(msgname=PImageBridge::Msg_WillClose) Channel closing: too late to send/recv, messages will be lost
Hi there team,
Thanks for this amazing lib; I've used it a few times already and had no problem, but today, trying to make it work again, I got stuck on a geckodriver error.
...
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
...
I created a Docker image to have a fresh install and make sure it's not my setup; I use it on macOS and get the same error.
Here are the files and how to reproduce
Dockerfile
# Use the official Python 3.9.16 image as the base image
FROM python:3.9.16
# Set the working directory to /app
WORKDIR /app
# Install the necessary dependencies
RUN pip install twitter-scraper-selenium
# Set up the shared volume
VOLUME ["/app"]
# Set the default command to run your script
CMD [ "python", "scrapper.py" ]
scrapper.py
from twitter_scraper_selenium import scrape_profile
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)
To run, after having docker installed and setup do:
docker build -t twitter-scraper .
docker run -v $(pwd):/app twitter-scraper
The below error happens on docker run, and I couldn't find anything useful on the internet to help me fix it. Can you help me understand what is happening? It seems to be with geckodriver, not exactly with twitter-scraper-selenium, but I'm not sure where else to look.
Full logs below:
[WDM] - There is no [linux64] geckodriver for browser in cache
[WDM] - Getting latest mozilla release info for v0.33.0
[WDM] - Trying to download new driver from https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
[WDM] - Driver has been saved in cache [/root/.wdm/drivers/geckodriver/linux64/v0.33.0]
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
self.__start_driver()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 39, in __start_driver
self.__driver = Initializer(
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.9/site-packages/seleniumwire/webdriver.py", line 179, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
super().__init__(command_executor=executor, options=options, keep_alive=True)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/scrapper.py", line 21, in <module>
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
data = profile_bot.scrap()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
self.__close_driver()
File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
self.__driver.close()
AttributeError: 'str' object has no attribute 'close'
Appreciate any help
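One likely cause: the python:3.9.16 base image ships without any browser, and geckodriver only drives an existing Firefox; it does not bundle one, which matches the "unable to find binary in default location" message. A hedged sketch of a Dockerfile that also installs Firefox ESR (package name assumed for the Debian base of that image):

```dockerfile
FROM python:3.9.16
# geckodriver needs a Firefox binary on the PATH; the plain Python image has none.
RUN apt-get update \
    && apt-get install -y --no-install-recommends firefox-esr \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN pip install twitter-scraper-selenium
VOLUME ["/app"]
CMD ["python", "scrapper.py"]
```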
ValueError: not enough values to unpack (expected 2, got 0)
Traceback (most recent call last):
File "/home/arch/Code/Commisions/CryptoGuys/./src/util/scrape.py", line 3, in
scrap_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=10, directory='./src/util')
File "/usr/lib/python3.10/site-packages/twitter_scraper_selenium/topic.py", line 60, in scrap_topic
output_path = directory / "{}.json".format(filename)
TypeError: unsupported operand type(s) for /: 'str' and 'str'
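The traceback points at `directory / "{}.json".format(filename)`: the `/` operator there is pathlib's path-join, which fails when `directory` is a plain string. Coercing to `Path` first would accept both call styles (a sketch of the fix, not the package's code):

```python
from pathlib import Path

def build_output_path(directory, filename):
    """Join directory and '<filename>.json' whether directory is a str or a Path."""
    return Path(directory) / "{}.json".format(filename)
```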
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 128, in scrap
self.start_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 52, in start_driver
self.browser, self.headless, self.proxy, self.browser_profile).init()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 178, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 177, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
scrape_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=25)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic.py", line 53, in scrape_topic
data = keyword_bot.scrap()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 140, in scrap
self.close_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 55, in close_driver
self.driver.close()
AttributeError: 'str' object has no attribute 'close'
from twitter_scraper_selenium.topic import scrap_topic
scrap_topic(
filename='linux',
url='https://twitter.com/i/topics/848959431836487680',
headless=False,
browser_profile='/home/r3r/Documents/selenium_profile'
)
output
> python steamdeck.py
INFO:root:Loading Profile from /home/r3r/Documents/selenium_profile
[WDM] - Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:WDM:Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:seleniumwire.storage:Using default request storage
INFO:seleniumwire.backend:Created proxy listening on 127.0.0.1:44371
This freezes after that text output.
Looking at the selenium source code, setting a profile is deprecated: https://github.com/SeleniumHQ/selenium/blob/a4995e2c096239b42c373f26498a6c9bb4f2b3e7/py/selenium/webdriver/firefox/options.py#L101-L105
I have better success using the profile argument, just like Chrome: https://github.com/rachmadaniHaryono/twitter-scraper-selenium/tree/bugfix/profile
But I have only tested it on Linux.
Add support for logging into Twitter using email/username/password.
Once logged in, can more data be accessed?
Is the session going to be more stable?
Can I scrape people search results using this tool?
Hi there. The address returned by calling scrape_profile(twitter_username="blvckledge", output_format="json", browser="chrome", tweets_count=10) for videos is blob:https://twitter.com/facd7bdb-73ec-49ff-9492-993a165a3585, but the actual address is https://video.twimg.com/ext_tw_video/1673328923311058944/pu/vid/848x464/KNnl7Rqk_MfiX4X9.mp4?tag=12. Why is an incorrect address being returned? Additionally, I receive errors when calling scrape_topic_with_api() and scrape_profile_with_api(). What could be causing this?
Does using hashtags work?
I want data on people who use a specific hashtag, and to find out who uses it the most.
It's working, I just needed to wait. My fault, sorry.
I am still learning to use the package. It is VERY nice in terms of functionality.
The delays before seeing anything can be 2-3 minutes or longer, depending on how many tweets are being fetched.
Is there an easy way to monitor progress without slowing down the scraping process?
I see that logging is built in. Where are the log files stored?
Can the log be streamed to another terminal in my IDE? I am using PyCharm Pro.
Running on my VPS: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64)
[WDM] - Driver [/root/.wdm/drivers/geckodriver/linux64/v0.32.2/geckodriver] found in cache
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
self.__start_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 40, in __start_driver
self.browser, self.headless, self.proxy, self.browser_profile).init()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
driver = self.set_driver_for_browser(self.browser_name)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 179, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
super().__init__(command_executor=executor, options=options, keep_alive=True)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/crypto-ordinals/twitter_feeds.py", line 67, in <module>
tweets = scrape_profile(twitter_username="ordswapbot",output_format="json",browser="firefox",tweets_count=10,headless=False)
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
data = profile_bot.scrap()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
self.__close_driver()
File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
self.__driver.close()
AttributeError: 'str' object has no attribute 'close'
Hey, I was wondering if there is a way to scrape all the followers of a profile using the package; I could not figure it out.
Any help will be appreciated!
I'm using this awesome lib to get the newest tweets containing certain keywords. If you don't supply values for "since" and "until", you will get the same tweets every time - the first tweets of the current day. I managed to get around that by deleting the since and until part from the search URL, but that is certainly not intended 😄
Could you maybe add a live feature so your code doesn't get corrupted by me? 😆
When scraping a profile, retweets get an incorrect 'tweet_url'.
In profile.py:
tweet_url = "https://twitter.com/{}/status/{}".format(username, status)
status is the same as username when it is a retweet.
Hi, thanks for putting this together. I've been looking around for active scraping projects since twint was archived and was happy to find this!
I want to request getting the conversation_id. I didn't see that in this project's docs. More info here: https://developer.twitter.com/en/docs/twitter-api/conversation-id
Hello, I am trying to scrape every tweet from a user. From the Twitter page, I can see that they have tweeted more than 5000 times. However, even when I set my tweets_count to 5000, I am getting fewer than 1000 tweets from that user.
My code is below:
scrape_profile(twitter_username="elonmusk", output_format="csv", tweets_count=6000, browser="chrome", filename="elonmusk")
(Note that @ElonMusk is just a stand-in example)
More examples and documentation are needed.
I am new to using this package. I see a lot of functionality, but not enough examples of how to use it.
It would be nice if there were a ReadTheDocs.io website for it.
I am interested in pulling images and metadata associated with tweets. The element_finder.py file appears to pull images, but it is unclear how to call it separately, or whether that is the best way to use it in this package.
@staticmethod
def find_images(tweet) -> Union[list, None]:
    """finds all images of the tweet"""
    try:
        image_element = tweet.find_elements(By.CSS_SELECTOR,
                                            'div[data-testid="tweetPhoto"]')
        images = []
        for image_div in image_element:
            href = image_div.find_element(By.TAG_NAME,
                                          "img").get_attribute("src")
            images.append(href)
        return images
    except Exception as ex:
        logger.exception("Error at method find_images : {}".format(ex))
        return []
Many of the functions do not have docstrings. I am going through them now to try to add descriptions for my own benefit.
I also want to add things to the outputs to automate data capture. For example, I want to time-stamp output files, because during testing I will have to do multiple runs to make sure that I got all the information I was looking for.
For JSON files, I would like to pretty-print them before saving them, because they come out in a one-line format in the file.
I am not sure where to insert these functions. I will keep browsing the package structure to see if I can figure these things out, and add helpful docstrings where they are missing.
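The time-stamping and pretty-printing described above can live in a small post-processing helper, independent of the package internals (the names are mine, a sketch):

```python
import json
from datetime import datetime

def save_pretty_json(data, stem):
    """Write data to '<stem>_<timestamp>.json' with 4-space indentation and
    return the path, so repeated test runs never overwrite each other."""
    path = "{}_{}.json".format(stem, datetime.now().strftime("%Y%m%d_%H%M%S"))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=4, ensure_ascii=False)
    return path
```

You could feed it the dict parsed from scrape_profile's JSON string output, rather than modifying the package's own writers.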
[WDM] - Driver [C:\Users\HP Probook.wdm\drivers\geckodriver\win64\v0.33.0\geckodriver.exe] found in cache
2023-07-13 13:12:40,345 - twitter_scraper_selenium.driver_utils - ERROR - Tweets did not appear!, Try setting headless=False to see what is happening
Traceback (most recent call last):
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\twitter_scraper_selenium\driver_utils.py", line 35, in wait_until_tweets_appear
WebDriverWait(driver, 80).until(EC.presence_of_element_located(
File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:183:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:395:5
element.find/</<@chrome://remote/content/marionette/element.sys.mjs:134:16
I'm trying to make an API, but this error keeps troubling me. This snippet works on my local machine but breaks when I ship it to Render using Docker.
import json

import flask
from twitter_scraper_selenium import get_profile_details

def init(app: flask.app.Flask):
    @app.route("/user/<string:username>")
    def user(username):
        filename = "get_profile_details"
        get_profile_details(
            twitter_username=username,
            filename=filename,
        )
        with open(filename + ".json") as fh:  # close the handle instead of leaking it
            data = json.load(fh)
        return data
Please add a reply/comment scraper; there is no comment scraper for Twitter that I could find.