
twitter-scraper-selenium's Issues

These functions will be removed in the next release

  • scrape_profile() - Its alternative is scrape_profile_with_api()
  • scrape_keyword() - Its alternative is scrape_keyword_with_api()
  • scrape_topic() - Its alternative is scrape_topic_with_api()

Why the very long wait in wait_until_completion()?

I found that we spend 90% of the time in wait_until_completion(), because the delay time.sleep(randint(3, 5)) sleeps 3 to 5 seconds, which seems very high. Why is that?

time.sleep(random.uniform(0.1, 0.2)) seems more than enough for my simple tests, but maybe I'm missing something?
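For scale, the arithmetic works out roughly like this (the number of scroll iterations is a made-up example):

```python
scrolls = 50  # hypothetical number of scroll iterations in one scrape

# expected value of time.sleep(randint(3, 5)) is (3 + 5) / 2 = 4.0 s
current = (3 + 5) / 2
# expected value of time.sleep(random.uniform(0.1, 0.2)) is ~0.15 s
proposed = (0.1 + 0.2) / 2

print(f"current policy : ~{scrolls * current:.0f} s spent sleeping")
print(f"proposed policy: ~{scrolls * proposed:.1f} s spent sleeping")
```

So for a 50-scroll scrape, the current delay accounts for minutes of pure sleeping.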

"Tweets did not appear!"

Code
scrap_keyword(keyword="nft", browser="chrome", tweets_count=999999, until="2022-06-30", since="2022-06-29", output_format="csv", filename="nft")

Error Message
[WDM] - Current google-chrome version is 103.0.5060
[WDM] - Get LATEST driver version for 103.0.5060

[WDM] - Driver [./.wdm/drivers/chromedriver/mac64/103.0.5060.53/chromedriver] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!

Other Information
The packages listed in requirement.txt are already installed.

Failed to make request!

Username = input("Account: ")
tweets = int(input("How many tweets: "))
path = "C:/Users/HP Probook/PycharmProjects/scrap-scripts/"

data = scrape_keyword_with_api(f"(from:{Username})")
print(data)
And I got this error:

2023-07-26 15:03:21,739 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
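This warning from keyword_api looks transient, so a retry wrapper may help while debugging; a minimal sketch, assuming the call returns a falsy value on failure (not verified against the library):

```python
import time

def with_retries(fn, attempts=3, base_delay=2.0):
    """Retry a flaky callable with exponential backoff; return the first truthy result."""
    for i in range(attempts):
        result = fn()
        if result:  # assumption: the scraper returns None/empty on failure
            return result
        time.sleep(base_delay * (2 ** i))
    return None

# hypothetical usage with the call from above:
# data = with_retries(lambda: scrape_keyword_with_api(f"(from:{Username})"))
```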

README.MD error during installation with pip

Hello,

I got that installation error with

pip install twitter-scraper-selenium

Error:
Collecting twitter-scraper-selenium
Using cached twitter_scraper_selenium-0.1.3.tar.gz (14 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/tmp/pip-install-sw0o3qs2/twitter-scraper-selenium_e1bf4012aa9340ebaba22e04e0ba72df/setup.py", line 3, in <module>
    with open("README.MD", "r") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

It would be good if you checked how README.MD is referenced in setup.py. Also, the standard name is README.md (lowercase extension).
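One defensive fix on the packaging side could be to guard the README read in setup.py; a sketch, not the project's actual setup.py:

```python
from pathlib import Path

# If the README is missing from the sdist (or named differently),
# fall back to an empty description instead of crashing metadata generation.
readme = Path("README.md")
long_description = readme.read_text(encoding="utf-8") if readme.exists() else ""

# ...then pass long_description to setuptools.setup(long_description=...)
```

Making sure the README is included in MANIFEST.in would also keep it inside the sdist in the first place.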

Are Musk's new limitations impacting this module?

These lines of code worked fine just a few weeks ago, and now I'm getting TypeError: object of type 'NoneType' has no len()

This is the code I'm using:
from twitter_scraper_selenium import get_profile_details

twitter_username = "tim_cook"
filename = "twitter_dummy_ceo"
get_profile_details(twitter_username=twitter_username, filename=filename)

Scraping only 4 tweets no matter the tweet count

As mentioned in the title, I'm getting only 4 tweets using this code

from twitter_scraper_selenium import scrape_profile
import os
import json

account = input("Account: ")
tweets = int(input("How many tweets: "))
parent_dir = os.getcwd()  # parent_dir was undefined in the original snippet
path = os.path.join(parent_dir, account)
os.makedirs(path, exist_ok=True)
data = scrape_profile(twitter_username=account, output_format="json", browser="firefox", tweets_count=tweets)
print(data)
parsed = json.loads(data)
json_data = json.dumps(parsed, indent=4)
with open(os.path.join(path, account + ".json"), "w") as outfile:
    outfile.write(json_data)

And this is the print output: https://pastebin.com/p1UuxZFa

Thank you.

Failed to make request!

import os
from twitter_scraper_selenium import scrape_topic_with_api

# retry up to three times until tweets.json exists and has real content
for _ in range(3):
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
    if os.path.isfile("./tweets.json") and len(open("./tweets.json").read()) >= 100:
        break

I repeatedly get the warning "Failed to make request!" even though I can see Firefox opening and I can see the tweets.

Unable to launch the lib

Hello,

I would like to use the lib for some development, but I'm facing issues even launching a project with it.
My project is built with Poetry, so I'm handling dependencies through that tool. My config file looks like this:

[tool.poetry]
name = "poetrytestproject"
version = "0.1.0"
description = "Test project"
authors = ["Kamigaku <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.10"
matplotlib = "^3.5.3"
Unidecode = "^1.3.4"
numpy = "^1.23.4"
scipy = "^1.9.3"
tweepy = "^4.12.1"
Pillow = "^9.2.0"
fonttools = "^4.38.0"
twitter-scraper-selenium = "^4.1.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

When I launch the project with a simple import of your library, I get this error:

Traceback (most recent call last):
  File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
    from twitter_scraper_selenium import scrape_profile
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
    from .keyword import scrape_keyword
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
    from .driver_initialization import Initializer
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
    from seleniumwire import webdriver
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
    from seleniumwire import backend, utils
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
    from seleniumwire import har
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 10, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 43, in <module>
    "SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?

Process finished with exit code 1

I've read in multiple spots on the internet (and even on your Facebook scraper project) that the issue might come from the pyOpenSSL package. I downgraded it to 21.0.0 using Poetry, and the error changed to:

Traceback (most recent call last):
  File "G:\Code\Python\PoetryTestProject\main_selenium.py", line 6, in <module>
    from twitter_scraper_selenium import scrape_profile
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\__init__.py", line 5, in <module>
    from .keyword import scrape_keyword
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\keyword.py", line 4, in <module>
    from .driver_initialization import Initializer
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\twitter_scraper_selenium\driver_initialization.py", line 12, in <module>
    from seleniumwire import webdriver
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\webdriver.py", line 27, in <module>
    from seleniumwire import backend, utils
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\server.py", line 5, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\handler.py", line 5, in <module>
    from seleniumwire import har
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 9, in <module>
    from seleniumwire.thirdparty.mitmproxy import certs, exceptions, stateobject
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\seleniumwire\thirdparty\mitmproxy\certs.py", line 10, in <module>
    import OpenSSL
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\__init__.py", line 8, in <module>
    from OpenSSL import crypto, SSL
  File "C:\Users\Aurelien\AppData\Local\pypoetry\Cache\virtualenvs\poetrytestproject-3wjWjU16-py3.10\lib\site-packages\OpenSSL\crypto.py", line 3279, in <module>
    _lib.OpenSSL_add_all_algorithms()
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'

Process finished with exit code 1

Any idea what might cause this?
A friend of mine is having the same issue.
I'm using Python 3.10.0.

Thanks!

Proxy

An authenticated proxy doesn't load correctly; if I check "whatismyipaddress" in the driver, I get my real IP.
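The tracebacks elsewhere in this tracker show the library drives selenium-wire, which documents this dict shape for authenticated upstream proxies. Whether the library's own proxy argument forwards credentials this way is an open question; the host and credentials below are placeholders:

```python
# selenium-wire's documented shape for an authenticated upstream proxy;
# "user", "pass" and proxy.example.com are placeholders.
seleniumwire_options = {
    "proxy": {
        "http": "http://user:pass@proxy.example.com:8080",
        "https": "https://user:pass@proxy.example.com:8080",
        "no_proxy": "localhost,127.0.0.1",
    }
}
```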

pip installed and running the example throws an error

from twitter_scraper_selenium import scrap_keyword
#scrap 10 posts by searching keyword "india" from date 30th August till date 31st August
india = scrap_keyword(keyword="movie", browser="chrome",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-29")
print(india)

This is the code that I am trying to run.
At first, the readout begins with:

Current google-chrome version is 103.0.5060
Get LATEST driver version for 103.0.5060
[WDM] - Driver found in cache

However, I believe the following is where the error is occurring:

create_client_context
param = SSL._lib.SSL_CTX_get0_param(context._context)
AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param'

Message: unknown error: net::ERR_CONNECTION_CLOSED
(Session info: headless chrome=103.0.5060.114)

Tweets did not appear

Hello, I keep getting this error. I am new to this package, so I do not know what I am missing.

"Tweets did not appear!, Try setting headless=False to see what is happening"

AttributeError: 'Keyword' object has no attribute '_Keyword__driver' while trying to search

When I execute the code example in readme file it gives the attribute error mentioned in title.

code:
from twitter_scraper_selenium import scrap_keyword
#scrap 10 posts by searching keyword "india" from date 30th August till date 31st August
india = scrap_keyword(keyword="india", browser="firefox",
                      tweets_count=10, output_format="json", until="2021-08-31", since="2021-08-30")
print(india)

example for twitter topic

I just tried this program to get tweets from a Twitter topic.

Here is the final result:

import json

from twitter_scraper_selenium.keyword import Keyword
URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'
keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()
with open('steamdeck.json', 'w') as f:
    json.dump(json.loads(data), f, indent=2)

#  print result
import textwrap
width = 120
for item in sorted(list(json.loads(data).values()), key=lambda x: x['posted_time']):
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-'*width)

Some notes on this:

  • I got an error when initializing the webdriver, similar to scrapy/scrapy#5635.
    • pip install 'pyOpenSSL==22.0.0' should fix it, per the linked issue.
    • This is a little confusing, because all import errors are caught with a general exception (see also example 1 below).
      • If possible, just let the error happen and end the program.
  • Saving the JSON will replace the old data, so be careful.
    • It is possible to update the JSON data by loading it first if the file exists.
    • The same thing happens with CSV.
  • Selenium can use a custom profile folder; currently I have to edit either set_properties or set_driver_for_browser on driver_initialization.Initializer.
  • Is there any reason why Keyword.scrap has to return a JSON string? Why not just return a dict? When saving the data as CSV, it has to be decoded back to a dict anyway.

example 1

try:
    # assume an error on this line because importing webdriver failed
    from inspect import currentframe
except Exception as ex:
    print(ex)

# the error happens again because currentframe was never imported
frameinfo = currentframe()
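The JSON-update idea from the notes above could be sketched like this (merge_json is a hypothetical helper, not part of the library):

```python
import json
from pathlib import Path

def merge_json(path, new_data):
    """Merge new tweets into an existing JSON file instead of overwriting it."""
    p = Path(path)
    existing = json.loads(p.read_text()) if p.exists() else {}
    existing.update(new_data)  # tweet IDs as keys, so duplicates collapse
    p.write_text(json.dumps(existing, indent=2))
    return existing
```

Usage would be merge_json('steamdeck.json', json.loads(data)) instead of the plain json.dump above.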

Returns only "Tweets did not appear!"

Environment Information

OS Version:

Edition	Windows 10 Pro
Version	21H2
Installed on	2022.04.10
OS build	19044.1706
Experience	Windows Feature Experience Pack 120.2212.4170.0

Python Version:

PS C:\Users\xxx> python -V
Python 3.10.5

Twitter scraper selenium Version:

0.1.6

The packages listed in requirement.txt are already installed.

Code

from twitter_scraper_selenium import scrap_profile

microsoft = scrap_profile(twitter_username="Microsoft", output_format="json", browser="firefox", tweets_count=10)
print(microsoft)

Error Information

C:\Users\xxx\PycharmProjects\venv\Scripts\python.exe
C:/Users/xxx/PycharmProjects/Twitter_Selenium.py
[WDM] - Driver [C:\Users\xxx\.wdm\drivers\geckodriver\win64\v0.31.0\geckodriver.exe] found in cache
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
Tweets did not appear!
{}

Process finished with exit code 0

log

geckodriver.log:

1655093631147	geckodriver	INFO	Listening on 127.0.0.1:23023
1655093631171	mozrunner::runner	INFO	Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "--marionette" "--headless" "--no-sandbox" "--disable-dev-shm-usage" "--ignore-certificate-errors" "--disable-gpu" "--log-level=3" "--disable-notifications" "--disable-popup-blocking" "-no-remote" "-profile" "C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM"
*** You are running in headless mode.
1655093631395	Marionette	INFO	Marionette enabled
[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\xxx\\AppData\\Local\\Temp\\rust_mozprofileMChkmM\\search.json.mozlz4", (void 0)))
1655093632149	Marionette	INFO	Listening on port 23060
Read port: 23060
1655093632179	RemoteAgent	WARN	TLS certificate errors will be ignored for this session
1655093632180	RemoteAgent	INFO	Proxy settings initialised: {"proxyType":"manual","httpProxy":"127.0.0.1:23022","noProxy":[],"sslProxy":"127.0.0.1:23022"}
1655093781898	Marionette	INFO	Stopped listening on port 23060

###!!! [Parent][PImageBridgeParent] Error: RunMessage(msgname=PImageBridge::Msg_WillClose) Channel closing: too late to send/recv, messages will be lost

Install help: selenium.common.exceptions.SessionNotCreatedException

Hi there, team,
Thanks for this amazing lib. I've used it a few times already with no problems, but today, trying to make it work again, I got stuck on a geckodriver error.

...
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
...

I created a Docker image to have a fresh install and make sure it's not my setup. I use it on macOS and get the same error.

Here are the files and how to reproduce

Dockerfile

# Use the official Python 3.9.16 image as the base image
FROM python:3.9.16

# Set the working directory to /app
WORKDIR /app

# Install the necessary dependencies
RUN pip install twitter-scraper-selenium

# Set up the shared volume
VOLUME ["/app"]

# Set the default command to run your script
CMD [ "python", "scrapper.py" ]

scrapper.py

from twitter_scraper_selenium import scrape_profile

microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)

To run, after having docker installed and setup do:

docker build -t twitter-scraper .
docker run -v $(pwd):/app twitter-scraper

The below error happens on docker run, and I couldn't find anything useful on the internet to help me fix it. Can you help me understand what is happening? It seems to be an issue with geckodriver, not exactly with twitter-scraper-selenium, but I'm not sure where else to look.

Full logs below:

[WDM] - There is no [linux64] geckodriver for browser  in cache
[WDM] - Getting latest mozilla release info for v0.33.0
[WDM] - Trying to download new driver from https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
[WDM] - Driver has been saved in cache [/root/.wdm/drivers/geckodriver/linux64/v0.33.0]
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
    self.__start_driver()
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 39, in __start_driver
    self.__driver = Initializer(
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
    return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
  File "/usr/local/lib/python3.9/site-packages/seleniumwire/webdriver.py", line 179, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
    super().__init__(command_executor=executor, options=options, keep_alive=True)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/scrapper.py", line 21, in <module>
    microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
    data = profile_bot.scrap()
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
    self.__close_driver()
  File "/usr/local/lib/python3.9/site-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
    self.__driver.close()
AttributeError: 'str' object has no attribute 'close'
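For what it's worth, the python:3.9.16 base image contains no browser at all, which matches the "unable to find binary in default location" message: geckodriver was downloaded, but there is no Firefox for it to launch. A sketch of a Dockerfile addition that installs one (package name assumes a Debian-based image):

```shell
# In the Dockerfile, before running the scraper, install a Firefox binary
# so geckodriver can find it in the default location:
RUN apt-get update && apt-get install -y --no-install-recommends firefox-esr
```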

Appreciate any help

directory option not working

ValueError: not enough values to unpack (expected 2, got 0)

Traceback (most recent call last):
  File "/home/arch/Code/Commisions/CryptoGuys/./src/util/scrape.py", line 3, in <module>
    scrap_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697', browser="firefox", tweets_count=10, directory='./src/util')
  File "/usr/lib/python3.10/site-packages/twitter_scraper_selenium/topic.py", line 60, in scrap_topic
    output_path = directory / "{}.json".format(filename)
TypeError: unsupported operand type(s) for /: 'str' and 'str'
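The TypeError says topic.py applies the / path operator to two plain strings; a sketch of the underlying fix, assuming directory is meant to be a filesystem path:

```python
from pathlib import Path

# str / str raises TypeError; converting the directory to a pathlib.Path
# first makes the / operator work as intended.
directory = "./src/util"
filename = "tweets"
output_path = Path(directory) / "{}.json".format(filename)
print(output_path.as_posix())
```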

Incomprehensible Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 128, in scrap
    self.start_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 52, in start_driver
    self.browser, self.headless, self.proxy, self.browser_profile).init()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
    return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
  File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 178, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 177, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
    scrape_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=25)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic.py", line 53, in scrape_topic
    data = keyword_bot.scrap()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 140, in scrap
    self.close_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 55, in close_driver
    self.driver.close()
AttributeError: 'str' object has no attribute 'close'
  • I do not understand this error.

Install fails

Please check; something seems wrong. Thanks for the support.

can't set firefox profile path

from twitter_scraper_selenium.topic import scrap_topic
scrap_topic(
    filename='linux',
    url='https://twitter.com/i/topics/848959431836487680',
    headless=False,
    browser_profile='/home/r3r/Documents/selenium_profile'
)

output

> python steamdeck.py
INFO:root:Loading Profile from /home/r3r/Documents/selenium_profile
[WDM] - Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:WDM:Driver [/home/r3r/.wdm/drivers/geckodriver/linux64/v0.31.0/geckodriver] found in cache
INFO:seleniumwire.storage:Using default request storage
INFO:seleniumwire.backend:Created proxy listening on 127.0.0.1:44371

It freezes after this output.

Looking at the Selenium source code, setting a profile this way is deprecated: https://github.com/SeleniumHQ/selenium/blob/a4995e2c096239b42c373f26498a6c9bb4f2b3e7/py/selenium/webdriver/firefox/options.py#L101-L105

I have better success using the profile argument, just like Chrome: https://github.com/rachmadaniHaryono/twitter-scraper-selenium/tree/bugfix/profile

But I have only tested it on Linux.

login support before executing apis

Add support for logging into Twitter using email/username/password.
Once logged in, can more data be accessed?
Will the session be more stable?

The address returned by calling scrape_profile is wrong!

Hi there. The address returned for videos by calling scrape_profile(twitter_username="blvckledge", output_format="json", browser="chrome", tweets_count=10) is blob:https://twitter.com/facd7bdb-73ec-49ff-9492-993a165a3585, but the actual address is https://video.twimg.com/ext_tw_video/1673328923311058944/pu/vid/848x464/KNnl7Rqk_MfiX4X9.mp4?tag=12. Why is an incorrect address being returned? Additionally, I receive errors when calling scrape_topic_with_api() and scrape_profile_with_api(). What could be causing this?

scrap hashtag

Does searching by hashtag work?
I want data on people who use a specific hashtag, to find out who uses it the most.
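Once you have scraped tweets as JSON, counting the heaviest posters is straightforward; the data shape below is a guess at the library's output, not confirmed:

```python
from collections import Counter

# Hypothetical shape of scraped output: {tweet_id: {"username": ..., "content": ...}}
tweets = {
    "1": {"username": "alice", "content": "#nft to the moon"},
    "2": {"username": "bob", "content": "#nft is over"},
    "3": {"username": "alice", "content": "more #nft"},
}

counts = Counter(t["username"] for t in tweets.values())
print(counts.most_common(1))  # heaviest user of the hashtag
```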

Not enough values to unpack

Hi, guys. I pulled the code again from GitHub and reinstalled it, but why am I still getting the following error?

How to show progress for the scraping process? Right now there is no indication that it works until it is all done.

I am still learning to use the package. It is VERY nice in terms of functionality.

The delays before seeing anything can be 2-3 minutes or longer, depending on how many tweets are being fetched.

Is there an easy way to monitor progress without slowing down the scraping process?

I see that logging is built in. Where are the log files stored?
Can the log be streamed to another terminal in my IDE? I am using PyCharm Pro.
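Since the warnings in other issues carry logger names like twitter_scraper_selenium.driver_utils, the package appears to use the standard logging module; attaching a console handler might surface progress (the logger name is inferred from those log lines, not documented):

```python
import logging

# Inferred namespace: the package logs under "twitter_scraper_selenium".
logger = logging.getLogger("twitter_scraper_selenium")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()  # stream to the current terminal / IDE console
handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)
```

Pointing the handler at a file (logging.FileHandler) would let a second terminal tail the log instead.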

AttributeError: 'str' object has no attribute 'close'

Running on my VPS: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64)

[WDM] - Driver [/root/.wdm/drivers/geckodriver/linux64/v0.32.2/geckodriver] found in cache
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 118, in scrap
    self.__start_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 40, in __start_driver
    self.browser, self.headless, self.proxy, self.browser_profile).init()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
    return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
  File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 179, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
    super().__init__(command_executor=executor, options=options, keep_alive=True)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 288, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 381, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/crypto-ordinals/twitter_feeds.py", line 67, in <module>
    tweets = scrape_profile(twitter_username="ordswapbot",output_format="json",browser="firefox",tweets_count=10,headless=False)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 197, in scrape_profile
    data = profile_bot.scrap()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 128, in scrap
    self.__close_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/profile.py", line 43, in __close_driver
    self.__driver.close()
AttributeError: 'str' object has no attribute 'close'

Get newest tweets with keyword

I'm using this awesome lib to get the newest tweets containing certain keywords. If you don't supply values for "since" and "until", you get the same tweets every time: the first tweets of the current day. I managed to get around that by deleting the since and until parts from the search URL, but that is certainly not intended 😄

Could you maybe add a live feature so your code doesn't get corrupted by me? 😆

Incorrect Retweets 'tweet_url '

When scraping a profile, retweets get an incorrect 'tweet_url'.

In profile.py:
tweet_url = "https://twitter.com/{}/status/{}".format(username, status)
status is the same as username when the tweet is a retweet.
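A possible direction for a fix, sketched without reference to the actual element-finder code: build the URL from the permalink href on the tweet's timestamp anchor (whose "/<poster>/status/<id>" shape already names the right account) rather than formatting it from the profile's username:

```python
# Hypothetical helper: the <a> wrapping a tweet's <time> element carries
# the canonical permalink, e.g. "/someuser/status/1234567890".
def tweet_url_from_href(href: str) -> str:
    return "https://twitter.com" + href

print(tweet_url_from_href("/someuser/status/1234567890"))
```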

Not scraping every tweet from a user

Hello, I am trying to scrape every tweet from a user. From the Twitter page, I can see that they have tweeted more than 5000 times. However, even when I set tweets_count to 5000, I get fewer than 1000 tweets from that user.

My code is below:

scrape_profile(twitter_username = "elonmusk", output_format ="csv", tweets_count = 6000, browser = "chrome", filename = "elonmusk")

(Note that @ElonMusk is just a stand-in example)

More Examples and Documentation Needed

More Examples and Documentation are needed...

I am new to using this package. I see a lot of functionality, but not enough examples of how to use it.
It would be nice if there was a ReadTheDocs.io website for it.

I am interested in pulling images and metadata associated with tweets. The element_finder.py file appears to pull images, but it is unclear how to call it separately, or whether that is the best way to use this package.

@staticmethod
def find_images(tweet) -> Union[list, None]:
    """finds all images of the tweet"""
    try:
        image_element = tweet.find_elements(By.CSS_SELECTOR,
                                            'div[data-testid="tweetPhoto"]')
        images = []
        for image_div in image_element:
            href = image_div.find_element(By.TAG_NAME,
                                          "img").get_attribute("src")
            images.append(href)
        return images
    except Exception as ex:
        logger.exception("Error at method find_images : {}".format(ex))
        return []

Many of the functions do not have docstrings. I am going through them now to try to add descriptions for my own benefit.

I also want to extend the outputs to automate data capture. For example, I want to time-stamp output files, because during testing I will have to do multiple runs to make sure that I got all the information I was looking for.
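Time-stamping the `filename` argument can be done before calling any of the scrape functions. A minimal sketch (a hypothetical wrapper, not part of the package):

```python
from datetime import datetime


def timestamped(filename):
    """Append a UTC timestamp so repeated test runs never overwrite
    each other, e.g. 'elonmusk' -> 'elonmusk_20230713T131240'."""
    stamp = datetime.utcnow().strftime("%Y%m%dT%H%M%S")
    return "{}_{}".format(filename, stamp)
```

Usage would be e.g. `scrape_profile(..., filename=timestamped("elonmusk"))`.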

For JSON files, I would like to pretty-print them before saving them, because they come out in a one-liner format in the file.
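Pretty-printing the one-liner output does not require touching the package: the file can be re-serialized with `json.dumps` and an `indent`. A minimal sketch:

```python
import json


def pretty(json_text):
    """Re-serialize a one-line JSON string with 2-space indentation."""
    return json.dumps(json.loads(json_text), indent=2, ensure_ascii=False)
```

For example, reading the scraper's output file, passing its contents through `pretty`, and writing it back gives a human-readable file.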

I am not sure where to insert these functions. I will keep browsing the package structure to see if I can figure out these things, and add helpful docstrings where they are missing.

timeout exception

[WDM] - Driver [C:\Users\HP Probook.wdm\drivers\geckodriver\win64\v0.33.0\geckodriver.exe] found in cache
2023-07-13 13:12:40,345 - twitter_scraper_selenium.driver_utils - ERROR - Tweets did not appear!, Try setting headless=False to see what is happening
Traceback (most recent call last):
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\twitter_scraper_selenium\driver_utils.py", line 35, in wait_until_tweets_appear
    WebDriverWait(driver, 80).until(EC.presence_of_element_located(
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:183:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:395:5
element.find/</<@chrome://remote/content/marionette/element.sys.mjs:134:16

twitter_scraper_selenium.scraping_utilities:Error at find_x_guest_token: 'guest_token'

I'm trying to build an API, but this error keeps troubling me. The snippet works on my local machine but breaks when I ship it to Render using Docker.

import json

import flask
from twitter_scraper_selenium import get_profile_details


def init(app: flask.app.Flask):
    @app.route("/user/<string:username>")
    def user(username):
        filename = "get_profile_details"
        get_profile_details(
            twitter_username=username,
            filename=filename,
        )
        # read the file back with a context manager so the handle is closed
        with open(filename + ".json") as fh:
            data = json.load(fh)
        return data
