rss-bot's Introduction

A bot for Lemmy and Sublinks that watches RSS feeds and posts new items from them to communities.

About

This is a bot that watches RSS feeds and posts new items from them to communities.

Prerequisites

  • You need Node.js or Docker installed to run the bot

Setup with Node.js (Option 1)

  1. Clone the repository
  2. Create an account on the instance you want the bot to use as its home (a regular user account is enough)
  3. Create a file called .env in the bot folder and fill in the values below, putting your data inside the quotes (do not copy the slashes or the text after them)
LEMMY_INSTANCE="" // The instance the bot account is in
LEMMY_USERNAME="" // The bot username
LEMMY_PASSWORD="" // The bot password
  4. Edit config.yaml to set the communities and feeds you want
  5. Open a terminal in the bot folder, run npm install to install dependencies, and then run node main.js to start the bot (to restart the bot later, press Ctrl+C to interrupt the process and run node main.js again)

I recommend installing something like forever.js, which restarts the bot if it crashes at some point

If you run into issues, feel free to DM me on Matrix here

Setup with Docker (Option 2)

  1. Clone the repository
  2. Create an account on the instance you want the bot to use as its home (a regular user account is enough)
  3. Create a file called .env in the bot folder and fill in the values below, putting your data inside the quotes (do not copy the slashes or the text after them)
LEMMY_INSTANCE="" // The instance the bot account is in
LEMMY_USERNAME="" // The bot username
LEMMY_PASSWORD="" // The bot password
  4. Edit config.yaml to set the communities and feeds you want
  5. In the project directory, build the Docker image by running docker build -t <your name>/<desired_image_name> . and then launch a new container with docker run <your name>/<desired_image_name>

Other Bots

[To be added]

rss-bot's Issues

ERR_UNHANDLED_REJECTION

The error includes incorrect_login, however I'm able to log in just fine with the same credentials on the site.

stdout

STARTED: Started Bot
INSTANCES: 1 instances loaded.
FEEDS: 1 feeds loaded.
Starting bot
Creating database file
DB: Connected to the database.
Initializing DB
logging in
TABLE: Loaded posts table.
TABLE: Loaded time table
POSTS: 0 posts in database.
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "incorrect_login".] {
  code: 'ERR_UNHANDLED_REJECTION'
}

Node.js v20.5.1

config.yaml

---
# Whether to give the bot a bot tag (true) or not (false). Recommended to mark it but the option is here if you already 
# marked it manually and it starts throwing user_already_exists errors
markAsBot: true

# How often to check for new posts in minutes
postCheckInterval: 10

# How often to check for a new day in minutes (for unpinning posts)
dayCheckInterval: 10

# The timezone to use for the bot (as reference for unpinning posts at midnight)
# You can see the options here: https://www.inmotionhosting.com/support/website/tz-ref-table/
timezone: 'Europe/Amsterdam'

# Posts from how many days ago are you willing to backpost when the bot starts
dayCutOff: 3

# Set to true to add all posts to the db without posting them. Good to set for one run to clear out backposts if you 
# dont want any old posts posted when the bot is first ran. Set to false to post normally
stopPosts: false

# Set to true if you want to see log messages. False if not
# (Note I cant control log messages sent by the bot library so those will still show. Just ones thrown by the bot wont)
showLogs: true

# The maximum amount of posts it will do on every post check. Set to 0 for no limit. (Each post being posted to another instance is separate in here but itll finish up the same post before it stops)
maxPosts: 5

# The time in milliseconds it will sleep before doing another post in the same post check
postSleepDuration: 5000

# The instances and communities used by the bot
instances:
  sh.itjust.works: # The instance name
    nos_tech:
      - "nos_tech"


# The rss feeds used to pull posts from
feeds:
  nos_tech:
    url: 'https://feeds.nos.nl/nosnieuwstech'
    content: 'description'
    datefield: 'pubDate'
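
The stdout log in this report ends in an unhandled rejection whose reason is the string "incorrect_login". A minimal sketch of catching that rejection and printing an actionable message is shown below. The names `loginOrExit`, `login`, and `exit` are hypothetical stand-ins, not functions from the bot or its library.

```javascript
// Hypothetical wrapper: `login` stands in for whatever call performs the login.
// `exit` is injectable so the behaviour can be exercised without ending the process.
async function loginOrExit(login, exit = (code) => process.exit(code)) {
  try {
    await login();
    return true;
  } catch (err) {
    // In the report above the rejection reason is a plain string ("incorrect_login"),
    // so handle both strings and Error objects.
    const reason = err instanceof Error ? err.message : String(err);
    console.error(
      `LOGIN: failed (${reason}). Check LEMMY_INSTANCE, LEMMY_USERNAME and LEMMY_PASSWORD in .env.`
    );
    exit(1);
    return false;
  }
}
```

With a wrapper like this the process would stop with a readable message instead of the raw ERR_UNHANDLED_REJECTION output.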

post_id is always null

I was spelunking in my post database to debug an issue, and I found that the post_id is always null.

sqlite> select count(*) from posts;
119
sqlite> select count(*) from posts where post_id is null;
119

I'm on version v2.1.0 using node v20.11.0.

Looking in the code, it seems we're not capturing the response from createPost, which should have the post id on response.post_view.post.id.

I am planning a migration to v2.2.0 in the next couple of weeks, so I'm not certain a backport of the bugfix to the v2.1.x branch is worthwhile unless there are other users of the v2.1.x branch that are being affected by this issue.
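
A minimal sketch of the fix described in this report: keep the response from createPost and store the id found at response.post_view.post.id. The `client` and `savePost` arguments are hypothetical stand-ins for the bot's own API client and database write, whose real signatures are not shown in the report.

```javascript
// Hypothetical helper: `client.createPost` and `savePost` are stand-ins
// for the bot's actual API call and database insert.
async function createAndRecordPost(client, savePost, form) {
  // Capture the response instead of discarding it.
  const response = await client.createPost(form);
  // Per the issue text, the id is expected at response.post_view.post.id.
  // Fall back to null if the response has an unexpected shape.
  const postId = response?.post_view?.post?.id ?? null;
  await savePost({ link: form.url, postId });
  return postId;
}
```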

Allow customizing rss-parser options

I'm having trouble with some of my rss feeds timing out. I think 1 minute is probably a reasonable default, but for debugging purposes, I'd like to increase it to 2 minutes. Similarly, I'm wondering if passing a custom user-agent string will help.

I'd like to add a new key to config.yaml that is the options passed to rss-parser. For example

parserOptions:
  timeout: 120000
  headers:
    User-Agent: "Mozilla/5.0 (compatible; lemmy-mega-bot/1.0; +https://github.com/Ategon/Lemmy-Mega-Bot)"
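
One way the proposed key could be consumed is sketched below. It merges a `parserOptions` object from the loaded config over a default timeout. The function name `buildParserOptions` and the shape of `config` are assumptions for illustration; the 60000 ms default mirrors the one-minute default mentioned in this request.

```javascript
// Default taken from the one-minute timeout discussed in the request above.
const DEFAULT_PARSER_OPTIONS = { timeout: 60000, headers: {} };

// Hypothetical helper: `config` is the parsed config.yaml object.
function buildParserOptions(config) {
  const custom = (config && config.parserOptions) || {};
  return {
    ...DEFAULT_PARSER_OPTIONS,
    ...custom,
    // Merge headers separately so a custom User-Agent does not drop default headers.
    headers: { ...DEFAULT_PARSER_OPTIONS.headers, ...(custom.headers || {}) },
  };
}

// Usage sketch (assumes rss-parser is installed):
//   const parser = new Parser(buildParserOptions(config));
```

Users who never set parserOptions would keep the current behaviour, since the defaults apply unchanged.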

TypeError: Invalid URL

I'm getting the following stack when I attempt to start the bot

SQLITE_ERROR: no such table: posts
TypeError: Invalid URL
    at new URL (node:internal/url:775:36)
    at parseURL (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:1164:12)
    at new Request (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:1210:17)
    at /home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:1453:19
    at new Promise (<anonymous>)
    at LemmyHttp.fetch (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:1451:9)
    at LemmyHttp.fetch (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/cross-fetch/dist/node-ponyfill.js:10:20)
    at LemmyHttp.<anonymous> (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/lemmy-js-client/dist/http.js:876:90)
    at Generator.next (<anonymous>)
    at /home/doug/workspace/Lemmy-Mega-Bot/node_modules/lemmy-js-client/dist/http.js:8:71 {
  code: 'ERR_INVALID_URL',
  input: 'https://"lemmy.world" // The instance the bot account is in/api/v3/user/login'
}

You can see my full configuration here.
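
The `input` field in the stack above shows that the quotes and the trailing `// ...` comment from the README example ended up inside the instance value. A defensive sketch that strips both and validates the result before use is given below. `normalizeInstance` is a hypothetical helper, not part of the bot.

```javascript
// Hypothetical helper: cleans a LEMMY_INSTANCE value copied verbatim from the README.
function normalizeInstance(raw) {
  if (typeof raw !== 'string') {
    throw new Error('LEMMY_INSTANCE is not set');
  }
  const host = raw
    .replace(/\s+\/\/.*$/, '') // drop a trailing " // comment"
    .trim()
    .replace(/^["']|["']$/g, ''); // drop surrounding quotes
  // The URL constructor throws a TypeError if the value cannot form a valid URL.
  new URL(`https://${host}`);
  return host;
}
```

For the value in this report, `normalizeInstance('"lemmy.world" // The instance the bot account is in')` returns `lemmy.world`.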

Emit error logging when community is not found

I made a mistake in my config.yaml where I misspelled the name of my community and my posts were not being created in that community. There wasn't any indication that there was a problem with the community not existing or not being found, which made it hard to diagnose the issue. I think the change is probably updating the logging here to be a bit more descriptive, though I think there is likely also upstream work that could be done in LemmyBot to improve the error it returns.

Here is my error logging from the relevant timeframe:

user_already_exists
user_already_exists
FetchError: invalid json response body at https://lemmy.world/api/v3/post reason: Unexpected token 'J', "Json deser"... is not valid JSON
    at /home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:273:32
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  type: 'invalid-json'
}
FetchError: invalid json response body at https://lemmy.world/api/v3/post reason: Unexpected token 'J', "Json deser"... is not valid JSON
    at /home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:273:32
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  type: 'invalid-json'
}
user_already_exists
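
A sketch of the more descriptive logging requested here: wrap the community lookup and name the community and instance in the message. `getCommunityId` is a hypothetical stand-in for whatever lookup the bot library provides; its real name and signature are not shown in this report.

```javascript
// Hypothetical wrapper: `getCommunityId(name, instance)` stands in for the
// library's community lookup. Returns null (and logs) when the lookup fails.
async function resolveCommunityId(getCommunityId, instance, name, log = console.error) {
  const hint = `Check the spelling of "${name}" under "${instance}" in config.yaml.`;
  try {
    const id = await getCommunityId(name, instance);
    if (id == null) {
      log(`COMMUNITY: "${name}" was not found on ${instance}. ${hint}`);
      return null;
    }
    return id;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`COMMUNITY: lookup of "${name}" on ${instance} failed (${reason}). ${hint}`);
    return null;
  }
}
```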

Filter by category

I have a feed that I am interested in running a bot for, and I would like to post two categories to two different communities -- "Ride of the Valkyries" to lemmy.world/c/reign_fc and everything else, except "Subscriber-only content", to lemmy.world/c/sounders_fc. From my reading of the comments in config.yaml, I don't think the bot supports this use case.

I'd like to propose we add a pair of new keys to a feed in config.yaml, include_categories and exclude_categories, which take lists of categories which are included in the feed or excluded from the feed, respectively.

For my use case, for example, I would have the following config:

feeds:
  example_include:
    include_categories:
      - "Ride Of The Valkyries"

  example_exclude:
    exclude_categories: 
      - "Ride Of The Valkyries"
      - "Subscriber-only content"

If there is interest in this feature, I would be interested in contributing a patch.
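
A minimal sketch of how the proposed keys could be evaluated per feed item. It assumes each parsed item exposes a `categories` array and compares names case-insensitively; both are assumptions made for illustration, and `matchesCategoryFilters` is a hypothetical name.

```javascript
// Hypothetical filter for the proposed include_categories / exclude_categories keys.
// Assumes `item.categories` is an array of category names (missing means none).
function matchesCategoryFilters(item, feed) {
  const norm = (s) => String(s).trim().toLowerCase();
  const categories = (item.categories || []).map(norm);
  const include = (feed.include_categories || []).map(norm);
  const exclude = (feed.exclude_categories || []).map(norm);

  // Exclusion wins: any excluded category rejects the item.
  if (exclude.some((c) => categories.includes(c))) return false;
  // With no include list, everything not excluded passes.
  if (include.length === 0) return true;
  // Otherwise at least one included category must be present.
  return include.some((c) => categories.includes(c));
}
```

Letting exclusion take precedence keeps the behaviour predictable when an item carries both an included and an excluded category.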

Error: Bad Request

Same config as #8

Using sh.itjust.works instance (running 0.18.5).

stdout

STARTED: Started Bot
INSTANCES: 1 instances loaded.
FEEDS: 1 feeds loaded.
Starting bot
Initializing DB
Logging in
DB: Connected to the database.
TABLE: Loaded time table
TABLE: Loaded posts table.
POSTS: 0 posts in database.
Logged in
Marking account as bot account
Error: Bad Request
    at LemmyHttp.<anonymous> (/home/user/Programming/Web/rss-bot/node_modules/lemmy-js-client/dist/http.js:887:19)
    at Generator.throw (<anonymous>)
    at rejected (/home/user/Programming/Web/rss-bot/node_modules/lemmy-js-client/dist/http.js:6:65)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

config.yaml

---
# Whether to give the bot a bot tag (true) or not (false). Recommended to mark it but the option is here if you already 
# marked it manually and it starts throwing user_already_exists errors
markAsBot: true

# How often to check for new posts in minutes
postCheckInterval: 10

# How often to check for a new day in minutes (for unpinning posts)
dayCheckInterval: 10

# The timezone to use for the bot (as reference for unpinning posts at midnight)
# You can see the options here: https://www.inmotionhosting.com/support/website/tz-ref-table/
timezone: 'Europe/Amsterdam'

# Posts from how many days ago are you willing to backpost when the bot starts
dayCutOff: 3

# Set to true to add all posts to the db without posting them. Good to set for one run to clear out backposts if you 
# dont want any old posts posted when the bot is first ran. Set to false to post normally
stopPosts: false

# Set to true if you want to see log messages. False if not
# (Note I cant control log messages sent by the bot library so those will still show. Just ones thrown by the bot wont)
showLogs: true

# The maximum amount of posts it will do on every post check. Set to 0 for no limit. (Each post being posted to another instance is separate in here but itll finish up the same post before it stops)
maxPosts: 5

# The time in milliseconds it will sleep before doing another post in the same post check
postSleepDuration: 5000

# The instances and communities used by the bot
instances:
  sh.itjust.works: # The instance name
    nos_tech:
      - "nos_tech"


# The rss feeds used to pull posts from
feeds:
  nos_tech:
    url: 'https://feeds.nos.nl/nosnieuwstech'
    content: 'description'
    datefield: 'pubDate'

FetchError: request failed, reason: getaddrinfo EAI_AGAIN

I left my bot up and running for about 15 days, and after about 2 weeks of polling every 10 minutes, I started to get this error on every request:

FetchError: request to https://lemmy.world/api/v3/post/list?type_=Subscribed&auth=<redacted> failed, reason: getaddrinfo EAI_AGAIN lemmy.world
    at ClientRequest.<anonymous> (/home/doug/workspace/Lemmy-Mega-Bot/node_modules/node-fetch/lib/index.js:1505:11)
    at ClientRequest.emit (node:events:518:28)
    at TLSSocket.socketErrorListener (node:_http_client:495:9)
    at TLSSocket.emit (node:events:518:28)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'EAI_AGAIN',
  code: 'EAI_AGAIN'
}

I think what's happened is that we're doing too many DNS queries. I think we might want to investigate caching our DNS lookups with something like cacheable-lookup to reduce the number of DNS queries for the same server.
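
To illustrate the caching idea only (this is not the cacheable-lookup library and does not use its API), here is a small TTL cache around an arbitrary resolver function. The names `createCachedLookup`, `resolve`, `ttlMs`, and `now` are hypothetical; in practice `resolve` could wrap Node's `dns.promises.lookup`.

```javascript
// Hypothetical TTL cache: `resolve(hostname)` is any async resolver.
// `now` is injectable so expiry can be exercised without waiting.
function createCachedLookup(resolve, ttlMs = 60000, now = Date.now) {
  const cache = new Map(); // hostname -> { address, expires }
  return async function lookup(hostname) {
    const hit = cache.get(hostname);
    if (hit && hit.expires > now()) {
      return hit.address; // still fresh: no DNS query
    }
    const address = await resolve(hostname);
    cache.set(hostname, { address, expires: now() + ttlMs });
    return address;
  };
}
```

Repeated lookups for the same host within the TTL would then reuse the stored address instead of issuing a new query.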

TypeError: Cannot read properties of undefined (reading 'replace')

Have I put something stupid in .env?

~/Downloads/Lemmy-Mega-Bot-main$ node main.js
STARTED: Started Bot
INSTANCES: 1 instances loaded.
FEEDS: 1 feeds loaded.
/home/sam/Downloads/Lemmy-Mega-Bot-main/node_modules/lemmy-bot/dist/helpers.js:36
const stripPort = (instance) => instance.replace(/:.*/, '');
^

TypeError: Cannot read properties of undefined (reading 'replace')
    at stripPort (/home/sam/Downloads/Lemmy-Mega-Bot-main/node_modules/lemmy-bot/dist/helpers.js:36:42)
    at /home/sam/Downloads/Lemmy-Mega-Bot-main/node_modules/lemmy-bot/dist/bot.js:336:141
    at Array.some (<anonymous>)
    at new LemmyBot (/home/sam/Downloads/Lemmy-Mega-Bot-main/node_modules/lemmy-bot/dist/bot.js:336:99)
    at file:///home/sam/Downloads/Lemmy-Mega-Bot-main/main.js:97:13
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25)

Node.js v18.13.0
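
The stack shows `stripPort` receiving an undefined `instance`. One possible cause, not confirmed in this report, is a value missing from .env. A sketch of a start-up check that fails early with a clearer message is shown below; `requireEnv` is a hypothetical helper.

```javascript
// Hypothetical start-up check: fail early if any expected variable is missing or blank.
function requireEnv(env, names) {
  const missing = names.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing values in .env: ${missing.join(', ')}`);
  }
  // Return only the requested variables.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage sketch:
//   const env = requireEnv(process.env, ['LEMMY_INSTANCE', 'LEMMY_USERNAME', 'LEMMY_PASSWORD']);
```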

Support for using cookies?

There is a feed I am wanting to access at https://pso2.com/players/pso2news.xml that requires that you go through a rather over-zealous age gate process first before you are able to access it. The age gate sets a few cookies to confirm that you have gone through it then serves the feed to you if the cookies exist.
I had a look through the configuration files, but I cannot see a way to provide cookies for a feed. Is there one?
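
If per-feed request headers were supported, cookies could be sent as a single Cookie header. The sketch below builds that header from a key/value map. The `cookies` key and the cookie name in the usage note are hypothetical; the actual cookies set by the age gate are not listed in this request.

```javascript
// Hypothetical helper: turns { name: value } pairs into a Cookie header value.
function cookieHeader(cookies) {
  return Object.entries(cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join('; ');
}

// Hypothetical helper: merges a feed's `cookies` map into its request headers.
function buildFeedHeaders(feed) {
  const headers = { ...(feed.headers || {}) };
  if (feed.cookies && Object.keys(feed.cookies).length > 0) {
    headers.Cookie = cookieHeader(feed.cookies);
  }
  return headers;
}
```

For example, `buildFeedHeaders({ cookies: { age_verified: '1' } })` yields `{ Cookie: 'age_verified=1' }`, where `age_verified` is a made-up cookie name.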

Errors stop processing of all subsequent feeds

I'm having trouble with one of my rss feeds, which is timing out after 60000ms. I'd like to leave it in the config.yaml, in case it comes back online in the future, but the first error encountered stops all later feeds from being processed. I'd rather we catch and log the error and continue processing the rest of the feeds.
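
The requested behaviour, catch and log the error and then continue, can be sketched as a loop with a try/catch per feed. `processFeeds` and `processFeed` are hypothetical names; `feeds` is assumed to be the `feeds` mapping from config.yaml.

```javascript
// Hypothetical loop: `processFeed(name, feed)` stands in for the bot's per-feed work.
// Returns the names of feeds that failed so the caller can report them.
async function processFeeds(feeds, processFeed, log = console.error) {
  const failed = [];
  for (const [name, feed] of Object.entries(feeds)) {
    try {
      await processFeed(name, feed);
    } catch (err) {
      // Log and move on so one broken feed does not block the rest.
      const reason = err instanceof Error ? err.message : String(err);
      log(`FEEDS: ${name} failed, skipping (${reason})`);
      failed.push(name);
    }
  }
  return failed;
}
```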

Cross-post articles that appear in multiple feeds

I have a post from a link that appears in two of my feeds (this one and this one). It was only posted in a single community -- the first one that processed -- and did not appear in the second community at all. I would prefer if each subsequent time we encountered the link, we cross-posted the link to the other communities that have the same link in the feed.
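
One way to get this behaviour is to record posts per (community, link) pair rather than per link. The sketch below assumes a set of already-posted keys; the bot's actual database schema is not shown here, so `postKey`, `communitiesStillNeeding`, and the key format are hypothetical.

```javascript
// Hypothetical key: one entry per community/link combination.
function postKey(community, link) {
  return `${community}|${link}`;
}

// Returns the target communities that have not yet received this link.
function communitiesStillNeeding(link, targetCommunities, alreadyPosted) {
  return targetCommunities.filter(
    (community) => !alreadyPosted.has(postKey(community, link))
  );
}
```

A link that was already posted to the first community would then still be offered to the second one the next time it is seen in a feed.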

Bot Unable To Post To Instances

Bot acct is on: lemm.ee
Initial error posting to: lemmy.world/c/metal
Second error (after repointing target community) posting to: lemm.ee/c/test

The bot I created ran fine for 2 days then stopped working. After talking with u/Ategon we thought it might be because the instance (lemmy.world) was behind a CloudFlare challenge. I then stopped the bot and edited the settings to point at lemm.ee/c/test which is a Community to test posting for apps and bots. Lemm.ee is not using CloudFlare. The bot will still not post to an instance. I use forever (as per the readme) so I have a log of the issue (attached)

Bb8N.log

But the error is: Could not find [email protected]

This community definitely exists.

Unable to install

I'm unable to set up the bot. When I run node main.js I get the following error:

~/Lemmy-Mega-Bot/main.js:1
import LemmyBot from 'lemmy-bot';
       ^^^^^^^^

SyntaxError: Unexpected identifier
    at Module._compile (internal/modules/cjs/loader.js:723:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:623:3)
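
A SyntaxError on the `import` line usually means the running Node.js release does not treat main.js as an ES module. The internal paths in this trace look like an older release, though the report does not state the version. A small version check is sketched below; `isSupportedNode` and the minimum of 14 are assumptions, since the text above does not name a minimum version. The check would need to live in a separate CommonJS file, because main.js itself fails to parse on such releases.

```javascript
// Assumed minimum major version; not stated anywhere in the text above.
const MIN_MAJOR = 14;

// Hypothetical check: accepts strings like "20.5.1" or "v18.13.0".
function isSupportedNode(version, minMajor = MIN_MAJOR) {
  const major = Number(String(version).replace(/^v/, '').split('.')[0]);
  return Number.isInteger(major) && major >= minMajor;
}

if (!isSupportedNode(process.versions.node)) {
  console.error(
    `Node.js ${process.versions.node} is too old to run main.js; please upgrade.`
  );
}
```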
