zero-to-mastery / breads-server

Server code for Breads. Keep track of what you read online, and see what your friends are reading.

Home Page: https://www.breads.io/

License: Other

JavaScript 82.70% Python 17.21% Shell 0.09%

express mysql nodejs python

breads-server's Introduction

zero-to-mastery

Zero to Mastery Open Source

breads-server's Issues

Users can see specifics of backend errors

Users should only see simple error messages such as "Unauthorized" or "Email needed".
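
A minimal sketch of one way to do this, assuming an Express-style error-handling middleware. The `PUBLIC_MESSAGES` map, the `publicMessage` field, and the `toPublicError` helper are hypothetical names for illustration, not taken from the Breads codebase.

```javascript
// Hypothetical default messages for client errors.
const PUBLIC_MESSAGES = {
  400: 'Bad request',
  401: 'Unauthorized',
  404: 'Not found',
};

// Map any thrown error to a safe payload. Internal details such as SQL
// text or stack traces are never copied into the response.
function toPublicError(err) {
  const raw = err && err.status;
  const isClientError = Number.isInteger(raw) && raw >= 400 && raw < 500;
  const status = isClientError ? raw : 500;
  const message = isClientError
    // `publicMessage` is an assumed, opt-in field (e.g. "Email needed").
    ? err.publicMessage || PUBLIC_MESSAGES[status] || 'Request failed'
    : 'Something went wrong';
  return { status, body: { error: { message } } };
}

// Express-style error middleware (four arguments) built on the helper.
function errorHandler(err, req, res, next) {
  console.error(err); // full details go to the server log only
  const { status, body } = toPublicError(err);
  res.status(status).json(body);
}
```

Registered last with `app.use(errorHandler)`, a handler like this would turn a raw database error into a generic 500 response while still logging the details server-side.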

Restructure tables

How to separate existing table?
User Tags (user_id, tag_id)
User Readings (user_id, reading_id)
Reading Tags (reading_id, tag_id)
Favorites
Subscriptions
Users
Readings
Tags

DB error when searching for users

ER_TABLE_CANT_HANDLE_FT: The used table type doesn't support FULLTEXT indexes

The ClearDB add-on in Heroku defaults to an older version of MySQL whose InnoDB storage engine doesn't support FULLTEXT indexes (InnoDB gained FULLTEXT support in MySQL 5.6). Switching to MyISAM might cause data loss, so I'm looking into JawsDB.

Found small issue regarding the content of an article preview.

Hi all,

I found your project on the ZTM Discord server. After checking out your website, I noticed a small issue with the content of one of your articles.

The description of the post "China's port congestion ties up 565 bulkers" contains a paragraph HTML tag. I would've attempted to fix the issue myself, but it is likely saved in a database that I don't have access to.

Please view the attached screenshot for a better idea.

Regards,

Tristan

Python scraper connects to db

Move the create-reading MySQL query into Node.js. The Python scraper should only print the scraped values, and Node should insert them into the db.
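
A rough sketch of the Node side of that hand-off. It assumes the scraper prints a single JSON array as its last line of stdout; the order of the values and the column names below are hypothetical and would need to match the real `readings` table.

```javascript
// Assumption: the Python scraper's final stdout line is a JSON array,
// e.g. ["Title", "example.com", "Description", "https://example.com/a.png", 1200].
function parseScraperOutput(stdout) {
  const lines = stdout.trim().split('\n');
  const values = JSON.parse(lines[lines.length - 1]);
  if (!Array.isArray(values)) {
    throw new TypeError('Expected a JSON array of values');
  }
  return values;
}

// Build a parameterized query so scraped text is never concatenated
// into SQL. Column names here are illustrative only.
function buildInsertReading(url, userId, values) {
  const [title, domain, description, image, wordCount] = values;
  return {
    sql:
      'INSERT INTO readings ' +
      '(title, domain, description, image_url, word_count, url, user_id) ' +
      'VALUES (?, ?, ?, ?, ?, ?, ?)',
    params: [title, domain, description, image, wordCount, url, userId],
  };
}
```

With the `mysql` package, the result could then be passed to `connection.query(sql, params, callback)`, which fills in the `?` placeholders with escaped values.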

Web Scraper Improvements - Work Better with Bot Detectors

Original thread - Link

Fixed issues that caused the initial scrape to fail. If the initial scrape still fails, the url is passed to a webdriver, which attempts to mimic normal user behavior to get past bot detection.

Repo: Link

Works about 90% of the time now. The slower it goes, the better it works, which brings me to the TODO list:

Throttle requests - check the database for previously failed attempts and create a batch of re-scrapes with a rate limit of 30-60 sec between requests
Refactor to classes - make the code a little more readable and user-friendly for code improvements later
Placeholders - sometimes we can't get a description or an image, so instead of leaving them blank, create a placeholder
PDFs - I know there's a pdf scraper being developed, too. That will just need to be added when it's done.
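
The throttling item above could look roughly like this on the Node side. This is only a sketch: `scrape` stands in for whatever function launches the scraper, and `rescrapeBatch`, `randomDelay`, and the `wait` option are hypothetical names.

```javascript
// Default sleep; callers (or tests) can inject their own via `wait`.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pick a random delay in [minMs, maxMs).
function randomDelay(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

// Re-scrape a batch of URLs one at a time, pausing 30-60 s between
// requests by default. `scrape` is an assumed async (url) => result.
async function rescrapeBatch(urls, scrape, options = {}) {
  const { minMs = 30000, maxMs = 60000, wait = sleep } = options;
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await wait(randomDelay(minMs, maxMs));
    try {
      results.push({ url: urls[i], ok: true, value: await scrape(urls[i]) });
    } catch (err) {
      // Keep going; failed URLs can be queued for a later batch.
      results.push({ url: urls[i], ok: false, error: err.message });
    }
  }
  return results;
}
```

Randomizing the pause instead of using a fixed interval is a design choice here, on the assumption that irregular timing looks less bot-like.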

Also, this is my first contribution on a public project ever, so any pointers/feedback are welcome!

Web Scraper Improvements

The current web scraper works maybe 95% of the time. Sometimes a website detects the bot and won't display the article. Other times a user will save a PDF or video they watched, but the current scraper is built to only handle articles, so the content doesn't display correctly. For a start, I created a checklist below of these edge cases/other features that would make the scraper more robust.

Work better with bot detectors - #40
Better error handling. Python errors should be passed into an array (along with the values array) that is printed at the end of reading_scraper.py to be picked up by nodejs
PDFs
Tests
YouTube and Vimeo videos
Web-based Podcasts
Better error handling/general refactoring for better readability
Working with articles written in other languages
As improvements are made to the scraper to make it more accurate, we need a way to go back and update old readings

Feel free to break off one of the tasks above into its own issue for better collaboration

Cannot update reading if it is outdated

As the article scraper improves, it can accurately scrape articles that it previously could not. Because of this, users should be able to update a reading that is missing data because it was scraped with an earlier version of the scraper. As other changes were made, however, the functions for updating an article were not kept up to date, so a user is currently unable to update an old article. The cause is a syntax error in the MySQL code. Whenever a user clicks the "Update" button on the frontend, it should trigger a new scrape of the article and save the new data in the db.

Update tests

Add favorited option to reading table

~~Will eventually need to make its own table~~
Make favorites table and join to readings whenever fetched

Refactor API routes

Not currently following best practices. Append nested resources to parent resource path

Saving repeat articles

Right now, every url is added to the db with a new id, and nothing checks whether the url has been read before. Our "readings" table should only add rows for new articles. For previously read articles, we can just add the appropriate ids to the "user readings" table. This would mean comparing entire url strings, which I doubt is very performant.
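
One common way around comparing long url strings is to store a fixed-length hash of each url and look readings up by that key. The sketch below assumes this approach; `normalizeUrl` and `urlKey` are hypothetical helpers, and the normalization is deliberately minimal (it only drops the fragment).

```javascript
const crypto = require('crypto');

// Parse the url and drop the fragment, which does not change the article.
// The URL parser also lowercases the scheme and host.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = '';
  return url.toString();
}

// Fixed-length (64 hex characters) lookup key for a url.
function urlKey(rawUrl) {
  return crypto
    .createHash('sha256')
    .update(normalizeUrl(rawUrl))
    .digest('hex');
}
```

The key could live in an indexed column on "readings", so checking for a repeat article becomes a lookup on a short fixed-length value. If the lookup finds a row, only the "user readings" row needs to be added.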

Error uploading matching reading url

User should be able to upload a reading that another user has read. The server should not upload it again to the db table, but instead increment a tally for that url.

Some reading titles are wrong url

Not sure why this is happening, but some titles become 'https:///search?q=cache:URL HERE'

~~Is the server overloaded?~~
Updating Atlantic articles 5 seconds apart caused this to happen. No error was listed on Heroku.
Why does a url get inserted into title?

Not sure why this is happening, but it is related to the Google cache service identifying the Breads bot.

Add number-of-times-read column to readings

Add ReadMe file

https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/

https://medium.com/@meakaakka/a-beginners-guide-to-writing-a-kickass-readme-7ac01da88ab3

Node/Express best practices

Thumb through these resources and identify steps to improve breads server

https://github.com/goldbergyoni/nodebestpractices#readme
https://www.codementor.io/@mattgoldspink/nodejs-best-practices-du1086jja
https://www.freecodecamp.org/news/express-js-security-tips/
https://medium.com/skyshidigital/6-tricks-to-speed-up-and-improve-your-node-js-performance-fadc06d15cbe

https://cheatsheetseries.owasp.org/cheatsheets/Forgot_Password_Cheat_Sheet.html
https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html

Add json response for csp report route

Add support for full unicode in db

Resource here:

https://mathiasbynens.be/notes/mysql-utf8mb4#character-sets
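
For context, MySQL's legacy `utf8` character set stores at most three bytes per character, so any character outside the Basic Multilingual Plane (most emoji, for example) needs `utf8mb4`. A small sketch of a check for such strings, which might help when auditing existing titles; `needsUtf8mb4` is a hypothetical helper name.

```javascript
// Return true if the text contains a code point above U+FFFF, i.e. one
// that takes four bytes in UTF-8 and cannot be stored in MySQL's
// three-byte `utf8` charset.
function needsUtf8mb4(text) {
  for (const ch of text) {
    // for...of iterates by code point, so surrogate pairs stay together.
    if (ch.codePointAt(0) > 0xffff) return true;
  }
  return false;
}
```

On the Node side, the `mysql` package accepts a `charset` connection option (for example `charset: 'utf8mb4'`), which would need to be set alongside converting the tables themselves.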

DB connection downtime is too long

Error: Connection lost: The server closed the connection.

Consider moving from .createConnection() to .createPool()
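
A minimal sketch of what that move might look like. It assumes the `mysql` npm package, which is passed in as an argument so the example stays self-contained; `createDb` and the promise wrapper are hypothetical, and `config` is expected to hold the usual host, user, password, and database values.

```javascript
// `mysql` is the result of require('mysql'); `config` holds the
// connection settings the app already uses with createConnection().
function createDb(mysql, config) {
  // connectionLimit caps how many connections the pool opens at once.
  const pool = mysql.createPool({ connectionLimit: 10, ...config });
  return {
    // pool.query() acquires a connection, runs the query, and releases
    // the connection back to the pool.
    query(sql, params = []) {
      return new Promise((resolve, reject) => {
        pool.query(sql, params, (err, rows) => (err ? reject(err) : resolve(rows)));
      });
    },
  };
}
```

Usage would be along the lines of `const db = createDb(require('mysql'), config)` followed by `await db.query('SELECT 1')`. Because each query borrows a connection from the pool, a single dropped connection should no longer take every later query down with it.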

"message":"Not allowed by CORS" Environment Variable Issue

Hello Everyone!

I'm getting this {"error":{"message":"Not allowed by CORS"}} message every time I go on http://localhost:8080/. I've followed all the steps of creating a .env file and adding environment variables (like this: LOCAL_CORS="http://localhost:8080") but cannot seem to get rid of this error.

These are the two articles I was following for reference:
Environment variables with Node.js
Working with Environment Variables in Node.js

Is there something I'm missing? I tried creating a .config file but that didn't seem to work.

Any suggestions would be very helpful!
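
One possible cause, assuming the server uses the `cors` package with an allowlist callback (not confirmed in this thread): opening a URL directly in the browser sends no `Origin` header, so an allowlist check that requires a matching origin rejects the request even when the environment variable is set correctly. A sketch of a callback that tolerates this; `parseOrigins` and `makeOriginCheck` are hypothetical helpers.

```javascript
// Turn a comma-separated env value into a list of origins.
function parseOrigins(value) {
  return (value || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

// Build an `origin` callback in the shape the `cors` package accepts:
// (origin, callback) => callback(error, allow).
function makeOriginCheck(allowedOrigins) {
  const allowed = new Set(allowedOrigins);
  return function checkOrigin(origin, callback) {
    // Address-bar navigation and tools like curl send no Origin header,
    // so `origin` is undefined for those requests.
    if (!origin || allowed.has(origin)) return callback(null, true);
    return callback(new Error('Not allowed by CORS'));
  };
}
```

It would be wired up as `cors({ origin: makeOriginCheck(parseOrigins(process.env.LOCAL_CORS)) })`. If the error persists, it may also be worth confirming that the .env file is actually loaded before the CORS options are built.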

if the reading is favorited
if the reading has associated tags

Funnel http images through breads service to stay https

search stackoverflow
