breads-server's Introduction

zero-to-mastery

Zero to Mastery Open Source

breads-server's People

Contributors

areezy, ashutosh00710, aubundy, dependabot[bot], mattcsmith, maxemileffort, tas09009, togenplusplus

breads-server's Issues

Restructure tables

  • How to separate existing table?

  • User Tags (user_id, tag_id)

  • User Readings (user_id, reading_id)

  • Reading Tags (reading_id, tag_id)

  • Favorites

  • Subscriptions

  • Users

  • Readings

  • Tags
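One possible shape for the join tables listed above, as a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. Only user_id, reading_id and tag_id come from the list above; every other column name is an assumption, and Favorites and Subscriptions are left out because their columns are not specified.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs anywhere.
# Column names other than user_id, reading_id and tag_id are assumptions.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE readings (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE, title TEXT);
CREATE TABLE tags     (id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE);

-- Join tables: one row per relationship; the composite key blocks duplicates.
CREATE TABLE user_readings (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    reading_id INTEGER NOT NULL REFERENCES readings(id),
    PRIMARY KEY (user_id, reading_id)
);
CREATE TABLE reading_tags (
    reading_id INTEGER NOT NULL REFERENCES readings(id),
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (reading_id, tag_id)
);
CREATE TABLE user_tags (
    user_id INTEGER NOT NULL REFERENCES users(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (user_id, tag_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three entity tables and the three join tables."""
    conn.executescript(SCHEMA)
```

With composite primary keys, saving the same (user_id, reading_id) pair twice is rejected by the database instead of silently creating a duplicate row.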

DB error when searching for users

ER_TABLE_CANT_HANDLE_FT: The used table type doesn't support FULLTEXT indexes

The ClearDB add-on in Heroku defaults to an older version of MySQL whose InnoDB storage engine doesn't support FULLTEXT indexes (InnoDB only gained FULLTEXT support in MySQL 5.6). Switching to MyISAM might cause data loss, so I'm looking into JawsDB.

Found small issue regarding the content of an article preview.

Hi all,

I found your project on the ZTM Discord server. After checking out your website, I noticed a small issue with the content of one of your articles.

The description of the post "China's port congestion ties up 565 bulkers" contains a paragraph HTML tag. I would've attempted to fix the issue myself, but it is likely saved in a database that I won't have access to.

Please view the attached screenshot for a better idea.

Regards,

Tristan

bread io-issue
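One way to keep stray tags such as the paragraph tag out of stored descriptions is to strip markup before saving. This is a minimal sketch using only Python's standard library; `strip_tags` is a hypothetical helper, not a function from the project's scraper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_tags(html_text: str) -> str:
    """Return the text of html_text without tags, with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    # Join with spaces so adjacent paragraphs do not run together.
    return " ".join(" ".join(parser.chunks).split())
```

Joining chunks with a space keeps neighbouring paragraphs apart, at the cost of inserting a space where inline tags split a word.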

Web Scraper Improvements - Work Better with Bot Detectors

Original thread - Link

Fixed issues that caused the initial scrape to fail. If the scrape still fails, the url is passed to a webdriver, which attempts to mimic normal user behavior to get past bot detection.

Repo: Link

Works about 90% of the time now. The slower it goes, the better it works. Which brings me to the TODO list:

  • Throttle requests - checking the database for previous failed attempts and creating a batch of re-scrapes with a rate limit of 30-60 sec between requests
  • Refactor to classes - make the code a little more readable and user-friendly for code improvements later
  • Placeholders - sometimes we can't get a description or an image, so instead of leaving them blank, create a placeholder
  • PDFs - I know there's a pdf scraper being developed, too. That will just need to be added when it's done.
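The "Throttle requests" item could look roughly like the sketch below. The function name, its parameters and the injected `scrape` callable are all assumptions for illustration; loading the failed urls from the database is left out.

```python
import random
import time
from typing import Callable, Dict, Iterable


def rescrape_with_throttle(
    urls: Iterable[str],
    scrape: Callable[[str], bool],
    min_delay: float = 30.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Re-scrape each url once, pausing 30-60 s (by default) between requests."""
    results: Dict[str, bool] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[url] = bool(scrape(url))
        except Exception:
            # A failed re-scrape stays marked as failed for a later batch.
            results[url] = False
    return results
```

Passing `sleep` as a parameter lets a test replace the real delay with a recorder, so the batch logic can be checked without waiting.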

Also, this is my first contribution on a public project ever, so any pointers/feedback are welcome!

Web Scraper Improvements

The current web scraper works maybe 95% of the time. Sometimes a website detects the bot and won't display the article. Other times a user will save a PDF or video they watched, but the current scraper is built to only handle articles, so the content doesn't display correctly. For a start, I created a checklist below of these edge cases/other features that would make the scraper more robust.

  • Work better with bot detectors - #40
  • Better error handling. Python errors should be passed into an array (along with the values array) that is printed at the end of reading_scraper.py to be picked up by nodejs
  • PDFs
  • Tests
  • YouTube and Vimeo videos
  • Web-based Podcasts
  • Better error handling/general refactoring for better readability
  • Working with articles written in other languages
  • As improvements are made to the scraper to make it more accurate, we need a way to go back and update old readings

Feel free to break off one of the tasks above into its own issue for better collaboration
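The error-handling item (errors collected in an array next to the values array and printed at the end for nodejs to pick up) might be sketched as follows. The step names and the JSON shape are assumptions; the actual output format of reading_scraper.py is not shown in this issue.

```python
import json
from typing import Any, Callable, Dict, List


def run_steps(steps: Dict[str, Callable[[], Any]]) -> Dict[str, List[Any]]:
    """Run each scraping step, collecting results and error messages."""
    values: List[Any] = []
    errors: List[str] = []
    for name, step in steps.items():
        try:
            values.append(step())
        except Exception as exc:
            # Keep positions aligned: a failed step yields None.
            values.append(None)
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    return {"values": values, "errors": errors}


def emit(result: Dict[str, List[Any]]) -> str:
    """Print the result as one JSON line on stdout and return it."""
    payload = json.dumps(result)
    print(payload)
    return payload
```

Printing a single JSON line at the end means the Node side can parse stdout once instead of matching free-form Python tracebacks.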

Cannot update reading if it is outdated

As the article scraper improves, it can accurately scrape articles that it previously could not. Users should therefore be able to update an article that is missing data because it was scraped with an earlier version of the scraper. As other changes were made, however, the functions that update an article were neglected, so a user can no longer update an old article. The cause is a syntax error in the MySQL code. Whenever a user clicks the "Update" button on the frontend, it should trigger a new scrape of the article and save the new data in the db.

Saving repeat articles

Right now, every url is added to the db with a new id. There is no check to see whether the url has been read before. Our "readings" table should only add rows for new articles, and for previously read articles, we can just add the appropriate ids to the "user readings" table. This would mean comparing entire url strings, which I doubt is very performant.
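One common way to avoid comparing long url strings is to store a fixed-length hash of a normalized url in an indexed column and look readings up by that key. The sketch below is an assumption about how that could work, not the project's approach; the normalization rules (lowercase host, no fragment, no trailing slash) are choices that may not suit every site.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def url_key(url: str) -> str:
    """Return a 64-character SHA-256 hex digest of the normalized url."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
```

Because every key has the same length, a unique index on the key column stays compact regardless of how long the original urls are.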

Error uploading matching reading url

A user should be able to upload a reading that another user has already read. The server should not add it to the db table again, but instead increment a tally for that url.
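A minimal sketch of that behavior, again using sqlite3 as a stand-in for MySQL. The table layout, the `times_read` tally column and the `save_reading` helper are hypothetical. `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with an assumed, simplified layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE readings (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL UNIQUE,
            times_read INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE user_readings (
            user_id INTEGER NOT NULL,
            reading_id INTEGER NOT NULL REFERENCES readings(id),
            PRIMARY KEY (user_id, reading_id)
        );
    """)
    return conn


def save_reading(conn: sqlite3.Connection, user_id: int, url: str) -> int:
    """Link user_id to url, inserting the reading only if the url is new."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM readings WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            reading_id = conn.execute(
                "INSERT INTO readings (url) VALUES (?)", (url,)
            ).lastrowid
        else:
            reading_id = row[0]
        linked = conn.execute(
            "INSERT OR IGNORE INTO user_readings (user_id, reading_id) "
            "VALUES (?, ?)",
            (user_id, reading_id),
        ).rowcount
        if linked:
            # Count each user once, even if they save the url again.
            conn.execute(
                "UPDATE readings SET times_read = times_read + 1 WHERE id = ?",
                (reading_id,),
            )
    return reading_id
```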

Some reading titles are wrong url

Not sure why this is happening, but some titles become 'https:///search?q=cache:URL HERE'

Is the server overloaded?
Updating Atlantic articles 5 seconds apart caused this to happen. No error was listed on Heroku.
Why does a url get inserted into the title?

  • Not sure why this is happening, but it is related to the Google cache service identifying the breads bot

Node/Express best practices

"message":"Not allowed by CORS" Environment Variable Issue

Hello Everyone!

I'm getting this {"error":{"message":"Not allowed by CORS"}} message every time I go on http://localhost:8080/. I've followed all the steps of creating a .env file and adding environment variables (like this: LOCAL_CORS="http://localhost:8080") but cannot seem to get rid of this error.

These are the two articles I was following for reference:
Environment variables with Node.js
Working with Environment Variables in Node.js

Is there something I'm missing? I tried creating a .config file but that didn't seem to work.

Any suggestions would be very helpful!

Include instructions to set up MySQL server

Problem: Currently README.md has no description of how to install MySQL on a local machine.
Proposed Solution: Add step-by-step instructions (including screenshots) so that contributors can follow along smoothly.

I am interested in adding the instructions for Windows users.

Add tagging

For the tagging feature, add the appropriate db table and queries.

MySQL Authentication Error

Using MySQL 8.0, the error below appears when the client tries to log in to the db with the mysql_native_password authentication method while the server expects caching_sha2_password:

ER_NOT_SUPPORTED_AUTH_MODE: Client does not support authentication protocol requested by server; consider upgrading MySQL client

Check if user has previously read an article

Add a users api endpoint that checks the db to see if a user has already read an article. It should return true if so, and false if not. There is some overlap with #62, since both need to compare url strings.
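The database check behind such an endpoint could be as small as the sketch below. It uses sqlite3 as a stand-in for MySQL, and the table layout and the `has_user_read` helper are assumptions; wiring it into an api route is not shown.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with an assumed layout and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE readings (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
        CREATE TABLE user_readings (
            user_id INTEGER NOT NULL,
            reading_id INTEGER NOT NULL REFERENCES readings(id),
            PRIMARY KEY (user_id, reading_id)
        );
        INSERT INTO readings (id, url) VALUES (1, 'https://example.com/a');
        INSERT INTO user_readings (user_id, reading_id) VALUES (7, 1);
    """)
    return conn


def has_user_read(conn: sqlite3.Connection, user_id: int, url: str) -> bool:
    """Return True if user_id already has a reading saved for url."""
    row = conn.execute(
        "SELECT 1 FROM user_readings AS ur "
        "JOIN readings AS r ON r.id = ur.reading_id "
        "WHERE ur.user_id = ? AND r.url = ? LIMIT 1",
        (user_id, url),
    ).fetchone()
    return row is not None
```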

Cannot delete reading

Due to the structure of some SQL tables, some readings cannot be deleted:

  • if the reading is favorited
  • if the reading has associated tags
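One way around this is to delete the dependent rows first, inside a single transaction. The sketch below uses sqlite3 as a stand-in for MySQL; the `favorites` and `reading_tags` layouts and the `delete_reading` helper are assumptions. Declaring the foreign keys with ON DELETE CASCADE would be an alternative.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a database where reading 1 is favorited and tagged."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript("""
        CREATE TABLE readings (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        CREATE TABLE favorites (
            user_id INTEGER NOT NULL,
            reading_id INTEGER NOT NULL REFERENCES readings(id)
        );
        CREATE TABLE reading_tags (
            reading_id INTEGER NOT NULL REFERENCES readings(id),
            tag_id INTEGER NOT NULL
        );
        INSERT INTO readings (id, url) VALUES (1, 'https://example.com/a');
        INSERT INTO favorites (user_id, reading_id) VALUES (7, 1);
        INSERT INTO reading_tags (reading_id, tag_id) VALUES (1, 3);
    """)
    return conn


def delete_reading(conn: sqlite3.Connection, reading_id: int) -> None:
    """Delete a reading together with its favorites and tag links."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM favorites WHERE reading_id = ?", (reading_id,))
        conn.execute("DELETE FROM reading_tags WHERE reading_id = ?", (reading_id,))
        conn.execute("DELETE FROM readings WHERE id = ?", (reading_id,))
```

Running the three deletes in one transaction means a failure part-way through leaves the favorites and tags untouched.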
