Comments (6)

muety commented on August 28, 2024

Great idea! I wasn't aware of difflib and that it's even part of the standard library. I'll adopt it as soon as possible.

This project started out as a very basic, hacky script, and the "change detection" logic hasn't changed since then. Now it's time for it to become a little more mature.
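For readers who haven't used difflib either, here is a minimal sketch of line-based change detection with it. The function name `diff_documents` and the sample snapshots are made up for illustration; this is not the code that ended up in website-watcher.

```python
import difflib
from typing import List


def diff_documents(old: str, new: str) -> List[str]:
    """Return unified-diff lines between two text snapshots (empty if equal)."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",  # input lines carry no newline, so headers shouldn't either
    ))


# Hypothetical snapshots of a watched page
old_snapshot = "<h1>Title</h1>\n<p>Price: 10</p>"
new_snapshot = "<h1>Title</h1>\n<p>Price: 12</p>"

changes = diff_documents(old_snapshot, new_snapshot)
if changes:
    print("\n".join(changes))
```

An empty result means the two snapshots are identical line by line, so a watcher could treat any non-empty list as "the page changed" and include the diff in its notification.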

from website-watcher.

muety commented on August 28, 2024

I'll look into it again in the next few days!

coveritytest commented on August 28, 2024

Kudos for the quick implementation, but there is a bug: doc1 and doc2 differ even though the HTML source code didn't change. For example, umlauts are treated differently, and so are CRLF line endings (here's the debug view). I don't know why yet and would have to look deeper, but I suspect that f.read().encode("utf-8") and r.text.encode("utf-8") return different byte strings.

Okay, found it: you get the correct encoding when you open the file with UTF-8 encoding, like this, instead of encoding the content after reading it:

with open(tmp_location, 'r', encoding='utf8') as f:
    doc1 = filter_document(get_nodes(args.xpath, f.read()))

That still leaves the problem with the CRLF. Okay, solved that one as well: you need to open the file with newline='':

with open(tmp_location, 'r', encoding='utf8', newline='') as f:
    doc1 = filter_document(get_nodes(args.xpath, f.read()))

Will send a pull request soon.
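To show why newline='' matters, here is a small self-contained sketch. It assumes the snapshot was stored as raw UTF-8 bytes with its CRLF line endings intact; the sample text and the temporary file are invented for the demo, and how website-watcher actually writes its snapshot isn't shown in this thread.

```python
import os
import tempfile

# Hypothetical page body with an umlaut and CRLF line endings
fetched = "<p>Grüße</p>\r\n<p>Zeile 2</p>\r\n"

path = os.path.join(tempfile.mkdtemp(), "snapshot.html")

# Store the snapshot as UTF-8 bytes; binary mode performs no newline translation
with open(path, "wb") as f:
    f.write(fetched.encode("utf-8"))

# Default text mode (newline=None) translates "\r\n" to "\n" while reading
with open(path, "r", encoding="utf8") as f:
    translated = f.read()

# newline='' turns the translation off, so the CRLF sequences survive
with open(path, "r", encoding="utf8", newline="") as f:
    untranslated = f.read()

print(translated == fetched)    # False: line endings were rewritten
print(untranslated == fetched)  # True: content matches what was fetched
```

Passing encoding='utf8' explicitly also avoids depending on the platform's default encoding, which is one way umlauts can end up decoded differently from the text that requests returns.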

coveritytest commented on August 28, 2024

Unfortunately, there's still some issue with the encoding. The content read from the file differs from the one it gets via requests. Not sure where and why, though.

muety commented on August 28, 2024

Please try again.

coveritytest commented on August 28, 2024

Perfect, that was the problem. Thanks!
