
web_scraper's Introduction

The Amazon web scraper collects the description, front-page images, and technical specifications of a given product from the Polish, German, and French Amazon domains; the French results are not in Polish. Its main purpose is to speed up the process of creating product descriptions for resale stores such as outlets. The code can also be adapted to other tasks, such as collecting data for machine learning.

The program is meant to assist, not to do the whole job. Polish text scraped from Amazon usually needs to be corrected because of poor translation or wrong formatting.

I used Beautifulsoup4 to scrape data from the website, and I added regular expressions to get rid of the HTML markup.
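
The regex clean-up step can be illustrated roughly as follows. This is a minimal sketch using only the standard library, not the project's actual code: the function name `strip_html` and both patterns are assumptions made for illustration, and in the real program Beautifulsoup4 does the parsing before any regex is applied.

```python
import html
import re

# Hypothetical patterns for illustration; the project's own expressions may differ.
TAG_RE = re.compile(r"<[^>]+>")      # anything that looks like an HTML tag
WHITESPACE_RE = re.compile(r"\s+")   # runs of spaces, tabs, and newlines


def strip_html(fragment: str) -> str:
    """Remove tags, decode entities, and collapse whitespace in a fragment."""
    without_tags = TAG_RE.sub(" ", fragment)   # replace each tag with a space
    decoded = html.unescape(without_tags)      # turn &amp; into & and so on
    return WHITESPACE_RE.sub(" ", decoded).strip()


if __name__ == "__main__":
    print(strip_html("<li><span>Kolor: czarny &amp; srebrny</span></li>"))
```

Regex-based tag removal is fragile on arbitrary HTML, so it probably works best on small fragments that a parser has already isolated.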

The GUI was designed in Qt Designer, and I implemented simple functionality within the interface using PyQt5.

To get it working, you must give it an ASIN. It is critical that you copy it from the Amazon link, for example:

* In https://www.amazon.pl/-/dp/B09TRW5QTX the ASIN is B09TRW5QTX, and that is what needs to be entered into the program.

* https://www.amazon.pl/Bistro-11160-57EURO-4PL-elektryczny-mlynek-nierdzewna/dp/B07N23V6P1/.../ - ASIN = B07N23V6P1

* https://www.amazon.pl/Led-Lenser-7495TP-LENSER-Stirnlampe/dp/B0018O9KVC/.../ - ASIN = B0018O9KVC
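
Picking the ASIN out of links like the ones above could be automated along these lines. This is only a sketch and is not part of the program: the helper `extract_asin` is hypothetical, and the pattern assumes an ASIN is the 10-character code of uppercase letters and digits that follows `/dp/`.

```python
import re
from typing import Optional

# Assumption: the ASIN is exactly 10 uppercase letters/digits right after "/dp/".
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in an Amazon product URL, or None if absent."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_asin("https://www.amazon.pl/-/dp/B09TRW5QTX"))
```

Returning `None` instead of raising lets a caller show its own message when the pasted link has no recognizable ASIN.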

Currently, an ASIN copied in any other way won't work! I'm not sure why, but as the pandemic taught us, "it is what it is...".

Once you have the right ASIN, paste it into the text field and press "ZNAJDŹ" ("find"). After that you should be able to scroll through the content in the text browser. The text uses 3 different indents that tell you which part of the website each piece comes from. If you want to copy it, press "KOPIUJ" ("copy").

Pressing "POBIERZ ZDJĘCIA" ("download images") downloads the front-page pictures from Amazon. To access them, find the images folder (inside web_scraper) and look for the folder named after the product's ASIN. The path to the pictures directory should look something like "C:...\web_scraper\images\B0018O9KVC" (the product ASIN), since images are saved in a folder named after the ASIN.
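
The folder layout described above can be sketched as follows. This is an illustration under assumptions, not the program's code: the helper names `image_dir_for` and `save_image`, the `project_root` parameter, and the numbered `.jpg` file names are all hypothetical; only the `images/<ASIN>` structure comes from the description.

```python
from pathlib import Path


def image_dir_for(asin: str, project_root: Path) -> Path:
    """Return the folder that holds the images for one product."""
    return project_root / "images" / asin


def save_image(data: bytes, asin: str, index: int, project_root: Path) -> Path:
    """Write one image into images/<ASIN>/ and return the file path."""
    target_dir = image_dir_for(asin, project_root)
    target_dir.mkdir(parents=True, exist_ok=True)  # create folders on first use
    target = target_dir / f"{index}.jpg"           # hypothetical naming scheme
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Assumed example root; prints the folder for the ASIN used above.
    print(image_dir_for("B0018O9KVC", Path("web_scraper")))
```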
