This is a web scraping program that harvests product information from a website and outputs it in an organized, sorted way. Every time you run the program, it prints the latest information from the site.
Dependencies:

- bs4: `pip install beautifulsoup4`
- requests: `python -m pip install requests`
python3 web-scrape.py
`uClient` opens a connection to `my_url`, grabs the page, and stores the raw HTML in `page_html`.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphic%20card"

# Opening up a connection and grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
Then we use soup to parse the HTML page and store the result in `page_soup`. To know which div holds the product information, we need to inspect the web page and find its class name. In this case, we want to find all divs with the class name "item-container". `containers` is a list of all the product entries on this HTML page.
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class":"item-container"})
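To see how this parsing step behaves without hitting the live site, here is a standalone sketch that runs BeautifulSoup on a small made-up HTML snippet (the markup and names below are illustrative, not Newegg's actual page):

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure the scraper looks for
demo_html = """
<div class="item-container"><a class="item-title">Card A</a></div>
<div class="item-container"><a class="item-title">Card B</a></div>
"""

demo_soup = BeautifulSoup(demo_html, "html.parser")
demo_containers = demo_soup.findAll("div", {"class": "item-container"})

print(len(demo_containers))       # → 2
print(demo_containers[0].a.text)  # → Card A
```

Each element of the list behaves like a mini document, so you can search inside a single container the same way you searched the whole page.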
In the for loop, we iterate over each individual item and extract its brand, title, and shipping info.
for container in containers:
    # The brand name is stored in the title attribute of the brand image
    brand_container = container.findAll("a", {"class": "item-brand"})
    brand = brand_container[0].img["title"]

    # Product title text
    title_container = container.findAll("a", {"class": "item-title"})
    product_name = title_container[0].text

    # Shipping cost, with surrounding whitespace stripped
    shipping_container = container.findAll("li", {"class": "price-ship"})
    shipping_price = shipping_container[0].text.strip()

    print("brand : ", brand)
    print("name : ", product_name)
    print("shipping price : ", shipping_price)
    print("--------------------")
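The loop above prints items in page order. One way to get the sorted output mentioned in the intro (a sketch under assumptions: `extract_products` is a helper name made up here, and the inline HTML is fabricated test markup, not real Newegg HTML) is to collect each product into a dict and sort by brand before printing:

```python
from bs4 import BeautifulSoup

def extract_products(containers):
    """Collect brand, name, and shipping for each container, sorted by brand."""
    products = []
    for container in containers:
        products.append({
            "brand": container.findAll("a", {"class": "item-brand"})[0].img["title"],
            "name": container.findAll("a", {"class": "item-title"})[0].text,
            "shipping": container.findAll("li", {"class": "price-ship"})[0].text.strip(),
        })
    # Sort alphabetically by brand so the report is easy to scan
    return sorted(products, key=lambda p: p["brand"].lower())

# Quick check with made-up markup mimicking the page structure
demo = BeautifulSoup(
    '<div class="item-container"><a class="item-brand"><img title="Zotac"/></a>'
    '<a class="item-title">Z card</a><li class="price-ship">Free</li></div>'
    '<div class="item-container"><a class="item-brand"><img title="ASUS"/></a>'
    '<a class="item-title">A card</a><li class="price-ship">$4.99</li></div>',
    "html.parser",
)
for p in extract_products(demo.findAll("div", {"class": "item-container"})):
    print(p["brand"], "|", p["name"], "|", p["shipping"])  # ASUS first, then Zotac
```

In the real script you would pass the `containers` list built from the live page instead of the demo markup.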
- [Python] - Programming Language
MIT License