metu-nte-scraper's Introduction

Metu NTE scraper

The Metu NTE scraper project was created for educational purposes and to meet community needs. It consists of three tools for three different jobs:

  1. main.py collects the NTEs offered to your department this semester.
  2. NewCourseAlarm.py alerts the user if new courses are offered to your department (uses "out2.txt").
  3. capacityCheck.py searches through the courses offered to your department and finds those with unused capacity (uses "out2.txt").

Note: capacityCheck.py uses the CNN model Basic-number-captcha-solver, which was developed specifically for this scraper. The current model works with 99.94% accuracy.

Getting Started

These instructions will help you list the Non-Technical Elective (NTE) courses offered to your department in the current semester.

Prerequisites

Python 3.x
Google Chrome
Selenium (installed via requirements.txt) - a Python API for writing functional/acceptance tests with Selenium WebDriver.
TensorFlow (installed via requirements.txt) - a free, open-source library for AI and machine learning applications.

Install the necessary packages (including Selenium and TensorFlow) with:

sudo pip3 install -r requirements.txt

If you encounter any problems, run these commands:

sudo pip install selenium webdriver_manager
sudo python3 -m pip install webdriver-manager --upgrade
sudo python3 -m pip install packaging

Options

Before running the code, make sure you set the variables below to the necessary values in the corresponding scripts:

  • For main.py and NewCourseAlarm.py
    • myDEPT (contains the department abbreviation used to find courses offered to that department; the default value is set for ceng, so change it to your department's code)
    • class_codes (contains the departments that offer NTE courses; you can delete the department numbers you do not want in your list)
  • For capacityCheck.py
    • Username (your METU username; it is only used to access the METU capacity checker, which is inaccessible without a username and password)
    • Password (your METU password; it is only used to access the METU capacity checker, which is inaccessible without a username and password)
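
As a minimal sketch of how these settings might look, the snippet below defines them at the top of a script. The variable names come from this README; the placeholder values, the environment-variable names METU_USERNAME and METU_PASSWORD, and the helper is_offered_to are hypothetical and are not taken from the actual scripts. Reading the credentials from the environment is one way to avoid saving your password in a source file.

```python
import os

# main.py / NewCourseAlarm.py settings (placeholder values, not the real defaults).
myDEPT = "CENG"               # your department's abbreviation
class_codes = ["111", "222"]  # hypothetical department numbers; replace with real ones

# capacityCheck.py settings, read from hypothetical environment variables
# instead of being typed into the source file.
Username = os.environ.get("METU_USERNAME", "")
Password = os.environ.get("METU_PASSWORD", "")


def is_offered_to(eligible_departments, my_dept=myDEPT):
    """Hypothetical helper: True if my_dept appears in a course's eligible departments."""
    wanted = my_dept.strip().upper()
    return any(d.strip().upper() == wanted for d in eligible_departments)


print(is_offered_to(["EE", "ceng ", "ME"]))  # True
```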

Running the code

Use the command below to scrape the current NTE list (it takes about 6 minutes):

python main.py >out2.txt

After out2.txt has been created, use the command below to check for new courses offered to your department:

python NewCourseAlarm.py
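
The comparison behind a new-course alarm can be sketched as a set difference between an old and a new course list. This is not NewCourseAlarm.py itself: it assumes out2.txt holds one course per line and keeps the previous run in a hypothetical previous.txt file.

```python
from pathlib import Path


def read_courses(path):
    """Read one course entry per non-empty line (assumed format of out2.txt)."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def new_courses(previous, current):
    """Courses present in the current scrape but not in the previous one, sorted."""
    return sorted(current - previous)


def check_and_update(current_path="out2.txt", snapshot_path="previous.txt"):
    previous = read_courses(snapshot_path)
    current = read_courses(current_path)
    added = new_courses(previous, current)
    for course in added:
        print("NEW:", course)
    # Save the current list so the next run only reports later additions.
    Path(snapshot_path).write_text("\n".join(sorted(current)) + "\n", encoding="utf-8")
    return added
```

On the first run there is no snapshot yet, so every course in out2.txt is reported as new.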

After out2.txt has been created, use the command below to check the capacities of the listed courses:

python capacityCheck.py

How it collects

capacityCheck.py simulates a user with Selenium.
The program first opens the course capacity section by entering the user's username and password. From then on, until every course in "out2.txt" has been checked, it answers captchas: the captcha image is sent to the provided CNN model, which solves it, and the result is sent back to the browser.
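
The loop described above can be sketched without a browser by passing the browser and model steps in as functions. The names fetch_captcha, solve_captcha and submit are hypothetical stand-ins for the Selenium and CNN parts, and both the retry on a rejected captcha and the reading of the result as remaining seats are assumptions, not a description of capacityCheck.py.

```python
from typing import Callable, Dict, Iterable, Optional


def check_capacities(
    courses: Iterable[str],
    fetch_captcha: Callable[[], bytes],         # hypothetical: grab captcha image from page
    solve_captcha: Callable[[bytes], str],      # hypothetical: run the CNN on the image
    submit: Callable[[str, str], Optional[int]],  # hypothetical: None if captcha rejected
    max_attempts: int = 5,
) -> Dict[str, Optional[int]]:
    """Query each course's remaining capacity, retrying with a fresh captcha on rejection."""
    results: Dict[str, Optional[int]] = {}
    for course in courses:
        results[course] = None  # stays None if every attempt is rejected
        for _ in range(max_attempts):
            answer = solve_captcha(fetch_captcha())
            capacity = submit(course, answer)
            if capacity is not None:
                results[course] = capacity
                break
    return results


def with_free_seats(results: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Keep only courses whose reported remaining capacity is greater than zero."""
    return {c: n for c, n in results.items() if n is not None and n > 0}
```

Keeping the Selenium and TensorFlow calls behind these parameters makes the control flow testable with fake functions.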

The GIF below shows capacityCheck.py while it is running.

metu-nte-scraper's People

Contributors

e-hengirmen

Forkers

enesizgi

metu-nte-scraper's Issues

Python versions and requirements

Since I spent quite a bit of time trying to get this up and running on my computer, I thought it would be a good idea to help others who are trying to run this software.
I think the easiest and cleanest way to handle the dependencies is to create a pyenv virtualenv. Even though the README doesn't specify a Python version, I can tell you that, after some trial and error, 3.7 and 3.11 are not compatible with some dependencies listed in requirements.txt. I tried 3.10 and it was compatible.
The second thing to note is that if you use a virtualenv to encapsulate packages, you should invoke the pip install command as-is, without sudo. I believe sudo pip install doesn't work because it tries to install the packages globally. Perhaps the same effect could be achieved with the --user flag, but I don't know Python or pip well enough to comment; just invoking pip install -r requirements.txt worked for me.
The third thing is that the webdriver_manager and selenium versions in requirements.txt seem to be incompatible with the code; I had to upgrade these packages to run main.py without issues. The versions I am currently running are 4.0.1 for webdriver_manager and 4.13.0 for selenium.
I couldn't try the new course alarm and capacity check scripts, but for people trying to use main.py on their own computers, these commands should cover what you need:
git clone https://github.com/e-hengirmen/metu-NTE-scraper.git
cd metu-NTE-scraper
pyenv install 3.10
pyenv virtualenv 3.10 nte # Create virtual environment called nte
pyenv activate nte # Activate the virtual environment
pip install -r requirements.txt
pip install webdriver_manager --upgrade
pip install selenium --upgrade
python main.py > out2.txt
pyenv deactivate nte # Deactivate the virtual environment
pyenv virtualenv-delete nte # Delete the virtual environment
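
To compare an environment with the versions reported in this issue, a small check along these lines could help. The version numbers are the ones reported above, not official requirements; the function names and status strings are hypothetical. It relies on importlib.metadata, which is available since Python 3.8.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this issue, not official requirements.
REPORTED_WORKING_PYTHON = (3, 10)
REPORTED_BROKEN_PYTHON = {(3, 7), (3, 11)}
REPORTED_PACKAGES = {"selenium": "4.13.0", "webdriver-manager": "4.0.1"}


def python_status(info=None):
    """Classify a Python version (defaults to the running one) against the reports."""
    major_minor = tuple((info or sys.version_info)[:2])
    if major_minor == REPORTED_WORKING_PYTHON:
        return "reported working"
    if major_minor in REPORTED_BROKEN_PYTHON:
        return "reported incompatible"
    return "untested"


def installed_version(dist_name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "->", python_status())
    for name, reported in REPORTED_PACKAGES.items():
        print(f"{name}: installed={installed_version(name)}, reported working={reported}")
```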

Use ML to solve captcha to improve capacity checker

The current way is too inefficient and slow.

  1. Collect samples (1000-10000 from the student.metu.edu.tr 158 capacity check)
  2. Use "text line extraction" methods to separate digits
  3. Use a basic MLP classifier
  4. Integrate with the capacity checker using the collected weights
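
Step 2 could be sketched with a vertical projection profile, a simple stand-in for the "text line extraction" the issue mentions: after the captcha is binarized, columns without ink separate the digits. This is an assumption, not the method used by Basic-number-captcha-solver, and it cannot split digits that touch each other. The function names are hypothetical.

```python
def column_profile(image):
    """Number of ink pixels (value 1) in each column of a binary image (list of rows)."""
    return [sum(col) for col in zip(*image)]


def segment_columns(image, min_width=1):
    """Split a binarized captcha into (start, end) column ranges, end exclusive.

    A segment is a maximal run of columns containing at least one ink pixel;
    runs narrower than min_width are discarded as noise.
    """
    segments = []
    start = None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None
    if start is not None and len(image[0]) - start >= min_width:
        segments.append((start, len(image[0])))
    return segments


def crop(image, segment):
    """Cut one digit's columns out of the image, e.g. to feed an MLP classifier."""
    start, end = segment
    return [row[start:end] for row in image]


example = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 1],
]
print(segment_columns(example))  # [(1, 3), (4, 7)]
```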

add course alarm

Add a course alarm that checks whether new classes are offered to you that weren't offered before.