
xmuseats's Introduction

Xiamen University Seats Usage Analysis Project

A project to collect seat usage data from the Xiamen University Library and run data analysis on it. This repository contains the web crawler code and the data I have collected. More information about the project, along with a Chinese edition, can be found on my blog.

Caution

This project moved to a new database schema on DEC 16, 2016. The new database is named libseat.db. The data collected from OCT 14, 2016 to DEC 15, 2016 is stored in seatState_old.db.

About the Crawler

The web crawler is written in Python 2.7.11 and has been tested on Windows 10, Ubuntu 16.04 and CentOS 7. It uses sqlite3 to store the collected data. Be aware that text may appear garbled when you open the database under Windows. The program is fully open source under the GPL v3 license.
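Since the data lives in a plain SQLite file, a quick way to see what a database contains is to list its tables before querying anything. The following is a minimal sketch, not part of the project's code: the helper name `list_tables` is hypothetical, and it makes no assumption about the actual schema because it only reads SQLite's built-in `sqlite_master` catalog.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables visible through an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Each row is a 1-tuple such as ('some_table',); unpack it to a plain string.
    return [name for (name,) in rows]


# Example usage (assumes libseat.db sits in the current directory):
#   conn = sqlite3.connect("libseat.db")
#   print(list_tables(conn))
#   conn.close()
```

Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result usually means the path is wrong rather than the database being empty.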

Dependencies

  • Requests 2.7.0
  • Beautiful Soup 4.3.2
  • lxml
  • sqlite3
  • time

About the Database

New Database: The new database is named libseat.db and covers data from DEC 16, 2016 until now. The new schema is much simpler and saves a lot of space, so I collect data every minute. It is updated every week.


Old Database: The old database is named seatState_old.db and covers data from OCT 14, 2016 to DEC 15, 2016 (collected every 2 minutes). This database will NOT be updated anymore.


Problems in the Data

For the following reasons, the data may be inaccurate, and you may need to clean it before analysis. For seatState_old.db, preprocess.py can do this work automatically.

  1. The library closes every Wednesday afternoon from 1 p.m. to 6 p.m., when students are not allowed to enter.
  2. The seat usage data becomes zero after 21:30 every day, not 22:00.
  3. You need to add 8 hours to the timestamps in the data from DEC 9, 2016 to DEC 11, 2016, and this data may be incomplete. (Sorry for my wrong VPS timezone settings.)
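The three rules above can be sketched as a small timestamp filter. This is an illustration only and is not the logic of preprocess.py. It is written for Python 3, the function names (`fix_timezone`, `is_usable`, `clean`) are hypothetical, and it assumes each record's timestamp has already been parsed into a naive `datetime`. It also assumes the DEC 9 to DEC 11 range refers to the timestamps as recorded, before the 8-hour correction.

```python
from datetime import datetime, time, timedelta

# Assumption: the affected range is measured on the recorded (uncorrected) timestamps.
TZ_FIX_START = datetime(2016, 12, 9)
TZ_FIX_END = datetime(2016, 12, 12)  # exclusive bound, so all of DEC 11 is covered


def fix_timezone(ts):
    """Rule 3: add 8 hours to timestamps recorded from DEC 9 to DEC 11, 2016."""
    if TZ_FIX_START <= ts < TZ_FIX_END:
        return ts + timedelta(hours=8)
    return ts


def is_usable(ts):
    """Return False for times when the readings are known to be unreliable."""
    t = ts.time()
    # Rule 1: library closed on Wednesday from 13:00 to 18:00 (weekday() == 2).
    if ts.weekday() == 2 and time(13, 0) <= t < time(18, 0):
        return False
    # Rule 2: seat counts drop to zero after 21:30.
    if t > time(21, 30):
        return False
    return True


def clean(timestamps):
    """Apply the timezone fix first, then drop unreliable timestamps."""
    fixed = (fix_timezone(ts) for ts in timestamps)
    return [ts for ts in fixed if is_usable(ts)]
```

The timezone fix runs before the filter so that the closing-time rules are checked against corrected local times. In practice you would carry the seat counts along with each timestamp; only the time logic is shown here.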

About the Data

I run this web crawler on my Virtual Private Server and collect data every minute (every 2 minutes in seatState_old.db). I share the data every Sunday. You can use it for data analysis if you credit the source.

If you have any advice or complaints, please feel free to contact me.

xmuseats's People

Contributors

smartjinyu
