cs172-twittersearch's Introduction

Twitter Search Engine

Licensing Information: MIT

Project source can be downloaded from https://github.com/khuan013/CS172-Crawler.git

Author & Contributor List

  • Kenneth Huang
  • Tien Tran

Overview

Part 1 Twitter Crawler (Python)

This application uses the Twitter Streaming API to collect geolocated tweets and stores them in text files of 10 MB each.
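The 10 MB file splitting described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `RotatingTweetWriter`, the `tweets_N.txt` naming scheme, and the reading of "10 MB" as 10 * 1024 * 1024 bytes are all assumptions, and the sketch takes plain text lines rather than Tweepy objects so that it runs without any third-party library.

```python
import os

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumed meaning of "10 MB"


class RotatingTweetWriter(object):
    """Write lines to numbered text files, starting a new file at a size cap."""

    def __init__(self, out_dir, max_bytes=MAX_FILE_BYTES):
        self.out_dir = out_dir
        self.max_bytes = max_bytes
        self.index = 0      # number used for the next file name
        self.written = 0    # bytes written to the current file
        self.handle = None
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)

    def _open_next(self):
        if self.handle is not None:
            self.handle.close()
        # Hypothetical naming scheme: tweets_0.txt, tweets_1.txt, ...
        # Note: "wb" overwrites an existing file with the same name.
        path = os.path.join(self.out_dir, "tweets_%d.txt" % self.index)
        self.handle = open(path, "wb")
        self.index += 1
        self.written = 0

    def write(self, line):
        data = (line.rstrip("\n") + "\n").encode("utf-8")
        # Rotate before a write that would push the current file past the cap.
        if self.handle is None or self.written + len(data) > self.max_bytes:
            self._open_next()
        self.handle.write(data)
        self.written += len(data)

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

In a real crawler, a stream callback would call `write()` once per received tweet and `close()` on shutdown.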

Instructions on how to deploy the system

In order to run the program, you must have:

  • Python 2.7
  • Tweepy Twitter API library installed.
  • lxml

Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git

On Unix/Linux, run the crawler.sh shell script, which executes the Python program. It takes two optional arguments: the number of tweets to collect (if the number is 0, the crawler continues until it has collected 5 GB of data) and the output directory name.

By default, the files are placed in /data and the number of tweets is not limited.

Examples:

  1. ./crawler.sh [num-tweets] [output-dir]
  2. ./crawler.sh [num-tweets]
  3. ./crawler.sh
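The way the two optional arguments and their defaults fit together can be sketched like this. It is an illustration only, not the project's code: the function names `parse_crawler_args` and `should_stop` are hypothetical, "5 GB" is assumed to mean 5 * 1024^3 bytes, and the sketch assumes the size cap also applies when a tweet limit is given, which the text does not state.

```python
DEFAULT_OUTPUT_DIR = "/data"       # default stated in the text
MAX_TOTAL_BYTES = 5 * 1024 ** 3    # assumed meaning of "5 GB"


def parse_crawler_args(argv):
    """Return (num_tweets, output_dir) from the positional arguments.

    A num_tweets value of 0 means the number of tweets is not limited.
    """
    num_tweets = int(argv[0]) if len(argv) >= 1 else 0
    output_dir = argv[1] if len(argv) >= 2 else DEFAULT_OUTPUT_DIR
    if num_tweets < 0:
        raise ValueError("num-tweets must be 0 or a positive integer")
    return num_tweets, output_dir


def should_stop(tweets_seen, bytes_written, num_tweets):
    """Stop at the tweet limit if one was given, and at the size cap in any case.

    Applying the size cap when a tweet limit is set is an assumption.
    """
    if num_tweets > 0 and tweets_seen >= num_tweets:
        return True
    return bytes_written >= MAX_TOTAL_BYTES
```

For instance, `parse_crawler_args(["500", "out"])` corresponds to the first example form, and `parse_crawler_args([])` to the third.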

Part 2 Indexing/Webpage (Java, JSP)

Instructions on how to deploy the system

In order to run the program you must have the following installed:

  • Eclipse for Java EE
  • Apache Tomcat version 7.0
  • Lucene version 3.7.2

  1. Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git
  2. Put MyLucene.java and MySearch.jsp into your Eclipse project directory.
  3. If you already have Twitter data, run MyLucene.java to create an index. Otherwise, first run the Python program twitterGeo.py; refer to the Part 1 documentation on how to use it.
  4. Once MyLucene.java finishes, it will have created a folder called testIndex. Put this folder in your Desktop directory.
  5. Run MySearch.jsp on the Tomcat server using Eclipse. This should bring up a webpage with a search bar.
