search-engine's Introduction

Description: This is a search engine written in C. 
It includes:

1. Crawler -- crawls web pages recursively, up to a specified depth
2. Indexer -- builds an inverted index that maps each word to the IDs of the documents containing it, along with the number of occurrences in each document
3. Query Engine -- command-line interface that processes queries and ranks the matching documents
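
To make the inverted index idea concrete, here is a minimal sketch in C. It is not the code from indexer.c or utils/index.c: the struct layouts and the function names index_add and index_freq are assumptions made for illustration, and a plain linked list stands in for whatever lookup structure the project actually uses (the tree lists utils/hash.c, so the real index is probably hash-based).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical posting: one document and how often the word occurs in it. */
typedef struct DocNode {
    int doc_id;
    int freq;
    struct DocNode *next;
} DocNode;

/* Hypothetical index entry: a word and its list of postings. */
typedef struct WordNode {
    char *word;
    DocNode *docs;
    struct WordNode *next;
} WordNode;

/* Record one occurrence of `word` in document `doc_id`.
   Returns 0 on success, -1 if memory allocation fails. */
int index_add(WordNode **head, const char *word, int doc_id) {
    WordNode *w = *head;
    while (w != NULL && strcmp(w->word, word) != 0)
        w = w->next;
    if (w == NULL) {                      /* first time this word is seen */
        w = malloc(sizeof *w);
        if (w == NULL) return -1;
        w->word = malloc(strlen(word) + 1);
        if (w->word == NULL) { free(w); return -1; }
        strcpy(w->word, word);
        w->docs = NULL;
        w->next = *head;
        *head = w;
    }
    DocNode *d = w->docs;
    while (d != NULL && d->doc_id != doc_id)
        d = d->next;
    if (d == NULL) {                      /* first occurrence in this document */
        d = malloc(sizeof *d);
        if (d == NULL) return -1;
        d->doc_id = doc_id;
        d->freq = 0;
        d->next = w->docs;
        w->docs = d;
    }
    d->freq++;
    return 0;
}

/* Look up how often `word` occurs in `doc_id`; 0 if it never does. */
int index_freq(const WordNode *head, const char *word, int doc_id) {
    for (const WordNode *w = head; w != NULL; w = w->next) {
        if (strcmp(w->word, word) != 0) continue;
        for (const DocNode *d = w->docs; d != NULL; d = d->next)
            if (d->doc_id == doc_id) return d->freq;
    }
    return 0;
}
```

Under these assumptions, the indexer would call index_add once per word per crawled page, and the query engine could rank documents for a word by comparing the counts that index_freq returns.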

How to build/test/clean:
* Run BATS_TSE.sh to build, test, and clean the crawler, indexer, and query engine
* To clean the crawler logs and the HTML data saved by the crawler, run "make cleanlog" in crawler_dir
* To clean the indexer logs, run "make cleanlog" in indexer_dir

Author: Delos Chang

Directory Structure:
.
├── BATS_TSE.sh
├── crawler_dir
│   ├── BATS.sh
│   ├── crawler
│   ├── crawler.c
│   ├── crawler.h
│   ├── data
│   ├── html.c
│   ├── html.h
│   ├── Makefile
│   ├── README
│   └── TESTING
├── indexer_dir
│   ├── BATS.sh
│   ├── documentation.pdf
│   ├── indexer
│   ├── indexer.c
│   ├── indexer.h
│   ├── Makefile
│   ├── README
│   └── TESTING
├── queryengine_dir
│   ├── Makefile
│   ├── queryengine.c
│   ├── queryengine.h
│   ├── queryengine_test.c
│   ├── querylogic.c
│   ├── querylogic.h
│   ├── README.md
│   └── TESTING
├── README
└── utils
    ├── file.c
    ├── file.h
    ├── hash.c
    ├── hash.h
    ├── header.h
    ├── index.c
    └── index.h

-- For Query Engine -- 
Functional Credit
1. To exit, type "!exit" and press Enter
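
As an illustration of how a command-line loop might recognize the "!exit" command, here is a minimal sketch. The function names is_exit_command and query_loop and the 1024-byte line buffer are assumptions for this example, not taken from queryengine.c.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` is exactly "!exit" (ignoring a trailing newline),
   0 otherwise. */
int is_exit_command(const char *line) {
    size_t len = strcspn(line, "\r\n");   /* length without the line ending */
    return len == 5 && strncmp(line, "!exit", 5) == 0;
}

/* Reads lines from `in` until "!exit" or end of input.
   Returns the number of query lines seen before exiting. */
int query_loop(FILE *in) {
    char line[1024];                      /* assumed maximum query length */
    int count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (is_exit_command(line))
            break;
        count++;                          /* a real engine would parse and rank here */
    }
    return count;
}
```

Calling query_loop(stdin) would keep reading queries until the user types "!exit" or closes the input stream.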

Refactoring Credit
1. Moved common definitions and macros into ../utils/header.h
2. Moved common index structs into ../utils/index.h
3. Moved the indexer BATS.sh diff test into BATS_TSE.sh
4. Moved idioms shared by the indexer and query engine into
   ../utils/index.c (e.g. creating a Document Node with X docId and Y
   frequency, getting a filepath, creating a WordNode)
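
For a sense of what a shared "getting a filepath" helper could look like, here is a small sketch. The name make_doc_path and the "<dir>/<docId>" file layout are assumptions for illustration; the actual helper in ../utils/index.c may differ.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<dir>/<doc_id>" into `buf`.
   Returns 0 on success, -1 if the path does not fit in `size` bytes. */
int make_doc_path(char *buf, size_t size, const char *dir, int doc_id) {
    int n = snprintf(buf, size, "%s/%d", dir, doc_id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping a helper like this in one place means the indexer and query engine agree on how a document ID maps to a file on disk.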
