
documentsimilarityapi's Introduction

This application consists of a DocumentSimilarityAPI and a Runner class.

In order to run the application, navigate to the directory that oop.jar is in, and run the command

java -cp ./oop.jar ie.gmit.sw.Runner

or the command

java -jar ./oop.jar

to launch the application. Java 1.8 or later must be installed.

This jar contains all additional required libraries (Guava and Jsoup).

It is possible to compare Documents created from files, from URLs (pages with HTML tags), and from plain text. The Jaccard index of the provided documents is computed and reported to the user.
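As a rough illustration of the measure being reported, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below is not the project's JaacardIndex class; `JaccardSketch`, `words` and `jaccard` are hypothetical names, and splitting on whitespace is an assumed stand-in for whatever shingling the application performs.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and word-splitting are assumptions.
final class JaccardSketch {

    // Turn text into a set of lower-cased words (a simple stand-in for shingling).
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Jaccard index: |A intersect B| / |A union B|.
    // Two empty sets are treated as identical (1.0) by convention in this sketch.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> a = words("the quick brown fox");
        Set<String> b = words("the lazy brown dog");
        // 2 shared words ("the", "brown") out of 6 distinct words overall.
        System.out.println(jaccard(a, b));
    }
}
```

A value of 1.0 means the two sets are identical and 0.0 means they share nothing.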

JavaDocs were generated for this project using the command

javadoc -sourcepath ./src -d ./docs -subpackages . -noqualifier all -private

I chose to add the -private flag as many important methods are private, and most of the interfaces have a single public method.

I added the -noqualifier flag to avoid cluttering the output with package-qualified standard library names.

#Design Decisions.

Immutability - I chose to make as many classes as possible immutable, and used Guava's immutable collections extensively.
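A minimal sketch of what an immutable class looks like in this style. `ImmutableDocumentSketch` and its fields are hypothetical and do not come from the project. To keep the example free of external dependencies it uses `Collections.unmodifiableList` over a defensive copy; in the project, Guava's `ImmutableList.copyOf` would play a similar role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; not part of the project.
final class ImmutableDocumentSketch {
    private final String name;
    private final List<String> lines;

    ImmutableDocumentSketch(String name, List<String> lines) {
        this.name = name;
        // Copy first so later changes to the caller's list cannot leak in,
        // then wrap so callers cannot modify the internal list.
        this.lines = Collections.unmodifiableList(new ArrayList<>(lines));
    }

    String name() {
        return name;
    }

    // Safe to return directly: the list rejects all mutation attempts.
    List<String> lines() {
        return lines;
    }
}
```

Because the class is final, has only final fields, and exposes no mutators, instances can be shared between threads without synchronization.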

I chose to use Futures instead of Runnables/Threads to make it easier to reason about and debug the code during development.
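The following sketch shows the general pattern, under the assumption that work is submitted to an `ExecutorService` and collected through `Future.get()`. `FutureSketch` and `countWords` are hypothetical names, and counting words is only a placeholder task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example; the task (word counting) is a placeholder.
final class FutureSketch {

    // Counts the words of each text on a worker thread and
    // returns the counts in the same order as the inputs.
    static List<Integer> countWords(List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Integer> task = () -> {
                    String trimmed = text.trim();
                    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
                };
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                // get() blocks until the result is ready; a failure inside
                // the task surfaces here as an ExecutionException.
                counts.add(future.get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task returns a value and reports its exceptions through the Future, so no shared mutable state is needed to collect results.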

I aimed to make the classes depend on interfaces (Document, Shinglizer, SimilarityIndex) rather than on concrete implementations. While the Shinglizer interface may not be strictly required, as there is only a single implementation, it would allow additional implementations to be written with no change to the JaacardIndex class.
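A sketch of how this dependency on interfaces might look. The interface names come from the project, but every method signature here (`text`, `shingles`, `similarity`) is an assumption, as are the classes `WordShinglizer`, `CharShinglizer` and `SetOverlapIndex`. Generalising the index to more than two documents as "shingles common to all, divided by shingles in any" is also an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumed single-method shapes; the real interfaces may differ.
interface Document { String text(); }
interface Shinglizer { Set<String> shingles(Document doc); }
interface SimilarityIndex { double similarity(List<Document> docs); }

// Hypothetical implementation: one shingle per whitespace-separated word.
final class WordShinglizer implements Shinglizer {
    public Set<String> shingles(Document doc) {
        Set<String> result = new HashSet<>();
        for (String w : doc.text().split("\\s+")) {
            if (!w.isEmpty()) result.add(w);
        }
        return result;
    }
}

// Hypothetical second implementation: overlapping runs of k characters.
final class CharShinglizer implements Shinglizer {
    private final int k;
    CharShinglizer(int k) { this.k = k; }
    public Set<String> shingles(Document doc) {
        Set<String> result = new HashSet<>();
        String text = doc.text();
        for (int i = 0; i + k <= text.length(); i++) {
            result.add(text.substring(i, i + k));
        }
        return result;
    }
}

// Depends only on the Shinglizer interface, so either implementation fits.
final class SetOverlapIndex implements SimilarityIndex {
    private final Shinglizer shinglizer;
    SetOverlapIndex(Shinglizer shinglizer) { this.shinglizer = shinglizer; }

    public double similarity(List<Document> docs) {
        if (docs.size() < 2) {
            throw new IllegalArgumentException("need at least 2 documents");
        }
        Set<String> intersection = new HashSet<>(shinglizer.shingles(docs.get(0)));
        Set<String> union = new HashSet<>(intersection);
        for (Document doc : docs.subList(1, docs.size())) {
            Set<String> s = shinglizer.shingles(doc);
            intersection.retainAll(s);
            union.addAll(s);
        }
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }
}
```

Swapping `new SetOverlapIndex(new WordShinglizer())` for `new SetOverlapIndex(new CharShinglizer(3))` changes how documents are shingled without touching the index class.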

#Extra features.

  1. You can create URLDocuments by choosing the relevant option in the menu. This uses the Jsoup library (https://jsoup.org/) to construct a URLDocument, which implements the Document interface, so these Documents can be compared with a SimilarityIndex.

  2. CachingSimilarityIndex - this is an implementation of the SimilarityIndex interface that caches any results that the object has already computed. It requires a SimilarityIndex instance to be passed into its constructor, and caches the results computed by that implementation.

  3. The SimilarityIndex interface takes a List of Documents, meaning that you can compare more than 2 documents. (Fewer than 2 will result in an IllegalArgumentException in the current implementations.)

  4. PlantUML (http://plantuml.com/) was used to generate the provided UML diagram. See the included UML.puml file.

  5. The UI class relies on a "User" implementation. This allows you to automate interaction with the UI class by providing a series of pre-set instructions in a file instead of manually interacting with the program, and to save the results to another file for examination. (This is more useful for testing than from a human user's perspective.)
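The caching wrapper described in item 2 can be sketched as a decorator. This is not the project's CachingSimilarityIndex: `CachingIndexSketch` is a hypothetical name, Document and SimilarityIndex are declared here with assumed single-method shapes, and keying the cache on the list of document texts is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed single-method shapes; the real interfaces may differ.
interface Document { String text(); }
interface SimilarityIndex { double similarity(List<Document> docs); }

// Wraps another SimilarityIndex and remembers the results it has computed.
final class CachingIndexSketch implements SimilarityIndex {
    private final SimilarityIndex delegate;
    private final Map<List<String>, Double> cache = new HashMap<>();

    CachingIndexSketch(SimilarityIndex delegate) {
        this.delegate = delegate;
    }

    // synchronized keeps the plain HashMap safe if several threads share the index.
    public synchronized double similarity(List<Document> docs) {
        // Assumption: two requests are "the same" when the document texts match in order.
        List<String> key = new ArrayList<>();
        for (Document doc : docs) {
            key.add(doc.text());
        }
        // Only calls the wrapped index when the key has not been seen before.
        return cache.computeIfAbsent(key, k -> delegate.similarity(docs));
    }
}
```

Because the wrapper itself implements SimilarityIndex, callers can use it anywhere a plain index is expected.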

UML Design created using PlantUML

