
androidsuiyue / documentplagiarismchecker


This project forked from fherstk/documentplagiarismchecker


Document Plagiarism Checker is an open-source C# project built on .NET Core 2.1 and developed for academic purposes only. It implements a really simple way to compare a set of documents against each other in order to check whether some of them are copies.

License: GNU Affero General Public License v3.0

C# 100.00%


Document Plagiarism Checker


Feel free to use, copy, fork or modify this project, but please mention this project and its author, and respect the licenses of the included third-party software.

Third party software and licenses:

Please note that this project would not be possible without the help of:

WARNING: still in an early development stage.

How to use it:

As a stand-alone console app:

Clone the repository to your local working directory, restore the dependencies with dotnet restore, build it with dotnet build and, finally, run the project with dotnet run.

If there is no settings.yaml file in the same folder as the program, some arguments must be set manually when calling the program; run dotnet run --info for further details or explore the settings.yaml file that comes with this project.

As a library:

Follow the same steps as for the stand-alone app, but import the compiled DocumentPlagiarismChecker.dll file into your project. Then invoke the CompareFiles method of the API object to get the results. You can also send them to an output with the WriteOutput method of the same API object:

Synchronous example

// Compare the files defined in the settings, then write the results to the output.
API api = new API();
api.CompareFiles();
api.WriteOutput();

Asynchronous example

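// Requires: using System.Threading.Tasks;
API api = new API();

// Run the comparison in the background and write the output once it finishes.
Task compare = Task.Run(() => 
    api.CompareFiles()
).ContinueWith((x) => {
    api.WriteOutput();
});

// Other work can run here while the comparison is in progress.

// Wait for the comparison and its output before leaving.
compare.Wait();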

Asynchronous example (with progress indicator)

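// Requires: using System; using System.Threading.Tasks;
API api = new API();

// Run the comparison in the background.
Task compare = Task.Run(() => 
    api.CompareFiles()
);

// Poll the progress once per second until the comparison is complete.
Task progress = Task.Run(() => {
    while(api.Progress < 1){
        Console.Write("\rLoading... {0:P2}", api.Progress);
        System.Threading.Thread.Sleep(1000);
    }

    // Print the final value so the indicator ends on the completed percentage.
    Console.Write("\rLoading... {0:P2}", api.Progress);
    Console.WriteLine();
    Console.WriteLine("Done! Printing results:");
});

progress.Wait();
compare.Wait();

// Write the results announced by the message above.
api.WriteOutput();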

Please note that all configuration is performed through the settings.yaml file located in the same path as the program, so if there is no such file, one must be loaded with Settings.Instance.Load(path); in order to proceed.

How to add new comparator:

New comparators will be added as the tool gains new capabilities but, if anyone wants to contribute or just code their own comparator, feel free to do so by following these steps:

  1. Copy the _template folder with all its content inside the Comparators folder.
  2. Rename the new folder with the name of your comparator.
  3. Correct the namespace of the copied files, replacing _template with the name of your comparator (it must match the name of the folder).
  4. Code both files following the indications; you can also use the current comparators as a guide.

List of comparators (marked ones are available, the others are under development):

  • Document Word Counter: compares two PDF files and checks which words appear in each document and how many times; useful for checking whether two documents are almost identical.
  • Paragraph Word Counter: compares two PDF files and checks how many paragraphs contain similar sentences; useful for checking whether two paragraphs are almost identical.
  • Paragraph Length Counter: compares two PDF files and checks how many paragraphs have the same length; useful for checking which parts of two documents could have been replaced by synonyms.
  • Document Image Counter: like the Document Word Counter, but with images.
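
To give a rough idea of what a word-count comparison like the Document Word Counter involves, here is a minimal, self-contained sketch. It is not the project's actual implementation: the names WordCounterSketch, CountWords and Similarity are hypothetical, the scoring formula (shared word occurrences divided by total word occurrences) is an assumption chosen for illustration, and the texts are assumed to be already extracted from the PDF files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only: WordCounterSketch, CountWords and Similarity are
// hypothetical names and are NOT part of the DocumentPlagiarismChecker API.
public static class WordCounterSketch
{
    // Counts how many times each word appears, ignoring case and basic punctuation.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // TryGetValue leaves 'current' at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }

        return counts;
    }

    // Returns a value between 0 and 1: the sum of the per-word minimum counts
    // divided by the sum of the per-word maximum counts (assumed formula).
    public static double Similarity(string left, string right)
    {
        Dictionary<string, int> a = CountWords(left);
        Dictionary<string, int> b = CountWords(right);

        int shared = 0, total = 0;
        foreach (string word in a.Keys.Union(b.Keys, StringComparer.OrdinalIgnoreCase))
        {
            a.TryGetValue(word, out int inLeft);
            b.TryGetValue(word, out int inRight);
            shared += Math.Min(inLeft, inRight);
            total += Math.Max(inLeft, inRight);
        }

        // Two empty texts are treated as having nothing in common.
        return total == 0 ? 0.0 : (double)shared / total;
    }
}
```

With this sketch, WordCounterSketch.Similarity("red red blue", "red blue blue") evaluates to 0.5, because each word shares one occurrence out of a maximum of two. A real comparator would compare such a score against the threshold values defined in the settings.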

Roadmap:

The full list of ideas and improvements can be found in the issues section (with the enhancements tag).

Changelog:

  • v0.6.0.0-alpha (20/12/2018):

    • New settings allow regular expressions as exclusion list items, so paragraphs matching those expressions are ignored by the Paragraph Word Counter comparator.
    • A new output format for the terminal output (Left file [matching %] -> Right file [matching %] -> Comparator [matching %]).
    • The Paragraph Word Counter has been split, producing a new comparator: the Paragraph Length Counter.
    • For further information see the full list of changes.
  • v0.5.0.0-alpha (09/12/2018):

    • A progress indicator has been added when running the app through the terminal.
    • A new parameter has been added to the settings (recursive) in order to set the file search method inside the given folder.
    • New parameters have been added to the settings to set the threshold values used to determine whether there is a match between two comparisons.
    • The Document and FileMatchingScore objects now store the full path of a file instead of just its name.
    • For further information see the full list of changes.
  • v0.4.0.0-alpha (06/12/2018):

    • A settings file has been added, so the console input arguments can be omitted if the mandatory settings are defined inside the YAML file; alternatively, the Settings.Instance.Set(setting, value) method (API) can be used. Note that settings file data will be overwritten if new information is sent through the console arguments or the API.
    • The output console has been improved, adding multi-level options, output colors and indentation.
  • v0.3.0.0-alpha (06/12/2018):

    • The "Paragraph Word Counter" has been added; it counts which words appear in each paragraph and how many times, also taking the paragraph's length into account when calculating the matching percentage.
    • The output console has been improved, adding multi-level options, output colors and indentation.
  • v0.2.0.0-alpha (03/12/2018):

    • A sample file can be used in order to exclude some data from the comparison.
  • v0.1.0.0-alpha (02/12/2018):

    • Initial release.

Contributors:

fherstk
