De-Duplication

Dealing with duplicate data stored on the server side

Nowadays a lot of duplicate data is uploaded to cloud storage services and servers. Several techniques already exist to deal with this duplication; one of them is de-duplication.

The first question, then, is: what is de-duplication?

In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency is dependent on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.[1]

Ref [1]: http://en.wikipedia.org/wiki/Data_deduplication
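
The chunk-and-reference process described above can be sketched in a few lines of Java. This is only an illustration, not the code from this repository: the class name ChunkDedupSketch, the fixed-size chunking, and the in-memory map are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of chunk-level deduplication (not repository code).
class ChunkDedupSketch {
    // Each unique chunk is stored once, keyed by the hex SHA-1 of its bytes.
    final Map<String, byte[]> chunkStore = new HashMap<>();

    // Splits data into fixed-size chunks, stores only chunks not seen before,
    // and returns the list of keys (references) that describes the data.
    List<String> store(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> refs = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, off, end);
            String key = sha1Hex(chunk);
            chunkStore.putIfAbsent(key, chunk); // a duplicate chunk adds nothing
            refs.add(key);
        }
        return refs;
    }

    // Rebuilds the original bytes by following the references in order.
    byte[] restore(List<String> refs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String key : refs) {
            byte[] chunk = chunkStore.get(key);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Hex-encoded SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

Storing the 14 bytes of "abcdabcdabcdxy" with a chunk size of 4 produces four references but only two stored chunks, since "abcd" repeats three times.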

I implemented deduplication in my own way. The implementation and its files are explained below.

Assumptions: (i) I generate a SHA-1 key from the byte stream, and I assume that if the SHA-1 keys of two objects are the same, the two objects are the same.

 (ii) The code is written in Java, so instead of "char* blobString"
      I have used "InputStream blobObject" in the functions.

      Hence I have implemented:
                   Integer put(InputStream blobObject);
                   InputStream get(Integer id);
                   void remove(Integer id);
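
As a rough sketch of how the two assumptions and the three functions fit together, the example below keeps everything in memory. It is not the repository's BlobStore.java: the class name InMemoryBlobStoreSketch, the fresh id handed out on every put, and the storedBlobCount helper are assumptions made for illustration, and the real code stores objects at a file path instead. Assumption (i) also deserves a caveat: SHA-1 collisions can be constructed deliberately, so equal keys are a safe signal for accidental duplicates but not for adversarial input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch; the real BlobStore.java may work differently.
class InMemoryBlobStoreSketch {
    // One stored copy per distinct content, plus the number of ids pointing at it.
    private static final class Entry {
        final byte[] bytes;
        int referenceCount;
        Entry(byte[] bytes) { this.bytes = bytes; }
    }

    private final Map<String, Entry> blobsBySha1 = new HashMap<>();
    private final Map<Integer, String> sha1ById = new HashMap<>();
    private int nextId = 1;

    // Reads the stream once, hashing and buffering it at the same time.
    Integer put(InputStream blobObject) throws IOException {
        MessageDigest md = newSha1();
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = blobObject.read(buffer)) != -1) {
            md.update(buffer, 0, n);   // feed the digest
            copy.write(buffer, 0, n);  // keep the bytes in case they are new
        }
        String sha1 = toHex(md.digest());
        // Equal SHA-1 keys are treated as equal objects (assumption (i)).
        Entry entry = blobsBySha1.computeIfAbsent(sha1, k -> new Entry(copy.toByteArray()));
        entry.referenceCount++;
        Integer id = nextId++;
        sha1ById.put(id, sha1);
        return id;
    }

    // Returns a stream over the stored bytes, or null for an unknown id.
    InputStream get(Integer id) {
        String sha1 = sha1ById.get(id);
        return sha1 == null ? null : new ByteArrayInputStream(blobsBySha1.get(sha1).bytes);
    }

    // Drops one reference; the bytes are freed when no id points at them.
    void remove(Integer id) {
        String sha1 = sha1ById.remove(id);
        if (sha1 == null) {
            return;
        }
        Entry entry = blobsBySha1.get(sha1);
        if (--entry.referenceCount == 0) {
            blobsBySha1.remove(sha1);
        }
    }

    // Number of distinct contents currently stored (helper for the example).
    int storedBlobCount() { return blobsBySha1.size(); }

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Putting the same bytes twice returns two different ids but keeps a single stored copy; that copy disappears only after both ids have been removed.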

Description of Classes:

   (i). BlobStore.java
          : Implements the get, put, and remove functions.

  (ii). BlobWrapper.java
          : A wrapper around a blob object. It contains sha1 (the SHA-1 key), objectPath (the file path),
            referenceCount (used during removal), and blobRemoveListner.

 (iii). BlobRemoveListner.java
          : Interface used to remove an object from the blob store once its referenceCount reaches zero.

  (iv). BlobRemoveConcreteListener.java
          : Implementation of BlobRemoveListner.java.

   (v). DeDuplicationDriverClass
          : Contains a few basic test cases.
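
The interplay between the wrapper's reference count and the remove listener might look roughly like the sketch below. The names RemoveListenerSketch, WrapperSketch, onUnreferenced, acquire, and release are hypothetical and chosen for this example; the actual method names and signatures in BlobWrapper.java and BlobRemoveListner.java are not shown in this README.

```java
// Hypothetical callback; the real BlobRemoveListner interface may differ.
interface RemoveListenerSketch {
    // Called once when the last reference to a blob has been released.
    void onUnreferenced(WrapperSketch blob);
}

// Hypothetical wrapper holding the fields described for BlobWrapper.java.
class WrapperSketch {
    final String sha1;        // SHA-1 key of the content
    final String objectPath;  // where the content is stored
    private int referenceCount;
    private final RemoveListenerSketch listener;

    WrapperSketch(String sha1, String objectPath, RemoveListenerSketch listener) {
        this.sha1 = sha1;
        this.objectPath = objectPath;
        this.listener = listener;
    }

    // Records one more id pointing at this content.
    void acquire() {
        referenceCount++;
    }

    // Drops one reference and notifies the listener when none are left.
    void release() {
        if (referenceCount == 0) {
            throw new IllegalStateException("no references to release");
        }
        if (--referenceCount == 0) {
            listener.onUnreferenced(this);
        }
    }

    int getReferenceCount() {
        return referenceCount;
    }
}
```

With this split, the wrapper only counts references, and the decision of what "remove" means (deleting a file, dropping a map entry) stays in the listener implementation.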

Contributors

rjsikarwar
