Deduplicate by Profile

This script was last tested in Nuix 9.0.

View the GitHub project here or download the latest release here.

Overview

This script creates an item set that is deduplicated using a custom MD5 digest. The digest for each item is generated from the concatenation of the values a MetadataProfile yields for that item.

When you create an item set through the Nuix API, you can supply a custom expression. This expression can be thought of as a function that receives an item and is expected to return a value relating to that item. The returned value is then used in place of the MD5 digest Nuix originally calculated for the item at the time of processing.

This script leverages the ProfileDigester class of SuperUtilities to generate a custom MD5 digest by using the concatenated values yielded by a provided metadata profile for each item.
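The idea can be sketched outside of Nuix in a few lines of Python. This is only an illustration under stated assumptions, not the script's actual implementation: items are modeled as plain dictionaries, a metadata profile as a list of field names, and the names `profile_digest` and `deduplicate` are hypothetical. UTF-8 encoding is also an assumption.

```python
import hashlib


def profile_digest(item, profile_fields):
    """Hypothetical helper: MD5 hex digest of the item's concatenated profile values."""
    # Missing fields contribute an empty string (an assumption of this sketch).
    concatenated = "".join(str(item.get(field, "")) for field in profile_fields)
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()


def deduplicate(items, profile_fields):
    """Split items into originals (first seen per digest) and duplicates."""
    seen = set()
    originals, duplicates = [], []
    for item in items:
        digest = profile_digest(item, profile_fields)
        if digest in seen:
            duplicates.append(item)
        else:
            seen.add(digest)
            originals.append(item)
    return originals, duplicates


items = [
    {"name": "report.docx", "author": "alice", "size": 100},
    {"name": "report.docx", "author": "alice", "size": 250},
    {"name": "notes.txt", "author": "bob", "size": 100},
]

# Only "name" and "author" are in the profile, so "size" is ignored
# and the second item counts as a duplicate of the first.
originals, duplicates = deduplicate(items, ["name", "author"])
```

Because only the profile's fields feed the digest, two items that differ in other properties can still be treated as duplicates, which is the point of deduplicating by profile.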

Technical Note: A concatenation of values is not actually used. Instead, for a given item, each metadata profile field is evaluated against that item, and each resulting string value is converted to a byte array and used in turn to update the digest. See the code for ProfileDigester.generateMd5Bytes for more detail. Effectively, this should produce the same result as digesting the concatenation of the fields, but with potentially lower resource overhead.
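The equivalence described in this note can be checked with Python's standard `hashlib` module. This is a minimal sketch rather than the ProfileDigester code itself; the sample values and the UTF-8 encoding are assumptions made for illustration.

```python
import hashlib

# Hypothetical field values yielded by a metadata profile for one item.
values = ["report.docx", "alice", "2021-03-04"]

# Approach 1: build the concatenated string, then digest it in one call.
concatenated = hashlib.md5("".join(values).encode("utf-8")).hexdigest()

# Approach 2: feed each value to the digest in turn, never building
# the full concatenated string in memory.
incremental = hashlib.md5()
for value in values:
    incremental.update(value.encode("utf-8"))

# Both approaches digest the same byte sequence, so the results match.
assert incremental.hexdigest() == concatenated
```

The incremental form avoids allocating one large intermediate string per item, which is where the potentially lower resource overhead comes from.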

Getting Started

Setup

Begin by downloading the latest release of this code. Extract the contents of the archive into your Nuix scripts directory. On Windows, the script directory is likely one of the following:

  • %appdata%\Nuix\Scripts - User level script directory
  • %programdata%\Nuix\Scripts - System level script directory

Usage

  1. Choose the metadata profile to use. The fields present in the selected metadata profile dictate which values are used to generate the custom MD5 digest.
  2. Choose whether the item's content text should be included when generating the custom MD5 digest.
  3. Choose the name of the item set. If no item set with the provided name exists, one will be created. If an item set with the given name already exists, items will be added to it. Important: Adding items to an existing item set that was previously populated by any other means, or by this script with different settings (such as a different metadata profile), will produce undefined results. If adding to an existing item set, make sure to use this script and the same settings each time!
  4. Choose whether deduplication is performed per item or per family.
  5. Choose whether the custom MD5 digest is recorded onto the item as custom metadata.
  6. Choose the name of the custom metadata field to record the custom MD5 digest into.
  7. Choose whether to use existing values in the custom metadata field if they are present. See below if you wish to enable this setting!

If items are selected in the result view when the script is run, those selected items will be added to the designated item set. If no items are selected when the script is run, all items in the case will be added.

Use Existing Value if Present

Please read this section thoroughly if using this setting!

When checked, the code that would normally generate the customized digest for an item during item set creation will first check whether the item already has a value stored in the specified custom metadata field. If the item has a custom metadata field with the name specified for the Digest Custom Metadata Field setting, the script will further check:

  • Is the value of the field non-null?
  • Is the value a String?
  • Is the value something other than empty or whitespace-only?

If the value passes all of these checks, then the existing value will be used rather than generating the value from the profile.
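The checks above can be sketched as a small Python function. This is an illustration only, assuming an item's custom metadata is available as a dictionary; the function name `usable_existing_digest` and the field name `Custom MD5` are hypothetical and not taken from the script.

```python
def usable_existing_digest(custom_metadata, field_name):
    """Return the stored digest if it passes all checks, otherwise None."""
    value = custom_metadata.get(field_name)
    if value is None:
        # Field is absent or holds a null value.
        return None
    if not isinstance(value, str):
        # Field holds something other than a string (a number, a date, ...).
        return None
    if value.strip() == "":
        # Field is empty or contains only whitespace.
        return None
    return value


stored = usable_existing_digest(
    {"Custom MD5": "9e107d9d372bb6826bd81d3542a419d6"}, "Custom MD5"
)
blank = usable_existing_digest({"Custom MD5": "   "}, "Custom MD5")
```

When the function returns None, the caller would fall back to generating the digest from the metadata profile.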

Care should be taken when using this setting! The script has no way to know whether the digest value present in the field was generated by this script, or whether it was generated using the same metadata profile! It is very possible to run the script once with profile A on some items, then run it again later with profile B (or even a modified profile A) on some of those same items, in which case incompatible digests could be used!
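The risk can be illustrated with a short Python sketch. It assumes items are plain dictionaries and a profile is a list of field names; `profile_digest`, the field names, and the sample values are all hypothetical.

```python
import hashlib


def profile_digest(item, profile_fields):
    """Hypothetical helper: MD5 hex digest over the item's profile field values."""
    md5 = hashlib.md5()
    for field in profile_fields:
        md5.update(str(item.get(field, "")).encode("utf-8"))
    return md5.hexdigest()


item = {"subject": "Q1 report", "from": "alice@example.com"}
copy_of_item = dict(item)  # an exact duplicate of the first item

profile_a = ["subject"]
profile_b = ["subject", "from"]

# First run: the digest is generated with profile A and recorded on the item.
stored_digest = profile_digest(item, profile_a)

# Later run: the duplicate has no recorded value, so its digest is
# generated fresh with profile B.
fresh_digest = profile_digest(copy_of_item, profile_b)

# The two digests were built from different inputs, so the items are
# not recognized as duplicates even though they are identical.
digests_match = stored_digest == fresh_digest
```

Here `digests_match` is False, whereas digesting both items with the same profile would have produced equal values.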

Cloning this Repository

This script relies on code from Nx to present a settings dialog and progress dialog. This JAR file is not included in the repository (although it is included in release downloads). If you clone this repository, you will also want to obtain a copy of Nx.jar by either:

  1. Building it from the source
  2. Downloading an already built JAR file from the Nx releases

Once you have a copy of Nx.jar, make sure to include it in the same directory as the script.

This script also relies on code from SuperUtilities. This JAR file is not included in the repository (although it is included in release downloads). If you clone this repository, you will also want to obtain a copy of SuperUtilities.jar by either:

  1. Building it from the source
  2. Downloading an already built JAR file from the SuperUtilities releases

Once you also have a copy of SuperUtilities.jar, make sure to include it in the same directory as the script.

License

Copyright 2021 Nuix

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
