DLTClust

A tool to derive candidate shared ledger combinations for industry ecosystems.

Multi-ledger designs are critical in industry ecosystems based on Distributed Ledger Technology (DLT), including blockchain, because they balance two conflicting demands: making data transparent to improve integrity, and hiding commercially sensitive data. DLTClust can be used to answer the following questions in multi-ledger designs:

  • What parties should share which ledgers?
  • What data should be on those ledgers?

Given a matrix whose elements represent parties' interest in data generated by other parties, DLTClust can determine ledger membership while minimizing data exposure and the number of ledgers. It captures the following party-party relationships:

  1. Clusters - Well-connected set of parties or party-data relationships, e.g., consortium
  2. Busses - Parties that read/write data from/to all other parties, e.g., a supply chain integrator
  3. Sinks - Parties that read data from all other parties, e.g., regulator
  4. Sources - Parties that write data to all other parties, e.g., oracle
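As a rough illustration of the bus, sink, and source patterns above, the sketch below labels parties in a small binary party-party matrix. It is not how DLTClust works internally (the tool uses genetic-algorithm clustering); the function name `classify_parties`, the party labels, and the convention that entry `[i][j] == 1` means party i reads data written by party j are all assumptions made for this example.

```python
from typing import Dict, List


def classify_parties(labels: List[str], dsm: List[List[int]]) -> Dict[str, str]:
    """Label each party as bus, sink, source, or other.

    Assumption: dsm[i][j] == 1 means party i reads data written by party j.
    Diagonal entries are ignored.
    """
    n = len(labels)
    roles = {}
    for i, label in enumerate(labels):
        # Full row (off-diagonal): this party reads from every other party.
        reads_all = all(dsm[i][j] for j in range(n) if j != i)
        # Full column (off-diagonal): every other party reads from this party.
        read_by_all = all(dsm[j][i] for j in range(n) if j != i)
        if reads_all and read_by_all:
            roles[label] = "bus"
        elif reads_all:
            roles[label] = "sink"
        elif read_by_all:
            roles[label] = "source"
        else:
            roles[label] = "other"
    return roles


# Hypothetical four-party ecosystem.
labels = ["Integrator", "Regulator", "Oracle", "Supplier"]
dsm = [
    [0, 1, 1, 1],  # Integrator reads from everyone
    [1, 0, 1, 1],  # Regulator reads from everyone
    [1, 0, 0, 0],  # Oracle reads only from the Integrator
    [1, 0, 1, 0],  # Supplier reads from the Integrator and the Oracle
]
roles = classify_parties(labels, dsm)
print(roles)
```

Here the Integrator has a full row and a full column, the Regulator has only a full row, and the Oracle has only a full column.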

Alternatively, given a matrix where elements represent parties' interest in specific data elements, DLTClust can ascertain which party and data element should be placed on what ledger while minimizing data exposure and number of ledgers.

Party-party and party-data relationships should be encoded as a binary Design Structure Matrix (DSM) and a binary Domain Mapping Matrix (DMM), respectively. Given a DSM or DMM, DLTClust uses extended versions of the following clustering algorithms to identify DLT-ecosystem-specific party-party and party-data relationships:

  1. Tian-Li Yu, Ali A. Yassine, and David E. Goldberg, "An information theoretic method for developing modular architectures using genetic algorithms" Res Eng Design, 18:91-109, 2007, DOI 10.1007/s00163-007-0030-1
  2. McCormick, William T; Schweitzer, Paul J; and White, Thomas W, "Problem Decomposition and Data Reorganization by a Clustering Technique," Operations Research, 20(5), Sep. 1972, 993-1009

[2] is sometimes used to improve the layout of the clustered representation.

Paper

See our paper H.M.N. Dilum Bandara, Mark Staples, and Sidra Malik, "Designing for Shared Ledgers in Industry Ecosystems", URL for more details #TODO

How to Use

Tested on Python 3.8.7

Configuration parameters

Set the following configuration values in config.ini (if unsure, start with the default values from Yu et al.):

  • alpha - Type I error weight
  • beta - Type II error weight
  • population_size - Initial population size
  • offspring_size - Number of offspring to generate
  • p_c - Crossover probability
  • p_m - Mutation probability
  • generation_limit - Number of generation cycles to try
  • generation_limit_without_improvement - Stop if this many consecutive generations show no improvement
  • cluster_can_have_read_only_elements - Can a square cluster have read-only elements/parties?
  • cluster_can_have_partial_bus - Can a bus have only a subset of its rows and columns filled?
  • cluster_can_have_partial_sink - Can a sink have only a subset of its columns filled?
  • cluster_can_have_partial_source - Can a source have only a subset of its rows filled?
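The parameters above can be read with Python's standard `configparser`, as in the minimal sketch below. The section name `[GA]`, the helper `load_params`, and the sample values for `p_c`, `p_m`, `generation_limit`, and the four boolean flags are placeholders chosen for illustration; check the config.ini shipped with the tool for the actual layout and defaults.

```python
import configparser

# Hypothetical config.ini content; the section name and several values are placeholders.
SAMPLE_CONFIG = """
[GA]
alpha = 0.8116
beta = 0.1102
population_size = 3000
offspring_size = 3000
p_c = 0.5
p_m = 0.05
generation_limit = 1000
generation_limit_without_improvement = 50
cluster_can_have_read_only_elements = yes
cluster_can_have_partial_bus = no
cluster_can_have_partial_sink = no
cluster_can_have_partial_source = no
"""

FLOAT_KEYS = ["alpha", "beta", "p_c", "p_m"]
INT_KEYS = [
    "population_size",
    "offspring_size",
    "generation_limit",
    "generation_limit_without_improvement",
]
BOOL_KEYS = [
    "cluster_can_have_read_only_elements",
    "cluster_can_have_partial_bus",
    "cluster_can_have_partial_sink",
    "cluster_can_have_partial_source",
]


def load_params(text: str) -> dict:
    """Parse the configuration text into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[parser.sections()[0]]  # first section, whatever it is called
    params = {}
    for key in FLOAT_KEYS:
        params[key] = section.getfloat(key)
    for key in INT_KEYS:
        params[key] = section.getint(key)
    for key in BOOL_KEYS:
        params[key] = section.getboolean(key)
    # Both error weights are fractions of a total weight of 1.
    if not 0.0 <= params["alpha"] + params["beta"] <= 1.0:
        raise ValueError("alpha + beta must lie between 0 and 1")
    return params


params = load_params(SAMPLE_CONFIG)
print(params["population_size"], params["cluster_can_have_partial_bus"])
```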

Typical values from Yu et al. [1]:

  • Population size = 3000
  • Offspring size = 3000
  • Crossover probability p_c = 1/chromosome-length, 0.5, or 1
  • Mutation probability p_m = 1/chromosome-length
  • Generation limit = Not specified
  • Generation limit without improvement = 50
  • Alpha (Type I error weight) = 0.8116
  • Beta (Type II error weight) = 0.1102
  • 1 - Alpha - Beta = 0.0784

If the number of clusters (n_c) is not decided, a typical value could be 1/2 n_n.
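The rules of thumb above can be turned into concrete numbers once a chromosome length is known. The sketch below assumes that n_n is the number of rows/columns in the matrix and that a chromosome is a binary string of length n_c × n_n (one bit per cluster-node pair); both are assumptions made for this example, and `suggest_ga_settings` is a hypothetical helper, not part of DLTClust.

```python
def suggest_ga_settings(n_n: int) -> dict:
    """Derive starting values from the rules of thumb listed above.

    Assumptions: n_n is the number of nodes in the matrix, and the
    chromosome has one bit per (cluster, node) pair.
    """
    n_c = max(1, n_n // 2)            # typical cluster count: half of n_n
    chromosome_length = n_c * n_n     # assumed encoding: n_c x n_n bits
    return {
        "n_c": n_c,
        "chromosome_length": chromosome_length,
        "p_m": 1.0 / chromosome_length,
        # Crossover options listed above: 1/chromosome-length, 0.5, or 1.
        "p_c_options": [1.0 / chromosome_length, 0.5, 1.0],
    }


settings = suggest_ga_settings(10)
print(settings)
```

For a 10-node matrix this gives 5 clusters, a chromosome of 50 bits, and a mutation probability of 1/50.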

Input File Format

DSM and DMM files should be in CSV format. A DSM file should have the following format:

  • List of labels starting with a comma to indicate an empty cell, e.g., ,C,D,E,A,B
  • Keep diagonals empty
  • From row 2 onwards, the first value in each row must be a label
  • Use 1, x, or X to indicate the presence of a relationship
  • Use 0, o, O, or empty cell to indicate the absence of a relationship
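The format rules above can be checked with a small parser before running the tool. The sketch below is an independent illustration, not DLTClust's own loader; `parse_matrix` and the three-party sample are hypothetical.

```python
import csv
import io

PRESENT = {"1", "x", "X"}     # markers for an existing relationship
ABSENT = {"0", "o", "O", ""}  # markers for no relationship (including empty cells)

SAMPLE_DSM = """\
,C,D,E
C,,1,0
D,x,,O
E,,X,
"""


def parse_matrix(text):
    """Return (row_labels, column_labels, binary_matrix) from CSV text."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # The header row starts with an empty cell, followed by the column labels.
    column_labels = [cell.strip() for cell in rows[0][1:]]
    row_labels, matrix = [], []
    for row in rows[1:]:
        row_labels.append(row[0].strip())
        cells = [cell.strip() for cell in row[1:]]
        # Treat missing trailing cells as empty, i.e. no relationship.
        cells += [""] * (len(column_labels) - len(cells))
        values = []
        for cell in cells:
            if cell in PRESENT:
                values.append(1)
            elif cell in ABSENT:
                values.append(0)
            else:
                raise ValueError(f"Unexpected cell value {cell!r} in row {row[0]!r}")
        matrix.append(values)
    return row_labels, column_labels, matrix


row_labels, column_labels, matrix = parse_matrix(SAMPLE_DSM)
print(row_labels, column_labels, matrix)
```

The same function also accepts a DMM, since it does not require the row and column labels to match.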

A DMM file should have the following format:

  • Data must be in rows and parties in columns (this orientation is an assumption made by the tool).
  • List of party labels starting with a comma to indicate an empty cell, e.g., ,P2,P1,P3,P5,P4
  • From row 2 onwards, the first value in each row must be a data label
  • Use 1, x, or X to indicate the presence of a relationship
  • Use 0, o, O, or empty cell to indicate the absence of a relationship
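A DMM in this layout can also be generated from a simple mapping of data elements to interested parties. The sketch below is a hypothetical helper (`build_dmm_csv`) with made-up party and data labels; it only illustrates the file layout described above.

```python
import csv
import io


def build_dmm_csv(interest: dict, parties: list) -> str:
    """Write a DMM with data elements in rows and parties in columns.

    `interest` maps each data label to the set of parties interested in it.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: an empty first cell, then the party labels.
    writer.writerow([""] + parties)
    for data_label, interested in interest.items():
        cells = ["1" if party in interested else "0" for party in parties]
        writer.writerow([data_label] + cells)
    return buffer.getvalue()


# Hypothetical example: two data elements shared among three parties.
parties = ["P1", "P2", "P3"]
interest = {
    "invoice": {"P1", "P2"},
    "shipment": {"P2", "P3"},
}
dmm_text = build_dmm_csv(interest, parties)
print(dmm_text)
```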

Sample input DSM and DMM files are provided. The DSMs and DMMs from our paper can be found in the DSMs_DMMs_from_paper folder.

Command-line Arguments

Set the following command-line arguments (some are optional; default values are used when they are not provided):

  • -b, --busses - Number of busses to generate (integer). Optional (default is 0)
  • -c, --clusters - Number of square clusters to generate (integer). Compulsory
  • -i, --input - Input matrix file to cluster (DSM or DMM). It should be a CSV file in the given format (see example). Optional (default is dsm.csv)
  • -o, --output - Output matrix file name. Optional (default is clusters.csv)
  • -p, --params - Config file with parameters. Optional (default is config.ini)
  • -u, --sources - Number of sources (aka writers) to generate (integer). Optional (default is 0)
  • -r, --seed - Random seed (integer). Optional (default is 123)
  • -s, --Sinks - Number of sinks (aka readers) to generate (integer). Optional (default is 0)
  • -t, --type - Type of matrix to cluster, i.e., DSM or DMM. Optional (default is DSM)

Examples

DSM Clustering

To cluster the sample DSM use the following command:

python3 DLTClust -c 2

Use the following command to cluster one of the example DSMs from the paper to get up to 4 square clusters, a data bus, sink, and source while setting the random seed to 123:

python3 DLTClust -c 4 -b 1 -u 1 -s 1 -r 123 -i DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv

DMM Clustering

To cluster the sample DMM use the following command:

python3 DLTClust -c 2 -i DLTClust/dmm.csv -t DMM

Use the following command to cluster one of the example DMMs from the paper to get up to 5 clusters while setting the random seed to 999:

python3 DLTClust -c 5 -r 999 -i DLTClust/DSMs_DMMs_from_paper/DMM_BCDFI.csv -t DMM
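Because the clustering relies on a genetic algorithm and the -r option fixes the random seed, it can be useful to repeat a run with several seeds and compare the results. The sketch below only builds the command lines, using the short options documented above; `build_command` is a hypothetical helper, and the extra seeds 124 and 125 are arbitrary.

```python
import shlex


def build_command(clusters, busses=0, sources=0, sinks=0,
                  seed=None, input_file=None, matrix_type=None):
    """Assemble a DLTClust command line from the documented short options."""
    args = ["python3", "DLTClust", "-c", str(clusters)]
    if busses:
        args += ["-b", str(busses)]
    if sources:
        args += ["-u", str(sources)]
    if sinks:
        args += ["-s", str(sinks)]
    if seed is not None:
        args += ["-r", str(seed)]
    if input_file:
        args += ["-i", input_file]
    if matrix_type:
        args += ["-t", matrix_type]
    return shlex.join(args)  # quotes arguments where the shell needs it


dsm_file = "DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv"
commands = [
    build_command(4, busses=1, sources=1, sinks=1, seed=seed, input_file=dsm_file)
    for seed in (123, 124, 125)
]
for command in commands:
    print(command)
```

Each printed line can be run from the shell, or passed to `subprocess.run` after splitting it with `shlex.split`.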

Unit Tests

python3 -m unittest discover DLTClust

This project uses pylint for linting and code quality control.

License

This software is released under the CSIRO Open Source Software License Agreement. Details can be found at LICENSE.
