DLTClust

A tool to derive candidate shared ledger combinations for industry ecosystems.

Multi-ledger designs are critical in industry ecosystems based on Distributed Ledger Technology (DLT), including blockchain, because they balance two conflicting demands: data transparency for improved integrity, and the need to hide commercially sensitive data. DLTClust can be used to answer the following questions in multi-ledger designs:

What parties should share which ledgers?
What data should be on those ledgers?

Given a matrix where elements represent parties' interest in data generated by other parties, DLTClust can ascertain ledger membership while minimizing data exposure and number of ledgers. It captures the following party-party relationships:

Clusters - Well-connected sets of parties or party-data relationships, e.g., a consortium
Busses - Parties that read/write data from/to all other parties, e.g., a supply chain integrator
Sinks - Parties that read data from all other parties, e.g., a regulator
Sources - Parties that write data to all other parties, e.g., an oracle
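
The three non-cluster roles can be read off a binary matrix by looking for full rows and columns. The sketch below is only an illustration of the definitions above, not DLTClust's clustering code. It assumes that rows are writers and columns are readers (so a sink fills a column and a source fills a row, in line with the partial sink and partial source options listed under Configuration parameters); the function name classify_parties and the party labels are hypothetical.

```python
def classify_parties(labels, matrix):
    """Label each party as 'bus', 'sink', 'source', or None.

    Assumption: matrix[i][j] == 1 means party j reads data written by
    party i. Diagonal cells are ignored.
    """
    n = len(labels)
    roles = {}
    for k, label in enumerate(labels):
        others = [m for m in range(n) if m != k]
        writes_to_all = all(matrix[k][m] == 1 for m in others)   # full row
        reads_from_all = all(matrix[m][k] == 1 for m in others)  # full column
        if writes_to_all and reads_from_all:
            roles[label] = "bus"
        elif reads_from_all:
            roles[label] = "sink"
        elif writes_to_all:
            roles[label] = "source"
        else:
            roles[label] = None
    return roles


labels = ["Integrator", "Regulator", "Oracle", "Supplier"]
matrix = [
    [0, 1, 1, 1],  # Integrator writes to every other party
    [1, 0, 0, 0],  # Regulator writes to the Integrator only
    [1, 1, 0, 1],  # Oracle writes to every other party
    [1, 1, 0, 0],  # Supplier writes to the Integrator and the Regulator
]
roles = classify_parties(labels, matrix)
print(roles)
```

The strict full-row and full-column tests correspond to leaving the partial bus, sink, and source options switched off.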

Alternatively, given a matrix where elements represent parties' interest in specific data elements, DLTClust can ascertain which party and data element should be placed on what ledger while minimizing data exposure and number of ledgers.

Party-party and party-data relationships should be encoded as a binary Design Structure Matrix (DSM) and a binary Domain Mapping Matrix (DMM), respectively. Given a DSM or DMM, DLTClust uses extended versions of the following clustering algorithms to identify DLT-ecosystem-specific party-party and party-data relationships:

[1] Tian-Li Yu, Ali A. Yassine, and David E. Goldberg, "An information theoretic method for developing modular architectures using genetic algorithms," Res Eng Design, 18:91-109, 2007, DOI 10.1007/s00163-007-0030-1
[2] William T. McCormick, Paul J. Schweitzer, and Thomas W. White, "Problem Decomposition and Data Reorganization by a Clustering Technique," Operations Research, 20(5):993-1009, Sep. 1972

[2] is sometimes used to improve the layout of the clustered representation.
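
For intuition on [2]: the bond energy algorithm of McCormick et al. permutes rows and columns so that non-zero cells end up next to each other, and scores a layout with a measure of effectiveness that sums the products of adjacent cells. The sketch below computes only that score for a given layout. It is not DLTClust's extended version, and the function name is hypothetical.

```python
def measure_of_effectiveness(matrix):
    """Sum of products of horizontally and vertically adjacent cells.

    Each adjacent pair is counted once; cells outside the matrix count as 0.
    """
    rows, cols = len(matrix), len(matrix[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                total += matrix[i][j] * matrix[i][j + 1]  # right neighbour
            if i + 1 < rows:
                total += matrix[i][j] * matrix[i + 1][j]  # neighbour below
    return total


# The same relationships in two layouts: "grouped" is "scattered" with
# rows and columns reordered as 0, 2, 1, 3.
scattered = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
grouped = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(measure_of_effectiveness(scattered))  # 0
print(measure_of_effectiveness(grouped))    # 8
```

A higher score means the filled cells form tighter blocks, which is why the reordering makes clusters easier to see.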

Paper

See our paper H.M.N. Dilum Bandara, Mark Staples, and Sidra Malik, "Designing for Shared Ledgers in Industry Ecosystems", URL for more details #TODO

How to Use

Tested on Python 3.8.7

Configuration parameters

Set the following configuration values in config.ini (if unsure, start with the default values from Yu et al. [1]):

alpha - Type I error weight
beta - Type II error weight
population_size - Initial population size
offspring_size - Number of offspring to generate
p_c - Crossover probability
p_m - Mutation probability
generation_limit - Number of generation cycles to try
generation_limit_without_improvement - Stop if this many consecutive generations show no improvement
cluster_can_have_read_only_elements - Can a square cluster have read-only elements/parties?
cluster_can_have_partial_bus - Can a bus have a subset of rows and columns filled?
cluster_can_have_partial_sink - Can a sink have a subset of columns filled?
cluster_can_have_partial_source - Can a source have a subset of rows filled?

Typical values from Yu et al. [1]:

Population size = 3000
Offspring size = 3000
Crossover probability p_c = 1/chromosome-length, 0.5, or 1
Mutation probability p_m = 1/chromosome-length
Generation limit = Not specified
Generation limit without improvement = 50
Alpha (Type I error weight) = 0.8116
Beta (Type II error weight) = 0.1102
1 - Alpha - Beta = 0.0784

If the number of clusters (n_c) is not decided, a typical value could be half the number of nodes (1/2 n_n).
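
As a rough sketch of how these values might be laid out and read, the snippet below parses a config.ini-style text with Python's standard configparser. The key names come from the parameter list above, but the section name [GA], the true/false spelling of the flags, and the generation_limit and p_m values are assumptions made for illustration; check the config.ini shipped with the tool for the actual layout.

```python
import configparser

# Hypothetical file contents (see the assumptions named in the lead-in).
SAMPLE_CONFIG = """
[GA]
alpha = 0.8116
beta = 0.1102
population_size = 3000
offspring_size = 3000
p_c = 0.5
# 1/chromosome-length for an assumed chromosome length of 50
p_m = 0.02
# Yu et al. do not specify a limit; 1000 is a placeholder
generation_limit = 1000
generation_limit_without_improvement = 50
cluster_can_have_read_only_elements = true
cluster_can_have_partial_bus = false
cluster_can_have_partial_sink = false
cluster_can_have_partial_source = false
"""

FLOAT_KEYS = ("alpha", "beta", "p_c", "p_m")
INT_KEYS = ("population_size", "offspring_size", "generation_limit",
            "generation_limit_without_improvement")
BOOL_KEYS = ("cluster_can_have_read_only_elements", "cluster_can_have_partial_bus",
             "cluster_can_have_partial_sink", "cluster_can_have_partial_source")


def load_params(text, section_name="GA"):
    """Parse the text and return a dict of typed parameter values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[section_name]
    params = {key: section.getfloat(key) for key in FLOAT_KEYS}
    params.update({key: section.getint(key) for key in INT_KEYS})
    params.update({key: section.getboolean(key) for key in BOOL_KEYS})
    # The two error weights must leave a non-negative share for 1 - alpha - beta.
    if not 0.0 <= params["alpha"] + params["beta"] <= 1.0:
        raise ValueError("alpha + beta must lie between 0 and 1")
    return params


params = load_params(SAMPLE_CONFIG)
print(params["population_size"], params["p_c"])
```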

Input File Format

DSM and DMM files should be in CSV format. A DSM file should have the following format:

List of labels starting with a comma to indicate an empty cell, e.g., ,C,D,E,A,B
Keep diagonals empty
From row 2 onwards, the first value must be a label
Use 1, x, or X to indicate the presence of a relationship
Use 0, o, O, or empty cell to indicate the absence of a relationship
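
The rules above can be checked before running the tool. The following is a minimal sketch of a validator, not the parser DLTClust itself uses. The sample matrix is made up, the function name parse_dsm is hypothetical, and the check that rows appear in the same order as the header labels is an assumption.

```python
import csv
import io

PRESENT = {"1", "x", "X"}
ABSENT = {"0", "o", "O", ""}

# Made-up sample that follows the DSM rules listed above.
SAMPLE_DSM = """\
,C,D,E,A,B
C,,1,0,,x
D,X,,,o,
E,,1,,1,
A,0,0,1,,1
B,1,,,x,
"""


def parse_dsm(text):
    """Return (labels, matrix) where matrix holds 0/1 integers."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = [cell.strip() for cell in rows[0]]
    if header[0] != "":
        raise ValueError("header row must start with an empty cell")
    labels = header[1:]
    if len(rows) - 1 != len(labels):
        raise ValueError("a DSM needs one row per column label")

    matrix = []
    for index, row in enumerate(rows[1:]):
        cells = [cell.strip() for cell in row]
        # Assumption: rows appear in the same order as the header labels.
        if cells[0] != labels[index]:
            raise ValueError(f"row label {cells[0]!r} does not match {labels[index]!r}")
        # Pad rows whose trailing empty cells were omitted.
        values = cells[1:] + [""] * (len(labels) - len(cells) + 1)
        if len(values) != len(labels):
            raise ValueError(f"row {cells[0]!r} has too many cells")
        if values[index] in PRESENT:
            raise ValueError(f"diagonal cell for {cells[0]!r} must stay empty")
        parsed = []
        for value in values:
            if value in PRESENT:
                parsed.append(1)
            elif value in ABSENT:
                parsed.append(0)
            else:
                raise ValueError(f"unexpected cell value {value!r}")
        matrix.append(parsed)
    return labels, matrix


dsm_labels, dsm_matrix = parse_dsm(SAMPLE_DSM)
print(dsm_labels)
print(dsm_matrix)
```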

A DMM file should have the following format:

Data must be in rows and parties must be in columns (the tool assumes this orientation).
List of party labels starting with a comma to indicate an empty cell, e.g., ,P2,P1,P3,P5,P4
From row 2 onwards, the first value must be a data label
Use 1, x, or X to indicate the presence of a relationship
Use 0, o, O, or empty cell to indicate the absence of a relationship
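
To see what a DMM in this format encodes, the sketch below reads one into a mapping from each data element to the parties interested in it. It is an illustration only: the data labels (Invoice, Shipment, Audit) and the function name parse_dmm are hypothetical, and DLTClust's own reader may differ.

```python
import csv
import io

PRESENT = {"1", "x", "X"}
ABSENT = {"0", "o", "O", ""}

# Made-up sample: data elements in rows, parties in columns.
SAMPLE_DMM = """\
,P2,P1,P3,P5,P4
Invoice,1,x,,,0
Shipment,,X,1,o,
Audit,,,,1,1
"""


def parse_dmm(text):
    """Return (parties, interest) where interest maps data label -> parties."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if rows[0][0].strip() != "":
        raise ValueError("header row must start with an empty cell")
    parties = [cell.strip() for cell in rows[0][1:]]

    interest = {}
    for row in rows[1:]:
        data_label = row[0].strip()
        cells = [cell.strip() for cell in row[1:]]
        if len(cells) > len(parties):
            raise ValueError(f"row {data_label!r} has too many cells")
        # Pad rows whose trailing empty cells were omitted.
        cells += [""] * (len(parties) - len(cells))
        interested = []
        for party, cell in zip(parties, cells):
            if cell in PRESENT:
                interested.append(party)
            elif cell not in ABSENT:
                raise ValueError(f"unexpected cell value {cell!r}")
        interest[data_label] = interested
    return parties, interest


dmm_parties, dmm_interest = parse_dmm(SAMPLE_DMM)
for data_label, interested in dmm_interest.items():
    print(data_label, "->", ", ".join(interested))
```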

Sample input DSM and DMM files are provided. The DSMs and DMMs from our paper can be found in the DSMs_DMMs_from_paper folder.

Command-line Arguments

Set the following command-line arguments (some are optional; if they are not provided, default values are used):

-b, --busses - Number of busses to generate (integer). Optional (default is 0)
-c, --clusters - Number of square clusters to generate (integer). Compulsory
-i, --input - Input matrix file to cluster (DSM or DMM). It should be a CSV file in given format (see example). Optional (default is dsm.csv)
-o, --output - Output matrix file name. Optional (default is clusters.csv)
-p, --params - Config file with parameters. Optional (default is config.ini)
-u, --sources - Number of sources (aka writers) to generate (integer). Optional (default is 0)
-r, --seed - Random seed (integer). Optional (default is 123)
-s, --sinks - Number of sinks (aka readers) to generate (integer). Optional (default is 0)
-t, --type - Type of matrix to cluster, i.e., DSM or DMM. Optional (default is DSM)
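
When several runs are needed (for example, the same matrix with different random seeds), the argument list can be assembled in a small script. The sketch below only builds the commands from the short flags and defaults documented above; the helper name build_command is hypothetical and the seeds other than 123 are arbitrary.

```python
def build_command(clusters, input_file="dsm.csv", matrix_type="DSM",
                  busses=0, sources=0, sinks=0, seed=123, python="python3"):
    """Return the argument list for one DLTClust run."""
    if matrix_type not in ("DSM", "DMM"):
        raise ValueError("matrix_type must be 'DSM' or 'DMM'")
    return [
        python, "DLTClust",
        "-c", str(clusters),
        "-b", str(busses),
        "-u", str(sources),
        "-s", str(sinks),
        "-r", str(seed),
        "-i", input_file,
        "-t", matrix_type,
    ]


# One command per seed; nothing is executed here.
commands = [build_command(4, busses=1, sources=1, sinks=1, seed=seed)
            for seed in (123, 456, 999)]
for command in commands:
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) from the directory that contains the DLTClust package.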

Examples

DSM Clustering

To cluster the sample DSM use the following command:

python3 DLTClust -c 2

Use the following command to cluster one of the example DSMs from the paper to get up to 4 square clusters, a data bus, sink, and source while setting the random seed to 123:

python3 DLTClust -c 4 -b 1 -u 1 -s 1 -r 123 -i DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv

DMM Clustering

To cluster the sample DMM use the following command:

python3 DLTClust -c 2 -i DLTClust/dmm.csv -t DMM

Use the following command to cluster one of the example DMMs from the paper to get up to 5 clusters while setting the random seed to 999:

python3 DLTClust -c 5 -r 999 -i DLTClust/DSMs_DMMs_from_paper/DMM_BCDFI.csv -t DMM

Unit Tests

python3 -m unittest discover DLTClust

This project uses for code quality controls.

License

This software is released under the CSIRO Open Source Software License Agreement. Details can be found at LICENSE.
