A tool to derive candidate shared ledger combinations for industry ecosystems.
Multi-ledger designs are critical in industry ecosystems based on Distributed Ledger Technology (DLT), including blockchain, because they must balance two conflicting demands: data transparency for improved integrity, and hiding commercially sensitive data. DLTClust can be used to answer the following questions in multi-ledger designs:
- What parties should share which ledgers?
- What data should be on those ledgers?
Given a matrix whose elements represent parties' interest in data generated by other parties, DLTClust can ascertain ledger membership while minimizing data exposure and the number of ledgers. It captures the following party-party relationships:
- Clusters - Well-connected set of parties or party-data relationships, e.g., consortium
- Busses - Parties that read/write data from/to all other parties, e.g., a supply chain integrator
- Sinks - Parties that read data from all other parties, e.g., regulator
- Sources - Parties that write data to all other parties, e.g., oracle
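To make the relationship types above concrete, the sketch below flags candidate busses, sinks, and sources in a small binary matrix. It is an illustration only, not DLTClust's clustering algorithm: the function name `classify_parties`, the sample parties, and the convention that `dsm[i][j] == 1` means party `i` reads data written by party `j` are all assumptions made for this example.

```python
from typing import Dict, List


def classify_parties(labels: List[str], dsm: List[List[int]]) -> Dict[str, List[str]]:
    """Flag candidate busses, sinks, and sources in a binary DSM.

    Assumption: dsm[i][j] == 1 means party i reads data written by party j.
    Diagonal cells are ignored.
    """
    n = len(labels)
    result: Dict[str, List[str]] = {"busses": [], "sinks": [], "sources": []}
    for i, label in enumerate(labels):
        others = [j for j in range(n) if j != i]
        reads_all = all(dsm[i][j] == 1 for j in others)    # row i is full
        read_by_all = all(dsm[j][i] == 1 for j in others)  # column i is full
        if reads_all and read_by_all:
            result["busses"].append(label)
        elif reads_all:
            result["sinks"].append(label)
        elif read_by_all:
            result["sources"].append(label)
    return result


# Hypothetical four-party ecosystem.
labels = ["Integrator", "Regulator", "Oracle", "Supplier"]
dsm = [
    [0, 1, 1, 1],  # Integrator reads from every other party
    [1, 0, 1, 1],  # Regulator reads from every other party
    [1, 0, 0, 0],  # Oracle reads only from the Integrator
    [1, 0, 1, 0],  # Supplier reads from the Integrator and the Oracle
]

roles = classify_parties(labels, dsm)
print(roles)
```

Here the Integrator has both a full row and a full column (bus), the Regulator has only a full row (sink), and the Oracle has only a full column (source). Clusters are not detected by this sketch; finding them is the part that needs the genetic-algorithm search.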
Alternatively, given a matrix whose elements represent parties' interest in specific data elements, DLTClust can ascertain which parties and data elements should be placed on which ledger while minimizing data exposure and the number of ledgers.
Party-party and party-data relationships should be encoded as a binary Design Structure Matrix (DSM) and a binary Domain Mapping Matrix (DMM), respectively. Given a DSM or DMM, DLTClust uses extended versions of the following clustering algorithms to identify DLT-ecosystem-specific party-party and party-data relationships:
- [1] Tian-Li Yu, Ali A. Yassine, and David E. Goldberg, "An information theoretic method for developing modular architectures using genetic algorithms," Res Eng Design, 18:91-109, 2007, DOI 10.1007/s00163-007-0030-1
- [2] William T. McCormick, Paul J. Schweitzer, and Thomas W. White, "Problem decomposition and data reorganization by a clustering technique," Operations Research, 20(5):993-1009, Sep. 1972
[2] is sometimes used to improve the layout of the clustered representation.
See our paper H.M.N. Dilum Bandara, Mark Staples, and Sidra Malik, "Designing for Shared Ledgers in Industry Ecosystems", URL for more details #TODO
Tested on Python 3.8.7
Set the following configuration values in config.ini (if unsure, start with the default values from Yu et al.):
- `alpha` - Type I error weight
- `beta` - Type II error weight
- `population_size` - Initial population size
- `offspring_size` - Number of offspring to generate
- `p_c` - Crossover probability
- `p_m` - Mutation probability
- `generation_limit` - Number of generation cycles to try
- `generation_limit_without_improvement` - Stop if this many consecutive generations show no improvement
- `cluster_can_have_read_only_elements` - Can a square cluster have read-only elements/parties?
- `cluster_can_have_partial_bus` - Can a bus have a subset of rows and columns filled?
- `cluster_can_have_partial_sink` - Can a sink have a subset of columns filled?
- `cluster_can_have_partial_source` - Can a source have a subset of rows filled?
Typical values from Yu et al. [1]:
- Population size = 3000
- Offspring size = 3000
- Crossover probability p_c = 1/chromosome-length, 0.5, or 1
- Mutation probability p_m = 1/chromosome-length
- Generation limit = Not specified
- Generation limit without improvement = 50
- Alpha (Type I error weight) = 0.8116
- Beta (Type II error weight) = 0.1102
- 1 - Alpha - Beta = 0.0784
If the number of clusters (n_c) is not decided, a typical value could be 1/2 n_n, where n_n is the number of nodes in the matrix (following the notation of Yu et al. [1]).
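As a rough illustration of how these parameters might be laid out and read, the sketch below parses a config.ini-style text with Python's standard `configparser`. The `[GA]` section name, the `yes`/`no` spelling of the boolean flags, and the chosen `p_c`, `p_m`, and `generation_limit` values are assumptions for this example; check the config.ini shipped with DLTClust for the actual layout.

```python
import configparser

# Hypothetical config.ini content. The [GA] section name, boolean spellings,
# p_c, p_m (1/chromosome-length for an assumed length of 20), and
# generation_limit are assumptions made for this sketch.
SAMPLE_CONFIG = """
[GA]
alpha = 0.8116
beta = 0.1102
population_size = 3000
offspring_size = 3000
p_c = 0.5
p_m = 0.05
generation_limit = 1000
generation_limit_without_improvement = 50
cluster_can_have_read_only_elements = no
cluster_can_have_partial_bus = no
cluster_can_have_partial_sink = yes
cluster_can_have_partial_source = yes
"""

FLOAT_KEYS = ("alpha", "beta", "p_c", "p_m")
INT_KEYS = (
    "population_size",
    "offspring_size",
    "generation_limit",
    "generation_limit_without_improvement",
)
BOOL_KEYS = (
    "cluster_can_have_read_only_elements",
    "cluster_can_have_partial_bus",
    "cluster_can_have_partial_sink",
    "cluster_can_have_partial_source",
)


def load_params(text: str) -> dict:
    """Parse the parameters and run a basic sanity check on the weights."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["GA"]
    params = {key: section.getfloat(key) for key in FLOAT_KEYS}
    params.update({key: section.getint(key) for key in INT_KEYS})
    params.update({key: section.getboolean(key) for key in BOOL_KEYS})
    # The third weight is 1 - alpha - beta, so alpha + beta must not exceed 1.
    if not 0.0 <= params["alpha"] + params["beta"] <= 1.0:
        raise ValueError("alpha + beta must lie between 0 and 1")
    return params


params = load_params(SAMPLE_CONFIG)
print(params["population_size"], params["cluster_can_have_partial_sink"])
```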
DSM and DMM files should be in CSV format. A DSM file should have the following format:
- List of labels starting with a comma to indicate an empty cell, e.g., `,C,D,E,A,B`
- Keep diagonals empty
- From row 2 onwards, the 1st value must be a label
- Use `1`, `x`, or `X` to indicate the presence of a relationship
- Use `0`, `o`, `O`, or an empty cell to indicate the absence of a relationship
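The DSM rules above can be sketched as a small parser. This is not DLTClust's own loader: the function name `parse_dsm`, the sample matrix, and the extra check that the matrix is square are assumptions made for illustration.

```python
import csv
import io
from typing import List, Tuple

PRESENT = {"1", "x", "X"}
ABSENT = {"0", "o", "O", ""}


def parse_dsm(text: str) -> Tuple[List[str], List[List[int]]]:
    """Convert DSM CSV text into (column labels, binary matrix)."""
    rows = [row for row in csv.reader(io.StringIO(text.strip())) if row]
    labels = [cell.strip() for cell in rows[0][1:]]  # skip the empty corner cell
    matrix: List[List[int]] = []
    for row in rows[1:]:
        row_label = row[0].strip()
        cells = [cell.strip() for cell in row[1:]]
        if len(cells) != len(labels):
            raise ValueError(f"row {row_label!r} has {len(cells)} cells, expected {len(labels)}")
        values = []
        for cell in cells:
            if cell in PRESENT:
                values.append(1)
            elif cell in ABSENT:
                values.append(0)
            else:
                raise ValueError(f"unexpected cell value {cell!r} in row {row_label!r}")
        matrix.append(values)
    if len(matrix) != len(labels):
        raise ValueError("a DSM must be square")  # assumption of this sketch
    return labels, matrix


# Hypothetical sample that mixes the accepted markers.
SAMPLE_DSM = """
,C,D,E,A,B
C,,1,0,,x
D,X,,o,,
E,,,,1,
A,0,O,x,,
B,1,,,,
"""

labels, matrix = parse_dsm(SAMPLE_DSM)
for label, row in zip(labels, matrix):
    print(label, row)
```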
A DMM file should have the following format:
- Data must be in rows while parties must be in columns (just an assumption).
- List of party labels starting with a comma to indicate an empty cell, e.g., `,P2,P1,P3,P5,P4`
- From row 2 onwards, the 1st value must be a data label
- Use `1`, `x`, or `X` to indicate the presence of a relationship
- Use `0`, `o`, `O`, or an empty cell to indicate the absence of a relationship
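If party-data interests are already available in code, a DMM file in this layout can be generated rather than written by hand. The sketch below is one way to do so; the function name `build_dmm_csv`, the data labels, and the choice of `1`/`0` as markers are assumptions for this example.

```python
import csv
import io
from typing import Dict, List, Set


def build_dmm_csv(parties: List[str], interests: Dict[str, Set[str]]) -> str:
    """Render a DMM as CSV text: data elements in rows, parties in columns.

    `interests` maps each data label to the set of parties interested in it.
    """
    known = set(parties)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + parties)  # leading empty cell, then party labels
    for data_label, interested in interests.items():
        unknown = interested - known
        if unknown:
            raise ValueError(f"{data_label!r} refers to unknown parties: {sorted(unknown)}")
        writer.writerow([data_label] + ["1" if p in interested else "0" for p in parties])
    return buffer.getvalue()


# Hypothetical parties and data elements.
parties = ["P2", "P1", "P3"]
interests = {
    "invoice": {"P1", "P2"},
    "shipment": {"P3"},
}

dmm_text = build_dmm_csv(parties, interests)
print(dmm_text)
```

The resulting text can be saved to a file and passed to DLTClust with `-i <file> -t DMM`.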
Sample input DSM and DMM files are provided. The DSMs and DMMs from our paper can be found in the DSMs_DMMs_from_paper folder.
Set the following command-line arguments (some are optional; if not provided, default values are used):
- `-b`, `--busses` - Number of busses to generate (integer). Optional (default is `0`)
- `-c`, `--clusters` - Number of square clusters to generate (integer). Compulsory
- `-i`, `--input` - Input matrix file to cluster (DSM or DMM). It should be a CSV file in the given format (see example). Optional (default is `dsm.csv`)
- `-o`, `--output` - Output matrix file name. Optional (default is `clusters.csv`)
- `-p`, `--params` - Config file with parameters. Optional (default is `config.ini`)
- `-u`, `--sources` - Number of sources (aka writers) to generate (integer). Optional (default is `0`)
- `-r`, `--seed` - Random seed (integer). Optional (default is `123`)
- `-s`, `--Sinks` - Number of sinks (aka readers) to generate (integer). Optional (default is `0`)
- `-t`, `--type` - Type of matrix to cluster, i.e., `DSM` or `DMM`. Optional (default is `DSM`)
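For readers who want to see how these options fit together, the sketch below mirrors the documented flags and defaults with Python's standard `argparse`. It is not DLTClust's actual source: the function name `build_parser`, the `dest="sinks"` attribute name, and the `choices` restriction on `--type` are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(
        prog="DLTClust",
        description="Derive candidate shared ledger combinations (illustrative sketch).",
    )
    parser.add_argument("-b", "--busses", type=int, default=0,
                        help="number of busses to generate")
    parser.add_argument("-c", "--clusters", type=int, required=True,
                        help="number of square clusters to generate")
    parser.add_argument("-i", "--input", default="dsm.csv",
                        help="input matrix file (DSM or DMM) in CSV format")
    parser.add_argument("-o", "--output", default="clusters.csv",
                        help="output matrix file name")
    parser.add_argument("-p", "--params", default="config.ini",
                        help="config file with parameters")
    parser.add_argument("-u", "--sources", type=int, default=0,
                        help="number of sources (writers) to generate")
    parser.add_argument("-r", "--seed", type=int, default=123,
                        help="random seed")
    # The long flag is spelled as documented; dest="sinks" is an assumption.
    parser.add_argument("-s", "--Sinks", dest="sinks", type=int, default=0,
                        help="number of sinks (readers) to generate")
    parser.add_argument("-t", "--type", choices=["DSM", "DMM"], default="DSM",
                        help="type of matrix to cluster")
    return parser


# Parse an argument list similar to the examples in this README.
args = build_parser().parse_args(["-c", "4", "-b", "1", "-u", "1", "-s", "1", "-r", "123"])
print(args.clusters, args.busses, args.sources, args.sinks, args.input, args.type)
```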
To cluster the sample DSM use the following command:
python3 DLTClust -c 2
Use the following command to cluster one of the example DSMs from the paper to get up to 4 square clusters, a data bus, sink, and source while setting the random seed to 123:
python3 DLTClust -c 4 -b 1 -u 1 -s 1 -r 123 -i DLTClust/DSMs_DMMs_from_paper/DSM_single_party_instance_supply_chain.csv
To cluster the sample DMM use the following command:
python3 DLTClust -c 2 -i DLTClust/dmm.csv -t DMM
Use the following command to cluster one of the example DMMs from the paper to get up to 5 clusters while setting the random seed to 999:
python3 DLTClust -c 5 -r 999 -i DLTClust/DSMs_DMMs_from_paper/DMM_BCDFI.csv -t DMM
To run the unit tests use the following command:
python3 -m unittest discover DLTClust
This project uses for code quality controls.
This software is released under the CSIRO Open Source Software License Agreement. Details can be found at LICENSE.