Text-Classification-Based-Approach-for-Evaluating-and-Enhancing-Machine-Interpretability-of-Building

--author: zhengzhe
--date: 2022.10.26

Description: Code and dataset for the paper "Text Classification-Based Approach for Evaluating and Enhancing Machine Interpretability of Building Codes".
This is a PyTorch-based BERT approach to Chinese text classification.
The BERT models build on Bert-Chinese-Text-Classification-Pytorch.
The other models build on Chinese-Text-Classification-Pytorch.

Environment

python 3.7
torch 1.12.1+cu116
boto3 1.24.28
matplotlib 3.5.3
tqdm
scikit-learn (sklearn)
tensorboardX

Dataset: A Chinese rule dataset with seven categories was established to classify the interpretability level of each rule in a building code.
The original labeled dataset is in CivilRules/dataset.
The training, validation, and test sets are in CivilRules/data.
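The layout of the split files is not spelled out here. Assuming they follow the Chinese-Text-Classification-Pytorch convention of one rule per line, with the rule text and an integer label index separated by a tab, a loader could look like the sketch below. The format and the helper names `parse_lines` and `load_split` are assumptions for illustration; check them against the actual files before relying on this.

```python
def parse_lines(lines):
    """Turn lines of "<rule text>\t<label index>" into (text, label) pairs.

    Assumed format (borrowed from Chinese-Text-Classification-Pytorch);
    blank lines are skipped.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines, e.g. a trailing newline at EOF
        text, label = line.rsplit("\t", 1)  # split on the last tab only
        samples.append((text, int(label)))
    return samples


def load_split(path):
    """Read one split file (hypothetical helper) as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f)
```

For example, `load_split("CivilRules/data/train.txt")` would return a list of `(text, label)` tuples if the assumed format holds.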

Category	Definition	Interpretability
direct	The required information is explicitly available from the BIM model.	Easy
indirect	The required information is implicitly stored in the BIM model. A set of derivations and calculations should be performed.	Easy
method	An extended data structure and domain-specific knowledge are required.	Medium
reference	The external information, including pictures, formulas, tables, and other rules or appendices in the current code or other codes, is required.	Medium
general	The rules provide macro design guidance.	Hard
term	The rules define the terms used in the codes.	Hard
other	The rules do not belong to the above six categories.	Hard
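Once every rule in a code has a predicted category, the table above can serve as a lookup to summarize how machine-interpretable the code is. The helper below is a hypothetical sketch, not part of the repository: it transcribes the table into a dictionary and counts rules per interpretability level.

```python
# Category -> interpretability level, transcribed from the table above.
INTERPRETABILITY = {
    "direct": "Easy",
    "indirect": "Easy",
    "method": "Medium",
    "reference": "Medium",
    "general": "Hard",
    "term": "Hard",
    "other": "Hard",
}


def interpretability_profile(categories):
    """Count how many rules fall into each interpretability level (hypothetical helper)."""
    counts = {"Easy": 0, "Medium": 0, "Hard": 0}
    for category in categories:
        counts[INTERPRETABILITY[category]] += 1  # KeyError for unknown categories
    return counts
```

For example, `interpretability_profile(["direct", "term", "reference", "indirect"])` returns `{"Easy": 2, "Medium": 1, "Hard": 1}`.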

The original BERT model is available on Google Drive.
- Put the original BERT model in ./bert_pretrain
The further pretrained, domain-specific BERT model (RuleBERT) is available on Google Drive.
- Put the RuleBERT model in ./bert_pretraindc

The trained models (TextCNN, TextRNN, TextRNN-Att, Transformer) are available on Google Drive.
To reproduce their results, use the code from Chinese-Text-Classification-Pytorch.

Before testing, make sure the BERT models and the fine-tuned models are in the right places.
Put the test set (test.txt) into ./CivilRules/data.
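A quick pre-flight check can catch misplaced files before running the scripts. The sketch below checks the paths named in this README. The file names expected inside each BERT directory (`pytorch_model.bin`, `bert_config.json`, `vocab.txt`) are an assumption based on the checkpoint layout commonly used with Bert-Chinese-Text-Classification-Pytorch, and `missing_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths named in this README. save_dict is where application.py expects the
# trained model; that test.py reads from the same place is an assumption.
REQUIRED = [
    "bert_pretrain",
    "bert_pretraindc",
    "CivilRules/data/test.txt",
    "CivilRules/save_dict",
]
# Assumed contents of each pretrained BERT directory.
MODEL_FILES = ["pytorch_model.bin", "bert_config.json", "vocab.txt"]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    missing = [p for p in REQUIRED if not (root / p).exists()]
    for model_dir in ("bert_pretrain", "bert_pretraindc"):
        if (root / model_dir).is_dir():
            missing += [f"{model_dir}/{name}" for name in MODEL_FILES
                        if not (root / model_dir / name).exists()]
    return missing


if __name__ == "__main__":
    problems = missing_paths()
    print("all files in place" if not problems else "missing: " + ", ".join(problems))
```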

# evaluate the weighted F1 score of the BERT model
python test.py --model bert
# evaluate the weighted F1 score of the RuleBERT model
python test.py --model bertDC
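For reference, the weighted F1 score averages the per-class F1 scores, weighting each class by its number of true samples (its support). The stdlib sketch below computes it; scikit-learn, already listed in the environment, provides the same metric via `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")`. Whether test.py uses scikit-learn internally is not stated here, and `weighted_f1` is a hypothetical helper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight by support
    return score
```

Classes that appear only in the predictions have zero support, so they contribute nothing, which matches scikit-learn's weighted average.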

# fine-tune the BERT model
python grid_search.py --model bert
# fine-tune the RuleBERT model
python grid_search.py --model bertDC
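grid_search.py is not described further here; judging only by its name, it presumably fine-tunes the model over a grid of hyperparameter settings. The sketch below shows that general pattern under this assumption. The hyperparameter names and values in `SEARCH_SPACE` are hypothetical, and `train_and_score` stands for a caller-supplied function (for example, one that fine-tunes with a given configuration and returns the weighted F1 score on the validation set).

```python
from itertools import product

# Hypothetical search space; the actual hyperparameters used by
# grid_search.py are not documented in this README.
SEARCH_SPACE = {
    "learning_rate": [2e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [3],
}


def grid(space):
    """Yield one config dict per combination of candidate values (Cartesian product)."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def grid_search(space, train_and_score):
    """Return (best_config, best_score), keeping the config with the highest score."""
    best_config, best_score = None, float("-inf")
    for config in grid(space):
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

An exhaustive grid is simple and reproducible, but its cost grows multiplicatively with each added hyperparameter, so it suits small search spaces like the one above.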

To run predictions on your own data, prepare a prediction dataset (predict.txt), rename it to dev.txt, and put it into ./CivilRules/data.
Modify the dataset setting in application.py accordingly.
Put the trained BERT model into ./CivilRules/save_dict.
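Unlabeled rules may need a placeholder label column to be read by the same loader as the training data. The sketch below assumes the tab-separated "text, label index" format of Chinese-Text-Classification-Pytorch and writes a dummy label of 0, which is presumably ignored at prediction time; both the format and the helpers `format_lines` and `write_prediction_file` are assumptions, not part of the repository.

```python
from pathlib import Path


def format_lines(sentences, dummy_label=0):
    """Format rules as "<text>\t<dummy label>", dropping empty ones."""
    lines = []
    for s in sentences:
        s = " ".join(s.split())  # collapse tabs/newlines that would break the format
        if s:
            lines.append(f"{s}\t{dummy_label}")
    return lines


def write_prediction_file(sentences, out_path="CivilRules/data/dev.txt"):
    """Write the formatted rules to dev.txt (UTF-8) and return how many were written."""
    lines = format_lines(sentences)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```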

python application.py --model bert
python application.py --model bertDC