Text-Classification-Based-Approach-for-Evaluating-and-Enhancing-Machine-Interpretability-of-Building

--author: zhengzhe
--date: 2022.10.26

Environment

python 3.7
torch 1.12.1+cu116
boto3 1.24.28
matplotlib 3.5.3
tqdm
sklearn
tensorboardX

Dataset

  • Description: A Chinese rule dataset with seven categories was established to classify the interpretability level of each rule in a building code.
  • The original labeled dataset can be found in CivilRules/dataset
  • The training, validation, and test datasets can be found in CivilRules/data

| Category | Definition | Interpretability |
|---|---|---|
| direct | The required information is explicitly available from the BIM model. | Easy |
| indirect | The required information is implicitly stored in the BIM model. A set of derivations and calculations should be performed. | Easy |
| method | An extended data structure and domain-specific knowledge are required. | Medium |
| reference | External information, including pictures, formulas, tables, and other rules or appendices in the current code or other codes, is required. | Medium |
| general | The rules provide macro design guidance. | Hard |
| term | The rules define the terms used in the codes. | Hard |
| other | The rules do not belong to the above six categories. | Hard |
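
The category table can be expressed as a small lookup, which is handy when turning predicted labels into interpretability levels. This is a minimal sketch: the label strings and levels are copied from the table, while the names `INTERPRETABILITY`, `interpretability_of`, and `summarize` are hypothetical and not taken from the repository.

```python
from collections import Counter

# Category -> interpretability level, copied from the table in this README.
INTERPRETABILITY = {
    "direct": "Easy",
    "indirect": "Easy",
    "method": "Medium",
    "reference": "Medium",
    "general": "Hard",
    "term": "Hard",
    "other": "Hard",
}


def interpretability_of(category):
    """Return the interpretability level for one category label."""
    return INTERPRETABILITY[category.strip().lower()]


def summarize(categories):
    """Count how many rules fall into each interpretability level."""
    return Counter(interpretability_of(c) for c in categories)


if __name__ == "__main__":
    # Hypothetical predictions for four rules.
    print(summarize(["direct", "term", "method", "indirect"]))
```

An unknown label raises `KeyError`, which makes label typos visible instead of silently mapping them to a default level.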

Models

| Model | Weighted F1 score |
|---|---|
| TextCNN | 86.3% |
| TextRNN | 72.2% |
| TextRNN-Att | 81.5% |
| Transformers | 74.0% |
| BERT | 88.04% |
| RuleBERT | 93.68% |
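
For readers unfamiliar with the metric, the weighted F1 score is the mean of the per-class F1 scores, each weighted by the number of true samples in that class. The sketch below implements that definition in plain Python. sklearn is listed in the environment, and `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` computes the same quantity; whether the repository's scripts call it that way is an assumption. The sample labels are made up for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total  # weight each class by its share of samples
    return score


if __name__ == "__main__":
    # Made-up labels, not taken from the dataset.
    y_true = ["direct", "direct", "term", "other"]
    y_pred = ["direct", "term", "term", "other"]
    print(round(weighted_f1(y_true, y_pred), 4))
```

Because classes are weighted by support, frequent categories dominate the score, which matters if the seven categories are imbalanced.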

Further pretrained domain-specific models

  • The original BERT model can be found on Google Drive
    • Please put the original BERT model in ./bert_pretrain
  • The further pretrained domain-specific BERT model (RuleBERT) can be found on Google Drive
    • Please put the RuleBERT model in ./bert_pretraindc

Fine-tuned BERT models

  • The fine-tuned BERT models (.ckpt files) can be found on Google Drive
  • Please put these models in ./CivilRules/save_dict

Well-trained other models

How to use

Validate the BERT model results using the fine-tuned models

  • Make sure the BERT models and the fine-tuned models have been put in the right places
  • Put the test dataset (test.txt) into ./CivilRules/data
# validate the bert model weighted F1 score
python test.py --model bert
# validate the RuleBERT model weighted F1 score
python test.py --model bertDC
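
Before running the commands above, it can help to check that the files are where the scripts expect them. The following is a minimal sketch, assuming the commands are run from the repository root; the paths are the ones named in this README, while `EXPECTED` and `missing_paths` are hypothetical helpers that are not part of the repository.

```python
from pathlib import Path

# Locations named in this README, relative to the repository root.
EXPECTED = [
    "bert_pretrain",
    "bert_pretraindc",
    "CivilRules/save_dict",
    "CivilRules/data/test.txt",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are in place.")
```

The check only confirms that the folders and the test file exist; it does not verify that the model files inside them are complete.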

Train your own model using grid_search to find the best model

  • Put your own test dataset into ./CivilRules/data
  • Modify the dataset, learning_rates, and batch_sizes in grid_search.py
# to finetune bert model
python grid_search.py --model bert
# to finetune RuleBERT model
python grid_search.py --model bertDC
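
The idea behind a grid search over `learning_rates` and `batch_sizes` can be sketched as follows. This is an illustration of the general technique, not the contents of grid_search.py: the function names are hypothetical, and `fake_train_and_eval` is a stand-in for a real fine-tuning run that would return a validation score such as the weighted F1.

```python
from itertools import product


def grid_search(learning_rates, batch_sizes, train_and_eval):
    """Try every (learning_rate, batch_size) pair and keep the best score."""
    best = None
    for lr, bs in product(learning_rates, batch_sizes):
        score = train_and_eval(lr, bs)
        if best is None or score > best[0]:
            best = (score, lr, bs)
    return best  # (best_score, best_learning_rate, best_batch_size)


def fake_train_and_eval(lr, bs):
    """Stand-in for fine-tuning; peaks at lr=3e-5 and bs=32 by construction."""
    return 1.0 - abs(lr - 3e-5) * 1e4 - abs(bs - 32) / 1000


if __name__ == "__main__":
    # Example grid values chosen for illustration only.
    print(grid_search([1e-5, 3e-5, 5e-5], [16, 32, 64], fake_train_and_eval))
```

The number of training runs is the product of the two list lengths, so each extra value in either list adds a full set of fine-tuning runs.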

Predict with the well-trained BERT model

  • Prepare your own prediction dataset (predict.txt), rename it to dev.txt, and put it into ./CivilRules/data
  • Modify the dataset in application.py
  • Put the fine-tuned BERT model into ./CivilRules/save_dict
python application.py --model bert
python application.py --model bertDC
  • The results will be saved in ./CivilRules/predict
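
The renaming step for the prediction dataset can be scripted. This is a minimal sketch, assuming predict.txt sits in the current directory and the command is run from the repository root; `stage_prediction_file` is a hypothetical helper, and the expected line format of the file is not specified in this README.

```python
import shutil
from pathlib import Path


def stage_prediction_file(src="predict.txt", data_dir="CivilRules/data"):
    """Copy the prediction file into data_dir under the name dev.txt."""
    target = Path(data_dir) / "dev.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, target)  # overwrites an existing dev.txt
    return target


if __name__ == "__main__":
    if Path("predict.txt").exists():
        print("Copied to", stage_prediction_file())
    else:
        print("predict.txt not found in the current directory.")
```

Because the copy overwrites any existing dev.txt, back up the validation split first if it is still needed.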
