SE3M

This repository contains the source code and data sets used in the experiments for the article "SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models" (Fávero et al., 2020).

Fávero, E. M., Casanova, D., & Pimentel, A. R. (2020). SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models. arXiv preprint arXiv:2006.16831.

Other related works:

Fávero, E. M. D. B., Pereira, R., Pimentel, A. R., & Casanova, D. (2018). Analogy-based Effort Estimation: A Systematic Mapping of Literature. INFOCOMP Journal of Computer Science, 17(2), 07-22.
Fávero, E. M. D. B., Casanova, D., & Pimentel, A. R. (2019, September). EmbSE: A Word Embeddings Model Oriented Towards Software Engineering Domain. In Proceedings of the XXXIII Brazilian Symposium on Software Engineering (pp. 172-180).

Resources available:

Labeled data set (user stories) [1], used for training and testing the inference model. https://github.com/morakotch/datasets/tree/master/storypoint/IEEE%20TSE2018/dataset
- Consists of a set of .CSV files for each of the projects used.
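As a rough illustration of how one of these .CSV files might be read, the sketch below parses CSV text with Python's standard `csv` module. The column names `title`, `description`, and `storypoint`, the helper `load_user_stories`, and the sample rows are assumptions made for this example; check the header of the actual files before relying on them.

```python
import csv
import io

def load_user_stories(csv_text):
    """Return (text, story_points) pairs from the CSV text of one project.

    Assumes hypothetical columns: title, description, storypoint.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = []
    for row in reader:
        # Join the non-empty title and description into one requirement text.
        parts = (row["title"], row["description"])
        text = " ".join(p.strip() for p in parts if p and p.strip())
        samples.append((text, float(row["storypoint"])))
    return samples

# Made-up rows, for illustration only.
sample = (
    "issuekey,title,description,storypoint\n"
    "DEMO-1,Add login page,Users sign in with e-mail,3\n"
    "DEMO-2,Fix crash on export,,5\n"
)
stories = load_user_stories(sample)
```

With a real file, the text could be obtained with `open(path, newline="", encoding="utf-8").read()`.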
Pre-trained embeddings (generic).
- Available in the folder "pretrain_model"
  - word2vec_base
  - BERT_base
Unlabeled data set (user stories) used in the fine-tuning process of pre-trained embeddings. https://github.com/morakotch/datasets/tree/master/storypoint/IEEE%20TSE2018/pretrain%20data
Both the pre-processing of the data for fine-tuning BERT and the fine-tuning itself used the scripts provided in the official BERT repository at https://github.com/google-research/bert
- For data pre-processing: create_pretraining_data.py
  - Standard parameters were used, changing only the following:
    - -input_file= (a .txt file containing all the textual requirements from the unlabeled data set listed above)
    - -output_file=./filename.tfrecord
    - -vocab_file= (the .txt file with the vocabulary of the pre-trained model used, e.g. /uncased_L-12/vocab.txt)
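The pre-processing step can be sketched as follows. The official BERT README describes the input as plain text with one sentence per line and a blank line between documents; the helper below formats user stories that way and assembles a command line with the three parameters mentioned. The naive sentence splitting, the function names, and the file names `requirements.txt` and `vocab.txt` are assumptions for illustration. The double-dash flag spelling follows the BERT README, and the command is only built, not run.

```python
import re

def to_pretraining_text(stories):
    """Format stories as one sentence per line, blank line between stories."""
    documents = []
    for story in stories:
        # Naive split after ., ! or ? followed by whitespace (an assumption).
        pieces = re.split(r"(?<=[.!?])\s+", story)
        sentences = [s.strip() for s in pieces if s.strip()]
        if sentences:
            documents.append("\n".join(sentences))
    return "\n\n".join(documents) + "\n"

def build_command(input_file, output_file, vocab_file):
    """Build the argument list for create_pretraining_data.py (not executed)."""
    return [
        "python", "create_pretraining_data.py",
        f"--input_file={input_file}",
        f"--output_file={output_file}",
        f"--vocab_file={vocab_file}",
    ]

text = to_pretraining_text([
    "As a user I want to log in. The login uses e-mail.",
    "Export fails on large files.",
])
command = build_command("requirements.txt", "./filename.tfrecord", "vocab.txt")
```

The resulting list could be passed to `subprocess.run` from inside a checkout of the BERT repository.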
Pre-trained embeddings (fine-tuned) models for the specific domain of software engineering (SE):
- word2vec_SE
- BERT_SE
The "SE3M_model.ipynb" file contains the deep learning architecture used as the inference model for estimating software effort by analogy. It is a Google Colab notebook; to run it, simply replace the paths of the files used.
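Since running the notebook comes down to pointing it at the right files, one option is to gather the paths in one place and check them before training. The sketch below is an assumption about how that could look, not code from the notebook: the `RESOURCES` keys, the `dataset` folder name, and the `missing_paths` helper are hypothetical, while `pretrain_model`, `word2vec_base`, and `BERT_base` are the names listed above.

```python
import os
import tempfile

# Hypothetical layout; adjust to where the files live (e.g. a mounted drive).
RESOURCES = {
    "labeled_data": "dataset",
    "word2vec_base": os.path.join("pretrain_model", "word2vec_base"),
    "bert_base": os.path.join("pretrain_model", "BERT_base"),
}

def missing_paths(base_dir, resources=RESOURCES):
    """Return the resource names whose path does not exist under base_dir."""
    return sorted(
        name for name, rel in resources.items()
        if not os.path.exists(os.path.join(base_dir, rel))
    )

# Demo on a temporary directory that only contains the word2vec folder.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "pretrain_model", "word2vec_base"))
    missing = missing_paths(base)
```

An empty result from `missing_paths` means every listed path was found under the chosen base directory.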

References:

[1] M. Choetkiertikul, H. K. Dam, T. Tran, T. T. M. Pham, A. Ghose, and T. Menzies, "A deep learning model for estimating story points," IEEE Trans. Softw. Eng., vol. PP, no. 99, p. 1, 2018.
