This repo contains the PyTorch implementation for "GraphFC: Customs Fraud Detection with Label Scarcity"
Model architecture of GraphFC. Cross features extracted from GBDT step act as node features in the transaction graph. In the pre-training stage, GraphFC learns the model weights and refine the transaction representations. Afterwards, the model is fine-tuned with labeled data with dual-task learning framework to predict the illicitness and the additional revenue.
The model code for GraphFC lies in graph_sage
directory.
Simply run graph_sage/train.py
and specify the dataset parameters could train the model and evaluate the performance.
Please refer to the scripts under the directory run_*Data.sh
for reproduce the results for individual country.
graph_sage
|-- dataset.py -> Preprocess for customs data
|-- models.py -> Main model modules
|-- parser.py -> training arguments
|-- pygData_util.py -> Data structure for graph data
|-- run_Mdata.sh
|-- run_Ndata.sh
|-- run_Tdata.sh
|-- train.py -> Train model
|-- utils.py
# Dataset parameters
--data: Country name for building dataset ['synthetic', 'real-n', 'real-m', 'real-t']
--initial_inspection_rate: Initial inspection rate of labeled data
--train_from: Starting date of training data
--test_from: Starting date of testing data
--test_length: Number of days for testing data
# GraphFC Hyperparameters
--seed: Random seed
--epoch: number of epochs
--l2: l2 regularization
--dim: dimension for hidden layers
--lr: learning rate
--device: The device name for training, if train with cpu, please use:"cpu"
You can experiment with GraphFC by downloading synthetic customs data from this repo.