table-GAN

tableGAN is the implementation of the paper "Data Synthesis based on Generative Adversarial Networks". It is a synthetic data generation technique implemented as a deep learning model based on the Generative Adversarial Network (GAN) architecture. Its goal is to protect sensitive data against re-identification attacks by producing synthetic data from real data while preserving its statistical features.

The paper can be accessed here: Data Synthesis based on Generative Adversarial Networks

Prerequisites

Data Sets

All datasets used by tableGAN should be placed in the /data/ directory, each in a separate folder named after the dataset. Our experiments use the following datasets:

  • "Adult": Contains personal records (such as nationality, education level, occupation, work hours per week, and so forth)

Input Data Files

Each dataset folder contains the following files used in the training phase:

  • DATASET_train_cleaned.csv: cleaned data from the original source of the dataset, stored in CSV format. This file is used for training only.
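A minimal Python sketch of this layout, assuming the data/ folder sits at the repository root and that the DATASET placeholder in the file name matches the folder name. The helper names are hypothetical and not part of tableGAN:

```python
from pathlib import Path


def train_file_path(repo_root: str, dataset: str) -> Path:
    """Expected location of DATASET_train_cleaned.csv.

    Assumed layout (hypothetical helper): <repo_root>/data/<dataset>/<dataset>_train_cleaned.csv
    """
    return Path(repo_root) / "data" / dataset / f"{dataset}_train_cleaned.csv"


def check_dataset(repo_root: str, dataset: str) -> Path:
    """Return the training file path, or raise if it has not been placed yet."""
    path = train_file_path(repo_root, dataset)
    if not path.is_file():
        raise FileNotFoundError(f"expected training file not found: {path}")
    return path
```

For example, check_dataset(".", "Adult") returns the path of data/Adult/Adult_train_cleaned.csv, or raises FileNotFoundError if the file is missing.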

Usage

1. Training

To train a model on a dataset, run the following command in your shell (this assumes Python, TensorFlow, and all required libraries are already installed):

$ python main.py --dataset=DATASET_NAME --test_id=TEST_ID --epoch=200 --train

Command Line Parameters:

  • DATASET_NAME, used in the input file names and in the training and generation commands, must be one of the following values (case-sensitive):

      • Adult

  • TEST_ID selects a predefined set of internal parameters that affect the quality and privacy level of the synthesized data. The TEST_ID used in the training and generation commands must be one of the following values:

      • 'OI_11_00': beta = 1.0, delta_v = 0.0, delta_m = 0.0

      • 'OI_11_11': beta = 1.0, delta_v = 0.1, delta_m = 0.1

      • 'OI_11_22': beta = 1.0, delta_v = 0.2, delta_m = 0.2

  • --train selects the training phase of the model and must be included in the training command; without it, main.py runs the generation phase instead.

  • --epoch sets the number of training epochs. The default is 100; larger values can produce higher-quality models but take longer to train.
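The paper describes an information loss that compares the mean and standard deviation of discriminator features on real and synthetic records, with hinge thresholds below which differences are not penalized. As a hedged sketch, assuming delta_m and delta_v act as those thresholds for the mean and standard-deviation terms (larger values loosen the match, trading utility for privacy), the loss could look like this. The function name and the NumPy formulation are illustrative, beta is not modeled because its role is not described here, and this is not the tableGAN source code:

```python
import numpy as np


def information_loss(real_feats: np.ndarray, fake_feats: np.ndarray,
                     delta_m: float = 0.0, delta_v: float = 0.0) -> float:
    """Hinge-style information loss between two feature batches (sketch).

    real_feats, fake_feats: arrays of shape (batch, n_features), e.g. the
    discriminator's last hidden layer on real and synthetic records.
    delta_m, delta_v: thresholds assumed to correspond to the TEST_ID values.
    """
    # L2 distance between the per-feature means of the two batches.
    l_mean = np.linalg.norm(real_feats.mean(axis=0) - fake_feats.mean(axis=0))
    # L2 distance between the per-feature standard deviations.
    l_sd = np.linalg.norm(real_feats.std(axis=0) - fake_feats.std(axis=0))
    # Hinge: only the part of each discrepancy above its threshold is penalized.
    return float(max(0.0, l_mean - delta_m) + max(0.0, l_sd - delta_v))
```

Under this reading, OI_11_00 (thresholds 0.0) penalizes every deviation, while OI_11_22 (thresholds 0.2) tolerates small ones.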

Example: training a model on the Adult dataset (the same command is used in the train_Adult.sh script):

$ python main.py --dataset=Adult --test_id=OI_11_00 --train

Important: once training is complete, checkpoint files are written to the /checkpoint/DATASET_NAME/TEST_ID/DATASET_NAME_64_8_8 folder. For example, the training command above creates the following files:

  • /checkpoint/Adult/OI_11_00/Adult_64_8_8/tableGAN_model_6002.data-00000-of-00001
  • /checkpoint/Adult/OI_11_00/Adult_64_8_8/tableGAN_model_6002.index
  • /checkpoint/Adult/OI_11_00/Adult_64_8_8/tableGAN_model_6002.meta

These files will be used automatically to generate the synthesized data.
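A minimal sketch for locating the newest checkpoint under this layout, assuming file names follow the tableGAN_model_<step>.index pattern from the example above and that the 64_8_8 suffix stays fixed. The helper names are hypothetical; inside TensorFlow code, tf.train.latest_checkpoint(checkpoint_dir) serves a similar purpose by reading the checkpoint state file:

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed file-name pattern, taken from the example "tableGAN_model_6002.index".
_INDEX_RE = re.compile(r"^tableGAN_model_(\d+)\.index$")


def checkpoint_dir(repo_root: str, dataset: str, test_id: str,
                   suffix: str = "64_8_8") -> Path:
    """<repo_root>/checkpoint/<dataset>/<test_id>/<dataset>_<suffix>."""
    return Path(repo_root) / "checkpoint" / dataset / test_id / f"{dataset}_{suffix}"


def latest_step(filenames: Iterable[str]) -> Optional[int]:
    """Highest training step among tableGAN_model_<step>.index files, or None."""
    steps = []
    for name in filenames:
        match = _INDEX_RE.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps, default=None)
```

For example, latest_step(p.name for p in checkpoint_dir(".", "Adult", "OI_11_00").iterdir()) returns 6002 for the files listed above.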

2. Generating

To generate synthetic data from a trained model, run:

$ python main.py --dataset=DATASET_NAME --test_id=TEST_ID

The parameters are the same as in the training phase, except that the --train flag must NOT be given (as in the generate_Adult.sh script).

Example:
$ python main.py --dataset=Adult --test_id=OI_11_00 
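For scripting several runs, a small wrapper can assemble the training and generation commands described above. This is a sketch under assumptions: the helper name is hypothetical, the TEST_ID table only mirrors the values listed in this README, and no flags beyond those documented here are used:

```python
import sys
from typing import Dict, List, Optional

# TEST_ID presets as listed in this README.
TEST_IDS: Dict[str, Dict[str, float]] = {
    "OI_11_00": {"beta": 1.0, "delta_v": 0.0, "delta_m": 0.0},
    "OI_11_11": {"beta": 1.0, "delta_v": 0.1, "delta_m": 0.1},
    "OI_11_22": {"beta": 1.0, "delta_v": 0.2, "delta_m": 0.2},
}
DATASETS = {"Adult"}  # case-sensitive values accepted by --dataset


def build_command(dataset: str, test_id: str, train: bool,
                  epoch: Optional[int] = None) -> List[str]:
    """Return the argv for main.py; train=False builds the generation command."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if test_id not in TEST_IDS:
        raise ValueError(f"unknown TEST_ID: {test_id!r}")
    cmd = [sys.executable, "main.py", f"--dataset={dataset}", f"--test_id={test_id}"]
    if epoch is not None:
        cmd.append(f"--epoch={epoch}")
    if train:
        cmd.append("--train")  # omitting --train runs the generation phase
    return cmd
```

From the repository root, subprocess.run(build_command("Adult", "OI_11_00", train=True, epoch=200), check=True) trains a model, and the same call with train=False then generates the fake table. Validating the arguments up front catches typos in case-sensitive names before a long training run starts.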
Results

The generated fake tables are placed in the /samples/ folder at the repository root, with the results for each dataset in a separate folder such as /samples/Adult. Because the fake tables are generated under different TEST_ID settings (which affect the privacy and utility of the results), each dataset folder has one subfolder per TEST_ID value. For example, /samples/Adult/OI_11_00 contains all the results for the Adult dataset generated with the settings of "OI_11_00".

Each TEST_ID subfolder contains the following content:

  • DATASET_TESTID_fake.csv: the generated fake table.

For example, the results for the Adult table with TEST_ID=OI_11_00 are:

  • Results folder: /samples/Adult/OI_11_00

  • Generated fake table: Adult_OI_11_00_fake.csv
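To inspect a result, the fake table's path can be derived from the dataset name and TEST_ID, and comparing per-column means against the training file gives a rough check of how well statistical features were preserved. This sketch assumes both files are CSVs with a header row; the helper names are hypothetical, and the mean comparison is only an illustrative check, not the evaluation used in the paper:

```python
import csv
from pathlib import Path
from typing import Dict, List, Mapping, Union


def fake_table_path(repo_root: str, dataset: str, test_id: str) -> Path:
    """<repo_root>/samples/<dataset>/<test_id>/<dataset>_<test_id>_fake.csv."""
    return Path(repo_root) / "samples" / dataset / test_id / f"{dataset}_{test_id}_fake.csv"


def read_rows(csv_path: Union[str, Path]) -> List[Dict[str, str]]:
    """Read a CSV file with a header row into a list of dicts."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def numeric_column_means(rows: List[Mapping[str, str]]) -> Dict[str, float]:
    """Mean of every column whose values all parse as floats."""
    means: Dict[str, float] = {}
    if not rows:
        return means
    for col in rows[0]:
        try:
            values = [float(row[col]) for row in rows]
        except (KeyError, TypeError, ValueError):
            continue  # skip categorical or incomplete columns
        means[col] = sum(values) / len(values)
    return means


def mean_gaps(real_rows: List[Mapping[str, str]],
              fake_rows: List[Mapping[str, str]]) -> Dict[str, float]:
    """Absolute difference of column means for columns numeric in both tables."""
    real = numeric_column_means(real_rows)
    fake = numeric_column_means(fake_rows)
    return {col: abs(real[col] - fake[col]) for col in real.keys() & fake.keys()}
```

For example, mean_gaps(read_rows("data/Adult/Adult_train_cleaned.csv"), read_rows(fake_table_path(".", "Adult", "OI_11_00"))) reports one gap per numeric column; comparing the gaps across TEST_ID values shows how the privacy settings affect fidelity.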

Contributors

mahmoodm2, baukebrenninkmeijer
