Search for Extra Terrestrial Intelligence (SETI)
Project overview:
Each night, using the Allen Telescope Array (ATA) in northern California, the SETI Institute scans the sky at various radio frequencies, observing star systems with known exoplanets, searching for faint but persistent signals. The current signal detection system is programmed to search only for particular kinds of signals: narrow-band carrier waves. However, the detection system sometimes triggers on signals that are not narrow-band signals (with unknown efficiency) and are also not explicitly-known radio frequency interference (RFI). There seems to be various categories of these kinds of events that have been observed in the past.
Our goal is to classify these accurately in real-time. This may allow the signal detection system to make better observational decisions, increase the efficiency of the nightly scans, and allow for explicit detection of these other signal types.
For more information refer to SETI hackathon page.
When you’ve completed this pattern, you will understand how to:
- Convert signal data into image data
- Build and train a convolutional neural networks
- Display and share results in Jupyter Notebooks This pattern will assist application developers who need to efficiently build powerful deep learning applications and use GPUs to train the model quickly.
Notebooks:
This repository includes 3 parts:
1. Preparing dataset
- Converting images to binary files using Numpy (SETI_img_to-binary.ipynb)
- In this notebook we read the Basic 4 dataset and convert signals into a binary file. Also, we split data into train/test datasets.
- Optional: Converting images to binary files using Spark (SETI_img_to_binary_spark.ipynb)
- In this notebook we read the Basic 4 dataset through Spark, and convert signals into a binary file. It is an optional notebook and you dont have to run it if you have already converted the images to binary files.
2. Classification
- Classification of images using CCN on Single GPU (SETI_CNN_Tf_SingleGpu.ipynb)
- In this Notebook, we will use the famous SETI Dataset to build a Convolutional Neural Networks capable to perform signals classification. CNN will say, with some associated error, what type of signal is the presented input.
- In our case, as we are running this notebook on IBM PowerAI, you hvae access to multi GPU, but we use one of the GPUs in this notebook, for the sake of simplicity.
- Optional: Classification of images using CCN on Multi GPU (SETI_CNN_Tf_MultiGpu.ipynb)
- This Notebook, builds a Convolutional Neural Networks, but using multi GPUs. You will use IBM PowerAI with multiple GPU to train the model in parallel manner.
- You can run this notebook in case you have access to an environment with multiple GPUs.
3. Prediction
- Use the trained model for prediciton (SETI_prediction.ipynb)
- In this notebook you can load a pre-trained model and predict the signal class.
Performance
Convelutional Neural Network involves a lot of matrix and vector multiplications that can parallelized, so GPUs can overperform, because GPUs were designed to handle these kind of matrix operations in parallel!
Why GPU overperforms?
A single core CPU takes a matrix operation in serial, one element at a time. But, a single GPU could have hundreds or thousands of cores, while a CPU typically has no more than a few cores.
How to use GPU with TensorFlow?
It is important to notice that if both CPU and GPU are available on the machine that you are running the noebook, and if a TensorFlow operation has both CPU and GPU implementations, the GPU devices will be given priority when the operation is assigned to a device.
Benchmark:
- SETI_single_gpu_train.py achieves ~72% accuracy after 3k epochs of data (75K steps).
- Speed: With batch_size 128.
- Notice: The model is not optimized to reach to its highest accuracy, you can achive better results tuning the parameters.
CPU Architecture | CPU cores | Memory | GPU | Step time (sec/batch) | Accuracy |
POWER8 | 40 | 256 GB | 1 x Tesla K80 | ~0.127 | ~72% at 75K steps (3 hours) |
POWER8 | 32 | 128 GB | 1 x Tesla P100 w/NVLink np8g4 | ~0.035 | ~72% at 75K steps (1 hour) |
- SETI_multi_gpu_train.py achieves ~72% accuracy after 75K steps.
- Speed: With batch_size 128.
- Notice: The model is not optimized to reach to highest accuracy, and you can achive better results tuning the parameters.
CPU Architecture | CPU cores | Memory | GPU | Step time (sec/batch) | Accuracy |
POWER8 | 160 | 1 TB | 4 x Tesla K80 | ~0.066 | ~72% at 75K steps (83 minutes) |
POWER8 | 64 | 256 GB | 2 x Tesla P100 w/NVLink np8g4 | ~0.033 | ~72% at 75K steps (40 minutes) |
POWER8 | 128 | 512 GB | 4 x Tesla P100 w/NVLink np8g4 | ~0.017 | ~72% at 75K steps (20 minutes) |