

Quasar mass machine learning

Using machine learning to predict the mass of quasar supermassive black holes

A quasar is a distant and very luminous active galactic nucleus that outshines all the starlight from its host galaxy. Each quasar is powered by its galaxy's central supermassive black hole. The mass of the black hole can be estimated from the quasar spectrum using well-established "single-epoch" spectral fitting methods; see McLure & Dunlop 2004, Vestergaard & Peterson 2006 and Shen+ 2011. This involves correcting for extinction by dust in our own Galaxy, subtracting a model of the underlying "continuum" spectrum, and measuring the widths of key emission lines in the remaining spectrum, typically lines emitted by hydrogen, magnesium and carbon (e.g. Hβ, Mg II and C IV). The line width is then combined with the continuum luminosity to give the mass estimate.
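Single-epoch estimators rest on a virial assumption: the continuum luminosity sets the size of the broad-line region (through the radius-luminosity relation) and the line width sets the gas velocity, so the mass scales as a power of the luminosity times the width squared. A minimal sketch of that general form follows. The calibration constants `a` and `b` differ between emission lines and between the cited papers; the function name and the values in the usage line are hypothetical placeholders, not the calibration used by Chen+ 2019.

```python
import math


def single_epoch_log_mass(fwhm_kms: float, lum_erg_s: float, a: float, b: float) -> float:
    """Generic single-epoch virial estimate of log10(MBH/MSun).

    log10(MBH/MSun) = a + b*log10(L / 1e44 erg/s) + 2*log10(FWHM / 1000 km/s)

    fwhm_kms  -- emission-line full width at half maximum, in km/s
    lum_erg_s -- monochromatic continuum luminosity (lambda * L_lambda), in erg/s
    a, b      -- calibration constants (hypothetical here; take real ones from the literature)
    """
    if fwhm_kms <= 0 or lum_erg_s <= 0:
        raise ValueError("FWHM and luminosity must be positive")
    return a + b * math.log10(lum_erg_s / 1e44) + 2.0 * math.log10(fwhm_kms / 1000.0)


# Placeholder calibration (a=7.0, b=0.5), for illustration only.
print(round(single_epoch_log_mass(4000.0, 1e45, a=7.0, b=0.5), 3))  # 8.704
```

The factor of 2 on the width term comes from the velocity entering the virial mass squared; only `a` and `b` need to be looked up for a given line.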

The Sloan Digital Sky Survey (SDSS) has observed an extensive set of quasar spectra; see Lyke+ 2020 for details. Using SDSS data, Chen+ 2019 made single-epoch black hole mass estimates for 173,559 quasars by spectral fitting.

The code in this repository uses a subset of the Chen+ 2019 dataset to predict quasar black hole masses through an artificial neural network (ANN) implemented using Keras on TensorFlow 2. The ANN is loosely based on a SketchRNN model presented by @ageron in his solutions to Chapter 15 of Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. It predicts the log of the black hole mass, log10(MBH/MSun), for each individual spectrum in a selected subset of the Chen+ quasars.

The ANN treats the Chen+ single-epoch mass estimates as ground truth for training. (Apologies that "epoch" has distinct meanings in astrophysics and in ML; the context should always make clear which is meant. The ML sense appears only in the training monitor that runs with the ANN.)

The ANN comprises three Conv1D layers (the middle one with batch normalization), followed by two LSTM layers and then two Dense layers. It is trained in two rounds of Nadam optimization, with learning rates of 1e-3 and then 1e-5.
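A minimal Keras sketch of this layer stack and two-round training follows, assuming each spectrum has been resampled to a fixed-length, single-channel array. The filter counts, kernel sizes, strides, LSTM and Dense widths, the input length `N_PIXELS`, where the batch normalization sits, the MSE loss and the helper names `build_model` and `train_two_stage` are all hypothetical; the notebook's actual settings may differ.

```python
import numpy as np
import tensorflow as tf

N_PIXELS = 1000  # hypothetical length of each resampled spectrum


def build_model(n_pixels: int = N_PIXELS) -> tf.keras.Model:
    """Three Conv1D layers (BatchNorm on the middle one), two LSTMs, two Dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_pixels, 1)),
        # Strided convolutions shorten the sequence before the LSTMs see it.
        tf.keras.layers.Conv1D(32, kernel_size=5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, kernel_size=5, strides=2, padding="same"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Conv1D(64, kernel_size=5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # predicted log10(MBH/MSun)
    ])


def train_two_stage(model, X, y, epochs=(10, 10), **fit_kwargs):
    """Train with Nadam at lr=1e-3, then recompile with a fresh Nadam at lr=1e-5."""
    histories = []
    for lr, n_epochs in zip((1e-3, 1e-5), epochs):
        # Recompiling keeps the learned weights but starts a new optimizer state.
        model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=lr), loss="mse")
        histories.append(model.fit(X, y, epochs=n_epochs, **fit_kwargs))
    return histories
```

Recompiling with a new optimizer for the second round is one way to match the README's separate intermediate and final model files, each with its own optimizer state; that the notebook does exactly this is an assumption.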

On the test set, this model gives an R² of 0.72. Some quasars have been observed many times, with as many as ~70 spectra each; for those, averaging the predictions over all of a quasar's spectra gives a test-set R² of 0.75. An issue for further investigation is whether predictions for quasars with multiple spectra can be improved with a method more sophisticated than taking the mean of the associated predictions, for example an ANN approach to pooling estimates.
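The two R² figures correspond to scoring per-spectrum predictions and scoring again after averaging each quasar's predictions. A minimal NumPy sketch of that comparison follows, assuming every spectrum of a quasar carries the same Chen+ target mass. The helper names `r2_score` and `pool_by_quasar` are hypothetical, and this `r2_score` is a hand-written stand-in rather than scikit-learn's function of the same name; the toy numbers are invented for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pool_by_quasar(quasar_ids, y_true, y_pred):
    """Average targets and predictions over all spectra of each quasar."""
    ids, inverse = np.unique(np.asarray(quasar_ids), return_inverse=True)
    inverse = inverse.ravel()  # flat group index for each spectrum
    counts = np.bincount(inverse)
    mean_true = np.bincount(inverse, weights=np.asarray(y_true, dtype=float)) / counts
    mean_pred = np.bincount(inverse, weights=np.asarray(y_pred, dtype=float)) / counts
    return ids, mean_true, mean_pred


# Toy example: quasar 1 has two spectra, quasars 2 and 3 have one each.
ids = [1, 1, 2, 3]
y_true = [9.0, 9.0, 8.0, 10.0]
y_pred = [8.8, 9.2, 8.1, 9.7]
_, t, p = pool_by_quasar(ids, y_true, y_pred)
print(round(r2_score(y_true, y_pred), 3), round(r2_score(t, p), 3))  # 0.91 0.95
```

A learned pooling model, as suggested above, would replace the plain mean in `pool_by_quasar` with a function of each quasar's full set of predictions.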

The files and folders are as follows:

  • Quasar_mass_ML.ipynb, the main program file to run, a Jupyter notebook;
  • this README.md file;
  • get_Chen_data.py, a script file;
  • quasar_analysis.py, another script file;
  • a models folder holding HDF5 files in two sub-folders:
    • an empty checkpoints folder where checkpoints automatically generated by the ANN in ... will be placed; and
    • a Best_models folder where the best ANN model is kept, with the optimizer and weights from each of its two training stages. There are two files for the best model:
      • Best_ANN_intermediate_stage.h5 which has the intermediate weights from the 1e-3 learning rate model run; and
      • Best_ANN_final_stage.h5 which is the model to use for prediction, having the final weights from the 1e-5 learning rate model run;
  • a data folder, containing:

Running the model also needs a spectra.parquet file, which at 2.4GB is too big for GitHub; it is stored on the Open Science Framework (OSF) at https://osf.io/6hbqx/. Download it and add it to the bigger_data sub-folder.

To create the quasars.parquet and spectra.parquet files from scratch, download the SDSS data file https://data.sdss.org/sas/dr16/eboss/qso/DR16Q/DR16Q_v4.fits (described on the relevant SDSS datamodel webpage) to bigger_data, then follow the instructions in the second cell of Quasar_mass_ML.ipynb. Note that later steps in this process download around 7GB and take about half a day to run on a standard laptop and internet connection.

Contributors

andrewwren