henrysky / astronn


Stars: 189, Watchers: 9, Forks: 53, Size: 181.3 MB

Deep Learning for Astronomers with Tensorflow

Home Page: http://astronn.readthedocs.io/

License: MIT License

Python 99.98% HTML 0.02%

tensorflow neural-network python astronomy astrophysics science neural-networks

astronn's People


jobovy errai34 npricejones bbw7561135 ivanredbread hoondori himelys rgcl birajaghoshal tundeakins phisyche wh-forker yonghoonkwon kfeeeeee elainezhuo abtinshahidi blackcatrecycler salmanhiro jiadonglee jgraving ninoc neil-lid saeedtaghavi suyanzhou626 ondrocks renlliang3 calvinkkd binodbhttr daocalendar sailfish009 yutaozhou igomezv abdulfattahbaalawi ernurator pooyam toanupdixit dra-chaos bkmgit worldmovers shubhangi17002 streetquant plutomingyu kmeyer001 rkpradhan zhaopw5 richardscottoz giuliorusso wuyunfa suman-mukherje sharmarahul20 adrita-khan

astronn's Issues

Weird errors raised by running the new accelerated BNN test() method

System information

Have I written custom code?: Irrelevant
OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Win10 v1706 x64, CentOS 7.4
astroNN (Build or Version): 0.9.2.8dev
Did you try the latest astroNN commit?: Yes
TensorFlow installed from (source or binary, official build?): official GPU build for Windows and CPU for CentOS
TensorFlow version: 1.7.0rc-1 for Windows, 1.7.0 for CentOS
Keras version: 2.1.5
Python version: 3.6
CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): 9.0/7.0
GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
Exact command/script to reproduce (optional): Running BNN test() multiple times in a row

Describe the problem

Running BNN test() multiple times in a row (around the 7th call?) raises a weird error complaining that a shape or dimension is wrong. It can be reproduced on both CPU and GPU, on my Windows machine and on the astro department Linux server.

This bug was initially discovered during the open/globular cluster benchmark, which requires running the BNN test() method for every cluster, and the benchmark was stopped by this bug.

Source code / logs

Variation 1:

ValueError                                Traceback (most recent call last)
<ipython-input-7-c36beede380f> in <module>()
     57         print(np.sum(np.isnan(spec)))
     58     print(name, ' and number of stars: ', indices.shape[0])
---> 59     pred, pred_var = bcnn.test(spec[1:])
     60     means = np.mean(pred, axis=0)
     61     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\BayesianCNNBase.py in test(self, input_data, inputs_err)
    210                                                                                           inputs_err[data_gen_shape:])
    211             remainder_result = np.asarray(new.predict_generator(remainder_generator, steps=1))
--> 212             result = np.concatenate((result, remainder_result))
    213 
    214         if result.ndim < 3:  # in case only 1 test data point, in such case we need to add a dimension

ValueError: all the input arrays must have same number of dimensions

Variation 2

ValueError                                Traceback (most recent call last)
<ipython-input-8-b4056e9283f7> in <module>()
     55         print(np.sum(np.isnan(spec)))
     56     print(name, ' and number of stars: ', indices.shape[0])
---> 57     pred, pred_var = bcnn.test(spec[1:])
     58     means = np.mean(pred, axis=0)
     59     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\BayesianCNNBase.py in test(self, input_data, inputs_err)
    218 
    219         predictions = result[:, :half_first_dim, 0]  # mean prediction
--> 220         mc_dropout_uncertainty = result[:, :half_first_dim, 1] * (self.labels_std ** 2)  # model uncertainty
    221         predictions_var = np.exp(result[:, half_first_dim:, 0]) * (self.labels_std ** 2)  # predictive uncertainty
    222 

ValueError: operands could not be broadcast together with shapes (1,5075) (25,)

Suggestion

The cause is unknown, but the BNN test_old() method is unaffected.
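Both variations suggest the test-time predictions end up with an unexpected number of dimensions before they are concatenated (Variation 1) or sliced and multiplied by `self.labels_std` (Variation 2). The sketch below is a defensive check, not a fix for the unknown root cause and not astroNN code: the helper name `merge_mc_predictions` is hypothetical, and it assumes, from the slicing `result[:, :half_first_dim, 0]` and `result[:, half_first_dim:, 0]`, that `result` is 3-D with axis 1 holding `2 * n_labels` values (means followed by log-variances). It restores a missing batch axis on the remainder and otherwise fails early with a readable message.

```python
import numpy as np


def merge_mc_predictions(result, remainder_result, n_labels):
    """Hypothetical helper: append remainder-batch predictions to the main ones.

    Assumes predictions have shape (n_samples, 2 * n_labels, k) with k >= 2.
    """
    result = np.asarray(result)
    remainder_result = np.asarray(remainder_result)
    # A remainder holding a single sample may come back without its batch axis
    if remainder_result.ndim == result.ndim - 1:
        remainder_result = remainder_result[np.newaxis, ...]
    if remainder_result.ndim != result.ndim:
        raise ValueError(
            f"cannot merge predictions with shapes {result.shape} "
            f"and {remainder_result.shape}"
        )
    merged = np.concatenate((result, remainder_result), axis=0)
    # Later slicing against labels_std (shape (n_labels,)) needs axis 1 == 2 * n_labels
    if merged.ndim != 3 or merged.shape[1] != 2 * n_labels:
        raise ValueError(
            f"expected shape (n, {2 * n_labels}, k), got {merged.shape}"
        )
    return merged
```

With 25 labels, a (1, 5075, 2) result like the one behind Variation 2 would be rejected here instead of failing later with a broadcasting error.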

Galaxy-10 missing images

I was considering doing a few demos with the Galaxy10 dataset but noticed that the Galaxy10.h5 file linked here has 21785 images and not the 25753 stated on the webpage. Was this a typo or are some images missing?

Thanks for assembling this fun toy dataset :)

Bugs in 3 of the demo_tutorial/NN_uncertainty_analysis notebooks

System information

I am trying to learn astroNN by running the notebooks in
https://github.com/henrysky/astroNN/tree/master/demo_tutorial/NN_uncertainty_analysis on my Mac and on Binder.

But these introductory examples are buggy. As a beginner in deep learning, it is not obvious to me how to fix even simple bugs.

Those notebooks are very old and are not working anymore.

OS Platform and Distribution: macOS Big Sur (same on Binder)
astroNN (Build or Version): master
Did you try the latest astroNN commit?: I have done git clone from master
TensorFlow installed from (source or binary, official build?): pip install
TensorFlow version: tensorflow 2.12.0
Python version: Python 3.9.16
Exact command/script to reproduce (if applicable):

Describe the problem

Describe the problem clearly here. Be sure to describe here why it's a bug in astroNN (instead of Tensorflow's problem) or a feature request.

Among the 4 examples

Uncertainty_Demo_MNIST.ipynb --> OK
Uncertainty_Demo_quad.ipynb --> Does not work
Uncertainty_Demo_x_sinx.ipynb --> Does not work
Uncertainty_Demo_x_sinx_tfp.ipynb --> Does not work

After fixing a minor NumPy formatting issue in Uncertainty_Demo_quad.ipynb, I found that the generator generate_train_batch(x, y, y_err) is not accepted by model.fit(), and the suggested model.fit_generator() is no longer accepted by TensorFlow.

In the section "Third, use a single model to get both epistemic and aleatoric uncertainty with variational inference", I tried to skip the generator and pass the data directly, but the data format was not accepted:

the_in, the_out = next(generator)
model.fit(the_in, the_out, epochs=20, max_queue_size=20, verbose=0,
          steps_per_epoch=x.shape[0] // batch_size)

I don't know TensorFlow well enough to understand the data format error:

     TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_2'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

I hope you can fix these simple examples soon so that I can start from a working example.

Many thanks.
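For reference, here is a minimal NumPy batch generator of the kind Keras can consume. It is a hypothetical stand-in that reuses the notebook's function name, not the notebook's own implementation: it yields plain `(x_batch, y_batch)` tuples and leaves out `y_err`, because how `y_err` enters depends on the notebook model's input layout, which is not shown here.

```python
import numpy as np


def generate_train_batch(x, y, batch_size=64, seed=None):
    """Yield (x_batch, y_batch) tuples forever, reshuffling every epoch.

    Hypothetical stand-in: adapt the yielded structure to the model's
    actual inputs/outputs (e.g. dicts keyed by layer name).
    """
    n = x.shape[0]
    if batch_size > n:
        raise ValueError("batch_size must not exceed the number of samples")
    rng = np.random.default_rng(seed)
    while True:  # Keras draws steps_per_epoch batches per epoch from this
        order = rng.permutation(n)
        # Drop the last incomplete batch so every batch has the same size
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]
```

In TensorFlow 2.x, `Model.fit` accepts a Python generator as `x` (with `y` left unset), so a call like `model.fit(generate_train_batch(x, y, batch_size), steps_per_epoch=x.shape[0] // batch_size, epochs=20)` takes the place of `fit_generator`. Judging by its wording, though, the TypeError above is raised while the functional model is being constructed, so a working generator alone probably won't make it go away; the message's own suggestion (moving the offending op into a custom layer's `call`) would still apply.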

Complete Tensorflow support without installing Keras separately

System information

Have I written custom code?: Irrelevant
OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
astroNN (Build or Version): Irrelevant
Did you try the latest astroNN commit?: Irrelevant
TensorFlow installed from (source or binary, official build?): Irrelevant
TensorFlow version: >=1.5.0
Keras version: Irrelevant
Python version: >=3.5
CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
Exact command/script to reproduce (optional): Irrelevant

Describe the problem

Since Tensorflow 1.5.0, Keras has been an official part of the Tensorflow API (tensorflow.keras). astroNN should support both keras and tensorflow.keras.

What is done?

Loss functions are all written in tensorflow

What is not done?

Layers and callbacks are all written with keras
Models and training process are all written with keras
Session management is currently done with keras
astroNN's configuration file

Source code / logs

A relevant discussion on Keras github

Suggestion

Configuration file (let users choose keras or tensorflow.keras)
Should the default configuration point to keras or tensorflow.keras?
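A minimal sketch of the suggested configuration switch, assuming an INI-style config file read with `configparser`. The `[Basics]` section, the `keras_module` key and the helper name are hypothetical, not existing astroNN settings, and the `tensorflow.keras` default is only one possible answer to the open question above.

```python
import configparser

ALLOWED_KERAS_MODULES = ("keras", "tensorflow.keras")


def select_keras_module(config_text, default="tensorflow.keras"):
    """Return the Keras module name requested in an INI-style config.

    [Basics] / keras_module are hypothetical names for illustration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback covers both a missing section and a missing key
    choice = parser.get("Basics", "keras_module", fallback=default).strip()
    if choice not in ALLOWED_KERAS_MODULES:
        raise ValueError(
            f"keras_module must be one of {ALLOWED_KERAS_MODULES}, got {choice!r}"
        )
    return choice
```

The rest of the package could then do `keras = importlib.import_module(select_keras_module(text))` once and build layers, callbacks and models from that single `keras` object, so the choice lives in one place.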

ODE example on tensorflow 2.2.0

When I run the odeint example on TensorFlow 2.2.0, I get the error:

  File "C:\Users\jhsmi\pp\astroNN\astroNN\neuralode\dop853.py", line 177, in dopri853core
    if tf.equal(hmax, 0.0):
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 778, in __bool__
    self._disallow_bool_casting()
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 545, in _disallow_bool_casting
    "using a `tf.Tensor` as a Python `bool`")
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 532, in _disallow_when_autograph_enabled
    " decorating it directly with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.

It works fine for me on TF 2.1.0
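The traceback points at `if tf.equal(hmax, 0.0):`, a Python `if` on a symbolic tensor inside a graph that AutoGraph did not convert. One way to sidestep this, sketched below, is to express the branch as a graph op with `tf.cond`, which works whether or not AutoGraph runs. The function name is hypothetical, and the "hmax == 0 means use the whole integration span" convention is an assumption borrowed from Hairer's DOP853, not read from astroNN's dop853.py.

```python
import tensorflow as tf


@tf.function
def resolve_max_step(hmax, t_start, t_end):
    # Hypothetical helper: treat hmax == 0 as "no explicit limit" and fall back
    # to the full integration span (assumed DOP853-style convention).
    # tf.cond builds both branches into the graph, so no Python bool is needed.
    return tf.cond(
        tf.equal(hmax, 0.0),
        lambda: tf.abs(t_end - t_start),
        lambda: hmax,
    )
```

Both branches return tensors of the same dtype, which `tf.cond` requires.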

Parallel odeint integration with respect to func or parameters

If I have an ODE function for example like this:

class ODE(object):
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2

    def __call__(self, y, t):
        d_1 = - self.k1 * y[0] + self.k2 * y[1]
        d_2 = self.k1 * y[0] - self.k2 * y[1]

        return tf.stack([d_1, d_2])

ode_func = ODE(3., 5.)

And if I now would like to do this in parallel over k1, k2, would this be the way to do it?

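import numpy as np
import tensorflow as tf
from astroNN.neuralode import odeint
from astroNN.shared.nn_tools import cpu_fallback, gpu_memory_manage

class ODE(object):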
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2
        self.size = len(k1)

    def __call__(self, y, t):
        d_1 = - self.k1 * y[:self.size] + self.k2 * y[self.size:]
        d_2 = self.k1 * y[:self.size] - self.k2 * y[self.size:]

        return tf.concat([d_1, d_2], axis=0)

cpu_fallback()
gpu_memory_manage()


k1 = tf.constant(np.arange(1., 6), dtype=tf.float64)
k2 = tf.constant(np.arange(1., 6)[::-1], dtype=tf.float64)

ode_func = ODE(k1, k2)

NUM_SAMPLES=100
y_init = tf.concat([np.ones(5, dtype=np.float64), np.zeros(5, dtype=np.float64)], axis=0)
t = tf.constant(np.linspace(0., 10., num=NUM_SAMPLES), dtype=tf.float64)
f = ODE(k1, k2)
y = odeint(f, y_init, t, precision=tf.float64)
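Because the five (k1, k2) systems don't interact, stacking them into one 10-element state and slicing it, as in the class above, is a valid way to integrate them together, provided each slice lines up with its own parameters. The NumPy sketch below (independent of astroNN; both function names are illustrative) checks that layout by comparing the batched right-hand side against evaluating each pair separately.

```python
import numpy as np


def rhs_single(y, k1, k2):
    """Right-hand side of the two-state system for one (k1, k2) pair."""
    return np.array([-k1 * y[0] + k2 * y[1], k1 * y[0] - k2 * y[1]])


def rhs_batched(y, k1, k2):
    """Same system for n pairs at once.

    State layout follows the ODE class above:
    [y0 for every pair..., y1 for every pair...]
    """
    n = len(k1)
    y0, y1 = y[:n], y[n:]
    return np.concatenate([-k1 * y0 + k2 * y1, k1 * y0 - k2 * y1])


# Compare the batched RHS with five independent evaluations
k1 = np.arange(1.0, 6.0)
k2 = k1[::-1]
y = np.random.default_rng(0).random(10)
batched = rhs_batched(y, k1, k2)
per_pair = np.array([rhs_single([y[i], y[i + 5]], k1[i], k2[i]) for i in range(5)])
```

One caveat: an adaptive solver typically picks a single step size from an error norm over the whole stacked state, so the batched solution agrees with five separate integrations only to within the solver tolerance.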

Transfer learning & Fine-tuning

Hi, Henry. I've got a well-trained astroNN model, but I want to do some transfer learning to adapt it to another survey. What I've done is remove the top dense layer of the base model and add a new dense layer, but now it can only be treated like an ordinary Keras model. By the way, the base model itself is a custom model with the parent class BayesianCNNBase.

I'm wondering:

What should I do if I want to build a new astroNN model on top of an astroNN base model? Should I build a new class, say transfer_model, under BayesianCNNBase and load the base model inside my new def model() function?
How can I do the fine-tuning step (fit_on_batch does not seem to be enough)?

Thank you!

Current .h5 dataset loading mechanism is problematic

Currently, this is viewed as a low-priority performance issue. It probably won't be fixed in the near future.

System information

Have I written custom code?: Irrelevant
OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
astroNN (Build or Version): commit 29fde34
TensorFlow installed from (source or binary, official build?): Irrelevant
TensorFlow version: Irrelevant
Keras version: Irrelevant
Python version: Irrelevant
CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
Exact command/script to reproduce (optional): Irrelevant

Describe the problem

The current .h5 dataset loading mechanism is problematic because astroNN loads the whole dataset into memory regardless of its size. This will eventually become a serious problem when the dataset is large and memory is limited. Loading the APOGEE training data (~12 GB) on my 16 GB RAM laptop and desktop is already a bit of a problem.

Source code / logs

Irrelevant

Suggestion

The neural network/data generator should read from H5Loader directly, instead of H5Loader loading the whole dataset into memory and handing it to the neural network/data generator.
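A minimal sketch of the suggested direction: a generator that slices its data sources on demand instead of receiving fully loaded arrays. It relies only on `len()` and contiguous slicing, which both NumPy arrays and `h5py.Dataset` objects support; with an open `h5py.Dataset`, each slice reads just those rows from disk. The function name and the file/dataset names in the usage comment are hypothetical, not H5Loader's API.

```python
import numpy as np


def iterate_batches(x_source, y_source, batch_size):
    """Yield consecutive (x, y) batches, reading each slice only when needed.

    x_source / y_source: anything supporting len() and slicing, e.g. an
    open h5py.Dataset or an in-memory NumPy array.
    """
    n = len(x_source)
    if len(y_source) != n:
        raise ValueError("x_source and y_source must have the same length")
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # np.asarray materialises only this slice in memory
        yield np.asarray(x_source[start:stop]), np.asarray(y_source[start:stop])


# Usage with h5py (hypothetical file and dataset names):
# import h5py
# with h5py.File("apogee_training.h5", "r") as f:
#     for xb, yb in iterate_batches(f["spectra"], f["labels"], 64):
#         ...
```

Shuffling needs some care with h5py: fancy indexing into a dataset requires indices in increasing order, so a common compromise is to shuffle the order of contiguous blocks, or to sort each random batch's indices before reading.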

Can not reproduce results of Uncertainty_Demo_MNIST.ipynb

Hi, thanks for sharing these great implementations on GitHub! Nice work.

I ran your notebook Uncertainty_Demo_MNIST.ipynb.
However, I cannot get the same results as shown in the notebook output. The losses I get are all nan.

Could you suggest why?

The output I got from the second cell (Train the neural network on MNIST training set):

Number of Training Data: 54000, Number of Validation Data: 6000
====Message from Normalizer====
You selected mode: 255
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
Sorry but there is a known issue of the loss not handling loss correctly. I will fix it in May-- Henry 19 April 2018
Epoch 1/5
 - 163s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0980 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0991
Epoch 2/5
 - 159s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0987 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1047

Epoch 00002: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 3/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.1001 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0971

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 4/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0967 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1008

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 5/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0998 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1003

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
Completed Training, 794.97s in total

Thanks!

Keras's fit_generator fails when use_multiprocessing=True on Windows only

System information

Have I written custom code?: Nope
OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Windows 10 v1709 x64
astroNN (Build or Version): commit b27d557
TensorFlow installed from (source or binary, official build?): official py36 build
TensorFlow version: 1.5-rc-1
Keras version: 2.1.3
Python version: 3.6.3
CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Cuda 9.0, CuDNN 7.0
GPU model and memory (Only necessary if you are using Tensorflow-gpu): GTX1060 6GB
Exact command/script to reproduce (optional): use_multiprocessing=True in fit_generator

Describe the problem

astroNN's generator is already thread safe

This is a known Python issue on Windows; it will probably work on Linux/macOS.

So far the only consequence is that the CPU cannot generate data fast enough for a fast GPU (GTX 970 or above, with a CPU of at least 4 threads).

Multiprocessing is only necessary when training a BCNN on a GPU.

Link: matterport/Mask_RCNN#13
Link: keras-team/keras#6582

Source code / logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-17f261cd711f> in <module>()
      2 bcnn = Apogee_BCNN()
      3 bcnn.max_epochs = 75
----> 4 bcnn.train(x,y,x_err,y_err)

d:\university\ast425\astronn\astroNN\models\Apogee_BCNN.py in train(self, input_data, labels, inputs_err, labels_err)
    111                                        validation_steps=self.val_num // self.batch_size,
    112                                        epochs=self.max_epochs, verbose=2, workers=os.cpu_count(),
--> 113                                        callbacks=[reduce_lr, csv_logger], use_multiprocessing=True)
    114 
    115         # Call the post training checklist to save parameters

~\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2097                             val_enqueuer = GeneratorEnqueuer(validation_data,
   2098                                                              use_multiprocessing=use_multiprocessing,
-> 2099                                                              wait_time=wait_time)
   2100                         val_enqueuer.start(workers=workers, max_queue_size=max_queue_size)
   2101                         validation_generator = val_enqueuer.get()

Suggestion

Detect the user's OS and enable multiprocessing in fit_generator only on macOS and Linux.
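A minimal sketch of that suggestion, using the standard library's `platform.system()` (which returns `"Windows"`, `"Linux"` or `"Darwin"` for macOS). The helper name is hypothetical. With `use_multiprocessing=False`, Keras runs the workers as threads, which is safe here since astroNN's generator is already thread safe.

```python
import os
import platform


def generator_worker_settings(system=None):
    """Hypothetical helper choosing fit_generator worker options per OS."""
    system = system or platform.system()
    # Process-based workers only on Linux and macOS ("Darwin");
    # Windows keeps thread-based workers to avoid the failure above.
    use_multiprocessing = system in ("Linux", "Darwin")
    return {"workers": os.cpu_count() or 1, "use_multiprocessing": use_multiprocessing}
```

The result can be unpacked straight into the call, e.g. `model.fit_generator(..., **generator_worker_settings())`.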

tensorflow 2.4.1

Hello, thank you for your work!
Does astroNN work with tensorflow 2.4.1?
Because whenever I import a module I get

cannot import name 'get_default_session' from 'tensorflow'

For example I am trying to do

from astroNN.models.apogee_models import ApogeeBCNN

thank you in advance, Lucia

DR16 astroNN catalog of distances produces incorrect parsec values for Md and Mg stars

System information

Have I written custom code?:
OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64):
astroNN (Build or Version):
Did you try the latest astroNN commit?:
TensorFlow installed from (source or binary, official build?):
TensorFlow version:
Python version:
CUDA & cuDNN version (if applicable):
GPU model and memory (if applicable):
Exact command/script to reproduce (if applicable):

Describe the problem

astroNN Gaia DR2 parallax zero-point offset with deep learning

Gaia DR2 calculates it as −0.029 mas.
Sloan Digital Sky Survey Apogee calculates it as −0.0523 mas.
Modified parallax = parallax - zero point offset
Data model: apogee_astroNN provides spectro-photometric deep learning parsec distances.
Distances in parsecs to the Orion Nebula for star classes BA, Fd, GKd and GKg agree closely. But astroNN appears to produce distances 4-5 times larger for Md and Mg stars.

Parsecs calculated with the following parallax zero-point offset options:
Parsec: no offset
Dist: APOGEE deep learning (astroNN)
DistApogee: APOGEE offset applied
DistGaia: Gaia offset applied
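The offset arithmetic above can be sketched as follows, using the two zero points quoted in the issue and the naive inversion distance [pc] = 1000 / parallax [mas]. Naive inversion is only reasonable for precise parallaxes, and the function name is illustrative.

```python
GAIA_DR2_ZERO_POINT_MAS = -0.029   # value quoted in the issue
APOGEE_ZERO_POINT_MAS = -0.0523    # value quoted in the issue


def parallax_distance_pc(parallax_mas, zero_point_mas=0.0):
    """Naive distance from a zero-point-corrected parallax.

    Modified parallax = parallax - zero point offset (as in the issue),
    distance [pc] = 1000 / modified parallax [mas].
    """
    corrected = parallax_mas - zero_point_mas
    if corrected <= 0:
        raise ValueError("corrected parallax must be positive to invert")
    return 1000.0 / corrected


# Illustrative parallax of 2.5 mas
for label, zp in [("Parsec", 0.0), ("DistGaia", GAIA_DR2_ZERO_POINT_MAS),
                  ("DistApogee", APOGEE_ZERO_POINT_MAS)]:
    print(label, round(parallax_distance_pc(2.5, zp), 1))
```

At a parallax of a few mas, the two offsets change the distance by only about 1-2% (400 pc becomes roughly 395 or 392 pc), so the choice of zero point alone cannot account for a factor of 4-5 between the parallax distances and the astroNN distances for Md and Mg stars.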

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

Suggestion

Optional, if you have any idea how to fix the issue

Issue loading the Galaxy10 dataset

First and foremost, thank you for this lovely library.

I am trying to access the Galaxy10 DECals dataset (as opposed to the SDSS one) without using the h5 reader as I want to use it as a colab demo.

I've tried both running !pip install astroNN and cloning the repository directly into Colab, following your instructions at commit 9dcd394.

Despite that, using load_galaxy10 still seems to be loading the SDSS dataset and not the DECals. Do you have any guidance?

I've looked at your code and I can't see why it's loading the old dataset.

Maybe the issue is in imports?

# Import statements
 
from astroNN.datasets import load_galaxy10
from tensorflow.keras import utils
 
# To load images and labels (will download automatically at the first time)
# labels corresponds to galaxy classes as specified by Galaxy Zoo
images, labels = load_galaxy10()

Thank you so much for your help!

Loading Galaxy10 dataset

# To load images and labels (will download automatically at the first time)
# First time downloading location will be ~/.astroNN/datasets/
images, labels = load_galaxy10()

I am trying to load the Galaxy10 dataset using astroNN, but I am getting the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate(_ssl.c:1131)>

Anyone knows why this is? Thanks in advance.
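"unable to get local issuer certificate" means Python could not find a trusted CA certificate for the download server's certificate chain, so the failure is in the local Python/OpenSSL setup rather than in astroNN. As a starting point for diagnosis, the sketch below uses only the standard `ssl` module to report which OpenSSL build and CA locations the interpreter uses. The helper name is illustrative. One known cause: the python.org macOS installers do not use the system certificates and ship an "Install Certificates.command" script for this purpose.

```python
import os
import ssl


def describe_ca_setup():
    """Report the OpenSSL build and default CA locations this Python uses."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        # cafile/capath are None when the resolved location does not exist
        "cafile": paths.cafile,
        "capath": paths.capath,
        # compiled-in default that OpenSSL would look for
        "openssl_cafile": paths.openssl_cafile,
    }


for key, value in describe_ca_setup().items():
    print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None`, this interpreter has no default CA bundle, which would explain the error.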

ApogeeBCNN() dimensions

Hello and thank you for sharing your work.
I want to classify colour images (with a channel dimension) using a Bayesian neural network.
However, with this model I get a dimension error:

Input 0 of layer max_pooling1d_13 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 75, 75, 3)

My input is a dataset loaded with

training_dataset = tf.keras.preprocessing.image_dataset_from_directory

and converted to tensors with

images, labels = next(iter(training_dataset))

so I am trying to train the model with

bcnn_net = ApogeeBCNN()
bcnn_net.fit(images, labels )

Why am I getting this error? Is there a specific way to pass the data?

Thank you, Lucia
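The error message shows a 1-D max-pooling layer (`max_pooling1d_13`), which expects a 3-D batch (samples, pixels, channels), receiving a 4-D image batch `(None, 75, 75, 3)`. In other words, ApogeeBCNN is built from 1-D convolutions for spectra. A minimal sketch, assuming the model expects one 1-D vector per sample as with APOGEE spectra, shows how to flatten the images so the shapes are compatible. The helper name is hypothetical. Flattening discards the images' 2-D spatial structure, so a model with 2-D convolutions is likely a better fit. Also note that `next(iter(training_dataset))` returns only the first batch, which is 32 images with `image_dataset_from_directory`'s default `batch_size`.

```python
import numpy as np


def images_to_1d_inputs(images):
    """Hypothetical helper: flatten (N, height, width, channels) images
    to (N, height * width * channels) so a 1-D model accepts them."""
    images = np.asarray(images)
    if images.ndim != 4:
        raise ValueError(f"expected a 4-D batch of images, got shape {images.shape}")
    return images.reshape(images.shape[0], -1)
```

For the 75 x 75 x 3 images here, each sample becomes a vector of 16875 values.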
