astroNN's Issues

Bug when reproducing "demo_tutorial/galaxy10/Galaxy10_Tutorial.ipynb"

System information

  • Have I written custom code?:
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Mac M1
  • astroNN (Build or Version):
  • Did you try the latest astroNN commit?:
  • TensorFlow installed from (source or binary, official build?):
  • TensorFlow version: 2.16.1
  • Python version: 3.9
  • CUDA & cuDNN version (if applicable):
  • GPU model and memory (if applicable):
  • Exact command/script to reproduce (if applicable):

Describe the problem

I hit a problem when training the neural net. The error is:

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

TypeError Traceback (most recent call last)
Cell In[12], line 3
1 # To train the nerual net
2 # astroNN will normalize the data by default
----> 3 galaxy10net.train(train_images, train_labels)

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/astroNN/shared/, in deprecated_copy_signature.<locals>.deco.<locals>.tgt(*args, **kwargs)
49 warnings.warn(
50 f"Call to function {target.__name__}() is deprecated and will be removed in "
51 + f"future. Use {signature_source.__name__}() instead.",
52 stacklevel=2,
53 )
54 inspect.signature(signature_source).bind(*args, **kwargs)
---> 55 return target(*args, **kwargs)

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/astroNN/models/, in CNNBase.train(self, *args, **kwargs)
700 @deprecated_copy_signature(fit)
701 def train(self, *args, **kwargs):
--> 702 return*args, **kwargs)

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/astroNN/models/, in, input_data, labels, sample_weight)
380 """
381 Train a Convolutional neural network
391 :History: 2017-Dec-06 - Written - Henry Leung (University of Toronto)
392 """
393 # Call the checklist to create astroNN folder and save parameters
--> 394 self.pre_training_checklist_child(input_data, labels, sample_weight)
396 reduce_lr = ReduceLROnPlateau(
397 monitor="val_loss",
398 factor=0.5,
403 verbose=self.verbose,
404 )
406 early_stopping = EarlyStopping(
407 monitor="val_loss",
408 min_delta=self.early_stopping_min_delta,
411 mode="min",
412 )

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/astroNN/models/, in CNNBase.pre_training_checklist_child(self, input_data, labels, sample_weight)
315 norm_labels = self.labels_normalizer.normalize(labels, calc=False)
316 if (
317 self.keras_model is None
318 ): # only compile if there is no keras_model, e.g. fine-tuning does not required
--> 319 self.compile()
321 norm_data = self._tensor_dict_sanitize(norm_data, self.keras_model.input_names)
322 norm_labels = self._tensor_dict_sanitize(
323 norm_labels, self.keras_model.output_names
324 )

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/astroNN/models/, in CNNBase.compile(self, optimizer, loss, metrics, weighted_metrics, loss_weights, sample_weight_mode)
229 raise RuntimeError(
230 'Only "regression", "classification" and "binary_classification" are supported'
231 )
233 self.keras_model = self.model()
--> 235 self.keras_model.compile(
236 loss=loss_func,
237 optimizer=self.optimizer,
238 metrics=self.metrics,
239 weighted_metrics=weighted_metrics,
240 loss_weights=loss_weights,
241 sample_weight_mode=sample_weight_mode,
242 )
244 # inject custom training step if needed
245 try:

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/keras/src/utils/, in filter_traceback.<locals>.error_handler(*args, **kwargs)
119 filtered_tb = _process_traceback_frames(e.__traceback__)
120 # To get the full stack trace, call:
121 # keras.config.disable_traceback_filtering()
--> 122 raise e.with_traceback(filtered_tb) from None
123 finally:
124 del filtered_tb

File ~/miniforge3/envs/py3.9/lib/python3.9/site-packages/keras/src/utils/, in no_automatic_dependency_tracking.<locals>.wrapper(*args, **kwargs)
23 @wraps(fn)
24 def wrapper(*args, **kwargs):
25 with DotNotTrackScope():
---> 26 return fn(*args, **kwargs)

TypeError: compile() got an unexpected keyword argument 'sample_weight_mode'


Optional, if you have any idea how to fix the issue
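
One possible workaround, sketched under assumptions: the traceback shows that the installed Keras compile() rejects sample_weight_mode, so the caller could drop keyword arguments that the target function does not declare. The helper name filter_supported_kwargs and the stand-in compile_like function are hypothetical and are not part of astroNN or Keras. Whether inspect.signature sees through the decorators Keras applies to compile() would need to be verified.

```python
import inspect


def filter_supported_kwargs(func, kwargs):
    """Return only the entries of `kwargs` that `func` accepts by keyword."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_keyword = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in by_keyword
    }


# Hypothetical stand-in for a compile() that no longer takes sample_weight_mode.
def compile_like(loss=None, optimizer=None, metrics=None):
    return {"loss": loss, "optimizer": optimizer, "metrics": metrics}


requested = {"loss": "mse", "optimizer": "adam", "sample_weight_mode": None}
accepted = filter_supported_kwargs(compile_like, requested)
result = compile_like(**accepted)  # no TypeError: the unknown keyword was dropped
print(sorted(accepted))
```

Silently dropping an argument hides the API change, so a warning at the call site may be preferable in real code.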

Complete Tensorflow support without installing Keras separately

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
  • astroNN (Build or Version): Irrelevant
  • Did you try the latest astroNN commit?: Irrelevant
  • TensorFlow installed from (source or binary, official build?): Irrelevant
  • TensorFlow version: >=1.5.0
  • Keras version: Irrelevant
  • Python version: >=3.5
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Irrelevant

Describe the problem

Since TensorFlow 1.5.0, Keras has been an official part of the TensorFlow API (tensorflow.keras). astroNN should support both keras and tensorflow.keras.

What is done?

  • Loss functions are all written in tensorflow

What is not done?

  • Layers and CallBacks are all written with keras
  • Models and training process are all written with keras
  • Session management is currently done with keras
  • astroNN's configuration file

Source code / logs

A relevant discussion on Keras github


  • Configuration file (let users choose keras or tensorflow.keras)
  • Should the default configuration point to keras or tensorflow.keras?
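
A minimal sketch of how a configuration file could let users choose the backend. The section name, the option name use_tf_keras and the fallback to standalone keras are all hypothetical choices made for illustration. They are not astroNN's actual configuration format, and the default is still an open question in this issue.

```python
import configparser

# Hypothetical configuration text; the real file layout is not specified here.
EXAMPLE_CONFIG = """
[tensorflow]
use_tf_keras = true
"""


def keras_module_name(config_text):
    """Return the import path of the Keras implementation the user selected."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Falling back to standalone keras is an arbitrary choice for this sketch.
    use_tf_keras = parser.getboolean("tensorflow", "use_tf_keras", fallback=False)
    return "tensorflow.keras" if use_tf_keras else "keras"


print(keras_module_name(EXAMPLE_CONFIG))
```

The returned name could then be passed to importlib.import_module so the rest of the package imports layers and models from a single place.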

Keras's fit_generator fails when use_multiprocessing=True (Windows only)

System information

  • Have I written custom code?: Nope
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Windows 10 v1709 x64
  • astroNN (Build or Version): commit b27d557
  • TensorFlow installed from (source or binary, official build?): official py36 build
  • TensorFlow version: 1.5-rc-1
  • Keras version: 2.1.3
  • Python version: 3.6.3
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Cuda 9.0, CuDNN 7.0
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): GTX1060 6GB
  • Exact command/script to reproduce (optional): use_multiprocessing=True in fit_generator

Describe the problem

astroNN's generator is already thread-safe.

This is a known issue on Windows caused by Python. It will probably work on Linux/macOS.

So far the only consequence is that the CPU can't generate data fast enough for a fast GPU (GTX970 or above, with a CPU of at least 4 threads).

Only necessary when you are using BCNN with GPU training.

Link: matterport/Mask_RCNN#13
Link: keras-team/keras#6582

Source code / logs

ValueError                                Traceback (most recent call last)
<ipython-input-2-17f261cd711f> in <module>()
      2 bcnn = Apogee_BCNN()
      3 bcnn.max_epochs = 75
----> 4 bcnn.train(x,y,x_err,y_err)

d:\university\ast425\astronn\astroNN\models\ in train(self, input_data, labels, inputs_err, labels_err)
    111                                        validation_steps=self.val_num // self.batch_size,
    112                                        epochs=self.max_epochs, verbose=2, workers=os.cpu_count(),
--> 113                                        callbacks=[reduce_lr, csv_logger], use_multiprocessing=True)
    115         # Call the post training checklist to save parameters

~\Anaconda3\lib\site-packages\keras\legacy\ in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~\Anaconda3\lib\site-packages\keras\engine\ in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2097                             val_enqueuer = GeneratorEnqueuer(validation_data,
   2098                                                              use_multiprocessing=use_multiprocessing,
-> 2099                                                              wait_time=wait_time)
   2100                         val_enqueuer.start(workers=workers, max_queue_size=max_queue_size)
   2101                         validation_generator = val_enqueuer.get()


Detect user's OS and enable multiprocessing in fit_generator on MacOS and Linux
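
The proposal above could be sketched as follows. The function name multiprocessing_allowed is hypothetical, and passing its result as use_multiprocessing to fit_generator is the assumed usage.

```python
import platform


def multiprocessing_allowed(system_name=None):
    """Return True on the platforms where multiprocessing should be enabled.

    `system_name` defaults to platform.system(), which reports "Linux",
    "Darwin" (macOS) or "Windows" on the common platforms.
    """
    name = platform.system() if system_name is None else system_name
    return name in ("Linux", "Darwin")


use_multiprocessing = multiprocessing_allowed()
print(platform.system(), use_multiprocessing)
```

Using an allow-list rather than checking for "Windows" keeps multiprocessing off on any platform that has not been tested.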

Transfer learning & Fine-tuning

Hi, Henry. I have a well-trained astroNN model, and I want to use transfer learning to adapt it to another survey. So far I have removed the top dense layer of the base model and built a new dense layer, but the result can now only be treated as an ordinary Keras model. By the way, the base model itself is a custom model under the parent class "BayesianCNNBase".

I'm wondering:

  1. What should I do if I want to build a new astroNN model on top of an astroNN base model? Should I build a new class, say "transfer_model", under "BayesianCNNBase" and load the base model in my new model() function?
  2. How can I do the fine-tuning step (fit_on_batch does not seem to be enough)?

Thank you!

Bugs in 3 of the demo_tutorial/NN_uncertainty_analysis notebooks

System information

These introductory examples are buggy. As a beginner in deep learning, it is not obvious to me how to correct even simple bugs.

Those notebooks are very old and are not working anymore.

  • OS Platform and Distribution: macOS Big Sur (same on Binder)

  • astroNN (Build or Version): master

  • Did you try the latest astroNN commit?: I have done git clone from master

  • TensorFlow installed from (source or binary, official build?): pip install

  • TensorFlow version: tensorflow 2.12.0

  • Python version: Python 3.9.16

  • Exact command/script to reproduce (if applicable):

Describe the problem

Describe the problem clearly here. Be sure to describe here why it's a bug in astroNN (instead of Tensorflow's problem) or a feature request.

Among the 4 examples

  • Uncertainty_Demo_MNIST.ipynb --> OK
  • Uncertainty_Demo_quad.ipynb --> Does not work
  • Uncertainty_Demo_x_sinx.ipynb --> Does not work
  • Uncertainty_Demo_x_sinx_tfp.ipynb --> Does not work

After a minor numpy format correction, I found that in Uncertainty_Demo_quad.ipynb the generator generate_train_batch(x, y, y_err) is not accepted by, moreover the proposed model.fit_generator() is no longer accepted by TensorFlow.

In the section Third, use a single model to get both epistemic and aleatoric uncertainty with variational inference

I tried to skip the generator by providing directly the data not involving any generator, but the data format was not accepted.

   the_in,the_out =  next(generator),the_out, epochs=20, max_queue_size=20, verbose=0, 
                steps_per_epoch= x.shape[0] // batch_size)

I do not know TensorFlow well enough to understand the data format error.

     TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_2'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

I hope you can fix these simple examples soon so that I can start from a simple working example.

Many thanks.

Parallel odeint integration wrt func or parameter

If I have an ODE function for example like this:

class ODE(object):
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2

    def __call__(self, y, t):
        d_1 = - self.k1 * y[0] + self.k2 * y[1]
        d_2 = self.k1 * y[0] - self.k2 * y[1]

        return tf.stack([d_1, d_2])

ode_func = ODE(3., 5.)

And if I now would like to do this in parallel over k1, k2, would this be the way to do it?

class ODE(object):
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2
        self.size = len(k1)

    def __call__(self, y, t):
        d_1 = - self.k1 * y[:self.size] + self.k2 * y[self.size:]
        d_2 = self.k1 * y[:self.size] - self.k2 * y[self.size:]

        return tf.concat([d_1, d_2], axis=0)


k1 = tf.constant(np.arange(1., 6), dtype=tf.float64)
k2 = tf.constant(np.arange(1., 6)[::-1], dtype=tf.float64)

ode_func = ODE(k1, k2)

y_init = tf.concat([np.ones(5, dtype=np.float64), np.zeros(5, dtype=np.float64)], axis=0)
t = tf.constant(np.linspace(0., 10., num=NUM_SAMPLES), dtype=tf.float64)
f = ODE(k1, k2)
y = odeint(f, y_init, t, precision=tf.float64)
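
One way to check the batched layout above is to integrate each (k1, k2) pair independently and compare. The sketch below is a plain-Python fixed-step RK4 reference, not astroNN's odeint. The step count and the helper names are arbitrary choices for illustration. It assumes the layout from the post, where entry i of the state belongs to y[0] of pair i and entry i + size belongs to y[1] of pair i.

```python
def rhs(y1, y2, k1, k2):
    """Right-hand side of the two-state system from the post."""
    d1 = -k1 * y1 + k2 * y2
    d2 = k1 * y1 - k2 * y2
    return d1, d2


def rk4_final_state(k1, k2, y1=1.0, y2=0.0, t_end=10.0, n_steps=2000):
    """Integrate one (k1, k2) pair with classical RK4 and return the state at t_end."""
    h = t_end / n_steps
    for _ in range(n_steps):
        a1, a2 = rhs(y1, y2, k1, k2)
        b1, b2 = rhs(y1 + 0.5 * h * a1, y2 + 0.5 * h * a2, k1, k2)
        c1, c2 = rhs(y1 + 0.5 * h * b1, y2 + 0.5 * h * b2, k1, k2)
        d1, d2 = rhs(y1 + h * c1, y2 + h * c2, k1, k2)
        y1 += h * (a1 + 2 * b1 + 2 * c1 + d1) / 6.0
        y2 += h * (a2 + 2 * b2 + 2 * c2 + d2) / 6.0
    return y1, y2


k1_values = [1.0, 2.0, 3.0, 4.0, 5.0]
k2_values = k1_values[::-1]  # same values as np.arange(1., 6)[::-1]
reference = [rk4_final_state(k1, k2) for k1, k2 in zip(k1_values, k2_values)]

for i, (y1, y2) in enumerate(reference):
    # Compare y1 with entry i and y2 with entry i + 5 of the batched result.
    print(i, y1, y2)
```

Because y[0] + y[1] is conserved and the derivative vanishes when k1 * y[0] = k2 * y[1], each pair should settle near y[0] = k2 / (k1 + k2) by t = 10, which gives a second sanity check on the batched output.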

ApogeeBCNN() dimensions

Hello and thank you for sharing your work.
I want to classify color images with a Bayesian Neural Network.
However, with this model I get a dimension error:

Input 0 of layer max_pooling1d_13 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 75, 75, 3)

My input is a dataset loaded with

training_dataset = tf.keras.preprocessing.image_dataset_from_directory

and converted to tensors with

images, labels = next(iter(training_dataset))

so I am trying to train the model with

bcnn_net = ApogeeBCNN(), labels )

Why am I getting this error? Is there a specific way to pass the data?

Thank you, Lucia

Problem with "demo_tutorial/galaxy10/Galaxy10_Tutorial.ipynb"

System information

  • Have I written custom code?: No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): No
  • astroNN (Build or Version): 1.1.0
  • Did you try the latest astroNN commit?: Yes
  • TensorFlow installed from (source or binary, official build?): pypi
  • TensorFlow version: 2.16.1
  • Python version: 3.10
  • CUDA & cuDNN version (if applicable): No
  • GPU model and memory (if applicable): No
  • Exact command/script to reproduce (if applicable): No

Describe the problem

When I tried the tutorial, galaxy10net.train reported an unexpected keyword argument 'sample_weight_mode'.

Source code / logs

galaxy10net.train(train_images, train_labels)
<timed eval>:3: UserWarning: Call to function train() is deprecated and will be removed in future. Use fit() instead.

Number of Training Data: 17646, Number of Validation Data: 1960
====Message from Normalizer====
You selected mode: 255
Featurewise Center: {'input': False}
Datawise Center: {'input': False} 
Featurewise std Center: {'input': False}
Datawise std Center: {'input': False} 
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: {'output': False}
Datawise Center: {'output': False} 
Featurewise std Center: {'output': False}
Datawise std Center: {'output': False} 
====Message ends====

TypeError                                 Traceback (most recent call last)
File <timed eval>:3

File ~/.local/lib/python3.10/site-packages/astroNN/shared/, in deprecated_copy_signature.<locals>.deco.<locals>.tgt(*args, **kwargs)
     49 warnings.warn(
     50     f"Call to function {target.__name__}() is deprecated and will be removed in "
     51     + f"future. Use {signature_source.__name__}() instead.",
     52     stacklevel=2,
     53 )
     54 inspect.signature(signature_source).bind(*args, **kwargs)
---> 55 return target(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/astroNN/models/, in CNNBase.train(self, *args, **kwargs)
    700 @deprecated_copy_signature(fit)
    701 def train(self, *args, **kwargs):
--> 702     return*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/astroNN/models/, in, input_data, labels, sample_weight)
    380 """
    381 Train a Convolutional neural network
    391 :History: 2017-Dec-06 - Written - Henry Leung (University of Toronto)
    392 """
    393 # Call the checklist to create astroNN folder and save parameters
--> 394 self.pre_training_checklist_child(input_data, labels, sample_weight)
    396 reduce_lr = ReduceLROnPlateau(
    397     monitor="val_loss",
    398     factor=0.5,
    403     verbose=self.verbose,
    404 )
    406 early_stopping = EarlyStopping(
    407     monitor="val_loss",
    408     min_delta=self.early_stopping_min_delta,
    411     mode="min",
    412 )

File ~/.local/lib/python3.10/site-packages/astroNN/models/, in CNNBase.pre_training_checklist_child(self, input_data, labels, sample_weight)
    315     norm_labels = self.labels_normalizer.normalize(labels, calc=False)
    316 if (
    317     self.keras_model is None
    318 ):  # only compile if there is no keras_model, e.g. fine-tuning does not required
--> 319     self.compile()
    321 norm_data = self._tensor_dict_sanitize(norm_data, self.keras_model.input_names)
    322 norm_labels = self._tensor_dict_sanitize(
    323     norm_labels, self.keras_model.output_names
    324 )

File ~/.local/lib/python3.10/site-packages/astroNN/models/, in CNNBase.compile(self, optimizer, loss, metrics, weighted_metrics, loss_weights, sample_weight_mode)
    229     raise RuntimeError(
    230         'Only "regression", "classification" and "binary_classification" are supported'
    231     )
    233 self.keras_model = self.model()
--> 235 self.keras_model.compile(
    236     loss=loss_func,
    237     optimizer=self.optimizer,
    238     metrics=self.metrics,
    239     weighted_metrics=weighted_metrics,
    240     loss_weights=loss_weights,
    241     sample_weight_mode=sample_weight_mode,
    242 )
    244 # inject custom training step if needed
    245 try:

File ~/.local/lib/python3.10/site-packages/keras/src/utils/, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

File ~/.local/lib/python3.10/site-packages/keras/src/utils/, in no_automatic_dependency_tracking.<locals>.wrapper(*args, **kwargs)
     23 @wraps(fn)
     24 def wrapper(*args, **kwargs):
     25     with DotNotTrackScope():
---> 26         return fn(*args, **kwargs)

TypeError: Trainer.compile() got an unexpected keyword argument 'sample_weight_mode'


Which versions of the packages are used in this tutorial?
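
To compare environments, a report of the installed versions helps. This is a small sketch using only the standard library (Python 3.8 or later). The function name installed_versions and the list of package names are illustrative, and the tutorial may depend on other packages as well.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["astroNN", "tensorflow", "keras"]).items():
    print(f"{name}: {version}")
```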

DR16 astroNN catalog of distances produces incorrect parsec values for Md and Mg stars

System information

  • Have I written custom code?:
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64):
  • astroNN (Build or Version):
  • Did you try the latest astroNN commit?:
  • TensorFlow installed from (source or binary, official build?):
  • TensorFlow version:
  • Python version:
  • CUDA & cuDNN version (if applicable):
  • GPU model and memory (if applicable):
  • Exact command/script to reproduce (if applicable):

Describe the problem

astroNN Gaia DR2 parallax zero-point offset with deep learning

Gaia DR2 calculates it as −0.029 mas.
Sloan Digital Sky Survey Apogee calculates it as −0.0523 mas.
Modified parallax = parallax - zero point offset
Data model: apogee_astroNN provides spectro-photometric deep learning parsec distances.
Distances in parsecs to the Orion Nebula for star classes BA, Fd, GKd and GKg agree closely, but astroNN appears to produce distances 4-5 times larger for Md and Mg stars.

Parsecs calculated with parallax zero point offset options:
Parsec - no offset
Dist - Apogee Deep Learning
DistApogee - use Apogee offset
DistGaia - use Gaia offset
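
The offset options listed above can be written out as a short calculation. This sketch assumes a naive inverse-parallax distance (1000 divided by the parallax in mas) and uses the rule stated above, modified parallax = parallax - zero point offset. The example parallax of 2.5 mas is a hypothetical value chosen for illustration, not a catalog entry.

```python
# Zero-point offsets quoted in the post, in milliarcseconds (mas).
GAIA_DR2_ZERO_POINT_MAS = -0.029
APOGEE_ZERO_POINT_MAS = -0.0523


def distance_pc(parallax_mas, zero_point_mas=0.0):
    """Naive inverse-parallax distance in parsecs after removing the zero point."""
    corrected_mas = parallax_mas - zero_point_mas
    if corrected_mas <= 0:
        raise ValueError("corrected parallax must be positive for a naive inversion")
    return 1000.0 / corrected_mas


example_parallax_mas = 2.5  # hypothetical value, not taken from the catalog
columns = {
    "Parsec": distance_pc(example_parallax_mas),
    "DistApogee": distance_pc(example_parallax_mas, APOGEE_ZERO_POINT_MAS),
    "DistGaia": distance_pc(example_parallax_mas, GAIA_DR2_ZERO_POINT_MAS),
}
for name, value in columns.items():
    print(f"{name}: {value:.1f} pc")
```

For a parallax of this size, either offset shifts the distance by roughly one to two percent.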


Source code / logs


Optional, if you have any idea how to fix the issue

Loading Galaxy10 dataset

# To load images and labels (will download automatically at the first time)
# First time downloading location will be ~/.astroNN/datasets/
images, labels = load_galaxy10()

I am trying to load the Galaxy10 dataset using astroNN, but I get the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>

Does anyone know why this is? Thanks in advance.
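
This error usually means that Python could not find a CA certificate that validates the download server. A first diagnostic step, sketched here with the standard library only, is to print where the interpreter looks for its CA bundle. The function name describe_ca_configuration is hypothetical, and the sketch only reports the configuration without changing it.

```python
import os
import ssl


def describe_ca_configuration():
    """Collect the CA bundle locations that Python's ssl module uses by default."""
    paths = ssl.get_default_verify_paths()
    openssl_cafile = paths.openssl_cafile
    return {
        # None here means no usable CA file was found at the default location.
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": openssl_cafile,
        "openssl_cafile_exists": bool(openssl_cafile) and os.path.exists(openssl_cafile),
        # Environment variable that can point OpenSSL at another bundle.
        paths.openssl_cafile_env: os.environ.get(paths.openssl_cafile_env),
    }


for key, value in describe_ca_configuration().items():
    print(f"{key}: {value}")
```

If no CA file exists at the reported location, installing a certificate bundle for the Python installation or pointing the printed environment variable at a valid bundle is a common remedy.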

Galaxy-10 missing images

I was considering doing a few demos with the Galaxy10 dataset but noticed that the Galaxy10.h5 file linked here has 21785 images and not the 25753 stated on the webpage. Was this a typo or are some images missing?

Thanks for assembling this fun toy dataset :)

ODE example on tensorflow 2.2.0

When I run the odeint example on TensorFlow 2.2.0, I get the error:

  File "C:\Users\jhsmi\pp\astroNN\astroNN\neuralode\", line 177, in dopri853core
    if tf.equal(hmax, 0.0):
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\", line 778, in __bool__
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\", line 545, in _disallow_bool_casting
    "using a `tf.Tensor` as a Python `bool`")
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\", line 532, in _disallow_when_autograph_enabled
    " decorating it directly with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.

It works fine for me on TF 2.1.0

Issue loading the Galaxy10 dataset

Thank you for this lovely library first and foremost.

I am trying to access the Galaxy10 DECals dataset (as opposed to the SDSS one) without using the h5 reader as I want to use it as a colab demo.

I have both run !pip install astroNN and tried cloning directly into the Colab, following your instructions on this commit: 9dcd394

Despite that, using load_galaxy10 still seems to be loading the SDSS dataset and not the DECals. Do you have any guidance?

I've looked at your code and I can't see why it's loading the old dataset.

Maybe the issue is in imports?

# Import statements
from astroNN.datasets import load_galaxy10
from tensorflow.keras import utils
# To load images and labels (will download automatically at the first time)
# labels corresponds to galaxy classes as specified by Galaxy Zoo
images, labels = load_galaxy10()

Thank you so much for your help!

Cannot reproduce results of Uncertainty_Demo_MNIST.ipynb

Hi, thanks for sharing this great implementation on GitHub! Nice work.

I ran your notebook Uncertainty_Demo_MNIST.ipynb.
However, I cannot get the same results as shown in the notebook output. The losses I get are all nan.

Could you suggest why?

The output I got from the second cell (Train the neural network on MNIST training set):

Number of Training Data: 54000, Number of Validation Data: 6000
====Message from Normalizer====
You selected mode: 255
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
Sorry but there is a known issue of the loss not handling loss correctly. I will fix it in May-- Henry 19 April 2018
Epoch 1/5
 - 163s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0980 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0991
Epoch 2/5
 - 159s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0987 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1047

Epoch 00002: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 3/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.1001 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0971

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 4/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0967 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1008

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 5/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0998 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1003

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
Completed Training, 794.97s in total


MCDropout incompatibility with latest versions of Tensorflow

Your MCDropout layers (or variants) work very well when designing a tensorflow.keras model, for example:

x = Dense(100, activation='relu', input_shape=(n_input,))(input_z)
x = MCDropout(0.2)(x)
x = Dense(100, activation='relu')(x)
x = MCDropout(0.2)(x)

Now it still works fine during training, but the problem is when I want to load a model previously saved in an h5 file via something like tf.keras.models.save_model(neural_model, 'mymodel.h5' ). For earlier versions of Tensorflow, the following worked:

model = tf.keras.models.load_model('mymodel.h5', custom_objects={'MCDropout': MCDropout})

However, this no longer works with newer versions of TensorFlow, and the following error is thrown:

obj = module_objects.get(object_name)
AttributeError: 'NoneType' object has no attribute 'get'

I think using MCDropout after training might work better, or the method may need to be redefined for the new TensorFlow version.

Weird errors raised by running the new accelerated BNN test() method

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Win10 v1706 x64, CentOS 7.4
  • astroNN (Build or Version):
  • Did you try the latest astroNN commit?: Yes
  • TensorFlow installed from (source or binary, official build?): official GPU build for Windows and CPU for CentOS
  • TensorFlow version: 1.7.0rc-1 for Windows, 1.7.0 for CentOS
  • Keras version: 2.1.5
  • Python version: 3.6
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): 9.0/7.0
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Running BNN test() multiple times in a row

Describe the problem

Running BNN test() multiple times in a row (around the 7th time?) raises a strange error complaining that a shape or dimension is not right. It can be reproduced on both CPU and GPU, on my Windows machine and on the astro department Linux server.

This bug was first discovered while running the open/globular cluster benchmark: I need to run the BNN test() method for every cluster, but was stopped by this bug.

Source code / logs

Variation 1:

ValueError                                Traceback (most recent call last)
<ipython-input-7-c36beede380f> in <module>()
     57         print(np.sum(np.isnan(spec)))
     58     print(name, ' and number of stars: ', indices.shape[0])
---> 59     pred, pred_var = bcnn.test(spec[1:])
     60     means = np.mean(pred, axis=0)
     61     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\ in test(self, input_data, inputs_err)
    210                                                                                           inputs_err[data_gen_shape:])
    211             remainder_result = np.asarray(new.predict_generator(remainder_generator, steps=1))
--> 212             result = np.concatenate((result, remainder_result))
    214         if result.ndim < 3:  # in case only 1 test data point, in such case we need to add a dimension

ValueError: all the input arrays must have same number of dimensions

Variation 2

ValueError                                Traceback (most recent call last)
<ipython-input-8-b4056e9283f7> in <module>()
     55         print(np.sum(np.isnan(spec)))
     56     print(name, ' and number of stars: ', indices.shape[0])
---> 57     pred, pred_var = bcnn.test(spec[1:])
     58     means = np.mean(pred, axis=0)
     59     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\ in test(self, input_data, inputs_err)
    219         predictions = result[:, :half_first_dim, 0]  # mean prediction
--> 220         mc_dropout_uncertainty = result[:, :half_first_dim, 1] * (self.labels_std ** 2)  # model uncertainty
    221         predictions_var = np.exp(result[:, half_first_dim:, 0]) * (self.labels_std ** 2)  # predictive uncertainty

ValueError: operands could not be broadcast together with shapes (1,5075) (25,) 


The cause is unknown but BNN test_old() method is unaffected

tensorflow 2.4.1

Hello, thank you for your work!
Does astroNN work with tensorflow 2.4.1?
Because whenever I import a module I get

cannot import name 'get_default_session' from 'tensorflow'

For example I am trying to do

from astroNN.models.apogee_models import ApogeeBCNN

thank you in advance, Lucia

Current .h5 dataset loading mechanism is problematic

Currently, this is viewed as a low-priority performance issue. It probably won't be fixed in the near future.

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
  • astroNN (Build or Version): commit 29fde34
  • TensorFlow installed from (source or binary, official build?): Irrelevant
  • TensorFlow version: Irrelevant
  • Keras version: Irrelevant
  • Python version: Irrelevant
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Irrelevant

Describe the problem

The current .h5 dataset loading mechanism is problematic because astroNN loads the whole dataset into memory regardless of its size. This will eventually become a serious problem when the dataset is large and memory is limited. It is already a minor problem when loading the APOGEE training data (~12GB) on my 16GB RAM laptop and desktop.

Source code / logs



The neural network/data generator should talk to H5Loader directly, instead of H5Loader loading the whole dataset into memory and handing it to the neural network/data generator.
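
The idea can be sketched as a generator that slices the data source one batch at a time. This is not astroNN's H5Loader. The names iter_batches and CountingSource are hypothetical, and CountingSource is a stand-in that only mimics an on-disk dataset supporting len() and slicing, as h5py datasets do.

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive slices of `dataset` so only one batch is read at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


class CountingSource:
    """Stand-in for an on-disk dataset that records the largest slice ever read."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.max_rows_read = 0

    def __len__(self):
        return self.n_rows

    def __getitem__(self, key):
        # `key` is a slice; clamp it to the number of rows like a real dataset would.
        rows = list(range(*key.indices(self.n_rows)))
        self.max_rows_read = max(self.max_rows_read, len(rows))
        return rows


source = CountingSource(10)
batches = list(iter_batches(source, batch_size=4))
print(batches)               # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(source.max_rows_read)  # 4
```

With this pattern the peak memory use is set by the batch size rather than by the size of the file, at the cost of more frequent disk reads.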

