
donut's People

Contributors

haowen-xu, korepwx


donut's Issues

Mean loss or sum loss over batches?

I am reproducing your model, and my loss (averaged over batches) differs from yours. I checked the code but can't figure out how your loss is calculated.

Training loss is negative

Dear author,
I'm trying to run Donut with the sample data "cpu4.csv", and the training losses over 256 epochs are negative, ranging from -68 to -75. I couldn't find the cause of this phenomenon; could you help me?
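One possible explanation (hedged; not confirmed by the author): the training loss is the negative ELBO, and for continuous data the reconstruction term is a log *density* rather than a log probability, so it can be positive and push the loss below zero. For a Gaussian likelihood:

\log \mathcal{N}(x \mid \mu, \sigma^2) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} > 0 \quad \text{whenever } \sigma < \tfrac{1}{\sqrt{2\pi}} \approx 0.40 \text{ and } x \approx \mu.

Summed over a 120-point window, per-point log densities of about 0.6 already give an ELBO near +70, i.e. a loss near -70, which matches the range reported above.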

A question about using Donut

Hello,
I'm using the VAE-based Donut algorithm for anomaly detection, but this model can only be trained on one curve at a time, and I have many curves to train on, so I would need to train many models. Can I write a loop that trains one model for each curve?
How can I load a pre-trained model and use it directly for prediction, without training again?
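A minimal sketch (my own, not the author's) of one way to do this, assuming the training and saving interfaces shown in the other snippets in this thread: give each curve its own graph and session so the models stay independent, and save each model's variables for later restoring. `curves` and `build_donut` are hypothetical placeholders.

import tensorflow as tf
from donut import DonutTrainer
from tfsnippet.utils import get_variables_as_dict, VariableSaver

# `curves` is a hypothetical dict: name -> prepared training arrays.
for name, (train_values, train_labels, train_missing, mean, std) in curves.items():
    graph = tf.Graph()
    with graph.as_default(), tf.Session(graph=graph).as_default():
        with tf.variable_scope('model') as model_vs:
            model = build_donut()  # hypothetical helper: the Donut(...) constructor shown elsewhere in this thread
        trainer = DonutTrainer(model=model, model_vs=model_vs)
        trainer.fit(train_values, train_labels, train_missing, mean, std)
        # Persist this curve's variables so prediction can skip retraining.
        saver = VariableSaver(get_variables_as_dict(model_vs), 'models/' + name)
        saver.save()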

Unit test Failed: Error while reading resource variable from Container

Hello, I'm trying to run the unit test in test_prediction.py. However, it doesn't pass.

My environment is Python 2.7 + TensorFlow 1.9.

Thank you very much; looking forward to your reply.

2018-07-20 08:44:39.242085: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
E.
======================================================================
ERROR: test_prediction (__main__.DonutPredictorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_prediction.py", line 37, in test_prediction
    res = pred.get_score(values=np.arange(5, dtype=np.float32))
  File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 145, in get_score
    feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
FailedPreconditionError: Error while reading resource variable donut/p_x_given_z/forward_1/std/dense_5/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/donut/p_x_given_z/forward_1/std/dense_5/kernel/N10tensorflow3VarE does not exist.
         [[Node: donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](donut/p_x_given_z/forward_1/std/dense_5/kernel)]]

Caused by op u'donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp', defined at:
  File "tests/test_prediction.py", line 88, in <module>
    tf.test.main()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/test.py", line 64, in main
    return _googletest.main(argv)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 100, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/benchmark.py", line 344, in benchmarks_main
    true_main()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
    return unittest_main(argv=argv)
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python2.7/unittest/runner.py", line 151, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "tests/test_prediction.py", line 37, in test_prediction
    res = pred.get_score(values=np.arange(5, dtype=np.float32))
  File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 144, in get_score
    b_r = sess.run(self._get_score_without_y(),
  File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 80, in _get_score_without_y
    last_point_only=self._last_point_only
  File "/usr/local/lib/python2.7/dist-packages/donut/model.py", line 198, in get_score
    p_net = self.vae.model(z=q_net['z'], x=x, n_z=n_z)  # notice: x=x
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/utils/reuse.py", line 179, in wrapper
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/auto_encoders/vae.py", line 314, in model
    x_params = self.h_for_p_x(z)
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/base.py", line 89, in __call__
    return self._forward(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/container/sequential.py", line 78, in _forward
    outputs = c(outputs)
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/base.py", line 89, in __call__
    return self._forward(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/container/branch.py", line 126, in _forward
    ret[k] = v(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/donut/model.py", line 68, in <lambda>
    )(x)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 703, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/core.py", line 910, in call
    [0]])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 2898, in tensordot
    b = ops.convert_to_tensor(b, name="b")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1011, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1107, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1031, in _dense_var_to_tensor
    return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 982, in _dense_var_to_tensor
    return self.value()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 659, in value
    return self._read_variable_op()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 742, in _read_variable_op
    self._dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 507, in read_variable_op
    "ReadVariableOp", resource=resource, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Error while reading resource variable donut/p_x_given_z/forward_1/std/dense_5/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/donut/p_x_given_z/forward_1/std/dense_5/kernel/N10tensorflow3VarE does not exist.
         [[Node: donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](donut/p_x_given_z/forward_1/std/dense_5/kernel)]]


----------------------------------------------------------------------
Ran 2 tests in 0.263s

FAILED (errors=1)

Eef

I don't know what this is.

How to choose thresholds?

Dear haowen-xu,
"All the algorithms evaluated in this paper compute one anomaly score for each point. A threshold can be chosen to do the decision: if the score for a point is greater than the threshold, an alert should be triggered."
"We may also enumerate all thresholds, obtaining all F-scores, and use the best F-score as the metric."
I have now obtained the test scores, so my question is: how do I choose the threshold?
Thanks!
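A minimal sketch (not the author's code) of the enumerate-all-thresholds evaluation quoted above: try every distinct severity value as the threshold and keep the best F-score. `severity` and `labels` are assumed to be aligned 1-D arrays, with severity = -test_score.

import numpy as np
from sklearn.metrics import f1_score

def best_threshold(severity, labels):
    # Enumerate every distinct severity value as a candidate threshold.
    best_f, best_t = 0.0, None
    for t in np.unique(severity):
        f = f1_score(labels, (severity >= t).astype(int))
        if f > best_f:
            best_f, best_t = f, t
    return best_t, best_f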

A question about the test score

get_score returns the reconstruction probability of each window; that is, if I have 10 data points and the window length is 5, then 6 test scores are returned. The question is how to assign these 6 test scores to the 10 data points.
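My understanding, hedged: each window's score belongs to the window's *last* point, so the first (window - 1) points receive no score at all. A toy sketch with the numbers from the question:

import numpy as np

values = np.arange(10)               # 10 data points (toy example)
window = 5
scored_points = values[window - 1:]  # 6 points, one per returned score;
                                     # points 1..4 (1-based) get no score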

Parallelizing multiple Donut models

Hello Haowen, I'm also using the Donut algorithm for anomaly detection. I have already trained several models, but for real-time data I need to dispatch each sample to the corresponding model according to its category, so I need to run multiple models in parallel. I load one model in each session and then use that session to run anomaly detection on the data, but the final predictor.get_score call raises an error:
[error screenshot]

My code is below; could you help me find where it goes wrong?

g1 = tf.Graph()
g2 = tf.Graph()
sess1 = tf.Session(graph=g1)
sess2 = tf.Session(graph=g2)

with sess1.as_default():
    with g1.as_default():
        with tf.variable_scope('model') as model_vs:
            model = Donut(
                    h_for_p_x=Sequential([
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                    ]),
                    h_for_q_z=Sequential([
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                    ]),
                    x_dims=120,
                    z_dims=5,
                )
        trainer = DonutTrainer(model=model, model_vs=model_vs)
        predictor = DonutPredictor(model)
        save_dir = parent_path_model + path_model
        saver = VariableSaver(get_variables_as_dict(model_vs), save_dir,
                              filename="cluster_" + "1" + "_data" + "_" + 'variables.dat',
                              latest_file='latest_' + "cluster_" + "1" + "_data")
        saver.restore()

with sess2.as_default():
    with g2.as_default():
        with tf.variable_scope('model') as model_vs:
            model = Donut(
                    h_for_p_x=Sequential([
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                    ]),
                    h_for_q_z=Sequential([
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                        K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                       activation=tf.nn.relu),
                    ]),
                    x_dims=120,
                    z_dims=5,
                )
        trainer = DonutTrainer(model=model, model_vs=model_vs)
        predictor = DonutPredictor(model)
        save_dir = parent_path_model + path_model
        saver = VariableSaver(get_variables_as_dict(model_vs), save_dir,
                              filename="cluster_" + str(2) + "_data" + "_" + 'variables.dat',
                              latest_file='latest_' + "cluster_" + str(2) + "_data")
        saver.restore()


with sess1.as_default():
    with sess1.graph.as_default():
        df = pd.read_csv("E:\\智能告警\\测试集-expand5min\\expand5min_obj_0data.csv")
        tmp_timestamps = [time.mktime(time.strptime(t, "%Y-%m-%d %H:%M:%S"))
                          for t in list(df['time'])]
        df['timestamp'] = tmp_timestamps
        df = df.sort_values(by='timestamp', axis=0, ascending=True)  # sort_values returns a copy
        values = [float(x) for x in list(df['data'])]
        timestamps = list(df['timestamp'])
        label = np.zeros_like(values, dtype=np.int32)
        # fill in missing points
        timestamps, missing, (interp_data, label) = linear_interpolation(
            timestamps, (values, label), mode=False)

        # standardize to mean 0, standard deviation 1
        std_data, mean, std = standardize_obj(interp_data)
        scores = predictor.get_score(std_data, missing)
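One likely cause (my reading of the code above, not confirmed by the author): `model` and `predictor` are plain Python names, so the second `with` block rebinds them to objects built in `g2`; the final `get_score` call then runs a `g2` predictor inside `sess1`. A minimal sketch that keeps one predictor per graph (`predictor1`/`predictor2` are hypothetical names):

with sess1.as_default():
    with g1.as_default():
        # ... build and restore model 1 exactly as above ...
        predictor1 = DonutPredictor(model)

with sess2.as_default():
    with g2.as_default():
        # ... build and restore model 2 exactly as above ...
        predictor2 = DonutPredictor(model)

with sess1.as_default():
    with sess1.graph.as_default():
        scores = predictor1.get_score(std_data, missing)  # predictor built in g1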

Sliding window

Hi, how can I set or change the sliding-window length? Thank you.
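As far as I can tell from the demo snippets quoted in this thread (hedged, not an authoritative answer), the sliding-window length is the x_dims argument passed to the Donut constructor. `h_for_p_x` and `h_for_q_z` stand for the Sequential networks from the demo:

import tensorflow as tf
from donut import Donut

with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=h_for_p_x,  # hypothetical: the networks from the demo
        h_for_q_z=h_for_q_z,
        x_dims=60,            # sliding-window length; the demo uses 120
        z_dims=5,
    )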

The dataset used in the paper that proposed Donut

In the paper "Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications",

§4 says: "We obtain 18 well-maintained business KPIs ... from a large Internet company."

What dataset is that?

I hope I can find out which dataset was used in this paper. Thank you very much.

Ellipsis

Hi,

I am trying to run your API, but the error "TypeError: 'ellipsis' object is not iterable" occurs at:
"# Read the raw data.
timestamp, values, labels = ..."

I searched and found that this error usually occurs on Python 3.5 and no longer on Python 3.5.3 or above. My Python is 3.5.5. Am I using the API the wrong way, or are there other hints?

Thanks in advance.
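For what it's worth (my reading, not the author's reply): the `...` in that snippet is a placeholder the reader is expected to replace with real data loading, not runnable code; Python evaluates a bare `...` as the `Ellipsis` object on any version, hence the error. A sketch using the sample data shown elsewhere in this thread:

import pandas as pd

# Read the raw data, replacing the `...` placeholder from the demo.
df = pd.read_csv('sample_data/cpu4.csv')
timestamp, values, labels = df['timestamp'], df['value'], df['label']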

Size of prediction doesn't match size of test_values

Hello, I am trying to interpret the severity of anomalies using the sample data cpu4.csv, following:

take the negative of the score, if you want something to directly indicate the severity of anomaly.

test_score.size = 5151
test_values.size = 5270

I noticed that the size of test_values doesn't equal that of test_score. How can I correlate my test data with the scores?
Thank you kindly!
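The 119-point gap matches x_dims - 1 for the demo's window length of 120: each score attaches to the last point of its window. A minimal sketch, assuming x_dims=120 and the test_values/test_score arrays from the question:

x_dims = 120
aligned_values = test_values[x_dims - 1:]      # drop the first 119 points: 5270 - 119 = 5151
assert aligned_values.size == test_score.size  # now one score per remaining point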

About the M-ELBO loss function

Why does the implementation of the M-ELBO loss function use just (1 - label) * model['x'].log_prob rather than a cross-entropy loss?
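For context, the modified ELBO from the Donut paper (my transcription; worth double-checking against §3 of the paper) weights the reconstruction term instead of adding a cross-entropy term:

\tilde{\mathcal{L}}(x) = \mathbb{E}_{q_\phi(z \mid x)}\left[ \sum_{w=1}^{W} \alpha_w \log p_\theta(x_w \mid z) + \beta \log p_\theta(z) - \log q_\phi(z \mid x) \right], \quad \alpha_w = 1 - \text{label}_w, \quad \beta = \frac{1}{W}\sum_{w=1}^{W} \alpha_w.

Under this objective the contribution of anomalous or missing points is simply excluded, which would explain multiplying the log probability by (1 - label) rather than using a cross-entropy loss.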

The length of get_score is less than the length of the test data

Hi, Haowen Xu!
I use DonutPredictor.get_score() to get the score of every point in the test data, only to find that the length of the scores is 119 less than the length of the test data, so I cannot tell which points are abnormal. How can I get the right scores? Or is there another way to find the abnormal points? Thanks a lot; here is my code:

from donut import DonutTrainer, DonutPredictor
from construct import model, model_vs
from prepare_data import train_values, train_labels, train_missing, mean, std, test_missing, test_values, values, missing, labels
import tensorflow as tf
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
trainer = DonutTrainer(model=model, model_vs=model_vs)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)
    print(test_score)

Memory leaks when creating a Donut model

Hi,

I tried to use Donut for an anomaly detection project. For some reasons I separate the model-restoring and prediction processes, and the problem happens while restoring the model. Every time I create a Donut model and a DonutTrainer to restore a new model from a saved file, a 'Graph' instance is left in memory with 1,400+ unknown back-references, even after I have cleared the Donut model and all other possible instances and run garbage collection afterwards. When I call restore multiple times, this makes memory keep increasing until the process is shut down.

I used objgraph.show_growth() to monitor memory, and got this after the restore call had completely finished:

[objgraph output screenshot]

These instances stay in memory until the process terminates. objgraph cannot output a detailed graph of references, since the number of references is too large. I checked the Donut code and didn't find any suspicious part. Are there any possible reasons for this problem? Thanks.
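A workaround sketch (untested, my own suggestion, not the author's fix): build each restored model inside a throwaway tf.Graph so nothing accumulates in the process-wide default graph, and let the graph and session fall out of scope together. `build_and_restore` is a hypothetical callable that reconstructs the Donut model and restores its variables inside the current graph.

import tensorflow as tf

def score_with_fresh_graph(build_and_restore, values, missing):
    graph = tf.Graph()
    with graph.as_default():
        with tf.Session(graph=graph) as sess:
            with sess.as_default():
                predictor = build_and_restore()
                return predictor.get_score(values, missing)
    # graph and sess go out of scope here instead of piling up
    # in the process-wide default graph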

Parameter setup for the sample dataset

Dear author,

I'm trying to run Donut on the datasets in the sample_data directory.
I found it quite tricky to get the training process to converge. Would you please share the best parameter setup for these datasets, or shed some light on the parameter tuning?

Best regards,
flyingkid

How to run prediction on its own

The predict part of the demo cannot run successfully on its own; it must follow the training step. When predicting alone, model_vs is missing. How can I load it?
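A minimal sketch (assuming the VariableSaver usage shown in another issue above; `save_dir` is the directory used when saving after training): rebuild the model under the same 'model' variable scope, then restore the saved variables instead of training.

import tensorflow as tf
from donut import Donut, DonutPredictor
from tensorflow import keras as K
from tfsnippet.modules import Sequential
from tfsnippet.utils import get_variables_as_dict, VariableSaver

# Rebuild the exact architecture used at training time.
with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        h_for_q_z=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,
        z_dims=5,
    )
predictor = DonutPredictor(model)

with tf.Session().as_default():
    # Restore the trained variables instead of calling trainer.fit.
    saver = VariableSaver(get_variables_as_dict(model_vs), save_dir)
    saver.restore()
    test_score = predictor.get_score(test_values, test_missing)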

Fetch Tensor of Samples before MonteCarlo Integration

When calling the reconstruct function in vae.py it only returns a single (averaged) value, the reconstruction of the input. Is it possible to access the reconstructions before the averaging is done? I want to have the reconstruction for each sampled latent variable separately.

Looking at the reconstruct function:

def reconstruct(self, x, n_z=None, n_x=None):
    """
    Sample reconstructed `x` from :math:`p(x|h(z))`, where `z` is (are)
    sampled from :math:`q(z|h(x))` using the specified observation `x`.

    Args:
        x: The observation `x` for :math:`q(z|h(x))`.
        n_z: Number of intermediate `z` samples to take for each input `x`.
        n_x: Number of reconstructed `x` samples to take for each `z`.

    Returns:
        StochasticTensor: The reconstructed samples `x`.
    """
    with tf.name_scope('VAE.reconstruct'):
        q_net = self.variational(x, n_z=n_z)
        model = self.model(z=q_net['z'], n_z=n_z, n_x=n_x)
        return model['x']

Could this be done from here?
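One possible route (a sketch of my own, following the body of reconstruct above; I have not verified it against tfsnippet's internals): run the two steps yourself and keep the StochasticTensor before any downstream averaging. `vae`, `x`, and `n_z` are assumed from the surrounding context.

with tf.name_scope('reconstruct_samples'):
    q_net = vae.variational(x, n_z=n_z)       # sample n_z latent z's
    model = vae.model(z=q_net['z'], n_z=n_z)  # decode every sampled z
    x_per_z = model['x']                      # leading axis should index the
                                              # n_z samples: one reconstruction
                                              # per sampled latent variable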

More dimensions?

Hello,

Would it be complicated to add the possibility of using Donut's abilities in a higher-dimensional space?
In theory I do not see a strong reason why not, but I wondered whether you think it is feasible, and if so, whether you have an idea how.

Want to run cpu4.csv from the Donut sample dataset

Hi Haowen Xu,
I am trying to run the Donut sample dataset cpu4.csv.
I have done the following to invoke cpu4.csv:

import pandas as pd
import tensorflow as tf
from donut import Donut
from tensorflow import keras as K
from tfsnippet.modules import Sequential

df = pd.read_csv("sample_data/cpu4.csv")
timestamp, values, labels = df.timestamp, df.value, df.label

# We build the entire model within the scope of model_vs;
# it should hold exactly all the variables of the model, including
# the variables created by Keras layers.
with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        h_for_q_z=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,
        z_dims=5,
    )

'''
Training of the Donut model
'''
from donut import DonutTrainer, DonutPredictor

trainer = DonutTrainer(model=model, model_vs=model_vs)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)

I am not able to understand how to set input_x and input_y
I am getting this error message:
FailedPreconditionError: Error while reading resource variable model/sequential_1/forward/_1/dense_3/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/sequential_1/forward/_1/dense_3/bias)
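On preparing the inputs, a sketch along the lines of the project README (hedged; check the README for the exact calls) for turning the raw series into the train/test arrays the trainer expects; the FailedPreconditionError typically follows when fit never runs (e.g. because these arrays are undefined), so the variables are never initialized before get_score.

import numpy as np
from donut import complete_timestamp, standardize_kpi

# Fill gaps in the series and obtain the missing-point indicator.
timestamp, missing, (values, labels) = complete_timestamp(timestamp, (values, labels))

# Hold out the last 30% of the series for testing.
test_portion = 0.3
test_n = int(len(values) * test_portion)
train_values, test_values = values[:-test_n], values[-test_n:]
train_labels, test_labels = labels[:-test_n], labels[-test_n:]
train_missing, test_missing = missing[:-test_n], missing[-test_n:]

# Standardize using statistics from the normal training points only.
train_values, mean, std = standardize_kpi(
    train_values, excludes=np.logical_or(train_labels, train_missing))
test_values, _, _ = standardize_kpi(test_values, mean=mean, std=std)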

Interpreting DonutPredictor.get_score values that are all negative

Hi Haowen,

My data set has continuous points where each point represents a day, not the minutes shown in the paper / sample_data.

I provided a list of 240 points with a window size of 120 for final evaluation, after training on 1000 points. After calling DonutPredictor.get_score on this set of points, I get a final list of 121 scores, all of which are negative. How do I interpret the anomaly part here?

You mentioned in the code:
"The larger the reconstruction probability, the less likely a point is an anomaly. You may take the negative of the score, if you want something to directly indicate the severity of anomaly."

Assume there are only 2 scores: -2.3 and -0.5.

  1. If I keep them negative, -0.5 is the largest and -2.3 is the smallest, so -2.3 is an anomaly whereas -0.5 is not.
  2. If I take the absolute values, 2.3 is the largest and 0.5 is the smallest, so 0.5 is an anomaly and 2.3 is not.

So please help me interpret the results.
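A tiny worked example of the note quoted above (my reading, matching interpretation 1; not an authoritative answer): negate the scores rather than taking absolute values, so the most negative score marks the most severe anomaly.

import numpy as np

scores = np.array([-2.3, -0.5])  # log reconstruction probabilities
severity = -scores               # "take the negative of the score"
# severity == [2.3, 0.5]: the first point is the more severe anomaly,
# i.e. the smallest (most negative) score is the likeliest anomaly.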

F-score for prediction is too bad

Hi, Haowen Xu,
I am running Donut on g.csv from sample_data with the default parameter values, but I get an F-score of 0.04. The result is too bad, and I can't understand what's wrong. Can you help me?

Understanding What Test Scores Mean

Hi, thank you so much for providing this implementation for your paper.

Could you please explain in layman's terms what exactly the test_scores mean with respect to the original time-series input?

Integrating Donut with the ROCKA algorithm

Hello,
I'm using the VAE-based Donut algorithm for anomaly detection. I'd like to ask: is the ROCKA clustering algorithm now integrated into Donut?
