
Help reading output, about kichangkim/deepdanbooru

Comments (25)

da3dsoul commented on June 10, 2024 1

Would it be expected to have results (even if they aren't very accurate) after only 2 epochs on the entire danbooru2020 dataset? I just finished another epoch, and a picture which I think has some pretty clear features is giving no results. Even with a threshold of 0.1, it gives nothing. Here's my test image. I've tried several others, but it seems good

from deepdanbooru.

da3dsoul commented on June 10, 2024 1

First epoch yielded results! They weren't perfect results, but I wouldn't expect it. I have it queued for another 9 epochs. Thank you for all your help so far


KichangKim commented on June 10, 2024

You can safely ignore TensorFlow log lines that start with "I" (info). You can also disable these logs by setting the environment variable "TF_CPP_MIN_LOG_LEVEL" to "2".

So without the TensorFlow log, your output contains only "Tags of /media/da3dsoul/Golias/Media/Pictures/Public/92400803_p4.jpg:" and an empty line. That means DeepDanbooru found no tags with a score larger than 0.5. Try a lower score threshold, like --threshold 0.2.
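
A minimal sketch of both suggestions, for illustration only. The environment variable has to be set before TensorFlow is imported, and the threshold is just a cut-off on per-tag scores. The tag names and scores below are made up, and `filter_tags` is a hypothetical helper, not part of DeepDanbooru; the strict `>` comparison is an assumption taken from the wording "larger than".

```python
import os

# Must be set before `import tensorflow`; "2" hides INFO and WARNING messages.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"


def filter_tags(scores, threshold=0.5):
    """Return (tag, score) pairs whose score is above `threshold`, highest first."""
    kept = [(tag, score) for tag, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for one image (not real model output).
example_scores = {"1girl": 0.31, "solo": 0.24, "long_hair": 0.08}

print(filter_tags(example_scores, threshold=0.5))  # nothing passes the default cut-off
print(filter_tags(example_scores, threshold=0.2))  # two tags pass at 0.2
```

With an undertrained model, scores like these would explain an empty tag list at the default threshold even though the model is producing output.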


da3dsoul commented on June 10, 2024

Thanks much. That helps.


KichangKim commented on June 10, 2024

What values are printed in your training log? It contains the loss, precision, recall and F1 score.
Also, I haven't tested the danbooru2020 dataset. I used images downloaded directly from the danbooru server.


da3dsoul commented on June 10, 2024

There's days of logs. I'll post a snippet. I used DanbooruDownloader as linked. I didn't realize that wasn't danbooru2020.

Epoch[1] Loss=0.010009, P=0.662313, R=0.086312, F1=0.152721, Speed = 7.8 samples/s, 99.80 %, ETA = 2021-09-12 16:55:48
Epoch[1] Loss=0.009176, P=0.673432, R=0.094707, F1=0.166060, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009122, P=0.675325, R=0.087626, F1=0.155125, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.008856, P=0.656134, R=0.092095, F1=0.161519, Speed = 8.9 samples/s, 99.80 %, ETA = 2021-09-12 16:51:26
Epoch[1] Loss=0.009291, P=0.643657, R=0.090885, F1=0.159280, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009134, P=0.694757, R=0.095153, F1=0.167381, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009488, P=0.679105, R=0.090864, F1=0.160282, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.008713, P=0.657303, R=0.092710, F1=0.162500, Speed = 9.0 samples/s, 99.81 %, ETA = 2021-09-12 16:51:02
Epoch[1] Loss=0.009391, P=0.652574, R=0.088928, F1=0.156526, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009752, P=0.643527, R=0.087389, F1=0.153881, Speed = 9.0 samples/s, 99.81 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.009485, P=0.685981, R=0.090416, F1=0.159774, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.009798, P=0.614815, R=0.084136, F1=0.148016, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.009160, P=0.647280, R=0.089961, F1=0.157967, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009350, P=0.661080, R=0.091993, F1=0.161510, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:08
Epoch[1] Loss=0.009661, P=0.651119, R=0.080470, F1=0.143238, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009659, P=0.661080, R=0.085687, F1=0.151709, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:02
Epoch[1] Loss=0.010209, P=0.707721, R=0.091167, F1=0.161527, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.008954, P=0.692022, R=0.097414, F1=0.170788, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:08
Epoch[1] Loss=0.009082, P=0.695733, R=0.096575, F1=0.169607, Speed = 9.1 samples/s, 99.83 %, ETA = 2021-09-12 16:50:59
Epoch[1] Loss=0.010610, P=0.623616, R=0.077116, F1=0.137259, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:05
Saving checkpoint ... (2021-09-12 16:26:49.032479)
Epoch[1] Loss=0.009796, P=0.668519, R=0.082703, F1=0.147197, Speed = 8.0 samples/s, 99.83 %, ETA = 2021-09-12 16:54:12
Epoch[1] Loss=0.009423, P=0.675373, R=0.087145, F1=0.154371, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.009166, P=0.672192, R=0.095275, F1=0.166895, Speed = 9.1 samples/s, 99.84 %, ETA = 2021-09-12 16:50:58
Epoch[1] Loss=0.010024, P=0.691450, R=0.093420, F1=0.164602, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009352, P=0.655493, R=0.088198, F1=0.155477, Speed = 8.9 samples/s, 99.84 %, ETA = 2021-09-12 16:51:22
Epoch[1] Loss=0.009373, P=0.652416, R=0.087597, F1=0.154455, Speed = 8.7 samples/s, 99.84 %, ETA = 2021-09-12 16:51:52
Epoch[1] Loss=0.009812, P=0.656075, R=0.081156, F1=0.144444, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:13
Epoch[1] Loss=0.010217, P=0.689720, R=0.081927, F1=0.146458, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009738, P=0.656716, R=0.082824, F1=0.147096, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009829, P=0.649446, R=0.083929, F1=0.148649, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.008472, P=0.649718, R=0.089124, F1=0.156747, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:51:00
Epoch[1] Loss=0.009870, P=0.622468, R=0.081979, F1=0.144878, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009430, P=0.687616, R=0.093233, F1=0.164202, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.009801, P=0.685083, R=0.091176, F1=0.160934, Speed = 9.1 samples/s, 99.86 %, ETA = 2021-09-12 16:50:59
Epoch[1] Loss=0.009719, P=0.682657, R=0.087970, F1=0.155855, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.010146, P=0.667904, R=0.082759, F1=0.147269, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:07
Epoch[1] Loss=0.009947, P=0.702206, R=0.087735, F1=0.155982, Speed = 9.1 samples/s, 99.86 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.010229, P=0.647601, R=0.087075, F1=0.153510, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.009836, P=0.681985, R=0.088019, F1=0.155915, Speed = 9.0 samples/s, 99.87 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009276, P=0.678373, R=0.093814, F1=0.164833, Speed = 9.0 samples/s, 99.87 %, ETA = 2021-09-12 16:51:05
Saving checkpoint ... (2021-09-12 16:32:01.515339)
Epoch[1] Loss=0.008934, P=0.671587, R=0.093142, F1=0.163596, Speed = 8.1 samples/s, 99.87 %, ETA = 2021-09-12 16:53:16
Epoch[1] Loss=0.009416, P=0.688192, R=0.090184, F1=0.159470, Speed = 8.9 samples/s, 99.87 %, ETA = 2021-09-12 16:51:20
Epoch[1] Loss=0.009463, P=0.677122, R=0.090416, F1=0.159531, Speed = 8.7 samples/s, 99.87 %, ETA = 2021-09-12 16:51:54
Epoch[1] Loss=0.009459, P=0.701107, R=0.093550, F1=0.165074, Speed = 8.2 samples/s, 99.87 %, ETA = 2021-09-12 16:53:01
Epoch[1] Loss=0.009260, P=0.659259, R=0.090378, F1=0.158964, Speed = 8.5 samples/s, 99.88 %, ETA = 2021-09-12 16:52:15
Epoch[1] Loss=0.008939, P=0.687732, R=0.094847, F1=0.166704, Speed = 7.0 samples/s, 99.88 %, ETA = 2021-09-12 16:56:23
Epoch[1] Loss=0.009165, P=0.692737, R=0.090247, F1=0.159691, Speed = 7.8 samples/s, 99.88 %, ETA = 2021-09-12 16:54:00
Epoch[1] Loss=0.009825, P=0.625461, R=0.082422, F1=0.145650, Speed = 7.2 samples/s, 99.88 %, ETA = 2021-09-12 16:55:43
Epoch[1] Loss=0.009284, P=0.643911, R=0.086708, F1=0.152836, Speed = 8.7 samples/s, 99.88 %, ETA = 2021-09-12 16:52:01
Epoch[1] Loss=0.009580, P=0.662983, R=0.089330, F1=0.157446, Speed = 7.8 samples/s, 99.88 %, ETA = 2021-09-12 16:54:04
Epoch[1] Loss=0.009263, P=0.664815, R=0.085415, F1=0.151381, Speed = 8.5 samples/s, 99.89 %, ETA = 2021-09-12 16:52:28
Epoch[1] Loss=0.010090, P=0.659889, R=0.086714, F1=0.153285, Speed = 9.0 samples/s, 99.89 %, ETA = 2021-09-12 16:51:30
Epoch[1] Loss=0.009337, P=0.650558, R=0.091455, F1=0.160367, Speed = 8.0 samples/s, 99.89 %, ETA = 2021-09-12 16:53:31
Epoch[1] Loss=0.009531, P=0.656827, R=0.088712, F1=0.156312, Speed = 5.8 samples/s, 99.89 %, ETA = 2021-09-12 17:00:11
Epoch[1] Loss=0.009358, P=0.661765, R=0.087400, F1=0.154407, Speed = 8.7 samples/s, 99.89 %, ETA = 2021-09-12 16:52:12
Epoch[1] Loss=0.010049, P=0.666048, R=0.086298, F1=0.152798, Speed = 8.8 samples/s, 99.90 %, ETA = 2021-09-12 16:52:02
Epoch[1] Loss=0.009608, P=0.662338, R=0.090471, F1=0.159197, Speed = 9.1 samples/s, 99.90 %, ETA = 2021-09-12 16:51:30
Epoch[1] Loss=0.009822, P=0.631481, R=0.083456, F1=0.147428, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.008857, P=0.662313, R=0.086207, F1=0.152557, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:36
Epoch[1] Loss=0.009418, P=0.682657, R=0.090024, F1=0.159071, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:38
Saving checkpoint ... (2021-09-12 16:37:43.558713)
Epoch[1] Loss=0.009293, P=0.695167, R=0.095652, F1=0.168165, Speed = 7.6 samples/s, 99.90 %, ETA = 2021-09-12 16:54:20
Epoch[1] Loss=0.009694, P=0.664815, R=0.087284, F1=0.154309, Speed = 9.1 samples/s, 99.91 %, ETA = 2021-09-12 16:51:34
Epoch[1] Loss=0.009461, P=0.669131, R=0.090095, F1=0.158807, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:43
Epoch[1] Loss=0.009440, P=0.674766, R=0.087770, F1=0.155336, Speed = 9.1 samples/s, 99.91 %, ETA = 2021-09-12 16:51:33
Epoch[1] Loss=0.009403, P=0.689214, R=0.087899, F1=0.155914, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:43
Epoch[1] Loss=0.010117, P=0.666667, R=0.084034, F1=0.149254, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.009759, P=0.688192, R=0.091109, F1=0.160915, Speed = 9.0 samples/s, 99.92 %, ETA = 2021-09-12 16:51:42
Epoch[1] Loss=0.008953, P=0.666048, R=0.091349, F1=0.160662, Speed = 9.0 samples/s, 99.92 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.009864, P=0.674677, R=0.086946, F1=0.154041, Speed = 9.1 samples/s, 99.92 %, ETA = 2021-09-12 16:51:37
Epoch[1] Loss=0.009144, P=0.664815, R=0.091279, F1=0.160519, Speed = 7.7 samples/s, 99.92 %, ETA = 2021-09-12 16:53:41
Epoch[1] Loss=0.009868, P=0.647706, R=0.087032, F1=0.153445, Speed = 7.4 samples/s, 99.92 %, ETA = 2021-09-12 16:54:09
Epoch[1] Loss=0.009271, P=0.651291, R=0.089753, F1=0.157765, Speed = 9.1 samples/s, 99.92 %, ETA = 2021-09-12 16:51:42
Epoch[1] Loss=0.009352, P=0.698355, R=0.092561, F1=0.163457, Speed = 9.0 samples/s, 99.93 %, ETA = 2021-09-12 16:51:44
Epoch[1] Loss=0.009226, P=0.691450, R=0.092745, F1=0.163552, Speed = 7.7 samples/s, 99.93 %, ETA = 2021-09-12 16:53:36
Epoch[1] Loss=0.009283, P=0.637708, R=0.080758, F1=0.143362, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.010218, P=0.657459, R=0.081026, F1=0.144272, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.009675, P=0.606679, R=0.079024, F1=0.139833, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.009401, P=0.670956, R=0.089046, F1=0.157226, Speed = 9.0 samples/s, 99.93 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009079, P=0.667890, R=0.094349, F1=0.165342, Speed = 9.0 samples/s, 99.94 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.008731, P=0.688312, R=0.099785, F1=0.174301, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:45
Saving checkpoint ... (2021-09-12 16:43:04.464378)
Epoch[1] Loss=0.009721, P=0.677064, R=0.089001, F1=0.157323, Speed = 8.1 samples/s, 99.94 %, ETA = 2021-09-12 16:52:49
Epoch[1] Loss=0.008743, P=0.701299, R=0.101340, F1=0.177091, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009586, P=0.678832, R=0.091671, F1=0.161528, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009641, P=0.626151, R=0.079944, F1=0.141785, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009199, P=0.696133, R=0.095575, F1=0.168075, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009826, P=0.675277, R=0.089247, F1=0.157657, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009301, P=0.654982, R=0.083964, F1=0.148847, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009410, P=0.681481, R=0.092462, F1=0.162832, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009095, P=0.716912, R=0.096368, F1=0.169898, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.008949, P=0.699083, R=0.101141, F1=0.176716, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.008488, P=0.637383, R=0.090764, F1=0.158900, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009671, P=0.661765, R=0.086207, F1=0.152542, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.010721, P=0.645102, R=0.083393, F1=0.147694, Speed = 9.1 samples/s, 99.96 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009658, P=0.693309, R=0.087992, F1=0.156165, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009730, P=0.640221, R=0.079991, F1=0.142213, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009328, P=0.670330, R=0.095040, F1=0.166477, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.008679, P=0.646409, R=0.091335, F1=0.160055, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009221, P=0.648799, R=0.085194, F1=0.150611, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009234, P=0.657944, R=0.092050, F1=0.161505, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009340, P=0.665441, R=0.088617, F1=0.156405, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:49
Saving checkpoint ... (2021-09-12 16:48:15.916625)
Epoch[1] Loss=0.009778, P=0.633333, R=0.086846, F1=0.152747, Speed = 8.0 samples/s, 99.98 %, ETA = 2021-09-12 16:52:17
Epoch[1] Loss=0.011003, P=0.706100, R=0.093376, F1=0.164940, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009351, P=0.704797, R=0.095357, F1=0.167986, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008898, P=0.677064, R=0.096094, F1=0.168301, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009003, P=0.623400, R=0.086024, F1=0.151186, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008821, P=0.661142, R=0.089616, F1=0.157837, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009215, P=0.654917, R=0.087182, F1=0.153880, Speed = 9.1 samples/s, 99.99 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009103, P=0.690037, R=0.092028, F1=0.162397, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009562, P=0.634686, R=0.081285, F1=0.144114, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008967, P=0.682657, R=0.095140, F1=0.167005, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009289, P=0.677122, R=0.091498, F1=0.161212, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009526, P=0.695167, R=0.089990, F1=0.159352, Speed = 9.0 samples/s, 100.00 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009428, P=0.683150, R=0.090162, F1=0.159300, Speed = 9.1 samples/s, 100.00 %, ETA = 2021-09-12 16:51:51

I don't know what any of these mean, of course. Machine learning is still a learning process badum tss
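
For readers in the same position: P is precision (the share of predicted tags that are correct), R is recall (the share of true tags that were predicted), and F1 combines the two. The sketch below parses one progress line from the excerpt above and recomputes F1 as the harmonic mean of P and R. That DeepDanbooru computes F1 this way is an assumption, though the recomputed value agrees with the logged one; `parse_line` and `f1_score` are hypothetical helper names.

```python
import re

# Pattern for one progress line as it appears in the log excerpt above.
LINE_RE = re.compile(
    r"Epoch\[(?P<epoch>\d+)\] Loss=(?P<loss>[\d.]+), P=(?P<p>[\d.]+), "
    r"R=(?P<r>[\d.]+), F1=(?P<f1>[\d.]+)"
)


def parse_line(line):
    """Return a dict of metrics for a progress line, or None for other lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "loss": float(fields["loss"]),
        "precision": float(fields["p"]),
        "recall": float(fields["r"]),
        "f1": float(fields["f1"]),
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


line = ("Epoch[1] Loss=0.010009, P=0.662313, R=0.086312, F1=0.152721, "
        "Speed = 7.8 samples/s, 99.80 %, ETA = 2021-09-12 16:55:48")
metrics = parse_line(line)
recomputed = f1_score(metrics["precision"], metrics["recall"])
print(metrics["f1"], round(recomputed, 6))
```

Lines such as "Saving checkpoint ..." do not match the pattern and come back as None, so the same function can be run over a whole saved log.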


KichangKim commented on June 10, 2024

The R value seems too small. I think your training has failed for some reason (overfitting, a GPU calculation error, and so on).

I think your training log contains a point where the R value suddenly decreased. If you find that point, you should re-train from the nearest checkpoint. So I recommend backing up checkpoints periodically.


da3dsoul commented on June 10, 2024

I did have several system crashes from OOM before. That was probably the cause, huh. Is it recoverable somehow, or do I need to just start over? The checkpoint file is 3GB and I don't see any logs (been going for weeks).

EDIT: I bought more RAM, so that won't happen again


KichangKim commented on June 10, 2024

I recommend training from the start and carefully monitoring the R value in the log. Back up the checkpoints folder every day; if the R value suddenly decreases, cancel training, restore the checkpoints folder, and start again.
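
The routine described above could be sketched roughly as follows. This is not DeepDanbooru functionality: `find_sudden_drop` and `backup_checkpoints` are hypothetical helpers, the window size and ratio are arbitrary starting points, and the recall values are synthetic.

```python
import shutil
from datetime import datetime
from pathlib import Path


def find_sudden_drop(recalls, window=20, ratio=0.5):
    """Return the index of the first value below `ratio` times the mean of the
    preceding `window` values, or None if no such value exists."""
    for i in range(window, len(recalls)):
        baseline = sum(recalls[i - window:i]) / window
        if baseline > 0 and recalls[i] < ratio * baseline:
            return i
    return None


def backup_checkpoints(checkpoint_dir, backup_root):
    """Copy the checkpoint folder to a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"checkpoints-{stamp}"
    shutil.copytree(checkpoint_dir, target)
    return target


# Synthetic recall values: slowly rising, then collapsing at index 30.
history = [0.20 + 0.001 * i for i in range(30)] + [0.05, 0.05, 0.04]
print(find_sudden_drop(history))  # index of the first collapsed value
```

Comparing each value against a moving average rather than the previous value keeps normal batch-to-batch noise from triggering a false alarm.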


da3dsoul commented on June 10, 2024

ok. Are there logs? I only have console output, and redirecting ( $ program > output.log ) isn't working

EDIT: on restart, I've got values like so, are these good?

Epoch[0] Loss=0.739733, P=0.001940, R=0.501037, F1=0.003865, Speed = 4.6 samples/s, 0.00 %, ETA = 2021-09-23 05:53:49


KichangKim commented on June 10, 2024

Are there logs?

DD currently logs only to the console.

I've got values like so, are these good?

Yes, the starting values look fine.
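
The thread does not establish why the redirect failed. One common cause is that a program writes its messages to stderr, which a plain `>` does not capture (`program > output.log 2>&1` captures both streams in most shells). As a hedged alternative, the sketch below runs a command from Python, merges both streams, and writes them to a file while still echoing to the console. `run_and_log` is a hypothetical helper and the demo command is a stand-in for the real training command.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run `command`, echo its combined stdout/stderr, and append it to `log_path`."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            bufsize=1,  # line-buffered on our side of the pipe
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log_file.write(line)
            log_file.flush()  # keep the file current if the run is killed
        return process.wait()


# Stand-in command that writes to both streams; replace with the real training command.
demo = [sys.executable, "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
log_path = os.path.join(tempfile.mkdtemp(), "output.log")
exit_code = run_and_log(demo, log_path)
```

If the child is itself a Python program, its stdout may be block-buffered when piped, so lines can arrive late; running it with `python -u` avoids that.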


da3dsoul commented on June 10, 2024

Ok. I'll let it run. Thanks for all your help


da3dsoul commented on June 10, 2024

It didn't even last 12 hours. I'm not sure how long it lasted. I'll try running it CPU only and see if it at least maintains an R value. In the worst case, I can run it over network on a machine with an RTX3070, as running CPU only takes like 8x longer, and that's on the scale of months here.
Is the model portable, or does it rely on absolute paths? I can buy a new GPU for the server it's running on, but that'll take time with the current market.


KichangKim commented on June 10, 2024

The model is portable. It uses relative paths, and you can use different hardware for training and evaluating.


da3dsoul commented on June 10, 2024

Perfect, thanks.


da3dsoul commented on June 10, 2024

log.txt
You said a sudden drop is bad, but what about a gradual one?

I found this and have been reading. https://neptune.ai/blog/keras-metrics
Correct me if I'm wrong please.
R is recall. The value returned is something like this:

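from tensorflow.keras import backend as K

def recall(y_true, y_pred):
    # Count predictions that round to 1 where the label is also 1.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    # Count every label that is 1.
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))

    # K.epsilon() avoids division by zero when a batch has no positive labels.
    recall = true_positives / (all_positives + K.epsilon())
    return recall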


KichangKim commented on June 10, 2024

A gradual one is fine. Initially the network is initialized with random values, so the R value is quite high; it then decreases to some point and gradually increases again (along with the P value). Once the R value is increasing, it should not drop suddenly.
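
A toy illustration of why R can start high while P is tiny. This is a plain-Python sketch with made-up data, not how DeepDanbooru computes its metrics: a model that flags half of all tags at random catches about half of the true ones (high recall) while almost everything it flags is wrong (low precision), much like the Epoch[0] line earlier in the thread (P=0.001940, R=0.501037).

```python
def precision_recall(y_true, y_pred, threshold=0.5):
    """Precision and recall for one multi-label sample.

    y_true: list of 0/1 labels; y_pred: list of scores in [0, 1].
    """
    predicted = [score > threshold for score in y_pred]
    true_pos = sum(1 for t, p in zip(y_true, predicted) if t and p)
    pred_pos = sum(predicted)
    actual_pos = sum(y_true)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy sample: 1000 possible tags, only the first 10 actually apply.
labels = [1] * 10 + [0] * 990

# "Untrained" stand-in: every other tag scores above the threshold, regardless of label.
untrained = [0.9 if i % 2 == 0 else 0.1 for i in range(1000)]
print(precision_recall(labels, untrained))  # high recall, very low precision

# "Cautious" stand-in: only 2 of the 10 true tags are predicted, nothing else.
cautious = [0.9 if i < 2 else 0.1 for i in range(1000)]
print(precision_recall(labels, cautious))  # high precision, low recall
```

The second case resembles the Epoch[1] excerpt, where P is around 0.66 and R is below 0.1: the model rarely predicts a tag, but is often right when it does.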


da3dsoul commented on June 10, 2024

Ok, I'll keep an eye on it


da3dsoul commented on June 10, 2024

Epoch[0] Loss=0.273658, P=0.084761, R=0.094115, F1=0.089193, Speed = 47.9 samples/s, 4.02 %, ETA = 2021-09-15 11:14:22
Epoch[0] Loss=0.276624, P=0.067192, R=0.101965, F1=0.081004, Speed = 46.4 samples/s, 4.02 %, ETA = 2021-09-15 11:58:56
Epoch[0] Loss=0.273647, P=0.078138, R=0.089073, F1=0.083248, Speed = 46.9 samples/s, 4.02 %, ETA = 2021-09-15 11:41:45

Should've used the 3070 from the start. That's like 5x faster than the RX570....


da3dsoul commented on June 10, 2024

Random thing before I wait again after tweaking perf for the 3070. Is this okay? The None in Model instinctively makes me worry.

Using SGD optimizer ...
Loading tags ...
Creating model (resnet_custom_v4) ...
Model : (None, 299, 299, 3) -> (None, 14176)
Loading database ...


KichangKim commented on June 10, 2024

None is okay :) Don't worry.


da3dsoul commented on June 10, 2024

Ok thanks.


da3dsoul commented on June 10, 2024

I'm all of a sudden getting huge performance hits, and I have no idea why. CUDA (or any other GPU graph) is not even utilized, let alone bottlenecked. Do you have any ideas?


KichangKim commented on June 10, 2024

I've never seen anything like this, but it looks like a hardware problem (or throttling?). I recommend checking the GPU temperature and cooling fan status.


da3dsoul commented on June 10, 2024

It's 46C, so maybe? That was my first guess. 46 is cold for a GPU at load, but it has been running for weeks, so idk. I'll keep looking at it. Thanks.

EDIT: a full restart fixed it

