Comments (25)

da3dsoul avatar da3dsoul commented on June 10, 2024 1

Should I expect results (even inaccurate ones) after only 2 epochs on the entire danbooru2020 dataset? I just finished another epoch, and a picture that I think has some pretty clear features gives no results. Even with a threshold of 0.1, it gives nothing. Here's my test image. I've tried several others, but this one seems like a good test case.
[image: HonkaiBanner]

from deepdanbooru.

da3dsoul avatar da3dsoul commented on June 10, 2024 1

The first epoch yielded results! They weren't perfect, but I wouldn't expect them to be. I have it queued for another 9 epochs. Thank you for all your help so far.

KichangKim avatar KichangKim commented on June 10, 2024

You can safely ignore TensorFlow log lines that start with "I" (info). You can also disable these logs by setting the environment variable "TF_CPP_MIN_LOG_LEVEL" to "2" (this hides warnings as well).
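A minimal sketch of setting that variable from Python instead of the shell. The helper name `quiet_tensorflow_logs` is made up for illustration; the one firm requirement is that the variable is set before TensorFlow is imported for the first time.

```python
import os


def quiet_tensorflow_logs(level: str = "2") -> None:
    """Set TF_CPP_MIN_LOG_LEVEL; call this before the first `import tensorflow`."""
    # "0" shows everything, "1" hides INFO, "2" also hides WARNING,
    # "3" also hides ERROR.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level


quiet_tensorflow_logs("2")
# import tensorflow as tf  # import only after the variable has been set
```

Setting the variable in the shell before launching the program has the same effect and does not require touching any code.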

So without the TensorFlow log, your output contains only "Tags of /media/da3dsoul/Golias/Media/Pictures/Public/92400803_p4.jpg:" and an empty line. DeepDanbooru is reporting that no tags were estimated (none scored higher than 0.5). Try a lower score threshold, such as --threshold 0.2.
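To make the effect of the threshold concrete, here is a rough sketch of the filtering step. It is not DeepDanbooru's actual code: the function name, the tag names, and the scores are all hypothetical, and the strict `>` comparison is an assumption based on the "larger than 0.5" wording above.

```python
def tags_above_threshold(scores, threshold=0.5):
    """Return (tag, score) pairs whose score exceeds `threshold`, best first."""
    kept = [(tag, score) for tag, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for illustration only; not output from a real model.
example_scores = {"1girl": 0.41, "long_hair": 0.27, "sword": 0.08}

print(tags_above_threshold(example_scores, threshold=0.5))  # nothing passes
print(tags_above_threshold(example_scores, threshold=0.2))  # two tags pass
```

An under-trained model tends to give low scores to everything, so an empty result at 0.5 does not by itself mean the evaluation step is broken.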

da3dsoul avatar da3dsoul commented on June 10, 2024

Thanks much. That helps.

KichangKim avatar KichangKim commented on June 10, 2024

What values are printed in your training log? It contains loss, precision, recall, and F1 score.
Also, I haven't tested the danbooru2020 dataset. I used images downloaded directly from the Danbooru server.

da3dsoul avatar da3dsoul commented on June 10, 2024

There are days of logs. I'll post a snippet. I used DanbooruDownloader as linked. I didn't realize that wasn't danbooru2020.

Epoch[1] Loss=0.010009, P=0.662313, R=0.086312, F1=0.152721, Speed = 7.8 samples/s, 99.80 %, ETA = 2021-09-12 16:55:48
Epoch[1] Loss=0.009176, P=0.673432, R=0.094707, F1=0.166060, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009122, P=0.675325, R=0.087626, F1=0.155125, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.008856, P=0.656134, R=0.092095, F1=0.161519, Speed = 8.9 samples/s, 99.80 %, ETA = 2021-09-12 16:51:26
Epoch[1] Loss=0.009291, P=0.643657, R=0.090885, F1=0.159280, Speed = 9.0 samples/s, 99.80 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009134, P=0.694757, R=0.095153, F1=0.167381, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009488, P=0.679105, R=0.090864, F1=0.160282, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.008713, P=0.657303, R=0.092710, F1=0.162500, Speed = 9.0 samples/s, 99.81 %, ETA = 2021-09-12 16:51:02
Epoch[1] Loss=0.009391, P=0.652574, R=0.088928, F1=0.156526, Speed = 9.1 samples/s, 99.81 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009752, P=0.643527, R=0.087389, F1=0.153881, Speed = 9.0 samples/s, 99.81 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.009485, P=0.685981, R=0.090416, F1=0.159774, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.009798, P=0.614815, R=0.084136, F1=0.148016, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.009160, P=0.647280, R=0.089961, F1=0.157967, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009350, P=0.661080, R=0.091993, F1=0.161510, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:08
Epoch[1] Loss=0.009661, P=0.651119, R=0.080470, F1=0.143238, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009659, P=0.661080, R=0.085687, F1=0.151709, Speed = 9.0 samples/s, 99.82 %, ETA = 2021-09-12 16:51:02
Epoch[1] Loss=0.010209, P=0.707721, R=0.091167, F1=0.161527, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.008954, P=0.692022, R=0.097414, F1=0.170788, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:08
Epoch[1] Loss=0.009082, P=0.695733, R=0.096575, F1=0.169607, Speed = 9.1 samples/s, 99.83 %, ETA = 2021-09-12 16:50:59
Epoch[1] Loss=0.010610, P=0.623616, R=0.077116, F1=0.137259, Speed = 9.0 samples/s, 99.83 %, ETA = 2021-09-12 16:51:05
Saving checkpoint ... (2021-09-12 16:26:49.032479)
Epoch[1] Loss=0.009796, P=0.668519, R=0.082703, F1=0.147197, Speed = 8.0 samples/s, 99.83 %, ETA = 2021-09-12 16:54:12
Epoch[1] Loss=0.009423, P=0.675373, R=0.087145, F1=0.154371, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.009166, P=0.672192, R=0.095275, F1=0.166895, Speed = 9.1 samples/s, 99.84 %, ETA = 2021-09-12 16:50:58
Epoch[1] Loss=0.010024, P=0.691450, R=0.093420, F1=0.164602, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009352, P=0.655493, R=0.088198, F1=0.155477, Speed = 8.9 samples/s, 99.84 %, ETA = 2021-09-12 16:51:22
Epoch[1] Loss=0.009373, P=0.652416, R=0.087597, F1=0.154455, Speed = 8.7 samples/s, 99.84 %, ETA = 2021-09-12 16:51:52
Epoch[1] Loss=0.009812, P=0.656075, R=0.081156, F1=0.144444, Speed = 9.0 samples/s, 99.84 %, ETA = 2021-09-12 16:51:13
Epoch[1] Loss=0.010217, P=0.689720, R=0.081927, F1=0.146458, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:50:57
Epoch[1] Loss=0.009738, P=0.656716, R=0.082824, F1=0.147096, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:03
Epoch[1] Loss=0.009829, P=0.649446, R=0.083929, F1=0.148649, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:05
Epoch[1] Loss=0.008472, P=0.649718, R=0.089124, F1=0.156747, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:51:00
Epoch[1] Loss=0.009870, P=0.622468, R=0.081979, F1=0.144878, Speed = 9.0 samples/s, 99.85 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009430, P=0.687616, R=0.093233, F1=0.164202, Speed = 9.1 samples/s, 99.85 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.009801, P=0.685083, R=0.091176, F1=0.160934, Speed = 9.1 samples/s, 99.86 %, ETA = 2021-09-12 16:50:59
Epoch[1] Loss=0.009719, P=0.682657, R=0.087970, F1=0.155855, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.010146, P=0.667904, R=0.082759, F1=0.147269, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:07
Epoch[1] Loss=0.009947, P=0.702206, R=0.087735, F1=0.155982, Speed = 9.1 samples/s, 99.86 %, ETA = 2021-09-12 16:51:01
Epoch[1] Loss=0.010229, P=0.647601, R=0.087075, F1=0.153510, Speed = 9.0 samples/s, 99.86 %, ETA = 2021-09-12 16:51:06
Epoch[1] Loss=0.009836, P=0.681985, R=0.088019, F1=0.155915, Speed = 9.0 samples/s, 99.87 %, ETA = 2021-09-12 16:51:04
Epoch[1] Loss=0.009276, P=0.678373, R=0.093814, F1=0.164833, Speed = 9.0 samples/s, 99.87 %, ETA = 2021-09-12 16:51:05
Saving checkpoint ... (2021-09-12 16:32:01.515339)
Epoch[1] Loss=0.008934, P=0.671587, R=0.093142, F1=0.163596, Speed = 8.1 samples/s, 99.87 %, ETA = 2021-09-12 16:53:16
Epoch[1] Loss=0.009416, P=0.688192, R=0.090184, F1=0.159470, Speed = 8.9 samples/s, 99.87 %, ETA = 2021-09-12 16:51:20
Epoch[1] Loss=0.009463, P=0.677122, R=0.090416, F1=0.159531, Speed = 8.7 samples/s, 99.87 %, ETA = 2021-09-12 16:51:54
Epoch[1] Loss=0.009459, P=0.701107, R=0.093550, F1=0.165074, Speed = 8.2 samples/s, 99.87 %, ETA = 2021-09-12 16:53:01
Epoch[1] Loss=0.009260, P=0.659259, R=0.090378, F1=0.158964, Speed = 8.5 samples/s, 99.88 %, ETA = 2021-09-12 16:52:15
Epoch[1] Loss=0.008939, P=0.687732, R=0.094847, F1=0.166704, Speed = 7.0 samples/s, 99.88 %, ETA = 2021-09-12 16:56:23
Epoch[1] Loss=0.009165, P=0.692737, R=0.090247, F1=0.159691, Speed = 7.8 samples/s, 99.88 %, ETA = 2021-09-12 16:54:00
Epoch[1] Loss=0.009825, P=0.625461, R=0.082422, F1=0.145650, Speed = 7.2 samples/s, 99.88 %, ETA = 2021-09-12 16:55:43
Epoch[1] Loss=0.009284, P=0.643911, R=0.086708, F1=0.152836, Speed = 8.7 samples/s, 99.88 %, ETA = 2021-09-12 16:52:01
Epoch[1] Loss=0.009580, P=0.662983, R=0.089330, F1=0.157446, Speed = 7.8 samples/s, 99.88 %, ETA = 2021-09-12 16:54:04
Epoch[1] Loss=0.009263, P=0.664815, R=0.085415, F1=0.151381, Speed = 8.5 samples/s, 99.89 %, ETA = 2021-09-12 16:52:28
Epoch[1] Loss=0.010090, P=0.659889, R=0.086714, F1=0.153285, Speed = 9.0 samples/s, 99.89 %, ETA = 2021-09-12 16:51:30
Epoch[1] Loss=0.009337, P=0.650558, R=0.091455, F1=0.160367, Speed = 8.0 samples/s, 99.89 %, ETA = 2021-09-12 16:53:31
Epoch[1] Loss=0.009531, P=0.656827, R=0.088712, F1=0.156312, Speed = 5.8 samples/s, 99.89 %, ETA = 2021-09-12 17:00:11
Epoch[1] Loss=0.009358, P=0.661765, R=0.087400, F1=0.154407, Speed = 8.7 samples/s, 99.89 %, ETA = 2021-09-12 16:52:12
Epoch[1] Loss=0.010049, P=0.666048, R=0.086298, F1=0.152798, Speed = 8.8 samples/s, 99.90 %, ETA = 2021-09-12 16:52:02
Epoch[1] Loss=0.009608, P=0.662338, R=0.090471, F1=0.159197, Speed = 9.1 samples/s, 99.90 %, ETA = 2021-09-12 16:51:30
Epoch[1] Loss=0.009822, P=0.631481, R=0.083456, F1=0.147428, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.008857, P=0.662313, R=0.086207, F1=0.152557, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:36
Epoch[1] Loss=0.009418, P=0.682657, R=0.090024, F1=0.159071, Speed = 9.0 samples/s, 99.90 %, ETA = 2021-09-12 16:51:38
Saving checkpoint ... (2021-09-12 16:37:43.558713)
Epoch[1] Loss=0.009293, P=0.695167, R=0.095652, F1=0.168165, Speed = 7.6 samples/s, 99.90 %, ETA = 2021-09-12 16:54:20
Epoch[1] Loss=0.009694, P=0.664815, R=0.087284, F1=0.154309, Speed = 9.1 samples/s, 99.91 %, ETA = 2021-09-12 16:51:34
Epoch[1] Loss=0.009461, P=0.669131, R=0.090095, F1=0.158807, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:43
Epoch[1] Loss=0.009440, P=0.674766, R=0.087770, F1=0.155336, Speed = 9.1 samples/s, 99.91 %, ETA = 2021-09-12 16:51:33
Epoch[1] Loss=0.009403, P=0.689214, R=0.087899, F1=0.155914, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:43
Epoch[1] Loss=0.010117, P=0.666667, R=0.084034, F1=0.149254, Speed = 9.0 samples/s, 99.91 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.009759, P=0.688192, R=0.091109, F1=0.160915, Speed = 9.0 samples/s, 99.92 %, ETA = 2021-09-12 16:51:42
Epoch[1] Loss=0.008953, P=0.666048, R=0.091349, F1=0.160662, Speed = 9.0 samples/s, 99.92 %, ETA = 2021-09-12 16:51:40
Epoch[1] Loss=0.009864, P=0.674677, R=0.086946, F1=0.154041, Speed = 9.1 samples/s, 99.92 %, ETA = 2021-09-12 16:51:37
Epoch[1] Loss=0.009144, P=0.664815, R=0.091279, F1=0.160519, Speed = 7.7 samples/s, 99.92 %, ETA = 2021-09-12 16:53:41
Epoch[1] Loss=0.009868, P=0.647706, R=0.087032, F1=0.153445, Speed = 7.4 samples/s, 99.92 %, ETA = 2021-09-12 16:54:09
Epoch[1] Loss=0.009271, P=0.651291, R=0.089753, F1=0.157765, Speed = 9.1 samples/s, 99.92 %, ETA = 2021-09-12 16:51:42
Epoch[1] Loss=0.009352, P=0.698355, R=0.092561, F1=0.163457, Speed = 9.0 samples/s, 99.93 %, ETA = 2021-09-12 16:51:44
Epoch[1] Loss=0.009226, P=0.691450, R=0.092745, F1=0.163552, Speed = 7.7 samples/s, 99.93 %, ETA = 2021-09-12 16:53:36
Epoch[1] Loss=0.009283, P=0.637708, R=0.080758, F1=0.143362, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.010218, P=0.657459, R=0.081026, F1=0.144272, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.009675, P=0.606679, R=0.079024, F1=0.139833, Speed = 9.1 samples/s, 99.93 %, ETA = 2021-09-12 16:51:45
Epoch[1] Loss=0.009401, P=0.670956, R=0.089046, F1=0.157226, Speed = 9.0 samples/s, 99.93 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009079, P=0.667890, R=0.094349, F1=0.165342, Speed = 9.0 samples/s, 99.94 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.008731, P=0.688312, R=0.099785, F1=0.174301, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:45
Saving checkpoint ... (2021-09-12 16:43:04.464378)
Epoch[1] Loss=0.009721, P=0.677064, R=0.089001, F1=0.157323, Speed = 8.1 samples/s, 99.94 %, ETA = 2021-09-12 16:52:49
Epoch[1] Loss=0.008743, P=0.701299, R=0.101340, F1=0.177091, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009586, P=0.678832, R=0.091671, F1=0.161528, Speed = 9.1 samples/s, 99.94 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009641, P=0.626151, R=0.079944, F1=0.141785, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009199, P=0.696133, R=0.095575, F1=0.168075, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009826, P=0.675277, R=0.089247, F1=0.157657, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009301, P=0.654982, R=0.083964, F1=0.148847, Speed = 9.0 samples/s, 99.95 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009410, P=0.681481, R=0.092462, F1=0.162832, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:47
Epoch[1] Loss=0.009095, P=0.716912, R=0.096368, F1=0.169898, Speed = 9.1 samples/s, 99.95 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.008949, P=0.699083, R=0.101141, F1=0.176716, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.008488, P=0.637383, R=0.090764, F1=0.158900, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009671, P=0.661765, R=0.086207, F1=0.152542, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.010721, P=0.645102, R=0.083393, F1=0.147694, Speed = 9.1 samples/s, 99.96 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009658, P=0.693309, R=0.087992, F1=0.156165, Speed = 9.0 samples/s, 99.96 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.009730, P=0.640221, R=0.079991, F1=0.142213, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009328, P=0.670330, R=0.095040, F1=0.166477, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:49
Epoch[1] Loss=0.008679, P=0.646409, R=0.091335, F1=0.160055, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009221, P=0.648799, R=0.085194, F1=0.150611, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009234, P=0.657944, R=0.092050, F1=0.161505, Speed = 9.1 samples/s, 99.97 %, ETA = 2021-09-12 16:51:48
Epoch[1] Loss=0.009340, P=0.665441, R=0.088617, F1=0.156405, Speed = 9.0 samples/s, 99.97 %, ETA = 2021-09-12 16:51:49
Saving checkpoint ... (2021-09-12 16:48:15.916625)
Epoch[1] Loss=0.009778, P=0.633333, R=0.086846, F1=0.152747, Speed = 8.0 samples/s, 99.98 %, ETA = 2021-09-12 16:52:17
Epoch[1] Loss=0.011003, P=0.706100, R=0.093376, F1=0.164940, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009351, P=0.704797, R=0.095357, F1=0.167986, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008898, P=0.677064, R=0.096094, F1=0.168301, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009003, P=0.623400, R=0.086024, F1=0.151186, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008821, P=0.661142, R=0.089616, F1=0.157837, Speed = 9.0 samples/s, 99.98 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009215, P=0.654917, R=0.087182, F1=0.153880, Speed = 9.1 samples/s, 99.99 %, ETA = 2021-09-12 16:51:50
Epoch[1] Loss=0.009103, P=0.690037, R=0.092028, F1=0.162397, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009562, P=0.634686, R=0.081285, F1=0.144114, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.008967, P=0.682657, R=0.095140, F1=0.167005, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009289, P=0.677122, R=0.091498, F1=0.161212, Speed = 9.0 samples/s, 99.99 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009526, P=0.695167, R=0.089990, F1=0.159352, Speed = 9.0 samples/s, 100.00 %, ETA = 2021-09-12 16:51:51
Epoch[1] Loss=0.009428, P=0.683150, R=0.090162, F1=0.159300, Speed = 9.1 samples/s, 100.00 %, ETA = 2021-09-12 16:51:51

I don't know what any of these mean, of course. Machine learning is still a learning process badum tss
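For anyone else reading the log: P is precision (the share of predicted tags that were correct), R is recall (the share of true tags that were predicted), and F1 is their harmonic mean. A small sketch that recomputes F1 from one logged line follows. That the training code computes F1 exactly this way is an assumption, though the logged numbers are consistent with it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the first Epoch[1] line of the log above.
p, r = 0.662313, 0.086312
print(round(f1_score(p, r), 6))  # compare with the logged F1=0.152721
```

Because the harmonic mean is pulled toward the smaller of the two values, an R below 0.1 keeps F1 low even though P is around 0.66.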

KichangKim avatar KichangKim commented on June 10, 2024

The R value seems too small. I think your training failed because of some problem (overfitting, a GPU calculation error, and so on).

I think your training log contains a point where the R value suddenly decreased. If you find that point, you should retrain from the nearest checkpoint. So I recommend backing up checkpoints periodically.
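Finding that point by eye in days of output is tedious, so here is one possible way to scan saved console output for it. This is only a sketch: the function names are made up, the regular expression assumes the `R=...` field format shown in the log earlier in this thread, and the `window` and `ratio` defaults are arbitrary starting points rather than recommended values.

```python
import re

# Matches the "R=0.086312" field of a training log line.
RECALL_FIELD = re.compile(r"\bR=([0-9]*\.?[0-9]+)")


def recall_values(lines):
    """Extract the R value from every log line that has one."""
    values = []
    for line in lines:
        match = RECALL_FIELD.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def first_sudden_drop(values, window=50, ratio=0.5):
    """Return the first index where R falls below `ratio` times the mean of
    the previous `window` values, or None if that never happens."""
    for i in range(window, len(values)):
        baseline = sum(values[i - window:i]) / window
        if baseline > 0 and values[i] < ratio * baseline:
            return i
    return None


sample = [
    "Epoch[1] Loss=0.009176, P=0.673432, R=0.094707, F1=0.166060",
    "Saving checkpoint ... (2021-09-12 16:26:49.032479)",
    "Epoch[1] Loss=0.009122, P=0.675325, R=0.087626, F1=0.155125",
]
print(recall_values(sample))
```

Comparing against a rolling mean rather than the previous line is a deliberate choice here, since the per-batch values in the log are noisy.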

da3dsoul avatar da3dsoul commented on June 10, 2024

I did have several system crashes from OOM before. That was probably the cause, huh. Is it recoverable somehow, or do I need to just start over? The checkpoint file is 3GB and I don't see any logs (been going for weeks).

EDIT: I bought more RAM, so that won't happen again

KichangKim avatar KichangKim commented on June 10, 2024

I recommend training from the start and carefully monitoring the R value in the log. Back up the checkpoints folder every day; if the R value suddenly decreases, cancel training, restore the checkpoints folder, and start again.
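The daily backup could be scripted along these lines. This is a sketch under assumptions: `backup_checkpoints` is a made-up helper, the folder and file names in the demo are placeholders, and the real checkpoint location depends on your project layout.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_checkpoints(checkpoint_dir, backup_root):
    """Copy `checkpoint_dir` into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"checkpoints-{stamp}"
    shutil.copytree(checkpoint_dir, target)  # creates missing parent folders
    return target


# Demo on throwaway folders so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "checkpoints"
    src.mkdir()
    (src / "model.ckpt").write_text("placeholder")
    copied = backup_checkpoints(src, Path(tmp) / "backups")
    backup_ok = (copied / "model.ckpt").exists()
print(backup_ok)
```

Restoring is the reverse copy. With checkpoints of several gigabytes each, old backups will need pruning at some point.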

da3dsoul avatar da3dsoul commented on June 10, 2024

ok. Are there logs? I only have console output, and redirecting ( $ program > output.log ) isn't working

EDIT: on restart, I've got values like so, are these good?

Epoch[0] Loss=0.739733, P=0.001940, R=0.501037, F1=0.003865, Speed = 4.6 samples/s, 0.00 %, ETA = 2021-09-23 05:53:49

KichangKim avatar KichangKim commented on June 10, 2024

> Are there logs?

The current DD only logs to the console.

> I've got values like so, are these good?

Yes, the starting values look fine.
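Since the only log is the console, one workaround is to launch the program through a small wrapper that copies everything it prints into a file. The sketch below merges stderr into stdout. That the plain `>` redirect failed because the output goes to stderr is a guess, not something confirmed in this thread; if it is the cause, `program > output.log 2>&1` in the shell would also work. `run_and_log` is a made-up helper, and the demo runs a trivial Python command rather than the real training command.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run `command`, echoing stdout and stderr to the console and to `log_path`."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so nothing is missed
            text=True,
            bufsize=1,  # line-buffered on the reading side
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log_file.write(line)
            log_file.flush()  # keep the file current if the machine crashes
        return process.wait()


# Demo with a harmless command; substitute your own training command.
demo_log = os.path.join(tempfile.mkdtemp(), "demo.log")
exit_code = run_and_log([sys.executable, "-c", "print('hello')"], demo_log)
```

If lines arrive in large delayed bursts, the child process is probably buffering its own output; for Python programs, setting the `PYTHONUNBUFFERED=1` environment variable usually helps.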

da3dsoul avatar da3dsoul commented on June 10, 2024

Ok. I'll let it run. Thanks for all your help

da3dsoul avatar da3dsoul commented on June 10, 2024

It didn't even last 12 hours. I'm not sure how long it lasted. I'll try running it CPU only and see if it at least maintains an R value. In the worst case, I can run it over network on a machine with an RTX3070, as running CPU only takes like 8x longer, and that's on the scale of months here.
Is the model portable, or does it rely on absolute paths? I can buy a new GPU for the server it's running on, but that'll take time with the current market.

KichangKim avatar KichangKim commented on June 10, 2024

The model is portable. It uses relative paths, and you can use different hardware for training and evaluation.

da3dsoul avatar da3dsoul commented on June 10, 2024

Perfect, thanks.

da3dsoul avatar da3dsoul commented on June 10, 2024

log.txt
You said a sudden drop is bad, but what about a gradual one?

I found this and have been reading. https://neptune.ai/blog/keras-metrics
Correct me if I'm wrong please.
R is recall. The value returned is something like this:

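from tensorflow.keras import backend as K

def recall(y_true, y_pred):
    # Note: the linked snippet first overwrites y_true with K.ones_like(y_true),
    # which would count every label as positive; that line is dropped here.
    # Predictions that round to 1 where the label is also 1.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    # All positive labels.
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    # K.epsilon() avoids division by zero when there are no positive labels.
    return true_positives / (all_positives + K.epsilon())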

KichangKim avatar KichangKim commented on June 10, 2024

A gradual one is fine. Initially, the network is initialized with random values, so the R value is quite high. It then decreases to some point and gradually increases again (along with the P value). Once the R value is increasing, it should not drop suddenly.

da3dsoul avatar da3dsoul commented on June 10, 2024

Ok, I'll keep an eye on it

da3dsoul avatar da3dsoul commented on June 10, 2024
Epoch[0] Loss=0.273658, P=0.084761, R=0.094115, F1=0.089193, Speed = 47.9 samples/s, 4.02 %, ETA = 2021-09-15 11:14:22
Epoch[0] Loss=0.276624, P=0.067192, R=0.101965, F1=0.081004, Speed = 46.4 samples/s, 4.02 %, ETA = 2021-09-15 11:58:56
Epoch[0] Loss=0.273647, P=0.078138, R=0.089073, F1=0.083248, Speed = 46.9 samples/s, 4.02 %, ETA = 2021-09-15 11:41:45

Should've used the 3070 from the start. That's like 5x faster than the RX570....

da3dsoul avatar da3dsoul commented on June 10, 2024

Random thing before I wait again after tweaking perf for the 3070. Is this okay? The None in Model instinctively makes me worry.

Using SGD optimizer ...
Loading tags ...
Creating model (resnet_custom_v4) ...
Model : (None, 299, 299, 3) -> (None, 14176)
Loading database ...

KichangKim avatar KichangKim commented on June 10, 2024

None is okay :) Don't worry.
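For context, the `None` in the printed shape is the batch dimension, which Keras leaves unspecified so that the same model accepts any number of images at once. A toy illustration in plain Python (no TensorFlow needed; `shape_matches` is a made-up helper, not a Keras function):

```python
def shape_matches(model_shape, batch_shape):
    """True if `batch_shape` fits `model_shape`, treating None as 'any size'."""
    if len(model_shape) != len(batch_shape):
        return False
    return all(m is None or m == b for m, b in zip(model_shape, batch_shape))


input_shape = (None, 299, 299, 3)  # copied from the "Model :" line above

print(shape_matches(input_shape, (1, 299, 299, 3)))    # one image
print(shape_matches(input_shape, (32, 299, 299, 3)))   # a batch of 32
print(shape_matches(input_shape, (32, 224, 224, 3)))   # wrong image size
```

The second shape in that log line, `(None, 14176)`, reads the same way: one score per tag for however many images are in the batch.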

da3dsoul avatar da3dsoul commented on June 10, 2024

Ok thanks.

da3dsoul avatar da3dsoul commented on June 10, 2024

I'm all of a sudden getting huge performance hits, and I have no idea why. CUDA (or any other GPU graph) is not even utilized, let alone bottlenecked. Do you have any ideas?
[screenshot: CCzGv2OGTR]

KichangKim avatar KichangKim commented on June 10, 2024

I've never seen anything like this, but it looks like a hardware problem (or throttling?). I recommend checking the GPU temperature and the cooling fan status.
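On an NVIDIA card, those readings can be pulled from `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH; the helper names are made up for illustration. Values are kept as strings because some cards report `[N/A]` for fan speed.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,utilization.gpu,fan.speed",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(csv_line):
    """Turn one 'temperature, utilization, fan' CSV line into a dict."""
    temp, util, fan = [field.strip() for field in csv_line.split(",")]
    return {"temperature_c": temp, "utilization_pct": util, "fan_pct": fan}


def read_gpu_status():
    """Query the first GPU; needs an NVIDIA driver with nvidia-smi installed."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_status(output.splitlines()[0])


# Parsing a hypothetical output line (not a real reading from this thread).
print(parse_gpu_status("46, 3, 30"))
```

Calling `read_gpu_status()` every few seconds while training runs would show whether utilization collapses at the same moment the speed does.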

da3dsoul avatar da3dsoul commented on June 10, 2024

It's 46C, so maybe? That was my first guess. 46 is cold for a GPU at load, but it has been running for weeks, so idk. I'll keep looking at it. Thanks.

EDIT: a full restart fixed it
