
Comments (16)

KichangKim commented on June 10, 2024

@rachmadaniHaryono
I think they can't be compared directly because the dataset changed, but here are the last training logs:
v1:

Epoch[29] Loss=1416.928884, P=0.773589, R=0.502963, F1=0.609590, Speed = 47.5 samples/s, 60.00 %, ETA = 2019-12-31 15:28:03
Epoch[29] Loss=1343.514524, P=0.779304, R=0.518631, F1=0.622791, Speed = 47.5 samples/s, 60.00 %, ETA = 2019-12-31 15:14:55
Epoch[29] Loss=1406.559717, P=0.777394, R=0.508826, F1=0.615071, Speed = 47.2 samples/s, 60.00 %, ETA = 2019-12-31 16:41:41

v3:

Epoch[30] Loss=540.683345, P=0.788256, R=0.545070, F1=0.644485, Speed = 22.9 samples/s, 61.25 %, ETA = 2020-02-25 03:23:44
Epoch[30] Loss=536.273903, P=0.782580, R=0.550326, F1=0.646218, Speed = 23.1 samples/s, 61.25 %, ETA = 2020-02-25 00:30:51
Epoch[30] Loss=563.256741, P=0.784784, R=0.536157, F1=0.637072, Speed = 23.0 samples/s, 61.25 %, ETA = 2020-02-25 01:56:43

P = precision, R = recall, F1 = F1 score on the training dataset. DeepDanbooru doesn't have a validation set.
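For readers wondering how such multilabel numbers are computed: a minimal sketch of micro-averaged precision/recall/F1, my own illustration with an assumed 0.5 threshold, not DeepDanbooru's actual training code:

```python
import numpy as np

def micro_prf1(y_true, y_score, threshold=0.5):
    """Micro-averaged precision/recall/F1 over all tags and samples."""
    y_pred = (y_score >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 2 images, 4 tags.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 1, 1]])
y_score = np.array([[0.9, 0.2, 0.4, 0.1],
                    [0.3, 0.8, 0.7, 0.6]])
print(micro_prf1(y_true, y_score))  # (1.0, 0.8, 0.888...)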


KichangKim commented on June 10, 2024

Speaking of which, what does this parameter affect? If you increase it, will the accuracy increase?

That is exactly what I am testing internally now. The v3 model will use 512x512 resolution.

By the way, what is the difference between v1 and v2, and how does the experimental v3 differ from the others when choosing a model?

v1 is the first DeepDanbooru model, slightly deeper than the original ResNet-152 ImageNet model (https://github.com/microsoft/CNTK/blob/master/Examples/Image/Classification/ResNet/Python/resnet_models.py).
v2 is a deeper model than v1, but it is not fully trained/tested yet because TensorFlow throws a CUDA error during training.
v3 is slightly deeper than v1 and differs in its output channels. It was created for 512x512 resolution.

You can change the input size for any model version, but a large input size means you can't train on a consumer graphics card.


KichangKim commented on June 10, 2024

Long images are handled as just "small objects with large empty space" as long as they have clean backgrounds, because they are padded in "edge" mode (edge pixels are duplicated for padding). So it may not be a critical problem, I think.

Pre-filtering of tags (merging confusing tags into a single one, and so on) may be helpful, but it needs additional knowledge about the tags themselves and makes the system more complex.


KichangKim commented on June 10, 2024

In general, a large minibatch size (like the total number of samples) gives fast convergence but may get stuck in local minima, and it requires a huge amount of memory. In contrast, a small minibatch size needs many iterations to converge, but it is more robust and you can control the memory usage.

For the training data, I think you don't have to filter it as long as it is correctly labeled. Varied input data makes the model more stable.


libid0nes commented on June 10, 2024

Well, thanks for the answer. I ask because I read this information in this post: https://stats.stackexchange.com/a/153535

For the training data, I think you don't have to filter it as long as it is correctly labeled. Varied input data makes the model more stable.

Even if there is text in the picture? Won't this confuse the network? There are several versions of some images in the samples, one without text and another with text; in some cases the text lands on parts of the body or head, which can give false data about the geometry and features of a particular character.

I mean, the neural network might start thinking that these "hieroglyphs" or "English characters" are the feature of a particular tag, right?

Although the percentage of such images is not that high, it can still blur the accuracy of the neural network in certain cases, can't it?


libid0nes commented on June 10, 2024

By the way, how does the neural network react to images like this? They are not only multi-frame, but also have an unusual aspect ratio and resolution.

For example: https://chan.sankakucomplex.com/post/show/19176632


KichangKim commented on June 10, 2024

An unusual image ratio (too wide or too tall) may be a problem, because all input images are resized and padded to 299x299 while preserving their aspect ratio. So if the image is too long, its actual information ends up smaller.

I think hieroglyphs or English characters may not be a problem (as long as they are correctly tagged, of course), because those features are extracted by the network and estimated independently. Such "noisy" inputs even make the network more robust.
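A minimal sketch of the resize-and-pad step described above, using Pillow and numpy; my own illustration, so DeepDanbooru's actual preprocessing code may differ in details:

```python
import numpy as np
from PIL import Image

def resize_and_pad(path, size=299):
    """Resize to fit size x size preserving aspect ratio, then edge-pad."""
    image = Image.open(path).convert("RGB")
    w, h = image.size
    scale = size / max(w, h)                       # fit the long side
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    image = image.resize((new_w, new_h), Image.LANCZOS)
    array = np.asarray(image)
    pad_w, pad_h = size - new_w, size - new_h
    # "edge" mode duplicates the border pixels into the padded area.
    return np.pad(
        array,
        ((pad_h // 2, pad_h - pad_h // 2),
         (pad_w // 2, pad_w - pad_w // 2),
         (0, 0)),
        mode="edge",
    )
```

For a 10:1 tall image the content ends up in a strip roughly 30 pixels wide, which is the information loss described above.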


libid0nes commented on June 10, 2024

all input images are resized and padded to 299x299

Speaking of which, what does this parameter affect? If you increase it, will the accuracy increase? I can say with confidence that increasing the resolution will increase the memory and compute requirements, but I still wonder what effects decreasing or increasing this parameter can cause.

Even that "noisy" inputs makes the network more robust.

Well, I will try to train the network with minimal interference on my part; I will only remove monochrome and black-and-white images, and images with a suboptimal aspect ratio.

I still can't start training because of the data loading from sankakucomplex; their security system causes a lot of problems...

By the way, what is the difference between v1 and v2, and how does the experimental v3 differ from the others when choosing a model?


rachmadaniHaryono commented on June 10, 2024

v1 & v3 diff

here is tags diff

tags diff

+ak-12_(girls_frontline)
+akanbe
+anchovy_(girls_und_panzer)
+aoba_moca
+ar-15
+artoria_pendragon_(swimsuit_ruler)_(fate)
+asa_no_ha_(pattern)
+ashido_mina
+ass_shake
+assam_(girls_und_panzer)
+ballerina
+bandaid_on_arm
+blue_horns
+blunt_ends
+braided_bangs
+broken_chain
+broken_horn
+buruma_aside
+calligraphy_brush_(medium)
+carpaccio_(girls_und_panzer)
+character_print
+chi-hatan_military_uniform
+colorado_(kantai_collection)
+cooler
+copyright
+covered_face
+crescent_rose
+cropped_vest
+cutting_board
+darjeeling_(girls_und_panzer)
+dark_areolae
+drugs
+dust_cloud
+duster
+ear_protection
+eldridge_(azur_lane)
+elise_(fire_emblem)
+erwin_(girls_und_panzer)
+evening_gown
+fat_folds
+fur-trimmed_hood
+gift_bag
+golden_snub-nosed_monkey_(kemono_friends)
+grey_bra
+hair_strand
+half-skirt
+hanasakigawa_school_uniform
+hand_on_own_leg
+heart_in_eye
+heshikiri_hasebe
+hikawa_sayo
+horn_bow
+humboldt_penguin_(kemono_friends)
+incoming_kiss
+interspecies
+ishtar_(fate)_(all)
+k/da_(league_of_legends)
+kagami_mochi
+kamina_shades
+katagiri_sanae
+katarina_du_couteau
+katyusha_(girls_und_panzer)
+keizoku_school_uniform
+kizuna_akari
+kochou_shinobu
+kokkoro_(princess_connect!)
+kumada_masaru
+kyaru_(princess_connect)
+large_tail
+leather_boots
+light_nipples
+lysithea_von_ordelia
+magical_boy
+maruyama_aya
+mary_(pokemon)
+may_(guilty_gear)
+medical_eyepatch
+minase_akiko
+mod3_(girls_frontline)
+mummy
+nanachi_(made_in_abyss)
+national_shin_ooshima_school_uniform
+natsu_megumi
+nonna_(girls_und_panzer)
+opening_door
+orange_pekoe_(girls_und_panzer)
+oversized_shirt
+patterned_clothing
+pearl_bracelet
+pink_blouse
+pink_wings
+pointless_condom
+poke_ball_print
+print_bow
+raphtalia
+rating:explicit
+rating:questionable
+rating:safe
+rectangular_eyewear
+reines_el-melloi_archisorte
+reins
+reverse_upright_straddle
+ribbed_leotard
+ribbon-trimmed_dress
+rosehip_(girls_und_panzer)
+saitou_(pokemon)
+sangvis_ferri
+santa_bikini
+shiro_(dennou_shoujo_youtuber_shiro)
+sideless_outfit
+sidewalk
+single_strap
+skull_earrings
+snout
+sock_garters
+spaghetti
+st._louis_(azur_lane)
+steering_wheel
+sunflower_hair_ornament
+tam_o'shanter
+tentacles_under_clothes
+thighhighs_over_pantyhose
+todoroki_shouto
+toe-point
+tools
+tsurumaki_kokoro
+tsushima_(kantai_collection)
+two-tone_ribbon
+u_u
+udagawa_tomoe
+uehara_himari
+uzuki_sayaka
+white_headband
+white_serafuku
+white_suit
+winged_footwear
+yae_sakura
+yoshida_yuuko_(machikado_mazoku)
+yuri_sakazaki
+yuudachi_(azur_lane)
+yuuri_(pokemon)
-anchovy
-assam
-carpaccio
-darjeeling
-katyusha
-nonna
-orange_pekoe
-pokemon_trainer
-rosehip
-winged_shoes

Also, it takes longer to get the result: I got around 50-60 seconds per image for v1 and 95-130 seconds per image for v3.

My laptop spec:
OS: Ubuntu 19.04 x86_64
Host: X201EP 1.0
Kernel: 5.0.0-38-generic
Uptime: 28 mins
Packages: 3792 (dpkg), 3 (snap)
Shell: zsh 5.5.1
Resolution: 1366x768
WM: i3
Theme: Ambiance [GTK2/3]
Icons: ubuntu-mono-light [GTK2/3]
CPU: Intel Celeron 847 (2) @ 1.100GHz
GPU: Intel 2nd Generation Core Processor Family
Memory: 1883MiB / 3825MiB

@KichangKim how are precision & recall for v3 in comparison to v1?

https://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification


rachmadaniHaryono commented on June 10, 2024

Actual v1 to v3 diff

tags_diff_src_v1_comp_v3.txt

v3-compatible v1 tags

tags_comp_v3.txt

changelog

  • Girls und Panzer characters got a series suffix
  • winged_shoes renamed to winged_footwear
  • pokemon_trainer kept as-is because it was deleted with no replacement in the v3 tags

@KichangKim

  1. Is it better to skip wide & tall images?

An unusual image ratio (too wide or too tall) may be a problem,
because all input images are resized and padded to 299x299 while preserving their aspect ratio.
So if the image is too long, its actual information ends up smaller.

Based on the Danbooru wiki for long image:

An image that is either wide or tall:
that is, at least 1024px long on one side,
and whose long side is at least four times longer than its short side.

Maybe that can be used as the basis of a long-image specification (a sketch of this check follows at the end of this comment).

  2. Does there exist a parent tag which relies only on its children tags?

  3. Maybe skip text-related tags, or tags for information which is not contained in the image itself?

They are mostly miss rather than hit, especially for unknown languages:

  • text
    • background_text [4]: Text is written on the background, in English, Japanese or other language.
    • chinese_text
    • english_text
    • engrish_text
    • french_text
    • german_text
    • korean_text
    • romaji_text
    • russian_text
    • simplified_chinese_text
    • text_focus [4]: Indicates that text is a major part of the image.
    • text_only_page [4]: Indicates that the image contains only text.
    • thai_text
    • wall_of_text [4]: Images with a large block of text included, usually in the background.
  • name
    • artist_name
    • brand_name_imitation
    • character_name
    • circle_name
    • company_name
    • copyright_name
    • group_name
    • song_name
  • username
    • deviantart_username
    • twitter_username
    • weibo_username
    • patreon_username
  • namesake [1]
    • namesake: Name shared by at least two persons.
    • object_namesake: A type of pun when a character is paired with any kind of object that makes a reference to their civilian or alter ego's name.
  • connection
    • company_connection
    • creator_connection
    • season_connection
    • seiyuu_connection
    • trait_connection: A crossover, cosplay or parody image clearly depicting two or more characters who share the same or closely similar personality traits, or have similar circumstances occur to them in their respective stories
  • parody [4]
    • parody: Parody implies a character is mimicking a scene, a dialogue, or another series, with the intention of being humorous
    • style_parody
    • title_parody
    • card_parody
    • fine_art_parody
  • single or unknown
    • artist_logo (not found)
    • artist_self-insert [2]: When the artist or creator puts themselves in their story, game, movie, etc.
    • copyright (not found)
    • crossover [3]: A crossover is when two or more characters from unrelated copyrights are shown together in one scene.
    • multiple_crossover [3]: Crossovers within a franchise (for example all Final Fantasy villains together, or all Links meeting each other) don't qualify unless they also involve two or more other franchises.
    • fusion [4]: The merging of 2 or more characters into a single being
    • dated [4]: This tag should be used for when the date of creation is written somewhere on the image.
    • lyrics: Any post with song lyrics written in it.
    • number [4]: When a number is written somewhere in the image, consisting of any of the ten following digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
    • numbered [4]: When images that are part of a set are sequentially numbered (usually in date order) on the image.
    • page_number [4]: A page number typically used to navigate a book, comic, or magazine.
    • pokemon_number [4]: The number of a Pokémon species mentioned somewhere on the post.
    • out_of_character: Used when one or more characters are not acting like their usual selves.
    • product_placement: A situation wherein a well-known brand name or product is intentionally included in the image.
    • ranguage: What you get when a country tries to use any language that's not their native language, and makes mistakes in the process
    • sample: Promotional images for doujinshi, games or CG sets
    • timestamp [4]: Any picture that includes some set of numbers that represent time
    • web_address: A web address (e.g.: www.example.com or http://www.example.com/) is written somewhere.
    • borrowed_character: When an artist draws an original character which was originally created by a different artist
    • character_signature: For when an image appears to be signed by a character appearing in the image.

(possibly) valid:

  • name_tag: A name tag sewn onto a piece of clothing, such as on a gym uniform or school swimsuit.
  • color_connection: When characters are grouped together based upon their theme color, or one character is referencing or cosplaying as another who shares the same theme color.
  • calendar_(medium): Artwork from a calendar, frequently a scan.
  • rating: A classification that rates the suitability of content for any type of media
  • character_censor: When e.g. a character, a character's head or similar is used as a novelty censor.
  • character_print: Clothing or an item that has a specific character printed on it.
  • character_profile: Use this tag when an image contains information about a pictured character such as name, personality traits, likes and dislikes, etc
  • character_sheet: Multiple drawings of the same character in different poses (キャラクター設定 "character set" or キャラ表 "character table"), or the character and their accessories drawn separately (持ち物検査 "belongings inspection").

[1] namesake is only effective if the character is known, which means the model would have to include more characters than it currently does

[2] the artist is not included in the model, so no relation can be checked

[3] the series is not included in the model, so it is not effective

[4] debatable, as it may still be effective

[5] the model can't recognize Pokémon
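As referenced under question 1 above, a minimal sketch of the Danbooru-wiki long-image test, usable as a dataset filter (my own reading of the definition quoted earlier):

```python
def is_long_image(width, height):
    """Danbooru wiki "long image": at least 1024px on the long side and
    a long side at least four times the short side."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side >= 1024 and long_side >= 4 * short_side

assert is_long_image(300, 1300)      # tall strip
assert not is_long_image(1024, 768)  # normal ratio
```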


DonaldTsang commented on June 10, 2024

Is it better to skip wide & tall images?

Separate the image into smaller sub-images that have reasonable overlaps.
That way it can detect regions of the image without changing aspect ratios or downgrading resolution.

Does there exist a parent tag which relies only on its children tags?

In that case a hierarchical tagging system is in order... but if it is not hierarchical and is instead a Directed Acyclic Graph (DAG), then a knowledge-graph representation could be useful? I would like to find a solution that does this well.


rachmadaniHaryono commented on June 10, 2024

Separate the image into smaller sub-images that have reasonable overlaps.
That way it can detect regions of the image without changing aspect ratios or downgrading resolution.

I still can't imagine how to do that. If someone makes an implementation of it, please notify me.

Parent-children tagging system

I just thought of something about removing parent tags:

It is possible that even if a parent tag relies only on its children tag(s), it still has to be calculated, because at least one of the children tags may have a low image count and be filtered out.
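For illustration, a hedged sketch of deriving a parent tag only from its children at evaluation time; the mapping and the max rule are my assumptions, not an existing DeepDanbooru feature:

```python
# Hypothetical parent -> children mapping.
PARENT_TO_CHILDREN = {
    "text": ["chinese_text", "english_text", "korean_text"],
}

def add_parent_scores(scores):
    """scores: dict of tag -> confidence from the model."""
    for parent, children in PARENT_TO_CHILDREN.items():
        known = [scores[c] for c in children if c in scores]
        # If a child was filtered out of the model for low image
        # count, this estimate is incomplete -- the caveat above.
        if known:
            scores[parent] = max(known)
    return scores

print(add_parent_scores({"english_text": 0.8, "1girl": 0.9}))
# {'english_text': 0.8, '1girl': 0.9, 'text': 0.8}
```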


  • chinese_text
  • english_text
  • engrish_text
  • french_text
  • german_text
  • korean_text
  • romaji_text
  • russian_text
  • simplified_chinese_text
  • thai_text
  • ranguage?

Maybe instead of removing those tags, just merge them into a single tag, e.g. 'text'. This way the model can recognize text but doesn't have to guess which language it is.

But I doubt this will work with name and username.

Another idea is to just merge those whole tag groups (text, name, username) into a single tag, e.g. text.
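A minimal sketch of that merge as a label-preprocessing step (tag list from above; the mapping itself is the proposal, not existing behavior):

```python
LANGUAGE_TEXT_TAGS = {
    "chinese_text", "english_text", "engrish_text", "french_text",
    "german_text", "korean_text", "romaji_text", "russian_text",
    "simplified_chinese_text", "thai_text",
}

def merge_text_tags(tags):
    """Replace every language-specific text tag with plain 'text'."""
    merged = {("text" if t in LANGUAGE_TEXT_TAGS else t) for t in tags}
    return sorted(merged)

print(merge_text_tags(["1girl", "korean_text", "english_text"]))
# ['1girl', 'text']
```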


Long image

I checked my image library and found that a long image with a full body is still recognizable even when it is downsized. But if the model is only trained with that tag, there is a possibility that long images will be biased toward the full_body tag.

Edit:

Parent-children tags

AFAIK there is no program yet to parse Danbooru to get the data. I may (or may not) create a simple script to do that.
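A hedged starting point, assuming Danbooru's public tag-implication endpoint (/tag_implications.json with antecedent_name/consequent_name fields; paging and rate limits omitted):

```python
import requests

def fetch_implications(antecedent):
    """Fetch tags implied by `antecedent` (child -> parent relations)."""
    resp = requests.get(
        "https://danbooru.donmai.us/tag_implications.json",
        params={"search[antecedent_name]": antecedent, "limit": 100},
        timeout=30,
    )
    resp.raise_for_status()
    return [(e["antecedent_name"], e["consequent_name"])
            for e in resp.json() if e.get("status") == "active"]

print(fetch_implications("english_text"))  # child -> parent pairs, if any
```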

Long image statistics

@KichangKim can you give statistics on long images in the dataset, like actual width, height and tag count?


libid0nes commented on June 10, 2024

I still can't imagine how to do that. If someone makes an implementation of it, please notify me.

Can't you ask the author? As far as I know, he implemented this feature on his website: http://kanotype.iptime.org:8003/deepdanbooru


KichangKim commented on June 10, 2024

@Libidine
The web demo implements evaluation-time cropping, but it is not part of DeepDanbooru itself currently.

But you can easily implement it yourself using numpy subarrays. The main idea is to crop the input image into multiple small regions and evaluate them all, then take the max score. Some tags are affected by cropping (e.g. number-related tags, lower/upper tags, frame-related tags and so on), so you should ignore or handle those.

Of course, it needs more computation time depending on the number of subregions.
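A hedged sketch of that idea, assuming a Keras-style model.predict and an image already padded to at least the tile size per side; the helper and its names are mine, not DeepDanbooru's actual API:

```python
import numpy as np

def evaluate_tiled(model, image, tags, tile=299, overlap=0.5):
    """Crop an HxWx3 image into overlapping tiles (numpy subarrays),
    evaluate each tile, and keep the max score per tag."""
    h, w, _ = image.shape
    step = max(1, int(tile * (1.0 - overlap)))
    ys = sorted({*range(0, h - tile + 1, step), h - tile})
    xs = sorted({*range(0, w - tile + 1, step), w - tile})
    best = np.zeros(len(tags), dtype=np.float32)
    for y in ys:
        for x in xs:
            crop = image[y:y + tile, x:x + tile]              # numpy subarray
            scores = model.predict(crop[np.newaxis, ...])[0]  # per-tag scores
            best = np.maximum(best, scores)                   # max over tiles
    return dict(zip(tags, best))
```

Whole-frame tags (counts, page numbers, frame-related tags) would need the special handling mentioned above, e.g. taking them from the full padded image only.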


rachmadaniHaryono commented on June 10, 2024

Wait, I thought @DonaldTsang proposed a new method instead of the current one.

From my understanding, the image will be resized to the proposed size, e.g. 299x299 or 512x512, and the rest will be padded with "edge" mode (copied from the response above; I still don't quite understand the edge mode yet).

That is different from this part:

That way it can detect regions of the image without changing aspect ratios or downgrading resolution.
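For what it's worth, "edge" mode just repeats the outermost pixels outward; a tiny numpy demonstration of the term (my own):

```python
import numpy as np

row = np.array([[1, 2, 3]])
print(np.pad(row, ((0, 0), (2, 2)), mode="edge"))
# [[1 1 1 2 3 3 3]]  <- the border values 1 and 3 are duplicated
```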


fengyueyeah commented on June 10, 2024

@rachmadaniHaryono
I think they can't be compared directly because the dataset changed, but here are the last training logs:
v1:

Epoch[29] Loss=1416.928884, P=0.773589, R=0.502963, F1=0.609590, Speed = 47.5 samples/s, 60.00 %, ETA = 2019-12-31 15:28:03
Epoch[29] Loss=1343.514524, P=0.779304, R=0.518631, F1=0.622791, Speed = 47.5 samples/s, 60.00 %, ETA = 2019-12-31 15:14:55
Epoch[29] Loss=1406.559717, P=0.777394, R=0.508826, F1=0.615071, Speed = 47.2 samples/s, 60.00 %, ETA = 2019-12-31 16:41:41

v3:

Epoch[30] Loss=540.683345, P=0.788256, R=0.545070, F1=0.644485, Speed = 22.9 samples/s, 61.25 %, ETA = 2020-02-25 03:23:44
Epoch[30] Loss=536.273903, P=0.782580, R=0.550326, F1=0.646218, Speed = 23.1 samples/s, 61.25 %, ETA = 2020-02-25 00:30:51
Epoch[30] Loss=563.256741, P=0.784784, R=0.536157, F1=0.637072, Speed = 23.0 samples/s, 61.25 %, ETA = 2020-02-25 01:56:43

P = precision, R = recall, F1 = F1 score on the training dataset. DeepDanbooru doesn't have a validation set.

According to the logs, v3 is much better than v1. What are the hyper-parameter settings for v3, such as the learning rate (or scheduler) and batch size? I found the learning rate for v2 is 0.001 in the default project and is not changed. By the way, what do you think v3 benefits most from: the model architecture, input size, data filtering, or hyper-parameters?

