summerlvsong / aggregation-cross-entropy Goto Github PK

View Code? Open in Web Editor NEW

301.0 301.0 60.0 18.44 MB

Aggregation Cross-Entropy for Sequence Recognition. CVPR 2019.

Python 98.65% Shell 1.35%

aggregation-cross-entropy's People

Contributors

Stargazers

Watchers

Forkers

hciilab junyeoplee whang94 yuckfu lanfeng4659 vyraun wuxiaolianggit zengqi0730 mc261670164 ieee820 xiliu cqray1990 alwc billyzju jingwanli6666 meicsu199345 fireae kapitsa2811 shengzhang90 fendaq hell-to-heaven jeffrey98-ai rahzaazhar jadentan huanleo irene323 huanai tsui-xianjun jinglijj liuwenhaha xiaoyubing verazjy teresasun triplepool friendly-shaowei qjsqjs corleonechensiyu huismiling walter040422 dy1998 cjt222 xialuxi kiciro jeozhao dlml fengxingxiang kakafeicoffee demon19941113 donjet arufuss luffywong0123 lilu0726 harold-lkk ygexe liaoxy169 yusirhhh macmag33 buoyrina aniketgurav cheng1533

aggregation-cross-entropy's Issues

Why use ResLSTM for Offline Handwritten Chinese Text Recognition ?

Nice work!

I found this network architecture
(126,576)Input − 8C3 − MP2 − 32C3 − MP2 − 128C3 − MP2−5∗256C3−MP2−512C3−512C3−MP2−512C2− 3 ∗ 512ResLSTM − 7357F C − Output
in the paper for Handwritten Chinese Text Recognition.

Is it necessary for Chinese Text Recognition ?

关于论文公式（2）的理解问题

对general loss function为何可以通过第3节的公式（2）来估计？估计公式中的每一项概率不是远大于general loss function中的对应项吗？

can you share 288 cropped natural images for testing in CUTE80

Use ACE loss for English handwriting recognition，but not converge

I use the ACE loss function to do English handwriting recognition. When the model is trained, it does not converge, but with CTC, it gets a good convergence effect. How can this happen?

def vis(self,iteration) function, possible error

in https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L65, should it be pred_string_set = [pred_string[i:i+self.w] for i in xrange(0, len(pred_string), self.w)] instetad of self.w*2?

please verify.

Thanks

could you upload your trained model for scene text recognition?

how to get val label

result is n77*26 ,how to match word

KL Divergence

In the paper in https://arxiv.org/pdf/1904.08364.pdf sec 3.2 it is mentioned:
"We borrow the concept of cross-entropy from information theory, which is designed to measure the “distance” between two probability
distributions."

Wont' kl-divergence be a better way to measure the distance between both probability distributions ?

CUDA_VISIBLE_DEVICES=0 python -u main.py \ 2>&1 | tee $filename

-u代表什么，没有查到

Could you share your trained models?

Compared with CTC, the effect is very poor in Chinese long text

The result of training with fixed length (32 * 280) (the same number of characters (10)) is only good for short text.

HWDB results

Hi, have you experimented on HWDB 2.0-2.2, could you share your results for ACE? Thanks

Can you provide the data generator tool ?

Hi, thanks for the amazing works.

I was wondering that could you provide the tool that you guys have used to generate these toy dataset ?

Thanks you

It doesn't work on my data, why don't you provide a pretrained model?

I pretrained a model using ctcloss and it works well. Then I loaded the weights and continued to train with the aceloss. The losses seemed to be coming down, but the test results were terrible, almost all wrong.

Here is my implementation of ACELoss.

device = torch.device("cuda:" + cfg.TRAIN.GPU_ID if torch.cuda.is_available() else "cpu")
class ACELoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input_, target, target_lens):
        w, bs, num_class = input_.size()
        aggragetions = torch.zeros(bs, cfg.ARCH.NUM_CLASS)
        for i in range(bs):
            idx = 0
            for j in range(target_lens[i]):
                aggragetions[i][target[idx]] += 1
                idx += 1
            aggragetions[i][0] = w - target_lens[i]
        target = aggragetions.to(device)

        input_ = input_ + 1e-10
        input_ = torch.sum(input_, 0)
        input_ = input_ / w
        target = target / w

        loss = (-torch.sum(torch.log(input_) * target)) / bs
        return loss

是否可以用在中文OCR上面呢

Would MSE loss work if i remove the softmax?

loss decline but accuracy near to 0

I train a model(CRNN) base on dataset synth90k, through the loss decline step by step, the accuracy is near to 0 all the time. What casue this problem?

Prediction script

@summerlvsong @whang94 Thank you for your hard work,
Can you post a script that uses the trained model to predict.

Do ACE work? same point about HCTR Result

3C-FCRN+B_SLD+SLD（residual LSTM proposed）get ICDAR CR 97.15 AR 96.50

But ACE just CR 96.70 AR 96.22

so do ACE work in HCTR or not residual LSTM?

Can’t reproduce your results in your cvpr paper

I reproduced crnn+ctc and test it on IIIT5K+SVT+IC03+IC13 test database, got WER 0.153, which is same as the reported results in paper.
I also reproduced crnn+ace loss, but only got WER 0.205 on the same test database, any advise?
My environment:
pytorch 1.2.0
batchsize 60
trained only on 8-million synthetic data released by Jaderberg
iterations 1000k
adadelta rho 0.9

I would like to ask you how to accurately predict the character order of a word.

I recreated your project and found that the input GT was converted into a word list, which had lost its order, and your prediction only provided the number of characters. Only through the two-dimensional matrix position of the network output can barely judge the order, I would like to ask you how to accurately predict the character order of a word.

code about HCTR will be publicly or not?

Table 3. Comparison with previous methods for HCTR. ACE (1D) 91.68 91.25 96.70 96.22
code about this experiment will be be publicly or not?

combined with CTC?

Very nice work,
can this method combine with CTC in 1D case to improve performace further? does it conflict with CTC when training?

Need Help! Loss nan

for this line: torch.log(input)

The 'input' is the softmax score (0-1).
If k-th class does not show in an input, the accumulative softmax score of all time steps for k-th class is very likely to be 0. Then this will result into torch.log(input) = nan.

How do you make sure that 'input' does not equal to 0 for 'torch.log(input)'