skeeto / utf-7

UTF-7 encoder and decoder in ANSI C

License: The Unlicense

Makefile 2.17% C 97.83%

utf-7's Issues

Can ctx.bits be != 0 when the UTF7_F_OPEN flag is set ?

Is it possible for ctx->bits to be nonzero at the point where utf7_encode() sets the UTF7_F_OPEN flag?

/* Start encoding if not already */
if (!(ctx->flags & UTF7_F_OPEN)) 
{
  if (!ctx->len)
    return UTF7_FULL;
  ctx->flags &= ~UTF7_F_USED;
  ctx->flags |= UTF7_F_OPEN;        // is it possible here that ctx->bits != 0 ?
  *ctx->buf++ = 0x2b; /* '+' */
  ctx->len--;
}

Why isn't UTF7_F_USED cleared in utf7_close() ?

What is the purpose of leaving UTF7_F_USED set after UTF7_F_OPEN has been cleared?

utf7_close(UTF7CONTEXT *ctx, int next)
{
    if (ctx->flags & UTF7_F_OPEN) {
        /* An encoding is currently open */
        if (!ctx->len)
            return UTF7_FULL;

        /* Flush remaining bits, zero-padded to a full sextet */
        if (ctx->bits) {
            int a = (ctx->accum << (6 - ctx->bits)) & 0x3fUL;
            *ctx->buf++ = utf7_base64e(a);
            ctx->len--;
            ctx->bits = 0;
        }

        /* Close the encoding: an explicit '-' is only needed when the
         * next character is '-' or one of the Base64 characters */
        if (next == 0x2d || utf7_base64d(next) != -1) {
            if (!ctx->len)
                return UTF7_FULL;
            *ctx->buf++ = 0x2d; /* '-' */
            ctx->len--;
        }
        ctx->flags &= ~UTF7_F_OPEN; /* Why isn't UTF7_F_USED cleared here, too? */
    }
    return UTF7_OK;
}
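
For readers following the "Flush remaining bits" step, here is a minimal standalone sketch of that arithmetic. The helper name pad_final_sextet is hypothetical and not part of the library; it only isolates the shift-and-mask expression from the snippet above.

```c
#include <assert.h>

/* Hypothetical helper, not part of the library: take the low `bits`
 * leftover bits of `accum` and pad them on the right with zeros so
 * they form one final 6-bit Base64 value. The mask also discards any
 * higher bits of `accum` that were already emitted. */
static int
pad_final_sextet(unsigned long accum, int bits)
{
    assert(bits > 0 && bits < 6);
    return (int)((accum << (6 - bits)) & 0x3fUL);
}
```

With 2 leftover bits of value 0 the result is 0, which Base64 encodes as 'A'. With 2 leftover bits of value 3 the result is 0x30.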

Problems with calling UTF7_FLUSH after utf7_encode() returns UTF7_FULL

While encoding the following 3 Unicode code points:
U+0000 // An indirectly encodable code point
U+0000 // An indirectly encodable code point
U+002A // A directly encodable code point

...with ctx.len == 7

utf7_encode() writes the following string into the output buffer and returns UTF7_FULL because the buffer is full (as expected):
+AAAAAA
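
The length of this output can be checked by hand. The sketch below is only an illustration with hypothetical helper names. It assumes each code point here occupies one 16-bit UTF-16 code unit, which holds for U+0000, and that one Base64 character carries 6 bits.

```c
#include <stddef.h>

/* All names below are hypothetical, for illustration only. */

/* Number of complete Base64 characters produced by `units` UTF-16 code units. */
static size_t full_sextets(size_t units) { return units * 16 / 6; }

/* Bits left over that still need one zero-padded Base64 character. */
static int leftover_bits(size_t units) { return (int)(units * 16 % 6); }

/* Bytes for '+' plus the Base64 body, including the padded final
 * character if there is one, excluding any closing '-'. */
static size_t shifted_length(size_t units)
{
    return 1 + full_sextets(units) + (leftover_bits(units) ? 1 : 0);
}
```

For two code units this gives 5 full characters, 2 leftover bits, and a total of 7 bytes. That exactly fills a buffer with ctx.len == 7, so once the shift sequence has been flushed there is no room left for U+002A.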

In my particular application, a full buffer means that I must finalize encoding as soon as possible, so I consume the string "+AAAAAA" which is already in the output buffer and call:
ctx.buf = &newbuf[0];
ctx.len = 7;
utf7_encode(&ctx, UTF7_FLUSH);

The problem is that utf7_encode(&ctx, UTF7_FLUSH) returns UTF7_OK but writes nothing into the output buffer, so the last Unicode code point (U+002A) is never encoded and is lost.

Not only is one code point lost, but when I later append a second valid UTF-7 string (from another application) to the "+AAAAAA" output, the concatenated string cannot be decoded correctly if the appended string begins with one of the Base64 characters: [A-Z], [a-z], [0-9], 0x2b ('+') or 0x2f ('/').

The documentation says NOTHING about NOT calling utf7_encode(&ctx, UTF7_FLUSH) after a previous call returns UTF7_FULL.

How should I use this code when I want to finalize encoding as soon as the buffer is full, in such a way that I can later append an ARBITRARY valid UTF-7 string to the UTF-7 output without problems?
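
One possible workaround for the concatenation part of the question, independent of this library, is to post-process the output. In UTF-7 a shift sequence ends at '-' or at the first byte outside the Base64 alphabet, so a chunk that ends inside a shift sequence can be made safe to append to by adding an explicit '-'. The sketch below uses hypothetical helper names, assumes an ASCII execution character set, and does not recover the lost code point. Presumably the code point that triggered UTF7_FULL has to be passed to utf7_encode() again once buffer space is available, but that is an assumption to check against the library's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if c is in the Base64 alphabet (A-Z, a-z, 0-9, '+', '/'). */
static int is_base64(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Hypothetical helper: returns 1 if the n bytes at s end inside a
 * shift sequence that has not been terminated, 0 otherwise. */
static int utf7_ends_open(const char *s, size_t n)
{
    int in_shift = 0;
    size_t i;
    assert(s != NULL);
    for (i = 0; i < n; i++) {
        if (in_shift) {
            if (!is_base64((unsigned char)s[i]))
                in_shift = 0; /* '-' or any other non-Base64 byte ends it */
        } else if (s[i] == '+') {
            in_shift = 1;
        }
    }
    return in_shift;
}

/* Hypothetical helper: appends '-' when the chunk ends in an open
 * shift sequence. Returns the new length, or 0 if a '-' is needed
 * but the buffer of capacity cap has no room for it. */
static size_t utf7_make_appendable(char *s, size_t n, size_t cap)
{
    if (!utf7_ends_open(s, n))
        return n;
    if (n >= cap)
        return 0;
    s[n] = '-';
    return n + 1;
}
```

Applied to "+AAAAAA" this yields "+AAAAAA-", after which a following string that starts with a Base64 character is no longer absorbed into the shift sequence. A chunk such as "+AAAAAA*" is left unchanged because '*' already ends the sequence.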
