skeeto / utf-7
UTF-7 encoder and decoder in ANSI C
License: The Unlicense
In utf7_encode(), is it possible for ctx->bits to be nonzero at the point where the UTF7_F_OPEN flag is set?
/* Start encoding if not already */
if (!(ctx->flags & UTF7_F_OPEN))
{
    if (!ctx->len)
        return UTF7_FULL;
    ctx->flags &= ~UTF7_F_USED;
    ctx->flags |= UTF7_F_OPEN; /* is it possible here that ctx->bits != 0 ? */
    *ctx->buf++ = 0x2b; /* '+' */
    ctx->len--;
}
What is the purpose of leaving UTF7_F_USED set after UTF7_F_OPEN has been cleared?
utf7_close(UTF7CONTEXT *ctx, int next)
{
    if (ctx->flags & UTF7_F_OPEN) {
        /* An encoding is currently open */
        if (!ctx->len)
            return UTF7_FULL;
        /* Flush remaining bits */
        if (ctx->bits) {
            int a = (ctx->accum << (6 - ctx->bits)) & 0x3fUL;
            *ctx->buf++ = utf7_base64e(a);
            ctx->len--;
            ctx->bits = 0;
        }
        /* Close the encoding */
        if (next == 0x2d || utf7_base64d(next) != -1) /* if '-' or one of the Base64 chars */
        {
            if (!ctx->len)
                return UTF7_FULL;
            *ctx->buf++ = 0x2d; /* '-' */
            ctx->len--;
        }
        ctx->flags &= ~UTF7_F_OPEN; /* Why isn't UTF7_F_USED cleared here, too? */
    }
    return UTF7_OK;
}
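For reference, the flush step in utf7_close() left-aligns the leftover bits in one final 6-bit group and pads the low end with zeros. Below is a minimal standalone sketch of that arithmetic. The name pad_sextet is hypothetical and not part of the library; the expression is copied from the excerpt above.

```c
#include <assert.h>

/* Hypothetical helper (not library API): left-align the `bits` pending
 * low-order bits of `accum` in a 6-bit group, zero-padding on the right.
 * Same expression as the flush step in the utf7_close() excerpt. */
static int pad_sextet(unsigned long accum, int bits)
{
    assert(bits > 0 && bits < 6);
    return (int)((accum << (6 - bits)) & 0x3fUL);
}
```

With 2 pending zero bits the result is 0, which Base64 maps to 'A'.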
While encoding the following 3 Unicode codepoints:
U+0000 // An indirectly encodable codepoint
U+0000 // An indirectly encodable codepoint
U+002A // A directly encodable codepoint
...with ctx.len == 7
utf7_encode() writes the following string into the output buffer and returns UTF7_FULL because the buffer is full (as expected):
+AAAAAA
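The seven bytes can be accounted for with simple bit arithmetic, assuming each of these code points is packed as one 16-bit UTF-16 unit into 6-bit Base64 groups, as UTF-7 specifies. The sketch below is standalone; struct tally and tally_units are hypothetical names, not library API.

```c
#include <stddef.h>

/* Hypothetical bookkeeping type (not library API). */
struct tally {
    size_t full_groups;  /* complete 6-bit Base64 groups */
    int pending_bits;    /* bits left over after those groups (0, 2, or 4) */
};

/* Count the Base64 groups produced by `units` 16-bit UTF-16 code units. */
static struct tally tally_units(size_t units)
{
    struct tally t;
    size_t total = units * 16;          /* total payload bits */
    t.full_groups = total / 6;
    t.pending_bits = (int)(total % 6);
    return t;
}
```

For two units this gives five full groups and 2 pending bits. The '+' plus five 'A' characters uses six bytes, so the seventh 'A' is consistent with the pending bits being zero-padded and flushed as in utf7_close(). That would leave no room in the buffer for the directly encoded U+002A.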
In my particular application, a full buffer means that I must finalize encoding as soon as possible, so I consume the string "+AAAAAA" which is already in the output buffer and call:
ctx.buf = &newbuf[0];
ctx.len = 7;
utf7_encode(&ctx, UTF7_FLUSH);
The problem is that utf7_encode(&ctx, UTF7_FLUSH) returns UTF7_OK but writes nothing into the output buffer, so the last Unicode codepoint (U+002A) is never encoded and is lost.
Not only is one codepoint lost, but when I later append a second valid UTF-7 string (from another application) to the "+AAAAAA" output, the concatenated string cannot be decoded correctly if the appended string begins with one of the Base64 characters: [A-Z], [a-z], [0-9], 0x2b or 0x2f.
The documentation says nothing about not calling utf7_encode(&ctx, UTF7_FLUSH) after a previous call has returned UTF7_FULL.
How should I use this code when I want to finalize encoding as soon as the buffer is full, in such a way that I can later append an ARBITRARY valid UTF-7 string to the UTF-7 output without problems?
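The join problem follows from the rule visible in utf7_close() above: an open Base64 run needs an explicit '-' when the next byte is '-' or a Base64 character. A minimal sketch of that check, applied to the first byte of the string being appended. The helper names is_base64_char and needs_terminator are hypothetical, and the range tests assume an ASCII execution character set.

```c
/* Hypothetical helper (not library API): nonzero if c is one of the 64
 * Base64 alphabet characters. Assumes ASCII. */
static int is_base64_char(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Hypothetical helper: nonzero if an unterminated Base64 run must be
 * closed with '-' before a string starting with `next` is appended. */
static int needs_terminator(int next)
{
    return next == '-' || is_base64_char(next);
}
```

Since RFC 2152 has the decoder absorb a '-' that immediately follows a Base64 run, appending '-' unconditionally to an unterminated run such as "+AAAAAA" should also be safe, at the cost of one extra byte.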