Comments (12)
Also, messages containing invalid UTF-8 should be rejected (see src/xmpp_callbacks:36 and https://github.com/diml/zed/blob/master/src/zed_utf8.mli#L37 for a possible solution)
from jackline.
done in 5cf47e6 6aa60d8 and presence filtering in 63db57c
from jackline.
Seems quite messy. It seems to me that it should be the task of the underlying XML library to check the encoding validity of all the data you get, so that once you are past that boundary you can trust all the data to be valid UTF-8.
from jackline.
- yes, I'd love to find the time to use xmlm and your unicode libraries instead (in zed/lambda-term/XMPP)
- the problem arises now: we received an xml message from the server, which contains base64 encoded encrypted text. Thus, after decryption (via Otr.Handshake.handle), we need to validate the resulting data (this is done in the first two commits mentioned above)
- in general, OTR data might include binary data, thus doing utf8 validation within OTR is not a viable solution
from jackline.
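A minimal sketch of where such a check could sit: on the plaintext string that comes out of decryption, outside of OTR itself. `check_plaintext` is a hypothetical name, not a function from jackline or the OTR library, and the sketch assumes OCaml 4.14 or later for `String.is_valid_utf_8`.

```ocaml
(* Hypothetical boundary check, assuming OCaml >= 4.14.
   [decrypted] stands for whatever plaintext the decryption step returns. *)
let check_plaintext (decrypted : string) : (string, string) result =
  if String.is_valid_utf_8 decrypted then Ok decrypted
  else Error "decrypted message is not valid UTF-8"
```

Returning a `result` keeps the decision (reject, replace, log) with the caller, so the OTR layer stays free to carry binary data.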
Ah ok, makes sense; I didn't think about the b64 bits.
from jackline.
and I'm actually not entirely sure whether 63db57c is needed or guaranteed...
from jackline.
Well, if that concerns the output of the XML decoder you use, I'd keep it, since it seems to me that this will accept invalid UTF-8 as input: a quick look indicates that it doesn't check subsequent bytes for the correct range. See e.g. the case for '\240'..'\247', which handles all 4-byte encoded code points, and compare it to table 3-7 in this document. This is what is used by Xmlm, which should, once I get the time, be rewritten on top of the similar code in Uutf. The latter decoder has a proof of correctness by exhaustiveness and computer heat (that's the reason why the test is commented out) on a representation of the specification.
from jackline.
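To illustrate the range problem described above, here is a sketch of a well-formedness check that tests every continuation byte against the range allowed for its position. It assumes the table 3-7 referred to is the Unicode Standard's table of well-formed UTF-8 byte sequences. `is_valid_utf8` is a hypothetical name; this is not the code of Xmlm, Uutf, or jackline.

```ocaml
(* Sketch of a UTF-8 well-formedness check. The byte ranges follow the
   Unicode Standard's table of well-formed UTF-8 byte sequences. *)
let is_valid_utf8 (s : string) : bool =
  let len = String.length s in
  (* [in_range i lo hi]: byte [i] exists and lies in [lo, hi]. *)
  let in_range i lo hi =
    i < len && (let b = Char.code s.[i] in lo <= b && b <= hi)
  in
  (* Generic continuation byte, 0x80..0xBF. *)
  let tail i = in_range i 0x80 0xBF in
  let rec go i =
    if i >= len then true
    else
      match Char.code s.[i] with
      | b when b <= 0x7F -> go (i + 1)
      | b when 0xC2 <= b && b <= 0xDF -> tail (i + 1) && go (i + 2)
      (* 0xE0: second byte restricted to exclude overlong forms. *)
      | 0xE0 -> in_range (i + 1) 0xA0 0xBF && tail (i + 2) && go (i + 3)
      (* 0xED: second byte restricted to exclude surrogates. *)
      | 0xED -> in_range (i + 1) 0x80 0x9F && tail (i + 2) && go (i + 3)
      | b when 0xE1 <= b && b <= 0xEF ->
        tail (i + 1) && tail (i + 2) && go (i + 3)
      (* 0xF0 ('\240'): second byte restricted to exclude overlong forms. *)
      | 0xF0 ->
        in_range (i + 1) 0x90 0xBF && tail (i + 2) && tail (i + 3)
        && go (i + 4)
      | b when 0xF1 <= b && b <= 0xF3 ->
        tail (i + 1) && tail (i + 2) && tail (i + 3) && go (i + 4)
      (* 0xF4: second byte restricted to stay at or below U+10FFFF. *)
      | 0xF4 ->
        in_range (i + 1) 0x80 0x8F && tail (i + 2) && tail (i + 3)
        && go (i + 4)
      (* 0x80..0xC1 and 0xF5..0xFF can never start a sequence. *)
      | _ -> false
  in
  go 0
```

A decoder that looks only at the lead byte would accept `"\xf0\x80\x80\x80"`; this check rejects it because `0x80` is outside `0x90..0xBF` for the second byte after `0xF0`.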
thx @dbuenzli for investigation... and yes, I intend to switch over to your xmlm (whenever I find some more spare time for this)
from jackline.
reopening: escape characters are not handled, and to remind myself that most of the xml from the server is used unchecked (everything apart from the parts a user can set - presence and messages)
from jackline.
so the decrypted messages are being checked here - the xml parser pushes 0..127 through
from jackline.
to wrap up from #42 - we should use 0xfffd (unicode replacement character) for control sequences instead of ? (to distinguish from real ? received over the wire). or maybe just error out (as done with invalid utf8 sequences). this replacement should be done at a single place and not spread over multiple places.
plan is to replace erm_xmpp at some point - and use decent xml parser and unicode libraries... I'm currently busy with other things, but have this on my plan (most likely before any public release)
from jackline.
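A sketch of doing the replacement in a single place, working on decoded code points. `sanitize` is a hypothetical helper, not jackline's `validate_utf8`. It assumes OCaml 4.14 or later (for `String.get_utf_8_uchar` and the `Uchar.utf_decode_*` functions), and the choice to let tab and newline through is an assumption made for illustration. Unlike the "error out" option, this variant maps invalid byte sequences to U+FFFD as well.

```ocaml
(* Hypothetical single-place sanitizer, assuming OCaml >= 4.14. *)
let sanitize (s : string) : string =
  let buf = Buffer.create (String.length s) in
  let is_control u =
    let c = Uchar.to_int u in
    (* C0 controls except tab and newline (an assumption), DEL, C1 controls. *)
    (c < 0x20 && c <> 0x09 && c <> 0x0A) || (0x7F <= c && c <= 0x9F)
  in
  let rec go i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      (* On a decode error the stdlib yields Uchar.rep (U+FFFD). *)
      let u = Uchar.utf_decode_uchar d in
      Buffer.add_utf_8_uchar buf (if is_control u then Uchar.rep else u);
      (* Advance by the number of bytes the decoder consumed. *)
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents buf
```

Because a real `?` passes through unchanged and only control or undecodable input becomes U+FFFD, the two cases stay distinguishable in the output.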
I changed validate_utf8 to work on UChars instead, and made use of the 0xFFFD character for unknown characters. As mentioned in the pull request, I'd love to see this check for non-printable characters in general at some point, perhaps after we change the xml lib.
Pull request: #43
from jackline.