
Comments (8)

redsk commented on August 18, 2024

I've given a more thorough explanation of the same issue here along with a PR. Maybe you'd like to take the same approach.

from reboot.

farmdawgnation commented on August 18, 2024

Yeah, that's good. I was just making a note here as I worked through figuring out how I want to integrate it. No action needed.


farmdawgnation commented on August 18, 2024

> Well, if I understand correctly, the host API works only for hosts and not for full URLs, which of course would be rather inconvenient, as the parsing would have to be done by the caller.

How inconvenient this is largely depends on the application.

> I created this issue because I believe that other users of Dispatch might have the same problem and because I think this library should be able to handle URL normalization properly (not only emojis, also other characters, as illustrated here), which the current url API does not.

Yep, I hear you. I'm not closing the issue, just pointing out that this isn't going to be a quick, drop-in fix like I originally thought. This will take some time to get right.


redsk commented on August 18, 2024

In case you're wondering, java.net.IDN.toASCII(u) fails with

java.lang.IllegalArgumentException: java.text.ParseException: An unassigned code point was found in the input

as it only supports IDNA2003. For emoji IDNA2008 is needed and the icu4j library [1] supports it:

import com.ibm.icu.text.IDNA
val uts46 = IDNA.getUTS46Instance(IDNA.DEFAULT)

val u = "i❤.ws/"
val punycodedDomain = uts46.nameToASCII(u, new java.lang.StringBuilder(), new IDNA.Info()).toString
// punycodedDomain == "xn--i-7iq.ws/"

Apparently, nameToASCII works with domain names and paths but not with a protocol, which has to be stripped off before the conversion.
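A minimal sketch of that stripping step, assuming the scheme is delimited by "://". `SchemeSplit`, `normalize`, and the `convert` parameter are hypothetical names, not part of icu4j or Dispatch; `convert` stands in for whatever IDN conversion is used:

```scala
// Hypothetical helper, for illustration only.
object SchemeSplit {
  // Applies `convert` to everything after "scheme://" and re-attaches the scheme.
  // If no "://" is present, the whole input is passed to `convert`.
  def normalize(url: String, convert: String => String): String = {
    val sep = url.indexOf("://")
    if (sep < 0) convert(url)
    else {
      val prefix = url.substring(0, sep + 3) // e.g. "http://"
      val rest   = url.substring(sep + 3)    // host plus path
      prefix + convert(rest)
    }
  }
}
```

With the `uts46` instance from the snippet above, `convert` could be `s => uts46.nameToASCII(s, new java.lang.StringBuilder(), new IDNA.Info()).toString`.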

[1] https://mvnrepository.com/artifact/com.ibm.icu/icu4j


farmdawgnation commented on August 18, 2024

icu4j is using the ICU license. This seems to be a tweaked version of the X11 License which is compatible with our own LGPL.


redsk commented on August 18, 2024

Yeah as you wrote, they use the ICU license which is deemed compatible with GPL. So it should be ok, no?


farmdawgnation commented on August 18, 2024

Ok, this is a lot more complex than I originally thought looking at the bug report. TL;DR there's no good way for us to support this type of encoding from the url helper because we lean heavily on the URL parsing provided by the JVM by default. That, in turn, is based on a regex that doesn't properly support this type of encoding.

The good news is that there's an existing API which will handle this correctly: host:

@ host("i❤.ws").url
res38: String = "http://xn--i-7iq.ws/"

This API works a bit differently because it doesn't accept a full URL, but that's the same thing that makes the IDN conversion work as expected. For example, to get https you would use:

@ host("i❤.ws").secure.url
res40: String = "https://xn--i-7iq.ws/"

In order to support this from the url API we'd need to re-implement breaking up URLs into their component parts for parsing. I'm willing to investigate that, but it's a much larger project because we've got to be sure we don't incidentally break something else.
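For illustration, here is a rough sketch of what breaking a URL into its component parts could look like, using the generic regex from RFC 3986, Appendix B and java.net.IDN for the host. `UrlParts` and `asciiHost` are hypothetical names, this is not how Dispatch is implemented, and IPv6 literals and percent-encoding are deliberately left out:

```scala
import java.net.IDN

// Hypothetical helper, for illustration only; not part of Dispatch.
object UrlParts {
  // Generic URI-splitting regex from RFC 3986, Appendix B.
  private val Rfc3986 =
    """^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?""".r

  // authority = [userinfo@]host[:port]; IPv6 literals will not match.
  private val Authority = """^(?:(.*)@)?([^:@]*)(?::(\d*))?$""".r

  // Renders an optional component; groups that did not match are null.
  private def part(prefix: String, value: String, suffix: String = ""): String =
    Option(value).fold("")(v => prefix + v + suffix)

  // Converts only the host to its ASCII form and leaves the rest untouched.
  // URLs without an authority, or with one this sketch cannot split, are returned as-is.
  def asciiHost(url: String): String = url match {
    case Rfc3986(_, scheme, _, authority, path, _, query, _, fragment) if authority != null =>
      authority match {
        case Authority(userinfo, host, port) =>
          part("", scheme, ":") + "//" + part("", userinfo, "@") +
            IDN.toASCII(host) + part(":", port) + path +
            part("?", query) + part("#", fragment)
        case _ => url
      }
    case _ => url
  }
}
```

Because java.net.IDN implements IDNA2003 only, a production version would likely need a different converter for hosts it rejects.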

Does this unblock your use case for the time being?


redsk commented on August 18, 2024

Well, if I understand correctly, the host API works only for hosts and not for full URLs, which of course would be rather inconvenient, as the parsing would have to be done by the caller.

My use case is not blocked: I use a different library to detect URLs in strings, and that library can do URL normalization, including via ICU (I did the PR), so I simply pass Dispatch the normalized URL.

I created this issue because I believe that other users of Dispatch might have the same problem and because I think this library should be able to handle URL normalization properly (not only emojis, also other characters, as illustrated here), which the current url API does not.

