Comments (8)

xenocrat commented on August 16, 2024

Hello there. The problem is this: because of the way tags work, it must be possible to represent a tag uniquely using a URL-safe combination of characters, meaning only ASCII. The code that handles this can replace some non-ASCII characters with ASCII equivalents (for example, àáâ èéê and so on) but not all.

A tag like а б в г д е ё ж з и й к л м н о п р с т у ф х has all the non-substitutable characters removed and then there's nothing left, but a tag like а б в г д е ё ж з и й к л м н о п р с т у ф х1 will be converted to the URL-safe form -----------------------1.
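
As a rough illustration of why such a tag can end up empty, here is a minimal PHP sketch. It is not Chyrp Lite's actual code: the function name `ascii_only_sketch` and its tiny substitution table are assumptions made for this example, and the handling of spaces and hyphens is deliberately left out.

```php
<?php
// Hypothetical, simplified stand-in for the real sanitizing function.
function ascii_only_sketch(string $tag): string {
    // A tiny illustrative table of known substitutions (assumed, not the project's list).
    $known = array("à" => "a", "á" => "a", "â" => "a",
                   "è" => "e", "é" => "e", "ê" => "e");

    // Replace the characters that have an ASCII equivalent.
    $tag = strtr($tag, $known);

    // Drop every remaining byte outside the ASCII range.
    return preg_replace('/[^\x00-\x7F]/', '', $tag);
}

echo ascii_only_sketch("àé"), "\n";    // substitutable characters survive
echo ascii_only_sketch("жзи1"), "\n";  // only the ASCII digit survives
var_dump(ascii_only_sketch("жзи"));    // nothing survives
```

With no entry for a Cyrillic letter in the table, every byte of that letter is discarded, so a purely Cyrillic tag leaves nothing to build a URL from.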

The best way to solve this is to teach Chyrp Lite how to replace Cyrillic characters with ASCII equivalents. The function that does the substitution is here, and there is a comment at the bottom showing you how to generate the correct codes to add substitution keys for additional characters.

For example:

echo implode(",", unpack("C*", "ж"));
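
The one-liner above can be expanded into a small sketch showing how the generated codes might become a substitution key. The `$substitutions` table and the choice of "zh" as the replacement are assumptions for illustration only; the real table in Chyrp Lite may be structured differently.

```php
<?php
// Byte values of the UTF-8 encoding of "ж" (this file must be saved as UTF-8).
$codes  = unpack("C*", "ж");       // array(1 => 208, 2 => 182)
$listed = implode(",", $codes);    // "208,182"

// Hypothetical substitution table keyed by the raw byte sequence.
$key = chr(208).chr(182);          // the same two bytes as the literal "ж"
$substitutions = array($key => "zh"); // "zh" is an illustrative choice, not a project decision

// Apply the table to a tag containing the character.
$result = strtr("ж1", $substitutions);

echo $listed, "\n";
echo $result, "\n";
```

Building the key with `chr()` keeps the source file itself pure ASCII, which is one plausible reason to list byte codes rather than literal characters.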

I'm happy to receive pull requests, or if you tell me what the substitutions should be, I will add them myself before the next release.

from chyrp-lite.

xenocrat commented on August 16, 2024

Would Volapuk encoding be effective here?

roywikan commented on August 16, 2024

I use Chyrp Lite here:
http://baliho.com/tag/Такси-ВезЁт-объявляет-СРОЧНЫЙ/

xenocrat commented on August 16, 2024

@roywikan Did you modify the Tags module? The code here makes that URL impossible to create.

roywikan commented on August 16, 2024

Yes, but I am not a good programmer; I was just trying to make it work. Since modifying those lines, I worry about upgrading the Chyrp Lite script, in case my modifications stop working. Here is the snippet:

private function mb_is_string_equal_ci($string1, $string2) {
    // Modified by roy: compare two tag names case-insensitively
    // after Unicode normalization (requires the intl extension).
    if (empty($string1) || empty($string2))
        return false;

    // Normalizer::normalize() returns false if it cannot normalize the input.
    $string1_normalized = Normalizer::normalize($string1, Normalizer::FORM_KC);
    $string2_normalized = Normalizer::normalize($string2, Normalizer::FORM_KC);

    if ($string1_normalized === false || $string2_normalized === false)
        die("HALTED: Unicode normalization of a tag failed. Check that the intl extension is enabled (see phpinfo) and that tag names are valid UTF-8.");

    return mb_strtolower($string1_normalized) === mb_strtolower($string2_normalized) ||
           mb_strtoupper($string1_normalized) === mb_strtoupper($string2_normalized);
}



private function mb_strcasecmp($str1, $str2, $encoding = "UTF-8") {
    // Ignore punctuation when comparing tag names.
    $str1 = preg_replace("/[[:punct:]]+/", "", $str1);
    $str2 = preg_replace("/[[:punct:]]+/", "", $str2);

    // Use the byte-wise strtoupper() if mbstring is unavailable.
    if (!function_exists("mb_strtoupper"))
        return $this->mb_is_string_equal_ci(strtoupper($str1), strtoupper($str2));

    return $this->mb_is_string_equal_ci(mb_strtoupper($str1, $encoding), mb_strtoupper($str2, $encoding));
}

Feel free to incorporate this into Chyrp Lite.

xenocrat commented on August 16, 2024

By modifying the module you take on the burden of maintaining your modifications, but there are no big changes planned for the Tags module so as long as you redo this whenever you upgrade Chyrp Lite, you will be fine.

Tag names are converted to ASCII because there is no agreed behaviour for a URL that contains characters outside the ASCII range. Modern web browsers can cope with it, but if your links are used somewhere that doesn't accept or understand UTF-8 encoding, they might get messed up. It's unfortunate for people using languages other than English, but there's nothing I can do about it without risking incompatibilities.
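
To illustrate the concern, this small PHP sketch shows what a non-ASCII tag looks like once it is percent-encoded for a URL. It uses only standard PHP functions and is not meant to represent how Chyrp Lite builds its URLs; the tag value is an arbitrary example.

```php
<?php
$tag     = "ж";                      // non-ASCII tag name (file saved as UTF-8)
$encoded = rawurlencode($tag);       // percent-encodes each UTF-8 byte
$decoded = rawurldecode($encoded);   // recovers the name only if the bytes are read as UTF-8

echo $encoded, "\n";                 // %D0%B6
```

The encoded form is plain ASCII, but whatever consumes the link still has to interpret the decoded bytes as UTF-8, which is the compatibility risk described above.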

I have had a few thoughts about how to better support tags that can't be converted to ASCII without having to risk using UTF-8 in URLs. I will investigate further.

Dextrolain commented on August 16, 2024

@xenocrat thanks a lot, nice solution with the MD5 hash. If you think that your first solution with substitutions could also be useful, I can still send them to you, but I like the latest fix.

xenocrat commented on August 16, 2024

I think this is the neater solution. Substituting à for a is intuitive, but in non-Latin languages it gets murky. The solution of using a hash means the problem is also solved for people using Cyrillic, Hanzi, Arabic etc. without extra work.
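
For readers curious what a hash fallback could look like, here is a minimal PHP sketch. The function name `tag_slug_sketch` and the exact cleaning rules are assumptions for this example; in particular it assumes the hash is used only when no ASCII letters or digits remain, which may differ from the actual fix.

```php
<?php
// Hypothetical slug helper: the name and the exact rules are assumptions.
function tag_slug_sketch(string $name): string {
    // Keep ASCII letters and digits, turn every other run of bytes into a hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($name));
    $slug = trim($slug, '-');

    // Nothing usable left: fall back to an MD5 hash of the original name,
    // which is always 32 hexadecimal (URL-safe) characters.
    return ($slug === '') ? md5($name) : $slug;
}

echo tag_slug_sketch("Hello World"), "\n";  // hello-world
echo tag_slug_sketch("жзи"), "\n";          // 32-character hexadecimal hash
```

The same name always produces the same hash, so the URL stays stable, but the hash cannot be turned back into the name, so the original tag has to be looked up from stored data.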

Thanks for your feedback and for the question! I wouldn't have considered this problem without it.
