Comments (6)

Links2004 avatar Links2004 commented on June 19, 2024 1

The library (broadcastTXT) does not alter any Strings it sends.
As a result, any invalid UTF-8 or control codes that you pass to the function will be sent as is.
In your case, sanitizing your input is the way to go.

Yes, String adds the needed null termination automatically at the end;
any other type that you use with broadcastTXT needs a 0x00 at the end.

To simply remove any non-printable chars, you can try:

String sanitizeToUTF8(const String &input) {
  String sanitized = "";
  for (size_t i = 0; i < input.length(); i++) {
    char c = input.charAt(i);
    if (isPrintable(c)) { // remove any non Printable chars
      sanitized += c;
    }
  }
  return sanitized;
}

For other filters, take a look at the "Characters" section of https://www.arduino.cc/reference/en/
or
https://github.com/esp8266/Arduino/blob/f2da54d3a23bff4ca481d058b61c2b90a77ab3f1/cores/esp8266/WCharacter.h#L91-L93

from arduinowebsockets.

Links2004 avatar Links2004 commented on June 19, 2024

Do you have UTF-8 in the string / char array that you send via broadcastTXT?
broadcastTXT does not alter your text / data by adding random strings.
Are you sure your data is 0x00 terminated?

The length is detected via strlen if you use broadcastTXT without the length parameter.

bool WebSocketsServerCore::broadcastTXT(uint8_t * payload, size_t length, bool headerToPayload) {
    WSclient_t * client;
    bool ret = true;
    if(length == 0) {
        length = strlen((const char *)payload);
    }

My guess is that the 0x00 is missing at the end of the string and, as a result, garbage from RAM up to the next 0x00 is sent too, but it's hard to say without seeing the code.
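To illustrate the strlen fallback, here is a minimal sketch in plain C++ (no Arduino headers). The helper name effectiveLength and the two sample buffers are made up for this example and are not part of the library; the helper only mirrors the length == 0 branch quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a library function): a length of 0 means
// "measure the payload with strlen", as in the excerpt above.
size_t effectiveLength(const uint8_t *payload, size_t length) {
    if (length == 0) {
        // Only safe when the payload really ends with 0x00.
        length = std::strlen(reinterpret_cast<const char *>(payload));
    }
    return length;
}

// Terminated buffer: the string literal supplies the trailing 0x00,
// so the array holds 5 bytes: 'S', 'o', 'n', 'g', 0x00.
const uint8_t terminated[] = "Song";

// Unterminated buffer: strlen would keep reading past the end, so the
// length has to be passed explicitly.
const uint8_t raw[4] = {'S', 'o', 'n', 'g'};
```

With these buffers, effectiveLength(terminated, 0) measures 4 bytes via strlen, while raw should only ever be used with an explicit length such as effectiveLength(raw, sizeof(raw)).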


VaAndCob avatar VaAndCob commented on June 19, 2024

Do you have UTF-8 in the string / char array that you send via broadcastTXT? broadcastTXT does not alter your text / data by adding random strings. Are you sure your data is 0x00 terminated?

The length is detected via strlen if you use broadcastTXT without the length parameter.

https://github.com/Links2004/arduinoWebSockets/blob/751cf87b6cd684c9d339f0314a18b0ee866d449c/src/WebSocketsServer.cpp#L186-L192

My guess is that the 0x00 is missing at the end of the string and, as a result, garbage from RAM up to the next 0x00 is sent too, but it's hard to say without seeing the code.

Here is the code:

String content = "";
content.concat("\nSong  | ");
content.concat(song_title);
content.concat("\nArtist| ");
content.concat(artist_name);
webSocket.broadcastTXT(content);

It is as simple as this. I noticed there are terminating 0x00 characters at the end of the variables song_title and/or artist_name.
This triggers a text frame decoding error quite often. After I added sanitizing code that removes the terminating 0x00, the error occurs rarely but still happens sometimes.

Here is the sanitizing code that removes the terminating 0x00:

String sanitizeToUTF8(const String &input) {
  String sanitized = "";
  for (size_t i = 0; i < input.length(); i++) {
    //uint8_t c = input[i];
    char c = input.charAt(i);
    if (c != '\0') { //remove NULL
      sanitized += c;
    }
  }
  return sanitized;
}

song_title = sanitizeToUTF8(song_title);
artist_name = sanitizeToUTF8(artist_name);

How can I send a string (around 360 bytes in length) and handle UTF-8 encoding?

I dug into Websockets.cpp. The library handles UTF-8 encoding and text frames, but I am not sure whether I have to add the null byte 0x00 to the payload (the string being sent) manually.

This is what I read in the documentation:

  1. "If you're working with Arduino's String class, it automatically manages null-terminated strings internally, so you don't need to manually add a null terminator. "

  2. " when sending a text message in the WebSocket protocol, you need to include the null byte (0x00) at the end of the payload to properly terminate the message. This is a crucial part of correctly framing WebSocket messages."

This confuses me. I tried adding a null as the last character before sending with webSocket.broadcastTXT, like this:
content += ('\0');
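
For what it's worth, my reading of RFC 6455 (section 5.2) is that a text frame carries its payload length in the frame header, so the protocol itself does not use a terminating 0x00. The sketch below builds the header of an unmasked, unfragmented server-to-client text frame in plain C++. The function name textFrameHeader is made up for this example and is not the library's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: header bytes for an unmasked, unfragmented text
// frame (server to client), following the layout in RFC 6455 section 5.2.
std::vector<uint8_t> textFrameHeader(uint64_t payloadLen) {
    std::vector<uint8_t> header;
    header.push_back(0x81);  // FIN = 1, opcode 0x1 (text)
    if (payloadLen <= 125) {
        // Short payloads: the length fits in the 7-bit field itself.
        header.push_back(static_cast<uint8_t>(payloadLen));
    } else if (payloadLen <= 0xFFFF) {
        header.push_back(126);  // marker: a 16-bit big-endian length follows
        header.push_back(static_cast<uint8_t>(payloadLen >> 8));
        header.push_back(static_cast<uint8_t>(payloadLen & 0xFF));
    } else {
        header.push_back(127);  // marker: a 64-bit big-endian length follows
        for (int shift = 56; shift >= 0; shift -= 8) {
            header.push_back(static_cast<uint8_t>((payloadLen >> shift) & 0xFF));
        }
    }
    return header;
}
```

For a 360-byte payload this gives the four header bytes 0x81 0x7E 0x01 0x68, followed by exactly 360 payload bytes. The 0x00 terminator only matters on the C side, where strlen is used to find that length.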

Thank you


VaAndCob avatar VaAndCob commented on June 19, 2024

It works very well, no more errors, but it also eliminates all Asian letters. I want to send the artist and track title over WebSocket, but only English letters get through.
Could you please advise me? Thank you.


Links2004 avatar Links2004 commented on June 19, 2024

It's not that easy, since Arduino is not really designed for UTF-8.
Do you control the client side too?
If yes, send the data as base64 and deal with UTF-8 on the client side; the browser knows better what UTF-8 is ;)

String bsong_title = base64::encode(song_title);

https://github.com/esp8266/Arduino/blob/master/cores/esp8266/base64.h

Or switch to broadcastBIN, which will send any data.
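
If the strings are already UTF-8 and only contain a few stray bytes, another possible approach is to filter per UTF-8 sequence instead of per byte. A byte-wise isPrintable check drops every byte >= 0x80, which is why the Asian letters disappear. This is a rough sketch in plain C++ with std::string rather than Arduino String; the names utf8SeqLen and sanitizeKeepUtf8 are made up for this example. It keeps well-formed multi-byte sequences and newlines, and drops invalid bytes and other ASCII control codes.

```cpp
#include <cstddef>
#include <string>

// Length (1-4) of a well-formed UTF-8 sequence starting at s[i],
// or 0 if the bytes at that position are not valid UTF-8.
static size_t utf8SeqLen(const std::string &s, size_t i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    size_t len = 0;
    unsigned long cp = 0;
    if (b0 < 0x80) { return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { return 0; }                        // stray continuation or invalid lead byte
    if (i + len > s.size()) { return 0; }     // sequence cut off at the end
    for (size_t k = 1; k < len; k++) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { return 0; } // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates and values above U+10FFFF.
    static const unsigned long minCp[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < minCp[len]) { return 0; }
    if (cp >= 0xD800 && cp <= 0xDFFF) { return 0; }
    if (cp > 0x10FFFF) { return 0; }
    return len;
}

// Keeps valid UTF-8 and '\n'; drops invalid bytes and other control codes.
std::string sanitizeKeepUtf8(const std::string &input) {
    std::string out;
    size_t i = 0;
    while (i < input.size()) {
        const size_t len = utf8SeqLen(input, i);
        if (len == 0) { i++; continue; }      // skip one invalid byte
        const unsigned char b0 = static_cast<unsigned char>(input[i]);
        const bool control = (len == 1) && (b0 < 0x20 || b0 == 0x7F) && (b0 != '\n');
        if (!control) { out.append(input, i, len); }
        i += len;
    }
    return out;
}
```

An embedded 0x00 is removed like any other control code, while a three-byte sequence such as E0 B8 81 passes through unchanged. Whether this is enough depends on where the invalid bytes come from, which I can't tell from here.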


VaAndCob avatar VaAndCob commented on June 19, 2024

Let me update:
Now everything works like a charm, with a nice font for every language and correct indentation.

I tried base64 data but had no luck; the library has duplicate references and will not compile. So I tried another way.

I encode the string with this function. Each non-printable byte is written as two hex digits, so one 3-byte UTF-8 character becomes 3 hex bytes (6 hex digits), and the result is broadcast as text.

String encodeToUTF8(const String &input) {
  String sanitized = "";
  // Stop one byte early to drop the trailing 0x00 suffix.
  // Written as i + 1 < length() so an empty input cannot underflow.
  for (size_t i = 0; i + 1 < input.length(); i++) {
    char c = input.charAt(i);
    if (isPrintable(c)) { // printable ASCII is copied unchanged
      sanitized += c;
    } else {
      // Any other byte (control codes, UTF-8 lead and continuation bytes) becomes two hex digits
      char hexString[3];   // 2 characters for the hex representation + 1 for the null terminator
      snprintf(hexString, sizeof(hexString), "%02X", (unsigned char)c); // Convert the byte to hex and store it in hexString
      sanitized += hexString;
    }
  }
  return sanitized;
}

On the client side I decode the hex back to characters and display them. It is a bit slow, but everything works well.

// Decode 6 hex digits (one 3-byte UTF-8 sequence) back to a character, e.g. E0B881 -> "ก"
function decodeAndReplaceUTF8(inputString) {
  const regex = /([0-9A-Fa-f]{6})/g;
  const decodedString = inputString.replace(regex, (match, hexBytes) => {
    const byte1 = parseInt(hexBytes.substr(0, 2), 16);
    const byte2 = parseInt(hexBytes.substr(2, 2), 16);
    const byte3 = parseInt(hexBytes.substr(4, 2), 16);
    // Combine the payload bits of the three bytes into one code point
    const utf8Character = String.fromCharCode(((byte1 & 0x0F) << 12) | ((byte2 & 0x3F) << 6) | (byte3 & 0x3F));
    return utf8Character;
  });
  return decodedString;
}

In conclusion, all characters are now sent as printable (readable) ASCII characters, so there is no more trouble with decoding.
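
One caveat with the plain hex scheme: the client regex matches any six hex digits in a row, so a title that literally contains something like "ABCDEF" or "123456" would be decoded as a character, and only 3-byte sequences are handled. A possible variant, sketched here in plain C++ with std::string under the assumption that the client is a browser, is percent-encoding: every non-printable byte and the '%' sign itself becomes %XX, which JavaScript's decodeURIComponent should be able to turn back into text for UTF-8 sequences of any length. The function name percentEncodeNonAscii is made up for this example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: printable ASCII (except '%') is copied as is,
// every other byte is written as %XX with two uppercase hex digits.
std::string percentEncodeNonAscii(const std::string &input) {
    std::string out;
    for (unsigned char c : input) {
        const bool plain = (c >= 0x20 && c < 0x7F && c != '%');
        if (plain) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + 2 hex digits + null terminator
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

With this, the bytes E0 B8 81 are sent as %E0%B8%81 and a literal "50%" is sent as 50%25, so escaped bytes and ordinary text cannot be confused. Note that decodeURIComponent throws a URIError on malformed input, so the client call would need a try/catch.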

Thank you for pointing me in the right direction.
