<a href="https://github.com/esp-rs/std-training/blob/f7ce2e7d7afe520c027d836e81e1e50bd

This conversion to UTF-8 string is not safe (std-training, closed)

pghalliday commented on June 23, 2024

This conversion to UTF-8 string is not safe

from std-training.

Comments (5)

pghalliday commented on June 23, 2024

Ok, so I have been doing more research and have found the valid_up_to method of Utf8Error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.valid_up_to

I think this is the safest way to decode a UTF-8 response in chunks that I'm likely to come up with at this time.

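// Imports assumed to match the std-training HTTP client exercise
use anyhow::{bail, Result};
use core::str;
use embedded_svc::{http::client::Response, io::Read};
use esp_idf_svc::http::client::EspHttpConnection;

const BUFFER_SIZE: usize = 256;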

fn read_response(mut response: Response<&mut EspHttpConnection>) -> Result<()> {
    // Fixed buffer to read into
    let mut buffer = [0_u8; BUFFER_SIZE];
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    let mut offset = 0;
    // Keep track of the total number of bytes read to print later
    let mut total = 0;
    loop {
        // read into the buffer starting at the offset to not overwrite
        // the incomplete UTF-8 sequence we put there earlier
        // propagate read errors instead of ignoring them, otherwise
        // a failing read would make this loop spin forever
        let size = response.read(&mut buffer[offset..])?;
        if size == 0 {
            // no more bytes to read from the response
            if offset > 0 {
                bail!("Response ends with an invalid UTF-8 sequence with length: {}", offset)
            }
            break;
        }
        // Update the total number of bytes read
        total += size;
        // remember that we read into an offset and recalculate the
        // real length of the bytes to decode
        let size_plus_offset = size + offset;
        match str::from_utf8(&buffer[..size_plus_offset]) {
            Ok(text) => {
                // buffer contains fully valid UTF-8 data, print it
                // (print! rather than println!, so that no newline is
                // inserted at chunk boundaries) and reset the offset to 0
                print!("{}", text);
                offset = 0;
            },
            Err(error) => {
                // buffer contains incomplete UTF-8 data, we will
                // print the valid part, copy the invalid sequence to
                // the beginning of the buffer and set an offset for the
                // next read
                let valid_up_to = error.valid_up_to();
                print!("{}", str::from_utf8(&buffer[..valid_up_to])?);
                // only the bytes that were actually read need to be moved
                buffer.copy_within(valid_up_to..size_plus_offset, 0);
                offset = size_plus_offset - valid_up_to;
            }
        }
    }
    println!("Total: {} bytes", total);
    Ok(())
}
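The carry-over logic above can be exercised off-device. The following is a minimal sketch, not the code from the training: `decode_chunked` is a hypothetical name, it is generic over `std::io::Read` instead of the ESP-IDF `Response` type, it collects into a `String` instead of printing, and it rejects invalid sequences outright (an assumption; the function above does not distinguish them). The buffer size `N` is assumed to be at least 4 so that any single UTF-8 sequence fits.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical host-side variant of the chunked decoder.
/// `N` is the buffer size and is assumed to be at least 4.
fn decode_chunked<R: Read, const N: usize>(mut reader: R) -> io::Result<String> {
    let mut buffer = [0_u8; N];
    // Number of undecoded bytes carried over at the start of the buffer
    let mut offset = 0;
    let mut out = String::new();
    loop {
        let size = reader.read(&mut buffer[offset..])?;
        if size == 0 {
            if offset > 0 {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "input ends inside a UTF-8 sequence",
                ));
            }
            return Ok(out);
        }
        let filled = offset + size;
        match str::from_utf8(&buffer[..filled]) {
            Ok(text) => {
                out.push_str(text);
                offset = 0;
            }
            Err(error) => {
                // Some(_) means the bytes are invalid, not merely incomplete
                if error.error_len().is_some() {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "invalid UTF-8 sequence",
                    ));
                }
                let valid = error.valid_up_to();
                // The prefix was just validated, so decoding it cannot fail
                out.push_str(str::from_utf8(&buffer[..valid]).expect("validated prefix"));
                // Move the incomplete tail to the front for the next read
                buffer.copy_within(valid..filled, 0);
                offset = filled - valid;
            }
        }
    }
}

fn main() -> io::Result<()> {
    // A 4-byte buffer forces the 3-byte "€" to be split across two reads
    let text = decode_chunked::<_, 4>("aaa€bc".as_bytes())?;
    println!("{}", text);
    Ok(())
}
```

Using a deliberately tiny buffer in a host test like this makes the split-character case happen on every run instead of depending on where a real response happens to be chunked.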

pghalliday commented on June 23, 2024

I just realised that I should also check the error_len from the error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.error_len

A Some value from error_len signifies an invalid sequence that needs to be skipped, while None means the input simply ended in the middle of a sequence, as stated in the linked docs (I did not cover the invalid case in the function above).
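To make the two cases concrete, here is a small sketch using only the standard library. `describe` is a hypothetical helper written for illustration; `valid_up_to` and `error_len` are the `Utf8Error` methods linked above.

```rust
use std::str;

/// Hypothetical helper: reports how a byte slice fails to decode, if it does.
fn describe(bytes: &[u8]) -> String {
    match str::from_utf8(bytes) {
        Ok(_) => "valid".to_string(),
        Err(error) => match error.error_len() {
            // Some(len): `len` bytes starting at `valid_up_to` are invalid
            Some(len) => format!("invalid: skip {} byte(s) at {}", len, error.valid_up_to()),
            // None: the input ended in the middle of a sequence
            None => format!("incomplete: keep bytes from {}", error.valid_up_to()),
        },
    }
}

fn main() {
    // "é" is 0xC3 0xA9; cutting it after the first byte leaves it incomplete
    println!("{}", describe(b"abc\xC3"));
    // 0xFF can never appear in UTF-8
    println!("{}", describe(b"ab\xFFcd"));
    println!("{}", describe("héllo".as_bytes()));
}
```

An incomplete tail is worth keeping for the next read, whereas an invalid sequence will never become valid no matter how many more bytes arrive.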

pghalliday commented on June 23, 2024

Ok, this implementation deals with invalid UTF-8 sequences too, but I'm not sure it's helpful in the context of the training materials.

const BUFFER_SIZE: usize = 256;

struct ResponsePrinter {
    // Fixed buffer to read into
    buffer: [u8; BUFFER_SIZE],
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    offset: usize,
}

impl ResponsePrinter {
    fn new() -> ResponsePrinter {
        ResponsePrinter {
            buffer: [0_u8; BUFFER_SIZE],
            offset: 0,
        }
    }

    fn print(&mut self, mut response: Response<&mut EspHttpConnection>) -> Result<()> {
        // Keep track of the total number of bytes read to print later
        let mut total = 0;
        loop {
            // read into the buffer starting at the offset to not overwrite
            // the incomplete UTF-8 sequence we put there earlier
            // propagate read errors instead of ignoring them, otherwise
            // a failing read would make this loop spin forever
            let size = response.read(&mut self.buffer[self.offset..])?;
            if size == 0 {
                // no more bytes to read from the response
                if self.offset > 0 {
                    bail!("Response ends with an invalid UTF-8 sequence with length: {}", self.offset)
                }
                break;
            }
            // Update the total number of bytes read
            total += size;
            // recursive print to handle invalid UTF-8 sequences
            self.print_utf8(size)?;
        }
        println!("Total: {} bytes", total);
        Ok(())
    }

    fn print_utf8(&mut self, size: usize) -> Result<()> {
        // remember that we read into an offset and recalculate the
        // real length of the bytes to decode
        let size_plus_offset = size + self.offset;
        match str::from_utf8(&self.buffer[..size_plus_offset]) {
            Ok(text) => {
                // buffer contains fully valid UTF-8 data,
                // print it and reset the offset to 0
                print!("{}", text);
                self.offset = 0;
            },
            Err(error) => {
                // A UTF-8 decode error was encountered, print
                // the valid part and figure out what to do with the rest
                let valid_up_to = error.valid_up_to();
                print!("{}", str::from_utf8(&self.buffer[..valid_up_to])?);
                if let Some(error_len) = error.error_len() {
                    // buffer contains invalid UTF-8 data, print a replacement
                    // character then copy the remainder (probably valid) to the
                    // beginning of the buffer, reset the offset and deal with
                    // the remainder in a recursive call to print_utf8
                    print!("{}", char::REPLACEMENT_CHARACTER);
                    let valid_after = valid_up_to + error_len;
                    self.buffer.copy_within(valid_after.., 0);
                    self.offset = 0;
                    return self.print_utf8(size_plus_offset - valid_after);
                } else {
                    // buffer contains incomplete UTF-8 data, copy the invalid
                    // sequence to the beginning of the buffer and set an offset
                    // for the next read
                    self.buffer.copy_within(valid_up_to.., 0);
                    self.offset = size_plus_offset - valid_up_to;
                }
            }
        }
        Ok(())
    }
}
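The recursion in print_utf8 can also be written as a loop, which may be preferable on a device with a small stack. The sketch below is one possible shape, assuming a hypothetical free function `decode_lossy_prefix` that works on a plain byte slice and returns the decoded text together with the number of trailing bytes that form an incomplete sequence (the bytes a caller would carry over to the next read).

```rust
use std::str;

/// Hypothetical loop-based variant of the lossy decode step.
/// Returns the decoded text and the length of the incomplete tail.
fn decode_lossy_prefix(mut bytes: &[u8]) -> (String, usize) {
    let mut out = String::new();
    loop {
        match str::from_utf8(bytes) {
            Ok(text) => {
                out.push_str(text);
                return (out, 0);
            }
            Err(error) => {
                let (valid, rest) = bytes.split_at(error.valid_up_to());
                // The prefix was just validated, so decoding it cannot fail
                out.push_str(str::from_utf8(valid).expect("validated prefix"));
                match error.error_len() {
                    Some(len) => {
                        // Invalid bytes: emit U+FFFD and continue after them
                        out.push(char::REPLACEMENT_CHARACTER);
                        bytes = &rest[len..];
                    }
                    // Incomplete sequence: leave it for the next read
                    None => return (out, rest.len()),
                }
            }
        }
    }
}

fn main() {
    // One invalid byte in the middle, then two bytes of a truncated "€"
    let (text, pending) = decode_lossy_prefix(b"ab\xFFcd\xE2\x82");
    println!("{:?}, {} byte(s) pending", text, pending);
}
```

Each iteration consumes at least one byte when it hits an invalid sequence, so the loop terminates even for input made up entirely of invalid bytes.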

SergioGasquez commented on June 23, 2024

Hi! Thanks for opening the issue and sharing your findings on the topic! Would you mind opening a PR with your solution? For the purpose of the training, I would keep it as simple as possible, as the main point of the exercise is the HTTP request.

pghalliday commented on June 23, 2024

I'll cut it down to make it more palatable, but at least safe for the happy path of valid UTF-8 :)
