Comments (5)

pghalliday commented on June 23, 2024

Ok, so I have been doing more research and have found the valid_up_to method of Utf8Error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.valid_up_to

I think this is the safest way to decode a UTF-8 response in chunks that I'm likely to come up with at this time.

const BUFFER_SIZE: usize = 256;

fn read_response(mut response: Response<&mut EspHttpConnection>) -> Result<()> {
    // Fixed buffer to read into
    let mut buffer = [0_u8; BUFFER_SIZE];
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    let mut offset = 0;
    // Keep track of the total number of bytes read to print later
    let mut total = 0;
    loop {
        // read into the buffer starting at the offset to not overwrite
        // the incomplete UTF-8 sequence we put there earlier
        if let Ok(size) = response.read(&mut buffer[offset..]) {
            if size == 0 {
                // no more bytes to read from the response
                if offset > 0 {
                    bail!("Response ends with an invalid UTF-8 sequence with length: {}", offset)
                }
                break;
            }
            // Update the total number of bytes read
            total += size;
            // remember that we read into an offset and recalculate the
            // real length of the bytes to decode
            let size_plus_offset = size + offset;
            match str::from_utf8(&buffer[..size_plus_offset]) {
                Ok(text) => {
                    // buffer contains fully valid UTF-8 data,
                    // print it and reset the offset to 0
                    // use print! so no newline is inserted between chunks
                    print!("{}", text);
                    offset = 0;
                },
                Err(error) => {
                    // buffer contains incomplete UTF-8 data, we will
                    // print the valid part, copy the invalid sequence to
                    // the beginning of the buffer and set an offset for the
                    // next read
                    let valid_up_to = error.valid_up_to();
                    print!("{}", str::from_utf8(&buffer[..valid_up_to])?);
                    buffer.copy_within(valid_up_to.., 0);
                    offset = size_plus_offset - valid_up_to;
                }
            }
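        } else {
            // stop on a read error instead of silently retrying forever
            bail!("Failed to read from the response");
        }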
    }
    println!("Total: {} bytes", total);
    Ok(())
}
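
For anyone who wants to try the carry-over idea without a board, here is a minimal sketch of the same technique using only `std`. It is not the training code: `decode_in_chunks` is a hypothetical helper, it reads from any `std::io::Read` rather than an `EspHttpConnection` response, it collects into a `String` instead of printing, and the 8-byte buffer is deliberately tiny so that multi-byte characters get split across reads. It also assumes that truly invalid bytes should simply be reported as an error.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: decodes `reader` through a small fixed buffer,
/// carrying an incomplete trailing UTF-8 sequence over to the next read.
fn decode_in_chunks<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buffer = [0_u8; 8];
    // Number of undecoded bytes kept at the start of the buffer
    let mut offset = 0;
    let mut out = String::new();
    loop {
        let size = reader.read(&mut buffer[offset..])?;
        if size == 0 {
            if offset > 0 {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "input ends with an incomplete UTF-8 sequence",
                ));
            }
            return Ok(out);
        }
        let len = offset + size;
        match str::from_utf8(&buffer[..len]) {
            Ok(text) => {
                out.push_str(text);
                offset = 0;
            }
            Err(error) => {
                // Assumption: invalid (not merely incomplete) bytes are fatal here
                if error.error_len().is_some() {
                    return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8"));
                }
                let valid = error.valid_up_to();
                // This prefix was just validated, so decoding it cannot fail
                out.push_str(str::from_utf8(&buffer[..valid]).expect("validated prefix"));
                // Move the incomplete tail to the front for the next read
                buffer.copy_within(valid..len, 0);
                offset = len - valid;
            }
        }
    }
}

fn main() -> io::Result<()> {
    let body = "héllo wörld, ünïcödé ✓";
    // A byte slice implements Read, so it can stand in for the response
    let text = decode_in_chunks(body.as_bytes())?;
    assert_eq!(text, body);
    println!("{}", text);
    Ok(())
}
```

Collecting into a `String` is only there to make the result checkable; on the device, printing each decoded piece as in the function above avoids the extra allocation.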

from std-training.

pghalliday commented on June 23, 2024

I just realised that I should also check the error_len from the error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.error_len

A `Some` value from error_len signifies an invalid sequence that needs to be skipped, as stated in the linked docs, whereas `None` means the input ended in the middle of a character. I did not cover the invalid case in the function above.
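
A small sketch of how the two cases can be told apart, using only `std`. The `describe` function is a hypothetical name introduced here purely for illustration.

```rust
use std::str;

/// Hypothetical helper: reports how `str::from_utf8` failed, if it did.
/// `error_len() == None` means the input ended mid-character;
/// `Some(n)` means `n` bytes are invalid and should be skipped.
fn describe(bytes: &[u8]) -> String {
    match str::from_utf8(bytes) {
        Ok(_) => "valid".to_string(),
        Err(e) => match e.error_len() {
            None => format!("incomplete after {} valid bytes", e.valid_up_to()),
            Some(n) => format!("{} invalid byte(s) after {} valid bytes", n, e.valid_up_to()),
        },
    }
}

fn main() {
    // "é" is 0xC3 0xA9; cutting it after the first byte leaves it incomplete
    println!("{}", describe(b"caf\xC3"));
    // 0xFF never appears in well-formed UTF-8
    println!("{}", describe(b"ab\xFFcd"));
}
```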

pghalliday commented on June 23, 2024

Ok, this implementation deals with invalid UTF-8 sequences too, but I'm not sure it's helpful in the context of the training materials.

const BUFFER_SIZE: usize = 256;

struct ResponsePrinter {
    // Fixed buffer to read into
    buffer: [u8; BUFFER_SIZE],
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    offset: usize,
}

impl ResponsePrinter {
    fn new() -> ResponsePrinter {
        ResponsePrinter {
            buffer: [0_u8; BUFFER_SIZE],
            offset: 0,
        }
    }

    fn print(&mut self, mut response: Response<&mut EspHttpConnection>) -> Result<()> {
        // Keep track of the total number of bytes read to print later
        let mut total = 0;
        loop {
            // read into the buffer starting at the offset to not overwrite
            // the incomplete UTF-8 sequence we put there earlier
            if let Ok(size) = response.read(&mut self.buffer[self.offset..]) {
                if size == 0 {
                    // no more bytes to read from the response
                    if self.offset > 0 {
                        bail!("Response ends with an invalid UTF-8 sequence with length: {}", self.offset)
                    }
                    break;
                }
                // Update the total number of bytes read
                total += size;
                // recursive print to handle invalid UTF-8 sequences
                self.print_utf8(size)?;
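            } else {
                // stop on a read error instead of silently retrying forever
                bail!("Failed to read from the response");
            }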
        }
        println!("Total: {} bytes", total);
        Ok(())
    }

    fn print_utf8(&mut self, size: usize) -> Result<()> {
        // remember that we read into an offset and recalculate the
        // real length of the bytes to decode
        let size_plus_offset = size + self.offset;
        match str::from_utf8(&self.buffer[..size_plus_offset]) {
            Ok(text) => {
                // buffer contains fully valid UTF-8 data,
                // print it and reset the offset to 0
                print!("{}", text);
                self.offset = 0;
            },
            Err(error) => {
                // A UTF-8 decode error was encountered, print
                // the valid part and figure out what to do with the rest
                let valid_up_to = error.valid_up_to();
                print!("{}", str::from_utf8(&self.buffer[..valid_up_to])?);
                if let Some(error_len) = error.error_len() {
                    // buffer contains invalid UTF-8 data, print a replacement
                    // character then copy the remainder (probably valid) to the
                    // beginning of the buffer, reset the offset and deal with
                    // the remainder in a recursive call to print_utf8
                    print!("{}", char::REPLACEMENT_CHARACTER);
                    let valid_after = valid_up_to + error_len;
                    self.buffer.copy_within(valid_after.., 0);
                    self.offset = 0;
                    return self.print_utf8(size_plus_offset - valid_after);
                } else {
                    // buffer contains incomplete UTF-8 data, copy the invalid
                    // sequence to the beginning of the buffer and set an offset
                    // for the next read
                    self.buffer.copy_within(valid_up_to.., 0);
                    self.offset = size_plus_offset - valid_up_to;
                }
            }
        }
        Ok(())
    }
}
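
As a point of comparison, and only as a sketch: when the whole body is already in memory, `String::from_utf8_lossy` from `std` substitutes U+FFFD for invalid bytes in much the same way. It cannot be applied to each chunk on its own, though, because it has no way of knowing that an incomplete trailing sequence will be completed by the next read, which is the case the carry-over offset handles. The `lossy` wrapper below is a hypothetical name used for illustration.

```rust
/// Hypothetical wrapper around the std lossy decoder.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // An invalid byte in the middle becomes a single replacement character
    assert_eq!(lossy(b"ab\xFFcd"), "ab\u{FFFD}cd");
    // A chunk cut in the middle of "é" (0xC3 0xA9) is also "repaired",
    // so the character is lost even though the next chunk would complete it
    assert_eq!(lossy(b"caf\xC3"), "caf\u{FFFD}");
    println!("{}", lossy(b"ab\xFFcd"));
}
```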

SergioGasquez commented on June 23, 2024

Hi! Thanks for opening the issue and sharing your findings on the topic! Would you mind opening a PR with your solution? For the purpose of the training, I would keep it as simple as possible as the main point of the exercise is the HTTP request.

pghalliday commented on June 23, 2024

I'll cut it down to make it more palatable, but keep it at least safe for the happy path of valid UTF-8 :)
