Comments (5)
Ok, so I have been doing more research and found the valid_up_to method of Utf8Error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.valid_up_to
I think this is the safest way to decode a UTF-8 response in chunks that I'm likely to come up with at this time.
// Imports assumed to match the training's HTTP client exercise
use anyhow::{bail, Result};
use embedded_svc::{http::client::Response, io::Read};
use esp_idf_svc::http::client::EspHttpConnection;
use std::str;

const BUFFER_SIZE: usize = 256;

fn read_response(mut response: Response<&mut EspHttpConnection>) -> Result<()> {
    // Fixed buffer to read into
    let mut buffer = [0_u8; BUFFER_SIZE];
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    let mut offset = 0;
    // Keep track of the total number of bytes read to print later
    let mut total = 0;
    loop {
        // read into the buffer starting at the offset to not overwrite
        // the incomplete UTF-8 sequence we put there earlier
        let size = match response.read(&mut buffer[offset..]) {
            Ok(size) => size,
            // stop on a failed read instead of silently retrying forever
            Err(error) => bail!("Failed to read the response: {:?}", error),
        };
        if size == 0 {
            // no more bytes to read from the response
            if offset > 0 {
                bail!("Response ends with an invalid UTF-8 sequence with length: {}", offset)
            }
            break;
        }
        // Update the total number of bytes read
        total += size;
        // remember that we read into an offset and recalculate the
        // real length of the bytes to decode
        let size_plus_offset = size + offset;
        match str::from_utf8(&buffer[..size_plus_offset]) {
            Ok(text) => {
                // buffer contains fully valid UTF-8 data,
                // print it and reset the offset to 0
                println!("{}", text);
                offset = 0;
            }
            Err(error) => {
                // buffer contains incomplete UTF-8 data, we will
                // print the valid part, copy the incomplete sequence to
                // the beginning of the buffer and set an offset for the
                // next read
                let valid_up_to = error.valid_up_to();
                println!("{}", str::from_utf8(&buffer[..valid_up_to])?);
                // only the bytes that were actually read need to be moved
                buffer.copy_within(valid_up_to..size_plus_offset, 0);
                offset = size_plus_offset - valid_up_to;
            }
        }
    }
    println!("Total: {} bytes", total);
    Ok(())
}
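To try the carry-over idea without a board, here is a minimal sketch of the same loop against std::io::Read instead of the ESP response type. The name decode_chunked and the caller-supplied buffer are my own assumptions for illustration and are not part of the training code. It collects the text into a String instead of printing it, and it treats a truly invalid byte as an error.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: decodes everything from `reader` using the
/// caller-supplied fixed `buffer` (at least 4 bytes, the longest UTF-8
/// sequence) and returns the decoded text.
fn decode_chunked<R: Read>(mut reader: R, buffer: &mut [u8]) -> io::Result<String> {
    assert!(buffer.len() >= 4, "buffer must hold one full UTF-8 sequence");
    // Number of undecoded bytes carried over at the start of the buffer
    let mut offset = 0;
    let mut out = String::new();
    loop {
        // Read behind the carried-over bytes so they are not overwritten
        let size = reader.read(&mut buffer[offset..])?;
        if size == 0 {
            if offset > 0 {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "input ends with an incomplete UTF-8 sequence",
                ));
            }
            return Ok(out);
        }
        let filled = offset + size;
        match str::from_utf8(&buffer[..filled]) {
            Ok(text) => {
                out.push_str(text);
                offset = 0;
            }
            Err(error) => {
                // `Some` means a byte that can never be valid: give up here
                if error.error_len().is_some() {
                    return Err(io::Error::new(io::ErrorKind::InvalidData, error));
                }
                // Otherwise the chunk was cut mid-character: keep the valid
                // prefix and move the incomplete tail to the front
                let valid_up_to = error.valid_up_to();
                let prefix = str::from_utf8(&buffer[..valid_up_to])
                    .expect("prefix was already validated");
                out.push_str(prefix);
                buffer.copy_within(valid_up_to..filled, 0);
                offset = filled - valid_up_to;
            }
        }
    }
}
```

A byte slice implements Read, so calling decode_chunked(text.as_bytes(), &mut buf) with a very small buf forces multi-byte characters to be split across reads.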
from std-training.
I just realised that I should also check the error_len from the error: https://doc.rust-lang.org/std/str/struct.Utf8Error.html#method.error_len
A Some value signifies an invalid sequence that needs to be skipped, as stated in the linked docs, while None means the input ended in the middle of a sequence. I did not cover the invalid case in the function above.
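A small std-only sketch of how the two methods combine. The Tail enum and the classify function are hypothetical names that I made up for illustration; only str::from_utf8, valid_up_to and error_len come from the standard library.

```rust
use std::str;

/// Hypothetical summary of what a chunked decoder has to do with a chunk.
#[derive(Debug, PartialEq)]
enum Tail {
    /// The whole chunk is valid UTF-8
    Complete,
    /// The chunk ends in the middle of a character: keep the bytes from
    /// `valid_up_to` onwards and wait for more input
    Incomplete { valid_up_to: usize },
    /// The bytes at `valid_up_to` can never be valid: skip `error_len` bytes
    Invalid { valid_up_to: usize, error_len: usize },
}

fn classify(chunk: &[u8]) -> Tail {
    match str::from_utf8(chunk) {
        Ok(_) => Tail::Complete,
        Err(error) => match error.error_len() {
            // None: the end of the input was reached unexpectedly
            None => Tail::Incomplete {
                valid_up_to: error.valid_up_to(),
            },
            // Some(n): an invalid sequence of n bytes was encountered
            Some(error_len) => Tail::Invalid {
                valid_up_to: error.valid_up_to(),
                error_len,
            },
        },
    }
}
```

For example, the bytes of "ab" followed by the first two bytes of "€" (0xE2 0x82) classify as Incomplete with valid_up_to 2, while "ab" followed by 0xFF classifies as Invalid with valid_up_to 2 and error_len 1.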
Ok, this implementation deals with invalid UTF-8 sequences too, but I'm not sure it's helpful in the context of the training materials.
// Imports assumed to match the training's HTTP client exercise
use anyhow::{bail, Result};
use embedded_svc::{http::client::Response, io::Read};
use esp_idf_svc::http::client::EspHttpConnection;
use std::str;

const BUFFER_SIZE: usize = 256;

struct ResponsePrinter {
    // Fixed buffer to read into
    buffer: [u8; BUFFER_SIZE],
    // Offset into the buffer to indicate that there are still
    // bytes at the beginning that have not been decoded yet
    offset: usize,
}

impl ResponsePrinter {
    fn new() -> ResponsePrinter {
        ResponsePrinter {
            buffer: [0_u8; BUFFER_SIZE],
            offset: 0,
        }
    }

    fn print(&mut self, mut response: Response<&mut EspHttpConnection>) -> Result<()> {
        // Keep track of the total number of bytes read to print later
        let mut total = 0;
        loop {
            // read into the buffer starting at the offset to not overwrite
            // the incomplete UTF-8 sequence we put there earlier
            let size = match response.read(&mut self.buffer[self.offset..]) {
                Ok(size) => size,
                // stop on a failed read instead of silently retrying forever
                Err(error) => bail!("Failed to read the response: {:?}", error),
            };
            if size == 0 {
                // no more bytes to read from the response
                if self.offset > 0 {
                    bail!("Response ends with an invalid UTF-8 sequence with length: {}", self.offset)
                }
                break;
            }
            // Update the total number of bytes read
            total += size;
            // recursive print to handle invalid UTF-8 sequences
            self.print_utf8(size)?;
        }
        println!("Total: {} bytes", total);
        Ok(())
    }

    fn print_utf8(&mut self, size: usize) -> Result<()> {
        // remember that we read into an offset and recalculate the
        // real length of the bytes to decode
        let size_plus_offset = size + self.offset;
        match str::from_utf8(&self.buffer[..size_plus_offset]) {
            Ok(text) => {
                // buffer contains fully valid UTF-8 data,
                // print it and reset the offset to 0
                print!("{}", text);
                self.offset = 0;
            }
            Err(error) => {
                // A UTF-8 decode error was encountered, print
                // the valid part and figure out what to do with the rest
                let valid_up_to = error.valid_up_to();
                print!("{}", str::from_utf8(&self.buffer[..valid_up_to])?);
                if let Some(error_len) = error.error_len() {
                    // buffer contains invalid UTF-8 data, print a replacement
                    // character then copy the remainder (probably valid) to the
                    // beginning of the buffer, reset the offset and deal with
                    // the remainder in a recursive call to print_utf8
                    print!("{}", char::REPLACEMENT_CHARACTER);
                    let valid_after = valid_up_to + error_len;
                    self.buffer.copy_within(valid_after..size_plus_offset, 0);
                    self.offset = 0;
                    return self.print_utf8(size_plus_offset - valid_after);
                } else {
                    // buffer contains incomplete UTF-8 data, copy the incomplete
                    // sequence to the beginning of the buffer and set an offset
                    // for the next read
                    self.buffer.copy_within(valid_up_to..size_plus_offset, 0);
                    self.offset = size_plus_offset - valid_up_to;
                }
            }
        }
        Ok(())
    }
}
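For comparison, here is a sketch of the same replacement-character idea that can be run on a desktop. It reads from any std::io::Read and walks through the buffer in a loop instead of recursing and copying the remainder after every invalid sequence. The name decode_chunked_lossy and the caller-supplied buffer are assumptions for illustration only. One deliberate difference from the code above: an incomplete sequence at the very end of the input becomes a replacement character instead of an error, which is also what String::from_utf8_lossy does.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: decodes `reader` through a fixed `buffer` of at
/// least 4 bytes, replacing every invalid sequence with U+FFFD.
fn decode_chunked_lossy<R: Read>(mut reader: R, buffer: &mut [u8]) -> io::Result<String> {
    assert!(buffer.len() >= 4, "buffer must hold one full UTF-8 sequence");
    // Number of undecoded bytes carried over at the start of the buffer
    let mut offset = 0;
    let mut out = String::new();
    loop {
        let size = reader.read(&mut buffer[offset..])?;
        if size == 0 {
            // A dangling incomplete sequence becomes one replacement character
            if offset > 0 {
                out.push(char::REPLACEMENT_CHARACTER);
            }
            return Ok(out);
        }
        let filled = offset + size;
        offset = 0;
        // Start of the part of the buffer that is not decoded yet
        let mut start = 0;
        while start < filled {
            match str::from_utf8(&buffer[start..filled]) {
                Ok(text) => {
                    out.push_str(text);
                    start = filled;
                }
                Err(error) => {
                    let valid_end = start + error.valid_up_to();
                    let prefix = str::from_utf8(&buffer[start..valid_end])
                        .expect("prefix was already validated");
                    out.push_str(prefix);
                    match error.error_len() {
                        // Invalid bytes: emit U+FFFD and continue after them
                        Some(error_len) => {
                            out.push(char::REPLACEMENT_CHARACTER);
                            start = valid_end + error_len;
                        }
                        // Incomplete tail: carry it over to the next read
                        None => {
                            buffer.copy_within(valid_end..filled, 0);
                            offset = filled - valid_end;
                            break;
                        }
                    }
                }
            }
        }
    }
}
```

Since the carried-over bytes are decoded again together with the next read, the output should not depend on where the chunk boundaries fall.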
Hi! Thanks for opening the issue and sharing your findings on the topic! Would you mind opening a PR with your solution? For the purpose of the training, I would keep it as simple as possible as the main point of the exercise is the HTTP request.
I'll cut it down to make it more palatable, but at least safe for the happy path of valid UTF-8 :)