Comments (3)
I took a quick stab at improving it by replacing https://github.com/BurntSushi/bstr/blob/master/src/unicode/whitespace.rs#L7 with this:
pub fn whitespace_len_fwd(slice: &[u8]) -> usize {
    let mut i = 0;
    while i < slice.len() && slice[i].is_ascii_whitespace() {
        i += 1;
    }
    if i == slice.len() || slice[i].is_ascii() {
        i
    } else {
        WHITESPACE_ANCHORED_FWD.find_at(slice, i).unwrap_or(i)
    }
}
This improved the benchmarks above by about 30%, although it slowed the existing benchmark by around 10%.
I couldn't quite avoid off-by-one errors in the _rev version (and I'm not 100% certain I've avoided them in the _fwd version, tbh; there are probably bugs in the transition between ASCII and Unicode there). I'm not really sure this is an ideal approach anyway, so I figured I'd just report the issue rather than spend more time debugging it.
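For anyone who wants to experiment with the ASCII/Unicode transition, here is a minimal, self-contained sketch of the same fast-path idea. It is not bstr's code: `is_unicode_ascii_ws` and `whitespace_len_fwd_slow` are hypothetical helpers, and the slow path uses the standard library's `char::is_whitespace` as a stand-in for the `WHITESPACE_ANCHORED_FWD` DFA. One thing it handles differently from the snippet above: `u8::is_ascii_whitespace` does not treat vertical tab (`0x0B`) as whitespace, while the Unicode `White_Space` property does, so if the DFA follows that property, the fast path would stop early on a vertical tab.

```rust
/// ASCII bytes that have the Unicode `White_Space` property:
/// `\t`, `\n`, vertical tab (0x0B), form feed, `\r`, and space.
/// (Hypothetical helper; `u8::is_ascii_whitespace` omits 0x0B.)
fn is_unicode_ascii_ws(b: u8) -> bool {
    matches!(b, b'\t'..=b'\r' | b' ')
}

/// Hypothetical stand-in for the DFA-based slow path: counts the bytes of
/// leading Unicode whitespace in the longest valid UTF-8 prefix of `slice`.
/// It validates the whole remainder, so it is for checking correctness,
/// not for benchmarking.
fn whitespace_len_fwd_slow(slice: &[u8]) -> usize {
    let valid = match std::str::from_utf8(slice) {
        Ok(s) => s,
        // Keep only the prefix that decoded cleanly.
        Err(e) => std::str::from_utf8(&slice[..e.valid_up_to()]).unwrap_or(""),
    };
    valid
        .chars()
        .take_while(|c| c.is_whitespace())
        .map(char::len_utf8)
        .sum()
}

/// Returns the length in bytes of the whitespace prefix of `slice`.
pub fn whitespace_len_fwd(slice: &[u8]) -> usize {
    // Fast path: skip ASCII whitespace one byte at a time.
    let mut i = 0;
    while i < slice.len() && is_unicode_ascii_ws(slice[i]) {
        i += 1;
    }
    if i == slice.len() || slice[i].is_ascii() {
        // Hit the end or an ASCII non-whitespace byte: done.
        i
    } else {
        // A non-ASCII byte follows: hand the rest to the slow path, which
        // also covers any ASCII whitespace after a Unicode space.
        i + whitespace_len_fwd_slow(&slice[i..])
    }
}
```

Because the slow path returns a length relative to `&slice[i..]`, the result is added to `i`, which is the kind of offset bookkeeping where an off-by-one can slip in when adapting this to the `_rev` direction.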
from bstr.
@thomcc Thanks for diving into this! I don't quite have the bandwidth to look at it right now. I will at least do it whenever I get back around to releasing 1.0, in order to minimize context switching. If you do want to submit a PR with your current work, then that sounds good to me given the 10% loss but 30% gain. But more broadly, I certainly agree that optimizing for the ASCII case makes sense. I'd be happy to do that even if it makes the fully general Unicode case a good deal worse.
Just to reiterate my previous comment: I'm generally okay with accepting a small loss for non-ASCII input in exchange for making the ASCII case much faster.
I'm not going to get to this for 1.0 though, which is okay, since this is an internal detail.