fefit / visdom
A library with a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML obfuscation.
License: MIT License
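While testing, I found that there doesn't seem to be an interface for copying an element the way jQuery does. The current cloned is not a deep copy: operations on the returned value still affect the original element.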
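The behavior described above is what one would expect if an element handle is a reference-counted pointer to shared node data, so that cloning the handle copies the pointer and not the node. Whether visdom stores nodes this way is an assumption. The following is a minimal standalone sketch of the distinction using only the standard library; the `Node` type, the `Handle` alias, and the `shallow_clone`, `deep_clone`, and `describe` helpers are all hypothetical and are not part of visdom's API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a DOM node; not visdom's real type.
#[derive(Clone)]
struct Node {
    tag: String,
    class: String,
}

// A handle that shares one node between all of its clones.
type Handle = Rc<RefCell<Node>>;

// Shallow copy: only the pointer is cloned, the node itself is shared.
fn shallow_clone(handle: &Handle) -> Handle {
    Rc::clone(handle)
}

// Deep copy: the node data is duplicated into a new allocation.
fn deep_clone(handle: &Handle) -> Handle {
    Rc::new(RefCell::new(handle.borrow().clone()))
}

// Render the node as an opening tag, for printing.
fn describe(handle: &Handle) -> String {
    let node = handle.borrow();
    format!("<{} class=\"{}\">", node.tag, node.class)
}

fn main() {
    let original: Handle = Rc::new(RefCell::new(Node {
        tag: "div".to_string(),
        class: "a".to_string(),
    }));

    // Editing a shallow clone is visible through the original handle.
    let shallow = shallow_clone(&original);
    shallow.borrow_mut().class = "b".to_string();
    println!("after shallow edit: {}", describe(&original));

    // Editing a deep clone leaves the original untouched.
    let deep = deep_clone(&original);
    deep.borrow_mut().class = "c".to_string();
    println!("after deep edit:    {}", describe(&original));
    println!("deep copy:          {}", describe(&deep));
}
```

A jQuery-style clone corresponds to `deep_clone` here: the copy gets its own node data, so later mutations do not leak back to the source element.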
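A very nice Rust library.
While using it, I ran into a problem: when iterating over all nodes, I would like to get each node's name. From the source code there is a tag_name(), but I could not find a callable interface for it.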
let html = r#"
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>.eq</title>
</head>
<body>
<ul id="menu">
<li class="item-1">item-1</li>
<li class="item-2">item-2</li>
<li class="item-3">item-3</li>
</ul>
</body>
</html>
"#;
let root = Vis::load(html)?;
root.each(|_index, ele| {
    println!("tag:{}", ele.tag_name()); // how do I get tag_name?
    true
});
Hello, is there a way to navigate sideways with the find method?
use std::ops::Index;
use visdom::{Vis, html};
use visdom::types::{BoxDynError, Elements};
fn main() -> Result<(), BoxDynError> {
    let html = r##"
        <div class='main'>
            <p>abc</p>
            <p>def</p>
        </div>
        <div class='main'>
            <p>ghi</p>
            <p>jkl</p>
        </div>
    "##;
    let doc = Vis::load(html)?;
    let div = doc.find("div.main");
    let div_length = div.length();
    div.each(|index, ele| {
        if index < div_length {
            let p = ele.find("p");
            return true;
        }
        false
    });
    Ok(())
}
error[E0599]: the method `find` exists for mutable reference `&mut Box<dyn visdom::mesdoc::interface::element::IElementTrait>`, but its trait bounds were not satisfied
Inside div.each, ele.find("p") is not working. Also, why does .each need an index?
use visdom::{types::BoxDynError, Vis};
fn main() -> Result<(), BoxDynError> {
    const HTML: &str = r#"<div id="contain"><p class="item"></p></div>"#;
    let mut v = Vis::load(HTML)?;
    v.set_attr("class", Some("ppp"));
    println!("{}", v.outer_html());
    Ok(())
}
Compared with jQuery:
let d = $(`<div id="contain"><p class="item"></p></div>`);
d.addClass("ppp");
console.log(d[0].outerHTML)
With visdom, the element has to be selected manually, via v.find("#contain").set_attr("class", Some("ppp")); before the attribute is set successfully.
Inside a loop, I call let doc = Vis::load(&text), then doc.find(), and afterwards doc.remove() and text.clear(). Memory usage still keeps growing at a rate of 1~2M.
let html = r##"
<!doctype html>
<html>
<body>
<select>
<option value="1">1</option>
<option value="2" selected="true">2</option>
<option value="3">3</option>
</select>
</body>
</html>
"##;
let doc = Vis::load(html)?;
let select = doc.find("select");
println!("select value is {:?}", select.text());
// select value is "\n 1\n \n 2\n \n 3\n \n "
How do I get the selected value of the select? With the code above I expect to get "2".
My Cargo.toml:
visdom = { version = "0.5.8", features = ["insertion", "full"] }
The following function does not work:
fn get_images(root: &mut Elements, no: &str) {
    root.find("img").for_each(|idx, img| {
        if let Some(src) = img.get_attribute("data-src") {
            let src = src.to_string();
            let re = Regex::new("wx_fmt=([^&]*)").unwrap();
            if let Some(cap) = re.captures(&src) {
                if let Some(fmt) = cap.get(1) {
                    let ext = fmt.as_str();
                    let fname = format!("{no}-{idx}.{ext}");
                    let new_img = format!("");
                    // let new_img = Vis::load(new_img).unwrap();
                    // println!("{}", new_img.outer_html());
                    // let it = new_img.get_ref().iter().next().unwrap();
                    // println!("{}", it.as_ref().outer_html());
                    img.set_html(&new_img);
                    // img.as_mut().replace_with(&it);
                    // img.replace_with(it);
                    println!("1. {}", img.outer_html());
                    println!("2. {}", img.html());
                }
            }
        }
        true
    });
    ()
}
Is the opposite of has() possible already? I tried not(), but that seems to check the root element rather than its children.
Hi, I wanted to know if this supports XPath selectors like "//h1[contains(text(),'Search Results')]/following-sibling::div[1]/div"?
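I googled for several days and looked at html5ever, scraper, lol_html and the like. Their usage is very complicated.
C# has its own Document that can be used directly, but in Rust I could not find an equivalent, until I finally discovered the author's project on Jianshu.
It is a really great project. I hope it gets more stars, and I hope the author keeps maintaining it (many Rust projects are no longer maintained, sadly).
Finally, my sincere thanks!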
Hello, I'd like to get all elements together with their HTML tags and attributes. The currently available methods are .html() and .outer_html(), but they only target the first element, while .text() returns all elements as plain text. Is there a way to get the HTML of every element?
Hi there,
the APIs of htmlentity, and therefore of rphtml, changed. This leads to compile errors. From a fresh checkout:
error[E0432]: unresolved imports `rphtml::entity::encode_chars`, `rphtml::entity::EntitySet`
--> src/lib.rs:32:19
|
32 | entity::{encode, encode_chars, EncodeType, EntitySet},
| ^^^^^^^^^^^^ ^^^^^^^^^ no `EntitySet` in `entity`
| |
| no `encode_chars` in `entity`
|
help: a similar name exists in the module
|
32 | entity::{encode, encode_char, EncodeType, EntitySet},
| ~~~~~~~~~~~
help: a similar name exists in the module
|
32 | entity::{encode, encode_chars, EncodeType, Entity},
| ~~~~~~
error[E0308]: arguments to this function are incorrect
--> src/lib.rs:250:21
|
250 | let content = encode(content, EntitySet::Html, EncodeType::NamedOrDecimal);
| ^^^^^^ ------- -------------------------- expected `&CharacterSet`, found enum `EncodeType`
| |
| expected slice `[u8]`, found `str`
|
= note: expected reference `&[u8]`
found reference `&str`
note: function defined here
--> /Users/ckruse/.cargo/registry/src/github.com-1ecc6299db9ec823/htmlentity-1.3.1/src/entity.rs:1126:8
|
1126 | pub fn encode<'a>(
| ^^^^^^
error[E0308]: mismatched types
--> src/lib.rs:251:50
|
251 | let mut text_node = Node::create_text_node(content, None);
| ---------------------- ^^^^^^^ expected struct `Vec`, found struct `EncodedData`
| |
| arguments to this function are incorrect
|
= note: expected struct `Vec<char>`
found struct `EncodedData<'_>`
note: associated function defined here
--> /Users/ckruse/.cargo/registry/src/github.com-1ecc6299db9ec823/rphtml-0.5.9/src/parser.rs:511:9
|
511 | pub fn create_text_node(content: Vec<char>, code_at: Option<usize>) -> Self {
| ^^^^^^^^^^^^^^^^
Some errors have detailed explanations: E0308, E0432.
For more information about an error, try `rustc --explain E0308`.
error: could not compile `visdom` due to 3 previous errors
Can you please adopt the changes and release a new version?
Best regards,
CK
Hello, thanks for the library
I'm wondering how to remove class="" attributes. I want to remove anything that is class-related, but I can't know the class names in advance.
I've tried this, but it doesn't work:
html.remove_attr("style");
html.remove_attr("class");
# mesdoc = "0.1.11"
rphtml = "0.3.6"
mesdoc = { path = "../mesdoc" }
# rphtml = { path = "../rphtml"}
Please fix the path dependency mesdoc = { path = "../mesdoc" }.
Could Result<Elements, Box<dyn Error>> be changed to Result<Elements, Box<dyn Error + Send>>, so that it is compatible with anyhow?
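Until the return type changes, a caller can bridge the gap at the call site. The sketch below uses only the standard library. `load_like` is a hypothetical stand-in for any function that returns `Box<dyn Error>`, and `ParseFailure`, `to_send_sync`, and `load_send_sync` are illustrative names; none of them is part of visdom. As far as I know, anyhow's conversions expect errors that are `Send + Sync + 'static`, so `Send` alone may not be sufficient, and the sketch therefore targets `Send + Sync`. The conversion keeps only the message text and drops the original error's concrete type.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type standing in for a parse failure.
#[derive(Debug)]
struct ParseFailure(String);

impl fmt::Display for ParseFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse failure: {}", self.0)
    }
}

impl Error for ParseFailure {}

// Hypothetical stand-in for an API that returns `Box<dyn Error>`.
fn load_like(input: &str) -> Result<usize, Box<dyn Error>> {
    if input.is_empty() {
        return Err(Box::new(ParseFailure("empty input".to_string())));
    }
    Ok(input.len())
}

// Flatten a boxed error into a `Send + Sync` one by keeping its message.
fn to_send_sync(err: Box<dyn Error>) -> Box<dyn Error + Send + Sync> {
    err.to_string().into()
}

// Wrapper that exposes a thread-safe error type to callers.
fn load_send_sync(input: &str) -> Result<usize, Box<dyn Error + Send + Sync>> {
    load_like(input).map_err(to_send_sync)
}

fn main() {
    match load_send_sync("") {
        Ok(len) => println!("loaded {len} bytes"),
        Err(err) => println!("error: {err}"),
    }
}
```

The trade-off is that downcasting to the original error type is no longer possible after the conversion, which is why changing the library's return type would be the cleaner fix.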
The error is as follows:
error[E0432]: unresolved imports `htmlentity::entity::encode_chars`, `htmlentity::entity::EntitySet`
|
6 | use htmlentity::entity::{decode_chars_to, encode_chars, EncodeType, Entity, EntitySet};
| ^^^^^^^^^^^^ ^^^^^^^^^ no `EntitySet` in `entity`
| |
| no `encode_chars` in `entity`
|
help: a similar name exists in the module
|
6 | use htmlentity::entity::{decode_chars_to, encode_char, EncodeType, Entity, EntitySet};
| ~~~~~~~~~~~
help: a similar name exists in the module
|
6 | use htmlentity::entity::{decode_chars_to, encode_chars, EncodeType, Entity, Entity};
|
Hi! How should I remove a node that matches some condition? And how do I add a node?
let mut img_list = document.find("img[src]");
img_list.for_each(|_index, ele| {
    // How should I remove the element?
});
let svg = Vis::load("<svg></svg>").unwrap();
// How should I append this svg to the document?
use visdom::Vis;
fn main() -> anyhow::Result<()> {
    let html = r#"
        <div>
            <p>Visdom is awesome</p>
            <p>Visdom 很好用</p>
        </div>
    "#;
    let doc = Vis::load(html).unwrap();
    let elements = doc.find("p:contains('Visdom')"); // work
    println!("elements : {}", elements.length());
    let elements = doc.find("p:contains('好用')"); // panic
    println!("elements : {}", elements.length());
    println!("done");
    Ok(())
}
elements : 2
thread 'main' panicked at 'range end index 8 out of range for slice of length 5', library/core/src/slice/index.rs:73:5
stack backtrace:
0: rust_begin_unwind
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
2: core::slice::index::slice_end_index_len_fail_rt
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:73:5
3: core::ops::function::FnOnce::call_once
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
4: core::intrinsics::const_eval_select
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/intrinsics.rs:2372:5
5: core::slice::index::slice_end_index_len_fail
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:67:9
6: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:304:13
7: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::index
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:351:9
8: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:18:9
9: <visdom::mesdoc::selector::pattern::RegExp as visdom::mesdoc::selector::pattern::Pattern>::matched
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:357:17
10: visdom::mesdoc::selector::pattern::exec
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:445:26
11: visdom::mesdoc::selector::rule::Rule::exec_queues
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:283:54
12: visdom::mesdoc::selector::rule::Rule::exec
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:276:3
13: visdom::mesdoc::selector::Selector::from_str
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/mod.rs:157:51
14: visdom::mesdoc::interface::elements::Elements::find
at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/interface/elements.rs:1098:11
15: repl::main
at ./src/bin/repl.rs:17:20
16: core::ops::function::FnOnce::call_once
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
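The panic message ("range end index 8 out of range for slice of length 5") is consistent with an index computed in one unit (UTF-8 bytes) being applied to a slice measured in another (`char`s). This diagnosis is a guess and has not been confirmed against visdom's source. The standalone sketch below only shows how the two lengths diverge for the text in this report and how `get` avoids the panic that direct slicing would cause; `safe_prefix` is a hypothetical helper, and the choice of `'好用'` and `'好用')` as the relevant substrings is an assumption.

```rust
// Return the first `end` chars of `chars`, or None if `end` is out of range.
// Hypothetical helper: `get` returns None where `&chars[..end]` would panic.
fn safe_prefix(chars: &[char], end: usize) -> Option<&[char]> {
    chars.get(..end)
}

fn main() {
    // The quoted argument, and the selector text that follows ":contains(".
    let quoted = "'好用'";
    let rest = "'好用')";

    // Length in UTF-8 bytes versus length in chars.
    println!("quoted: {} bytes, {} chars", quoted.len(), quoted.chars().count());
    println!("rest:   {} bytes, {} chars", rest.len(), rest.chars().count());

    let rest_chars: Vec<char> = rest.chars().collect();

    // Using the byte length of the match as a char index is out of range.
    println!("byte-based end: {:?}", safe_prefix(&rest_chars, quoted.len()));

    // Using the char count of the match stays in range.
    let end = quoted.chars().count();
    println!("char-based end: {:?}", safe_prefix(&rest_chars, end));
}
```

With ASCII-only input such as `'Visdom'` the byte and char counts coincide, which would explain why the first selector in the report works and only the one containing Chinese characters panics.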