fb55 / domutils
Utilities for working with htmlparser2's DOM
Home Page: https://domutils.js.org
License: BSD 2-Clause "Simplified" License
export function replaceElement(elem: Node, replacement: Node): void {
const prev = (replacement.prev = elem.prev);
if (prev) {
prev.next = replacement;
}
const next = (replacement.next = elem.next);
if (next) {
next.prev = replacement;
}
const parent = (replacement.parent = elem.parent);
if (parent) {
const childs = parent.children;
childs[childs.lastIndexOf(elem)] = replacement;
}
}
replaceElement forgets to set elem.parent = null.
This is very important if the next step is an appendChild call, which runs removeElement internally.
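A minimal sketch of the change being asked for, not the library's actual code. `SketchNode`, `makeNode` and `replaceAndDetach` are hypothetical names used only for illustration. Clearing `prev` and `next` as well as `parent` is my own assumption: it keeps a later removal of the old node from relinking the replacement's siblings.

```typescript
// Minimal stand-in for a DOM node; the real types live in domhandler.
interface SketchNode {
    name: string;
    prev: SketchNode | null;
    next: SketchNode | null;
    parent: SketchNode | null;
    children: SketchNode[];
}

function makeNode(name: string): SketchNode {
    return { name, prev: null, next: null, parent: null, children: [] };
}

// Hypothetical variant of replaceElement that also detaches the old node.
function replaceAndDetach(elem: SketchNode, replacement: SketchNode): void {
    const prev = (replacement.prev = elem.prev);
    if (prev) prev.next = replacement;

    const next = (replacement.next = elem.next);
    if (next) next.prev = replacement;

    const parent = (replacement.parent = elem.parent);
    if (parent) {
        const index = parent.children.lastIndexOf(elem);
        if (index >= 0) parent.children[index] = replacement;
    }

    // Clear the stale links so a later removal of elem touches nothing.
    elem.parent = null;
    elem.prev = null;
    elem.next = null;
}
```

With the stale links cleared, a removeElement-style call on the old node finds no parent and no siblings, so it leaves the tree alone.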
export function removeElement(elem: Node): void {
if (elem.prev) elem.prev.next = elem.next;
if (elem.next) elem.next.prev = elem.prev;
if (elem.parent) {
const childs = elem.parent.children;
childs.splice(childs.lastIndexOf(elem), 1);
}
}
If childs.lastIndexOf(elem) is -1, splice(-1, 1) removes the last child, so an unrelated element gets deleted.
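A possible guard, sketched under the assumption that a node can claim a parent without actually being in that parent's children array. `LinkedNode` and `removeIfPresent` are hypothetical names, not part of domutils.

```typescript
// Minimal stand-in for a DOM node; the real types live in domhandler.
interface LinkedNode {
    prev: LinkedNode | null;
    next: LinkedNode | null;
    parent: LinkedNode | null;
    children: LinkedNode[];
}

// Hypothetical variant of removeElement that checks the index before splicing.
function removeIfPresent(elem: LinkedNode): void {
    if (elem.prev) elem.prev.next = elem.next;
    if (elem.next) elem.next.prev = elem.prev;

    if (elem.parent) {
        const siblings = elem.parent.children;
        const index = siblings.lastIndexOf(elem);
        // splice(-1, 1) would drop the last sibling, so only splice on a real hit.
        if (index >= 0) siblings.splice(index, 1);
    }

    elem.prev = null;
    elem.next = null;
    elem.parent = null;
}
```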
To see what happens to your code in Node.js 10, Greenkeeper has created a branch with the following changes:
.travis.yml
If you’re interested in upgrading this repo to Node.js 10, you can open a PR with these changes. Please note that this issue is just intended as a friendly reminder and the PR as a possible starting point for getting your code running on Node.js 10.
Greenkeeper has checked the engines key in any package.json file, the .nvmrc file, and the .travis.yml file, if present.
engines was only updated if it defined a single version, not a range.
.nvmrc was updated to Node.js 10.
.travis.yml was only changed if there was a root-level node_js that didn’t already include Node.js 10, such as node or lts/*. In this case, the new version was appended to the list. We didn’t touch job or matrix configurations because these tend to be quite specific and complex, and it’s difficult to infer what the intentions were.
For many simpler .travis.yml configurations, this PR should suffice as-is, but depending on what you’re doing it may require additional work or may not be applicable at all. We’re also aware that you may have good reasons to not update to Node.js 10, which is why this was sent as an issue and not a pull request. Feel free to delete it without comment, I’m a humble robot and won’t feel rejected 🤖
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
On https://developer.apple.com/documentation/macos-release-notes you'll see a div with <div data-v-21735da6="" data-v-21125d33="" id="app" class="core-app">
If I do the following, I don't see attribs with class:
const macosReleaseNotes = await axios.get("https://developer.apple.com/documentation/macos-release-notes")
.then(response => {
if (response.status === 200) return parseDocument(response.data)
})
.catch(err => {
throw new Error(err)
})
const elementWithId = domutils.findAll(el => el.name === 'div', macosReleaseNotes.children, true)
console.log(elementWithId[1].attribs);
The response is { id: 'app' }.
How do I find the class?
As a follow-up to 9509808, including a Tidelift ID.
Details: https://help.github.com/en/articles/displaying-a-sponsor-button-in-your-repository.
Is there any API for removing/replacing an attribute? If not how would I do it with domutils?
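One way this could be done without a dedicated helper, assuming an element exposes its attributes as a plain `attribs` object (as the `console.log(... .attribs)` output earlier on this page suggests). `ElementLike`, `removeAttribute` and `setAttribute` are hypothetical names for illustration, not domutils functions.

```typescript
// Stand-in for an element node: attributes are a plain string-to-string object.
interface ElementLike {
    name: string;
    attribs: Record<string, string>;
}

// Remove an attribute by deleting its key from the attribs object.
function removeAttribute(elem: ElementLike, name: string): void {
    delete elem.attribs[name];
}

// Add or replace an attribute by assigning to its key.
function setAttribute(elem: ElementLike, name: string, value: string): void {
    elem.attribs[name] = value;
}
```

Usage would look like `removeAttribute(el, "class")` or `setAttribute(el, "id", "main")` on a node returned by a query such as findAll.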
First, let me apologize for not being confident in how to ask my question in a way that is helpful. I know open source maintainers work is difficult and under compensated. Thank you for all you do @fb55!
Simply put, I am seeing this error while working through instructions for running detox in a React Native project via Expo.
Unable to resolve "domhandler" from "node_modules/domutils/lib/index.js"
I've taken the following steps:
% cat node_modules/domhandler/package.json |grep main
"main": "lib/index.js",
% head node_modules/domhandler/lib/index.js
"use strict";
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
var desc = Object.getOwnPropertyDescriptor(m, k);
if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
desc = { enumerable: true, get: function() { return m[k]; } };
}
Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
Past this, I'm not sure how to continue debugging. Any help you can offer would be very welcome. Thank you!
const dom: Domhandler.Document = htmlparser2.parseDocument(wxmlText, {
xmlMode: true,
});
When I use domUtils.getElementsByTagName('tagName', dom), I can pass in the dom and get the expected element.
With domUtils.getElementsById('eleId', dom), the TypeScript types allow me to pass in the dom, but I don't get the correct result (it returns null).
I found that the getElementsByTagName source code first performs an 'isTag' check on the passed parameter (dom). The dom fails that check and never reaches the subsequent logic, because the dom's type is 'root'.
This is confusing, and I hope the TypeScript types can be made stricter about which parameters may be passed in.
For now I use domUtils.getElementsById('eleId', dom.children) instead.
This was kind of hard to reproduce but I'll try my best to explain. I have this HTML:
var html = '<table>\n<tr>\n<td>\nHi\n</td>\n</tr>\n</table>';
I want to parse it using htmlparser2, handle it with DomHandler, and do some manipulation with DomUtils. Here's the boilerplate stuff:
var htmlparser = require('htmlparser2');
var handler = new htmlparser.DomHandler(function (err, dom) {
if (err) {
throw err;
}
console.log(render(dom));
});
var parser = new htmlparser.Parser(handler);
parser.write(html);
parser.done();
render takes a list of nodes and returns their string representation. It is called recursively when an element contains children (see bottom, under renderTag):
function render(nodes) {
var html = '';
nodes.forEach(function (node) {
if (node.type == 'text') {
html += renderText(node);
return;
}
if (node.type == 'comment') {
html += renderComment(node);
return;
}
html += renderTag(node);
});
return html;
}
function renderText(node) {
return node.data;
}
function renderComment(node) {
return '<!--' + node.data + '-->';
}
function renderTag(node) {
var openTag = '<' + node.name + '>',
closeTag = '</' + node.name + '>';
if (!node.children.length) {
return openTag + closeTag;
}
return openTag + render(node.children) + closeTag;
}
So far, this works as expected. But a problem occurs when I add a call to DomUtils.removeElement inside of render:
function render(nodes) {
var html = '';
//debugging
console.log(nodes);
nodes = nodes.filter(function (node, index) {
if (node.type == 'text' && !node.data.trim()) {
htmlparser.DomUtils.removeElement(node);
return false
}
return true;
});
//debugging
console.log(nodes);
nodes.forEach(function (node) {
....
The first time render is called, nodes contains only one element: the table tag. The filtering has no effect, so the console logs the same data twice.
The second time render is called, nodes contains three elements:
[ { data: '\n',
type: 'text',
next:
{ type: 'tag',
name: 'tr',
attribs: {},
children: [Object],
next: [Object],
prev: [Circular],
parent: [Object] },
prev: null,
parent:
{ type: 'tag',
name: 'table',
attribs: {},
children: [Circular],
next: null,
prev: null,
parent: null } },
{ type: 'tag',
name: 'tr',
attribs: {},
children: [ [Object], [Object], [Object] ],
next:
{ data: '\n',
type: 'text',
next: null,
prev: [Circular],
parent: [Object] },
prev:
{ data: '\n',
type: 'text',
next: [Circular],
prev: null,
parent: [Object] },
parent:
{ type: 'tag',
name: 'table',
attribs: {},
children: [Circular],
next: null,
prev: null,
parent: null } },
{ data: '\n',
type: 'text',
next: null,
prev:
{ type: 'tag',
name: 'tr',
attribs: {},
children: [Object],
next: [Circular],
prev: [Object],
parent: [Object] },
parent:
{ type: 'tag',
name: 'table',
attribs: {},
children: [Circular],
next: null,
prev: null,
parent: null } } ]
As you can see, the first element is an "empty" string. I want to remove this node from the dom completely. But doing so also removes the other elements in the array, including the tr tag. The console logs an empty array after the filtering.
I don't know if this is a bug with the DomUtils library, or if this is some kind of limitation with JavaScript, or if I'm missing something. But it seems like I should be able to do this.
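One likely explanation, offered as an assumption rather than a confirmed diagnosis: on the recursive call, nodes is the same array object as the parent's children, and removeElement splices that array while filter is still walking it. Removing index 0 shifts the tr tag into a slot filter has already passed, so it is never visited. The sketch below iterates over a copy instead. `MiniNode`, `detach` and `dropBlankText` are hypothetical stand-ins, not DomUtils functions.

```typescript
// Minimal stand-in for a parsed node; `data` holds text content or the tag name.
interface MiniNode {
    type: "text" | "tag";
    data: string;
    parent: { children: MiniNode[] } | null;
}

// Hypothetical stand-in for removeElement: splices the node out of parent.children.
function detach(node: MiniNode): void {
    if (!node.parent) return;
    const siblings = node.parent.children;
    const index = siblings.lastIndexOf(node);
    if (index >= 0) siblings.splice(index, 1);
    node.parent = null;
}

function dropBlankText(nodes: MiniNode[]): MiniNode[] {
    // Iterate over a copy so that splicing parent.children cannot shift the iteration.
    return nodes.slice().filter((node) => {
        if (node.type === "text" && node.data.trim() === "") {
            detach(node);
            return false;
        }
        return true;
    });
}
```

Calling `dropBlankText(table.children)` on the three-node list from the log keeps the tr tag and removes only the two whitespace nodes.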
Hi,
I don't know if this is the right place to post this, but I am suddenly running into a strange issue when I try to start my React app, which was working fine before. The only error I am getting is the following:
UnhandledPromiseRejectionWarning: TypeError: ext[key].bind is not a function
clientv1_1 | at /home/node/node_modules/renderkid/node_modules/domutils/index.js:12:28
clientv1_1 | at Array.forEach ()
clientv1_1 | at /home/node/node_modules/renderkid/node_modules/domutils/index.js:11:19
I tried googling this error but not able to find any solution. Trying my luck here to see if anyone might have an idea. Thanks in advance!
I appreciate how DomHandler was abstracted from HtmlParser2, but I think that the code in this library belongs in DomHandler because the two are so tightly coupled. Libraries that use DomHandler will likely be interested in using DomUtils, but synchronizing the two libraries can be a hassle. Also, major/minor releases in one library will necessitate a similar release in the other, so the process of publishing is also non-optimal.
That said, I recognize that the current organization was created intentionally, so I expect there are good reasons for this that I'm missing.
It will be very helpful. Thanks :)
Seems like findAll does the same thing as find called with recurse=true and limit=Infinity, but it has a separate implementation. Both are depth-first. Am I missing something, and are they supposed to do different things?
It would be great to have a README that enumerated the available APIs, had examples, and described the options.
Example:
var DomUtils = require('domutils')
var dom = {
type: "tag",
name: "div",
next: null,
prev: null,
parent: null,
attribs: {
'foo': 'bar with "quotes"'
}
}
console.log(DomUtils.getOuterHTML(dom))
Outputs: <div foo="bar with "quotes""></div>
Which is invalid HTML (and a code injection risk).
Here's a more complete example of the problem when used in conjunction with htmlparser2:
var DomUtils = require('domutils')
var htmlparser2 = require('htmlparser2')
var handler = new htmlparser2.DomHandler(function(err, dom) {
console.log(DomUtils.getOuterHTML(dom))
})
var parser = new htmlparser2.Parser(handler)
parser.write(`
<!DOCTYPE html>
<title>Example</title>
<p style='font-family: "Times New Roman"'>Test</p>
`)
parser.done()
Which outputs:
<!DOCTYPE html>
<title>Example</title>
<p style="font-family: "Times New Roman"">Test</p>
According to the HTML5 specification – and the W3 validator – this is valid input code, but when processed by these two tools with their default options, malformed HTML is produced.
Workaround:
getOuterHTML comes from cheeriojs/dom-serializer and allows options to be passed in. {decodeEntities: true} will encode the quotes. This works for my case, but if this is a necessary step to produce valid HTML, it seems like it should be the default behaviour. At the very least, the default should encode whatever entities are necessary to avoid producing dangerous and malformed output.
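For illustration, here is a minimal sketch of the kind of escaping being asked for. It is not dom-serializer's implementation. `escapeAttribute` and `renderOpenTag` are hypothetical helpers, and the sketch assumes attribute values are stored in decoded form, so that `&` and `"` both need encoding on output.

```typescript
// Encode the two characters that can break out of a double-quoted attribute value.
function escapeAttribute(value: string): string {
    return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Render an opening tag with every attribute value escaped.
function renderOpenTag(name: string, attribs: Record<string, string>): string {
    const parts = Object.keys(attribs).map(
        (key) => `${key}="${escapeAttribute(attribs[key] ?? "")}"`
    );
    return parts.length > 0 ? `<${name} ${parts.join(" ")}>` : `<${name}>`;
}

// The paragraph from the example above, with its quoted font name.
console.log(renderOpenTag("p", { style: 'font-family: "Times New Roman"' }));
// prints: <p style="font-family: &quot;Times New Roman&quot;">
```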
As a result, after appending a child taken from another parent, the next child is not appended properly.
Based on the function name, we should be able to append child to anything that can take children, right?
Due to the current typing
function appendChild(elem: Element, child: ChildNode): void
I get a type error with the following code
const doc = new Document([]);
const newHtmlElement = new Element('html', {}, undefined, ElementType.Tag);
appendChild(doc, newHtmlElement);
The code runs fine; however, TypeScript complains since Document is not an Element.
I think the typing is incorrect and should be
function appendChild(elem: NodeWithChildren, child: ChildNode): void
but correct me if I'm misunderstanding
Hey,
I have a problem on Digital Ocean App Platform:
15:17:25 at /workspace/node_modules/css-select/node_modules/domutils/index.js:12:28
It is while compiling javascript using gatsby.
Any idea as it seems to come from this package?
Tobias
HTMLParser2 returns a dom object. How can I use domUtils to traverse this object?
Thanks
Hi! Using domUtils.replaceElement(element, ???), how can I create a new element to replace with? I'm looking for something like domUtils.createElement('div', { id: 'some-id' }) or equivalent.
Thanks!
Take the following code:
var htmlparser = require('htmlparser2');
var handler = new htmlparser.DomHandler(function (err, dom) {
console.log(htmlparser.DomUtils.getOuterHTML(dom[0]));
});
var parser = new htmlparser.Parser(handler);
parser.done('<br />');
The console will output "<br></br>", which is interpreted by Chrome (and I presume other browsers) as 2 line breaks. In other words, Chrome will put in 2 <br> elements. You would think that Chrome wouldn't behave this way, but I guess it's technically doing the right thing. Anyway, DomUtils shouldn't be rendering a closing tag for self-closing elements.
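A rough sketch of the requested behaviour, not the DomUtils implementation: skip the closing tag for HTML void elements. `RenderNode`, `VOID_ELEMENTS` and `renderTag` are hypothetical names, and attributes are left out to keep the example short.

```typescript
// Elements that HTML defines as void: they never take a closing tag.
const VOID_ELEMENTS: readonly string[] = [
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
];

// Minimal stand-in for an element node.
interface RenderNode {
    name: string;
    children: RenderNode[];
}

function renderTag(node: RenderNode): string {
    const open = `<${node.name}>`;
    // Void elements are emitted as a lone opening tag.
    if (VOID_ELEMENTS.indexOf(node.name) !== -1) return open;
    const inner = node.children.map((child) => renderTag(child)).join("");
    return `${open}${inner}</${node.name}>`;
}
```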
Is it your intention to allow people to use this code too? If so, would you indicate so with an open source license (e.g. BSD), so that we know we are using your code with permission?
Thanks.
Gil
Branch | Build failing 🚨 |
---|---|
Dependency | jshint |
Current Version | 2.9.4 |
Type | devDependency |
This version is covered by your current version range and after updating it in your project the build failed.
As jshint is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.
I recommend you give this issue a high priority. I’m sure you can resolve this 💪
if syntax (#3103) (8c6ac87)
for-in/of head LHS as asnmt target (da52ad9)
The new version differs by 42 commits.
d3d84ae v2.9.5
481cdca Merge branch 'W083'
ad7df61 [[TEST]] Update tests to reflect new W083 language
5967e61 [[TEST]] Less ambiguous message for W083
cc215bd [[CHORE]] Update Test262
e6c89f0 [[CHORE]] Fix bug in test script
5b957f6 Merge pull request #3126 from jugglinmike/for-lhs
da52ad9 [[FIX]] Parse for-in/of head LHS as asnmt target
b075919 [[FEAT]] Add MediaRecorder to vars.js
24b8c97 Merge pull request #3066 from jugglinmike/asi-dowhile-es6-updated
29c359f Merge pull request #3064 from jugglinmike/improve-yield-messages
c083866 [[FIX]] Avoid crash when peeking past end of prog
5f0f789 [[FIX]] Close synthetic scope for labeled blocks
70f9ca2 Merge remote-tracking branch 'jugglinmike/2358-improve-unused-desc'
bd36953 [[FIX]] Account for hoisting of importing bindings
There are 42 commits in total.
See the full diff
There is a collection of frequently asked questions and of course you may always ask my humans.
Your Greenkeeper Bot 🌴
Your license page states BSD 2 Clause, the license file is BSD 3 Clause.
It appears that this issue goes back to the first release.
When using domutils v3 in a React Native project, you'll get a warning when building:
warn Package domutils has been ignored because it contains invalid configuration. Reason: Package subpath './package.json' is not defined by "exports" in $PATH/node_modules/domutils/package.json
This is something that has been reported for multiple packages in React Native here:
react-native-community/cli#1168
Each individual package maintainer has the ability to work around the issue, however, by simply exporting package.json. I'll submit a PR to do this and leave it up to you @fb55 if you'd like to merge.
Some examples or usage info would be nice.
This would allow code inference in IDEs such as WebStorm and VS Code.
See: fb55/domhandler#132
I ran into multiple problems when using DomUtils with TypeScript:
DomUtils.findAll(el => el.name === 'div' && el.parent?.name === 'body', dom.children)
TS2339: Property 'name' does not exist on type 'NodeWithChildren'.
DomUtils.getElements({name: 'div'}, dom.children).map(element => element.children)
TS2339: Property 'children' does not exist on type 'Node'.
find(test: (elem: Node) => boolean, nodes: Node[], recurse: boolean, limit: number): Node[]
findAll(test: (elem: Element) => boolean, nodes: Node[]): Element[]
Why is the parameter of test typed as Node for find but as Element for findAll? And why do their return types differ in the same way?
There are other cases like these, but I can't remember them now.
How can this be made clear?
Why does getElements not return Element[]?
When will a method return Element, Node, NodeWithChildren, ...?
Thanks
var dom = htmlparser.parseDOM('<div><span>1</span><span>2</span></div>');
htmlparser.DomUtils.compareDocumentPosition(dom[0].children[0], dom[0].children[0].children[0]);
This code causes an infinite loop. The debugger shows the execution looping here:
while (aParents[idx] === bParents[idx]) {
idx++;
}
With adjacent nodes, it's OK.
Environment: browserified & babelified ({ targets: '> 1%, not IE 11' }) module, web worker, Chrome 78.
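The quoted loop has no upper bound. If the two ancestor lists ever agree at every index, the reads past the end of both arrays yield undefined on each side and the comparison stays true forever. Whether that is what happens in this environment is not established here. The sketch below only shows one way to guarantee termination. `firstDivergence` is a hypothetical helper name.

```typescript
// Return the first index at which the two ancestor chains differ,
// never reading past the end of the shorter chain.
function firstDivergence<T>(aParents: readonly T[], bParents: readonly T[]): number {
    const maxIdx = Math.min(aParents.length, bParents.length);
    let idx = 0;
    while (idx < maxIdx && aParents[idx] === bParents[idx]) {
        idx++;
    }
    return idx;
}
```

When the result equals the length of the shorter chain, one node is an ancestor of the other (or the chains are identical), which the caller can handle as a separate case.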
Hey,
I am trying to look at the source code for 1.5.1, which was used in htmlparser2 and cheerio version 0.22.0, but I cannot find it under the releases. Can you please help me find it?
We are using the cheerio npm package, version ^1.0.0-rc.12, in our application to parse XML content. cheerio internally uses domutils version ^3.0.1.
When we tried to parse specific XML content using cheerio's load() function, load() internally called find() of domutils, where we observed the following exception:
RangeError: Maximum call stack size exceeded
at Object.get [as isTag] (/home/xyz/cheerio-poc/node_modules/domhandler/lib/index.js:6:47)
at Object.get [as isTag] (/home/xyz/cheerio-poc/node_modules/domutils/lib/index.js:27:100)
at /home/xyz/cheerio-poc/node_modules/cheerio-select/lib/index.js:293:60
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:37:13)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
at find (/home/xyz/cheerio-poc/node_modules/domutils/lib/querying.js:43:28)
Attaching sample code for your reference.
cheerio-poc.zip
findAll is recursive, and fails on big pages.
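One way to avoid deep recursion is to walk the tree with an explicit stack. The following is a sketch of that idea under simplified assumptions, not the library's code. `WalkNode` and `findAllIterative` are hypothetical names.

```typescript
// Minimal stand-in for a node with children.
interface WalkNode {
    name: string;
    children: WalkNode[];
}

// Depth-first search in document order without using the call stack.
function findAllIterative(
    test: (node: WalkNode) => boolean,
    roots: WalkNode[]
): WalkNode[] {
    const result: WalkNode[] = [];
    // Push in reverse so that nodes are popped in document order.
    const stack: WalkNode[] = roots.slice().reverse();

    while (stack.length > 0) {
        const node = stack.pop();
        if (!node) break;
        if (test(node)) result.push(node);
        for (let i = node.children.length - 1; i >= 0; i--) {
            const child = node.children[i];
            if (child) stack.push(child);
        }
    }
    return result;
}
```

Because the pending nodes live in an ordinary array, nesting depth is limited by available memory rather than by the engine's call stack size.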
Is this package paid only? Also, it is not clear to me from the documentation how to use it with htmlparser2. Are there any examples that show how to use them together?
Hey @fb55
I noticed that one of the tests for prevElementSibling is testing/using nextElementSibling:
https://github.com/fb55/domutils/blob/master/src/traversal.spec.ts#L80-L85
It doesn't change much as the test will also pass but since I saw this while looking for a bug, I thought I would share this with you 😉