WebReflection Ltd
webreflection / linkedom
A triple-linked lists based DOM implementation.
Home Page: https://webreflection.medium.com/linkedom-a-jsdom-alternative-53dd8f699311
License: ISC License
Hey,
I was porting code like this https://github.com/google/eleventy-high-performance-blog/blob/main/_11ty/apply-csp.js#L54 and this https://github.com/google/eleventy-high-performance-blog/blob/main/_11ty/json-ld.js#L35 and both of those selectors fail. From my debugging it looks like those attributes (csp-hash and type) on script aren't parsed (I logged the outerHTML of the nodes and didn't see the attributes).
Besides that things are working great and the performance is very much appreciated!
Cheers!
After replacing jsdom with linkedom, I ran into errors when documents passed to linkedom were blank strings. I had to wrap the call in try {} and return a default blank document.
I'm not sure how, but JSDOM handled this case. I can no longer check, since it's gone from my system.
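The workaround described above could look roughly like the sketch below. The names safeParse, BLANK_DOCUMENT and fakeParse are hypothetical and not part of linkedom; the wrapper only assumes that it is handed a function with the same shape as parseHTML.

```javascript
// Hypothetical wrapper, not part of linkedom. `parse` is any function with
// the same shape as parseHTML (takes an HTML string, returns a result).
const BLANK_DOCUMENT = '<!DOCTYPE html><html><head></head><body></body></html>';

function safeParse(parse, html) {
  // Treat non-strings and whitespace-only input as a blank document.
  const input = typeof html === 'string' && html.trim() ? html : BLANK_DOCUMENT;
  try {
    return parse(input);
  } catch (error) {
    // Fall back instead of letting one bad source abort a batch run.
    return parse(BLANK_DOCUMENT);
  }
}

// Stand-in parser so the sketch runs without dependencies; with linkedom
// installed, its parseHTML function would be passed instead.
const fakeParse = html => ({ source: html });
console.log(safeParse(fakeParse, '').source === BLANK_DOCUMENT); // true
```

Substituting the blank input before parsing avoids relying on the exception in the common case, while the try/catch still covers other malformed sources.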
This seems like an efficient way to represent the DOM. What real-life use case do you think represents the sweet spot for this lib?
Strange. I am currently trying to swap out the JSDOM API I am using for Linkedom, but I always get undefined for defaultView. I even tried running it from the console:
parseHTML("<html><body>Hello</body></html>").defaultView
and it's still undefined. I don't know if JSDOM is conflicting with it, because it's still being imported in another area of the system.
Is that possible? BTW, I am able to test it from the console because I require it in my Electron app and assign it to the window, so I get access:
In Electron:
var {parseHTML} = require('linkedom');
window.parseHTML=parseHTML;
Several advanced HTML use cases, such as attaching data- attributes for testing purposes, can require empty attribute values to be present. Currently, empty attributes are removed entirely from the string that Linkedom outputs. Here is a test to illustrate the behaviour:
const {document} = parseHTML('<!DOCTYPE html><html id="html" class="live"><!--<hello>--><hello></html>');
document.documentElement.innerHTML = '<div data-amend="">Foo</div>'
// Assertion fails currently (desired behaviour)
assert(document.documentElement.toString(), '<div data-amend="">Foo</div>')
// Assertion passes currently
assert(document.documentElement.toString(), '<div>Foo</div>')
document.documentElement.innerHTML = '<div data-amend>Foo</div>'
// Assertion fails currently (desired behaviour)
assert(document.documentElement.toString(), '<div data-amend="">Foo</div>')
// Assertion passes currently
assert(document.documentElement.toString(), '<div>Foo</div>')
Is this by design? It breaks JavaScript written to run client side in some cases.
For a large variety of use cases it is desirable to have HTML entities in Linkedom output. Currently, outputting an element as a string decodes these entities, so for example an &amp; appears as an & and an &nbsp; appears as a (space).
Here's a simple test to demonstrate the desired behaviour:
const {document} = parseHTML('<!DOCTYPE html><html id="html" class="live"><!--<hello>--><hello></html>');
document.documentElement.innerHTML = '<div>Foo&nbsp;Bar</div>'
// Assertion fails currently (desired behaviour)
assert(document.documentElement.toString(), '<div>Foo&nbsp;Bar</div>')
// Assertion passes currently
assert(document.documentElement.toString(), '<div>Foo Bar</div>')
Hello! 👋 When parsing a document, the doctype is missing properties such as publicId and systemId mentioned here: https://developer.mozilla.org/en-US/docs/Web/API/DocumentType#properties
Running the code below using linkedom produces the following results:
import { parseHTML } from "linkedom";
const { document } = parseHTML('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html></html>');
console.log(document.doctype.name); // => 'html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"'
console.log(document.doctype.publicId); // => undefined
console.log(document.doctype.systemId); // => undefined
However, opening this HTML file in the latest stable version of Firefox...
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html></html>
...and running the code below produces the following results:
console.log(document.doctype.name); // => 'html'
console.log(document.doctype.publicId); // => '-//W3C//DTD HTML 4.01//EN'
console.log(document.doctype.systemId); // => 'http://www.w3.org/TR/html4/strict.dtd'
It looks like DocumentType only contains a name property which is set to the entire doctype string:
https://github.com/WebReflection/linkedom/blob/main/esm/interface/document-type.js#L12
https://github.com/WebReflection/linkedom/blob/main/esm/shared/parse-json.js#L87
https://github.com/WebReflection/linkedom/blob/main/esm/interface/document.js#L144
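As a rough illustration of the split being asked for, the sketch below breaks the text that follows "<!DOCTYPE " into the three DocumentType fields. parseDoctype is a hypothetical helper written for this example; it is not linkedom code and does not cover every legacy doctype form.

```javascript
// Hypothetical helper, not part of linkedom: splits the text that follows
// "<!DOCTYPE " into name, publicId and systemId.
function parseDoctype(raw) {
  const text = raw.trim();
  const name = text.split(/\s+/, 1)[0] || '';
  const rest = text.slice(name.length).trim();
  // Collect quoted identifiers in order of appearance.
  const ids = [...rest.matchAll(/"([^"]*)"|'([^']*)'/g)].map(m => m[1] ?? m[2]);
  let publicId = '';
  let systemId = '';
  if (/^PUBLIC\b/i.test(rest)) {
    [publicId = '', systemId = ''] = ids;
  } else if (/^SYSTEM\b/i.test(rest)) {
    [systemId = ''] = ids;
  }
  return { name, publicId, systemId };
}

const parsed = parseDoctype(
  'html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"'
);
console.log(parsed.name);     // html
console.log(parsed.publicId); // -//W3C//DTD HTML 4.01//EN
console.log(parsed.systemId); // http://www.w3.org/TR/html4/strict.dtd
```

For a plain "html" doctype both identifiers come back as empty strings, which matches what browsers report.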
Many thanks in advance!
Thanks for your extremely rapid attention and released fix for #65. Unfortunately the fix which has been issued has made the situation worse in some aspects and for the time being I have to return to release 0.8.0.
The following code which was functional under 0.8.0 now no longer returns any result for the query under 0.9.1
var linkedom = require("linkedom");
var text = "<!DOCTYPE html>\n<html lang=\"en\"><body><div class=\"test-component-include\"></div></body></html>"
var root = linkedom.parseHTML(text).document.cloneNode(true);
var queried = root.querySelectorAll(".test-component-include");
console.log("queried " + queried.length + " elements");
In addition, part of the original problem reported under #65 is still present: the classList member of the div node in the cloned markup is not populated. Neither is className, which is probably what is responsible for the query failure.
See test case
const { parseHTML } = require("linkedom");
const { document } = parseHTML(`<script>"''"</script>`);
console.assert(
/"''"/.test(document.toString()),
`Quotes inside of script should not be encoded: ${document.toString()}`
);
I can see that there is a placeholder for this since the property is attached but it is always populated with a Set of 0 elements, regardless of whether there is a "class" attribute on the element or not.
Specified in the DOM API here: https://developer.mozilla.org/en-US/docs/Web/API/Element/classList
See attached image showing classList as Set(0) even though there is a class attribute holding the value renderer-test-container
I never got this error with JSDOM, and I feel it's important to file the bug since I am using Linkedom as a drop-in replacement. Today I pulled a site into my tool, which then uses linkedom to manipulate it. This HTML previously worked with jsdom but now breaks in Linkedom. Again, I have no control over the crappy HTML.
ERROR:
Uncaught SyntaxError: Octal escape sequences are not allowed in template strings.
Clicking on the error took me to the place in the HTML source that I am feeding into Linkedom, and it shows this (snippet chopped for brevity):
footer:before,blockquote small:before{content:"\2014 \A0"}
That's just part of the inline CSS in a <style> tag in HEAD.
Any idea?
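One possible explanation, offered as an assumption because the report does not show how the HTML reaches the parser: if the source is pasted or interpolated into a JavaScript template literal, the engine rejects "\2014" as an octal escape before linkedom ever runs. The sketch below shows how String.raw keeps such CSS escapes verbatim in that situation.

```javascript
// In a plain template literal, "\2014" is a SyntaxError (octal escape).
// A String.raw tagged template keeps the backslashes verbatim instead.
const css = String.raw`footer:before,blockquote small:before{content:"\2014 \A0"}`;

// The raw CSS can then be embedded in markup as an ordinary value.
const html = `<style>${css}</style>`;
console.log(html);
// <style>footer:before,blockquote small:before{content:"\2014 \A0"}</style>
```

If the HTML is read from disk or the network as an ordinary string, this error should not arise and the cause would lie elsewhere.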
I have a medium sized app built in React and my test suite is painfully slow. I'm using jest with react testing library. Jest relies on jsdom by default.
I stumbled upon this library and I'd like to see how we can use it in our project.
Is it something possible? If so, how can we do a proof of concept?
Here is the jsdom environment file for jest
I got an error:
Error: :nth-last-child isn't part of CSS3
at Object.compilePseudoSelector (...\node_modules\css-select\lib\pseudo-selectors\index.js:29:15)
at Object.compileGeneralSelector (...\node_modules\css-select\lib\general.js:21:39)
...
with selector like this
document.querySelectorAll('.t01 > tr:nth-last-child(n+2)')
but isn't nth-last-child already supported by CSS3?
My environment:
When using LinkeDOM for testing, I stumbled across this limitation:
input.dispatchEvent(new window.InputEvent("change", { data: "filter" }));
This errors since InputEvent is undefined.
Hi! Today I noticed the window.getComputedStyle() method is missing.
Do you have plans to add it? Thank you.
Hi @WebReflection ,
The how-to-contribute.md suggested creating an issue for further documentation that I think might be useful.
I'm trying to get to grips with the project so I can maybe solve my own problems, let me tell you how I got started and maybe in the process of telling me what I should have done it'll become apparent what to add to the docs.
I want to look at the toString'ing of Element so I found a test file. I couldn't see any sections like in Jest (e.g. it("test thing thing")), so I assumed this assert format is a stylistic thing, but since I'm fixing one very defined problem for now I didn't want to spend time reading about it just yet. I just commented out the whole of test/html/element.js and added in one test for my issue at the top whilst I tried some things out:
const {document} = parseHTML('<!DOCTYPE html><html id="html" class="live"><!--<hello>--><hello></html>');
document.documentElement.innerHTML = '<div>Foo&nbsp;Bar</div>'
assert(document.documentElement.toString(), '<div>Foo&nbsp;Bar</div>')
I usually work in IntelliJ, I tried to run the test from there but it can't make sense of the global.symbolFor stuff:
So I went to the CLI and luckily npm run test -- test/html/element.js, a pretty standard syntax, seemed to work. It would be great to have test commands documented.
I got a lot of output though, a lot of things that I probably don't need when I'm just iterating on one little test, a big table with these headers: File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
So it would be great to know how to not get all this stuff in the output whilst I'm focusing on a small task. Of course I'd come back and run a full test suite afterwards. I appreciate it's probably important a lot of the time, but it's quite daunting to a newbie.
My test output is present about halfway through the big chunk of output from this command, so I can work with it for now. It would also be helpful to know how you might add console.log type statements whilst debugging, since they don't seem to come out in the test output currently. I'd use the debugger, but IntelliJ has issues as above.
Example from the wild web:
http://dedimania.net/tmstats/?do=stat&Envir=TMU-Island&MapOrder=DATE-DESC&Show=MAPS
It has this sort of stuff: <td someattr="somevalue" /td>
No idea how standard compliant this is, but Chrome parses it without any problems.
Linkedom doesn't see that it's a "self closing" td and puts the subsequent tag into it.
Feel free to close, this looks real sketchy :)
const { parseHTML } = require("linkedom");
const test = parseHTML(`
<template>
"hello world"
</template>
`);
const template = test.document.querySelector("template");
console.log(test.document.toString()); // "<template></template>"
console.log(template.toString()); // "<template></template>"
The "hello world" doesn't appear, only "".
Quotes get changed to &quot; for example, and this breaks some template languages
(text inside script and style tags don't have this problem thankfully, but text inside template tags do)
const { parseHTML } = require("linkedom");
const test = parseHTML(`
<div>
"hello world"
</div>
`);
const bla = test.document.querySelector("div");
console.log(bla.innerHTML); // &quot;hello world&quot;
Re-using the example from the readme, the following minimal example failed unexpectedly:
const {parseHTML} = require ('linkedom');
const {
document
} = parseHTML(`
<!doctype html>
<html lang="en">
<head>
<title>Hello SSR</title>
</head>
<body>
<form>
<input name="user">
<button>
Submit
</button>
</form>
</body>
</html>
`);
const form = document.querySelector('form');
const button = form.querySelector(':scope > button');
console.log(button); // returns null while DOM would return the button
I realize that this might be a bug in css-select but I don't know it well enough to figure things out.
Test case
const { parseHTML } = require("linkedom");
const { document } = parseHTML(
`<script type="application/ld+json">{}</script>`
);
document.querySelector("script").textContent = `{"change": true}`;
console.assert(
document.toString() ==
`<!DOCTYPE html><html><script type="application/ld+json">{"change": true}</script></html>`,
`Should retain type: ${document.toString()}`
);
Thanks!
Thanks for working on and publishing this project. I’d like some clarification on this line of the project README, which states:
- be close to the current DOM standard, but not too close.
What are this project’s design goals with regard to DOM standards? If LinkeDOM is not aligned with WHATWG standards, where does it diverge? Is there a general rule of thumb or list of differences describing what’s in scope for the project?
In addition to checking node.nodeType, another way to check if a node is a Text node is:
const isTextNode = node instanceof Text;
But currently parseHTML() does not return the Text constructor, though it returns window, document, HTMLElement etc. This of course isn't specific to Text; for compatibility we might need all the other globals as well.
Would it be better to attach them to window, which is returned from parseHTML() anyway?
Hello )
Is it expected that window.addEventListener is not available in node env?
Here is my snippet:
const html = `
<!DOCTYPE html>
<html>
<head></head>
<body></body>
</html>
`
const { parseHTML } = require('linkedom')
console.log('linkedom: ', parseHTML(html).window.addEventListener)
const { JSDOM } = require('jsdom')
const dom = new JSDOM(html)
console.log('jsdom: ', dom.window.addEventListener)
Running this code with node results in:
linkedom: undefined
jsdom: [Function: bound addEventListener]
What I am trying to do is to use linkedom together with jest similarly to how it is used in jest-environment-jsdom:
https://github.com/facebook/jest/blob/master/packages/jest-environment-jsdom/src/index.ts
Hi! I am using this library to run unit tests over a library I work on. Great work.
I realized the window.performance object is missing. Do you have plans to add it?
If you do, with the right instructions I could volunteer to include it. I read the code and I believe it should live under interface.
Best, Carlos
Repro
const { parseHTML } = require("linkedom");
const { document } = parseHTML(
`<picture><source></picture><picture><source></picture>`
);
const sources = [
...document.querySelector("picture").querySelectorAll("source"),
];
console.assert(sources.length == 1, `Should get the right src ${sources}`);
When I update the version from 0.6.0 to 0.8.0 I got this error
ERROR in ./node_modules/linkedom/cjs/interface/document.js
Module not found: Error: Can't resolve 'perf_hooks' in '/home/giles/Projects/FriendOfAnimal/Frontend/foa-vue/node_modules/linkedom/cjs/interface'
@ ./node_modules/linkedom/cjs/interface/document.js 2:22-43
@ ./node_modules/linkedom/cjs/xml/document.js
@ ./node_modules/linkedom/cjs/dom/parser.js
@ ./node_modules/babel-loader/lib!./node_modules/vue-loader/lib??vue-loader-options!./src/views/ViewPost.vue?vue&type=script&lang=js&
Hey,
It is pretty common (and this is how JSDOM behaves) for HTML serializers to special-case the single quote ' so that it is not encoded when the double quote " is used as the attribute value delimiter.
While the current behavior works perfectly fine, it does make quote-heavy content like CSPs or SVG-data-URIs very busy to read.
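A minimal sketch of the requested behaviour, assuming attribute values are always written between double quotes. escapeAttributeValue is a hypothetical helper for illustration, not linkedom's serializer.

```javascript
// Hypothetical helper, not linkedom's implementation. Inside a
// double-quoted attribute value only &, non-breaking spaces and " need
// escaping, so single quotes can be left untouched.
function escapeAttributeValue(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/"/g, '&quot;');
}

const csp = "default-src 'self'; script-src 'sha256-abc'";
console.log(`<meta content="${escapeAttributeValue(csp)}">`);
// <meta content="default-src 'self'; script-src 'sha256-abc'">
```

The ampersand is replaced first so that entities produced by the later replacements are not escaped a second time.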
Thanks!
First of all a very big thank you for a wonderful project! I thoroughly enjoy working with it.
I wonder if serialization of empty attributes should not depend on target (xml/html).
Consider console.log((new DOMParser).parseFromString(`<test hello=""></test>`,"text/xml").toString());
vs console.log((new DOMParser).parseFromString(`<test hello=""></test>`,"text/html").toString());
In both cases the output is an empty attribute: <test hello></test>
which is correct HTML but not correct XML. Maybe the toString() method of attr.js should depend on the target (HTML or XML)?
For now, I monkey patch with toString("text/xml") in the XML scenario.
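The idea of making attribute serialization depend on the target could be sketched as follows. serializeAttribute and its mimeType parameter are hypothetical names chosen for this example and do not reflect the actual signature in attr.js.

```javascript
// Hypothetical helper: serialize one attribute for a given target type.
// This only illustrates the HTML/XML split, not linkedom's attr.js.
function serializeAttribute(name, value, mimeType = 'text/html') {
  const isHTML = mimeType === 'text/html';
  // HTML allows the bare form for empty values; XML always needs ="...".
  if (value === '' && isHTML) return name;
  const escaped = String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;');
  return `${name}="${escaped}"`;
}

console.log(`<test ${serializeAttribute('hello', '', 'text/html')}></test>`);
// <test hello></test>
console.log(`<test ${serializeAttribute('hello', '', 'text/xml')}></test>`);
// <test hello=""></test>
```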
Many thanks
Martin
Calling .innerHTML on an element that includes escaped characters produces incorrect results.
E.g.:
console.log(parseHTML(`<body>
<div class="comment">
<pre><code>echo "&lt;table class='charts-css'&gt;"</code></pre>
</div>
</body>`).document.documentElement.innerHTML)
produces
<div class="comment">
<pre><code>echo "<table class='charts-css'>"</code></pre>
</div class="comment">
However, the correct output would leave the input unchanged:
<body><div class="comment">
<pre><code>echo "&lt;table class='charts-css'&gt;"</code></pre>
</div></body>
Removing unescape inside the innerHTML getter fixes this, but it's not clear whether removing it would break other behaviors.
Hi, thanks for this library.
Is StyleSheet (the HTMLStyleElement.sheet property) available?
It always returns undefined for me, and I can't find the implementation in the source code.
Tests do pass but unhandled rejections are there in console. You can see that here
It looks like I've been lazy enough to not put symbols in the file they likely belong the most: shared/symbols.js
It's easier to look at that file and check what Symbols are about, so ideally I should:
npm ERR! 404 Not Found - GET https://registry.npmjs.org/likedom - Not found
npm ERR! 404
npm ERR! 404 'likedom@latest' is not in the npm registry.
npm ERR! 404 You should bug the author to publish it (or use the name yourself!)
npm ERR! 404
This is not an issue for my usage as far as I know, so feel free to close the issue if this is not something useful to "fix".
I'm using v0.9.6 on macOS Big Sur v11.4 with Node.js v16.0.0
When I run this code:
const { parseHTML } = require('linkedom');
const { document } = parseHTML(`<!DOCTYPE html>
<html>
<body>
<p>&lt;img src=&quot;…&quot; /&gt;</p>
</body>
</html>`);
console.log(document.toString());
I get this result:
<!DOCTYPE html>
<html>
<body>
<p>&lt;img src="…" /&gt;</p>
</body>
</html>
The &lt; and &gt; are preserved, but not the &quot;.
After replacing jsdom with linkedom, I continued regression testing all the HTML sources that I was previously pulling into this tool. I noticed that Linkedom removes entire blocks of <style> tags. There are no errors; the tags are simply gone from the DOM after I resave it to disk and open it.
It's unfortunate. I had to reinstall jsdom, and all the hundreds of HTML source files are being processed fine again. I still have the linkedom code commented out for a future try:
/*
var {parseHTML} = require('linkedom');
window.JSDOM = function JSDOM(html) { return parseHTML(html);} //facade to work like jsdom
*/
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
window.JSDOM = JSDOM;
The benefits, design, efficiency and thought that went into linkedom are amazing, but it's a bit off from being a drop-in for jsdom. When dealing with HTML sources that we have no control over, jsdom swallows the errors (like octal syntax errors etc.) and never mutates the DOM automatically.
Take the disappearing STYLE tag: I have no idea why linkedom removes it, leaving the page completely unformatted.
I will definitely come back to try this API out a few months from now, and I'll keep a watch.
Hey! I came across linkedom today while googling for "faster jsdom" alternatives.
Specifically I'm looking to speed up our react application's Jest test suite, and wondering if you have any experience / tips doing that with linkedom?
The comment in your blog post "I am already replacing every repository of mine that uses, either for testing or as browser-less dependency DOM env, basicHTML with linkedom" makes it sound like you have, but from naively scanning the linkedom readme and searching for "jest" in this repo, so far I'm not seeing anything obvious. Or is this just really easy to do and I'm missing it?
Thanks!
Not an issue but a question. Is Linkedom faster at navigating the DOM compared to native use of querySelector/DOM?
Hi,
Is it possible to add it, please? :P
Hello there, and thanks for this library! I ran into some issues when attempting to use Preact with linkedom, which the Preact team was able to track down to an unhandled edge case with CharacterData.
According to this section of the DOM spec:
Each node inheriting from the CharacterData interface has an associated mutable string called data.
It looks like data isn't mutable at the moment. I think all that needs to be added is a setter below CharacterData's get data().
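The shape of such a getter and setter pair can be sketched with a toy class. ToyCharacterData is a hypothetical stand-in and not linkedom's CharacterData; the null handling follows the DOM rule that assigning null to data yields an empty string.

```javascript
// Toy stand-in for illustration only, not linkedom's CharacterData.
class ToyCharacterData {
  #data;
  constructor(data = '') {
    this.#data = String(data);
  }
  get data() {
    return this.#data;
  }
  // The setter makes `data` mutable; null becomes the empty string,
  // as the DOM specifies for this attribute.
  set data(value) {
    this.#data = value === null ? '' : String(value);
  }
  get length() {
    return this.#data.length;
  }
}

const text = new ToyCharacterData('before');
text.data = 'after';
console.log(text.data, text.length); // after 5
```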
const { parseHTML } = require("linkedom");
const bar = parseHTML(`
<template>
<div></div>
</template>
<template>
</template>
`);
/node_modules/linkedom/cjs/shared/parse-from-string.js:60
node = node.appendChild(document.createElement(name));
^
TypeError: Cannot read property 'appendChild' of null
it looks like template elements have an innerHTML of an empty string
Hi there! First of all, I wanted to say thank you for building this lib -- looks super useful. I stumbled upon it while trying to replace jsdom because of its 100 dependencies. Please feel free to close this issue if replacing jsdom is not an actual goal of this lib.
I'm trying to combine linkedom with @mozilla/readability. Line 1251 of Readability.js in particular got me into an infinite loop. Rehashing the critical lines, the code looks like this:
while (someElement.childNodes.length) {
anotherElement.appendChild(someElement.childNodes[0])
}
At first, I was a little confused about this: how can length change? But from the appendChild documentation, it seems like the DOM implementation updates the previous parent's childNodes, so the length of the list of children shrinks over time.
Maybe I should suggest a patch to the maintainers of @mozilla/readability? In any case, I wanted to let you know of this potential incompatibility! Again, thank you for all the awesome.
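The behaviour the Readability loop depends on can be illustrated with a toy model. ToyNode below is a hypothetical class written for this example; it mimics only the rule that appendChild moves a node out of its previous parent, and it is not how linkedom stores children.

```javascript
// Toy model of DOM parent/child bookkeeping, not linkedom code.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.parent = null;
    this.childNodes = []; // stands in for a live NodeList
  }
  appendChild(child) {
    // Appending a node that already has a parent moves it.
    if (child.parent) {
      const siblings = child.parent.childNodes;
      siblings.splice(siblings.indexOf(child), 1);
    }
    child.parent = this;
    this.childNodes.push(child);
    return child;
  }
}

const source = new ToyNode('source');
const target = new ToyNode('target');
['a', 'b', 'c'].forEach(name => source.appendChild(new ToyNode(name)));

// Same shape as the Readability loop: it terminates only because each
// appendChild call shrinks source.childNodes.
while (source.childNodes.length) {
  target.appendChild(source.childNodes[0]);
}
console.log(source.childNodes.length, target.childNodes.map(n => n.name));
// 0 [ 'a', 'b', 'c' ]
```

If childNodes were a snapshot that never shrinks, the same loop would never end, which matches the symptom described above.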
Hi Andrea,
I'm trying to migrate images-responsiver from basicHTML to LinkeDOM, and it's really working well overall, but I'm having an issue with dataset usage.
Here's a reduced test case to understand what I thought would work, but doesn't:
const { parseHTML } = require('linkedom');
const { document } = parseHTML(`<!DOCTYPE html>
<html>
<body>
<img src="test.png" data-responsiver="transform" />
</body>
</html>`);
document.querySelectorAll('img').forEach((image) => {
console.log('responsiver' in image.dataset);
console.dir(Object.keys(image.dataset));
console.log(image.dataset.responsiver);
delete image.dataset.responsiver;
console.log(image.dataset.responsiver);
console.dir(Object.keys(image.dataset));
});
Here's the result on my computer (macOS Big Sur v11.4 with Node.js v16.0.0):
false
[]
transform
null
[]
What I would expect:
true
['responsiver']
transform
undefined
[]
Did I forget to use something to better support dataset, or is it a missing part in LinkeDOM?
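For reference, the expected results above correspond to a dataset object that supports the in operator, key enumeration and delete. One way to get that behaviour is a Proxy over the element's attributes, sketched here against a plain Map. createDataset, toAttr and toKey are hypothetical names, and this is not linkedom's implementation.

```javascript
// Hypothetical sketch: a dataset-like view over a Map of attributes.
// Non-string keys (symbols) map to '' so lookups simply miss.
const toAttr = key => typeof key === 'string'
  ? 'data-' + key.replace(/[A-Z]/g, c => '-' + c.toLowerCase())
  : '';
const toKey = attr => attr.slice(5).replace(/-([a-z])/g, (_, c) => c.toUpperCase());

function createDataset(attributes) {
  return new Proxy({}, {
    get: (_, key) => attributes.get(toAttr(key)),
    set: (_, key, value) => { attributes.set(toAttr(key), String(value)); return true; },
    has: (_, key) => attributes.has(toAttr(key)),
    deleteProperty: (_, key) => { attributes.delete(toAttr(key)); return true; },
    // Object.keys needs both ownKeys and enumerable descriptors.
    ownKeys: () => [...attributes.keys()].filter(n => n.startsWith('data-')).map(toKey),
    getOwnPropertyDescriptor: (_, key) => attributes.has(toAttr(key))
      ? { value: attributes.get(toAttr(key)), writable: true, enumerable: true, configurable: true }
      : undefined
  });
}

const attrs = new Map([['src', 'test.png'], ['data-responsiver', 'transform']]);
const dataset = createDataset(attrs);
console.log('responsiver' in dataset); // true
console.log(Object.keys(dataset));     // [ 'responsiver' ]
delete dataset.responsiver;
console.log(dataset.responsiver);      // undefined
```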
Thanks a lot for your help.
First, thanks for your great work on this project.
While using Linkedom for testing a React app, I've come across this error: TypeError: Cannot read property 'userAgent' of undefined
And it comes from window.navigator being undefined.
Here is how it's defined in JSDOM. For the scope of linkedom, and since it's a read only property, I think providing a default value to avoid crashes is enough for a fix.
Try it here https://runkit.com/haikyuu/60259abfa573fe0019c29a12
Hi 👋🏻 ,
Linkedom looks like a great project. I've got a project that spends tens of seconds in JSDOM and I'd love to try Linkedom as a replacement. I sense that it's not a drop-in replacement. Has anyone written a 'migration guide', or is a conversion instead of a rewrite far more complex than I think it might be?
img.src should yield img.getAttribute('src').
const { parseHTML } = require("linkedom");
const { document } = parseHTML(`<img src="test.gif">`);
const img = document.querySelector("img");
console.assert(
img.src == img.getAttribute("src"),
`Should have src ${img.src}`
);
Expose a top-level toJSON method that accepts a linkedom element instance and returns that element's jsdon serialization.
When using linkedom in a TypeScript environment, element instances (including the document) don't seem to expose a .toJSON() method. An exploration of the code suggests that this might be because the built-in dom typings from TypeScript are being used for these specced interfaces.
To serialize a linkedom element to its jsdon equivalent, the element needs to be cast to something like any before calling .toJSON. While this isn't a huge problem, I think there's an opportunity to expose a mirror to the existing parseJSON top-level method that accepts a linkedom element and simply calls that element's .toJSON() method in a type-friendly way. This would allow us to continue pretending that these elements are the 'real deal'.
Minimal example:
parseHTML(`<body id="foo" class="bar"></body>`).document.toString()
produces:
<body class="bar" id="foo"></body>
Similarly, the .attributes getter returns them in reversed order as well.
I've tried to fix this, but doing so causes a significant number of tests to fail, so I'm not sure to what extent this behaviour is intended.