
nh3's Introduction

nh3


Python bindings to the ammonia HTML sanitization library.

Installation

pip install nh3

Usage

See the documentation.

Performance

A quick benchmark shows that nh3 is about 20 times faster than the deprecated bleach package (2.85 ms versus 138 µs per call in the session below). Measured on a MacBook Air (M2, 2022).

Python 3.11.0 (main, Oct 25 2022, 16:25:24) [Clang 14.0.0 (clang-1400.0.29.102)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import requests

In [2]: import bleach

In [3]: import nh3

In [4]: html = requests.get("https://www.google.com").text

In [5]: %timeit bleach.clean(html)
2.85 ms ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit nh3.clean(html)
138 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
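
Outside IPython, roughly the same measurement can be scripted with the standard library's timeit module. The sketch below is only an illustration: the helper names per_call_seconds and speedup are made up for this example, and the functions being timed are passed in as arguments so that the harness itself does not depend on bleach or nh3.

```python
import timeit


def per_call_seconds(func, arg, number=1000, repeat=5):
    """Best observed time for one call of func(arg), in seconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number


def speedup(slow, fast, arg, number=1000, repeat=5):
    """How many times faster `fast` is than `slow` on the same input."""
    slow_time = per_call_seconds(slow, arg, number, repeat)
    fast_time = per_call_seconds(fast, arg, number, repeat)
    return slow_time / fast_time


# Stand-in workload so the script runs without third-party packages.
sample = "<p>hello</p>" * 100
elapsed = per_call_seconds(str.upper, sample, number=100, repeat=3)
print(f"{elapsed * 1e6:.2f} µs per call")
```

With both packages installed, speedup(bleach.clean, nh3.clean, html) would be the call to compare them. Note that this helper reports the minimum over the repeats, whereas %timeit reports mean ± std. dev., so the numbers are not directly interchangeable.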

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.

nh3's People

Contributors

adamchainz, damianzaremba, dependabot[bot], lepture, messense, monosans, seanbudd

nh3's Issues

RFE: is it possible to start making GitHub releases? 🤔

Would it be possible, when the next version is released, to also create a GitHub release so that it gets an entry on https://github.com/messense/nh3/releases? 🤔
Creating a GitHub release sends an email notification to everyone who has set Watch->Releases for the repository in the web UI.
A GitHub release can contain additional comments (such as a changelog) or additional assets such as release tarballs (by default it contains only the assets from the git tag), but none of that is obligatory.
In the simplest variant the GitHub release can be empty, because the subject of the notification email contains the git tag name.

I'm asking because my automation uses those email notifications to attempt preliminary automated upgrades of package builds, which saves some time on maintaining packaging procedures.
Other people would probably also like to be informed immediately when a new version is released.

Description:
https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository
https://github.com/marketplace/actions/github-release
https://pgjones.dev/blog/trusted-plublishing-2023/
jbms/sphinx-immaterial#281 (comment)

Feature: allow generic attribute prefixes, e.g. data-*

In the current implementation (v0.2.9), there is no way to allow all data-* attributes (or other generic attribute prefixes).

In the underlying ammonia, the builder allows for generic_attribute_prefixes to be specified, and uses the data- prefix as an example in the docs:
https://docs.rs/ammonia/latest/ammonia/struct.Builder.html#method.generic_attribute_prefixes

I am currently using bleach with an implementation that allows all data-* attributes, and I would like to switch to this library.
Having this ability would allow me to make the switch easily.

Please consider adding this feature.
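
To make the requested behaviour concrete, here is a small pure-Python model of prefix-based matching. It is not nh3 or ammonia code: the function name attribute_allowed and both example sets are assumptions made for illustration, and it only shows how a prefix list would extend an exact-name allowlist.

```python
def attribute_allowed(name, allowed_names, allowed_prefixes):
    """True if `name` is allowed exactly or starts with an allowed prefix."""
    return name in allowed_names or any(
        name.startswith(prefix) for prefix in allowed_prefixes
    )


# Illustrative configuration, not the library's defaults.
ALLOWED_NAMES = frozenset({"href", "title"})
ALLOWED_PREFIXES = frozenset({"data-"})

results = {
    name: attribute_allowed(name, ALLOWED_NAMES, ALLOWED_PREFIXES)
    for name in ("href", "data-id", "data-user-name", "onclick")
}
print(results)
```

With a single "data-" prefix, every data-* attribute passes without having to be enumerated, while attributes such as onclick are still rejected.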

Feature goals compared to Bleach (/ full ammonia API)?

Discussions are not enabled so opening it here, sorry 'bout it.

With the recent deprecation of bleach (mostly on grounds of html5lib being unmaintained), unless someone has the time to e.g. rebuild the html5lib API on top of an existing html5 parser and the maintainer of bleach decides to use that, ammonia/nh3 seems well positioned as a migration target (there's already one package which has done that visible from the linked Bleach PR).

One issue there is that nh3 currently provides rather limited tuning knobs compared to Ammonia and Bleach (not sure how the two relate, as I have not looked yet), but the readme doesn't really say what your eventual goals would be on that front as maintainer. If you do aim to favor such support and migration, maybe an issue or even a project (kanban) about full Ammonia support and/or Bleach feature parity (if not API compatibility) could be a consideration?

Another possible issue (though a more internal one) concerns exposing customisations that take arbitrary callables (attribute_filter seems to be the only one currently). nh3 currently releases the GIL during cleanup, which would not allow calling Python functions and therefore would not allow exposing a generic attribute_filter. I don't know whether Ammonia has parallelism built in or how much you care about parallel cleaning (though having two paths, and keeping the GIL only if callbacks were actually provided, would always be an option, if a somewhat more annoying one).

nh3 clean doesn't include html, head or body tags even when included in ALLOWED_TAGS

While using the nh3 library, we came across a use case where a field is expected to contain HTML, but we need to remove any content that could enable an XSS attack. Calling nh3.clean() directly on the input does not give the expected result: a lot of useful markup is trimmed, which ends up modifying the HTML template that was submitted.

import nh3
text = '''
<!DOCTYPE html>
<html>
<head>
  <title>HTML Tutorial</title>
</head>
<body>
  <h1>This is a heading</h1>
  <p>This is a paragraph.</p>
</body>
</html>
'''

nh3.ALLOWED_TAGS.add('title')
nh3.ALLOWED_TAGS.add('head')
nh3.ALLOWED_TAGS.add('html')
nh3.ALLOWED_TAGS.add('div')
nh3.ALLOWED_TAGS.add('body')

print(nh3.clean(text,tags=nh3.ALLOWED_TAGS,strip_comments=False))

Output: 
<title>HTML Tutorial</title>
 <h1>This is a heading</h1>
 <p>This is a paragraph.</p> 

We don't want the html, head or body tags to be removed. Is there a limitation in the nh3 library that prevents these tags from being allowed?

Allow frozenset in attributes parameter of clean function

Currently the type is dict[str, set[str]], and an attempt to use a frozenset raises

TypeError: argument 'attributes': 'frozenset' object cannot be converted to 'PySet'

In my opinion, using frozenset is good practice: if the data is immutable, it is better to use an immutable type. For example, allowed attributes can be defined in configuration, where an immutable type is safer.
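
Until frozenset is accepted, one workaround is to keep the configuration immutable and convert it at the call site. This is a minimal sketch under that assumption; the helper name to_mutable_attributes and the example configuration are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Set


def to_mutable_attributes(
    attributes: Mapping[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Copy a mapping of (frozen)sets into a plain dict of plain sets."""
    return {tag: set(names) for tag, names in attributes.items()}


# Immutable configuration, as described in the issue.
ALLOWED_ATTRIBUTES = {
    "a": frozenset({"href", "title"}),
    "img": frozenset({"src", "alt"}),
}

converted = to_mutable_attributes(ALLOWED_ATTRIBUTES)
print(converted)
```

The converted value has the dict[str, set[str]] shape named above, so it could be passed as attributes=converted, while the module-level configuration stays frozen.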

Aborted tests with Python 3.13.0b2

I tried to test nh3-0.2.17 with Python 3.13.0b2, and the test suite aborts with the following output:

python3.13 -m pytest -vv -ra -l -Wdefault -Werror::pytest.PytestUnhandledCoroutineWarning --color=yes -o console_output_style=count -o tmp_path_retention_count=0 -o tmp_path_retention_policy=failed -p no:cov -p no:flake8 -p no:flakes -p no:pylint -p no:markdown -p no:sugar -p no:xvfb -p no:pytest-describe -p no:plus -p no:tavern -p no:salt-factories
================================================= test session starts =================================================
platform linux -- Python 3.13.0b2, pytest-8.2.2, pluggy-1.5.0 -- /var/tmp/portage/dev-python/nh3-0.2.17/work/nh3-0.2.17-python3_13/install/usr/bin/python3.13
cachedir: .pytest_cache
rootdir: /var/tmp/portage/dev-python/nh3-0.2.17/work/nh3-0.2.17
configfile: pyproject.toml
collecting ... collected 4 items

tests/test_nh3.py::test_clean PASSED                                                                             [1/4]
tests/test_nh3.py::test_clean_with_attribute_filter Fatal Python error: Aborted

Current thread 0x00007f7623806e80 (most recent call first):
  File "/var/tmp/portage/dev-python/nh3-0.2.17/work/nh3-0.2.17/tests/test_nh3.py", line 67 in test_clean_with_attribute_filter
  File "/usr/lib/python3.13/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/lib/python3.13/site-packages/_pytest/python.py", line 1632 in runtest
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 241 in <lambda>
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 341 in from_call
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 240 in call_and_report
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 135 in runtestprotocol
  File "/usr/lib/python3.13/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/lib/python3.13/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/lib/python3.13/site-packages/_pytest/main.py", line 339 in _main
  File "/usr/lib/python3.13/site-packages/_pytest/main.py", line 285 in wrap_session
  File "/usr/lib/python3.13/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/lib/python3.13/site-packages/_pytest/config/__init__.py", line 178 in main
  File "/usr/lib/python3.13/site-packages/_pytest/config/__init__.py", line 206 in console_main
  File "/usr/lib/python3.13/site-packages/pytest/__main__.py", line 7 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
/var/tmp/portage/dev-python/nh3-0.2.17/temp/environment: line 2730:   291 Aborted                 "${@}"

Pylint false positive: no name in module

[Screenshot: Pylint "no name in module" warning]
Is there a linter plugin to install, or is the module improperly configured for Pylint? Functionally this code works perfectly, and IntelliSense can see that it's fine.

Memory leak

Hi,

I think I've found a memory leak.

This example reproduces it:

import requests
import nh3
html = requests.get("https://search.brave.com/").text

for _ in range(30_000):
    nh3.clean(html)

If you run that alongside a tool such as htop, you should see the memory of the process grow continually, without any apparent bound.
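
The same observation can be made from inside the process instead of watching htop. The sketch below is a hedged illustration that uses the Unix-only resource module; the helper names are made up, and the function under test is passed in so the harness runs without nh3. It measures the process's peak resident set size, which should also reflect allocations made by native code.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def peak_rss_growth(func, arg, iterations):
    """Increase in peak RSS caused by calling func(arg) repeatedly."""
    before = peak_rss_bytes()
    for _ in range(iterations):
        func(arg)
    return peak_rss_bytes() - before


# Stand-in workload; it does not leak, so growth should stay small.
growth = peak_rss_growth(str.upper, "<p>hello</p>" * 1000, 1000)
print(f"peak RSS grew by {growth} bytes")
```

For the report above, the call would be peak_rss_growth(nh3.clean, html, 30_000). Growth that keeps scaling with the iteration count is consistent with a leak; a value that levels off is not.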

I've tried to find the root cause. But I'm not really sure of my findings, and they seem pretty weird.

Bisecting nh3 with the above example gave me this:

# b5074b186b813313b258a7c97871bb2d9fc0eaa7 is the first bad commit
# commit b5074b186b813313b258a7c97871bb2d9fc0eaa7
# Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Date:   Mon Apr 22 16:12:37 2024 +0800

#     Bump pyo3 from 0.21.1 to 0.21.2 (#43)
    
#     Bumps [pyo3](https://github.com/pyo3/pyo3) from 0.21.1 to 0.21.2.
#     - [Release notes](https://github.com/pyo3/pyo3/releases)
#     - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
#     - [Commits](https://github.com/pyo3/pyo3/compare/v0.21.1...v0.21.2)
    
#     ---
#     updated-dependencies:
#     - dependency-name: pyo3
#       dependency-type: direct:production
#       update-type: version-update:semver-patch
#     ...
    
#     Signed-off-by: dependabot[bot] <[email protected]>
#     Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

#  Cargo.lock | 20 ++++++++++----------
#  Cargo.toml |  2 +-

Using memray also pointed to pyo3:

python3 -m memray run --native -f -o output2.bin nh.py
python3 -m memray flamegraph -f output2.bin

[Image: memray flamegraph]

Thank you!

Failed building wheel on linux

I'm on Linux x86_64 with Python 3.11. pip install nh3 fails with a "maturin failed" error (see the error log below). I tried different nh3 versions (0.2.14-0.2.18) and different Rust versions (1.77 and 1.79), but I get the same error each time. I'm not sure what is causing this, and any help is appreciated.

Error log

Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/x86-64-v3, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Collecting nh3==0.2.15
Downloading nh3-0.2.15.tar.gz (14 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nh3
Building wheel for nh3 (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for nh3 (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [177 lines of output]
Running maturin pep517 build-wheel -i concordia/bin/python3.11 --compatibility off
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.7
🐍 Not using a specific python interpreter
Compiling libc v0.2.150
Compiling proc-macro2 v1.0.70
Compiling unicode-ident v1.0.12
Compiling cfg-if v1.0.0
Compiling autocfg v1.1.0
Compiling target-lexicon v0.12.12
Compiling ppv-lite86 v0.2.17
Compiling siphasher v0.3.11
Compiling parking_lot_core v0.9.9
Compiling scopeguard v1.2.0
Compiling smallvec v1.11.2
Compiling once_cell v1.19.0
Compiling syn v1.0.109
Compiling new_debug_unreachable v1.0.4
Compiling serde v1.0.193
Compiling tinyvec_macros v0.1.1
Compiling mac v0.1.1
Compiling utf-8 v0.7.6
Compiling precomputed-hash v0.1.1
Compiling percent-encoding v2.3.1
Compiling log v0.4.20
Compiling heck v0.4.1
Compiling unicode-bidi v0.3.14
Compiling indoc v2.0.4
Compiling maplit v1.0.2
Compiling unindent v0.2.3
Compiling tinyvec v1.6.0
Compiling futf v0.1.5
Compiling form_urlencoded v1.2.1
Compiling phf_shared v0.10.0
Compiling tendril v0.4.3
Compiling lock_api v0.4.11
Compiling memoffset v0.9.0
Compiling phf v0.10.1
Compiling quote v1.0.33
Compiling pyo3-build-config v0.20.0
Compiling unicode-normalization v0.1.22
Compiling syn v2.0.39
Compiling getrandom v0.2.11
Compiling rand_core v0.6.4
Compiling parking_lot v0.12.1
Compiling rand_chacha v0.3.1
Compiling idna v0.5.0
Compiling rand v0.8.5
Compiling url v2.5.0
Compiling pyo3-ffi v0.20.0
Compiling pyo3 v0.20.0
Compiling phf_generator v0.10.0
Compiling string_cache_codegen v0.5.2
Compiling phf_codegen v0.10.0
Compiling markup5ever v0.11.0
Compiling string_cache v0.8.7
Compiling pyo3-macros-backend v0.20.0
Compiling html5ever v0.26.0
Compiling pyo3-macros v0.20.0
error[E0463]: can't find crate for `pyo3_macros`
  --> /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/prelude.rs:20:9
   |
20 | pub use pyo3_macros::{pyclass, pyfunction, pymethods, pymodule, FromPyObject};
   |         ^^^^^^^^^^^ can't find crate

  error[E0463]: can't find crate for `pyo3_macros`
     --> /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/lib.rs:439:9
      |
  439 | pub use pyo3_macros::{pyfunction, pymethods, pymodule, FromPyObject};
      |         ^^^^^^^^^^^ can't find crate
  
  error[E0463]: can't find crate for `pyo3_macros`
     --> /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/lib.rs:450:9
      |
  450 | pub use pyo3_macros::pyclass;
      |         ^^^^^^^^^^^ can't find crate
  
  error[E0463]: can't find crate for `indoc`
     --> /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/lib.rs:381:5
      |
  381 |     indoc,    // Re-exported for py_run
      |     ^^^^^ can't find crate
  
  error[E0432]: unresolved imports `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`, `crate::FromPyObject`
    --> /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/array.rs:4:10
     |
  4  |     ffi, FromPyObject, IntoPy, Py, PyAny, PyDowncastError, PyObject, PyResult, Python, ToPyObject,
     |          ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/ipaddr.rs:6:21
     |
  6  | use crate::{intern, FromPyObject, IntoPy, Py, PyAny, PyObject, PyResult, Python, ToPyObject};
     |                     ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/map.rs:7:5
     |
  7  |     FromPyObject, IntoPy, PyAny, PyErr, PyObject, Python, ToPyObject,
     |     ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/num.rs:4:22
     |
  4  |     exceptions, ffi, FromPyObject, IntoPy, PyAny, PyErr, PyObject, PyResult, Python, ToPyObject,
     |                      ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/osstr.rs:2:18
     |
  2  | use crate::{ffi, FromPyObject, IntoPy, PyAny, PyObject, PyResult, Python, ToPyObject};
     |                  ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/path.rs:2:10
     |
  2  |     ffi, FromPyObject, FromPyPointer, IntoPy, PyAny, PyObject, PyResult, Python, ToPyObject,
     |          ^^^^^^^^^^^^
     |
    ::: /home/x/cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/set.rs:6:46
     |
  6  |     types::set::new_from_iter, types::PySet, FromPyObject, IntoPy, PyAny, PyObject, PyResult,
     |                                              ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/slice.rs:3:29
     |
  3  | use crate::{types::PyBytes, FromPyObject, IntoPy, PyAny, PyObject, PyResult, Python, ToPyObject};
     |                             ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/conversions/std/string.rs:6:22
     |
  6  |     types::PyString, FromPyObject, IntoPy, Py, PyAny, PyObject, PyResult, Python, ToPyObject,
     |                      ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/impl_/extract_argument.rs:6:5
     |
  6  |     FromPyObject, PyAny, PyClass, PyErr, PyRef, PyRefMut, PyResult, Python,
     |     ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/impl_/frompyobject.rs:1:38
     |
  1  | use crate::{exceptions::PyTypeError, FromPyObject, PyAny, PyErr, PyResult, Python};
     |                                      ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/instance.rs:8:23
     |
  8  |     ffi, AsPyPointer, FromPyObject, IntoPy, PyAny, PyClass, PyClassInitializer, PyRef, PyRefMut,
     |                       ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/boolobject.rs:3:18
     |
  3  | use crate::{ffi, FromPyObject, IntoPy, PyAny, PyObject, PyResult, Python, ToPyObject};
     |                  ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/bytes.rs:1:18
     |
  1  | use crate::{ffi, FromPyObject, IntoPy, Py, PyAny, PyResult, Python, ToPyObject};
     |                  ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/floatob.rs:4:10
     |
  4  |     ffi, FromPyObject, IntoPy, PyAny, PyErr, PyNativeType, PyObject, PyResult, Python, ToPyObject,
     |          ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/sequence.rs:10:13
     |
  10 | use crate::{FromPyObject, PyTryFrom};
     |             ^^^^^^^^^^^^
     |
    ::: /home/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/tuple.rs:11:17
     |
  11 |     exceptions, FromPyObject, IntoPy, Py, PyAny, PyErr, PyObject, PyResult, Python, ToPyObject,
     |                 ^^^^^^^^^^^^
  
     Compiling ammonia v3.3.0
  Some errors have detailed explanations: E0432, E0463.
  For more information about an error, try `rustc --explain E0432`.
  error: could not compile `pyo3` (lib) due to 5 previous errors
  warning: build failed, waiting for other jobs to finish...
  💥 maturin failed
    Caused by: Failed to build a native library through cargo
    Caused by: Cargo build finished with "exit status: 101": `env -u CARGO PYO3_ENVIRONMENT_SIGNATURE="cpython-3.11-64bit" PYO3_PYTHON="concordia/bin/python3.11" PYTHON_SYS_EXECUTABLE="concordia/bin/python3.11" "cargo" "rustc" "--message-format" "json-render-diagnostics" "--manifest-path" "/tmp/pip-install-j86bsuzt/nh3_0a79c23311074ad0be8321120052238b/Cargo.toml" "--release" "--lib"`
  Error: command ['maturin', 'pep517', 'build-wheel', '-i', 'concordia/bin/python3.11', '--compatibility', 'off'] returned non-zero exit status 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for nh3
Failed to build nh3
ERROR: Could not build wheels for nh3, which is required to install pyproject.toml-based projects

Line endings \r\n converted to \n

Hi,

When I use nh3.clean(), line endings \r\n are converted to \n.
This behavior can cause issues for applications that rely on \r\n as their newline format, for example in a request body.

Example code

import nh3

input = "Line1\r\nLine2"
nh3.clean(input) # Outputs: 'Line1\nLine2'

Is this a desired behavior?

Thank you in advance!


Environment:

  • nh3 version: 0.2.18
  • Python version: 3.12.2
  • Operating System: Microsoft Windows [Version 10.0.19045.4651]
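
For context, the HTML standard has parsers normalize newlines (CRLF becomes LF) while preprocessing their input, so this conversion is plausibly a consequence of parsing rather than something specific to nh3. If CRLF is needed downstream, it can be restored after cleaning. The sketch below assumes that approach; clean_preserving_crlf is a hypothetical helper, and fake_clean is a stand-in that only mimics the newline behaviour reported above.

```python
def clean_preserving_crlf(text, clean):
    """Run `clean` on text, then restore CRLF if the input used it."""
    cleaned = clean(text)
    if "\r\n" not in text:
        return cleaned
    # Collapse any CRLF the sanitizer kept, then expand every LF to CRLF.
    return cleaned.replace("\r\n", "\n").replace("\n", "\r\n")


def fake_clean(text):
    """Stand-in for nh3.clean that only mimics the newline normalization."""
    return text.replace("\r\n", "\n")


restored = clean_preserving_crlf("Line1\r\nLine2", fake_clean)
print(repr(restored))
```

In real use, nh3.clean would be passed in place of fake_clean. One caveat: input that mixes both newline styles comes out with CRLF everywhere.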

Alias for clean_text function: escape?

Sorry if I misunderstand something, but to me it looks like the clean_text function does not clean HTML or text; it escapes text for use in HTML.
I understand that it just mirrors the ammonia API, but it might be good to have a better-named alias.
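
To illustrate the distinction the issue draws: escaping replaces the HTML special characters so that markup is displayed as text instead of being removed. The sketch uses the standard library's html.escape purely as a stand-in. The name escape is the hypothetical alias proposed above, and the exact set of characters that clean_text escapes may differ.

```python
import html


def escape(text: str) -> str:
    """Hypothetical alias name; this wraps the standard library, not nh3."""
    return html.escape(text, quote=True)


escaped = escape('<b title="x">hi</b>')
print(escaped)  # &lt;b title=&quot;x&quot;&gt;hi&lt;/b&gt;
```

The tags survive as visible text, which is why "escape" describes the operation more directly than "clean".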

Would it be possible to disable adding of rel="noopener noreferrer"?

Adding rel="noopener noreferrer" to all <a> tags is not desired in our use case. Would it be possible to add a parameter to disable that?

If I read ammonia source correctly, it should be achievable by Builder.link_rel(None), so it just needs a way to expose this in the Python interface.

Access to the default URL schemes

It would be nice if the Python package could also expose ammonia’s default whitelisted URL schemes. If I understand the code correctly, this could easily be done by adding m.add("ALLOWED_URL_SCHEMES", a.clone_url_schemes())?; to the nh3 function.

clean_content_tags doesn't seem to work on tags other than <script> or <style>

Either I am misunderstanding what clean_content_tags does or it is not working correctly. I cannot get the clean_content_tags attribute to work on anything other than the two tags <script> and <style>. Using nh3 version 0.2.14, python 3.11.0.

import nh3

testItem = "<script>alert('hello')</script><p>hello</p>"
print(nh3.clean(html=testItem, clean_content_tags={'p'}))

I receive this error:

thread '<unnamed>' panicked at 'assertion failed: !self.tags.contains(tag_name)', /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ammonia-3.3.0/src/lib.rs:1792:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/home/Desktop/import nh3.py", line 4, in <module>
    print(nh3.clean(html=item, tags=None, clean_content_tags={'p'}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: assertion failed: !self.tags.contains(tag_name)

I have been able to reproduce this error with b, br, div, and img tags. I haven't tried any others. script and style tags work as expected.
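
The assertion message `!self.tags.contains(tag_name)` suggests that the panic occurs when a tag is listed in clean_content_tags while it is also in the set of allowed tags; script and style would then work because they are not allowed by default. That reading is an inference from the traceback, not something confirmed here. Under that assumption, a pre-check like the following (hypothetical helper names, stand-in allowlist) turns the panic into an ordinary error:

```python
def overlapping_tags(tags, clean_content_tags):
    """Tags that appear in both sets; a non-empty result is the conflict."""
    return set(tags) & set(clean_content_tags)


def check_clean_content_tags(tags, clean_content_tags):
    """Raise ValueError instead of letting the conflict reach the sanitizer."""
    overlap = overlapping_tags(tags, clean_content_tags)
    if overlap:
        raise ValueError(f"tags listed in both sets: {sorted(overlap)}")


# Stand-in allowlist for illustration; the real default set is larger.
allowed = {"p", "b", "div"}
conflict = overlapping_tags(allowed, {"p"})
no_conflict = overlapping_tags(allowed, {"script", "style"})
print(conflict, no_conflict)
```

If the inference holds, removing the tag from the allowed set before listing it in clean_content_tags would avoid the overlap.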

PanicException if allowed attribute is missing

import nh3

html = "<a href='http://www.google.com'>google.com</a>"
nh3.clean(html, tags={'a'}, attributes={'a': {'href', 'rel'}})
# Raises pyo3_runtime.PanicException: assertion failed: self.tag_attributes.get("a").and_then(|a| a.get("rel")).is_none()
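
The assertion reads as "rel must not be an allowed attribute of a", which fits the fact that the sanitizer itself adds rel="noopener noreferrer" to links. A caller can detect the conflict before calling clean. In this sketch, rel_conflicts is a hypothetical helper; the link_rel name is borrowed from ammonia's Builder.link_rel mentioned in this document, and whether nh3.clean accepts such a parameter is an assumption.

```python
DEFAULT_LINK_REL = "noopener noreferrer"  # value quoted in this document


def rel_conflicts(attributes, link_rel=DEFAULT_LINK_REL):
    """True when `rel` is allowed on <a> while a rel value is also forced."""
    return link_rel is not None and "rel" in attributes.get("a", ())


with_rel = rel_conflicts({"a": {"href", "rel"}})
without_rel = rel_conflicts({"a": {"href"}})
rel_disabled = rel_conflicts({"a": {"href", "rel"}}, link_rel=None)
print(with_rel, without_rel, rel_disabled)
```

Under this reading, dropping 'rel' from the allowed attributes of a is enough to avoid the panic in the example above.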

Using default values in .clean() method produces unexpected output

Reading your docs, it seems like nh3.clean(html=somestring) should remove all HTML tags present, because technically tags defaults to None. Instead what it does is it removes <script> and <style> tags, but doesn't remove any others. Using nh3 version 0.2.14 and python 3.11.0.

import nh3

item = "<script>alert('hello')</script><style>hola</style><p>hello</p><b>hi</b>"
print("Output: ", nh3.clean(html=item))
#Output: <p>hello</p><b>hi</b>

I would guess if everything is left at the default, it should be removing all tags since you didn't specify any to keep. I also find it peculiar it automatically removes script and style tags when that's not described as a default behavior in the docs.

Edit to add that I just tested this, and it will remove gibberish tags. I don't understand why...

item = "<asdf>hi</ashgasf>"
print("Output: ", nh3.clean(html=item))
#Output: hi
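
One reading of both outputs, offered as an assumption and not as documented behaviour: tags=None acts as a "use the default allowlist" sentinel and does not mean "allow no tags" (a default set, nh3.ALLOWED_TAGS, appears earlier in this document). The pure-Python model below illustrates that sentinel pattern; DEFAULT_TAGS is a tiny stand-in and not the real default set.

```python
DEFAULT_TAGS = frozenset({"p", "b"})  # illustrative stand-in only


def effective_tags(tags=None, default=DEFAULT_TAGS):
    """None selects the default allowlist; any set, even empty, overrides it."""
    return default if tags is None else frozenset(tags)


def is_kept(tag, tags=None):
    """Whether `tag` would survive under the effective allowlist."""
    return tag in effective_tags(tags)


print(is_kept("p"))              # default allowlist applies
print(is_kept("asdf"))           # unknown tags are not in the allowlist
print(is_kept("p", tags=set()))  # explicit empty set allows nothing
```

Under this model, p and b are kept because they are in the default allowlist, asdf is dropped because it is not, and passing an explicit empty set would be how a caller asks for all tags to be removed.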

Encoding issues with method clean()

Hello,

I'm working on a project that uses ISO-8859-15. When I pass a string to the .clean() method, some characters are not encoded correctly.

Example: str = "1 comprimé à faire dans la journée". Output:

[Screenshot of the output]

# -*- coding: iso-8859-15 -*-
import nh3

string = "1 comprimé à avaler par jour"

print(nh3.clean(string)) # 1 comprimé Ã&nbsp; avaler par jour

How can I work around this? I want to clean my string but keep the character "à".

I'm looking for ideas. I tried the approach below, but it causes too many issues in my project (here encoding='iso-8859-15'):

[Screenshot of the attempted code]
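
The `Ã&nbsp;` in the output above is the pattern produced when the two UTF-8 bytes of "à" are decoded as ISO-8859-15. One possible cause, stated here as an assumption, is that the text was decoded with the wrong codec before it reached clean() (for example, a source file saved as UTF-8 but declared as iso-8859-15). The sketch below reproduces that pattern with plain Python and shows the reverse step.

```python
original = "à"

# UTF-8 encodes this character as two bytes.
utf8_bytes = original.encode("utf-8")  # b'\xc3\xa0'

# Decoding those bytes as ISO-8859-15 yields 'Ã' plus a no-break space,
# which an HTML serializer may write out as '&nbsp;'.
misread = utf8_bytes.decode("iso-8859-15")

# Reversing the wrong decode recovers the original character.
repaired = misread.encode("iso-8859-15").decode("utf-8")

print(utf8_bytes, repr(misread), repaired)
```

If this is what is happening, fixing the decoding step upstream (so that clean() receives a correctly decoded str) is likely to be less fragile than repairing the text afterwards.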
