
lm-watermarking's Introduction

Official implementation of the watermarking and detection algorithms presented in the papers:

"A Watermark for Large language Models" by John Kirchenbauer*, Jonas Geiping*, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

"On the Reliability of Watermarks for Large Language Models" by John Kirchenbauer*, Jonas Geiping*, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

Updates:


Implementation is based on the "logit processor" abstraction provided by the huggingface/transformers 🤗 library.

The WatermarkLogitsProcessor is designed to be readily compatible with any model that supports the generate API. Any model that can be constructed using the AutoModelForCausalLM or AutoModelForSeq2SeqLM factories should be compatible.

Repo contents

The core implementation is defined by the WatermarkBase, WatermarkLogitsProcessor, and WatermarkDetector classes in watermark_processor.py (a minimal implementation) and extended_watermark_processor.py (the more full-featured implementation, recommended). The demo_watermark.py script implements a gradio demo interface as well as a minimum working example in its main function, using the minimal version.

Details about the parameters and the detection outputs are provided in the gradio app markdown blocks as well as the argparse definition.

The homoglyphs.py and normalizers.py modules implement algorithms used by the WatermarkDetector. homoglyphs.py (and its raw data in homoglyph_data) is an updated version of the homoglyph code from the deprecated package described here: https://github.com/life4/homoglyphs. The experiments directory contains pipeline code that we used to run the original experiments in the paper. However, this is stale/deprecated in favor of the implementation in watermark_processor.py.

Demo Usage

As a quickstart, the app can be launched with default args (or deployed to a huggingface Space) using app.py which is just a thin wrapper around the demo script.

python app.py
gradio app.py # for hot reloading
# or
python demo_watermark.py --model_name_or_path facebook/opt-6.7b

How to Watermark - A short guide on watermark hyperparameters

What watermark hyperparameters are optimal for your task or for a comparison to new watermarks? We provide a brief overview of all important settings below, along with best practices for future work. This guide represents our current understanding of optimal settings as of August 2023, and so is a bit more up to date than our ICML 2023 conference paper.

TL;DR: As a baseline generation setting, we suggest default values of gamma=0.25 and delta=2.0. Reduce delta if text quality is negatively impacted. For the context width, h, we recommend a moderate value, i.e. h=4, and as a default PRF we recommend selfhash, though minhash can also be used. Reduce h if more robustness against edits is required. Note however that the choice of PRF only matters if h>1. The recommended PRF and context width can be easily selected by instantiating the watermark processor and detector with seeding_scheme="selfhash" (a shorthand for seeding_scheme="ff-anchored_minhash_prf-4-True-15485863", but do use a different base key if actually deploying). For detection, always run with --ignore-repeated-ngrams=True.

  1. Logit bias delta: The magnitude of delta determines the strength of the watermark. A sufficiently large value of delta recovers a "hard" watermark that encodes 1 bit of information at every token, but this is not an advisable setting, as it strongly affects model quality. A moderate delta in the range of [0.5, 2.0] is appropriate for normal use cases, but the strength of delta is relative to the entropy of the output distribution. Models that are overconfident, such as instruction-tuned models, may benefit from choosing a larger delta value. With finite delta values, the watermark strength is directly proportional to the (spike) entropy of the text and exp(delta) (see Theorem 4.2 in our paper, and the toy sketch of the bias after this list).

  2. Context width h: Context width is the length of the context which is taken into account when seeding the watermark at each location (see the seeding sketch after this list). The longer the context, the "more random" the red/green list partitions are, and the less detectable the watermark is. For private watermarks, this implies that the watermark is harder to discover via brute force (with an exponential increase in hardness with increasing context width h). In the limit of a very long context width, we approach the "undetectable" setting of https://eprint.iacr.org/2023/763. However, the longer the context width, the more brittle the watermark becomes: robustness to paraphrasing and other attacks decreases. In the limit of h=0, the watermark is independent of local context and, as such, it is minimally random, but maximally robust against edits (see https://arxiv.org/abs/2306.17439).

  3. Ignoring repeated ngrams: The watermark is only pseudo-random based on the local context. Whenever local context repeats, this violates the assumption that the PRNG numbers used to seed the green/red partition operation are drawn i.i.d. (see Sec. 4 in our paper for details). For this reason, p-values for text with repeated n-grams (n-gram here meaning context + chosen token) will be misleading. As such, detection should be run with --ignore-repeated-ngrams set to True (see the n-gram sketch after this list). An additional, detailed analysis of this effect can be found in http://arxiv.org/abs/2308.00113.

  4. Choice of pseudo-random function (PRF): This choice is only relevant if the context width h>1 and determines the robustness of the hash of the context against edits. In our experiments we find "min"-hash PRFs to be the most performant in striking a balance between maximizing robustness and minimizing impact on text quality. In comparison to a PRF that depends on the entire context, this PRF only depends on a single, randomly chosen token from the context (see the min-hash sketch after this list).

  5. Self-Hashing: It is possible to extend the context width of the watermark onto the current token. This effectively extends the context width "for free" by one. The only downside is that this approach requires hashing all possible next tokens, applying the logit bias only to those tokens that, when included in the context, produce a hash that places them on their own green list (see the self-hash sketch after this list). This is slow in the way we implement it, because we use CUDA's pseudorandom number generator and a simple inner-loop implementation, but in principle it has a negligible cost compared to generating new tokens if engineered for deployment. A generalized algorithm for self-hashing can be found as Alg. 1 in http://arxiv.org/abs/2306.04634.

  6. Gamma: Gamma denotes the fraction of the vocabulary that will be in each green list. We find gamma=0.25 to be slightly more optimal empirically, but this is a minor effect and reasonable values of gamma between 0.25 and 0.75 will lead to a reasonable watermark. An intuitive argument for choosing a lower gamma is that it makes it easier to achieve a fraction of green tokens sufficiently higher than gamma to reject the null hypothesis (see the z-score sketch after this list).

  7. Base Key: Our watermark is salted with a small base key of 15485863 (the millionth prime). If you deploy this watermark, we do not advise re-using this key.
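
The toy sketches below illustrate the settings above. They are self-contained illustrations with made-up hash functions and stand-in variables, not the repo's implementation. First, the logit bias delta (item 1): a soft bias simply adds delta to the scores of green-list tokens before sampling.

import torch

def apply_soft_bias(logits: torch.Tensor, greenlist_ids: torch.Tensor, delta: float) -> torch.Tensor:
    # Add `delta` to the scores of green-list tokens; the "hard" watermark is the limit delta -> infinity.
    biased = logits.clone()
    biased[greenlist_ids] += delta
    return biased

# Toy usage on a 10-token vocabulary with a hypothetical green list (roughly gamma=0.3).
logits = torch.randn(10)
greenlist_ids = torch.tensor([1, 4, 7])
biased_logits = apply_soft_bias(logits, greenlist_ids, delta=2.0)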
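
Context width and seeding (item 2): a context of h tokens seeds a PRNG that draws the green/red partition, so identical contexts always reproduce the same partition. The additive hash below is purely illustrative; the repo offers several PRFs.

import torch

def greenlist_from_context(context_ids, vocab_size, gamma, hash_key=15485863):
    # Seed a PRNG from the h most recent tokens and draw a green list of size gamma*|V|.
    seed = hash_key * (sum(context_ids) + 1)          # toy hash, illustration only
    gen = torch.Generator().manual_seed(seed % (2**31 - 1))
    greenlist_size = int(gamma * vocab_size)
    return torch.randperm(vocab_size, generator=gen)[:greenlist_size]

# The last h=4 tokens of the text seed the partition for the next position.
green = greenlist_from_context(context_ids=[523, 88, 1204, 7], vocab_size=50265, gamma=0.25)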
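
Repeated n-grams (item 3): with ignore_repeated_ngrams=True, each distinct (context, token) n-gram is scored once, so a highly repetitive text contributes far fewer observations than its raw token count suggests. A sketch with hypothetical token ids:

def unique_ngrams(token_ids, h):
    # Collect the distinct (h context tokens + chosen token) n-grams of a sequence.
    seen = set()
    for i in range(h, len(token_ids)):
        seen.add(tuple(token_ids[i - h:i + 1]))
    return seen

repetitive = [5, 9, 2, 5, 9, 2, 5, 9, 2]
print(len(unique_ngrams(repetitive, h=2)))  # 3 distinct n-grams instead of 7 scored positions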
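
Min-hash PRF (item 4): each context token is hashed independently and the minimum is kept, so the seed depends on a single token of the context and many single-token edits leave it unchanged. The hash here is made up for the example.

def minhash_prf(context_ids, hash_key=15485863):
    # Hash each context token separately and keep the minimum as the seed.
    return min((hash_key * t) % (2**61 - 1) for t in context_ids)

seed_a = minhash_prf([523, 88, 1204, 7])
seed_b = minhash_prf([523, 999, 1204, 7])   # one context token edited
print(seed_a == seed_b)                      # True here: the minimizing token (7) was untouched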
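
Self-hashing (item 5): each candidate next token is temporarily appended to the context, the partition is re-seeded, and the bias is applied only if the candidate lands on its own green list. This is a naive inner-loop sketch reusing the toy hash from the seeding sketch, not the repo's Alg. 1.

import torch

def selfhash_bias(logits, context_ids, delta, gamma, hash_key=15485863):
    # For every candidate token, seed the partition with (context + candidate) and
    # bias the candidate only if that partition places it on the green list.
    vocab_size = logits.shape[-1]
    greenlist_size = int(gamma * vocab_size)
    biased = logits.clone()
    for candidate in range(vocab_size):          # this loop is what makes self-hashing slow
        seed = hash_key * (sum(context_ids) + candidate + 1)
        gen = torch.Generator().manual_seed(seed % (2**31 - 1))
        green = torch.randperm(vocab_size, generator=gen)[:greenlist_size]
        if (green == candidate).any():
            biased[candidate] += delta
    return biased

# Toy usage on a 16-token vocabulary; real vocabularies are ~50k tokens.
biased = selfhash_bias(torch.randn(16), context_ids=[3, 7, 11], delta=2.0, gamma=0.25)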
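
Gamma and detection (item 6): the detector's z-statistic compares the observed number of green tokens to the gamma fraction expected under the null hypothesis; for a fixed green count, a smaller gamma yields a larger z-score. The sketch below is consistent with the one-proportion test described in the paper, up to implementation details.

import math

def green_fraction_z_score(num_green, num_scored, gamma):
    # z-score of the observed green count under the null hypothesis that
    # un-watermarked text hits the green list with probability gamma.
    expected = gamma * num_scored
    std = math.sqrt(num_scored * gamma * (1 - gamma))
    return (num_green - expected) / std

print(green_fraction_z_score(num_green=120, num_scored=200, gamma=0.25))  # ~11.4
print(green_fraction_z_score(num_green=120, num_scored=200, gamma=0.5))   # ~2.8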

How to use the watermark in your own code.

Our implementation can be added to any huggingface generation pipeline as an additional LogitsProcessor; only the WatermarkLogitsProcessor and WatermarkDetector classes from the extended_watermark_processor.py file are required.

Example snippet to generate watermarked text:

from extended_watermark_processor import WatermarkLogitsProcessor
from transformers import LogitsProcessorList  # used to wrap the processor for model.generate below
# (assumes `model` and `tokenizer` have already been loaded, e.g. via the AutoModel* and AutoTokenizer factories)

watermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),
                                               gamma=0.25,
                                               delta=2.0,
                                               seeding_scheme="selfhash") #equivalent to `ff-anchored_minhash_prf-4-True-15485863`
# Note:
# You can turn off self-hashing by setting the seeding scheme to `minhash`.

tokenized_input = tokenizer(input_text, return_tensors='pt').to(model.device)
# note that if the model is on cuda, then the input is on cuda
# and thus the watermarking rng is cuda-based.
# This is a different generator than the cpu-based rng in pytorch!

output_tokens = model.generate(**tokenized_input,
                               logits_processor=LogitsProcessorList([watermark_processor]))

# if decoder only model, then we need to isolate the
# newly generated tokens as only those are watermarked, the input/prompt is not
output_tokens = output_tokens[:,tokenized_input["input_ids"].shape[-1]:]

output_text = tokenizer.batch_decode(output_tokens, skip_special_tokens=True)[0]

Example snippet to detect watermarked text:

from extended_watermark_processor import WatermarkDetector

watermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),
                                        gamma=0.25, # should match original setting
                                        seeding_scheme="selfhash", # should match original setting
                                        device=model.device, # must match the original rng device type
                                        tokenizer=tokenizer,
                                        z_threshold=4.0,
                                        normalizers=[],
                                        ignore_repeated_ngrams=True)

score_dict = watermark_detector.detect(output_text) # or any other text of interest to analyze
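
The returned score_dict contains summary statistics such as num_tokens_scored, num_green_tokens, green_fraction, z_score, p_value, prediction, and confidence. A small illustrative way to inspect the most common fields (a sketch, not the canonical API):

print(f"z-score:        {score_dict['z_score']:.3f}")
print(f"p-value:        {score_dict['p_value']:.3e}")
print(f"green fraction: {score_dict['green_fraction']:.3f}")
print(f"prediction:     {score_dict['prediction']}")  # True when the z-score exceeds z_threshold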

To recover the main settings of the experiments in the original work (for historical reasons), use the seeding scheme simple_1 and set ignore_repeated_ngrams=False at detection time.

Contributing

Suggestions and PRs welcome 🙂

lm-watermarking's People

Contributors

jonasgeiping, jwkirchenbauer


lm-watermarking's Issues

/experiments problems

While running run_watermarking.py under /experiments, some problems occur:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I wonder if you can fix it.

Low GPU Utilization and Reduced Performance with Selfhash Watermarking on Vicuna-7b-v1.5-16k

Hello,

I've noticed that the GPU utilization is around 25% when using the selfhash watermarking feature with vicuna-7b-v1.5-16k. This seems to result in a performance about 5 times slower compared to lefthash under identical context width settings. I am currently operating on a single H100 GPU. Could you provide any suggestions or insights on how to address this performance issue?

Thank you!

about the model

It's hard to reach the baseline presented in the paper.
Also, opt-1.3b (the default setting) isn't good enough: the original output without the watermark is already poor (with many repeated sentences).
Could the authors share the experiment settings? Thanks.

Mismatch in vocab_size and scores Shape in WatermarkLogitsProcessor

While debugging app.py (which uses the model 'facebook/opt-125m'), I encountered an issue in watermark_processor.py. In the __call__ method of the WatermarkLogitsProcessor class, the shape of the received scores parameter is torch.Size([1, 50272]). However, self.vocab_size is 50265. This discrepancy does not result in any errors, but it does affect the proportion of the G/R list splitting, and ideally these values should be consistent.

Confused about a place in the paper

In Section 6 (Experiment), there is the following paragraph: "Watermark Strength vs Number of Tokens. Theory predicts that the type I and type II error rates of the watermark should decay to zero as the sequence length T increases." I am confused about how to theoretically prove that as T increases, both type I and type II errors approach 0. I am eagerly awaiting your response.

Is SelfHash agnostic?

In your readme, you mention that the only downside to the SelfHash scheme is that "this approach requires hashing all possible next tokens". However, I believe that the SelfHash scheme additionally requires any detector to have logit-level model access and thus does not fulfill the agnostic requirement by (Kuditipudi et al., 2023). Am I missing something?

Not a common Model Type in transformers.

I need to add a watermark to the large language model chatglm-6b from Tsinghua University. Unfortunately, the configuration file for chatglm-6b is in the ChatGLMConfig format, which is not a common model type in transformers. Therefore, I encounter an error: ValueError: Unrecognized configuration class <class 'transformers_modules.chatglm-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM. Can you provide some suggestions to help resolve this issue?

memory issue

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.69 GiB total capacity; 22.80 GiB already allocated; 188.94 MiB free; 22.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I hope you can propose a method that allows it to run on a single card (RTX 3090) with 24 GB VRAM, or provide the minimum required configuration for running it.

unable to run the demo

Hello!
I wanted to run the demo on Hugging Face but I got this error: Runtime error.

Also, when I tried running "python demo_watermark.py --model_name_or_path facebook/opt-6.7b", it gets stuck after "Generating with...".

Watermarking a Prompt

Is it possible to just watermark the input?
Say I want to see if I can watermark a prompt such as:
"You are an expert ML system engineer. How do I integrate X with Y"
to which I will get a similar prompt back, just watermarked?

thanks

watermark for OpenAI API calls?

Dear authors,
If I want to add a watermark when I directly call GPT API endpoints (as shown below) in product development, will the approach demonstrated in the paper also work for API calls? Any suggestions for how I can achieve this?

POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2024-06-01

{
"prompt": [
"tell me a joke about mango"
],
"max_tokens": 32,
"temperature": 1.0,
"n": 1
}

Inquiry

Hi, does the proposed watermark also work with other models? How about Llama 2?

Dependency versions

The demo is not easy to run on personal notebooks. It would be helpful to provide configuration instructions, and to explicitly document the environment dependencies, including versions.

Error when input text is longer than model's max_sequence

Hi John.
Thank you for the great work!

I noticed a bug in the legacy code in the ./experiments/ directory due to using the model_max_seq_len=None argument when tokenizing the input text. An error occurs on the C4 dataset for the sample index=74 as the tokenizer defaults to no truncation. (EDIT: using the example hyperparameter configuration in ./experiments/README.md)

This can be easily fixed by commenting out the above argument and uncommenting the model_max_seq_len=model.config.max_position_embeddings.

I just wanted to check whether using max_position_embeddings to truncate the input text is okay. Wondering why this was commented out 🤔.

Thanks.

Configurations for replicating the results in the paper

Hi!

I am trying to replicate some of the results in the paper to analyze the outputs, but I am getting different oracle PPL.
Do you have plans to release the configuration files (e.g. gen_table_meta.json, shell scripts) for the results in the paper?

For instance, I am trying to replicate one of the results in Figure 2 right (greedy) when using \sigma=2, \gamma=0.5, but I get PPL around 2.0, which is higher than expected.
This is my configuration. Would you help me out?

python run_watermarking.py \
    --model_name facebook/opt-1.3b \
    --dataset_name c4 \
    --dataset_config_name realnewslike \
    --max_new_tokens 200 \
    --min_prompt_tokens 50 \
    --limit_indices 500 \
    --input_truncation_strategy completion_length \
    --input_filtering_strategy prompt_and_completion_length \
    --output_filtering_strategy no_filter \
    --dynamic_seed markov_1 \
    --bl_proportion 0.5 \
    --bl_logit_bias 2.0 \
    --bl_type soft \
    --store_spike_ents True \
    --num_beams 1 \
    --use_sampling False \
    --sampling_temp 0.7 \
    --oracle_model_name facebook/opt-2.7b \
    --run_name example_run \
    --output_dir ./all_runs/${OUTPUT_DIR} \
    --load_prev_generations F --no_wandb T --all_gas_no_eos T  \
    --no_repeat_ngram_size 8 # tried 0 as well

I also uncommented this part following "In the runs using greedy and beam search decoding [...] we truncate all sequences to T = 200." in Section 6.

    output_length_filtered_dataset = generations_dataset.filter(output_check,
                                                                batched=False,
                                                                with_indices=False)

Thanks.

relatively high perplexity on baseline complete text & other datasets

I am currently using the official code for "On the Reliability of Watermarks for Large Language Models", testing on the C4 dataset. I found that the baseline completion PPL measured using the facebook/opt-6.7b oracle model is >9, which I think is relatively high. However, when I use llama models to generate watermarked and non-watermarked text, the perplexity is only around 2 or 3 (under default parameters). I am not sure whether I made a mistake that leads to such a large PPL on the human-written baseline. I am wondering if you could help check this.

Also, could you provide the hyperparameters used for generation and evaluation on the LFQA and Wikitext datasets? Thanks a lot!

Question about loading the dataset

Hello,

I have been trying to load the dataset using Huggingface's datasets library, but have run into an issue with a "FileNotFoundError" message like this:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/aiohttp/connector.py", line 1154, in _create_direct_connection
hosts = await asyncio.shield(host_resolved)
File "/opt/conda/lib/python3.7/site-packages/aiohttp/connector.py", line 880, in _resolve_host
addrs = await self._resolver.resolve(host, port, family=self._family)
File "/opt/conda/lib/python3.7/site-packages/aiohttp/resolver.py", line 38, in resolve
flags=socket.AI_ADDRCONFIG,
File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 792, in getaddrinfo
None, getaddr_func, host, port, family, type, proto, flags)
File "/opt/conda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/http.py", line 396, in _info
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/http.py", line 768, in _file_info
r = await session.get(url, allow_redirects=ar, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/aiohttp/client.py", line 536, in _request
req, traces=traces, timeout=real_timeout
File "/opt/conda/lib/python3.7/site-packages/aiohttp/connector.py", line 542, in connect
proto = await self._create_connection(req, traces, timeout)
File "/opt/conda/lib/python3.7/site-packages/aiohttp/connector.py", line 907, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/opt/conda/lib/python3.7/site-packages/aiohttp/connector.py", line 1166, in _create_direct_connection
raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host huggingface.co:443 ssl:default [Temporary failure in name resolution]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/watermarking/lm-watermarking/pilot.py", line 12, in
next(ds_iterator)
File "/opt/conda/lib/python3.7/site-packages/datasets/iterable_dataset.py", line 937, in iter
for key, example in ex_iterable:
File "/opt/conda/lib/python3.7/site-packages/datasets/iterable_dataset.py", line 113, in iter
yield from self.generate_examples_fn(**self.kwargs)
File "/root/.cache/huggingface/modules/datasets_modules/datasets/c4/df532b158939272d032cc63ef19cd5b83e9b4d00c922b833e4cb18b2e9869b01/c4.py", line 88, in _generate_examples
with gzip.open(open(filepath, "rb"), "rt", encoding="utf-8") as f:
File "/opt/conda/lib/python3.7/site-packages/datasets/streaming.py", line 70, in wrapper
return function(*args, use_auth_token=use_auth_token, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/datasets/download/streaming_download_manager.py", line 495, in xopen
file_obj = fsspec.open(file, mode=mode, *args, **kwargs).open()
File "/opt/conda/lib/python3.7/site-packages/fsspec/core.py", line 141, in open
out = self.enter()
File "/opt/conda/lib/python3.7/site-packages/fsspec/core.py", line 104, in enter
f = self.fs.open(self.path, mode=mode)
File "/opt/conda/lib/python3.7/site-packages/fsspec/spec.py", line 1043, in open
**kwargs,
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/http.py", line 340, in _open
size = size or self.info(path, **kwargs)["size"]
File "/opt/conda/lib/python3.7/site-packages/fsspec/asyn.py", line 86, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fsspec/asyn.py", line 66, in sync
raise return_result
File "/opt/conda/lib/python3.7/site-packages/fsspec/asyn.py", line 26, in _runner
result[0] = await coro
File "/opt/conda/lib/python3.7/site-packages/fsspec/implementations/http.py", line 404, in _info
raise FileNotFoundError(url) from exc
FileNotFoundError: https://huggingface.co/datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/realnewslike/c4-train.00000-of-00512.json.gz
I have tried to troubleshoot the issue by upgrading to the latest version of the datasets library, but the problem persists. However, I found that I can download the dataset directly from the link provided without any issues.

I was wondering if you have encountered this issue before and if you might have any suggestions on how to resolve it.

Thank you for your time and help!

Attack experiment

Hi,

Your work is very impressive and I am extremely interested in it. I notice that you mentioned "Attack experiment code" in "Pending Items". I would like to know if you are already working on the attack and where I can follow up on your progress.

Thanks.

problems with running code in ./experiments

After running ./experiments/run_watermarking.py 2 times with the parameters:
(1) --limit_indices 30 --bl_proportion 0.5 --bl_logit_bias 2.0 --oracle_model_name facebook/opt-1.3b (other parameters using the examples shown in readme.md)
(2) --limit_indices 30 --bl_proportion 0.5 --bl_logit_bias 2.0 --oracle_model_name facebook/opt-1.3b (other parameters using the examples shown in readme.md)
and then running ./experiments/process_rows.py (having modified the relevant paths), bugs like the following occur:
baseline input replacement bug occurred!
no_bl_output input replacement bug occurred!
w_bl_output input replacement bug occurred!

Hoping you can look into this.

Reliability Experiments

Hello,

I am trying to repeat the experiments in the paper using the figure notes, but I am not sure what data to extract, and how, for each note. I am currently focusing on the robustness experiments. It also seems the code is not generating or adding any tags to wandb.
I would really appreciate your guidance here.

Regards

How to calculate AUC?

Hi,

When detecting text, a score dictionary is returned. The fields include num_tokens_scored, num_green_tokens, green_fraction, z_score, p_value, z_score_at_T, prediction, and confidence. When calculating AUC, metrics.roc_curve(all_pred, all_true, pos_label=1) is used.

When the z_score of a text is greater than a threshold (for example, 4), we predict True. But all_pred is the probability of True. I don't know how all_pred is understood and calculated in this paper.

Maybe this question is relatively simple, but I really don't understand it. I look forward to your reply.

Thanks

Package Installation

Hi, I am trying to run the code run_watermark.py in the experiments folder. However, the modules associated with the watermark library do not import, and the library does not seem to exist. What can I do about this? Thanks!

Problem with watermark generation and detection in vLLM

When I use vLLM to accelerate the inference of my LLM (chatGLM3-6b), there is a problem: the input tokens in vLLM are not passed to the logits_processor (only output tokens are), which seems to align with the logic of watermark detection as well, because during detection we can only see the output content. However, in the example code of extended_watermark_processor.py, the input tokens are used as random seeds to generate the red and green lists at the beginning. My current solution is not to modify the output tokens if the output length is less than context_width. I wonder if this approach has a significant impact on the results theoretically.

In my experiments, it is true that some watermarks can be detected without the vLLM architecture but cannot be detected after using vLLM. However, I am not sure whether this is caused by the processing logic I mentioned above.

Demo Error

Hi,
I wanted to run the demo on hugging face but I received this error

Runtime error
Scheduling failure: not enough hardware capacity

Date for code release

I do not want to be a bother, but as this paper has achieved some fame from Computerphile and whatnot, I am personally interested in seeing the code. As yesterday was the planned day of release, I just wanted to ask for an updated release date.

About Experiments

Hello there,

I just wanted to start by saying that I really appreciate your work! Your contributions are very valuable, and I've found your methods for detecting generated content from large models to be particularly insightful.

I was wondering if I could clarify something with you. Is it correct for me to assume that the code in the root directory is the polished version of the code in the experiments directory (e.g. experiments/watermark.py -> watermark_processor.py)? Additionally, if I wanted to replicate the main experiments mentioned in the paper, could you provide some guidance on the process?

Thank you so much for your time, and I look forward to hearing from you soon.

About the attacking code

Could you provide some code implementing the T5 span attack mentioned in your paper?
(For beginners.)

Curious about your GPU specification

Hello, thanks for your great work!
While reading the paper, I became curious about your experimental setup.
I wanted to know the GPU type that you used for the experiments, but couldn't find it in the paper...
May I ask what type of GPU you used, and how long text completion generation (500 samples?) took in your environment?
Thanks!!
