xhan77 / ssd-lm
Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
I think this is interesting work and I would like to train SSD-LM. The implementation looks elegant. When will the training code be uploaded?
The pretrained diffusion LM outperforms GPT-2, which means a lot for diffusion model research in natural language processing. I am interested in using the pretrained diffusion LM on downstream tasks. May I ask if there is any plan to release the model, e.g., by adding it to Hugging Face?
Hi, how should the argument args.remove_noise_mode be configured? I am a bit confused; could you give some example values? Thanks.
Hi
Dear authors,
In line 230 of ssd_model_decode_fileio.py, it seems that you are calculating the loss on the predicted W_0 and the drifted W_0, instead of on W_t as in equation (22) of the paper. May I ask why you made this choice?
Thanks.
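For readers skimming the thread, the distinction being asked about is roughly whether the loss compares quantities at step 0 or at step t. In illustrative notation (this paraphrases the question, not Eq. 22 itself, using the standard forward-noising form):

$$\mathcal{L}_0 = \ell\big(\hat{w}_0,\, w_0\big) \qquad \text{vs.} \qquad \mathcal{L}_t = \ell\big(\hat{w}_t,\, w_t\big), \quad \text{where } w_t = \sqrt{\bar{\alpha}_t}\, w_0 + \sqrt{1-\bar{\alpha}_t}\, z.$$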
Hi, thanks for sharing your implementation. It is really helpful and easy to follow.
Besides prompt-based generation, controlled text generation is also conducted in your experiments; I believe this is called classifier guidance in the diffusion model literature. However, I could not find the code to run the controlled text generation.
Do you have any plan to open-source this?
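In the meantime, here is a minimal sketch of what classifier guidance on the simplex could look like (illustrative names only, not the repo's actual implementation; it assumes an attribute classifier that accepts relaxed token distributions):

```python
import torch

def guided_step(logits_t, classifier, target_label, guidance_weight=1.0):
    """Nudge intermediate diffusion logits toward an attribute label.

    Illustrative only: `classifier` is assumed to map relaxed token
    distributions (points on the simplex) to attribute logits.
    """
    logits_t = logits_t.detach().requires_grad_(True)
    probs = torch.softmax(logits_t, dim=-1)          # relaxed token distribution
    attr_logits = classifier(probs)                  # (batch, num_labels)
    log_p = torch.log_softmax(attr_logits, dim=-1)[..., target_label].sum()
    grad = torch.autograd.grad(log_p, logits_t)[0]   # d log p(label) / d logits
    # Shift the logits toward the target attribute before the next step.
    return (logits_t + guidance_weight * grad).detach()
```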
The dimension is the size of the vocabulary. With such a high-dimensional space, I doubt whether the training data or training time is enough to fully tune the model parameters.
Hi! Thanks for the paper! A few questions:
Hello,
I am trying to download the openwebtext dataset from huggingface, but I keep getting the following error:
Downloading data: 100%|██████████| 12.9G/12.9G [25:43<00:00, 8.35MB/s]
/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/download/download_manager.py:527: FutureWarning: 'num_proc' was deprecated in version 2.6.2 and will be removed in 3.0.0. Pass `DownloadConfig(num_proc=<num_proc>)` to the initializer instead.
warnings.warn(
Extracting data files: 100%|██████████| 20610/20610 [9:43:42<00:00, 1.70s/it]
Traceback (most recent call last):
File "ssd_process_data.py", line 485, in <module>
main()
File "ssd_process_data.py", line 369, in main
raw_datasets["train"] = load_dataset(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/load.py", line 1782, in load_dataset
builder_instance.download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 872, in download_and_prepare
self._download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 1649, in _download_and_prepare
super()._download_and_prepare(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/builder.py", line 985, in _download_and_prepare
verify_splits(self.info.splits, split_dict)
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 100, in verify_splits
raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=39769494896, num_examples=8013769, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=39769065791, num_examples=8013740, shard_lengths=[101000, 100000, 101000, 101000, 102000, 102000, 101000, 102000, 101000, 101000, 101000, 101000, 101000, 102000, 101000, 101000, 101000, 101000, 102000, 102000, 100000, 101000, 100000, 101000, 102000, 101000, 102000, 101000, 102000, 102000, 102000, 101000, 101000, 101000, 101000, 102000, 101000, 102000, 101000, 101000, 100000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 101000, 100000, 101000, 102000, 101000, 101000, 101000, 101000, 101000, 102000, 102000, 101000, 102000, 101000, 102000, 102000, 101000, 101000, 102000, 102000, 102000, 101000, 102000, 102000, 102000, 101000, 101000, 102000, 101000, 13740], dataset_name='openwebtext')}]
I have tried forcing a re-download of the dataset by passing the download_mode="force_redownload" parameter, but it yielded the same error.
I have also tried passing the ignore_verifications=True parameter, but this in turn yielded the following error:
raw_datasets["train"] = load_dataset(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/site-packages/datasets/load.py", line 1754, in load_dataset
verification_mode = VerificationMode(
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/enum.py", line 339, in __call__
return cls.__new__(cls, value)
File "/home/nlp/sloboda1/anaconda3/envs/ssdlm/lib/python3.8/enum.py", line 663, in __new__
raise ve_exc
ValueError: 'none' is not a valid VerificationMode
Has anyone encountered this problem, or does anyone know what I can do?
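In case it helps anyone hitting the same wall, a possible workaround sketch (untested; it assumes a `datasets` version where `verification_mode` replaced the deprecated `ignore_verifications` flag):

```python
from datasets import load_dataset

# Untested sketch: "no_checks" should skip the split-size verification
# that raises NonMatchingSplitsSizesError (assumes datasets >= 2.9).
raw_datasets = {}
raw_datasets["train"] = load_dataset(
    "openwebtext",
    split="train",
    verification_mode="no_checks",
)
```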
Hi, thanks for sharing the code and for the great work!
Reproducing the GPT-2 baseline results with the script loop_baseline_gpt2.sh requires the output file ctx25_trunc150_depth1_ctrlr0.0_step1000_topp0.0_sad_gen.jsonl. Could you also upload the files needed to run this script? It would help a lot in checking our outputs.
Thanks!
Would it be possible to share the generated text used to compute metrics for SSD-LM and baselines?
I am interested in doing some analysis of the outputs with respect to measures beyond those used in the paper and hope to avoid rerunning the full generation.
Thank you (and thank you for the generally well-documented code and interesting paper!).
Hi, thanks for the code and the great work!
The SSD-LM provided on Hugging Face is comparable to gpt2-medium; could a smaller version of SSD-LM, comparable to gpt2, be provided?
Hi Xiaochuan,
Thanks for your wonderful work. It is really eye-opening to come up with the diffusion process on the logits space instead of the token embeddings!
I have a few questions regarding the paper and the code. Could you kindly respond to them?
Thank you so much!
Hi,
Thank you for the great work and the well-documented code! I have a general question regarding the SSD decoding algorithm. In the paper you mention that "The DDPM decoding is designed for diffusion in a continuous space and failed to generate sensible outputs in our preliminary experiments based on simplexes.", with details in the appendix. What would explain the worse sampling performance of continuous DDPM decoding on simplexes, and what was the intuition behind designing the modified sampling procedure? For example, why does using a noise z instead of the deterministic z help?
Thank you in advance for your help!
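While waiting for the authors, here is my rough reading of the modified sampling step as a hedged sketch (the function name, the ±K projection constant, and the top-p default are my assumptions, not the authors' exact code):

```python
import torch

def ssd_like_reverse_step(pred_logits, alpha_bar_prev, big_k=5.0, top_p=0.9):
    """One reverse step as I understand it: sample a token from the top-p
    nucleus of the predicted distribution, project it to an almost-one-hot
    simplex point at +/-K, then re-noise with FRESH Gaussian noise z
    (instead of DDPM's deterministic posterior update)."""
    bsz, seqlen, vocab = pred_logits.shape
    probs = torch.softmax(pred_logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cum_probs = torch.cumsum(sorted_probs, dim=-1)
    sorted_probs[cum_probs - sorted_probs > top_p] = 0.0   # nucleus mask
    picks = torch.multinomial(sorted_probs.view(-1, vocab), num_samples=1)
    token_ids = sorted_idx.view(-1, vocab).gather(-1, picks)
    w0_hat = torch.full_like(probs.view(-1, vocab), -big_k)
    w0_hat.scatter_(-1, token_ids, big_k)                  # almost-one-hot at +/-K
    w0_hat = w0_hat.view(bsz, seqlen, vocab)
    z = torch.randn_like(w0_hat)                           # fresh noise each step
    return (alpha_bar_prev ** 0.5) * w0_hat + ((1 - alpha_bar_prev) ** 0.5) * z
```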