Tried another one:
from datasets import load_dataset
dataset = load_dataset('poloclub/diffusiondb', '2m_text_only')
Also an error:
ValueError Traceback (most recent call last)
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/builder.py:1588, in GeneratorBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)
1587 example = self.info.features.encode_example(record) if self.info.features is not None else record
-> 1588 writer.write(example, key)
1589 num_examples_progress_update += 1
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:488, in ArrowWriter.write(self, example, key, writer_batch_size)
486 self.hkey_record = []
--> 488 self.write_examples_on_file()
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:446, in ArrowWriter.write_examples_on_file(self)
442 batch_examples[col] = [
443 row[0][col].to_pylist()[0] if isinstance(row[0][col], (pa.Array, pa.ChunkedArray)) else row[0][col]
444 for row in self.current_examples
445 ]
--> 446 self.write_batch(batch_examples=batch_examples)
447 self.current_examples = []
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:551, in ArrowWriter.write_batch(self, batch_examples, writer_batch_size)
550 typed_sequence = OptimizedTypedSequence(col_values, type=col_type, try_type=col_try_type, col=col)
--> 551 arrays.append(pa.array(typed_sequence))
552 inferred_features[col] = typed_sequence.get_inferred_type()
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/pyarrow/array.pxi:236, in pyarrow.lib.array()
...
1605 e = e.__context__
-> 1606 raise DatasetGenerationError("An error occurred while generating the dataset") from e
1608 yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)
DatasetGenerationError: An error occurred while generating the dataset
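The final `DatasetGenerationError` is only a wrapper: as the traceback shows, `builder.py` re-raises with `raise DatasetGenerationError(...) from e`, so the underlying exception is still attached on `__cause__` (and `__context__`), even when the notebook output elides it with `...`. A minimal, stdlib-only sketch for printing the whole chain follows; `exception_chain` and `root_cause` are hypothetical helper names, and the demo uses a stand-in class so it runs without `datasets` installed.

```python
def exception_chain(exc):
    """Return exc followed by every exception chained behind it.

    Follows __cause__ (set by `raise ... from e`) first, otherwise
    __context__ (set implicitly when raising inside an except block).
    """
    chain = []
    seen = set()  # guard against reference cycles in the chain
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        exc = exc.__cause__ or exc.__context__
    return chain


def root_cause(exc):
    """Return the innermost exception in the chain."""
    return exception_chain(exc)[-1]


if __name__ == "__main__":
    # Stand-in for datasets' DatasetGenerationError (assumption: we only
    # mimic the `raise ... from e` pattern seen in the traceback).
    class FakeDatasetGenerationError(Exception):
        pass

    try:
        try:
            raise ValueError("low-level encoding failure")
        except ValueError as e:
            raise FakeDatasetGenerationError(
                "An error occurred while generating the dataset"
            ) from e
    except Exception as err:
        for link in exception_chain(err):
            print(f"{type(link).__name__}: {link}")
```

To apply it, wrap the failing call (`try: load_dataset('poloclub/diffusiondb', '2m_text_only')` then `except Exception as err: print(root_cause(err))`); the innermost exception is usually the one worth searching for.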
Hi @harrywang, sorry for the late reply. Do you have `Pillow` installed in your environment? I tried to load the dataset in a new Conda environment and got the following error message. I can successfully load the dataset once I run `pip install Pillow`.
Downloading and preparing dataset diffusiondb/large_first_1k to /Users/jaywang/.cache/huggingface/datasets/poloclub___diffusiondb/large_first_1k/0.9.1/547894e3a57aa647ead68c9faf148324098f47f2bc1ab6705d670721de9d89d1...
Downloading data: 100%|███████████████████████████████████████████████████████████████| 512M/512M [05:08<00:00, 1.66MB/s]
Generating train split: 0 examples [00:01, ? examples/s]
Traceback (most recent call last):
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1587, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1800, in encode_example
return encode_nested_example(self, example)
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1202, in encode_nested_example
{
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1203, in <dictcomp>
k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1257, in encode_nested_example
return schema.encode_example(obj) if obj is not None else None
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/image.py", line 84, in encode_example
raise ImportError("To support encoding images, please install 'Pillow'.")
ImportError: To support encoding images, please install 'Pillow'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/load.py", line 1757, in load_dataset
builder_instance.download_and_prepare(
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 860, in download_and_prepare
self._download_and_prepare(
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1611, in _download_and_prepare
super()._download_and_prepare(
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 953, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1449, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1606, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
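In the log above, the `ImportError` from `datasets/features/image.py` only surfaces after the 512M download has finished. A minimal sketch for checking beforehand, using only the standard library, is below. It assumes the package is importable under a different name than the one you install (Pillow is imported as `PIL`); `PIP_NAMES` and `missing_packages` are hypothetical helpers, not part of `datasets`.

```python
import importlib.util

# Hypothetical mapping from import name to the pip package that provides it.
# Pillow is imported as "PIL", so checking for a module named "Pillow" fails.
PIP_NAMES = {"PIL": "Pillow"}


def missing_packages(import_names):
    """Return the pip names of top-level modules that cannot be found."""
    missing = []
    for name in import_names:
        # find_spec returns None for an uninstalled top-level module
        # without actually importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(PIP_NAMES.get(name, name))
    return missing


if __name__ == "__main__":
    missing = missing_packages(["PIL"])
    if missing:
        print("Run: pip install " + " ".join(missing))
    else:
        print("Pillow is installed; image columns can be encoded.")
```

Running this before `load_dataset('poloclub/diffusiondb', ...)` turns a failure at the "Generating train split" step into an immediate, explicit message.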
I will close the issue for now. Feel free to re-open it if `pip install Pillow` doesn't solve your problem. 😺
Related Issues
- OSError: could not create decoder object
- DatasetGenerationError: An error occurred while generating the dataset | ValueError: NaTType does not support utcoffset
- How to use `dataset.load_dataset` with metadata if I downloaded my data using method 2?
- 1.5M DiffusionDB Aesthetic and Artifact Ratings
- "download -z" unzips all the images to the same directory
- Bug in download.py
- How can I use this Dataset to train a LLM on Huggingface?
- Bug in download.py
- Unable to download using HuggingFace `datasets` library
- 24gb GPU enough / supported?