Comments (3)

harrywang commented on May 16, 2024

I tried another configuration:

dataset = load_dataset('poloclub/diffusiondb', '2m_text_only')

It also fails with an error:

ValueError                                Traceback (most recent call last)
File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/builder.py:1588, in GeneratorBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)
   1587 example = self.info.features.encode_example(record) if self.info.features is not None else record
-> 1588 writer.write(example, key)
   1589 num_examples_progress_update += 1

File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:488, in ArrowWriter.write(self, example, key, writer_batch_size)
    486     self.hkey_record = []
--> 488 self.write_examples_on_file()

File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:446, in ArrowWriter.write_examples_on_file(self)
    442         batch_examples[col] = [
    443             row[0][col].to_pylist()[0] if isinstance(row[0][col], (pa.Array, pa.ChunkedArray)) else row[0][col]
    444             for row in self.current_examples
    445         ]
--> 446 self.write_batch(batch_examples=batch_examples)
    447 self.current_examples = []

File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/datasets/arrow_writer.py:551, in ArrowWriter.write_batch(self, batch_examples, writer_batch_size)
    550 typed_sequence = OptimizedTypedSequence(col_values, type=col_type, try_type=col_try_type, col=col)
--> 551 arrays.append(pa.array(typed_sequence))
    552 inferred_features[col] = typed_sequence.get_inferred_type()

File ~/sandbox/sd-prompt-analysis/venv/lib/python3.8/site-packages/pyarrow/array.pxi:236, in pyarrow.lib.array()
...
   1605         e = e.__context__
-> 1606     raise DatasetGenerationError("An error occurred while generating the dataset") from e
   1608 yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)

DatasetGenerationError: An error occurred while generating the dataset
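The middle of this traceback is elided ("..."), so the exception that actually triggered the DatasetGenerationError is not visible. Because builder.py re-raises with raise DatasetGenerationError(...) from e (line 1606 above), the original error is kept on the exception's __cause__ attribute. The sketch below is one way to print that underlying error. The helper names root_cause and load_and_report are hypothetical, not part of datasets or DiffusionDB.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Walk the explicit (__cause__) or implicit (__context__) chain to the innermost exception."""
    seen = set()  # guards against a cyclic exception chain
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def load_and_report(path: str, config: str):
    """Hypothetical wrapper: call datasets.load_dataset and, on failure,
    print the error that DatasetGenerationError wraps before re-raising."""
    # Imported lazily so root_cause() can be used without datasets installed.
    from datasets import load_dataset

    try:
        return load_dataset(path, config)
    except Exception as exc:
        cause = root_cause(exc)
        print(f"{type(exc).__name__}: {exc}")
        print(f"  root cause -> {type(cause).__name__}: {cause}")
        raise
```

For example, load_and_report("poloclub/diffusiondb", "2m_text_only") would print the pyarrow-level error that is hidden behind the "..." in the pasted traceback.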

xiaohk commented on May 16, 2024

Hi @harrywang, sorry for the late reply. Do you have Pillow installed in your environment? When I loaded the dataset in a fresh Conda environment, I got the error message below. After I ran pip install Pillow, the dataset loaded successfully.

Downloading and preparing dataset diffusiondb/large_first_1k to /Users/jaywang/.cache/huggingface/datasets/poloclub___diffusiondb/large_first_1k/0.9.1/547894e3a57aa647ead68c9faf148324098f47f2bc1ab6705d670721de9d89d1...
Downloading data: 100%|███████████████████████████████████████████████████████████████| 512M/512M [05:08<00:00, 1.66MB/s]
Generating train split: 0 examples [00:01, ? examples/s]
Traceback (most recent call last):
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1587, in _prepare_split_single
    example = self.info.features.encode_example(record) if self.info.features is not None else record
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1800, in encode_example
    return encode_nested_example(self, example)
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1202, in encode_nested_example
    {
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1203, in <dictcomp>
    k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/features.py", line 1257, in encode_nested_example
    return schema.encode_example(obj) if obj is not None else None
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/features/image.py", line 84, in encode_example
    raise ImportError("To support encoding images, please install 'Pillow'.")
ImportError: To support encoding images, please install 'Pillow'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/load.py", line 1757, in load_dataset
    builder_instance.download_and_prepare(
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 860, in download_and_prepare
    self._download_and_prepare(
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1611, in _download_and_prepare
    super()._download_and_prepare(
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 953, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1449, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/Users/jaywang/miniconda3/envs/temp-nlp/lib/python3.9/site-packages/datasets/builder.py", line 1606, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
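In this log, the full 512 MB archive finishes downloading before the missing dependency surfaces as an ImportError wrapped in a DatasetGenerationError. A minimal sketch, assuming you want to fail fast before calling load_dataset, is to check whether Pillow's import package (PIL) can be found. The helper names pillow_installed and require_pillow are hypothetical, not part of datasets or DiffusionDB.

```python
import importlib.util


def pillow_installed() -> bool:
    """Return True if the Pillow distribution (imported as ``PIL``) is importable."""
    return importlib.util.find_spec("PIL") is not None


def require_pillow() -> None:
    """Raise an actionable ImportError before any dataset download starts."""
    if not pillow_installed():
        raise ImportError(
            "Pillow is required to encode DiffusionDB images; run `pip install Pillow` first."
        )
```

Calling require_pillow() right before load_dataset("poloclub/diffusiondb", "large_first_1k") would report the missing package before any data is downloaded.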

xiaohk commented on May 16, 2024

I will close the issue for now. Feel free to re-open it if pip install Pillow doesn't solve your problem. 😺
