LayoutGPT's Issues

Are the CSS structures an input along with the condition and the image? How are the CSS structures for an example retrieved?

I have a couple of questions regarding the usage and retrieval of CSS structures in conjunction with conditions and images. I would appreciate some clarification on the following points:

  1. Are CSS structures considered an input alongside conditions and images? If so, how are they incorporated into the overall system?
  2. Could you provide guidance on how to retrieve the CSS structures for a specific example or scenario? What are the recommended methods or resources for accessing and using CSS structures effectively? (See the illustrative sketch below.)
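For context, LayoutGPT serializes each layout as a CSS-like structure inside its prompts. Below is a minimal illustrative sketch of such a serialization; the property names, their ordering, and the pixel scaling are assumptions for illustration, not the repo's exact schema.

def to_css(object_list, canvas_px=64):
    # object_list holds (name, [x, y, w, h]) pairs in normalized coordinates;
    # scale them to a canvas_px-sized canvas and emit one CSS-like rule each.
    lines = []
    for name, (x, y, w, h) in object_list:
        lines.append(
            f"{name} {{height: {int(h * canvas_px)}px; "
            f"width: {int(w * canvas_px)}px; "
            f"top: {int(y * canvas_px)}px; "
            f"left: {int(x * canvas_px)}px;}}"
        )
    return "\n".join(lines)

# Example: to_css([("dog", (0.1, 0.4, 0.5, 0.5))])
# -> 'dog {height: 32px; width: 32px; top: 25px; left: 6px;}'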

[email protected]
Saleh Ahmad (GitHub)

No such file or directory: 'gligen_checkpoints/checkpoint_generation_text.pth'

2D Image Layout Generation

The generated layout will be saved to ./llm_output/counting by default. To generate images based on the layouts, run

(layoutgpt) ubuntu@host13:~/nj/text-3d/LayoutGPT/gligen$ python gligen_layout_counting.py --file ../llm_output/counting/gpt4.counting.k-similar.k_8.px_64.json --batch_size 5

Images will be saved at ./generation_samples/counting/gpt4.counting.k-similar.k_8.px_64
Traceback (most recent call last):
  File "gligen_layout_counting.py", line 427, in <module>
    _main(args)
  File "gligen_layout_counting.py", line 410, in _main
    run(meta_list, args, starting_noise)
  File "/home/ubuntu/anaconda3/envs/layoutgpt/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "gligen_layout_counting.py", line 282, in run
    model, autoencoder, text_encoder, diffusion, config = load_ckpt(meta["ckpt"])
  File "gligen_layout_counting.py", line 74, in load_ckpt
    saved_ckpt = torch.load(ckpt_path)
  File "/home/ubuntu/anaconda3/envs/layoutgpt/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/ubuntu/anaconda3/envs/layoutgpt/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/ubuntu/anaconda3/envs/layoutgpt/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'gligen_checkpoints/checkpoint_generation_text.pth'

No such file or directory: 'gligen_checkpoints/checkpoint_generation_text.pth'

The README says that for GLIGEN, we should download the Box+Text checkpoint and put it under gligen/gligen_checkpoints. On that website I can only download diffusion_pytorch_model.bin, and I can't find which step downloads checkpoint_generation_text.pth. Can you help me?
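One possible way to obtain the checkpoint, assuming the Box+Text checkpoint is the diffusion_pytorch_model.bin hosted in GLIGEN's gligen/gligen-generation-text-box repository on Hugging Face (please verify the repo id against the GLIGEN README), is to download that file and copy it under the name the script expects:

import os
import shutil
from huggingface_hub import hf_hub_download

# Assumption: diffusion_pytorch_model.bin in gligen/gligen-generation-text-box
# is the Box+Text checkpoint; LayoutGPT's script looks for
# gligen_checkpoints/checkpoint_generation_text.pth, so copy it to that name.
path = hf_hub_download(repo_id="gligen/gligen-generation-text-box",
                       filename="diffusion_pytorch_model.bin")
os.makedirs("gligen_checkpoints", exist_ok=True)
shutil.copy(path, "gligen_checkpoints/checkpoint_generation_text.pth")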

Question about preparing data

Hi, thank you for your amazing work. I would like to try LayoutGPT for 3D indoor scene generation. Do I need to do the steps below?

  1. Download and unzip 3D-FRONT.zip and 3D-FUTURE-model.zip
  2. Preprocess the scenes to generate ground-truth views

Looking forward to your response, thanks!

GLIGEN checkpoint does not match

Hi,
I'm running into this issue in the layout to image step:

Traceback (most recent call last):
  File "/mnt/LayoutGPT/gligen/gligen_layout_counting.py", line 427, in <module>
    _main(args)
  File "/mnt/LayoutGPT/gligen/gligen_layout_counting.py", line 410, in _main
    run(meta_list, args, starting_noise)
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/LayoutGPT/gligen/gligen_layout_counting.py", line 282, in run
    model, autoencoder, text_encoder, diffusion, config = load_ckpt(meta["ckpt"])
  File "/mnt/LayoutGPT/gligen/gligen_layout_counting.py", line 85, in load_ckpt
    text_encoder.load_state_dict(saved_ckpt["text_encoder"])
  File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FrozenCLIPEmbedder:
    Unexpected key(s) in state_dict: "transformer.text_model.embeddings.position_ids"

I'm sure I used the right GLIGEN checkpoint as instructed, but I am not able to figure out why there are unexpected keys.
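For what it's worth, this particular unexpected key usually comes from a transformers version mismatch: newer versions no longer store the position_ids buffer in the CLIP text model's state dict, so a checkpoint saved with an older version carries one extra key. A minimal workaround sketch, assuming that buffer is the only mismatch:

import torch

def load_text_encoder_state(text_encoder, ckpt_path):
    # Drop the stale position_ids buffer before loading (assumption: it is
    # the only key that differs between transformers versions here).
    saved_ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = saved_ckpt["text_encoder"]
    state_dict.pop("transformer.text_model.embeddings.position_ids", None)
    text_encoder.load_state_dict(state_dict)  # or strict=False as a blunter fix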

How to visualize the 3D indoor scenes?

Hello, I would like to express my appreciation for your admirable work; the visualizations in the paper are quite impressive. I am curious how you created the 3D indoor scene renderings. Would it be possible for you to share the code or scripts you used? I would be grateful. Thank you.

BTW, may I ask about the timeline for open-sourcing?

About token length

When I run the script run_layoutgpt_3d.py, the following error occurs:

This model's maximum context length is 4097 tokens. However, your messages resulted in 4109 tokens. Please reduce the length of the messages.
Input too long. Will shrink the prompting examples
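The message suggests the assembled prompt slightly exceeds the model's 4097-token context, after which the script drops in-context examples. A minimal sketch of that kind of shrinking logic, using the GPT-2 tokenizer as a rough token counter and a hypothetical build_prompt helper (not the repo's actual function):

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
MAX_CONTEXT = 4097       # context limit quoted in the error message
COMPLETION_BUDGET = 512  # rough room to leave for the generated layout

def shrink_examples(task_prompt, examples, build_prompt):
    # Drop the last in-context example until the prompt fits the window.
    while examples:
        prompt = build_prompt(task_prompt, examples)
        if len(tokenizer(prompt)["input_ids"]) <= MAX_CONTEXT - COMPLETION_BUDGET:
            return prompt
        examples = examples[:-1]
    return task_prompt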

Regarding the baseline approach based on LayoutTransformer [ICCV2021]

Hi Feng, Thank you for releasing the code of your great work!

As for one of the baseline approaches mentioned in the paper, i.e., the modified LayoutTransformer, are you planning to also release your re-implementation? It would be of great help, since the original LayoutTransformer does not take textual descriptions as input.

Best regards

Question about your supporting set

I cannot find any information about your supporting set, NSR-1K/counting/counting.train.json.

The only thing I know is that it is extracted from the MSCOCO-2017 dataset.

Could you give me more specific information about this JSON file (e.g., how you chose the images)?

IndexError: list index out of range

Hi! This is a great repo!

I have finished the data preparation steps and generated scenes with run_layoutgpt_3d.py.
I'm trying to visualize the generated 3D layouts by rendering the scenes:

basedir="/mnt/data/agic/data"

visualization_output_dir="./vis"
output_directory=$basedir"/output_pickle/threed_future_model_bedroom.pkl"
python render_from_file.py ../config/bedrooms_eval_config.yaml $visualization_output_dir $output_directory ../demo/floor_plan_texture_images \
    ../../llm_output/3D/gpt3.5.bedroom.k-similar.k_8.px_regular.json \
    --up_vector 0,1,0 --camera_position 2,2,2 --split test_regular --export_scene

I encountered the following error:

Traceback (most recent call last):
  File "render_from_files.py", line 383, in <module>
    main(sys.argv[1:])
  File "render_from_files.py", line 318, in main
    renderables, trimesh_meshes = get_textured_objects(
  File "/mnt/data/agic/LayoutGPT/ATISS/scene_synthesis/utils.py", line 31, in get_textured_objects
    raw_mesh = TexturedMesh.from_file(furniture.raw_model_path)
  File "/home/chenzheng/anaconda3/envs/atiss/lib/python3.8/site-packages/simple_3dviz/renderables/textured_mesh.py", line 300, in from_file
    mtl = read_material_file(mesh.material_file)
  File "/home/chenzheng/anaconda3/envs/atiss/lib/python3.8/site-packages/simple_3dviz/io/__init__.py", line 27, in read_material_file
    return {
  File "/home/chenzheng/anaconda3/envs/atiss/lib/python3.8/site-packages/simple_3dviz/io/material.py", line 25, in __init__
    self.read(filename)
  File "/home/chenzheng/anaconda3/envs/atiss/lib/python3.8/site-packages/simple_3dviz/io/material.py", line 113, in read
    self._Ns = float([
IndexError: list index out of range

I do not understand why this error occurred and don't know how to resolve it. Could you help me?

I found that simple_3dviz/io/material.py tried to read data from
3D-FUTURE-model/c2fecd9b-c61e-423a-a48d-08c63931cd1f/model.mtl
simple_3dviz assumes that model.mtl contains the specular exponent, namely a line starting with "Ns", but in fact it does not. How can I resolve this gap? (A possible patch sketch follows after the file contents below.)

The content of model.mtl:

newmtl solid_001_wire

d 1
Tr 0
Tf 1.000000 1.000000 1.000000
illum 2
Ka 0.000000 0.000000 0.000000
Kd 0.000000 0.000000 0.000000
Ks 0.313725 0.313725 0.313725
Ke 0.000000 0.000000 0.000000

map_Ka ./texture.png
map_Kd ./texture.png


newmtl solid_002_wire

d 1
Tr 0
Tf 1.000000 1.000000 1.000000
illum 2
Ka 0.196078 0.196078 0.196078
Kd 0.196078 0.196078 0.196078
Ks 0.705882 0.705882 0.705882
Ke 0.000000 0.000000 0.000000

map_Ka ./texture.png
map_Kd ./texture.png


newmtl solid_003_wire

d 1
Tr 0
Tf 1.000000 1.000000 1.000000
illum 2
Ka 0.500000 0.500000 0.500000
Kd 0.500000 0.500000 0.500000
Ks 0.117647 0.117647 0.117647
Ke 0.000000 0.000000 0.000000

map_Ka ./texture.png
map_Kd ./texture.png


newmtl solid_004_wire

d 1
Tr 0
Tf 1.000000 1.000000 1.000000
illum 2
Ka 0.500000 0.500000 0.500000
Kd 0.500000 0.500000 0.500000
Ks 0.098039 0.098039 0.098039
Ke 0.000000 0.000000 0.000000

map_Ka ./texture.png
map_Kd ./texture.png


newmtl solid_005_wire

d 1
Tr 0
Tf 1.000000 1.000000 1.000000
illum 2
Ka 0.500000 0.500000 0.500000
Kd 0.500000 0.500000 0.500000
Ks 0.137255 0.137255 0.137255
Ke 0.000000 0.000000 0.000000

map_Ka ./texture.png
map_Kd ./texture.png



Can't load tokenizer for 'gpt2'

Traceback (most recent call last):
  File "/data/users/yyy/LayoutGPT/run_layoutgpt_2d.py", line 20, in <module>
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
  File "/home/yyy/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
    init_kwargs = json.load(tokenizer_config_handle)
OSError: Can't load tokenizer for 'gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gpt2' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.

How can I fix it?
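This error usually means the machine cannot reach huggingface.co (or that a local directory named gpt2 is shadowing the hub id). One possible workaround, assuming network access is the problem, is to fetch the tokenizer once on a machine with access and load it from a local path:

from transformers import GPT2TokenizerFast

# On a machine with internet access: download and save the tokenizer files.
GPT2TokenizerFast.from_pretrained("gpt2").save_pretrained("./gpt2-tokenizer")

# On the offline machine (after copying ./gpt2-tokenizer over): load locally.
tokenizer = GPT2TokenizerFast.from_pretrained("./gpt2-tokenizer")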

Low Numerical Reasoning Accuracy

Hi,
After running the evaluation scripts, I got accuracy similar to the paper's except for numerical reasoning: only around 0.24 after running the GLIP evaluation.
I haven't checked the evaluation code in detail, but may I have a hint about what might be causing this issue?

[DOUBT] Issue in Bounding Boxes of NSR-1k dataset

Hi, I was visualising the bboxes of the NSR-1K dataset, and the boxes seem incorrect compared to the original COCO boxes.

import json

import matplotlib.pyplot as plt
from PIL import Image, ImageDraw

def read_json(path):
    with open(path) as f:
        return json.load(f)

counting = read_json("LayoutGPT/dataset/NSR-1K/counting/counting.train.json")
caps = read_json("COCO/annotations/captions_train2017.json")
image_info = caps["images"]

def visualize_data(idx):
    sample = counting[idx]
    img_id = sample["image_id"]
    # Look up the file name and image size in the COCO captions annotations.
    for k in image_info:
        if k["id"] == img_id:
            img_name = k["file_name"]
            H = k["height"]
            W = k["width"]
    print(img_id)
    img = Image.open(f"COCO/train2017/{img_name}").convert("RGB")
    draw = ImageDraw.Draw(img)
    for lst in sample["object_list"]:
        text = lst[0]
        # Scale the (assumed normalized) [x, y, w, h] box back to pixels.
        x, y, w, h = lst[1]
        x, y, w, h = x * W, y * H, w * W, h * H
        print(x, y, w, h)
        draw.rectangle([(x, y), (x + w, y + h)], outline=(255, 0, 0))
        print(text)
        draw.text((x, y - 10), text, (0, 0, 0))
    plt.imshow(img)
    plt.show()

The boxes for image_id = 45247 in counting.train.json, in (assumed) [x, y, w, h] format, are:
[0.75, 93.37, 273.27, 183.09] and [280.38, 112.05, 202.96, 147.68]
whereas the original annotations are:
[0.75, 56.71, 273.27, 274.91] and [280.38, 84.75, 202.96, 221.75]

The x coordinates match exactly, but the y coordinates are wrong in almost all the images. Here's an example: red boxes are taken from the NSR-1K dataset, green are from the MSCOCO ground truth.
I might be doing this wrong; any help is appreciated!

Cannot reproduce the result (low accuracy for counting evaluation)

Hi, thanks for the great work. However, I have problems reproducing the counting accuracy result.

  • I used the provided layout file "gpt4.counting.k-similar.k_8.px_64.json" under "llm_output/counting/" to generate images.
  • With the provided yml file I had a torch/torchvision conflict when running the evaluation code, so I used the Docker environment provided by GLIP and got the code to run successfully.
  • However, the counting accuracy from the evaluation is only 23%. I checked the images' GLIP outputs and the detection results seem unsatisfying; the layout accuracy evaluation performs well, though, so I assume it's not GLIP's problem.

Any hint about what might have caused the problem would be helpful!

missing dataset_stats.txt file in preprocessed dataset

Hi,

I am trying to run LayoutGPT on indoor scene generation and used the preprocessed dataset that you provided. Unfortunately, it seems to be missing a file, "dataset_stats.txt", which is called here. Could you please provide the missing file for the preprocessed dataset?

Thanks!

Why do we need to run preprocess_data from ATISS?

Hi there. Thank you for your wonderful work~ I am wondering why we need to preprocess the 3D layout dataset. Since your method is training-free, we only need to sample furniture from the 3D-FUTURE dataset and place it using the CSS code generated by your method. 😄

I am asking because ATISS's data-processing code is buggy.
