WonderJourney: Going from Anywhere to Everywhere

Getting Started

Installation

Installation requires a CUDA-compatible GPU. Running the code requires 24GB of GPU memory.

Clone the repo and create the environment:

git clone https://github.com/KovenYu/WonderJourney.git
cd WonderJourney
mamba create --name wonderjourney python=3.10
mamba activate wonderjourney

We use PyTorch3D for rendering. Run the following commands to install it, or follow the PyTorch3D installation guide (this may take some time).

mamba install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
mamba install -c bottler nvidiacub
mamba install pytorch3d -c pytorch3d

Install the rest of the requirements:

pip install -r requirements.txt

Download the English language model for spaCy:

python -m spacy download en_core_web_sm

Export your OpenAI API key (GPT-4 is used to generate scene descriptions):

export OPENAI_API_KEY='your_api_key_here'
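
Before starting a long run, it can help to confirm that the variable is actually visible to the Python process. The following is a minimal sketch of such a check; the helper name `require_api_key` is hypothetical and is not part of the repository.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or empty.

    Hypothetical helper for illustration only; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running run.py")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of failing later during the first GPT-4 request.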

Download the MiDaS DPT model and place it in the repository root directory.

wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt

Run examples

  • Example config file

    To run an example, first write a config file. An example config, ./config/village.yaml, is shown below:

    runs_dir: output/56_village
    
    example_name: village
    
    seed: -1
    frames: 10
    save_fps: 10
    
    finetune_decoder_gen: True
    finetune_decoder_interp: False  # Turn on this for higher-quality rendered video
    finetune_depth_model: True
    
    num_scenes: 4
    num_keyframes: 2
    use_gpt: True
    kf2_upsample_coef: 4
    skip_interp: False
    skip_gen: False
    enable_regenerate: True
    
    debug: True
    inpainting_resolution_gen: 512
    
    rotation_range: 0.45
    rotation_path: [0, 0, 0, 1, 1, 0, 0, 0]
    camera_speed_multiplier_rotation: 0.2

    The total number of frames in the generated example is num_scenes $\times$ num_keyframes. You can manually adjust rotation_path in the config file to control the camera's rotation state in each frame: $0$ means moving straight, $1$ means a right turn, and $-1$ means a left turn.

  • Run

    python run.py --example_config config/village.yaml

    You will see results in output/56_village/{time-string}_merged.
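
The frame arithmetic and the meaning of rotation_path described above can be sketched in a few lines of Python. The values are copied from the village.yaml example; the function names are hypothetical, and the check that rotation_path has exactly one entry per frame is an assumption inferred from the example (4 scenes, 2 keyframes, 8 entries), not something the repository is known to enforce.

```python
# Values copied from the example config ./config/village.yaml shown above.
config = {
    "num_scenes": 4,
    "num_keyframes": 2,
    "rotation_path": [0, 0, 0, 1, 1, 0, 0, 0],
}

# Meaning of each rotation_path value, as described in the README.
ROTATION_LABELS = {0: "straight", 1: "right turn", -1: "left turn"}


def total_frames(cfg):
    """Total frames as stated in the README: num_scenes x num_keyframes."""
    return cfg["num_scenes"] * cfg["num_keyframes"]


def describe_rotation_path(cfg):
    """Map every rotation_path entry to a readable label."""
    return [ROTATION_LABELS[value] for value in cfg["rotation_path"]]


def path_matches_frames(cfg):
    """Assumption: rotation_path should hold one entry per frame."""
    return len(cfg["rotation_path"]) == total_frames(cfg)
```

For the example config, `total_frames(config)` is 8, and the two `1` entries in rotation_path correspond to right turns at the fourth and fifth frames.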

How to add more examples?

We highly encourage you to add new images and try new things! You need to do the image-caption pairing separately (e.g., using DALL-E to generate the image and GPT-4V to generate the description).

  • Add a new image in ./examples/images/.

  • Add an entry for the new image in ./examples/examples.yaml.

    Here is an example:

    - name: new_example
      image_filepath: examples/images/new_example.png
      style_prompt: DSLR 35mm landscape
      content_prompt: scene name, object 1, object 2, object 3
      negative_prompt: ''
      background: ''

    • content_prompt lists the scene name first, followed by the objects: "scene name", "object 1", "object 2", "object 3"

    • negative_prompt and background are optional

    For a controlled journey, you also need to add control_text. Examples are as follows:

    - name: poem_jiangxue
      image_filepath: examples/images/60_poem_jiangxue.png
      style_prompt: black and white color ink painting
      content_prompt: Expansive mountainous landscape, old man in traditional attire, calm river, mountains
      negative_prompt: ""
      background: ""
      control_text: ["千山鸟飞绝", "万径人踪灭", "孤舟蓑笠翁", "独钓寒江雪"]
      
    - name: poem_snowy_evening
      image_filepath: examples/images/72_poem_snowy_evening.png
      style_prompt: Monet painting
      content_prompt: Stopping by woods on a snowy evening, woods, snow, village
      negative_prompt: ""
      background: ""
      control_text: ["Snowy Woods and Farmhouse: A secluded farmhouse, a frozen lake, a dense thicket, a quiet meadow, a chilly wind, a pale twilight, a covered bridge, a rustic fence, a snow-laden tree, and a frosty ground", "The Traveler's Horse: A restless horse, a jingling harness, a snowy mane, a curious gaze, a sturdy hoof, a foggy breath, a leather saddle, a woolen blanket, a frost-covered tail, and a patient stance", "Snowfall in the Woods: A gentle snowflake, a whispering wind, a soft flurry, a white blanket, a twinkling icicle, a bare branch, a hushed forest, a crystalline droplet, a serene atmosphere, and a quiet night", "Deep, Dark Woods in the Evening: A mysterious grove, a shadowy tree, a darkened sky, a hidden trail, a silent owl, a moonlit glade, a dense underbrush, a quiet clearing, a looming branch, and an eerie stillness"]

  • Write a config, config/new_example.yaml, modeled on ./config/village.yaml for the new example.

  • Run

    python run.py --example_config config/new_example.yaml
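
Before running, a new examples.yaml entry can be sanity-checked with a small script. The sketch below works on an entry that has already been loaded into a Python dict. Which keys count as required is an assumption inferred from the note above that negative_prompt and background are optional; `check_example_entry` is a hypothetical helper, not a function from the repository.

```python
# Assumed split between required and optional keys (inferred from the README).
REQUIRED_KEYS = ("name", "image_filepath", "style_prompt", "content_prompt")
OPTIONAL_KEYS = ("negative_prompt", "background", "control_text")


def check_example_entry(entry):
    """Return a list of problems found in one entry (empty list if none)."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if not entry.get(key)
    ]
    unknown = sorted(set(entry) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS))
    problems += [f"unknown key: {key}" for key in unknown]
    control_text = entry.get("control_text")
    if control_text is not None and not isinstance(control_text, list):
        problems.append("control_text should be a list of strings")
    return problems


# The entry from the README example above, written as a dict.
new_example = {
    "name": "new_example",
    "image_filepath": "examples/images/new_example.png",
    "style_prompt": "DSLR 35mm landscape",
    "content_prompt": "scene name, object 1, object 2, object 3",
    "negative_prompt": "",
    "background": "",
}
```

An empty result from `check_example_entry(new_example)` means the entry has the same shape as the examples shown above; it does not verify that the image file exists.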

Citation

@article{yu2023wonderjourney,
  title={WonderJourney: Going from Anywhere to Everywhere},
  author={Yu, Hong-Xing and Duan, Haoyi and Hur, Junhwa and Sargent, Kyle and Rubinstein, Michael and Freeman, William T and Cole, Forrester and Sun, Deqing and Snavely, Noah and Wu, Jiajun and Herrmann, Charles},
  journal={arXiv preprint arXiv:2312.03884},
  year={2023}
}

Acknowledgement

We thank the authors of SceneScape, MiDaS, SAM, Stable Diffusion, and OneFormer for sharing their code.

wonderjourney's Issues

Running on Windows 11

My computer: NVIDIA 4090, CUDA 11.6, cuDNN 8.9.7.29

Some missing dependencies:

conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install filelock
pip install spacy
pip install segment_anything
pip install openai==0.28.1
pip install ipdb
pip install pillow==9.5.0
pip install timm==0.6.7
pip install git+https://github.com/facebookresearch/segment-anything.git

Modify run.py, line 78:
seed = np.random.randint(0, high=2 ** 32 - 2, dtype=np.int64)
Modify util/utils.py, line 248:
with open(yaml_path, 'r',encoding="UTF-8") as file:
Modify run.py, line 69:
with open(Path(model.run_dir) / "config.yaml", "w",encoding="UTF-8") as f:
Modify run.py, line 199:
with open(save_root / 'regenerate_info.json', 'w',encoding='UTF-8') as json_file:
Modify models/models.py, line 575:
self.run_dir = run_dir_root / f"Interp-{dt_string}_{inpainting_prompt.replace(' ', '_').replace(':', '_')[:40]}"
Modify models/models.py, line 418:
self.run_dir = run_dir_root / f"Gen-{dt_string}_{inpainting_prompt.replace(' ', '_').replace(':','_')[:40]}"
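
The two models/models.py changes replace ':' in the prompt, which Windows does not allow in file or directory names. A slightly more general variant is sketched below: it replaces every character that Windows forbids in names, not only the colon. `safe_run_dir_name` is a hypothetical helper, and its parameters simply mirror the names used in the f-strings above.

```python
import re

# Characters that Windows does not allow in file or directory names.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_run_dir_name(prefix, dt_string, inpainting_prompt, max_len=40):
    """Build a run directory name that is also valid on Windows.

    Hypothetical helper: same shape as the patched f-strings, but every
    forbidden character in the prompt is replaced with '_'.
    """
    cleaned = _WINDOWS_FORBIDDEN.sub("_", inpainting_prompt.replace(" ", "_"))
    return f"{prefix}-{dt_string}_{cleaned[:max_len]}"
```

With this sketch, the patched lines would become something like `self.run_dir = run_dir_root / safe_run_dir_name("Gen", dt_string, inpainting_prompt)`.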

Run Error in the Docker

Thanks for the awesome project!
When I build an image from the Dockerfile below and run it on an EC2 instance, I get the following error.
I don't know what the solution is and would like your advice.

### Dockerfile ###
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

WORKDIR /app
COPY . /app

RUN apt-get update && apt-get install -y \
    wget \
    git \
    libgl1-mesa-glx \
    libglib2.0-0 \
    python3.10 \
    python3.10-venv \
    python3.10-dev \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip install --upgrade pip

COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
RUN pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

RUN pip install fvcore iopath
RUN pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu117_pyt201/download.html

RUN pip install git+https://github.com/facebookresearch/segment-anything.git
RUN pip install openai==0.28.1
RUN pip install ipdb
RUN pip uninstall -y timm
RUN pip install timm==0.6.7

RUN pip install spacy
RUN python3 -m spacy download en_core_web_sm

RUN mkdir -p /viscam/projects/wonderland/segment-anything && \
    wget -L -O /viscam/projects/wonderland/segment-anything/sam_vit_h_4b8939.pth https://huggingface.co/spaces/abhishek/StableSAM/blob/main/sam_vit_h_4b8939.pth

RUN wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt -P /app

ENV OPENAI_API_KEY='MY_OPENAI_API_KEY'

### Run output ###
$ python3 run.py --example_config config/village.yaml
running with seed: 3382461321.
preprocessor_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 6.82k/6.82k [00:00<00:00, 53.3MB/s]
/opt/venv/lib/python3.10/site-packages/transformers/utils/deprecation.py:165: UserWarning: The following named arguments are not valid for OneFormerImageProcessor.__init__ and were ignored: '_max_size'
  return func(*args, **kwargs)
coco_panoptic.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 5.85k/5.85k [00:00<00:00, 52.4MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 806/806 [00:00<00:00, 7.72MB/s]
vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 2.48MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.87MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 5.20MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 82.6k/82.6k [00:00<00:00, 242MB/s]
pytorch_model.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 879M/879M [01:31<00:00, 9.65MB/s]
/opt/venv/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
invalid load key, '-'.
> /opt/venv/lib/python3.10/site-packages/torch/serialization.py(1033)_legacy_load()
   1032 
-> 1033     magic_number = pickle_module.load(f, **pickle_load_args)
   1034     if magic_number != MAGIC_NUMBER:

ipdb>
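
The `invalid load key` message comes from pickle and means that the bytes being read are not a valid serialized object. One possible cause is a downloaded file that is not the actual checkpoint (for example, a web page saved under the checkpoint's name); which file is affected here is not clear from the log. The sketch below is a rough heuristic for inspecting the first bytes of a file. It assumes that checkpoints are either zip archives (the default `torch.save` format since PyTorch 1.6) or legacy pickle streams, and both helper names are hypothetical.

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # zip-based checkpoints (default torch.save format)
PICKLE_MAGIC = b"\x80"     # pickle protocol 2+ header, used by legacy checkpoints


def looks_like_checkpoint_bytes(head):
    """Heuristic: True if the leading bytes look like a zip archive or a pickle stream."""
    return head.startswith(ZIP_MAGIC) or head.startswith(PICKLE_MAGIC)


def looks_like_checkpoint(path):
    """Apply the heuristic to the first four bytes of the file at `path`."""
    with Path(path).open("rb") as f:
        return looks_like_checkpoint_bytes(f.read(4))
```

Running `looks_like_checkpoint` on each downloaded `.pth` or `.pt` file, together with a look at the file size, can quickly show whether a download produced something other than a model file.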

Code

Hi,
Incredible project! When will the code be released?

Provide some package versions

timm==0.6.7; otherwise it raises <Unexpected key(s) in state_dict: "pretrained.model.blocks.0.attn.relative_position_index",...>
Pillow==9.5.0; otherwise it raises <AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'>
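
To check an existing environment against these two pins, a short script using the standard library's `importlib.metadata` is enough. This is a minimal sketch; `find_mismatches` is a hypothetical helper, and the pin list contains only the versions reported in this issue.

```python
from importlib import metadata

# Versions reported to work in this issue.
PINNED = {"timm": "0.6.7", "Pillow": "9.5.0"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

`find_mismatches(PINNED)` returns an empty dict when both packages are installed at the pinned versions. The `get_version` parameter exists only so the function can be exercised without touching the real environment.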

where to find the results of point cloud?

Thanks for sharing your work. I followed the instructions to run the demo, and I can see the images and the final saved video in the results, but I couldn't find any files related to the 3D point cloud output in the results folder. Could you please advise where I can find them? Looking forward to your reply.

Missing segmentation model in installation guide?

Dear author,

I followed your installation instructions closely and then ran
python run.py --example_config config/village.yaml
but I got the following error:

File "/ssd/eric/code/WonderJourney/util/segment_utils.py", line 3, in <module>
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
ModuleNotFoundError: No module named 'segment_anything'

Then I found that SAM installation is not mentioned anywhere in the installation instructions. Could you give a hint? Thank you!
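
A missing module like this can be caught before launching run.py with a short import check. The sketch below uses `importlib.util.find_spec` from the standard library; the module list is an assumption assembled from the traceback above and the other packages mentioned on this page, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Top-level modules mentioned on this page; not an exhaustive dependency list.
REQUIRED_MODULES = ("segment_anything", "spacy", "timm", "openai")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

If `missing_modules()` lists `segment_anything`, the other issue on this page installs it with `pip install git+https://github.com/facebookresearch/segment-anything.git`.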
