
motion-x's People

Contributors

ailingzengzzz, jyuhao88, linghaochan, linjing7, shunlinlu


motion-x's Issues

Use slerp Interpolation for Alignment in face_motion_augmentation

In the face_motion_augmentation code, for samples where face_motion has fewer frames than motion, linear interpolation is currently used for alignment. Since face_motion uses the axis-angle (so3) representation, it is suggested to use slerp interpolation instead. The code could be modified as follows:

import torch
from pytorch3d.transforms import so3_exp_map, so3_log_map

def slerp(axisangle_left, axisangle_right, t):
    """Spherical linear interpolation."""
    # https://en.wikipedia.org/wiki/Slerp
    # t: (time - time_left) / (time_right - time_left), in (0, 1)
    assert (
        axisangle_left.shape == axisangle_right.shape
    ), "axisangle_left and axisangle_right must have the same shape"
    assert (
        axisangle_left.shape[-1] == 3
    ), "axisangle_left and axisangle_right must be axis-angle representations"
    assert (
        t.shape[:-1] == axisangle_left.shape[:-1]
    ), "t must have the same shape as axisangle_left and axisangle_right"

    main_shape = axisangle_left.shape[:-1]
    axisangle_left = axisangle_left.reshape(-1, 3)
    axisangle_right = axisangle_right.reshape(-1, 3)
    t = t.reshape(-1, 1)
    delta_rotation = so3_exp_map(
        so3_log_map(so3_exp_map(-axisangle_left) @ so3_exp_map(axisangle_right)) * t
    )

    return so3_log_map(so3_exp_map(axisangle_left) @ delta_rotation).reshape(*main_shape, 3)


def slerp_interpolate(motion, new_len):
    motion_len, n_joints, axisangle_dims = motion.shape

    new_t = torch.linspace(0, 1, new_len)
    timeline_idx = new_t * (motion_len - 1)
    timeline_idx_left = torch.floor(timeline_idx).long()
    timeline_idx_right = torch.clamp(timeline_idx_left + 1, max=motion_len - 1)

    motion_left = torch.gather(
        motion, 0, timeline_idx_left[:, None, None].expand(-1, n_joints, axisangle_dims)
    )
    motion_right = torch.gather(
        motion,
        0,
        timeline_idx_right[:, None, None].expand(-1, n_joints, axisangle_dims),
    )
    delta_t = timeline_idx - timeline_idx_left.float()

    new_motion = slerp(
        motion_left,
        motion_right,
        delta_t[:, None, None].expand(-1, n_joints, -1),
    )
    return new_motion

if motion_length != face_motion_length:
    face_motion = torch.from_numpy(face_motion)
    n_frames, n_dims = face_motion.shape
    n_joints = n_dims // 3
    face_motion = face_motion.reshape(n_frames, n_joints, 3)
    face_motion = slerp_interpolate(face_motion, motion_length)
    face_motion = face_motion.reshape(motion_length, -1).numpy()
else:
    (
        motion[:, 66 + 90 : 66 + 93],
        motion[:, 159 : 159 + 50],
        motion[:, 209 : 209 + 100],
    ) = (face_motion[:, :3], face_motion[:, 3 : 3 + 50], face_motion[:, 53:153])
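
A quick usage check for the functions above, with random small axis-angle rotations (shapes are illustrative only; requires pytorch3d):

dummy_face_motion = torch.randn(100, 1, 3) * 0.1   # 100 frames, 1 "joint" (e.g. the jaw), axis-angle
resampled = slerp_interpolate(dummy_face_motion, 160)
print(resampled.shape)                              # torch.Size([160, 1, 3])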

Mismatch of motion and text for humanml dataset

Hi Author,
After running humanml.py, the motion data .npy files have no corresponding txt files; in total, there are 26292 mismatched files. Do you have statistics on how many files are in each subset? Thanks

Feature Extractors for Evaluation

Hello, great work! And, thank you for sharing your code.

I'd like to train several motion generation models on Motion-X and evaluate them. Your paper says

We pretrain a motion feature extractor and a text feature extractor for the new motion representation with contrastive loss to map the text and motion into feature space and then evaluate the distance between the text-motion pairs.

I'd like to use your pretrained feature extractors (motion and text) and evaluation code. Do you have a plan to make them publicly available?
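
For reference, a minimal sketch of the contrastive (InfoNCE-style) objective the quoted passage describes, assuming batched, paired motion/text embeddings from two hypothetical encoders; this illustrates the general technique, not the authors' released evaluation code.

import torch
import torch.nn.functional as F

def contrastive_loss(motion_emb, text_emb, temperature=0.07):
    """motion_emb, text_emb: (B, D) embeddings of paired motion/text samples."""
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = motion_emb @ text_emb.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match each motion to its text and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with random placeholder embeddings:
loss = contrastive_loss(torch.randn(32, 512), torch.randn(32, 512))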

grab.py

I used your code to process the GRAB dataset and got results like this, where the body appears to float, so I thought something was wrong. I then modified pose_trans in grab.py to change the coordinate transformation from (x, y, z) to (x, z, -y), after which the result looked right. I would like to confirm this.
Also, thank you very much for posting the SMPL-X visualization sequences for all the datasets. In those, the GRAB dataset also has the problem of not being grounded; is that due to not converting pose_trans? Can this script be released?
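
For clarity, a minimal sketch of the axis remapping mentioned above, assuming pose_trans is an (n_frames, 3) array of global translations in (x, y, z) order; the exact change needed inside grab.py may differ.

import numpy as np

def remap_trans(pose_trans: np.ndarray) -> np.ndarray:
    # (x, y, z) -> (x, z, -y)
    x, y, z = pose_trans[:, 0], pose_trans[:, 1], pose_trans[:, 2]
    return np.stack([x, z, -y], axis=-1)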

AIST motions error

Hi! When I try to visualize AIST motions, they appear to have some problems. In my custom visualizer, the other motion datasets appear like this (in this example, from GRAB):

[screenshot: GRAB motion]

But the AIST motions appear like this:

[screenshot: AIST motion]

Can you please recheck the AIST motions?

Visualization as shown in the paper figures

Hi, thanks for the great work. I have a more basic question, and I would greatly appreciate it if you could answer it.
I just want to know how to render the results as the paper figures (e.g. Fig. 6(b)) show. Is there a rendering script you could provide, or is this rendered in software like Blender? Hope to get your help, thanks!

About the dataset

Thank you very much for your work, but I have some questions. How do I download the IDEA400 dataset? Does the IDEA400 dataset have corresponding picture or video data? And when will the video data of Motion-X be released?

Has the data been released?

Great job! This dataset is like ImageNet for 3D motion! It's definitely going to significantly boost performance on various motion-related tasks!

I noticed that you plan to release Motion-X by Sept. 2023. As of now, I cannot find anywhere to access the data. Could you please let me know whether the dataset has been released or not?

Why are face_expr params empty in face_motion_data?

Maybe I'm missing something here, but I wrote a script to output the face_expr params for all data in face_motion_data/GRAB, face_motion_data/EgoBody, and face_motion_data/humanml, and all of them are size [x, 0] arrays, meaning no data.

import os
import numpy as np
import torch

dir = 'MotionDiffuse/face_motion_data/smplx_322/GRAB'
for subdir, dirs, files in os.walk(dir):
    for file in files:
        if file.endswith('.npy'):
            motion = np.load(os.path.join(subdir, file))
            motion_data = torch.tensor(motion).float()
            motion_params = get_params(motion_data)  # splits the 322-dim vector into named parts (see the dict further below)
            print(motion_params['face_expr'].shape)

outputs:

(573, 0)
(250, 0)
(507, 0)
(216, 0)
(240, 0)
....
....

And more explicitly if I run:

dir = 'MotionDiffuse/face_motion_data/smplx_322/GRAB'
for subdir, dirs, files in os.walk(dir):
    for file in files:
        if file.endswith('.npy'):
            motion = np.load(os.path.join(subdir, file))
            motion_data = torch.tensor(motion).float()
            motion_params = get_params(motion_data)
            if motion_params['face_expr'].shape[1] != 0:
                print(os.path.join(subdir, file))

it outputs nothing i.e. shows all tensors are empty.

And for a sanity test you can see all the shapes for one example:

motion = np.load('MotionDiffuse/face_motion_data/smplx_322/EgoBody/recording_20210907_S02_S01_01/body_idx_1/003.npy')
motion = torch.tensor(motion).float()
motion_params = {
            'root_orient': motion[:, :3],  # controls the global root orientation
            'pose_body': motion[:, 3:3+63],  # controls the body
            'pose_hand': motion[:, 66:66+90],  # controls the finger articulation
            'pose_jaw': motion[:, 66+90:66+93],  # controls the jaw pose
            'face_expr': motion[:, 159:159+50],  # controls the face expression
            'face_shape': motion[:, 209:209+100],  # controls the face shape
            'trans': motion[:, 309:309+3],  # controls the global body position
            'betas': motion[:, 312:],  # controls the body shape. Body shape is static
        }
for key in motion_params.keys():
    print(key, motion_params[key].shape)

outputs:

root_orient torch.Size([124, 3])
pose_body torch.Size([124, 63])
pose_hand torch.Size([124, 87])
pose_jaw torch.Size([124, 0])
face_expr torch.Size([124, 0])
face_shape torch.Size([124, 0])
trans torch.Size([124, 0])
betas torch.Size([124, 0])

This shows that everything is actually empty except for root_orient, pose_body, and pose_hand.

Am I missing something?
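
One additional check that might help narrow this down (a small sketch reusing the path from the snippets above): print the raw column count of each file, since a file with fewer than 322 columns cannot contain face_expr or face_shape under the documented layout.

import os
import numpy as np

dir = 'MotionDiffuse/face_motion_data/smplx_322/GRAB'
for subdir, dirs, files in os.walk(dir):
    for file in files:
        if file.endswith('.npy'):
            n_dims = np.load(os.path.join(subdir, file)).shape[1]
            if n_dims != 322:
                print(os.path.join(subdir, file), n_dims)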

What would be orientation of the data and FPS

Awesome work! I wanted to know what the orientation of all the data is: for example, Z-up Y-forward (AMASS) or Y-up Z-forward (AIST++)? Also, are all the motions sampled at the same FPS?
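
For anyone converting between the two conventions, here is a minimal sketch (not from the repo) that rotates a Z-up, Y-forward sequence to Y-up, assuming axis-angle root orientations root_orient of shape (T, 3) and translations trans of shape (T, 3). Note that the resulting forward axis is -Z, so an extra 180° rotation about Y may be needed depending on the target convention.

import numpy as np
from scipy.spatial.transform import Rotation as R

def zup_to_yup(root_orient: np.ndarray, trans: np.ndarray):
    # Rotate the world by -90 degrees about X so that +Z (old up) maps to +Y.
    world_rot = R.from_euler('x', -90, degrees=True)
    new_orient = (world_rot * R.from_rotvec(root_orient)).as_rotvec()
    new_trans = world_rot.apply(trans)
    return new_orient, new_trans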

Any tips for generating sequence motion labels from videos?

"Meanwhile, we input the videos into Video-LLaMA [62] and filter the human action descriptions as supplemental texts."

Hi authors, I want to annotate sequence motion labels in my own video dataset. I tried Video-LLaMA, but the quality is bad; here is an example. Are these results similar to yours? Any tips for improving the quality of the labels? And how do you automatically filter the human action descriptions?

[screenshots of Video-LLaMA outputs]

BMLhandball

Thanks for this huge project. I have one question:
BMLhandball doesn't have an SMPL-X G version, only SMPL+H G.

What is face shape?

Hi, I just want to know the meaning of face shape (100-dim) in the data preprocessing scripts. Thanks.

Error in t2m raw offsets?

Hi! I have done something similar to you for my motion processing, and I have also used 000021 as the example motion, similar to HumanML3D. The raw offset is the general direction of the joint from its parent. However, if you visualize the motion (as shown below), you can see that the relative offset of the first finger joint is at -1 y from the wrist, similar to lines 94-99 of t2m_raw_body_offsets, while finger joints 2 and 3 are on the x-axis. However, you have the raw offsets of all finger joints on the x-axis, denoting that the fingers point along the x-axis for 000021. Can you recheck this?

[render of the example motion 000021]

This is what I have:

p - pinky, r - ring, m - middle, i - index, t - thumb
left hand
[0, -1, 0], # lp1
[-1, 0, 0], # lp2
[-1, 0, 0], # lp3
[0, -1, 0], # lr1
[-1, 0, 0], # lr2
[-1, 0, 0], # lr3
[0, -1, 0], # lm1
[-1, 0, 0], # lm2
[-1, 0, 0], # lm3
[0, -1, 0], # li1
[-1, 0, 0], # li2
[-1, 0, 0], # li3
[0, -1, 0], # lt1
[0, -1, 0], # lt2
[0, -1, 0], # lt3
right hand
[0, -1, 0], # rp1
[1, 0, 0], # rp2
[1, 0, 0], # rp3
[0, -1, 0], # rr1
[1, 0, 0], # rr2
[1, 0, 0], # rr3
[0, -1, 0], # rm1
[1, 0, 0], # rm2
[1, 0, 0], # rm3
[0, -1, 0], # ri1
[1, 0, 0], # ri2
[1, 0, 0], # ri3
[0, -1, 0], # rt1
[0, -1, 0], # rt2
[0, -1, 0], # rt3

plot_3d_global.py fails to run

What steps produce the final data used by plot_3d_global.py?
Also, what is the example object, and where is it defined?
Thank you.
Traceback (most recent call last):
  File "/app/Motion-X-main/Motion-X-main/tomato_represenation/plot_3d_global.py", line 362, in <module>
    joints = np.load(example)
NameError: name 'example' is not defined

AttributeError: 'map' object has no attribute 'reshape'

When I run raw_pose_processing.py, it raises an error:
AttributeError: 'map' object has no attribute 'reshape'
It seems this code was written for Python 2, where map() returns a list rather than a lazy map object.
The error occurs at line 337 in the smplx2joints.py file; the offending line is as follows.
vertices = output.vertices.reshape(batch_size, num_frames, 10475, 3)
I would like to know the environment configuration required for the scripts in the 'tomato_representation' folder.
Please let me know.
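
For what it's worth, a minimal, hypothetical sketch of a Python 3 workaround, assuming output.vertices arrives as a map object over per-frame vertex tensors (the actual structure in smplx2joints.py may differ):

import torch

def materialize_vertices(vertices_obj, batch_size, num_frames):
    # Hypothetical fix: consume the lazy map object before calling reshape.
    if isinstance(vertices_obj, map):
        vertices_obj = torch.stack(list(vertices_obj))
    return vertices_obj.reshape(batch_size, num_frames, 10475, 3)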

about the text descriptions (text_v1.1) augmented by Vicuna 1.5

Thank you very much for uploading the text descriptions augmented by Vicuna 1.5. However, it seems that these descriptions are not significantly different from the previous ones, and the modification date of the files is also October 24, 2023. I wonder if you might have uploaded the wrong files?

About tomato representation

Hi, thanks for your great work!
I tried to run the code in the tomato representation, but it seems there are other tasks to be done before that? Also, I wonder what environment setup is needed to obtain the tomato representation and to use body-only motion.

Framerate different in SMPL+H G and SMPL-X G

I find that the framerates differ between SMPL+H G and SMPL-X G in the AMASS data. For example, 009655 in HumanML3D, whose raw name is CMU/62/62_11, has a 60 fps framerate in SMPL+H G and 120 fps in SMPL-X G, while both have the same total of 3703 frames.

A different framerate will affect the text-motion alignment, because the caption is taken from the respective start frame to the end frame. What do you think about this issue?

And another question about the following code:

if 'humanact12' not in source_path:
    if 'Eyes_Japan_Dataset' in source_path:
        pose = pose[int(3*ex_fps):]
    if 'MPI_HDM05' in source_path:
        pose = pose[int(3*ex_fps):]
    if 'TotalCapture' in source_path:
        pose = pose[int(1*ex_fps):]
    if 'MPI_Limits' in source_path:
        pose = pose[int(1*ex_fps):]
    if 'Transitions_mocap' in source_path:
        pose = pose[int(0.5*ex_fps):]
    pose = pose[int(start_frame*1.5):int(end_frame*1.5)]

Is the [int(3*ex_fps):] slicing offset for the different datasets an empirical value of yours, or is there an official recommendation? The same question applies to pose = pose[int(start_frame*1.5):int(end_frame*1.5)].

Looking for your reply. Thanks very much!
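
For what it's worth, a minimal sketch of how such an index rescale can be expressed (an assumption on my side: the 1.5 factor looks like a framerate ratio such as 30/20, but the repo may intend something else):

def rescale_index(frame_idx: int, caption_fps: float, data_fps: float) -> int:
    """Map a frame index annotated at caption_fps onto a sequence stored at data_fps."""
    return int(round(frame_idx * data_fps / caption_fps))

# e.g. a caption spanning frames 40-120 at 20 fps maps to frames 60-180 at 30 fps
start, end = rescale_index(40, 20, 30), rescale_index(120, 20, 30)
print(start, end)  # 60 180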

A Question on transfer_to_body_only_humanml.py

Thank you for your priceless work. It really helps a lot.
It seems that a Motion-X data sequence is represented as an (nframes, 322) vector, but transfer_to_body_only_humanml.py contains
data_263 = np.concatenate((data[:, :4+(body_joints - 1)*3], data[:, 4+(joints - 1)*3:4+(joints - 1)*3+(body_joints - 1)*6], data[:, 4 + (joints - 1)*9: 4 + (joints - 1)*9 + body_joints*3], data[:, -4:]), axis=1)
with the variable joints=52, and the index 4+(joints-1)*9 is obviously bigger than 322, which causes an AssertionError in
assert data_263.shape[1] == 263
I failed to figure out this problem and am now dying for your assistance. OTZ
By the way, let me express my sincere gratitude again.
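
For reference, a quick sketch of the index arithmetic in that line, under the assumption (not confirmed by the text quoted here) that data is expected to be the 623-dim whole-body tomato feature (the HumanML3D-style layout extended to 52 joints) rather than the raw (nframes, 322) SMPL-X vector:

joints, body_joints = 52, 22

# Whole-body tomato feature width (HumanML3D-style layout):
# root (4) + ric (joints-1)*3 + rot (joints-1)*6 + local velocity joints*3 + foot contact (4)
dim_tomato = 4 + (joints - 1) * 3 + (joints - 1) * 6 + joints * 3 + 4
print(dim_tomato)                               # 623

# Widths of the four slices concatenated into data_263:
widths = [4 + (body_joints - 1) * 3,            # 67
          (body_joints - 1) * 6,                # 126
          body_joints * 3,                      # 66
          4]                                    # 4
print(sum(widths))                              # 263

# Highest index touched by the slicing:
print(4 + (joints - 1) * 9 + body_joints * 3)   # 529, which fits in 623 but exceeds 322

If that assumption holds, the 263-dim extraction only makes sense after the raw 322-dim vectors have been converted to the tomato representation.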

Question about 15.6M frame-level whole-body pose description

Hi authors, I am very amazed by your work.
I notice you use face recognition, posescript, and handscript to generate 15.6M frame-level descriptions, and I have some questions about it.

  1. How do you generate a frame-level description for an RGB video in which only part of the body is visible, for example: (a) only the upper body can be seen, (b) only the face and shoulders can be seen, (c) part of the body is self-occluded or is occluded by loose clothing or other obstacles?
  2. How do you generate a frame-level description for an RGB video with multiple persons? Or do you just delete all videos with multiple persons?
  3. Are these descriptions used in one of your experiments to validate that they are correct, or for a new application? Tab. 4 (text-driven motion generation) seems not to support frame-level descriptions, since those models always require at least 24 frames as input. Do you aggregate all frame-level descriptions into one video-level description (if yes, how do you aggregate them)? Is Tab. 6 the only experiment related to frame-level descriptions (and how do you compute the FID in Tab. 6)?

The whole-body pose description

Thanks for sharing your excellent work. I would like to follow your work, but the sequence-level semantic labels are unsatisfactory in some subsets of Motion-X. I'm wondering if you could release the whole-body pose descriptions.

Question about the data pipeline loss (Eq. 2).

Hi, thanks for the great work! I'm a bit confused about the data pipeline, e.g. Eq. (2). Based on my understanding, the initial human parameters are predicted directly by OSX. Afterward, when the model is updated, it predicts new human body parameters. I want to know whether the parameter loss is calculated between the initial parameters and each round of updated parameters. If so, how many epochs of training were conducted in the data fitting pipeline, and after how many epochs do you update the OSX model parameters? Hope to get your help, thanks!

Magic folder issues

In mocap-dataset-process/README.md

3. Perform face motion augmentation
In this step, we will perform face motion augmentation to replace the face motion, since these mocap datasets do not provide facial expressions. Notably, we keep the original jaw pose of the GRAB dataset.

Move the processed motion data to ../datasets/motion_data/smplx_322
mv EgoBody_motion ../datasets/motion_data/smplx_322/EgoBody
mv humanml ../datasets/motion_data/smplx_322/humanml
mv GRAB_motion ../datasets/motion_data/smplx_322/GRAB

The mv humanml step makes me confused.
What is this humanml folder, and how is it produced?

I checked back and forth in the README and other files, and didn't see how to prepare it.

And then mv GRAB_txt ../datasets/texts/semantic_labels/GRAB: I suppose GRAB_txt is the same thing as GRAB_text (specified in grab.py), right?

What's more interesting is the python aist.py step in the "Process EgoBody Dataset" instructions. Though you've fixed that mistake recently, it makes me wonder whether this README is a valid guide for running the final script.

Question regarding AMASS dataset

[screenshot of the download instructions]
Per the instructions, we need to download the SMPL-X G version of each sub-dataset, but on the AMASS website there is no SMPL-X G version of BMLhandball. Do we just skip that one, or are there other download sources?

Duplicated GRAB datasets in mocap-data-processing

Hi Author,
Thanks for your excellent work.
I found duplicated data in mocap-data:
the GRAB dataset already exists in the AMASS data, and the mocap-data-processing also requires downloading GRAB separately.
Are these two GRAB datasets the same? Thanks

about release time

Thank you for your excellent work! I have filled out the form, but I haven't received any feedback email yet. May I ask approximately when the download link will be released?

Dataset required to train the model

[screenshot of the dataset list] Are the datasets in the image all the ones required to train the model? Also, do we need a specific environment setup before training the model?
