
Comments (16)

kylevedder avatar kylevedder commented on July 18, 2024 2

ZeroFlow XL is the ZeroFlow pipeline with two changes:

  1. We use twice as much data, pulling unlabeled point clouds from the Argoverse 2 LiDAR dataset (I updated GETTING_STARTED.md with details)
  2. We use an enlarged student model; we quadruple the number of pseudoimage cells (512x512 to 1024x1024, by halving the pillar size in each dimension), we double the size of the point embedding vector (thereby making each layer of the UNet twice as wide), and we add another layer to the UNet (see the sketch right after this list). To contextualize the size of the model change: the normal student model weights are 79MB; the XL model weights are 1.3GB. I've pushed the new UNet backbone to the repo.
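
For concreteness, here is a minimal sketch of the Base-to-XL scaling. The config keys (`pillar_size`, `embed_dim`, `unet_depth`) and the 102.4 m side length are illustrative placeholders, not the repo's actual config fields:

```python
# Illustrative Base -> XL scaling; keys and the area bound are placeholders.
AREA_SIDE_M = 102.4  # assumed square area of interest, meters per side

base_cfg = {
    "pillar_size": AREA_SIDE_M / 512,  # -> 512 x 512 pseudoimage
    "embed_dim": 64,                   # per-pillar point embedding width (illustrative)
    "unet_depth": 4,                   # number of UNet levels (illustrative)
}

xl_cfg = {
    "pillar_size": base_cfg["pillar_size"] / 2,  # half-size pillars -> 1024 x 1024 grid, 4x the cells
    "embed_dim": base_cfg["embed_dim"] * 2,      # doubled embedding -> every UNet layer twice as wide
    "unet_depth": base_cfg["unet_depth"] + 1,    # one extra UNet layer
}

def grid_size(cfg):
    return round(AREA_SIDE_M / cfg["pillar_size"])

print(grid_size(base_cfg), grid_size(xl_cfg))  # 512 1024
```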

To be clear, like with ZeroFlow, ZeroFlow XL is using zero human labels. Our results are simply because we used more unlabeled data and added more parameters! We are able to beat the teacher model's performance and achieve state-of-the-art on the AV2 test set because our model has seen enough diverse data and is expressive enough to learn to distinguish noise from signal in the teacher labels.

On Sunday, when we got this result, I tweeted this cool updated graph showing we are doing better than the scaling laws fit to the normal student model predicted:

[Figure: updated scaling plot showing ZeroFlow XL outperforming the scaling law fit to the normal student model]

To further drive home this point, here are the raw results from our submissions to the AV2 Scene Flow test split:

If you look at the linked results, our XL model outperforms the teacher across all three categories of the Threeway EPE, but makes particularly large gains in the static foreground category. This means our model has learned to recognize a lack of motion better than NSFP is able to represent it: it has seen enough data to know that, in expectation, static objects should have zero flow, even if there's a bit of noise in the teacher labels, while also extracting, in expectation, what correct motion vectors look like for moving objects.
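
For readers new to the metric, here is a rough sketch of how a Threeway-EPE-style average can be computed. The bucket definitions and the dynamic threshold below are illustrative assumptions, not the official AV2 evaluation code:

```python
import numpy as np

def threeway_epe(pred_flow, gt_flow, is_foreground, dynamic_threshold=0.05):
    """pred_flow, gt_flow: (N, 3) per-point flow in meters per sweep; is_foreground: (N,) bool.

    Returns (threeway_epe, per_bucket_epe). The 0.05 m/sweep dynamic threshold and
    the treatment of all background points as static are illustrative choices.
    """
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)
    is_dynamic = np.linalg.norm(gt_flow, axis=1) > dynamic_threshold

    buckets = {
        "foreground_dynamic": is_foreground & is_dynamic,
        "foreground_static": is_foreground & ~is_dynamic,
        "background_static": ~is_foreground,  # dynamic background points treated as negligible here
    }
    per_bucket = {name: float(epe[mask].mean()) for name, mask in buckets.items() if mask.any()}
    return float(np.mean(list(per_bucket.values()))), per_bucket
```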

I also have more good news! I'm an idiot and forgot to run the XL model with our Speed Scaling feature enabled (Equation 5 from the paper), and I stopped training this model after only 5 epochs (with twice the data, this is akin to seeing ~10 epochs' worth of frames). This means that the XL model is undertrained, and it's missing a feature that provides free Foreground Dynamic EPE improvements (which substantially improve Threeway EPE). We are training a new XL model with these features enabled, and for more epochs, so we should hopefully get even better performance from our new student model.
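
As a hedged sketch of what such a speed-scaled loss looks like (the ramp shape and the constants below are placeholders, not the paper's exact Equation 5):

```python
import torch

def speed_scaled_loss(pred_flow, teacher_flow, low=0.4, high=1.0, low_weight=0.1):
    """Down-weight the (plentiful) near-static points so fast movers dominate the loss.

    The ramp shape and the `low`/`high`/`low_weight` constants are placeholders;
    see Equation 5 in the paper for the exact form.
    """
    speed = teacher_flow.norm(dim=-1)                      # (N,) pseudo-label speed
    ramp = ((speed - low) / (high - low)).clamp(0.0, 1.0)  # 0 below `low`, 1 above `high`
    weight = low_weight + (1.0 - low_weight) * ramp
    per_point_error = (pred_flow - teacher_flow).norm(dim=-1)
    return (weight * per_point_error).mean()
```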

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024 1

The student is the FastFlow3D model, which uses a PointPillars feature encoder that turns everything into a 2D bird's-eye view pseudoimage.

The voxelization function that I am using is provided by MMCV; it is more general and can be used for full 3D voxelization (e.g. SECOND / VoxelNet). I set the minimum and maximum point height in the config as you referenced, and the point clouds are chopped accordingly, so everything should land in a single very tall voxel (a point pillar, hence the name PointPillars) to form the pseudoimage. I added the referenced assert to validate this assumption when doing the voxelization for FastFlow3D; if anything had a Z index other than zero, it would mean the assumption is being violated, and that assert should trigger.
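
To illustrate the invariant that assert is checking, here is a plain-numpy sketch (not the MMCV op itself; the range bounds are illustrative):

```python
import numpy as np

# If the z voxel size spans the entire allowed point height, every in-range
# point gets z index 0, i.e. the "voxels" are pillars.
point_cloud_range = np.array([-51.2, -51.2, -3.0, 51.2, 51.2, 3.0])  # illustrative x/y/z bounds
voxel_size = np.array([0.2, 0.2, point_cloud_range[5] - point_cloud_range[2]])  # z size = full height

points = np.random.uniform(point_cloud_range[:3], point_cloud_range[3:], size=(1000, 3))
coords = np.floor((points - point_cloud_range[:3]) / voxel_size).astype(int)  # (x_idx, y_idx, z_idx)

assert (coords[:, 2] == 0).all(), "a point landed outside the single pillar layer"
```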

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024 1

Download the pre-trained model weights and run the evaluation on them to ensure that everything else is set up correctly.

If you're able to reproduce those test numbers, then there's something going on with the training run that we can dig into further.

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024 1

Thank you so much for sharing these! Looking forward to your updates.

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024 1

That's correct, I used NSFP to pseudolabel the Argoverse 2 LiDAR dataset subset. We have a large SLURM cluster with a bunch of old 2080 Tis, so the pseudolabeling only took a few days because I could parallelize across all of them.

I used data_prep_scripts/split_nsfp_jobs_sbatch.py to set up and launch these jobs for both the Sensor and LiDAR subset pseudolabeling.
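
The gist of that kind of job splitter, as a hedged sketch (not the actual repo script; the paths, SLURM flags, and the pseudolabel_nsfp.py entry point are illustrative placeholders):

```python
from pathlib import Path

def write_nsfp_job_scripts(dataset_root, out_dir, chunk_size=10):
    """Chunk the sequence directories and emit one sbatch script per chunk."""
    sequences = sorted(p.name for p in Path(dataset_root).iterdir() if p.is_dir())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(sequences), chunk_size):
        chunk = sequences[i:i + chunk_size]
        script = out_dir / f"nsfp_job_{i // chunk_size:04d}.sbatch"
        script.write_text(
            "#!/bin/bash\n"
            "#SBATCH --gres=gpu:1\n"
            "#SBATCH --time=24:00:00\n"
            + "".join(f"python pseudolabel_nsfp.py --sequence {seq}\n" for seq in chunk)
        )
```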

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024 1

As we discuss in the paper, our reported Threeway EPE for ZeroFlow is an average of three runs.

These weights are the ones highlighted in the weight repo README:

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run2

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run3

NSFP doesn't have trained weights; it's a test-time optimization method.

We have not uploaded the ZeroFlow XL weights; they are too large (1.2GB) and would require me to set up Git LFS.

from zeroflow.

yanconglin avatar yanconglin commented on July 18, 2024 1

Hi kylevedder,

Could you please let me know the number of samples in the processed Argo/Waymo datasets for the train/val/test splits, respectively? There seem to be several versions of Waymo scene flow datasets, such as PCAccumulation (ECCV 2022). The strategies for computing ground-truth flows are similar, but I wonder if there is a difference in scale. I cannot find this info in the paper or the supplement.

It also seems ego-motion compensation is used when creating the scene flow dataset, as mentioned in the supplement. Could you please share results WITHOUT ego-motion compensation, if you have any? So far my results show NSFP performs worse on dynamic objects when using ego-motion compensation on Waymo, and I'm not sure to what extent this impacts the distillation. Any insights from your side? Thank you!

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024 1

Dataset Details

For Argoverse2, I read the dataset straight from disk as downloaded, sans the minor folder rearrangement I discuss in the GETTING_STARTED.md.

For Waymo Open, the exact dataset version and labels are detailed in my GETTING_STARTED.md -- we use 1.4.2 and the standard flow labels provided on the Waymo Open website. We preprocess the data from the annoying .proto format into easy-to-read .pkl files, removing the ground plane from the point clouds along the way. For details, please read the preprocessing scripts discussed in the Getting Started; they are pretty easy to read, and frankly I do not remember all the nuances of my preprocessing.

ZeroFlow without ego motion compensation

We do not have any results for ZeroFlow / FastFlow3D without ego motion compensation. In principle we could train our feedforward model without ego compensation, but it's reasonable to assume decent-quality ego compensation is available at test time on modern service robot / autonomous vehicle stacks. Chodosh et al. 2023 make a fairly compelling case that ego compensation is broadly useful in general, so we decided to use it.
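
For clarity on what ego compensation means here, a minimal sketch (the pose variable names are illustrative):

```python
import numpy as np

def compensate_ego_motion(points_t, ego_to_city_t, ego_to_city_t1):
    """Express the sweep at time t in the ego frame at time t+1.

    points_t: (N, 3) points in the ego frame at t; the poses are 4x4 homogeneous
    ego-to-city transforms. With this compensation, static points get (near) zero
    flow and the network only has to explain actual object motion.
    """
    t1_from_t = np.linalg.inv(ego_to_city_t1) @ ego_to_city_t
    homogeneous = np.concatenate([points_t, np.ones((len(points_t), 1))], axis=1)
    return (homogeneous @ t1_from_t.T)[:, :3]
```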

NSFP without ego motion compensation

I don't have direct head-to-head NSFP results with and without ego compensation. In my early work using NSFP on Argoverse 2, I saw that Threeway EPE was better with compensation (which makes sense; it's an easier problem), and we ran with that on Waymo.

How much worse is NSFP on the dynamic bin? Do you have more details on what kinds of dynamic objects it performs worse on, and when? Are you doing ground removal? (This is basically mandatory to get NSFP to work; otherwise it fits a bunch of zero-flow vectors to the lidar returns on the ground.)
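
To illustrate the kind of filtering meant by ground removal, a crude height-threshold sketch (a stand-in, not the actual preprocessing code):

```python
import numpy as np

def remove_ground_by_height(points, ground_z=0.0, clearance=0.3):
    """Keep only points more than `clearance` meters above an assumed flat ground at `ground_z`.

    Real pipelines tend to use better ground estimates (e.g. a map-based ground
    height raster), but this shows the intent: drop the near-ground returns that
    NSFP would otherwise fit zero-flow vectors to.
    """
    points = np.asarray(points)
    return points[points[:, 2] > ground_z + clearance]
```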

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the author implementation of NSFP, but we use our own data loaders. The NSFP authors actually reached out to me to discuss dataloader details because our NSFP implementation (which is listed as the Baseline NSFP implementation on the Argoverse 2 Scene Flow leaderboard) significantly outperformed their own. Their entry is NP (NSFP); my NSFP implementation is the Host_67820_Team NSFP entry (the challenge organizers asked me to send them results for a strong baseline).

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024 1

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the author implementation of NSFP, but we use our own data loaders.

I found the same thing when I tried to reproduce the FastFlow3D result, which is the ZeroFlow teacher network. However, since I used the official dataloader inside av2-api, I lost some score. I will try to figure out why in the following days.

Thanks to @kylevedder, who mentioned one of the reasons over email (in case someone after me has the same problem, I've attached his words here):

If your numbers are significantly worse, then something is wrong. If that's the case, my first guess is that you trained on a single GPU using the given config, which is set up to train on 4x GPUs simultaneously and thus has the per-GPU batch size set to 16 instead of 64.
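
In other words, the config's batch size is per GPU. A quick sanity check of the arithmetic (the gradient-accumulation alternative below is just an illustration, not necessarily supported by the repo config):

```python
# Effective (global) batch size is per_gpu_batch_size * num_gpus.
per_gpu_batch_size = 16
num_gpus_in_config = 4
effective_batch = per_gpu_batch_size * num_gpus_in_config  # 64, what training expects

# On a single GPU you would need either batch_size=64, or batch_size=16 with
# 4 gradient-accumulation steps, to keep the same effective batch size.
single_gpu_batch, accumulation_steps = 16, effective_batch // 16
assert single_gpu_batch * accumulation_steps == effective_batch
```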

Thanks again to Kyle!

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024

I see, I saw your config sets the z voxel size to the full z range, so the z index will be 0 [len=1]. But why do that? Is a 2D grid better?

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024

Thanks for your reply. 🥰

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024

Thanks for your help! Appreciate it.

By the way, I saw there is a new ZeroFlow evaluation on the leaderboard:

  1. What does XL mean?
  2. Is the leaderboard result still from the ZeroFlow pipeline [which seems kind of weird, since it is above the teacher NSFP?], or is it FastFlow3D supervised with GT flow?

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024

One more question 😊: does the XL dataset also go through the ZeroFlow paper's pipeline, i.e. NSFP producing pseudo labels first on the LiDAR dataset, or is there a new pipeline for that?

from zeroflow.

Kin-Zhang avatar Kin-Zhang commented on July 18, 2024

@kylevedder sorry to bother you again, but could you specify which three models were used to get these three results (or maybe which one produced the Base ZeroFlow result)?

  • Base ZeroFlow results: Threeway EPE of 0.0814
  • NSFP results: Threeway EPE of 0.0684
  • ZeroFlow XL results: Threeway EPE of 0.0578

from zeroflow.

kylevedder avatar kylevedder commented on July 18, 2024

from zeroflow.

