CVPR 2022 Papers (Papers/Code/Demos)

OW-DETR: Open-world Detection Transformer
paper | code

Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation
paper | code

AdaMixer: A Fast-Converging Query-Based Object Detector (Oral)
paper | code

Multi-Granularity Alignment Domain Adaptation for Object Detection
paper | code

Interactron: Embodied Adaptive Object Detection
paper | code

Label, Verify, Correct: A Simple Few Shot Object Detection Method
paper

Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection
paper

QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
paper | code

End-to-End Human-Gaze-Target Detection with Transformers
paper

Progressive End-to-End Object Detection in Crowded Scenes
paper | code

Real-time Object Detection for Streaming Perception
paper | code

Oriented RepPoints for Aerial Object Detection (small object detection)
paper | code

Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
paper

Semantic-aligned Fusion Transformer for One-shot Object Detection
paper

A Dual Weighting Label Assignment Scheme for Object Detection
paper | code

MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
paper | code

SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
paper | code

Accelerating DETR Convergence via Semantic-Aligned Matching
paper | code

Focal and Global Knowledge Distillation for Detectors
keywords: Object Detection, Knowledge Distillation
paper | code

Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild
paper | code

Localization Distillation for Dense Object Detection
keywords: Bounding Box Regression, Localization Quality Estimation, Knowledge Distillation
paper | code

Video Object Detection

Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
paper

3D Object Detection

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
paper

Forecasting from LiDAR via Future Object Detection
paper | code

Point2Seq: Detecting 3D Objects as Sequences
paper | code

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection
paper | code

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
paper | code

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
paper | code

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
paper

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
paper | code

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
paper | code

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
paper | code

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
paper | code

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
paper | code

Point Density-Aware Voxels for LiDAR 3D Object Detection
paper | code

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement
paper | code

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
paper | code

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
keywords: 3D Object Detection with Point-based Methods, 3D Object Detection with Grid-based Methods, Cluster-free 3D Panoptic Segmentation, CenterPoint 3D Object Detection
paper

Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
keywords: Autonomous Driving, Monocular 3D Object Detection
paper | code

Human-Object Interaction (HOI) Detection

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions (Oral)
paper

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
paper

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer
paper

Camouflaged Object Detection

Implicit Motion Handling for Video Camouflaged Object Detection
paper

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
paper | code

Salient Object Detection

Bi-directional Object-context Prioritization Learning for Saliency Ranking
paper | code

Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection
paper

Keypoint Detection

UKPGAN: A General Self-Supervised Keypoint Detector
paper | code

Lane Detection

CLRNet: Cross Layer Refinement Network for Lane Detection
paper

Rethinking Efficient Lane Detection via Curve Modeling
keywords: Segmentation-based Lane Detection, Point Detection-based Lane Detection, Curve-based Lane Detection, autonomous driving
paper | code

Edge Detection

EDTER: Edge Detection with Transformer
paper | code

Vanishing Point Detection

Deep vanishing point detection: Geometric priors make dataset variations vanish
paper | code

Anomaly Detection

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection
paper | code

UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
paper | code

ViM: Out-Of-Distribution with Virtual-logit Matching (OOD detection)
paper | code

Generative Cooperative Learning for Unsupervised Video Anomaly Detection
paper

Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection (paper not yet uploaded)
paper | code

Segmentation

Image Segmentation

Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles (Oral)
paper

Progressive Minimal Path Method with Embedded CNN
paper

Revisiting Near/Remote Sensing with Geospatial Attention
paper

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
paper | code

CRIS: CLIP-Driven Referring Image Segmentation
paper

Hyperbolic Image Segmentation
paper

Panoptic Segmentation

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
paper | code

Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
keywords: Semantic and panoramic segmentation, Unsupervised domain adaptation, Transformer
paper | code

Semantic Segmentation

Semantic-Aware Domain Generalized Segmentation (Oral)
paper | code

FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation (Oral)
paper

WildNet: Learning Domain Generalized Semantic Segmentation from the Wild
paper | code

Rethinking Semantic Segmentation: A Prototype View (Oral)
paper | code

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
paper | code

Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
paper | code

Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
paper

Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
paper | code

Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
paper | code

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
paper | code

Scribble-Supervised LiDAR Semantic Segmentation
paper | code

ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation
paper

Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
paper

Representation Compensation Networks for Continual Semantic Segmentation
paper | code

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
paper | code

Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
paper | code

Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
paper | code

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
paper | code

Cross Language Image Matching for Weakly Supervised Semantic Segmentation
paper

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
paper | code

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
keywords: Semi-supervised learning, Semantic segmentation, Uncertainty estimation
paper | code

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
paper | code

Instance Segmentation

Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings
paper | code

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance
paper | code

Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
paper | code

Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?
paper

SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
paper

Sparse Instance Activation for Real-Time Instance Segmentation
paper | code

Mask Transfiner for High-Quality Instance Segmentation
paper | code

ContrastMask: Contrastive Learning to Segment Every Thing
paper

Discovering Objects that Can Move
paper | code

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
paper | code

Efficient Video Instance Segmentation via Tracklet Query and Proposal
paper

SoftGroup for 3D Instance Segmentation on Point Clouds
keywords: 3D Vision, Point Clouds, Instance Segmentation
paper | code

Video Object Segmentation

Language as Queries for Referring Video Object Segmentation
paper | code

Dense Prediction

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
paper | code

Video Processing

Bringing Old Films Back to Life
paper | code

Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
paper

Long-term Video Frame Interpolation via Feature Propagation
paper

Unifying Motion Deblurring and Frame Interpolation with Events
paper

Neural Compression-Based Feature Learning for Video Restoration
paper

Video Editing

M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers
paper

Video Generation/Video Synthesis

Depth-Aware Generative Adversarial Network for Talking Head Video Generation
paper | code

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
paper | code

Video Super-Resolution

Reference-based Video Super-Resolution Using Multi-Camera Video Triplets
paper | code

Estimation

Optical Flow/Motion Estimation

Global Matching with Overlapping Attention for Optical Flow Estimation
paper | code

CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
paper

Depth Estimation

Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
paper

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
paper | code

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (Oral)
paper | code

Learning Structured Gaussians to Approximate Deep Ensembles
paper

LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network (layout estimation)
paper | code

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
paper

Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
paper | code

RGB-Depth Fusion GAN for Indoor Depth Completion
paper

Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective
paper

Deep Depth from Focus with Differential Focus Volume
paper

ChiTransformer: Towards Reliable Stereo from Cues
paper

Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation and Focal Loss
paper | code

ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
keywords: Learning-based Stereo Matching Networks, Single Domain Generalization, Shortcut Learning
paper

Attention Concatenation Volume for Accurate and Efficient Stereo Matching
keywords: Stereo Matching, cost volume construction, cost aggregation
paper | code

Occlusion-Aware Cost Constructor for Light Field Depth Estimation
paper | [code](https://github.com/YingqianWang/OACC-Net)

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
keywords: Neural CRFs for Monocular Depth
paper

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
keywords: monocular depth estimation, transformer
paper

Human Parsing/Human Pose Estimation

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision
paper | code

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes
paper | code

Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization
paper | code

Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video
paper

Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors
paper

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
paper

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
paper | code

CDGNet: Class Distribution Guided Network for Human Parsing
paper

Forecasting Characteristic 3D Poses of Human Actions
paper

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
keywords: Top-Down Pose Estimation, Limb-based Grouping, Direct Regression
paper

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
paper

Gesture Estimation

ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
paper | code

Image Processing

Super-Resolution

High-Resolution Image Harmonization via Collaborative Dual Transformations
paper | code

Deep Constrained Least Squares for Blind Image Super-Resolution
paper

Local Texture Estimator for Implicit Representation Function
paper

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution
paper | code

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
paper | code

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel
paper | code

Reflash Dropout in Image Super-Resolution
paper

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
paper

HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
paper | code

HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
keywords: HSI Reconstruction, Self-Attention Mechanism, Image Frequency Spectrum Analysis
paper

Image Restoration/Image Enhancement/Image Reconstruction

HyperInverter: Improving StyleGAN Inversion via Hypernetwork
paper

Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
paper

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
paper

Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
paper

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
paper | code

Restormer: Efficient Transformer for High-Resolution Image Restoration
paper | code

Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
paper

Image Denoising/Deblurring/Deraining/Dehazing

CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image
paper | code

Unpaired Deep Image Deraining Using Dual Contrastive Learning
paper | code

AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
paper | code

IDR: Self-Supervised Image Denoising via Iterative Data Refinement
paper | code

Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots
paper | code

E-CIR: Event-Enhanced Continuous Intensity Recovery
keywords: Event-Enhanced Deblurring, Video Representation
paper | code

Image Editing/Inpainting

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
paper

High-Fidelity GAN Inversion for Image Attribute Editing
paper | code

Style Transformer for Image Inversion and Editing
paper | code

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting
paper | code

HairCLIP: Design Your Hair by Text and Reference Image
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
paper

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
keywords: Image Inpainting, Transformer, Image Generation
paper | code

Image Translation

Marginal Contrastive Correspondence for Guided Image Generation (Oral)
paper

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
paper | code

Globetrotter: Connecting Languages by Connecting Images
paper

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation
paper | code

FlexIT: Towards Flexible Semantic Image Translation
paper

Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
keywords: image translation, knowledge transfer, contrastive learning
paper

Style Transfer

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
paper | code

Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation
paper | code

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
paper | code

Style-ERD: Responsive and Coherent Online Motion Style Transfer
paper

CLIPstyler: Image Style Transfer with a Single Text Condition
keywords: Style Transfer, Text-guided synthesis, Language-Image Pre-Training (CLIP)
paper

Face

ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
paper

Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
paper

Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
paper | code

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
paper

FaceFormer: Speech-Driven 3D Facial Animation with Transformers
paper | code

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
paper | code

Face Recognition/Detection

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification
paper | code

Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin
paper | code

Privacy-preserving Online AutoML for Domain-Specific Face Detection
paper

An Efficient Training Approach for Very Large Scale Face Recognition
paper | code

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
paper | code

FENeRF: Face Editing in Neural Radiance Fields
paper

GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
paper

Sparse to Dense Dynamic 3D Facial Expression Generation
keywords: Facial expression generation, 4D face generation, 3D face modeling
paper

Face Forgery/Face Anti-Spoofing

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
paper | code

Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing
paper | code

Voice-Face Homogeneity Tells Deepfake
paper | code

Protecting Celebrities from DeepFake with Identity Consistency Transformer
paper | code

Object Tracking

Unsupervised Learning of Accurate Siamese Tracking
paper | code

Global Tracking Transformers
paper | code

Transforming Model Prediction for Tracking
paper | code

MixFormer: End-to-End Tracking with Iterative Mixed Attention
paper | code

Unsupervised Domain Adaptation for Nighttime Aerial Tracking
paper | code

Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
paper | [code](https://github.com/DLR-RM/3DObjectTracking)

TCTrack: Temporal Contexts for Aerial Tracking
paper | code

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
keywords: Single Object Tracking, 3D Multi-object Tracking / Detection, Spatial-temporal Learning on Point Clouds
paper

Correlation-Aware Deep Tracking
paper

Image & Video Retrieval/Video Understanding

Correlation Verification for Image Retrieval (Oral)
paper | code

It's About Time: Analog Clock Reading in the Wild
paper

Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
paper | code

Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
paper

Sketch3T: Test-Time Training for Zero-Shot SBIR
paper

Bridging Video-text Retrieval with Multiple Choice Questions
paper | code

BEVT: BERT Pretraining of Video Transformers
keywords: Video understanding, Vision transformers, Self-supervised representation learning, BERT pretraining
paper | code

Action/Activity Recognition, Detection, Segmentation and Localization

Revisiting Skeleton-based Action Recognition (Oral)
paper | code

UnweaveNet: Unweaving Activity Stories
paper | [code](https://github.com/willprice/activity-stories)

Dual-AI: Dual-path Action Interaction Learning for Group Activity Recognition (Oral)
paper

Detector-Free Weakly Supervised Group Activity Recognition
paper

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
paper | code

Unsupervised Pre-training for Temporal Action Localization Tasks
paper | code

Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
paper

How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
paper

E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
paper

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
paper | code

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
paper

Self-supervised Video Transformer
paper | code

Spatio-temporal Relation Modeling for Few-shot Action Recognition
paper | code

RCL: Recurrent Continuous Localization for Temporal Action Detection
paper

OpenTAL: Towards Open Set Temporal Action Localization
paper | code

End-to-End Semi-Supervised Learning for Video Action Detection
paper

Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
paper

Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
paper | code

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
keywords: Online action detection
paper

Person Re-Identification/Detection

Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
paper | [code](https://github.com/ftd-Wuchao/CCSFG)

Large-Scale Pre-training for Person Re-identification with Noisy Labels
paper | code

Part-based Pseudo Label Refinement for Unsupervised Person Re-identification
paper | code

Cascade Transformers for End-to-End Person Search
paper | code

Image/Video Captioning

Quantifying Societal Bias Amplification in Image Captioning
paper

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
paper

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
paper | code

Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
paper | code

Hierarchical Modular Network for Video Captioning
paper | code

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
paper

Medical Imaging

Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
paper | code

Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis
paper

DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
paper | code

ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
paper

Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural Networks
paper | code

Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
paper | code

Adaptive Early-Learning Correction for Segmentation from Noisy Annotations
keywords: medical-imaging segmentation, Noisy Annotations
paper | code

Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
keywords: Self-supervised Transformer, Temporal modeling of disease progression
paper

Text Detection/Recognition/Understanding

Text Spotting Transformers
paper | [code](https://github.com/mlpc-ucsd/TESTR)

Syntax-Aware Network for Handwritten Mathematical Expression Recognition
paper

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
paper | code

Fourier Document Restoration for Robust Document Dewarping and Recognition
paper | code

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
paper

Remote Sensing Images

Exploiting Temporal Relations on Radar Perception for Autonomous Driving
paper

GAN/Generative/Adversarial

GAN-Supervised Dense Visual Alignment (Oral)
paper | code

Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
paper | code

Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
paper | code

Feature Statistics Mixing Regularization for Generative Adversarial Networks
paper | code

Subspace Adversarial Training
paper | code

DTA: Physical Camouflage Attacks using Differentiable Transformation Network
paper | code

Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input
paper | code

Towards Practical Certifiable Patch Defense with Vision Transformer
paper

Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
paper

Enhancing Adversarial Training with Second-Order Statistics of Weights
paper | code

Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
paper

Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity
paper

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
paper

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer
paper

Adversarial Texture for Fooling Person Detectors in the Physical World
paper

Label-Only Model Inversion Attacks via Boundary Repulsion
paper

图像生成/图像合成(Image Generation/Image Synthesis)

Exemplar-based Pattern Synthesis with Implicit Periodic Field Network(基于示例的隐式周期场网络模式合成)
paper

Styleformer: Transformer based Generative Adversarial Networks with Style Vector(具有样式向量的基于 Transformer 的生成对抗网络)
paper | code

Modulated Contrast for Versatile Image Synthesis(用于多功能图像合成的调制对比度)
paper | code

Attribute Group Editing for Reliable Few-shot Image Generation(属性组编辑用于可靠的小样本图像生成)
paper | code

Text to Image Generation with Semantic-Spatial Aware GAN(使用语义空间感知 GAN 生成文本到图像)
paper | code

Playable Environments: Video Manipulation in Space and Time(可播放环境：空间和时间的视频操作)
paper | code

FLAG: Flow-based 3D Avatar Generation from Sparse Observations(从稀疏观察中生成基于流的 3D 头像)
paper

Dynamic Dual-Output Diffusion Models(动态双输出扩散模型)
paper

Exploring Dual-task Correlation for Pose Guided Person Image Generation(探索姿势引导人物图像生成的双任务相关性)
paper | code

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces(基于小批量特征交换的身体与人脸三维形状变分自编码器潜在解耦)
paper | code

Interactive Image Synthesis with Panoptic Layout Generation(具有全景布局生成的交互式图像合成)
paper

Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values(极性采样：通过奇异值对预训练生成网络的质量和多样性控制)
paper

Autoregressive Image Generation using Residual Quantization(使用残差量化的自回归图像生成)
paper | code

三维视觉(3D Vision)

Fast Point Transformer
paper

Towards Implicit Text-Guided 3D Shape Generation(迈向隐式文本引导的 3D 形状生成)
paper | code

The Neurally-Guided Shape Parser: Grammar-based Labeling of 3D Shape Regions with Approximate Inference(神经引导的形状解析器：具有近似推理的 3D 形状区域的基于语法的标记)
paper | code

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings(在 3D 网格中嵌入消息并从 2D 渲染中提取它们)
paper

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning(使用 Transformer 进行 3D 密集字幕的跨模态知识迁移)
paper

点云(Point Cloud)

REGTR: End-to-end Point Cloud Correspondences with Transformers(与 Transformer 的端到端点云匹配)
paper | code

Stratified Transformer for 3D Point Cloud Segmentation(用于 3D 点云分割的分层transformer)
paper | code

AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception(利用点云的径向对称性进行方位归一化 3D 感知)
paper | code

WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation(为对抗性 3D 点云生成扭曲多个均匀先验)
paper | code

IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment(通过深度嵌入对齐的动态 3D 点云插值)
paper | code

No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces(没有痛苦，收获很大：通过拟合特征级时空表面，用静态模型对动态点云序列进行分类)
paper | code

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation(通用 3D 零件分割的中间监督搜索)
paper

Geometric Transformer for Fast and Robust Point Cloud Registration(用于快速和稳健点云配准的几何transformer)
paper | code

Contrastive Boundary Learning for Point Cloud Segmentation(点云分割的对比边界学习)
paper | code

Shape-invariant 3D Adversarial Point Clouds(形状不变的 3D 对抗点云)
paper | code

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation(通过对抗旋转提高点云分类器的旋转鲁棒性)
paper

Lepard: Learning partial point cloud matching in rigid and deformable scenes(Lepard：在刚性和可变形场景中学习部分点云匹配)
paper | code

A Unified Query-based Paradigm for Point Cloud Understanding(一种基于统一查询的点云理解范式)
paper

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding(用于 3D 点云理解的自监督跨模态对比学习)
keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning
paper | code

三维重建(3D Reconstruction)

I M Avatar: Implicit Morphable Head Avatars from Videos(视频中的隐式可变形头部头像)(Oral)
paper

BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion(使用双层神经体积融合的密集 3D 重建)
paper

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video(从单目视频自我重建你的数字化身)(Oral)
paper | code

LISA: Learning Implicit Shape and Appearance of Hands(学习手的隐式形状和外观)
paper

BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information(通过利用品种信息学习从图像中回归 3D 狗形状)
paper | code

Uncertainty-Aware Deep Multi-View Photometric Stereo(不确定性感知深度多视图光度立体)
paper

Neural Reflectance for Shape Recovery with Shadow Handling(使用阴影处理进行形状恢复的神经反射)
paper | code

PLAD: Learning to Infer Shape Programs with Pseudo-Labels and Approximate Distributions(学习用伪标签和近似分布推断形状程序)
paper | code

ϕ-SfT: Shape-from-Template with a Physics-Based Deformation Model(具有基于物理的变形模型的模板形状)
paper | code

Input-level Inductive Biases for 3D Reconstruction(用于 3D 重建的输入级归纳偏差)
paper

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation(用于 3D 完成、重建和生成的形状先验)
paper

Interacting Attention Graph for Single Image Two-Hand Reconstruction(单幅图像双手重建的交互注意力图)
paper | code

OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction(实时动态 3D 重建的遮挡感知运动估计)
paper

Neural RGB-D Surface Reconstruction(神经 RGB-D 表面重建)
paper

Neural Face Identification in a 2D Wireframe Projection of a Manifold Object(流形物体二维线框投影中的神经网络面片识别)
paper | [code](https://manycore-research.github.io/faceformer)

Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers(使用伤口分割和重建生成 3D 生物可打印贴片以治疗糖尿病足溃疡)
keywords: semantic segmentation, 3D reconstruction, 3D bio-printers
paper

H4D: Human 4D Modeling by Learning Neural Compositional Representation(通过学习神经组合表示进行人体 4D 建模)
keywords: 4D Representation(4D 表征), Human Body Estimation(人体姿态估计), Fine-grained Human Reconstruction(细粒度人体重建)
paper

场景重建/视图合成/新视角合成(Novel View Synthesis)

RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo(学习基于光线的 1D 隐式场以实现准确的多视图立体)
paper

Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis(用于可控 3D 人体合成的表面对齐神经辐射场)
paper

IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images(通过优化来自光度图像的神经 SDF 和材料进行反向渲染)
paper

MonoScene: Monocular 3D Semantic Scene Completion(单目 3D 语义场景完成)
paper | code

Stereo Magnification with Multi-Layer Images(具有多层图像的立体放大)
paper | code

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations(通过集合潜在场景表示的无几何新颖视图合成)
paper

Neural Rays for Occlusion-aware Image-based Rendering(用于遮挡感知的基于图像的渲染的神经射线)
paper | code

Deblur-NeRF: Neural Radiance Fields from Blurry Images(来自模糊图像的神经辐射场)
paper | code

NPBG++: Accelerating Neural Point-Based Graphics(加速基于神经点的图形)
paper

PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo(从多视图立体重建 3D 平面)
paper

NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction(用于大规模场景重建的融合辐射场)
paper

GeoNeRF: Generalizing NeRF with Geometry Priors(用几何先验概括 NeRF)
paper | code

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions(室内 3D 场景重建的风格转换)
paper | code

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image(向外看：从单个图像合成一致的长期 3D 场景视频)
paper | code

Point-NeRF: Point-based Neural Radiance Fields(基于点的神经辐射场)
paper | code

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields(文本和图像驱动的神经辐射场操作)
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
paper | code

模型压缩(Model Compression)

知识蒸馏(Knowledge Distillation)

Decoupled Knowledge Distillation(解耦知识蒸馏)
paper | code

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation(小波知识蒸馏：迈向高效的图像到图像转换)
paper

Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability(知识蒸馏作为高效的预训练：更快的收敛、更高的数据效率和更好的可迁移性)
paper | code

Focal and Global Knowledge Distillation for Detectors(探测器的焦点和全局知识蒸馏)
keywords: Object Detection, Knowledge Distillation
paper | code

剪枝(Pruning)

CHEX: CHannel EXploration for CNN Model Compression(CNN模型压缩的通道探索)
paper | code

Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs(空间剪枝：使用自适应滤波器表示来改进稀疏 CNN 的训练)
paper

量化(Quantization)

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher(一切尽在老师身上：零样本量化更贴近老师)(Oral)
paper

Implicit Feature Decoupling with Depthwise Quantization(使用深度量化的隐式特征解耦)
paper

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization(学习具有类内异质性的合成图像以进行零样本网络量化)
paper | code

神经网络结构设计(Neural Network Structure Design)

DyRep: Bootstrapping Training with Dynamic Re-parameterization(使用动态重新参数化的引导训练)
paper | code

BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning(学习探索样本关系以进行鲁棒表征学习)
keywords: sample relationship, data scarcity learning, Contrastive Self-Supervised Learning, long-tailed recognition, zero-shot learning, domain generalization, self-supervised learning
paper | code

CNN

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing(用于布局感知视觉处理的高效平移可变卷积)(动态卷积)
paper | code

On the Integration of Self-Attention and Convolution(自注意力和卷积的整合)
paper

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs(将内核扩展到 31x31：重新审视 CNN 中的大型内核设计)
paper | code

DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos(视频中稀疏帧差异的端到端 CNN 推断)
keywords: sparse convolutional neural network, video inference accelerating
paper

A ConvNet for the 2020s
paper | code

Transformer

Patch Slimming for Efficient Vision Transformers(高效视觉transformer的补丁瘦身)
paper

CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance(具有几何制导的基于码本的稀疏体素transformer)
paper

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens(通过操作信使token交换局部空间信息)
paper | code

BoxeR: Box-Attention for 2D and 3D Transformers(用于 2D 和 3D transformer的 Box-Attention)
paper | code

Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training(引导 ViT：从预训练中解放视觉transformer)
paper | code

Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
paper | code

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition(在视觉transformer中为视觉识别指定协同上下文)
paper | code

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts(深入研究分布变化下的视觉Transformer的泛化)
keywords: out-of-distribution (OOD) generalization, Vision Transformers
paper | code

Mobile-Former: Bridging MobileNet and Transformer(连接 MobileNet 和 Transformer)
keywords: Light-weight convolutional neural networks(轻量卷积神经网络), Combination of CNN and ViT
paper

图神经网络(GNN)

Improving Subgraph Recognition with Variational Graph Information Bottleneck(利用变分图信息瓶颈改进子图识别)
paper | code

AEGNN: Asynchronous Event-based Graph Neural Networks(基于异步事件的图神经网络)
paper

神经网络架构搜索(NAS)

Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training?(从实用的角度揭开神经切线内核的神秘面纱：无需训练就可以信任神经架构搜索吗？)
paper | code

Training-free Transformer Architecture Search(免训练transformer架构搜索)
paper

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning(MAML 的全局收敛和受理论启发的神经架构搜索以进行 Few-Shot 学习)
paper | code

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search(可微架构搜索的 Beta-Decay 正则化)
paper

MLP

MAXIM: Multi-Axis MLP for Image Processing(用于图像处理的多轴 MLP)(Oral)
paper | code

Brain-inspired Multilayer Perceptron with Spiking Neurons(具有脉冲神经元的类脑多层感知器)
paper | code

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information(利用地理和时间信息进行细粒度图像分类的动态 MLP)
paper | code

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective(重新审视监督预训练的可迁移性：MLP 视角)
paper

An Image Patch is a Wave: Quantum Inspired Vision MLP(图像补丁是波浪：量子启发的视觉 MLP)
paper | code

数据处理(Data Processing)

Generating High Fidelity Data from Low-density Regions using Diffusion Models(使用扩散模型从低密度区域生成高保真数据)
paper

Dataset Distillation by Matching Training Trajectories(通过匹配训练轨迹进行蒸馏)(数据集蒸馏)
paper | code

数据增广(Data Augmentation)

EnvEdit: Environment Editing for Vision-and-Language Navigation(视觉语言导航的环境编辑)
paper | code

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge(使用教师知识进行数据增强优化)
paper | code

3D Common Corruptions and Data Augmentation(3D 常见损坏和数据增强)(Oral)
keywords: Data Augmentation, Image restoration, Photorealistic image synthesis
paper

归一化/正则化(Batch Normalization)

Delving into the Estimation Shift of Batch Normalization in a Network(深入研究网络中批量标准化的估计偏移)
paper | code

图像聚类(Image Clustering)

RAMA: A Rapid Multicut Algorithm on GPU(GPU 上的快速多切算法)
paper | code

图像压缩(Image Compression)

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression(用于高效神经图像压缩的统一多元高斯混合)
paper | code

ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding(具有不均匀分组的空间通道上下文自适应编码的高效学习图像压缩)
paper

The Devil Is in the Details: Window-based Attention for Image Compression(细节中的魔鬼：图像压缩的基于窗口的注意力)
paper | code

Neural Data-Dependent Transform for Learned Image Compression(用于学习图像压缩的神经数据相关变换)
paper | code

模型训练/泛化(Model Training/Generalization)

Parameter-free Online Test-time Adaptation(无参数在线测试时间自适应)(Oral)
paper | code

SNUG: Self-Supervised Neural Dynamic Garments(自我监督的神经动态服装)(Oral)
paper

Automated Progressive Learning for Efficient Training of Vision Transformers(用于高效训练视觉transformer的自动渐进式学习)
paper | code

GradViT: Gradient Inversion of Vision Transformers(视觉transformer的梯度反转)
paper

Recall@k Surrogate Loss with Large Batches and Similarity Mixup(大批量和相似性混合的 Recall@k 代理损失)
paper

Out-of-distribution Generalization with Causal Invariant Transformations(具有因果不变变换的分布外泛化)
paper

Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective(神经网络可以两次学习相同的模型吗？从决策边界的角度研究可重复性和双重下降)
paper | code

Towards Efficient and Scalable Sharpness-Aware Minimization(迈向高效和可扩展的锐度感知最小化)
keywords: Sharp Local Minima, Large-Batch Training
paper

CAFE: Learning to Condense Dataset by Aligning Features(通过对齐特征学习压缩数据集)
keywords: dataset condensation, coreset selection, generative models
paper | code

The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration(魔鬼在边缘：用于网络校准的基于边缘的标签平滑)
paper | code

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising(通过引入查询去噪加速 DETR 训练)
keywords: Detection Transformer
paper | code

噪声标签(Noisy Label)

UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning(通过统一选择和对比学习来对抗标签噪声)
paper | code

Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels(带有噪声标签的学习中噪声检测的可扩展惩罚回归)
paper | code

长尾分布(Long-Tailed Distribution)

Targeted Supervised Contrastive Learning for Long-Tailed Recognition(用于长尾识别的有针对性的监督对比学习)
keywords: Long-Tailed Recognition(长尾识别), Contrastive Learning(对比学习)
paper

图像特征提取与匹配(Image feature extraction and matching)

Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences(弱监督语义对应的概率扭曲一致性)
paper | code

视觉表征学习(Visual Representation Learning)

Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization(通过节点到邻域互信息最大化的图中节点表示学习)
paper | code

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization(通过相似性感知归一化探索场景文本的自监督表示学习)
paper

Exploring Set Similarity for Dense Self-supervised Representation Learning(探索密集自监督表示学习的集合相似性)
paper

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging(通过前景-背景合并的运动感知对比视频表示学习)
paper | code

多模态学习(Multi-Modal Learning)

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound(通过视觉、语言和声音的神经脚本知识)
paper

视听学习(Audio-visual Learning)

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language(具有跨模态注意力和语言的视听广义零样本学习)
paper | code

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes(自监督预测学习：视觉场景中声源定位的无负样本方法)(视觉定位)
paper | code

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation(用于协同语音手势生成的学习分层跨模式关联)
paper

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection(用于联合视频时刻检索和高光检测的统一多模态transformer)
paper | code

视觉-语言(Vision-language)

DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation(用于鲁棒图像处理的文本引导扩散模型)
paper | code

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis(迈向组合式和高保真的文本到图像合成)
paper

LiT: Zero-Shot Transfer with Locked-image text Tuning(带锁定图像文本调整的零样本迁移)
paper

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks(视觉和语言任务的参数高效迁移学习)
paper | code

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model(预测、预防和评估：由预训练的视觉语言模型支持的解耦的文本驱动图像处理)
paper | code

LAFITE: Towards Language-Free Training for Text-to-Image Generation(面向文本到图像生成的无语言训练)
paper | code

An Empirical Study of Training End-to-End Vision-and-Language Transformers(训练端到端视觉和语言transformer的实证研究)
paper | code

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding(为视觉基础生成伪语言查询)
paper | code

Conditional Prompt Learning for Vision-Language Models(视觉语言模型的条件提示学习)
paper | code

NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks(视觉和视觉语言任务中的自然语言解释模型)
paper | code

L-Verse: Bidirectional Generation Between Image and Text(图像和文本之间的双向生成)(Oral)
paper

HairCLIP: Design Your Hair by Text and Reference Image(通过文本和参考图像设计你的头发)
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
paper

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields(文本和图像驱动的神经辐射场操作)
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
paper | code

Vision-Language Pre-Training with Triple Contrastive Learning(三重对比学习的视觉语言预训练)
keywords: Vision-language representation learning, Contrastive Learning
paper | code

视觉预测(Vision-based Prediction)

Multi-Person Extreme Motion Prediction(多人极限运动预测)
paper

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos(以自我为中心的视频的联合手部运动和交互热点预测)
paper

Vehicle trajectory prediction works, but not everywhere(车辆轨迹预测有效，但并非无处不在)
paper | code

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion(基于运动不确定性扩散的随机轨迹预测)
paper | code

Non-Probability Sampling Network for Stochastic Human Trajectory Prediction(用于随机人体轨迹预测的非概率采样网络)
paper | code

Remember Intentions: Retrospective-Memory-based Trajectory Prediction(记住意图：基于回顾性记忆的轨迹预测)
paper | code

GaTector: A Unified Framework for Gaze Object Prediction(凝视对象预测的统一框架)
paper

On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles(自动驾驶汽车轨迹预测的对抗鲁棒性)
paper | code

Adaptive Trajectory Prediction via Transferable GNN(基于可迁移 GNN 的自适应轨迹预测)
paper

Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective(迈向稳健和自适应运动预测：因果表示视角)
paper | code

How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting(多少个观察就足够了？轨迹预测的知识蒸馏)
keywords: Knowledge Distillation, trajectory forecasting
paper

Motron: Multimodal Probabilistic Human Motion Forecasting(多模式概率人体运动预测)
paper

数据集(Dataset)

Pyramid Grafting Network for One-Stage High Resolution Saliency Detection(用于单阶段高分辨率显着性检测的金字塔嫁接网络)
paper

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting(使用 Transformer 编码多尺度时间相关性以进行重复动作计数)(Oral)
paper | code

Multi-Person Extreme Motion Prediction(多人极限运动预测)(人体交互数据集)
paper

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer(用于 Sim2Real 迁移的多感官对象数据集)
paper

Rethinking Visual Geo-localization for Large-Scale Applications(重新思考大规模应用程序的视觉地理定位)
paper

Deep Image-based Illumination Harmonization(基于深度图像的照明协调)
paper

OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction(理解手物交互的大规模知识库)
paper

Instance-wise Occlusion and Depth Orders in Natural Scenes(自然场景中的实例遮挡和深度顺序)
paper | code

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities(用于理解程序活动的大规模多视图视频数据集)
paper

Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task(用于自动驾驶和单目 3D 目标检测任务的路边感知数据集)
paper

DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation(用于语义变化分割的每日多光谱卫星数据集)
paper

Egocentric Prediction of Action Target in 3D(以自我为中心的 3D 行动目标预测)(机器人)
paper

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining(电子商务多模态预训练的自协调对比学习)(多模态预训练数据集)
paper

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos(用于视频中面部表情识别的大规模多场景数据集)
paper

Ego4D: Around the World in 3,000 Hours of Egocentric Video(3000 小时以自我为中心的视频环游世界)
paper

GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains(用于细粒度和域自适应识别谷物的大规模数据集)
paper

Kubric: A scalable dataset generator(Kubric：可扩展的数据集生成器)
paper | code

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection(用于分段级视频复制检测的大规模综合数据集和复制重叠感知评估协议)
paper

主动学习(Active Learning)

Active Learning by Feature Mixing(通过特征混合进行主动学习)
paper | code

小样本学习/零样本学习(Few-shot Learning/Zero-shot Learning)

Integrative Few-Shot Learning for Classification and Segmentation(用于分类和分割的集成小样本学习)
paper

Ranking Distance Calibration for Cross-Domain Few-Shot Learning(跨域小样本学习的排名距离校准)
paper

Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification(小样本分类的相互集中学习)
paper

MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning(用于零样本学习的相互语义蒸馏网络)
keywords: Zero-Shot Learning, Knowledge Distillation
paper | code

持续学习(Continual Learning/Life-long Learning)

GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning(用于持续学习的基于梯度核心集的重放缓冲区选择)
paper

Probing Representation Forgetting in Supervised and Unsupervised Continual Learning(探索有监督和无监督持续学习中的表征遗忘)
paper

Meta-attention for ViT-backed Continual Learning(ViT 支持的持续学习的元注意力)
paper | code

Learning to Prompt for Continual Learning(学习提示持续学习)
paper | code

On Generalizing Beyond Domains in Cross-Domain Continual Learning(关于跨域持续学习中的域外泛化)
paper

场景图(Scene Graph)

Continuous Scene Representations for Embodied AI(具身 AI 的连续场景表示)
paper | code

场景图生成(Scene Graph Generation)

Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation(用于无偏场景图生成的堆叠混合注意力和组协作学习)
paper | code

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs(将视频场景图重新格式化为时间二分图)
keywords: Video Scene Graph Generation, Transformer, Video Grounding
paper | code

视觉定位/位姿估计(Visual Localization/Pose Estimation)

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework(一种计算效率高且具有对称性的 6D 姿势回归框架)
paper | code

Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions(重新审视 3D 对象姿态估计的模板：对新对象的泛化和对遮挡的鲁棒性)
paper | code

OSOP: A Multi-Stage One Shot Object Pose Estimation Framework(多阶段 One Shot 对象姿态估计框架)
paper

Putting People in their Place: Monocular Regression of 3D People in Depth(3D 人物深度的单目回归)
paper | code

FS6D: Few-Shot 6D Pose Estimation of Novel Objects(新物体的小样本 6D 姿态估计)
paper

Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation(用于 6D 姿势估计的无投影分解的统一 CNN 框架)
paper

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation(用于单目物体姿态估计的广义端到端概率透视-n-点)
paper

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization(具有鲁棒对应场估计和姿态优化的递归 6-DoF 对象姿态细化)
paper | code

DiffPoseNet: Direct Differentiable Camera Pose Estimation(直接可微分相机位姿估计)
paper

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation(用于 6DoF 对象姿态估计的粗到细表面编码)
paper

Object Localization under Single Coarse Point Supervision(单粗点监督下的目标定位)
paper | code

CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data(多模式合成数据辅助的可扩展空中定位)
paper | code

GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting(通过几何引导的逐点投票进行类别级对象位姿估计)
paper | code

CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild(CPPF：在野外实现稳健的类别级 9D 位姿估计)
paper | code

OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation(用于基于深度的 6D 对象位姿估计的对象视点编码)
paper | code

Spatial Commonsense Graph for Object Localisation in Partial Scenes(局部场景中对象定位的空间常识图)
paper | code

视觉推理/视觉问答(Visual Reasoning/VQA)

SimVQA: Exploring Simulated Environments for Visual Question Answering(探索视觉问答的模拟环境)
paper

Learning to Answer Questions in Dynamic Audio-Visual Scenarios(学习在动态视听场景中回答问题)(视听学习)
paper | code

Visual Abductive Reasoning(视觉溯因推理)
paper | code

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering(基于知识的视觉问答的多模态知识提取与积累)
paper | code

REX: Reasoning-aware and Grounded Explanation(推理意识和扎根的解释)
paper | code

图像分类(Image Classification)

CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification(共同适应判别特征以改进小样本分类)
paper

GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction(用于多类别属性预测的基于全局、局部和内在的密集嵌入网络)
keywords: multi-label classification
paper | code

迁移学习/domain/自适应(Transfer Learning/Domain Adaptation)

Transferability Estimation using Bhattacharyya Class Separability(使用 Bhattacharyya 类可分离性的可迁移性估计)
paper

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization(通过归一化进行动态无监督域自适应)
paper | code

Continual Test-Time Domain Adaptation(持续测试时域适应)
paper | code

Compound Domain Generalization via Meta-Knowledge Encoding(基于元知识编码的复合域泛化)
paper

Learning Affordance Grounding from Exocentric Images(从第三人称视角图像中学习可供性定位)
paper | code

Category Contrast for Unsupervised Domain Adaptation in Visual Tasks(视觉任务中无监督域适应的类别对比)
paper

Learning Distinctive Margin toward Active Domain Adaptation(面向主动域适应的判别性边界学习)
paper | code

How Well Do Sparse Imagenet Models Transfer?(稀疏 Imagenet 模型的迁移效果如何？)
paper

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation(用于手语翻译的简单多模态迁移学习基线)
paper

Weakly Supervised Object Localization as Domain Adaption(作为域适应的弱监督对象定位)
keywords: Weakly Supervised Object Localization(WSOL), Multi-instance learning based WSOL, Separated-structure based WSOL, Domain Adaption
paper | code

度量学习(Metric Learning)

Hyperbolic Vision Transformers: Combining Improvements in Metric Learning(双曲视觉transformer：结合度量学习的改进)
paper | code

Non-isotropy Regularization for Proxy-based Deep Metric Learning(基于代理的深度度量学习的非各向同性正则化)
paper | code

Integrating Language Guidance into Vision-based Deep Metric Learning(将语言指导集成到基于视觉的深度度量学习中)
paper | code

Enhancing Adversarial Robustness for Deep Metric Learning(增强深度度量学习的对抗鲁棒性)
keywords: Adversarial Attack, Adversarial Defense, Deep Metric Learning
paper

对比学习(Contrastive Learning)

Versatile Multi-Modal Pre-Training for Human-Centric Perception(用于以人为中心的感知的多功能多模态预训练)
paper | code

Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation(用于弱监督对象定位和语义分割的类不可知激活图的对比学习)
paper | [code](https://github.com/CVI-SZU/CCAM)

Rethinking Minimal Sufficient Representation in Contrastive Learning(重新思考对比学习中的最小充分表示)(Oral)
paper | code

Selective-Supervised Contrastive Learning with Noisy Labels(带有噪声标签的选择性监督对比学习)
paper | code

HCSC: Hierarchical Contrastive Selective Coding(分层对比选择性编码)
keywords: Self-supervised Representation Learning, Deep Clustering, Contrastive Learning
paper | code

Crafting Better Contrastive Views for Siamese Representation Learning(为孪生表示学习制作更好的对比视图)
paper | code

增量学习(Incremental Learning)

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning(类增量学习的初始阶段去相关方法)
paper | code

Forward Compatible Few-Shot Class-Incremental Learning(前向兼容的小样本类增量学习)
paper | code

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning(非示例类增量学习的自我维持表示扩展)
paper

强化学习(Reinforcement Learning)

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory(基于具有编舞记忆的 Actor-Critic GPT 的 3D 舞蹈生成)
paper | code

元学习(Meta Learning)

A Structured Dictionary Perspective on Implicit Neural Representations(隐式神经表示的结构化字典视角)
paper | code

Multidimensional Belief Quantification for Label-Efficient Meta-Learning(标签高效元学习的多维信念量化)
paper

What Matters For Meta-Learning Vision Regression Tasks?(元学习视觉回归任务中什么才是重要的？)
paper

机器人(Robotic)

Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation(通过离散化实现视觉机器人操作的高效学习)
paper | code

IFOR: Iterative Flow Minimization for Robotic Object Rearrangement(IFOR：机器人对象重排的迭代流最小化)
paper

半监督学习/弱监督学习/无监督学习/自监督学习(Self-supervised Learning/Semi-supervised Learning)

When Does Contrastive Visual Representation Learning Work?(对比视觉表征学习何时起作用)
paper

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy(利用局部和全局表征：一种新的自我监督学习策略)
paper

Decoupling Makes Weakly Supervised Local Feature Better(解耦使弱监督的局部特征更好)
paper | code

SimMatch: Semi-supervised Learning with Similarity Matching(具有相似性匹配的半监督学习)
paper | code

Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements(一个完全无监督的框架，用于从噪声和部分测量中学习图像)
paper | code

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training(自监督视觉预训练的统一框架)
paper

Class-Aware Contrastive Semi-Supervised Learning(类感知对比半监督学习)
keywords: Semi-Supervised Learning, Self-Supervised Learning, Real-World Unlabeled Data Learning
paper

A study on the distribution of social biases in self-supervised learning visual models(自监督学习视觉模型中社会偏见分布的研究)
paper

神经网络可解释性(Neural Network Interpretability)

Do Explanations Explain? Model Knows Best(解释解释吗？模型最清楚)
paper

Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks(神经网络中可解释的部分-整体层次结构和概念语义关系)
paper

图像计数(Image Counting)

DR.VIC: Decomposition and Reasoning for Video Individual Counting(视频个体计数的分解与推理)
paper | code

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting(表示、比较和学习：用于类不可知计数的相似性感知框架)
paper | code

Boosting Crowd Counting via Multifaceted Attention(通过多方面注意提高人群计数)
paper | code

联邦学习(Federated Learning)

FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning(用于异构联邦学习的基于相关性的主动客户端选择策略)
paper

FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction(通过局部漂移解耦和校正与非 IID 数据进行联邦学习)
paper | code

Federated Class-Incremental Learning(联邦类增量学习)
paper | code

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning(通过无数据知识蒸馏微调非 IID 联邦学习中的全局模型)
paper

Differentially Private Federated Learning with Local Regularization and Sparsification(基于局部正则化和稀疏化的差分隐私联邦学习)
paper

其他(Others)

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions(看什么和在哪里看：语义和空间精炼transformer，用于检测人与物体的交互)(Oral)
paper | code

Marginal Contrastive Correspondence for Guided Image Generation (Oral)
paper

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting (Oral)
paper | code

Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles (Oral)
paper

Semantic-Aware Domain Generalized Segmentation (Oral)
paper | code

Revisiting Skeleton-based Action Recognition (Oral)
paper | code

MAXIM: Multi-Axis MLP for Image Processing (Oral)
paper | code

Rethinking Minimal Sufficient Representation in Contrastive Learning (Oral)
paper | code

I M Avatar: Implicit Morphable Head Avatars from Videos (Oral)
paper

Parameter-free Online Test-time Adaptation (Oral)
paper | code

Correlation Verification for Image Retrieval (Oral)
paper | code

Rethinking Semantic Segmentation: A Prototype View (Oral)
paper | code

SNUG: Self-Supervised Neural Dynamic Garments (Oral)
paper

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video (Oral)
paper | code

Dual-AI: Dual-path Action Interaction Learning for Group Activity Recognition (Oral)
paper

3D Common Corruptions and Data Augmentation (Oral)
paper

GAN-Supervised Dense Visual Alignment (Oral)
paper | code

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher (Oral)
paper

AdaMixer: A Fast-Converging Query-Based Object Detector (Oral)
paper | code

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (Oral)
paper | code

L-Verse: Bidirectional Generation Between Image and Text (vision-language representation learning)
paper

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction
paper | code

MetaFormer is Actually What You Need for Vision
paper | code

Shunted Self-Attention via Multi-Scale Token Aggregation
paper | code

Learned Queries for Efficient Local Attention
paper | code

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
paper | code

CLIP

PointCLIP: Point Cloud Understanding by CLIP
paper | code

Blended Diffusion for Text-driven Editing of Natural Images
paper | code

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
paper

Unsupervised Image-to-Image Translation with Generative Prior
paper | code

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
paper | code

GNN

OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
paper | code

MLP

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
paper | code

NAS

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
paper | code

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
paper

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
paper

Urban Radiance Fields
paper

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
paper | code

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
paper

Long-Tail Distribution

Retrieval Augmented Classification for Long-Tail Visual Recognition
paper | code

Visual Transformer

Backbone

Application

Embracing Single Stride 3D Object Detector with Sparse Transformer
paper | code

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
paper | code

GroupViT: Semantic Segmentation Emerges from Text Supervision
paper

Splicing ViT Features for Semantic Appearance Transfer
paper | code

Omni-DETR: Omni-Supervised Object Detection with Transformers
paper | code

Collaborative Transformers for Grounded Situation Recognition
paper | code

NFormer: Robust Person Re-identification with Neighbor Transformer
paper | code

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
paper | code

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
paper | code

A New Dataset and Transformer for Stereoscopic Video Super-Resolution
paper | code

Safe Self-Refinement for Transformer-based Domain Adaptation
paper | code

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
paper | code

Self-supervised Learning

DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
paper | code

Data Augmentation

AlignMix: Improving representation by interpolating aligned features
paper | code

Object Detection

Omni-DETR: Omni-Supervised Object Detection with Transformers
paper | code

Semi-Supervised Object Detection

Dense Learning based Semi-Supervised Object Detection
paper | code

Visual Tracking

Multi-Modal Object Tracking

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
paper

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking
paper | code

Semantic Segmentation

Novel Class Discovery in Semantic Segmentation
paper | code

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision
paper

Instance Segmentation

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
paper | code

Self-Supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations
paper | code

Video Instance Segmentation

Temporally Efficient Vision Transformer for Video Instance Segmentation
paper | code

Few-Shot Segmentation

Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
paper | code

Image Matting

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
paper | code

Video Understanding

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
paper | code

Image Editing

Blended Diffusion for Text-driven Editing of Natural Images
paper | code

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
paper

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
paper | code

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution
paper | code

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
paper | code

Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
paper | code

A New Dataset and Transformer for Stereoscopic Video Super-Resolution
paper | code

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
paper | code

PointCLIP: Point Cloud Understanding by CLIP
paper | code

3D Object Detection

Embracing Single Stride 3D Object Detector with Sparse Transformer
paper | code

HyperDet3D: Learning a Scene-conditioned 3D Object Detector
paper | code

OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
paper | code

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
paper | code

3D Object Tracking

PTTR: Relational 3D Point Cloud Object Tracking with Transformer
paper | code

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos
paper | code

Person Re-identification

NFormer: Robust Person Re-identification with Neighbor Transformer
paper | code

Depth Estimation

Monocular Depth Estimation

Toward Practical Self-Supervised Monocular Indoor Depth Estimation
paper | code

Multi-Frame Self-Supervised Depth with Transformers
paper | code

Feature Matching

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
paper | code

Lane Detection

A Keypoint-based Global Association Network for Lane Detection
paper | code

Optical Flow Estimation

Imposing Consistency for Optical Flow Estimation
paper | code

Deep Equilibrium Optical Flow Estimation
paper | code

Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition
paper | code

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting
paper | code

Medical Image

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
paper | code

DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
paper | code

Video Generation

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
paper | code

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer
paper | code

Referring Video Object Segmentation

ReSTR: Convolution-free Referring Image Segmentation Using Transformers
paper | code

Gait Recognition

Gait Recognition in the Wild with Dense 3D Representations and A Benchmark
paper | code

Adversarial Examples

LAS-AT: Adversarial Training with Learnable Attack Strategy
paper | code

Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline
paper | code

Grounded Situation Recognition

Collaborative Transformers for Grounded Situation Recognition
paper | code

Zero-shot Learning

Unseen Classes at a Later Time? No Problem
paper | code

DeepFakes

Detecting Deepfakes with Self-Blended Images
paper | code

Datasets

Toward Practical Self-Supervised Monocular Indoor Depth Estimation
paper | code

Deep Rectangling for Image Stitching: A Learning Baseline
paper | code

Shape from Polarization for Complex Scenes in the Wild
paper | code

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
paper

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
paper | code

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
paper | code

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
paper | code

A New Dataset and Transformer for Stereoscopic Video Super-Resolution
paper | code

New Task

Splicing ViT Features for Semantic Appearance Transfer
paper | code

Others

Balanced MSE for Imbalanced Visual Regression
paper | code

Shape from Polarization for Complex Scenes in the Wild
paper | code

LASER: LAtent SpacE Rendering for 2D Visual Localization
paper | code

Single-Photon Structured Light
paper | code

3DeformRS: Certifying Spatial Deformations on Point Clouds
paper | code

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
paper | code

Robust and Accurate Superquadric Recovery: a Probabilistic Approach
paper | code

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
paper | code

DeepDPM: Deep Clustering With an Unknown Number of Clusters
paper | code

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
paper | code

Proto2Proto: Can you recognize the car, the way I do?
paper | code

Contents:

Detection

2D Object Detection

Video Object Detection

3D Object Detection

Human-Object Interaction (HOI) Detection

Camouflaged Object Detection

Salient Object Detection

Keypoint Detection

Lane Detection

Edge Detection

Vanishing Point Detection

Anomaly Detection

Segmentation

Image Segmentation

Panoptic Segmentation

Semantic Segmentation

Instance Segmentation

Video Object Segmentation

Dense Prediction

Video Processing

Video Editing

Video Generation/Video Synthesis

Video Super-Resolution

Estimation

Optical Flow/Motion Estimation

Depth Estimation

Human Parsing/Human Pose Estimation

Gesture Estimation

Image Processing

Super-Resolution

Image Restoration/Image Enhancement/Image Reconstruction

Image Denoising/Deblurring/Deraining/Dehazing

Image Editing/Inpainting

Image Translation

Style Transfer

Face

Face Recognition/Detection

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

Face Forgery/Face Anti-Spoofing

Object Tracking

Image & Video Retrieval/Video Understanding

Action/Activity Recognition, Detection, Segmentation and Localization

Person Re-Identification/Detection

Image/Video Captioning

Medical Imaging

Text Detection/Recognition/Understanding

Remote Sensing Image

GAN/Generative/Adversarial

Image Generation/Image Synthesis

3D Vision

Point Cloud

3D Reconstruction

Scene Reconstruction/View Synthesis/Novel View Synthesis

Model Compression

Knowledge Distillation

Pruning

Quantization

Neural Network Structure Design

CNN

Transformer

Graph Neural Network (GNN)

Neural Architecture Search (NAS)

MLP

Data Processing