cv-arxiv-daily's Introduction

Updated on 2024.08.22

Usage instructions: here

Table of Contents
  1. Semantic Segmentation
  2. Instance Segmentation
  3. Panoptic Segmentation
  4. Object Detection
  5. Keypoint Detection
  6. Open-Vocabulary
  7. Image Captioning

Semantic Segmentation

Publish Date Title Authors PDF Code
2024-08-20 NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency Valentinos Pariza et.al. 2408.11054 null
2024-08-20 CO2Wounds-V2: Extended Chronic Wounds Dataset From Leprosy Patients Karen Sanchez et.al. 2408.10827 null
2024-08-20 Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? Chen Liang et.al. 2408.10627 null
2024-08-20 Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation Jiawei Han et.al. 2408.10537 link
2024-08-19 Imbalance-Aware Culvert-Sewer Defect Segmentation Using an Enhanced Feature Pyramid Network Rasha Alshawi et.al. 2408.10181 null
2024-08-19 Dynamic Label Injection for Imbalanced Industrial Defect Segmentation Emanuele Caruso et.al. 2408.10031 link
2024-08-19 Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis Kira Maag et.al. 2408.10021 null
2024-08-19 Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving Jun Yan et.al. 2408.09839 link
2024-08-18 OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras Muhammad Rameez Ur Rahman et.al. 2408.09424 link
2024-08-18 Elite360M: Efficient 360 Multi-task Learning via Bi-projection Fusion and Cross-task Collaboration Hao Ai et.al. 2408.09336 null
2024-08-17 Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology Junchao Zhu et.al. 2408.09278 link
2024-08-17 GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation Weiming Zhang et.al. 2408.09115 null
2024-08-17 Depth-guided Texture Diffusion for Image Semantic Segmentation Wei Sun et.al. 2408.09097 null
2024-08-15 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks Dongshuo Yin et.al. 2408.08345 link
2024-08-14 MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis Nimeesha Chan et.al. 2408.07773 link
2024-08-15 MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation Beoungwoo Kang et.al. 2408.07576 link
2024-08-15 MagicFace: Training-free Universal-Style Human Image Customized Synthesis Yibin Wang et.al. 2408.07433 null
2024-08-14 Segment Using Just One Example Pratik Vora et.al. 2408.07393 null
2024-08-14 Ensemble architecture in polyp segmentation Hao-Yun Hsu et.al. 2408.07262 link
2024-08-14 Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks Raghavendra Singh et.al. 2408.07243 null
2024-08-14 Enhancing Autonomous Vehicle Perception in Adverse Weather through Image Augmentation during Semantic Segmentation Training Ethan Kou et.al. 2408.07239 null
2024-08-13 ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation Jingyun Wang et.al. 2408.06747 link
2024-08-10 Dilated Convolution with Learnable Spacings Ismail Khalfaoui-Hassani et.al. 2408.06383 null
2024-08-12 Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images Siladittya Manna et.al. 2408.06235 null
2024-08-12 A-BDD: Leveraging Data Augmentations for Safe Autonomous Driving in Adverse Weather and Lighting Felix Assion et.al. 2408.06071 null
2024-08-12 Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning Xinrong Hu et.al. 2408.05889 link
2024-08-11 Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task Hannuo Zhang et.al. 2408.05777 null
2024-08-11 MacFormer: Semantic Segmentation with Fine Object Boundaries Guoan Xu et.al. 2408.05699 null
2024-08-10 Multimodal generative semantic communication based on latent diffusion model Weiqi Fu et.al. 2408.05455 null
2024-08-09 In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation Dahyun Kang et.al. 2408.04961 link
2024-08-09 ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation Mengcheng Lan et.al. 2408.04883 link
2024-08-09 Extracting Signal Electron Trajectories in the COMET Phase-I Cylindrical Drift Chamber Using Deep Learning Fumihiro Kaneko et.al. 2408.04795 null
2024-08-08 SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation Jieming Yu et.al. 2408.04593 null
2024-08-08 SegXAL: Explainable Active Learning for Semantic Segmentation in Driving Scene Scenarios Sriram Mandalika et.al. 2408.04482 null
2024-08-08 What could go wrong? Discovering and describing failure modes in computer vision Gabriela Csurka et.al. 2408.04471 null
2024-08-07 CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Tianfang Zhang et.al. 2408.03703 link
2024-08-07 SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology Mingya Zhang et.al. 2408.03651 link
2024-08-06 Post-Mortem Human Iris Segmentation Analysis with Deep Learning Afzal Hossain et.al. 2408.03448 null
2024-08-06 Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression Jonas Schmitt et.al. 2408.03046 link
2024-08-05 Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation Sai Prasanna et.al. 2408.02297 null
2024-08-05 Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs Jeongkee Lim et.al. 2408.02261 null
2024-08-05 Curriculum learning based pre-training using Multi-Modal Contrastive Masked Autoencoders Muhammad Abdullah Jamal et.al. 2408.02245 null
2024-08-04 Pixel-Level Domain Adaptation: A New Perspective for Enhancing Weakly Supervised Semantic Segmentation Ye Du et.al. 2408.02039 null
2024-08-03 Bayesian Active Learning for Semantic Segmentation Sima Didari et.al. 2408.01694 null
2024-08-03 A Comparative Analysis of CNN-based Deep Learning Models for Landslide Detection Omkar Oak et.al. 2408.01692 null
2024-08-03 Leveraging GNSS and Onboard Visual Data from Consumer Vehicles for Robust Road Network Estimation Balázs Opra et.al. 2408.01640 null
2024-08-02 Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans Lukas Kratochvila et.al. 2408.01526 null
2024-08-02 Balanced Residual Distillation Learning for 3D Point Cloud Class-Incremental Semantic Segmentation Yuanzhi Su et.al. 2408.01356 null
2024-08-02 StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation Bingyu Li et.al. 2408.01343 null
2024-08-02 Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach Yabin Zhu et.al. 2408.00969 link
2024-08-01 Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation Siyu Jiao et.al. 2408.00744 link
2024-08-01 Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function Matias Oscar Volman Stern et.al. 2408.00707 null
2024-08-01 AMAES: Augmented Masked Autoencoder Pretraining on Public Brain MRI Data for 3D-Native Segmentation Asbjørn Munk et.al. 2408.00640 null
2024-08-01 SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation Shengbo Tan et.al. 2408.00496 link
2024-07-31 Open-Vocabulary Audio-Visual Semantic Segmentation Ruohao Guo et.al. 2407.21721 null
2024-07-31 MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment Anurag Das et.al. 2407.21654 null
2024-07-31 Small Object Few-shot Segmentation for Vision-based Industrial Inspection Zilong Zhang et.al. 2407.21351 link
2024-07-31 On-the-fly Point Feature Representation for Point Clouds Analysis Jiangyi Wang et.al. 2407.21335 null
2024-07-31 Fine-grained Metrics for Point Cloud Semantic Segmentation Zhuheng Lu et.al. 2407.21289 null
2024-07-30 PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds Kerem Mertoğlu et.al. 2407.21150 null
2024-07-30 Learning Ordinality in Semantic Segmentation Rafael Cristino et.al. 2407.20959 null
2024-07-29 Improving 2D Feature Representations by 3D-Aware Fine-Tuning Yuanwen Yue et.al. 2407.20229 null
2024-07-29 Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset Yimian Dai et.al. 2407.20078 link
2024-07-29 Language-driven Grasp Detection with Mask-guided Attention Tuan Van Vo et.al. 2407.19877 null
2024-07-29 Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets Muhammad Abdullah Jamal et.al. 2407.19714 null
2024-07-29 ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement Ezequiel Perez-Zarate et.al. 2407.19708 link
2024-07-28 ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding Zhen Chen et.al. 2407.19435 link
2024-07-27 Ensembling convolutional neural networks for human skin segmentation Patryk Kuban et.al. 2407.19310 null
2024-07-27 Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network Gang Pan et.al. 2407.19271 null
2024-07-26 Sparse Refinement for Efficient High-Resolution Semantic Segmentation Zhijian Liu et.al. 2407.19014 null
2024-07-29 Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation Jingjun Yi et.al. 2407.18568 null
2024-07-25 Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception Julia Hindel et.al. 2407.18145 null
2024-07-25 TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework Guanfeng Tang et.al. 2407.18038 null
2024-07-25 Segmentation-guided MRI reconstruction for meaningfully diverse reconstructions Jan Nikolas Morshuis et.al. 2407.18026 link
2024-07-24 Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation Hyunwoo Yu et.al. 2407.17261 link
2024-07-24 Trans2Unet: Neural fusion for Nuclei Semantic Segmentation Dinh-Phu Tran et.al. 2407.17181 null
2024-07-24 PiPa++: Towards Unification of Domain Adaptive Semantic Segmentation via Self-supervised Learning Mu Chen et.al. 2407.17101 null
2024-07-25 Enhancing Environmental Monitoring through Multispectral Imaging: The WasteMS Dataset for Semantic Segmentation of Lakeside Waste Qinfeng Zhu et.al. 2407.17028 link
2024-07-24 Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images Dooseop Choi et.al. 2407.17003 link
2024-07-23 Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving Anam Manzoor et.al. 2407.16647 null
2024-07-23 Deep Bayesian segmentation for colon polyps: Well-calibrated predictions in medical imaging Daniela L. Ramos et.al. 2407.16608 null
2024-07-23 Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision Aditya Krishnan et.al. 2407.16102 null
2024-07-22 MILAN: Milli-Annotations for Lidar Semantic Segmentation Nermin Samet et.al. 2407.15797 null
2024-07-22 Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond Silvio Galesso et.al. 2407.15739 link
2024-07-22 MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics Alexander Melekhin et.al. 2407.15663 link
2024-07-22 Learning at a Glance: Towards Interpretable Data-limited Continual Semantic Segmentation via Semantic-Invariance Modelling Bo Yuan et.al. 2407.15429 link
2024-07-22 Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data Junha Song et.al. 2407.15383 null
2024-07-21 Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation Xiaoyang Wu et.al. 2407.15282 null
2024-07-20 Downstream-Pretext Domain Knowledge Traceback for Active Learning Beichen Zhang et.al. 2407.14720 null
2024-07-19 Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model Kun Zhao et.al. 2407.14326 null
2024-07-19 Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation Zhengyuan Xie et.al. 2407.14142 link
2024-07-19 GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation Florian Chabot et.al. 2407.14108 null
2024-07-18 Many Perception Tasks are Highly Redundant Functions of their Input Data Rahul Ramesh et.al. 2407.13841 null
2024-07-18 GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model Abdelrahman Shaker et.al. 2407.13772 link
2024-07-18 SegPoint: Segment Any Point Cloud via Large Language Model Shuting He et.al. 2407.13761 null
2024-07-18 MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis Ziming Zhong et.al. 2407.13675 link
2024-07-18 Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Xiaoyu Zhu et.al. 2407.13642 null
2024-07-18 FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures Hao Lu et.al. 2407.13500 link
2024-07-18 FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions Sohyun Lee et.al. 2407.13437 null
2024-07-18 Lightweight Uncertainty Quantification with Simplex Semantic Segmentation for Terrain Traversability Judith Dijk et.al. 2407.13392 null
2024-07-18 Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation Chang Liu et.al. 2407.13363 link
2024-07-18 Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation Shoumeng Qiu et.al. 2407.13254 link
2024-07-18 OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation Jian Sun et.al. 2407.13137 null
2024-07-17 Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation Prantik Howlader et.al. 2407.12630 link
2024-07-17 Instance-wise Uncertainty for Class Imbalance in Semantic Segmentation Luís Almeida et.al. 2407.12609 null
2024-07-18 Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks Antoni Kowalczuk et.al. 2407.12588 link
2024-07-17 Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation Ruijie Xu et.al. 2407.12489 link
2024-07-17 Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation Hyun Seok Seong et.al. 2407.12463 link
2024-07-17 ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference Mengcheng Lan et.al. 2407.12442 null
2024-07-17 Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model Tao Wang et.al. 2407.12319 null
2024-07-16 FoodMem: Near Real-time and Precise Food Video Segmentation Ahmad AlMughrabi et.al. 2407.12121 null
2024-07-16 Mitigating Background Shift in Class-Incremental Semantic Segmentation Gilhan Park et.al. 2407.11859 link
2024-07-16 Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation Juncheng Ma et.al. 2407.11820 null
2024-07-16 XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach Truong Thanh Hung Nguyen et.al. 2407.11771 null
2024-07-16 OAM-TCD: A globally diverse dataset of high-resolution tree cover maps Josh Veitch-Michaelis et.al. 2407.11743 link
2024-07-16 SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds Yanbo Wang et.al. 2407.11569 link
2024-07-16 Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations Yunya Gao et.al. 2407.11381 link
2024-07-16 Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities Xu Zheng et.al. 2407.11351 null
2024-07-16 Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation Xu Zheng et.al. 2407.11344 null
2024-07-16 TCFormer: Visual Recognition via Token Clustering Transformer Wang Zeng et.al. 2407.11321 link
2024-07-15 Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding Danish Nazir et.al. 2407.11224 null
2024-07-15 No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations Walter Simoncini et.al. 2407.10964 link
2024-07-15 APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation Wangyu Wu et.al. 2407.10649 null
2024-07-15 Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs Rong Ma et.al. 2407.10534 null
2024-07-14 Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data Tuo Feng et.al. 2407.10200 link
2024-07-14 RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation Li Li et.al. 2407.10159 link
2024-07-14 HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation Chengjie Jiang et.al. 2407.10047 null
2024-07-13 Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation Anqi Zhang et.al. 2407.09838 null
2024-07-13 Enhancing Semantic Segmentation with Adaptive Focal Loss: A Novel Approach Md Rakibul Islam et.al. 2407.09828 null
2024-07-13 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance Xiaoxu Xu et.al. 2407.09826 link
2024-07-13 TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation Xiaopei Wu et.al. 2407.09751 null
2024-07-12 FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background Muhammad Ali et.al. 2407.09379 link
2024-07-12 Salt & Pepper Heatmaps: Diffusion-informed Landmark Detection Strategy Julian Wyatt et.al. 2407.09192 null
2024-07-12 Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off Levente Halmosi et.al. 2407.09150 link
2024-07-12 Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation Wei Cong et.al. 2407.09047 null
2024-07-12 Textual Query-Driven Mask Transformer for Domain Generalized Segmentation Byeonghyun Pak et.al. 2407.09033 null
2024-07-12 Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation Zihao Li et.al. 2407.08994 null
2024-07-11 Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation Tong Shao et.al. 2407.08268 link
2024-07-11 Enrich the content of the image Using Context-Aware Copy Paste Qiushi Guo et.al. 2407.08151 null
2024-07-10 MambaVision: A Hybrid Mamba-Transformer Vision Backbone Ali Hatamizadeh et.al. 2407.08083 link
2024-07-10 Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift Elliot Vincent et.al. 2407.07616 link
2024-07-10 H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper Ryan Banks et.al. 2407.07604 link
2024-07-11 Trainable Highly-expressive Activation Functions Irit Chelly et.al. 2407.07564 link
2024-07-10 Deformable-Heatmap-Segmentation for Automobile Visual Perception Hongyu Jin et.al. 2407.07493 null
2024-07-10 Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining Tianfang Sun et.al. 2407.07465 null
2024-07-11 HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation Guoan Xu et.al. 2407.07441 null
2024-07-09 ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation Yuyuan Liu et.al. 2407.07171 link
2024-07-08 Training-free CryoET Tomogram Segmentation Yizhou Zhao et.al. 2407.06833 link
2024-07-09 CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM Aditya Murali et.al. 2407.06795 null
2024-07-09 LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration Jiayi Liu et.al. 2407.06512 link
2024-07-08 Leveraging image captions for selective whole slide image annotation Jingna Qiu et.al. 2407.06363 link
2024-07-08 Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots Siva Krishna Ravipati et.al. 2407.06077 link
2024-07-08 Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts Puzuo Wang et.al. 2407.06043 null
2024-07-08 RHRSegNet: Relighting High-Resolution Night-Time Semantic Segmentation Sarah Elmahdy et.al. 2407.06016 link
2024-07-07 Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images Tuan T. Nguyen et.al. 2407.05452 null
2024-07-07 Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness Idris Hamoud et.al. 2407.05448 null
2024-07-06 A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation Monika Wysoczańska et.al. 2407.05061 null
2024-07-06 BlessemFlood21: Advancing Flood Analysis with a High-Resolution Georeferenced Dataset for Humanitarian Aid Support Vladyslav Polushko et.al. 2407.05007 null
2024-07-05 Explainable Metric Learning for Deflating Data Bias Emma Andrews et.al. 2407.04866 null
2024-07-05 LMSeg: A deep graph message-passing network for efficient and accurate semantic segmentation of large-scale 3D landscape meshes Zexian Huang et.al. 2407.04326 null
2024-07-04 Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier Prantik Howlader et.al. 2407.04036 link
2024-07-04 Relative Difficulty Distillation for Semantic Segmentation Dong Liang et.al. 2407.03719 link
2024-07-04 POSTURE: Pose Guided Unsupervised Domain Adaptation for Human Body Part Segmentation Arindam Dutta et.al. 2407.03549 null
2024-07-03 A Unified Framework for 3D Scene Understanding Wei Xu et.al. 2407.03263 null
2024-07-03 ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation Chang Li et.al. 2407.03033 null
2024-07-03 ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation Yipin Guo et.al. 2407.02881 null
2024-07-03 Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation Tao Chen et.al. 2407.02768 link
2024-07-02 Open Panoramic Segmentation Junwei Zheng et.al. 2407.02685 link
2024-07-08 Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction Tinghuai Wang et.al. 2407.02639 null
2024-07-02 Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather Junsung Park et.al. 2407.02286 link
2024-07-02 MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders Baijiong Lin et.al. 2407.02228 link
2024-07-02 Occlusion-Aware Seamless Segmentation Yihong Cao et.al. 2407.02182 link
2024-07-02 VRBiom: A New Periocular Dataset for Biometric Applications of HMD Ketan Kotwal et.al. 2407.02150 null
2024-07-02 Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts Pasquale De Marinis et.al. 2407.02075 link
2024-07-02 Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning Chengchao Shen et.al. 2407.02014 link
2024-07-01 Label-free Neural Semantic Image Synthesis Jiayi Wang et.al. 2407.01790 null
2024-07-01 PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction Xuan Yu et.al. 2407.01349 null
2024-07-01 CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes Danial Qashqai et.al. 2407.01328 link
2024-06-29 SolarSAM: Building-scale Photovoltaic Potential Assessment Based on Segment Anything Model (SAM) and Remote Sensing for Emerging City Guohao Wang et.al. 2407.00296 link
2024-07-01 Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding Yifan Tang et.al. 2406.19791 null
2024-06-28 Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation Junsung Park et.al. 2406.19638 link
2024-06-28 PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation Deyi Ji et.al. 2406.19632 null
2024-06-27 Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model Haobo Yuan et.al. 2406.19369 link
2024-06-27 ProtoGMM: Multi-prototype Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation Nazanin Moradinasab et.al. 2406.19225 null
2024-06-30 Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO Fuseini Mumuni et.al. 2406.19057 null
2024-06-27 Divide, Ensemble and Conquer: The Last Mile on Unsupervised Domain Adaptation for On-Board Semantic Segmentation Tao Lian et.al. 2406.18809 null
2024-06-26 CAS: Confidence Assessments of classification algorithms for Semantic segmentation of EO data Nikolaos Dionelis et.al. 2406.18279 null
2024-06-26 The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval Meinardus Boris et.al. 2406.18113 link
2024-06-26 Few-Shot Medical Image Segmentation with High-Fidelity Prototypes Song Tang et.al. 2406.18074 link
2024-06-25 Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation Xuming Zhang et.al. 2406.17679 null
2024-06-25 DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation Ahmad Mohammadshirazi et.al. 2406.17591 link
2024-06-25 Principal Component Clustering for Semantic Segmentation in Synthetic Data Generation Felix Stillger et.al. 2406.17541 null
2024-06-25 Investigating Self-Supervised Methods for Label-Efficient Learning Srinivasa Rao Nandam et.al. 2406.17460 null
2024-06-25 Pseudo Labelling for Enhanced Masked Autoencoders Srinivasa Rao Nandam et.al. 2406.17450 null
2024-06-25 Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model Zhuoyuan Li et.al. 2406.17442 null
2024-06-25 Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes Qi Ma et.al. 2406.17438 link
2024-06-24 Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation Yizheng Wu et.al. 2406.16776 link
2024-06-24 μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation Pierangela Bruno et.al. 2406.16724 null
2024-06-24 GATSBI: An Online GTSP-Based Algorithm for Targeted Surface Bridge Inspection and Defect Detection Harnaik Dhami et.al. 2406.16625 link
2024-06-24 LOGCAN++: Local-global class-aware network for semantic segmentation of remote sensing images Xiaowen Ma et.al. 2406.16502 link
2024-06-24 Cascade Reward Sampling for Efficient Decoding-Time Alignment Bolian Li et.al. 2406.16306 link
2024-06-24 SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments Neng Wang et.al. 2406.16279 link
2024-06-23 UDHF2-Net: An Uncertainty-diffusion-model-based High-Frequency TransFormer Network for High-accuracy Interpretation of Remotely Sensed Imagery Pengfei Zhang et.al. 2406.16129 null
2024-06-22 Fine-grained Background Representation for Weakly Supervised Semantic Segmentation Xu Yin et.al. 2406.15755 link
2024-06-20 Evaluation of Deep Learning Semantic Segmentation for Land Cover Mapping on Multispectral, Hyperspectral and High Spatial Aerial Imagery Ilham Adi Panuntun et.al. 2406.14220 null
2024-06-20 Trusting Semantic Segmentation Networks Samik Some et.al. 2406.14201 null
2024-06-20 EvSegSNN: Neuromorphic Semantic Segmentation for Event Data Dalia Hareb et.al. 2406.14178 null
2024-06-20 Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images Qinfeng Zhu et.al. 2406.14086 link
2024-06-19 Search-based DNN Testing and Retraining with GAN-enhanced Simulations Mohammed Oualid Attaoui et.al. 2406.13359 null
2024-06-19 Deep Learning-Based 3D Instance and Semantic Segmentation: A Review Siddiqui Muhammad Yasir et.al. 2406.13308 null
2024-06-18 Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation Guoyu Yang et.al. 2406.12496 link
2024-06-18 Agriculture-Vision Challenge 2024 -- The Runner-Up Solution for Agricultural Pattern Recognition via Class Balancing and Model Ensemble Wang Liu et.al. 2406.12271 null
2024-06-17 OoDIS: Anomaly Instance Segmentation Benchmark Alexey Nekrasov et.al. 2406.11835 link
2024-06-17 Multimodal Learning To Improve Segmentation With Intraoperative CBCT & Preoperative CT Maximilian E. Tschuchnig et.al. 2406.11650 null
2024-06-17 SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation Zhenchao Lin et.al. 2406.11441 link
2024-06-17 Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding Yunsong Wang et.al. 2406.11283 null
2024-06-17 Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation Bingfeng Zhang et.al. 2406.11189 link
2024-06-16 $\alpha$-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion Sanbao Su et.al. 2406.11021 null
2024-06-16 PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery Libo Wang et.al. 2406.10828 link
2024-06-15 GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR Bharat Singh et.al. 2406.10722 null
2024-06-15 A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection Chenyao Zhou et.al. 2406.10678 link
2024-06-14 ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers Narges Norouzi et.al. 2406.09936 link
2024-06-14 Label-Efficient Semantic Segmentation of LiDAR Point Clouds in Adverse Weather Conditions Aldi Piroli et.al. 2406.09906 null
2024-06-14 Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation Brunó B. Englert et.al. 2406.09896 link
2024-06-14 Open-Vocabulary Semantic Segmentation with Image Embedding Balancing Xiangheng Shan et.al. 2406.09829 link
2024-06-13 Instance-level quantitative saliency in multiple sclerosis lesion segmentation Federico Spagnolo et.al. 2406.09335 link
2024-06-13 APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation Weizhao He et.al. 2406.08372 null
2024-06-12 Dataset Enhancement with Instance-Level Augmentations Orest Kupyn et.al. 2406.08249 link
2024-06-16 A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder Lixian Zhang et.al. 2406.08079 null
2024-06-12 OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding Yinan Deng et.al. 2406.08009 link
2024-06-12 SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation Chanda Grover Kamra et.al. 2406.07986 link
2024-06-12 Small Scale Data-Free Knowledge Distillation He Liu et.al. 2406.07876 link
2024-06-11 Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph Sergey Linok et.al. 2406.07113 null
2024-06-11 PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving Yining Shi et.al. 2406.07037 null
2024-06-12 LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection Jiahua Xu et.al. 2406.07023 null
2024-06-10 Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation Dong Zhao et.al. 2406.06813 link
2024-06-09 Transforming Heart Chamber Imaging: Self-Supervised Learning for Whole Heart Reconstruction and Segmentation Abdul Qayyum et.al. 2406.06643 null
2024-06-10 Merlin: A Vision Language Foundation Model for 3D Computed Tomography Louis Blankemeier et.al. 2406.06512 null
2024-06-10 UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving Daniel Bogdoll et.al. 2406.06370 null
2024-06-09 Scaling Graph Convolutions for Mobile Vision William Avery et.al. 2406.05850 link
2024-06-09 Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation Jun Yu et.al. 2406.05837 null
2024-06-09 Convolution and Attention-Free Mamba-based Cardiac Image Segmentation Abbas Khan et.al. 2406.05786 null
2024-06-09 Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Mark Hamilton et.al. 2406.05629 link
2024-06-08 A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+ Jianzhao Wang et.al. 2406.05513 null
2024-06-08 Layered Image Vectorization via Semantic Simplification Zhenyu Wang et.al. 2406.05404 null
2024-06-08 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation Qingfeng Liu et.al. 2406.05352 null
2024-06-07 USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation Xiaoqi Wang et.al. 2406.05271 null
2024-06-07 Semantic Segmentation on VSPW Dataset through Masked Video Consistency Chen Liang et.al. 2406.04979 null
2024-06-07 Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment Venkanna Babu Guthula et.al. 2406.04949 null
2024-06-06 Characterizing segregation in blast rock piles a deep-learning approach leveraging aerial image analysis Chengeng Liu et.al. 2406.04149 null
2024-06-06 Frequency-based Matcher for Long-tailed Semantic Segmentation Shan Li et.al. 2406.03917 link
2024-06-07 Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge Nan Zhang et.al. 2406.03799 link
2024-06-06 DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation Zilu Guo et.al. 2406.03702 link
2024-06-05 Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation Maximilian Zenk et.al. 2406.03323 null
2024-06-05 Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy Yunho Kim et.al. 2406.02989 null
2024-06-04 W-RIZZ: A Weakly-Supervised Framework for Relative Traversability Estimation in Mobile Robotics Andre Schreiber et.al. 2406.02822 link
2024-06-04 Window to Wall Ratio Detection using SegFormer Zoe De Simone et.al. 2406.02706 link
2024-06-04 Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning Heather Doig et.al. 2406.01932 null
2024-06-03 EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding Thanh-Dat Truong et.al. 2406.01429 null
2024-06-03 TE-NeXt: A LiDAR-Based 3D Sparse Convolutional Network for Traversability Estimation Antonio Santo et.al. 2406.01395 link
2024-06-03 ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds Ka Lung Cheung et.al. 2406.01337 link
2024-06-03 LSKSANet: A Novel Architecture for Remote Sensing Image Semantic Segmentation Leveraging Large Selective Kernel and Sparse Attention Mechanism Miao Fu et.al. 2406.01228 null
2024-06-04 GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer Ding Jia et.al. 2406.01210 link
2024-06-03 S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography Yuhan Song et.al. 2406.01191 link
2024-06-02 Diffusion Features to Bridge Domain Gap for Semantic Segmentation Yuxiang Ji et.al. 2406.00777 null
2024-06-02 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation Yunheng Li et.al. 2406.00670 link
2024-06-02 Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 Biao Wu et.al. 2406.00587 null
2024-05-31 Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks Linlin Yu et.al. 2405.20986 null
2024-05-31 Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation Wooseok Shin et.al. 2405.20610 link
2024-05-30 P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation Qi Zhang et.al. 2405.20443 link
2024-05-30 SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow Chaoyang Wang et.al. 2405.20282 link
2024-05-30 MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion Angel Villar-Corrales et.al. 2405.19921 link
2024-05-30 Open-Set Domain Adaptation for Semantic Segmentation Seun-An Choe et.al. 2405.19899 link
2024-05-30 DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation Ron Keuth et.al. 2405.19746 link
2024-05-30 Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes Yong-Qiang Mao et.al. 2405.19735 null
2024-05-30 CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation Ankush Gajanan Arudkar et.al. 2405.19672 null
2024-05-29 Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation Lianlei Shan et.al. 2405.19568 null
2024-05-29 Enabling Visual Recognition at Radio Frequency Haowen Lai et.al. 2405.19516 null
2024-05-29 Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-05-29 A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation Niclas Vödisch et.al. 2405.19035 link
2024-05-29 Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation Zelin Peng et.al. 2405.18840 null
2024-05-28 Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation JuneHyoung Kwon et.al. 2405.18148 null
2024-05-28 Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images Lianlei Shan et.al. 2405.18078 null
2024-05-28 RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields Mihnea-Bogdan Jurca et.al. 2405.18033 null
2024-05-28 DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture Shentong Mo et.al. 2405.17995 link
2024-05-28 The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention Xingyu Ding et.al. 2405.17776 null
2024-05-27 Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation Steven Landgraf et.al. 2405.17097 null
2024-05-27 DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking Hongtao Wang et.al. 2405.16980 null
2024-05-27 Collective Perception Datasets for Autonomous Driving: A Comprehensive Review Sven Teufel et.al. 2405.16973 null
2024-05-27 Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models Qian Wang et.al. 2405.16947 link
2024-05-27 A re-calibration method for object detection with multi-modal alignment bias in autonomous driving Zhihang Song et.al. 2405.16848 null
2024-05-25 BOLD: Boolean Logic Deep Learning Van Minh Nguyen et.al. 2405.16339 null
2024-05-25 Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation Huizhou Chen et.al. 2405.16099 null
2024-05-25 Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality Hakim Ikebayashi et.al. 2405.16008 null
2024-05-24 Visualize and Paint GAN Activations Rudolf Herdt et.al. 2405.15636 null
2024-05-24 Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets Hoàng-Ân Lê et.al. 2405.15394 link
2024-05-24 U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation Bingyu Li et.al. 2405.15365 link
2024-05-24 Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation Jiayi Chen et.al. 2405.15265 link
2024-05-23 Mamba-R: Vision Mamba ALSO Needs Registers Feng Wang et.al. 2405.14858 null
2024-05-23 Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation Daniel Kienzle et.al. 2405.14467 link
2024-05-23 MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models Jiuming Liu et.al. 2405.14338 null
2024-05-23 Tuning-free Universally-Supervised Semantic Segmentation Xiaobo Yang et.al. 2405.14294 null
2024-05-23 SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation Kai Yao et.al. 2405.14278 null
2024-05-23 Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations Mohammed Baharoon et.al. 2405.14239 link
2024-05-24 Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification Taylor Archibald et.al. 2405.14162 null
2024-05-23 Skip-SCAR: A Modular Approach to ObjectGoal Navigation with Sparsity and Adaptive Skips Yaotian Liu et.al. 2405.14154 null
2024-05-22 TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System Diogo Lavado et.al. 2405.13989 null
2024-05-22 Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer Qihang Fan et.al. 2405.13337 link
2024-05-21 Transparency Distortion Robustness for SOTA Image Segmentation Tasks Volker Knauthe et.al. 2405.12864 null
2024-05-20 A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation Sushmita Sarker et.al. 2405.11903 null
2024-05-20 Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments Jooyong Park et.al. 2405.11855 null
2024-05-20 Universal Organizer of SAM for Unsupervised Semantic Segmentation Tingting Li et.al. 2405.11742 link
2024-05-19 Interpreting a Semantic Segmentation Model for Coastline Detection Conor O'Sullivan et.al. 2405.11500 null
2024-05-17 CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation Mushui Liu et.al. 2405.10530 link
2024-05-16 Towards Task-Compatible Compressible Representations Anderson de Andrade et.al. 2405.10244 link
2024-05-16 A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance Andrea Matteazzi et.al. 2405.10046 null
2024-05-16 Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation Jihwan Kwak et.al. 2405.09858 null
2024-05-22 Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation Yachan Guo et.al. 2405.09682 null
2024-05-14 CLIP with Quality Captions: A Strong Pretraining for Vision Tasks Pavan Kumar Anasosalu Vasu et.al. 2405.08911 null
2024-05-14 Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study Qinfeng Zhu et.al. 2405.08493 null
2024-05-14 TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection Martín Bayón-Gutiérrez et.al. 2405.08429 link
2024-05-13 IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral Data Ziyang Zhang et.al. 2405.07916 null
2024-05-12 Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception Haoming Chen et.al. 2405.07201 link
2024-05-10 GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs Mustafa Munir et.al. 2405.06849 link
2024-05-10 Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach Elham Ravanbakhsh et.al. 2405.06586 null
2024-05-10 Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation Xiaowen Ma et.al. 2405.06525 link
2024-05-10 Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data Yonghao Xu et.al. 2405.06502 link
2024-05-10 Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data Rongyu Zhang et.al. 2405.06413 null
2024-05-10 Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation Zhenliang Ni et.al. 2405.06228 link
2024-05-10 Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection Koji Takeda et.al. 2405.06185 null
2024-05-10 Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging Zhuchen Shao et.al. 2405.06175 null
2024-05-09 Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation Yudian Zhang et.al. 2405.05830 null
2024-05-08 OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies Lingdong Kong et.al. 2405.05259 link
2024-05-08 Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving Lingdong Kong et.al. 2405.05258 link
2024-05-08 Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information Qi Lai et.al. 2405.04913 null
2024-05-08 DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery Irene Alisjahbana et.al. 2405.04800 null
2024-05-07 FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes Charles Gaydon et.al. 2405.04634 link
2024-05-07 A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields Raiyan Rahman et.al. 2405.04305 null
2024-05-07 ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation Zhibo Zhang et.al. 2405.04121 null
2024-05-06 PTQ4SAM: Post-Training Quantization for Segment Anything Chengtao Lv et.al. 2405.03144 link
2024-05-04 MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning Vishal Nedungadi et.al. 2405.02771 link
2024-05-04 Few-Shot Fruit Segmentation via Transfer Learning Jordan A. James et.al. 2405.02556 link
2024-05-03 DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model Peijin Jia et.al. 2405.02008 null
2024-05-02 Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey Guoping Xu et.al. 2405.01725 link
2024-05-02 Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey Rokas Gipiškis et.al. 2405.01636 null
2024-05-02 CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation Chenying Liu et.al. 2405.01217 null
2024-05-02 Uncertainty-aware self-training with expectation maximization basis transformation Zijia Wang et.al. 2405.01175 null
2024-05-01 Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis Huy H. Nguyen et.al. 2405.00355 null
2024-04-30 Masked Multi-Query Slot Attention for Unsupervised Object Discovery Rishav Pramanik et.al. 2404.19654 link
2024-04-30 DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents Taylor Archibald et.al. 2404.19259 null
2024-04-29 Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing Leonardo Rossi et.al. 2404.18924 link
2024-04-29 IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation Kebin Wu et.al. 2404.18891 null
2024-04-29 Towards Long-term Robotics in the Wild Stephen Hausler et.al. 2404.18477 null
2024-04-27 Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments Benoît Gérin et.al. 2404.17930 link
2024-04-27 GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation Ziya Ata Yazıcı et.al. 2404.17854 link
2024-04-27 CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving Junyi Gu et.al. 2404.17793 link
2024-04-26 Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment Kazi Shahriar Sanjid et.al. 2404.17235 null
2024-04-25 Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation Deepak Bhatia et.al. 2404.17083 null
2024-04-25 Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals Oliver Hahn et.al. 2404.16818 link
2024-04-26 Multi-Scale Representations by Varying Window Attention for Semantic Segmentation Haotian Yan et.al. 2404.16573 link
2024-04-25 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes Xu Zheng et.al. 2404.16501 null
2024-04-25 Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models Hedda Cohen Indelman et.al. 2404.16325 null
2024-04-25 Style Adaptation for Domain-adaptive Semantic Segmentation Ting Li et.al. 2404.16301 null
2024-04-29 A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation Yifan Zhao et.al. 2404.16266 link
2024-04-24 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking Russell Buchanan et.al. 2404.15847 null
2024-04-24 Vision Transformer-based Adversarial Domain Adaptation Yahan Li et.al. 2404.15817 link
2024-04-22 OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks Sophia Sirko-Galouchenko et.al. 2404.14027 link
2024-04-21 Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation Guanlong Jiao et.al. 2404.13701 null
2024-04-21 PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images Abhishek Jha et.al. 2404.13693 null
2024-04-21 A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments Rui Pimentel de Figueiredo et.al. 2404.13691 null
2024-04-21 LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing Tong Wang et.al. 2404.13659 null
2024-04-21 Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering Ben Fei et.al. 2404.13619 null
2024-04-20 AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation Yang Yang et.al. 2404.13408 link
2024-04-19 BACS: Background Aware Continual Semantic Segmentation Mostafa ElAraby et.al. 2404.13148 link
2024-04-19 Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation Yilong Chen et.al. 2404.12861 null
2024-04-19 COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images Dmytro Shvetsov et.al. 2404.12832 link
2024-04-19 A Point-Based Approach to Efficient LiDAR Multi-Task Perception Christopher Lang et.al. 2404.12798 null
2024-04-19 Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework Zhuohong Li et.al. 2404.12721 link
2024-04-19 Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers Hisashi Shimodaira et.al. 2404.12718 null
2024-04-19 Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models Leonardo Barcellona et.al. 2404.12717 null
2024-04-18 A Perspective on Deep Vision Performance with Standard Image and Video Codecs Christoph Reich et.al. 2404.12330 null
2024-04-18 Deep Gaussian mixture model for unsupervised image segmentation Matthias Schwab et.al. 2404.12252 link
2024-04-18 Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training Jin Gao et.al. 2404.12210 link
2024-04-18 How to Benchmark Vision Foundation Models for Semantic Segmentation? Tommie Kerssies et.al. 2404.12172 link
2024-04-19 Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation Chongjie Si et.al. 2404.11981 null
2024-04-18 Group-On: Boosting One-Shot Segmentation with Supportive Query Hanjing Zhou et.al. 2404.11871 null
2024-04-17 Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach Mir Rayat Imtiaz Hossain et.al. 2404.11732 null
2024-04-17 A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching Francesco Pro et.al. 2404.11302 link
2024-04-17 Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images Nikolaos Dionelis et.al. 2404.11299 link
2024-04-16 A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery Ellianna Abrahams et.al. 2404.10927 link
2024-04-16 Vocabulary-free Image Classification and Semantic Segmentation Alessandro Conti et.al. 2404.10864 link
2024-04-16 Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging Toqi Tahamid Sarker et.al. 2404.10841 link
2024-04-16 Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark Jiangning Zhang et.al. 2404.10760 link
2024-04-16 ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation Iaroslav Melekhov et.al. 2404.10699 link
2024-04-16 Contextrast: Contextual Contrastive Learning for Semantic Segmentation Changki Sung et.al. 2404.10633 null
2024-04-16 Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation Aaron Kujawa et.al. 2404.10572 null
2024-04-16 LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System Shijing Hu et.al. 2404.10498 null
2024-04-16 Adversarial Identity Injection for Semantic Face Image Synthesis Giuseppe Tarollo et.al. 2404.10408 null
2024-04-16 Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation Jiapeng Su et.al. 2404.10322 link
2024-04-16 Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain Steve Andreas Immanuel et.al. 2404.10307 link
2024-04-15 Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL Fangwei Zhong et.al. 2404.09857 null
2024-04-15 In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation Han Xue et.al. 2404.09633 null
2024-04-15 The revenge of BiSeNet: Efficient Multi-Task Image Segmentation Gabriele Rosi et.al. 2404.09570 null
2024-04-16 Human-in-the-Loop Segmentation of Multi-species Coral Imagery Scarlett Raine et.al. 2404.09406 link
2024-04-14 Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation Jieyi Tan et.al. 2404.09292 null
2024-04-12 Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning Girmaw Abebe Tadesse et.al. 2404.08544 null
2024-04-12 LaSagnA: Language-based Segmentation Assistant for Complex Queries Cong Wei et.al. 2404.08506 link
2024-04-12 Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation Zhiwei Yang et.al. 2404.08195 link
2024-04-12 Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation Sina Hajimiri et.al. 2404.08181 link
2024-04-10 AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth Rohan Reddy Mekala et.al. 2404.08017 null
2024-04-11 Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification Ricardo Pereira et.al. 2404.07739 null
2024-04-11 OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities Lasse H. Hansen et.al. 2404.07711 link
2024-04-11 Implicit and Explicit Language Guidance for Diffusion-based Visual Perception Hefeng Wang et.al. 2404.07600 null
2024-04-11 Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling Sourajit Saha et.al. 2404.07410 null
2024-04-10 AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth Rohan Reddy Mekala et.al. 2404.07306 null
2024-04-10 RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds Remco Royen et.al. 2404.06863 null
2024-04-10 O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation Muer Tie et.al. 2404.06836 null
2024-04-10 Convolution-based Probability Gradient Loss for Semantic Segmentation Guohang Shan et.al. 2404.06704 link
2024-04-09 Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation Luca Barsellotti et.al. 2404.06542 null
2024-04-09 QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding Yash Mehan et.al. 2404.06442 null
2024-04-09 DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning Senthil Yogamani et.al. 2404.06352 null
2024-04-09 Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation Mariella Dreissig et.al. 2404.06124 null
2024-04-09 Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation Zong-Wei Hong et.al. 2404.06029 null
2024-04-08 Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery Ionut M. Motoi et.al. 2404.05693 null
2024-04-08 AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation Jiannan Ge et.al. 2404.05667 null
2024-04-08 Impact of LiDAR visualisations on semantic segmentation of archaeological objects Raveerat Jaturapitpornchai et.al. 2404.05512 null
2024-04-08 Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance Dazhong Shen et.al. 2404.05384 link
2024-04-08 GPS-free Autonomous Navigation in Cluttered Tree Rows with Deep Semantic Segmentation Alessandro Navone et.al. 2404.05338 null
2024-04-08 Human Detection from 4D Radar Data in Low-Visibility Field Conditions Mikael Skog et.al. 2404.05307 null
2024-04-08 iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection Nan Zhou et.al. 2404.05207 null
2024-04-08 UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather Haimei Zhao et.al. 2404.05145 null
2024-04-07 D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation Xuan Sun et.al. 2404.04807 null
2024-04-06 HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene Ziang Guo et.al. 2404.04653 link
2024-04-05 Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation Zifu Wan et.al. 2404.04256 link
2024-04-05 Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation Ji-Jia Wu et.al. 2404.04231 link
2024-04-05 MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector Junbo Li et.al. 2404.04155 null
2024-04-04 Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation Elham Amin Mansour et.al. 2404.03799 null
2024-04-04 Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball Simon Weber et.al. 2404.03778 link
2024-04-04 Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation Izumi Fujimori et.al. 2404.03394 null
2024-04-03 GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation Meher Niger et.al. 2404.02813 null
2024-04-03 RS-Mamba for Large Remote Sensing Image Dense Prediction Sijie Zhao et.al. 2404.02668 link
2024-04-03 A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task Eduardo Neto et.al. 2404.02659 null
2024-04-03 SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation Junyan Ye et.al. 2404.02638 link
2024-04-03 Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation Bart M. van Marrewijk et.al. 2404.02580 null
2024-04-03 HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras Zhongyu Xia et.al. 2404.02517 link
2024-04-03 Optimizing traffic signs and lights visibility for the teleoperation of autonomous vehicles through ROI compression I. Dror et.al. 2404.02481 null
2024-04-03 RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation Xianping Ma et.al. 2404.02457 link
2024-04-02 Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs Faraz Lotfi et.al. 2404.02294 null
2024-04-02 Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation Hui Xiao et.al. 2404.02065 null
2024-04-02 Synthetic Data for Robust Stroke Segmentation Liam Chalcroft et.al. 2404.01946 link
2024-04-02 Improving Bird's Eye View Semantic Segmentation by Task Decomposition Tianhao Zhao et.al. 2404.01925 null
2024-04-02 Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model Qinfeng Zhu et.al. 2404.01705 link
2024-04-02 Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss Jaeha Kim et.al. 2404.01692 link
2024-04-01 PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation Jinfeng Xu et.al. 2404.00979 link
2024-04-01 GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields Yunsong Wang et.al. 2404.00931 link
2024-04-02 Rethinking Saliency-Guided Weakly-Supervised Semantic Segmentation Beomyoung Kim et.al. 2404.00918 link
2024-03-31 Training-Free Semantic Segmentation via LLM-Supervision Wenfang Sun et.al. 2404.00701 null
2024-03-31 LAESI: Leaf Area Estimation with Synthetic Imagery Jacek Kałużny et.al. 2404.00593 null
2024-03-29 Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation Qi Bi et.al. 2403.20092 null
2024-03-29 MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection Ali Behrouz et.al. 2403.19888 null
2024-03-28 Segmentation Re-thinking Uncertainty Estimation Metrics for Semantic Segmentation Qitian Ma et.al. 2403.19826 null
2024-03-28 ENet-21: An Optimized light CNN Structure for Lane Detection Seyed Rasoul Hosseini et.al. 2403.19782 null
2024-03-29 Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers Pingcheng Dong et.al. 2403.19591 link
2024-03-28 DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Donghyun Kim et.al. 2403.19588 link
2024-03-28 Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting Weihao Jiang et.al. 2403.19213 null
2024-03-27 Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D Mukund Varma T et.al. 2403.18922 null
2024-03-27 I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation Ayoub Karine et.al. 2403.18490 null
2024-03-28 ViTAR: Vision Transformer with Any Resolution Qihang Fan et.al. 2403.18361 null
2024-03-27 Generating Diverse Agricultural Data for Vision-Based Farming Applications Mikolaj Cieslak et.al. 2403.18351 null
2024-03-27 Road Obstacle Detection based on Unknown Objectness Scores Chihiro Noguchi et.al. 2403.18207 null
2024-03-26 The Need for Speed: Pruning Transformers with One Recipe Samir Khaki et.al. 2403.17921 link
2024-03-26 Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation Carlos Gomes et.al. 2403.17886 link
2024-03-26 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Chenhongyi Yang et.al. 2403.17695 link
2024-03-26 Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion Kazi Shahriar Sanjid et.al. 2403.17432 null
2024-03-25 Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions Ye Li et.al. 2403.17009 link
2024-03-25 DreamLIP: Language-Image Pre-training with Long Captions Kecheng Zheng et.al. 2403.17007 link
2024-03-25 TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation Quang-Huy Che et.al. 2403.16958 null
2024-03-25 HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation Linglin Jing et.al. 2403.16788 null
2024-03-25 SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation Aysim Toker et.al. 2403.16605 null
2024-03-25 Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes Tianwei Zhang et.al. 2403.16499 null
2024-03-25 GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation Weiming Zhang et.al. 2403.16370 null
2024-03-24 Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System Jing Li et.al. 2403.16227 null
2024-03-24 Segment Anything Model for Road Network Graph Extraction Congrui Hetang et.al. 2403.16051 link
2024-03-24 SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images Yifei Wang et.al. 2403.16009 null
2024-03-22 Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting Jun Guo et.al. 2403.15624 null
2024-03-22 A2DMN: Anatomy-Aware Dilated Multiscale Network for Breast Ultrasound Semantic Segmentation Kyle Lucke et.al. 2403.15560 null
2024-03-22 InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Yi Wang et.al. 2403.15377 link
2024-03-22 Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations Pranav Kulkarni et.al. 2403.15218 link
2024-03-22 Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion Sofia Casarin et.al. 2403.15194 null
2024-03-22 Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation Wenlve Zhou et.al. 2403.14995 link
2024-03-21 WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather Blake Gella et.al. 2403.14874 null
2024-03-21 Learning to Project for Cross-Task Knowledge Distillation Dylan Auty et.al. 2403.14494 null
2024-03-21 OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation Bohao Peng et.al. 2403.14418 link
2024-03-21 Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models Pablo Marcos-Manchón et.al. 2403.14291 link
2024-03-21 OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation Kwanyoung Kim et.al. 2403.14183 link
2024-03-21 Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference Junyoung Kim et.al. 2403.14138 null
2024-03-21 Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling Yong He et.al. 2403.14124 null
2024-03-21 Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots Connor Lee et.al. 2403.14056 null
2024-03-20 When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather Giulia Rizzoli et.al. 2403.13762 null
2024-03-20 Next day fire prediction via semantic segmentation Konstantinos Alexis et.al. 2403.13545 null
2024-03-20 MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining Di Wang et.al. 2403.13430 link
2024-03-20 AMCO: Adaptive Multimodal Coupling of Vision and Proprioception for Quadruped Robot Navigation in Outdoor Environments Mohamed Elnoor et.al. 2403.13235 null
2024-03-20 Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation Linshan Wu et.al. 2403.13225 link
2024-03-19 Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation Kasi Viswanath et.al. 2403.13188 link
2024-03-19 As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? Anjun Hu et.al. 2403.12693 null
2024-03-19 PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation Haruya Ishikawa et.al. 2403.12530 null
2024-03-19 Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation Xu Zheng et.al. 2403.12505 null
2024-03-18 Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation Wangbo Zhao et.al. 2403.11808 link
2024-03-18 LSKNet: A Foundation Lightweight Backbone for Remote Sensing Yuxuan Li et.al. 2403.11735 link
2024-03-18 TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models Lisa Weijler et.al. 2403.11691 null
2024-03-18 OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation Seungbeom Woo et.al. 2403.11582 null
2024-03-18 MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception Thien-Minh Nguyen et.al. 2403.11496 null
2024-03-18 Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting Mingkui Tan et.al. 2403.11491 null
2024-03-17 TAG: Guidance-free Open-Vocabulary Semantic Segmentation Yasufumi Kawano et.al. 2403.11197 link
2024-03-17 MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation Yasufumi Kawano et.al. 2403.11194 link
2024-03-17 DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation Yuanchen Wu et.al. 2403.11184 link
2024-03-17 LERENet: Eliminating Intra-class Differences for Metal Surface Defect Few-shot Semantic Segmentation Hanze Ding et.al. 2403.11122 null
2024-03-17 Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution Jialu Sui et.al. 2403.11078 link
2024-03-17 Intelligent Railroad Grade Crossing: Leveraging Semantic Segmentation and Object Detection for Enhanced Safety Al Amin et.al. 2403.11060 null
2024-03-15 FeatUp: A Model-Agnostic Framework for Features at Any Resolution Stephanie Fu et.al. 2403.10516 link
2024-03-15 Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search Hongyuan Yu et.al. 2403.10413 link
2024-03-15 Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning Meixuan Li et.al. 2403.10252 null
2024-03-15 Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation Marcos Fernández-Rodríguez et.al. 2403.10216 null
2024-03-15 TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model Changhong Hou et.al. 2403.10127 null
2024-03-15 Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation Jingyi Xu et.al. 2403.10001 link
2024-03-14 WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity Qiyuan Wang et.al. 2403.09551 null
2024-03-14 Annotation Free Semantic Segmentation with Vision Foundation Models Soroush Seifi et.al. 2403.09307 null
2024-03-14 When Semantic Segmentation Meets Frequency Aliasing Linwei Chen et.al. 2403.09065 link
2024-03-13 CART: Caltech Aerial RGB-Thermal Dataset in the Wild Connor Lee et.al. 2403.08997 link
2024-03-13 SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net Helin Cao et.al. 2403.08885 null
2024-03-13 Segmentation of Knee Bones for Osteoarthritis Assessment: A Comparative Analysis of Supervised, Few-Shot, and Zero-Shot Learning Approaches Yun Xin Teoh et.al. 2403.08761 null
2024-03-13 Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution Samuel Sze et.al. 2403.08748 null
2024-03-13 Semantic Segmentation of Solar Radio Spikes at Low Frequencies Pearse C. Murphy et.al. 2403.08546 null
2024-03-13 Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation Zicheng Zhang et.al. 2403.08426 null
2024-03-13 LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving Sicen Guo et.al. 2403.08215 null
2024-03-13 Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks Fuzhi Wu et.al. 2403.08157 link
2024-03-12 Mitigating the Impact of Attribute Editing on Face Recognition Sudipta Banerjee et.al. 2403.08092 null
2024-03-12 Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation Feilong Tang et.al. 2403.07630 link
2024-03-12 PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution Honghao Chen et.al. 2403.07589 null
2024-03-12 Open-World Semantic Segmentation Including Class Similarity Matteo Sodano et.al. 2403.07532 link
2024-03-11 Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation Theodore Barfoot et.al. 2403.06759 link
2024-03-11 Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation Bianca-Cerasela-Zelia Blaga et.al. 2403.06621 link
2024-03-11 OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation Baran Ozaydin et.al. 2403.06546 null
2024-03-11 3D Semantic Segmentation-Driven Representations for 3D Object Detection Hayeon O et.al. 2403.06501 link
2024-03-11 Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy Jiuming Liu et.al. 2403.06467 link
2024-03-11 Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation Xiaoyang Wang et.al. 2403.06462 null
2024-03-11 Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation Peng Zhang et.al. 2403.06401 null
2024-03-10 Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning Woo-Jin Ahn et.al. 2403.06122 link
2024-03-09 Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation Hairong Shi et.al. 2403.05912 link
2024-03-08 Attention-guided Feature Distillation for Semantic Segmentation Amir M. Mansourian et.al. 2403.05451 link
2024-03-08 Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation Yu Han et.al. 2403.05388 null
2024-03-08 Frequency-Adaptive Dilated Convolution for Semantic Segmentation Linwei Chen et.al. 2403.05369 link
2024-03-08 Embedded Deployment of Semantic Segmentation in Medicine through Low-Resolution Inputs Erik Ostrowski et.al. 2403.05340 null
2024-03-08 LVIC: Multi-modality segmentation by Lifting Visual Info as Cue Zichao Dong et.al. 2403.05159 null
2024-03-06 ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation Erik Brorsson et.al. 2403.03854 link
2024-03-06 Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision Yajie Liu et.al. 2403.03707 null
2024-03-06 Causal Prototype-inspired Contrast Adaptation for Unsupervised Domain Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery Jingru Zhu et.al. 2403.03704 null
2024-03-06 GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding Zi-Ting Chou et.al. 2403.03608 null
2024-03-06 Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator Wonhyeok Choi et.al. 2403.03468 null
2024-03-05 Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection Mohamed Afifi et.al. 2403.03111 null
2024-03-05 ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving Han Lu et.al. 2403.02877 null
2024-03-05 DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation Lingyan Ran et.al. 2403.02784 null
2024-03-08 Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels Zhuohong Li et.al. 2403.02746 link
2024-03-05 FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View Jiawei Hou et.al. 2403.02710 null
2024-03-05 Deep Common Feature Mining for Efficient Video Semantic Segmentation Yaoyan Zheng et.al. 2403.02689 null
2024-03-04 Self-Supervised Facial Representation Learning with Facial Region Awareness Zheng Gao et.al. 2403.02138 null
2024-03-04 Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey Lingyan Ran et.al. 2403.01909 null
2024-03-04 Map-aided annotation for pole base detection Benjamin Missaoui et.al. 2403.01868 null
2024-03-04 AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation Haonan Wang et.al. 2403.01818 link
2024-03-02 Benchmarking Segmentation Models with Mask-Preserved Attribute Editing Zijin Yin et.al. 2403.01231 link
2024-03-02 Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation Lian Xu et.al. 2403.01156 null
2024-03-01 Rethinking Few-shot 3D Point Cloud Semantic Segmentation Zhaochong An et.al. 2403.00592 link
2024-03-01 Small, Versatile and Mighty: A Range-View Perception Framework Qiang Meng et.al. 2403.00325 null
2024-03-01 YOLO-MED : Multi-Task Interaction Network for Biomedical Images Suizhi Huang et.al. 2403.00245 null
2024-02-29 FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything Safouane El Ghazouali et.al. 2403.00175 link
2024-02-29 RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation Jie Zhang et.al. 2402.19004 null
2024-02-28 Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond Ziyun Yang et.al. 2402.18698 null
2024-02-29 Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation Zhiwei Yang et.al. 2402.18467 link
2024-02-29 A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation Francesco Barbato et.al. 2402.18402 link
2024-02-28 Enhancing Roadway Safety: LiDAR-based Tree Clearance Analysis Miriam Louise Carnot et.al. 2402.18309 null
2024-02-28 Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis Bashir Kazimi et.al. 2402.18286 null
2024-02-28 PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation Haoyu Xie et.al. 2402.18117 null
2024-02-28 Spannotation: Enhancing Semantic Segmentation for Autonomous Navigation with Efficient Image Annotation Samuel O. Folorunsho et.al. 2402.18084 link
2024-02-27 Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation Xinyu Yang et.al. 2402.17891 link
2024-02-27 Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data David S. W. Williams et.al. 2402.17653 null
2024-02-27 Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling David S. W. Williams et.al. 2402.17622 null
2024-02-27 A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images David Torpey et.al. 2402.17611 null
2024-02-27 Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label Xinliang Zhang et.al. 2402.17555 link
2024-02-26 ConSept: Continual Semantic Segmentation via Adapter-based Vision Transformer Bowen Dong et.al. 2402.16674 null
2024-02-26 UN-SAM: Universal Prompt-Free Segmentation for Generalized Nuclei Images Zhen Chen et.al. 2402.16663 link
2024-02-26 Placing Objects in Context via Inpainting for Out-of-distribution Segmentation Pau de Jorge et.al. 2402.16392 link
2024-02-26 BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM Li Zhang et.al. 2402.16338 link
2024-02-23 Modified CycleGAN for the synthesization of samples for wheat head segmentation Jaden Myers et.al. 2402.15135 null
2024-02-22 Semantic Image Synthesis with Unconditional Generator Jungwoo Chae et.al. 2402.14395 null
2024-02-22 Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation Mingxuan Yan et.al. 2402.14326 null
2024-02-21 Tumor segmentation on whole slide images: training or prompting? Huaqian Wu et.al. 2402.13932 null
2024-02-26 BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery Loddo Fabio et.al. 2402.13918 link
2024-02-21 Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps Gianluca Monaci et.al. 2402.13848 null
2024-02-21 Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation Jialei Chen et.al. 2402.13697 null
2024-02-20 Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model Claudia Cuttano et.al. 2402.13122 null
2024-02-19 LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks Truong Thanh Hung Nguyen et.al. 2402.12525 link
2024-02-19 Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target Localization Abhishek Kuriyal et.al. 2402.12098 link
2024-02-19 ISCUTE: Instance Segmentation of Cables Using Text Embedding Shir Kozlovsky et.al. 2402.11996 null
2024-02-18 Key Patch Proposer: Key Patches Contain Rich Information Jing Xu et.al. 2402.11458 link
2024-02-17 ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing Zhenghang Yuan et.al. 2402.11325 link
2024-02-17 A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation Jiwon Yoo et.al. 2402.11201 null
2024-02-16 HistoSegCap: Capsules for Weakly-Supervised Semantic Segmentation of Histological Tissue Type in Whole Slide Images Mobina Mansoori et.al. 2402.10851 null
2024-02-16 Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift Bruno Laboissiere Camargos Borges et.al. 2402.10665 null
2024-02-16 Efficient Multi-task Uncertainties for Joint Semantic Segmentation and Monocular Depth Estimation Steven Landgraf et.al. 2402.10580 null
2024-02-15 Is Continual Learning Ready for Real-world Challenges? Theodora Kontogianni et.al. 2402.10130 null
2024-02-15 Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network Siyi Chen et.al. 2402.10055 null
2024-02-15 MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding Hai-Tao Yu et.al. 2402.10002 link
2024-02-14 Automated Plaque Detection and Agatston Score Estimation on Non-Contrast CT Scans: A Multicenter Study Andrew M. Nguyen et.al. 2402.09569 null
2024-02-14 Reducing Texture Bias of Deep Neural Networks via Edge Enhancing Diffusion Edgar Heinert et.al. 2402.09530 null
2024-02-13 Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing Alaa Anani et.al. 2402.08400 link
2024-02-13 Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss Kei Iino et.al. 2402.08267 null
2024-02-12 Semantic segmentation for recognition of epileptiform patterns recorded via Microelectrode Arrays in vitro Gabriel Galeote-Checa et.al. 2402.08099 null
2024-02-11 Data Quality Aware Approaches for Addressing Model Drift of Semantic Segmentation Models Samiha Mirza et.al. 2402.07258 null
2024-02-09 More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation Nico Catalano et.al. 2402.06581 null
2024-02-09 Hybridnet for depth estimation and semantic segmentation Dalila Sánchez-Escobedo et.al. 2402.06539 null
2024-02-09 Classifying point clouds at the facade-level using geometric features and deep learning networks Yue Tan et.al. 2402.06506 link
2024-02-09 ControlUDA: Controllable Diffusion-assisted Unsupervised Domain Adaptation for Cross-Weather Semantic Segmentation Fengyi Shen et.al. 2402.06446 null
2024-02-08 Early Fusion of Features for Semantic Segmentation Anupam Gupta et.al. 2402.06091 null
2024-02-08 Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery Mengya Xu et.al. 2402.05860 link
2024-02-08 On the Effect of Image Resolution on Semantic Segmentation Ritambhara Singh et.al. 2402.05398 null
2024-02-07 Multi-Scale Semantic Segmentation with Modified MBConv Blocks Xi Chen et.al. 2402.04618 null
2024-02-06 Energy-based Domain-Adaptive Segmentation with Depth Guidance Jinjing Zhu et.al. 2402.03795 null
2024-02-05 SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM Mingrui Li et.al. 2402.03246 null
2024-02-05 RRWNet: Recursive Refinement Network for Effective Retinal Artery/Vein Segmentation and Classification José Morano et.al. 2402.03166 link
2024-02-05 Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing Zihan Ma et.al. 2402.02985 link
2024-02-04 M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing Mohammadreza Mofayezi et.al. 2402.02369 null
2024-02-04 Exploring Intrinsic Properties of Medical Images for Self-Supervised Binary Semantic Segmentation Pranav Singh et.al. 2402.02367 null
2024-02-04 Region-Based Representations Revisited Michal Shlapentokh-Rothman et.al. 2402.02352 link
2024-02-03 Multi-Level Feature Aggregation and Recursive Alignment Network for Real-Time Semantic Segmentation Yanhua Zhang et.al. 2402.02286 link
2024-02-03 Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets Lei Xu et.al. 2402.02245 link
2024-02-03 Evaluating the Robustness of Off-Road Autonomous Driving Segmentation against Adversarial Attacks: A Dataset-Centric analysis Pankaj Deoli et.al. 2402.02154 link
2024-02-03 Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes Xilai Li et.al. 2402.02096 null
2024-02-02 Convolution kernel adaptation to calibrated fisheye Bruno Berenguel-Baeta et.al. 2402.01456 link
2024-02-02 Delving into Decision-based Black-box Attacks on Semantic Segmentation Zhaoyu Chen et.al. 2402.01220 null
2024-02-02 Scale Equalization for Multi-Level Feature Fusion Bum Jun Kim et.al. 2402.01149 null
2024-02-01 We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline Simar Kareer et.al. 2402.00868 link
2024-02-01 Automatic Segmentation of the Spinal Cord Nerve Rootlets Jan Valosek et.al. 2402.00724 link
2024-02-01 A Framework for Building Point Cloud Cleaning, Plane Detection and Semantic Segmentation Ilyass Abouelaziz et.al. 2402.00692 null
2024-01-31 Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model Zihan Zhong et.al. 2401.17868 link
2024-01-31 Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation Rozhan Ahmadi et.al. 2401.17828 link
2024-02-01 Tiered approach for rapid damage characterisation of infrastructure enabled by remote sensing and deep learning technologies Nadiia Kopiika et.al. 2401.17759 null
2024-01-31 Towards Image Semantics and Syntax Sequence Learning Chun Tao et.al. 2401.17515 null
2024-01-30 Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets Jens Henriksson et.al. 2401.17013 null
2024-01-30 CAFCT: Contextual and Attentional Feature Fusions of Convolutional Neural Networks and Transformer for Liver Tumor Segmentation Ming Kang et.al. 2401.16886 null
2024-01-29 Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors Shiyin Dong et.al. 2401.16459 null
2024-01-28 SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks Serdar Erisen et.al. 2401.15741 link
2024-01-28 UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration Nachuan Ma et.al. 2401.15647 null
2024-01-27 Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes Diandian Guo et.al. 2401.15261 link
2024-01-26 Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis Mingshi Li et.al. 2401.15223 null
2024-01-26 Kitchen Food Waste Image Segmentation and Classification for Compost Nutrients Estimation Raiyan Rahman et.al. 2401.15175 null
2024-01-26 SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation Yanqi Ge et.al. 2401.14686 null
2024-01-25 CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds Muhammad Ahmed Chaudhry et.al. 2401.14486 null
2024-01-25 Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction Dominik Rößle et.al. 2401.14325 null
2024-01-24 Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation Saiyang Na et.al. 2401.13220 null
2024-01-24 Boundary and Relation Distillation for Semantic Segmentation Dong Zhang et.al. 2401.13174 null
2024-01-23 DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained Self-supervised Vision Transformer Sonal Kumar et.al. 2401.12820 link
2024-01-23 Self-Supervised Vision Transformers Are Efficient Segmentation Learners for Imperfect Labels Seungho Lee et.al. 2401.12535 null
2024-01-23 Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration Yifan Zhang et.al. 2401.12452 null
2024-01-22 Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge Yao Lu et.al. 2401.12350 null
2024-01-22 Exploring Simple Open-Vocabulary Semantic Segmentation Zihang Lai et.al. 2401.12217 link
2024-01-22 Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy Will LeVine et.al. 2401.12129 link
2024-01-22 HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum) Volodymyr Kuzma et.al. 2401.12048 null
2024-01-22 SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation Ci-Siang Lin et.al. 2401.11791 null
2024-01-22 EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Koichi Namekata et.al. 2401.11739 null
2024-01-22 MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation Shenwang Jiang et.al. 2401.11738 null


Instance Segmentation

Publish Date Title Authors PDF Code
2024-08-20 Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant Guofeng Mei et.al. 2408.10652 null
2024-08-21 LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS Xinyu Liu et.al. 2408.10469 null
2024-08-19 Leveraging Superfluous Information in Contrastive Representation Learning Xuechu Yu et.al. 2408.10292 null
2024-08-19 3D-Aware Instance Segmentation and Tracking in Egocentric Videos Yash Bhalgat et.al. 2408.09860 null
2024-08-18 VrdONE: One-stage Video Visual Relation Detection Xinjie Jiang et.al. 2408.09408 link
2024-08-17 GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation Weiming Zhang et.al. 2408.09115 null
2024-08-16 Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation Tri Ton et.al. 2408.08591 null
2024-08-16 Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation Linghao Zheng et.al. 2408.08576 null
2024-08-16 Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs Jinming Liu et.al. 2408.08575 null
2024-08-15 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks Dongshuo Yin et.al. 2408.08345 link
2024-08-09 Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology Junlin Guo et.al. 2408.06381 link
2024-08-13 Performance Evaluation of YOLOv8 Model Configurations, for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment Abdul-Razak Alhassan Gamani et.al. 2408.05661 null
2024-08-08 Embodied Uncertainty-Aware Object Segmentation Xiaolin Fang et.al. 2408.04760 null
2024-08-08 Robust Approximate Characterization of Single-Cell Heterogeneity in Microbial Growth Richard D. Paul et.al. 2408.04501 link
2024-08-07 CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Tianfang Zhang et.al. 2408.03703 link
2024-08-07 SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology Mingya Zhang et.al. 2408.03651 link
2024-08-06 Evaluation of Segment Anything Model 2: The Role of SAM2 in the Underwater Environment Shijie Lian et.al. 2408.02924 link
2024-08-09 NuLite -- Lightweight and Fast Model for Nuclei Instance Segmentation and Classification Cristian Tommasino et.al. 2408.01797 link
2024-08-02 Amodal Segmentation for Laparoscopic Surgery Video Instruments Ruohua Shi et.al. 2408.01067 null
2024-08-01 Leaf Angle Estimation using Mask R-CNN and LETR Vision Transformer Venkat Margapuri et.al. 2408.00749 null
2024-08-01 A Simple Background Augmentation Method for Object Detection with Diffusion Model Yuhang Li et.al. 2408.00350 null
2024-07-31 Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification Junru Chen et.al. 2408.00041 null
2024-07-31 MaskUno: Switch-Split Block For Enhancing Instance Segmentation Jawad Haidar et.al. 2407.21498 null
2024-08-02 Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets Tianxiao Zhang et.al. 2407.19394 link
2024-07-26 A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention João D. Nunes et.al. 2407.18673 null
2024-07-25 LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels Ziwei Cui et.al. 2407.18054 link
2024-07-26 Quality Assured: Rethinking Annotation Strategies in Imaging AI Tim Rädsch et.al. 2407.17596 null
2024-07-24 McGAN: Generating Manufacturable Designs by Embedding Manufacturing Rules into Conditional Generative Adversarial Network Zhichao Wang et.al. 2407.16943 null
2024-07-22 Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator Florian Robert et.al. 2407.15817 null
2024-07-19 Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model Kun Zhao et.al. 2407.14326 null
2024-07-19 Scale Disparity of Instances in Interactive Point Cloud Segmentation Chenrui Han et.al. 2407.14009 null
2024-07-18 GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model Abdelrahman Shaker et.al. 2407.13772 link
2024-07-17 AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer Zhuguanyu Wu et.al. 2407.12951 link
2024-07-17 Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation Kaixin Bai et.al. 2407.12449 null
2024-07-17 Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model Tao Wang et.al. 2407.12319 null
2024-07-16 SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation Lei Yao et.al. 2407.11564 null
2024-07-19 Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes Zhi Cai et.al. 2407.11464 link
2024-07-16 Generative AI Driven Task-Oriented Adaptive Semantic Communications Yuzhou Fu et.al. 2407.11354 null
2024-07-15 M18K: A Comprehensive RGB-D Dataset and Benchmark for Mushroom Detection and Instance Segmentation Abdollah Zakeri et.al. 2407.11275 link
2024-07-14 Part2Object: Hierarchical Unsupervised 3D Instance Segmentation Cheng Shi et.al. 2407.10084 link
2024-07-12 WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation Robin Schön et.al. 2407.09288 null
2024-07-11 SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation Xin You et.al. 2407.08555 null
2024-07-10 MambaVision: A Hybrid Mamba-Transformer Vision Backbone Ali Hatamizadeh et.al. 2407.08083 link
2024-07-12 Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation Hao Fang et.al. 2407.07427 link
2024-07-09 Improved Block Merging for 3D Point Cloud Instance Segmentation Leon Denis et.al. 2407.06991 null
2024-07-09 Joint prototype and coefficient prediction for 3D instance segmentation Remco Royen et.al. 2407.06958 null
2024-07-08 Training-free CryoET Tomogram Segmentation Yizhou Zhao et.al. 2407.06833 link
2024-07-05 Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge Yuanze Lin et.al. 2407.04681 null
2024-07-11 Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing Anushrut Jignasu et.al. 2407.04180 null
2024-07-04 Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion Yutian Zhong et.al. 2407.03992 link
2024-07-03 Context-Aware Video Instance Segmentation Seunghun Lee et.al. 2407.03010 link
2024-07-03 ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers Yanfeng Jiang et.al. 2407.02763 null
2024-07-02 LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection Yansong Gong et.al. 2407.02061 null
2024-07-02 Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning Chengchao Shen et.al. 2407.02014 link
2024-07-01 PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction Xuan Yu et.al. 2407.01349 null
2024-07-01 Robot Instance Segmentation with Few Annotations for Grasping Moshe Kimhi et.al. 2407.01302 link
2024-06-28 PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation Zhangjing Yang et.al. 2406.19665 link
2024-07-01 3D Feature Distillation with Object-Centric Priors Georgios Tziafas et.al. 2406.18742 null
2024-06-26 CoDA: Interactive Segmentation and Morphological Analysis of Dendroid Structures Exemplified on Stony Cold-Water Corals Kira Schmitt et.al. 2406.18236 link
2024-06-25 Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation Bernardo Silva et.al. 2406.17915 null
2024-06-25 Depth-Guided Semi-Supervised Instance Segmentation Xin Chen et.al. 2406.17413 null
2024-06-25 XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images Elisabeta-Iulia Dima et.al. 2406.17323 link
2024-06-24 GMT: Guided Mask Transformer for Leaf Instance Segmentation Feng Chen et.al. 2406.17109 null
2024-06-24 Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation Yizheng Wu et.al. 2406.16776 link
2024-06-23 CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery Oluwatosin Alabi et.al. 2406.16039 link
2024-06-22 Fine-grained Background Representation for Weakly Supervised Semantic Segmentation Xu Yin et.al. 2406.15755 link
2024-06-21 TraceNet: Segment one thing efficiently Mingyuan Wu et.al. 2406.14874 null
2024-06-19 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data Siddiqui Muhammad Yasir et.al. 2406.14581 null
2024-06-20 2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation Bin Cao et.al. 2406.13939 null
2024-06-18 Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines Honglei Zhang et.al. 2406.12367 null
2024-06-17 OoDIS: Anomaly Instance Segmentation Benchmark Alexey Nekrasov et.al. 2406.11835 link
2024-06-18 Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters Eden Grad et.al. 2406.10891 link
2024-06-15 MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception M. Mahbubur Rahman et.al. 2406.10708 link
2024-06-14 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities Roman Bachmann et.al. 2406.09406 null
2024-06-12 2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation Zhensong Xu et.al. 2406.08192 null
2024-06-11 PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving Yining Shi et.al. 2406.07037 null
2024-06-11 RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks Zhechao Wang et.al. 2406.07032 null
2024-06-11 Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples Kailas Dayanandan et.al. 2406.06967 link
2024-06-11 UVIS: Unsupervised Video Instance Segmentation Shuaiyi Huang et.al. 2406.06908 null
2024-06-10 Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset Shijie Lian et.al. 2406.06039 link
2024-06-09 Scaling Graph Convolutions for Mobile Vision William Avery et.al. 2406.05850 link
2024-06-08 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation Qingfeng Liu et.al. 2406.05352 null
2024-06-07 Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment Venkanna Babu Guthula et.al. 2406.04949 null
2024-06-06 Instance Segmentation and Teeth Classification in Panoramic X-rays Devichand Budagam et.al. 2406.03747 link
2024-06-04 Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation Mohamed El Amine Boudjoghra et.al. 2406.02548 link
2024-06-04 Generative Active Learning for Long-tailed Instance Segmentation Muzhi Zhu et.al. 2406.02435 link
2024-06-03 MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild Zeren Jiang et.al. 2406.01595 null
2024-06-03 An expert-driven data generation pipeline for histological images Roberto Basla et.al. 2406.01403 link
2024-06-03 MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images Ke-Lei Wang et.al. 2406.01356 null
2024-06-03 SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models Qilong Zhangli et.al. 2406.01062 null
2024-06-05 From Seedling to Harvest: The GrowingSoy Dataset for Weed Detection in Soy Crops via Instance Segmentation Raul Steinmetz et.al. 2406.00313 link
2024-06-04 Extreme Point Supervised Instance Segmentation Hyeonjun Lee et.al. 2405.20729 null
2024-05-29 Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-05-28 Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation Yangxiao Lu et.al. 2405.17859 link
2024-05-26 Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning Neha Kalibhat et.al. 2405.16401 null
2024-05-25 Video Prediction Models as General Visual Encoders James Maier et.al. 2405.16382 null
2024-05-25 Efficient Temporal Action Segmentation via Boundary-aware Query Voting Peiyao Wang et.al. 2405.15995 link
2024-05-24 Autonomous Quilt Spreading for Caregiving Robots Yuchun Guo et.al. 2405.15373 null
2024-05-23 Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations Mohammed Baharoon et.al. 2405.14239 link
2024-05-22 PerSense: Personalized Instance Segmentation in Dense Images Muhammad Ibraheem Siddiqui et.al. 2405.13518 null
2024-05-22 Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation Dingwen Zhang et.al. 2405.13388 link
2024-05-22 Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer Qihang Fan et.al. 2405.13337 link
2024-05-22 Vision Transformer with Sparse Scan Prior Qihang Fan et.al. 2405.13335 link
2024-05-20 Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model Mounes Zaval et.al. 2405.11837 null
2024-05-19 Unifying 3D Vision-Language Understanding via Promptable Queries Ziyu Zhu et.al. 2405.11442 null
2024-05-18 PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking Yifan Yang et.al. 2405.11257 null
2024-05-16 DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data Chengxiang Fan et.al. 2405.10185 link
2024-05-22 Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation Yachan Guo et.al. 2405.09682 null
2024-05-13 PLUTO: Pathology-Universal Transformer Dinkar Juyal et.al. 2405.07905 null
2024-05-12 PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification Mohammad Shafiul Alam et.al. 2405.07332 link
2024-05-11 Global Motion Understanding in Large-Scale Video Object Segmentation Volodymyr Fedynyak et.al. 2405.07031 null
2024-05-10 GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs Mustafa Munir et.al. 2405.06849 link
2024-05-13 CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks Nick Nikzad et.al. 2405.05755 null
2024-05-07 A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images László Kopácsi et.al. 2405.04650 null
2024-05-07 AugmenTory: A Fast and Flexible Polygon Augmentation Library Tanaz Ghahremani et.al. 2405.04442 link
2024-05-06 PTQ4SAM: Post-Training Quantization for Segment Anything Chengtao Lv et.al. 2405.03144 link
2024-05-03 Towards general deep-learning-based tree instance segmentation models Jonathan Henrich et.al. 2405.02061 null
2024-04-30 UniFS: Universal Few-shot Instance Perception with Point Representations Sheng Jin et.al. 2404.19401 link
2024-04-29 From Density to Geometry: YOLOv8 Instance Segmentation for Reverse Engineering of Optimized Structures Thomas Rochefort-Beaudoin et.al. 2404.18763 link
2024-04-28 Garbage Segmentation and Attribute Analysis by Robotic Dogs Nuo Xu et.al. 2404.18112 null
2024-04-25 Self-Balanced R-CNN for Instance Segmentation Leonardo Rossi et.al. 2404.16633 link
2024-05-04 Unknown Object Grasping for Assistive Robotics Elle Miller et.al. 2404.15001 null
2024-04-22 Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery Yuyang Sheng et.al. 2404.14040 link
2024-04-22 PM-VIS: High-Performance Box-Supervised Video Instance Segmentation Zhangjing Yang et.al. 2404.13863 null
2024-04-27 FisheyeDetNet: 360° Surround view Fisheye Camera based Object Detection System for Autonomous Driving Ganesh Sistu et.al. 2404.13443 null
2024-04-19 Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture Zarif Ahmed et.al. 2404.12986 null
2024-04-19 FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving Xingtai Gui et.al. 2404.12867 link
2024-04-18 Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds Oliver Lemke et.al. 2404.12440 null
2024-04-18 Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery Yona Falinie A. Gaus et.al. 2404.12285 null
2024-04-17 Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding George Retsinas et.al. 2404.12144 link
2024-04-18 The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models Cheng Shi et.al. 2404.11957 link
2024-04-17 Criteria for Uncertainty-based Corner Cases Detection in Instance Segmentation Florian Heidecker et.al. 2404.11266 null
2024-04-12 SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception Manideep Reddy Aliminati et.al. 2404.10540 link
2024-04-15 NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer Sai Kumar Reddy Manne et.al. 2404.10130 link
2024-04-12 Structured Model Pruning for Efficient Inference in Computational Pathology Mohammed Adnan et.al. 2404.08831 null
2024-04-12 Benchmarking the Cell Image Segmentation Models Robustness under the Microscope Optical Aberrations Boyuan Peng et.al. 2404.08549 null
2024-04-12 Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering Patrik Vacek et.al. 2404.08363 link
2024-04-12 AdaContour: Adaptive Contour Descriptor with Hierarchical Representation Tianyu Ding et.al. 2404.08292 link
2024-04-11 ViM-UNet: Vision Mamba for Biomedical Segmentation Anwai Archit et.al. 2404.07705 link
2024-04-09 Automated National Urban Map Extraction Hasan Nasrallah et.al. 2404.06202 null
2024-04-06 Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation Danpei Zhao et.al. 2404.04608 null
2024-04-04 Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation Elham Amin Mansour et.al. 2404.03799 null
2024-04-04 OW-VISCap: Open-World Video Instance Segmentation and Captioning Anwesa Choudhuri et.al. 2404.03657 null
2024-04-04 CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks Beibei Wang et.al. 2404.03191 null
2024-04-02 Segment Any 3D Object with Language Seungjun Lee et.al. 2404.02157 null
2024-04-01 What is Point Supervision Worth in Video Instance Segmentation? Shuaiyi Huang et.al. 2404.01990 null
2024-04-01 SUGAR: Pre-training 3D Visual Representations for Robotics Shizhe Chen et.al. 2404.01491 null
2024-04-01 Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge Bo Zou et.al. 2404.01013 null
2024-04-01 Instance-Aware Group Quantization for Vision Transformers Jaehyeon Moon et.al. 2404.00928 null
2024-03-29 Multi-Region Transfer Learning for Segmentation of Crop Field Boundaries in Satellite Images with Limited Labels Hannah Kerner et.al. 2404.00179 null
2024-03-29 FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures Lisa Mais et.al. 2404.00130 null
2024-03-29 ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning Beomyoung Kim et.al. 2403.20126 link
2024-04-01 Efficient 3D Instance Mapping and Localization with Neural Fields George Tang et.al. 2403.19797 null
2024-03-28 DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Donghyun Kim et.al. 2403.19588 link
2024-03-27 Annolid: Annotate, Segment, and Track Anything You Need Chen Yang et.al. 2403.18690 null
2024-03-26 Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer Badri N. Patro et.al. 2403.18063 link
2024-03-26 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Chenhongyi Yang et.al. 2403.17695 link
2024-03-25 GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation Weiming Zhang et.al. 2403.16370 null
2024-03-24 AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans Cedric Perauer et.al. 2403.16318 null
2024-03-22 Language-Based Depth Hints for Monocular Depth Estimation Dylan Auty et.al. 2403.15551 null
2024-03-22 BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation Jiahao Lu et.al. 2403.15019 link
2024-03-20 MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining Di Wang et.al. 2403.13430 link
2024-03-19 CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation Wenqi Zhu et.al. 2403.12455 link
2024-03-19 Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter Seunghyeon Lim et.al. 2403.12449 null
2024-03-18 EffiPerception: an Efficient Framework for Various Perception Tasks Xinhao Xiang et.al. 2403.12317 null
2024-03-18 Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery Yuqi Zhang et.al. 2403.11812 link
2024-03-18 Better (pseudo-)labels for semi-supervised instance segmentation François Porcher et.al. 2403.11675 null
2024-03-18 Synthesizing multi-log grasp poses Arvid Fälldin et.al. 2403.11623 null
2024-03-18 MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation Chih-Chung Hsu et.al. 2403.11576 null
2024-03-18 Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes Chih-Chung Hsu et.al. 2403.11572 null
2024-03-18 Circle Representation for Medical Instance Object Segmentation Juming Xiong et.al. 2403.11507 link
2024-03-18 ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation Minh Tran et.al. 2403.11376 null
2024-03-16 Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation Mariia Khan et.al. 2403.10780 null
2024-03-15 Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects Malte Mosbach et.al. 2403.10187 null
2024-03-14 WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity Qiyuan Wang et.al. 2403.09551 null
2024-03-14 StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images Robert Jewsbury et.al. 2403.09302 link
2024-03-14 Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation Hyung-Il Kim et.al. 2403.09199 null
2024-03-14 When Semantic Segmentation Meets Frequency Aliasing Linwei Chen et.al. 2403.09065 link
2024-03-09 Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration Jingyun Xue et.al. 2403.05906 null
2024-03-07 SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising Tao Zhou et.al. 2403.04194 link
2024-03-05 CenterDisks: Real-time instance segmentation with disk covering Katia Jodogne-Del Litto et.al. 2403.03296 link
2024-03-04 RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features Howard H. Qian et.al. 2403.01731 null
2024-03-04 MCA: Moment Channel Attention Networks Yangbo Jiang et.al. 2403.01713 link
2024-03-03 Self-Supervised Representation Learning with Meta Comprehensive Regularization Huijie Guo et.al. 2403.01549 null
2024-03-03 End-to-End Human Instance Matting Qinglin Liu et.al. 2403.01510 link
2024-03-02 Boosting Box-supervised Instance Segmentation with Pseudo Depth Xinyi Yu et.al. 2403.01214 null
2024-02-29 FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything Safouane El Ghazouali et.al. 2403.00175 link
2024-02-28 Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks Joanne Lin et.al. 2402.18307 null
2024-02-27 A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track Zehui Chen et.al. 2402.17319 null
2024-02-27 Few-shot adaptation for morphology-independent cell instance segmentation Ram J. Zaveri et.al. 2402.17165 null
2024-02-26 Outline-Guided Object Inpainting with Diffusion Models Markus Pobitzer et.al. 2402.16421 null
2024-02-26 SPINEPS -- Automatic Whole Spine Segmentation of T2-weighted MR images using a Two-Phase Approach to Multi-class Semantic and Instance Segmentation Hendrik Möller et.al. 2402.16368 link
2024-02-28 Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation Yu Ming et.al. 2402.16280 null
2024-02-27 ISCUTE: Instance Segmentation of Cables Using Text Embedding Shir Kozlovsky et.al. 2402.11996 null
2024-02-19 Real-time 3D Semantic Scene Perception for Egocentric Robots with Binocular Vision K. Nguyen et.al. 2402.11872 link
2024-02-17 ReViT: Enhancing Vision Transformers with Attention Residual Connections for Visual Recognition Anxhelo Diko et.al. 2402.11301 link
2024-02-15 Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network Siyi Chen et.al. 2402.10055 null
2024-02-15 SAWEC: Sensing-Assisted Wireless Edge Computing Khandaker Foysal Haque et.al. 2402.10021 null
2024-02-14 Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge Jiancheng Yang et.al. 2402.09372 null
2024-02-14 TDViT: Temporal Dilated Video Transformer for Dense Video Tasks Guanxiong Sun et.al. 2402.09257 link
2024-02-12 Complete Instances Mining for Weakly Supervised Instance Segmentation Zecheng Li et.al. 2402.07633 link
2024-02-11 Improving Pallet Detection Using Synthetic Data Henry Gann et.al. 2402.07098 null
2024-02-07 Boundary-aware Contrastive Learning for Semi-supervised Nuclei Instance Segmentation Ye Zhang et.al. 2402.04756 link
2024-02-07 FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models Chuhao Liu et.al. 2402.04555 null
2024-02-15 M2fNet: Multi-modal Forest Monitoring Network on Large-scale Virtual Dataset Yawen Lu et.al. 2402.04534 null
2024-02-06 Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing Jongmin Yu et.al. 2402.04064 null
2024-02-06 SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images Pengming Feng et.al. 2402.03708 link
2024-02-05 InstanceDiffusion: Instance-level Control for Image Generation Xudong Wang et.al. 2402.03290 link
2024-02-05 Instance Segmentation XXL-CT Challenge of a Historic Airplane Roland Gruber et.al. 2402.02928 null
2024-02-04 Spatio-temporal Prompting Network for Robust Video Feature Extraction Guanxiong Sun et.al. 2402.02574 link
2024-02-06 Deep Spectral Improvement for Unsupervised Image Instance Segmentation Farnoosh Arefi et.al. 2402.02474 link
2024-02-01 A Manifold Representation of the Key in Vision Transformers Li Meng et.al. 2402.00534 null
2024-01-31 Shrub of a thousand faces: an individual segmentation from satellite images using deep learning Rohaifa Khaldi et.al. 2401.17985 null
2024-02-02 YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng et.al. 2401.17270 link
2024-01-29 SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design Seokju Yun et.al. 2401.16456 link
2024-01-28 SegmentAnyTree: A sensor and platform agnostic deep learning model for tree segmentation using laser scanning data Maciej Wielgosz et.al. 2401.15739 null
2024-01-30 SAM-based instance segmentation models for the automation of structural damage detection Zehao Ye et.al. 2401.15266 null
2024-01-25 CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds Muhammad Ahmed Chaudhry et.al. 2401.14486 null
2024-01-25 Rethinking Patch Dependence for Masked Autoencoders Letian Fu et.al. 2401.14391 link
2024-01-25 On generalisability of segment anything model for nuclear instance segmentation in histology images Kesi Xu et.al. 2401.14248 null
2024-01-31 UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation Qingdong He et.al. 2401.11395 link
2024-01-19 One Step Learning, One Step Review Xiaolong Huang et.al. 2401.10962 link
2024-01-18 A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting Wouter Van Gansbeke et.al. 2401.10227 link
2024-01-19 Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy Jun Wang et.al. 2401.09895 null
2024-01-18 SEINE: Structure Encoding and Interaction Network for Nuclei Instance Segmentation Ye Zhang et.al. 2401.09773 link
2024-01-18 Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation Zesen Cheng et.al. 2401.09732 link
2024-01-18 P2Seg: Pointly-supervised Segmentation via Mutual Distillation Zipeng Wang et.al. 2401.09709 null
2024-01-25 SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI Jiasong Chen et.al. 2401.09627 link
2024-01-17 Trapped in texture bias? A large scale comparison of deep instance segmentation Johannes Theodoridis et.al. 2401.09109 link
2024-01-16 Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping Wenwen Li et.al. 2401.08787 null

Panoptic Segmentation

Publish Date Title Authors PDF Code
2024-08-19 DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery Corentin Dumery et.al. 2408.09928 null
2024-07-23 SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation Pengfei Chen et.al. 2407.16682 null
2024-07-23 Strike a Balance in Continual Panoptic Segmentation Jinpeng Chen et.al. 2407.16354 link
2024-07-19 Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model Kun Zhao et.al. 2407.14326 null
2024-07-19 MC-PanDA: Mask Confidence for Panoptic Domain Adaptation Ivan Martinović et.al. 2407.14110 link
2024-07-15 OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models Zijian Zhou et.al. 2407.11213 null
2024-07-12 A Fair Ranking and New Model for Panoptic Scene Graph Generation Julian Lorenz et.al. 2407.09216 null
2024-07-12 From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation Hanrong Shi et.al. 2407.09191 null
2024-07-10 Panoptic Segmentation of Galactic Structures in LSB Images Felix Richards et.al. 2407.07494 null
2024-07-03 Context-Aware Video Instance Segmentation Seunghun Lee et.al. 2407.03010 link
2024-07-01 PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction Xuan Yu et.al. 2407.01349 null
2024-07-01 Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks Roberto Alcover-Couso et.al. 2407.01327 null
2024-06-14 Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations Daan de Geus et.al. 2406.10114 link
2024-06-11 PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving Yining Shi et.al. 2406.07037 null
2024-06-08 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation Qingfeng Liu et.al. 2406.05352 null
2024-06-07 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation Ruipu Wu et.al. 2406.04002 null
2024-06-01 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation Biao Wu et.al. 2406.00500 null
2024-05-29 A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation Niclas Vödisch et.al. 2405.19035 link
2024-05-23 Efficient Robot Learning for Perception and Mapping Niclas Vödisch et.al. 2405.14688 null
2024-05-16 4D Panoptic Scene Graph Generation Jingkang Yang et.al. 2405.10305 link
2024-05-16 An Integrated Framework for Multi-Granular Explanation of Video Summarization Konstantinos Tsigos et.al. 2405.10082 null
2024-05-12 Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception Haoming Chen et.al. 2405.07201 link
2024-05-03 Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation Gabriel Fischer Abati et.al. 2405.02177 link
2024-04-28 Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet Rikathi Pal et.al. 2404.18291 null
2024-04-15 The revenge of BiSeNet: Efficient Multi-Task Image Segmentation Gabriele Rosi et.al. 2404.09570 null
2024-04-15 kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies Zhongrui Gui et.al. 2404.09447 null
2024-04-12 COCONut: Modernizing COCO Segmentation Xueqing Deng et.al. 2404.08639 null
2024-04-04 Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation Elham Amin Mansour et.al. 2404.03799 null
2024-04-02 JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments Duy-Tho Le et.al. 2404.01686 null
2024-03-29 ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning Beomyoung Kim et.al. 2403.20126 link
2024-03-29 Using Images as Covariates: Measuring Curb Appeal with Deep Learning Ardyn Nordstrom et.al. 2403.19915 null
2024-03-21 PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model Zheng Zhang et.al. 2403.14598 link
2024-03-14 PosSAM: Panoptic Open-vocabulary Segment Anything Vibashan VS et.al. 2403.09620 link
2024-03-01 Small, Versatile and Mighty: A Range-View Perception Framework Qiang Meng et.al. 2403.00325 null
2024-03-01 PEM: Prototype-based Efficient MaskFormer for Image Segmentation Niccolò Cavagnero et.al. 2402.19422 link
2024-02-23 Benchmarking the Robustness of Panoptic Segmentation for Automated Driving Yiting Wang et.al. 2402.15469 null
2024-02-21 Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation Jialei Chen et.al. 2402.13697 null
2024-02-17 Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review Thang-Anh-Quan Nguyen et.al. 2402.11141 link
2024-02-04 Generalizable Entity Grounding via Assistance of Large Language Model Lu Qi et.al. 2402.02555 null
2024-01-25 UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models Timo Kapsalis et.al. 2401.14379 null
2024-01-23 MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty Tim Brödermann et.al. 2401.12761 link
2024-01-23 Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration Yifan Zhang et.al. 2401.12452 null
2024-01-18 OMG-Seg: Is One Model Good Enough For All Segmentation? Xiangtai Li et.al. 2401.10229 link
2024-01-18 RAP-SAM: Towards Real-Time All-Purpose Segment Anything Shilin Xu et.al. 2401.10228 link
2024-01-18 A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting Wouter Van Gansbeke et.al. 2401.10227 link
2024-02-07 Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering Damien Robert et.al. 2401.06704 link
2024-01-18 UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding Bowen Shi et.al. 2401.06397 null
2024-01-11 CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians Bin Dou et.al. 2401.05925 null
2024-01-04 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation Zihao Xiao et.al. 2401.02402 null
2023-12-28 Unsupervised Universal Image Segmentation Dantong Niu et.al. 2312.17243 link

Object Detection

Publish Date Title Authors PDF Code
2024-08-20 A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection Vladislav Li et.al. 2408.10940 null
2024-08-20 Aligning Object Detector Bounding Boxes with Human Preference Ombretta Strafforello et.al. 2408.10844 null
2024-08-20 LightMDETR: A Lightweight Approach for Low-Cost Open-Vocabulary Object Detection Training Binta Sow et.al. 2408.10787 null
2024-08-20 Just a Hint: Point-Supervised Camouflaged Object Detection Huafeng Chen et.al. 2408.10777 null
2024-08-21 Generative AI in Industrial Machine Vision -- A Review Hans Aoyang Zhou et.al. 2408.10775 null
2024-08-20 Detection of Intracranial Hemorrhage for Trauma Patients Antoine P. Sanner et.al. 2408.10768 null
2024-08-20 SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection Huafeng Chen et.al. 2408.10760 null
2024-08-20 Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception Jiaru Zhong et.al. 2408.10531 null
2024-08-19 Leveraging Superfluous Information in Contrastive Representation Learning Xuechu Yu et.al. 2408.10292 null
2024-08-19 SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition Wiktor Mucha et.al. 2408.10037 null
2024-08-19 Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving Jun Yan et.al. 2408.09839 link
2024-08-19 Latent Diffusion for Guided Document Table Generation Syed Jawwad Haider Hamdani et.al. 2408.09800 null
2024-08-18 Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection Kaiwen Wang et.al. 2408.09431 null
2024-08-18 Boundary-Recovering Network for Temporal Action Detection Jihwan Kim et.al. 2408.09354 null
2024-08-18 YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems Chien-Yao Wang et.al. 2408.09332 null
2024-08-17 GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System Shuo Wang et.al. 2408.09191 null
2024-08-17 PADetBench: Towards Benchmarking Physical Attacks against Object Detection Jiawei Lian et.al. 2408.09181 link
2024-08-17 MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation Xiao Zhao et.al. 2408.09122 null
2024-08-17 Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community Jiancheng Pan et.al. 2408.09110 null
2024-08-16 SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation Xinyu Xiong et.al. 2408.08870 link
2024-08-16 Multimodal Relational Triple Extraction with Query-based Entity Object Transformer Lei Hei et.al. 2408.08709 null
2024-08-16 Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs Jinming Liu et.al. 2408.08575 null
2024-08-15 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks Dongshuo Yin et.al. 2408.08345 link
2024-08-15 Learned Multimodal Compression for Autonomous Driving Hadi Hadizadeh et.al. 2408.08211 null
2024-08-16 OC3D: Weakly Supervised Outdoor 3D Object Detection with Only Coarse Click Annotation Qiming Xia et.al. 2408.08092 null
2024-08-15 CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection Xunfa Lai et.al. 2408.08050 null
2024-08-15 Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement Wenxuan Li et.al. 2408.07999 null
2024-08-15 GOReloc: Graph-based Object-Level Relocalization for Visual SLAM Yutong Wang et.al. 2408.07917 link
2024-08-14 See It All: Contextualized Late Aggregation for 3D Dense Captioning Minjung Kim et.al. 2408.07648 null
2024-08-14 Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving Yuqing Wen et.al. 2408.07605 null
2024-08-14 Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection Zhonglin Chen et.al. 2408.07455 null
2024-08-14 Sign language recognition based on deep learning and low-cost handcrafted descriptors Alvaro Leandro Cavalcante Carneiro et.al. 2408.07244 link
2024-08-13 Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces Zhiling Chen et.al. 2408.07146 null
2024-08-13 Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries Qi Song et.al. 2408.06901 null
2024-08-13 Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection Matthias Bartolo et.al. 2408.06803 link
2024-08-13 Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions Miao Zhang et.al. 2408.06772 null
2024-08-13 Unified-IoU: For High-Quality Object Detection Xiangjie Luo et.al. 2408.06636 link
2024-08-13 MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers Zichao Dong et.al. 2408.06604 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-12 MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception Sven Teufel et.al. 2408.06137 link
2024-08-12 DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection Junjie Guo et.al. 2408.06123 null
2024-08-12 Optimizing Vision Transformers with Data-Free Knowledge Transfer Gousia Habib et.al. 2408.05952 null
2024-08-12 MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection Zitian Wang et.al. 2408.05945 null
2024-08-12 Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes Ke Zhou et.al. 2408.05936 null
2024-08-13 Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts Peng Wu et.al. 2408.05905 null
2024-08-11 U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training Zhuoyan Liu et.al. 2408.05780 link
2024-08-11 FADE: A Dataset for Detecting Falling Objects around Buildings in Video Zhigang Tu et.al. 2408.05750 null
2024-08-11 Evaluating BM3D and NBNet: A Comprehensive Study of Image Denoising Across Multiple Datasets Ghazal Kaviani et.al. 2408.05697 null
2024-08-09 DeepInteraction++: Multi-Modality Interaction for Autonomous Driving Zeyu Yang et.al. 2408.05075 link
2024-08-09 RadarPillars: Efficient Object Detection from 4D Radar Point Clouds Alexander Musiat et.al. 2408.05020 null
2024-08-09 Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation Yifan Feng et.al. 2408.04804 link
2024-08-08 SOD-YOLOv8 -- Enhancing YOLOv8 for Small Object Detection in Traffic Scenes Boshra Khalili et.al. 2408.04786 null
2024-08-08 Data-Driven Pixel Control: Challenges and Prospects Saurabh Farkya et.al. 2408.04767 null
2024-08-10 SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More Tianrun Chen et.al. 2408.04579 null
2024-08-07 Impact Analysis of Data Drift Towards The Development of Safety-Critical Automotive System Md Shahi Amran Hossain et.al. 2408.04476 null
2024-08-08 Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework Subhasis Dasgupta et.al. 2408.04360 null
2024-08-08 Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection Shixuan Gao et.al. 2408.04326 null
2024-08-07 PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation Blessing Agyei Kyem et.al. 2408.04110 link
2024-08-07 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection Christian Fruhwirth-Reisinger et.al. 2408.03790 link
2024-08-07 Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model Guoqing Zhu et.al. 2408.03748 link
2024-08-07 CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Tianfang Zhang et.al. 2408.03703 link
2024-08-07 L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection Xun Huang et.al. 2408.03677 null
2024-08-07 Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks Jaewook Lee et.al. 2408.03663 null
2024-08-07 Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving Amirhosein Chahe et.al. 2408.03516 null
2024-08-07 GUI Element Detection Using SOTA YOLO Deep Learning Models Seyed Shayan Daneshvar et.al. 2408.03507 null
2024-08-06 AI Foundation Models in Remote Sensing: A Survey Siqi Lu et.al. 2408.03464 null
2024-08-06 Biomedical Image Segmentation: A Systematic Literature Review of Deep Learning Based Object Detection Methods Fazli Wahid et.al. 2408.03393 null
2024-08-06 Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection Sen Nie et.al. 2408.02891 null
2024-08-05 HQOD: Harmonious Quantization for Object Detection Long Huang et.al. 2408.02561 link
2024-08-05 Tensorial template matching for fast cross-correlation with rotations and its application for tomography Antonio Martinez-Sanchez et.al. 2408.02398 null
2024-08-05 AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines Renjith Prasad et.al. 2408.02181 null
2024-08-04 KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving Zhihao Lai et.al. 2408.02088 null
2024-08-06 A Survey and Evaluation of Adversarial Attacks for Object Detection Khoi Nguyen Tiet Nguyen et.al. 2408.01934 null
2024-08-04 CAF-YOLO: A Robust Framework for Multi-Scale Lesion Detection in Biomedical Imagery Zilin Chen et.al. 2408.01897 link
2024-08-03 Supervised Image Translation from Visible to Infrared Domain for Object Detection Prahlad Anand et.al. 2408.01843 null
2024-08-03 Domain penalisation for improved Out-of-Distribution Generalisation Shuvam Jena et.al. 2408.01746 null
2024-08-03 LAM3D: Leveraging Attention for Monocular 3D Object Detection Diana-Alexandra Sas et.al. 2408.01739 null
2024-08-02 A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes Vito Mengers et.al. 2408.01322 null
2024-08-02 Underwater Object Detection Enhancement via Channel Stabilization Muhammad Ali et.al. 2408.01293 link
2024-08-02 PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network Changqun Xia et.al. 2408.01137 null
2024-08-02 Effect of Fog Particle Size Distribution on 3D Object Detection Under Adverse Weather Conditions Ajinkya Shinde et.al. 2408.01085 null
2024-08-02 Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model Yang Jin et.al. 2408.01044 null
2024-08-02 Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach Yabin Zhu et.al. 2408.00969 link
2024-08-01 Joint Neural Networks for One-shot Object Recognition and Detection Camilo J. Vargas et.al. 2408.00701 null
2024-08-01 Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection Ruiyang Zhang et.al. 2408.00619 null
2024-08-05 U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight Tongtong Feng et.al. 2408.00606 null
2024-08-01 MUFASA: Multi-View Fusion and Adaptation Network with Spatial Awareness for Radar Object Detection Xiangyuan Peng et.al. 2408.00565 null
2024-08-01 MonoMM: A Multi-scale Mamba-Enhanced Network for Real-time Monocular 3D Object Detection Youjia Fu et.al. 2408.00438 null
2024-08-01 DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training Yu Xie et.al. 2408.00355 null
2024-08-01 A Simple Background Augmentation Method for Object Detection with Diffusion Model Yuhang Li et.al. 2408.00350 null
2024-08-01 Diff3DETR:Agent-based Diffusion Model for Semi-supervised 3D Object Detection Jiacheng Deng et.al. 2408.00286 null
2024-08-01 RoCo:Robust Collaborative Perception By Iterative Object Matching and Pose Adjustment Zhe Huang et.al. 2408.00257 link
2024-07-31 Dynamic Object Queries for Transformer-based Incremental Object Detection Jichuan Zhang et.al. 2407.21687 null
2024-07-31 Spatial Transformer Network YOLO Model for Agricultural Object Detection Yash Zambre et.al. 2407.21652 null
2024-07-31 Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2 Lv Tang et.al. 2407.21596 null
2024-07-31 InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios Xiaofei Zhang et.al. 2407.21581 null
2024-07-31 Voxel Scene Graph for Intracranial Hemorrhage Antoine P. Sanner et.al. 2407.21580 null
2024-07-31 MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection Kuo Wang et.al. 2407.21465 link
2024-07-30 Candidate Distant Trans-Neptunian Objects Detected by the New Horizons Subaru TNO Survey Wesley C. Fraser et.al. 2407.21142 null
2024-07-30 What is YOLOv5: A deep look into the internal features of the popular object detector Rahima Khanam et.al. 2407.20892 null
2024-07-30 WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection Xingcheng Zhou et.al. 2407.20818 null
2024-07-31 Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection Xinhao Luo et.al. 2407.20708 link
2024-07-31 Weakly Supervised Intracranial Hemorrhage Segmentation with YOLO and an Uncertainty Rectified Segment Anything Model Pascal Spiegler et.al. 2407.20461 null
2024-07-29 MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset Zaid A. El Shair et.al. 2407.20446 null
2024-07-30 AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics Xiangxiang Dai et.al. 2407.20124 link
2024-07-29 Octave-YOLO: Cross frequency detection network with octave convolution Sangjune Shin et.al. 2407.19746 null
2024-07-29 Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images Zewen Du et.al. 2407.19696 null
2024-07-29 Practical Video Object Detection via Feature Selection and Aggregation Yuheng Shi et.al. 2407.19650 link
2024-07-28 Solving Short-Term Relocalization Problems In Monocular Keyframe Visual SLAM Using Spatial And Semantic Data Azmyin Md. Kamal et.al. 2407.19518 link
2024-07-28 Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets Tianxiao Zhang et.al. 2407.19394 link
2024-07-27 Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network Gang Pan et.al. 2407.19271 null
2024-07-27 Enhancing Tree Type Detection in Forest Fire Risk Assessment: Multi-Stage Approach and Color Encoding with Forest Fire Risk Evaluation Framework for UAV Imagery Jinda Zhang et.al. 2407.19184 null
2024-07-27 Reducing Spurious Correlation for Federated Domain Generalization Shuran Ma et.al. 2407.19174 null
2024-07-27 Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble Juhan Cha et.al. 2407.19156 link
2024-07-25 LION: Linear Group RNN for 3D Object Detection in Point Clouds Zhe Liu et.al. 2407.18232 link
2024-07-25 XS-VID: An Extremely Small Video Object Detection Dataset Jiahao Guo et.al. 2407.18137 null
2024-07-25 SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images Wenxi Li et.al. 2407.17956 null
2024-07-25 A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment Yongjiang He et.al. 2407.17942 null
2024-07-25 Hierarchical Object Detection and Recognition Framework for Practical Plant Disease Diagnosis Kohei Iwano et.al. 2407.17906 null
2024-07-25 Advancing 3D Point Cloud Understanding through Deep Transfer Learning: A Comprehensive Survey Shahab Saquib Sohail et.al. 2407.17877 null
2024-07-25 Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping Haoran Zhu et.al. 2407.17738 link
2024-07-26 Unsqueeze [CLS] Bottleneck to Learn Rich Representations Qing Su et.al. 2407.17671 link
2024-07-24 SDLNet: Statistical Deep Learning Network for Co-Occurring Object Detection and Identification Binay Kumar Singh et.al. 2407.17664 null
2024-07-24 PEEKABOO: Hiding parts of an image for unsupervised object localization Hasib Zunair et.al. 2407.17628 link
2024-07-24 ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only Saad Lahlali et.al. 2407.17197 null
2024-07-24 DVPE: Divided View Position Embedding for Multi-View 3D Object Detection Jiasen Wang et.al. 2407.16955 link
2024-07-23 What Matters in Range View 3D Object Detection Benjamin Wilson et.al. 2407.16789 link
2024-07-23 A Framework for Pupil Tracking with Event Cameras Khadija Iddrisu et.al. 2407.16665 null
2024-07-24 Velocity Driven Vision: Asynchronous Sensor Fusion Birds Eye View Models for Autonomous Vehicles Seamie Hayes et.al. 2407.16636 null
2024-07-23 COALA: A Practical and Vision-Centric Federated Learning Platform Weiming Zhuang et.al. 2407.16560 link
2024-07-23 Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection Trinh Le Ba Khanh et.al. 2407.16497 link
2024-07-23 MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection Youngmin Oh et.al. 2407.16448 link
2024-07-23 ESOD: Efficient Small Object Detection on High-Resolution Images Kai Liu et.al. 2407.16424 null
2024-07-23 Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection Youqian Zhang et.al. 2407.16327 null
2024-07-23 DeepClean: Integrated Distortion Identification and Algorithm Selection for Rectifying Image Corruptions Aditya Kapoor et.al. 2407.16302 null
2024-07-23 FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network Weiying Xie et.al. 2407.16129 link
2024-07-22 PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips Håkon Maric Solberg et.al. 2407.16076 null
2024-07-23 Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video Guiqiu Liao et.al. 2407.15794 link
2024-07-22 Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis Brian K. S. Isaac-Medina et.al. 2407.15763 link
2024-07-22 YOLOv10 for Automated Fracture Detection in Pediatric Wrist Trauma X-rays Ammar Ahmed et.al. 2407.15689 link
2024-07-22 SS-SFR: Synthetic Scenes Spatial Frequency Response on Virtual KITTI and Degraded Automotive Simulations for Object Detection Daniel Jakab et.al. 2407.15646 null
2024-07-22 YOLO-pdd: A Novel Multi-scale PCB Defect Detection Method Using Deep Representations with Sequential Images Bowen Liu et.al. 2407.15427 null
2024-07-22 Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection Zhili Chen et.al. 2407.15354 link
2024-07-22 Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection Yiran Yang et.al. 2407.15334 null
2024-07-21 Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection Kwanyong Park et.al. 2407.15296 null
2024-07-21 Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis Jingwei Guo et.al. 2407.15199 link
2024-07-21 Rethinking Feature Backbone Fine-tuning for Remote Sensing Object Detection Yechan Kim et.al. 2407.15143 null
2024-07-19 Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation Dongyang Wu et.al. 2407.14498 null
2024-07-19 MLMT-CNN for Object Detection and Segmentation in Multi-layer and Multi-spectral Images Majedaldein Almahasneh et.al. 2407.14473 null
2024-07-19 EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition Youssef Doulfoukar et.al. 2407.14314 null
2024-07-19 Bucketed Ranking-based Losses for Efficient Training of Object Detectors Feyza Yavuz et.al. 2407.14204 link
2024-07-18 GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model Abdelrahman Shaker et.al. 2407.13772 link
2024-07-18 General Geometry-aware Weakly Supervised 3D Object Detection Guowen Zhang et.al. 2407.13748 link
2024-07-18 Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation Ilhoon Yoon et.al. 2407.13524 link
2024-07-18 Learning Camouflaged Object Detection from Noisy Pseudo Label Jin Zhang et.al. 2407.13157 null
2024-07-18 DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection Zhourui Zhang et.al. 2407.13147 null
2024-07-18 FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection Jianwei Zhao et.al. 2407.13133 null
2024-07-17 AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer Zhuguanyu Wu et.al. 2407.12951 link
2024-07-17 Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients Dohyung Kim et.al. 2407.12637 null
2024-07-17 CerberusDet: Unified Multi-Task Object Detection Irina Tolstykh et.al. 2407.12632 link
2024-07-17 Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation Prantik Howlader et.al. 2407.12630 link
2024-07-17 Enhancing Wrist Abnormality Detection with YOLO: Analysis of State-of-the-art Single-stage Detection Models Ammar Ahmed et.al. 2407.12597 link
2024-07-17 Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection Hu Cao et.al. 2407.12582 null
2024-07-17 Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation Kaixin Bai et.al. 2407.12449 null
2024-07-17 GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval Han Zhou et.al. 2407.12431 link
2024-07-17 Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection Zhenni Yu et.al. 2407.12339 null
2024-07-16 AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs Yunling Zheng et.al. 2407.12217 null
2024-07-16 The object detection method aids in image reconstruction evaluation and clinical interpretation of meniscal abnormalities Natalia Konovalova et.al. 2407.12184 null
2024-07-16 A Case for Application-Aware Space Radiation Tolerance in Orbital Computing Meiqi Wang et.al. 2407.11853 null
2024-07-16 Improving Unsupervised Video Object Segmentation via Fake Flow Generation Suhwan Cho et.al. 2407.11714 link
2024-07-16 Relation DETR: Exploring Explicit Position Relation Prior for Object Detection Xiuquan Hou et.al. 2407.11699 link
2024-07-16 Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection Qijie Mo et.al. 2407.11499 link
2024-07-16 Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes Zhi Cai et.al. 2407.11464 link
2024-07-16 Generative AI Driven Task-Oriented Adaptive Semantic Communications Yuzhou Fu et.al. 2407.11354 null
2024-07-16 LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction Penghui Du et.al. 2407.11335 link
2024-07-16 TCFormer: Visual Recognition via Token Clustering Transformer Wang Zeng et.al. 2407.11321 link
2024-07-16 PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer Pierre-David Letourneau et.al. 2407.11306 null
2024-07-15 OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models Zijian Zhou et.al. 2407.11213 null
2024-07-15 Interpreting Hand gestures using Object Detection and Digits Classification Sangeetha K et.al. 2407.10902 null
2024-07-15 RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception Chunliang Li et.al. 2407.10876 link
2024-07-15 OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection Jinghua Hou et.al. 2407.10753 link
2024-07-15 Anticipating Future Object Compositions without Forgetting Youssef Zahran et.al. 2407.10723 null
2024-07-15 OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer Yu Wang et.al. 2407.10655 link
2024-07-15 Backdoor Attacks against Image-to-Image Networks Wenbo Jiang et.al. 2407.10445 null
2024-07-14 Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data Tuo Feng et.al. 2407.10200 link
2024-07-14 LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection Sanmin Kim et.al. 2407.10164 link
2024-07-14 FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection Zheng Jiang et.al. 2407.10135 link
2024-07-14 Plain-Det: A Plain Multi-Dataset Object Detector Cheng Shi et.al. 2407.10083 link
2024-07-12 DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training Chen Xin et.al. 2407.09174 link
2024-07-12 Open Vocabulary Multi-Label Video Classification Rohit Gupta et.al. 2407.09073 null
2024-07-12 DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects Peng Wang et.al. 2407.09051 null
2024-07-11 OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects Akshay Krishnan et.al. 2407.08711 null
2024-07-11 Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene Ruiyang Zhang et.al. 2407.08569 link
2024-07-11 Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation Zeyang Zhao et.al. 2407.08489 null
2024-07-11 Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer Tahira Shehzadi et.al. 2407.08460 null
2024-07-11 PowerYOLO: Mixed Precision Model for Hardware Efficient Object Detection with Event Data Dominika Przewlocka-Rus et.al. 2407.08272 null
2024-07-11 Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear Seonwhee Jin et.al. 2407.08257 link
2024-07-11 Enrich the content of the image Using Context-Aware Copy Paste Qiushi Guo et.al. 2407.08151 null
2024-07-11 DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing Minghang Zhou et.al. 2407.08132 null
2024-07-10 MambaVision: A Hybrid Mamba-Transformer Vision Backbone Ali Hatamizadeh et.al. 2407.08083 link
2024-07-10 Bayesian Detector Combination for Object Detection with Crowdsourced Annotations Zhi Qin Tan et.al. 2407.07958 link
2024-07-10 Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher Jiangming Chen et.al. 2407.07780 null
2024-07-10 LSM: A Comprehensive Metric for Assessing the Safety of Lane Detection Systems in Autonomous Driving Jörg Gamerdinger et.al. 2407.07740 null
2024-07-10 Few-Shot Domain Adaptive Object Detection for Microscopic Images Sumayya Inayat et.al. 2407.07633 link
2024-07-10 Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights Yan Hao et.al. 2407.07586 link
2024-07-09 Exploring Camera Encoder Designs for Autonomous Driving Perception Barath Lakshmanan et.al. 2407.07276 null
2024-07-09 Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images Chuanrui Zhang et.al. 2407.06984 null
2024-07-09 Cue Point Estimation using Object Detection Giulia Argüello et.al. 2407.06823 link
2024-07-09 CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection Shuang Hao et.al. 2407.06780 link
2024-07-09 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Yu-Guan Hsieh et.al. 2407.06723 null
2024-07-08 Stochastic Traveling Salesperson Problem with Neighborhoods for Object Detection Cheng Peng et.al. 2407.06366 null
2024-07-08 GeoWATCH for Detecting Heavy Construction in Heterogeneous Time Series of Satellite Images Jon Crall et.al. 2407.06337 null
2024-07-08 Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection Chenxu Wang et.al. 2407.05909 link
2024-07-08 Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework Hao Jing et.al. 2407.05769 null
2024-07-08 Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge Hyunjin Cho et.al. 2407.05713 link
2024-07-08 Weakly Supervised Test-Time Domain Adaptation for Object Detection Anh-Dzung Doan et.al. 2407.05607 null
2024-07-08 Towards Reflected Object Detection: A Benchmark Zhongtian Wang et.al. 2407.05575 null
2024-07-08 GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks Xuan Wang et.al. 2407.05566 null
2024-07-07 CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs Akshat Ramachandran et.al. 2407.05266 link
2024-07-07 Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image Pengkun Jiao et.al. 2407.05256 null
2024-07-06 SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention Yunzhong Si et.al. 2407.05128 link
2024-07-06 Quantizing YOLOv7: A Comprehensive Study Mohammadamin Baghbanbashi et.al. 2407.04943 null
2024-07-05 SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry Hafiz Mughees Ahmad et.al. 2407.04590 link
2024-07-05 Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection Zhiqiang Yang et.al. 2407.04381 link
2024-07-05 Towards Stable 3D Object Detection Jiabao Wang et.al. 2407.04305 null
2024-07-04 LiDAR-based Real-Time Object Detection and Tracking in Dynamic Environments Wenqiang Du et.al. 2407.04115 null
2024-07-04 FIPGNet:Pyramid grafting network with feature interaction strategies Ziyi Ding et.al. 2407.04085 null
2024-07-08 Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection Ruixiao Zhang et.al. 2407.04061 link
2024-07-04 The Solution for the GAIIC2024 RGB-TIR object detection Challenge Xiangyu Wu et.al. 2407.03872 null
2024-07-04 StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection Yunshuang Yuan et.al. 2407.03825 null
2024-07-04 CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding Emanuele Vivoli et.al. 2407.03550 link
2024-07-03 Comics Datasets Framework: Mix of Comics datasets for detection benchmarking Emanuele Vivoli et.al. 2407.03540 null
2024-07-03 Visual Grounding with Attention-Driven Constraint Balancing Weitai Kang et.al. 2407.03243 null
2024-07-03 Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal Mingkui Feng et.al. 2407.03205 null
2024-07-03 SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding Weitai Kang et.al. 2407.03200 link
2024-07-03 Global Context Modeling in YOLOv8 for Pediatric Wrist Fracture Detection Rui-Yang Ju et.al. 2407.03163 link
2024-07-03 YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision Muhammad Hussain et.al. 2407.02988 null
2024-07-03 A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection Jie Shao et.al. 2407.02835 null
2024-07-03 ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers Yanfeng Jiang et.al. 2407.02763 null
2024-07-02 SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection Anay Majee et.al. 2407.02665 null
2024-07-02 Robust ADAS: Enhancing Robustness of Machine Learning-based Advanced Driver Assistance Systems for Adverse Weather Muhammad Zaeem Shahzad et.al. 2407.02581 null
2024-07-03 Similarity Distance-Based Label Assignment for Tiny Object Detection Shuohao Shi et.al. 2407.02394 link
2024-07-02 OpenSlot: Mixed Open-set Recognition with Object-centric Learning Xu Yin et.al. 2407.02386 null
2024-07-02 DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection Kaixin Xu et.al. 2407.02098 null
2024-07-02 Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning Chengchao Shen et.al. 2407.02014 link
2024-07-02 Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection Zixing Li et.al. 2407.01894 link
2024-07-01 Scarecrow monitoring system:employing mobilenet ssd for enhanced animal supervision Balaji VS et.al. 2407.01435 null
2024-07-01 Formal Verification of Object Detection Avraham Raviv et.al. 2407.01295 link
2024-07-01 Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection Francesco Barbato et.al. 2407.01193 null
2024-07-01 Eliminating Position Bias of Language Models: A Mechanistic Approach Ziqi Wang et.al. 2407.01100 null
2024-07-01 No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection Soojin Woo et.al. 2407.01073 link
2024-07-01 Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding Yifan Tang et.al. 2406.19791 null
2024-06-28 Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking Qingrui Hu et.al. 2406.19655 null
2024-06-27 Robustness Testing of Black-Box Models Against CT Degradation Through Test-Time Augmentation Jack Highton et.al. 2406.19557 null
2024-06-27 BOrg: A Brain Organoid-Based Mitosis Dataset for Automatic Analysis of Brain Diseases Muhammad Awais et.al. 2406.19556 link
2024-06-27 Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results Jialin Yue et.al. 2406.19540 null
2024-06-27 Stereo Vision Based Robot for Remote Monitoring with VR Support Mohamed Fazil M. S. et.al. 2406.19498 null
2024-06-27 HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection Liujuan Cao et.al. 2406.19394 link
2024-06-27 STAL3D: Unsupervised Domain Adaptation for 3D Object Detection via Collaborating Self-Training and Adversarial Learning Yanan Zhang et.al. 2406.19362 null
2024-06-27 Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data Lukas Malte Kemeter et.al. 2406.19175 null
2024-06-30 Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO Fuseini Mumuni et.al. 2406.19057 null
2024-06-27 BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection Yang Song et.al. 2406.19048 null
2024-06-27 A Universal Railway Obstacle Detection System based on Semi-supervised Segmentation And Optical Flow Qiushi Guo et.al. 2406.18908 null
2024-06-26 SpY: A Context-Based Approach to Spacecraft Component Detection Trupti Mahendrakar et.al. 2406.18709 null
2024-06-26 Unveiling the Unknown: Conditional Evidence Decoupling for Unknown Rejection Zhaowei Wu et.al. 2406.18443 link
2024-06-26 CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection Meiying Zhang et.al. 2406.18129 null
2024-06-26 The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval Meinardus Boris et.al. 2406.18113 link
2024-06-25 ET tu, CLIP? Addressing Common Object Errors for Unseen Environments Ye Won Byun et.al. 2406.17876 null
2024-06-25 MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection Michelle Adeline et.al. 2406.17654 link
2024-06-25 Embedded event based object detection with spiking neural network Jonathan Courtois et.al. 2406.17617 null
2024-06-27 Towards Open-set Camera 3D Object Detection Zhuolin He et.al. 2406.17297 null
2024-06-25 Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments Shilei Cao et.al. 2406.16439 null
2024-06-23 Review of Zero-Shot and Few-Shot AI Algorithms in The Medical Domain Maged Badawi et.al. 2406.16143 null
2024-06-22 Smart Feature is What You Need Zhaoxin Hu et.al. 2406.15805 link
2024-06-22 MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception Guanqun Wang et.al. 2406.15768 null
2024-06-21 DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection Jia Syuen Lim et.al. 2406.14924 null
2024-06-21 MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection Zhuoxiao Chen et.al. 2406.14878 null
2024-06-20 Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines Xinyi Ying et.al. 2406.14482 link
2024-06-20 Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification Muhammad Saif Ullah Khan et.al. 2406.14370 link
2024-06-20 HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? Ivan Karpukhin et.al. 2406.14341 link
2024-06-20 LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection Lilian Hollard et.al. 2406.14239 link
2024-06-20 SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis Zijian Cai et.al. 2406.13963 link
2024-06-20 Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling Shuaixin Liu et.al. 2406.13951 link
2024-06-19 DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection Zhuoxiao Chen et.al. 2406.13891 link
2024-06-19 Semantic Enhanced Few-shot Object Detection Zheng Wang et.al. 2406.13498 null
2024-06-19 Snowy Scenes, Clear Detections: A Robust Model for Traffic Light Detection in Adverse Weather Conditions Shivank Garg et.al. 2406.13473 link
2024-06-19 Strengthening Layer Interaction via Dynamic Layer Attention Kaishen Wang et.al. 2406.13392 link
2024-06-18 Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation Nikolas Koutsoubis et.al. 2406.12815 link
2024-06-18 Online Anchor-based Training for Image Classification Tasks Maria Tzelepi et.al. 2406.12662 null
2024-06-18 ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection Junhao Lin et.al. 2406.12536 link
2024-06-18 SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions Yuexiong Ding et.al. 2406.12395 null
2024-06-18 Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines Honglei Zhang et.al. 2406.12367 null
2024-06-18 Certified ML Object Detection for Surveillance Missions Mohammed Belcaid et.al. 2406.12362 null
2024-06-18 DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection Haodong Li et.al. 2406.12285 null
2024-06-18 The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge Hongpeng Pan et.al. 2406.12225 null
2024-06-17 Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint Xinglong Sun et.al. 2406.12079 null
2024-06-17 V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results Jiaqi Wang et.al. 2406.11739 null
2024-06-17 YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection Tamara R. Lenhard et.al. 2406.11641 null
2024-06-17 Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware Gregor Lenz et.al. 2406.11319 null
2024-06-17 Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection Yecheol Kim et.al. 2406.11313 link
2024-06-17 Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection Yunsong Wang et.al. 2406.11311 null
2024-06-17 Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding Yunsong Wang et.al. 2406.11283 null
2024-06-18 YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism Sompote Youwai et.al. 2406.11254 link
2024-06-16 Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP Shuyang Lin et.al. 2406.10961 null
2024-06-16 SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection Lin Liu et.al. 2406.10907 null
2024-06-15 Object Detection using Oriented Window Learning Vision Transformer: Roadway Assets Recognition Taqwa Alhadidi et.al. 2406.10712 null
2024-06-14 EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models Julian Straub et.al. 2406.10224 null
2024-06-14 YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain Mujadded Al Rabbani Alif et.al. 2406.10139 null
2024-06-14 Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection Mehar Khurana et.al. 2406.10115 null
2024-06-14 Automated GIS-Based Framework for Detecting Crosswalk Changes from Bi-Temporal High-Resolution Aerial Images Richard Boadu Antwi et.al. 2406.09731 null
2024-06-14 An alternate approach for estimating grain-growth kinetics Manoj Prabakar et.al. 2406.09653 link
2024-06-13 Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach Yansheng Li et.al. 2406.09410 link
2024-06-13 Towards Evaluating the Robustness of Visual State Space Models Hashmat Shadab Malik et.al. 2406.09407 link
2024-06-13 Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Yushi Hu et.al. 2406.09403 null
2024-06-13 Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024 Peixi Wu et.al. 2406.09201 null
2024-06-13 Computer vision-based model for detecting turning lane features on Florida's public roadways Richard Boadu Antwi et.al. 2406.08822 null
2024-06-13 BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection Wenjie Wang et.al. 2406.08785 link
2024-06-12 UnO: Unsupervised Occupancy Fields for Perception and Forecasting Ben Agro et.al. 2406.08691 null
2024-06-12 Transformation-Dependent Adversarial Attacks Yaoteng Tan et.al. 2406.08443 null
2024-06-12 Dataset Enhancement with Instance-Level Augmentations Orest Kupyn et.al. 2406.08249 link
2024-06-12 Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments Shoujie Li et.al. 2406.08160 link
2024-06-12 CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer Hualian Sheng et.al. 2406.08152 null
2024-06-12 MWIRSTD: A MWIR Small Target Detection Dataset Nikhil Kumar et.al. 2406.08063 link
2024-06-12 Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing Sina Tayebati et.al. 2406.07833 link
2024-06-13 A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7 Md. Shariful Islam et.al. 2406.07707 null
2024-06-11 Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection J. Schueler et.al. 2406.07538 link
2024-06-11 Understanding Visual Concepts Across Models Brandon Trabucco et.al. 2406.07506 link
2024-06-11 Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach Challapalli Phanindra Revanth et.al. 2406.07332 null
2024-06-11 Unsupervised Object Detection with Theoretical Guarantees Marian Longa et.al. 2406.07284 null
2024-06-11 Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation Jinyuan Li et.al. 2406.07268 link
2024-06-11 EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network Yining Shi et.al. 2406.07042 link
2024-06-11 RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks Zhechao Wang et.al. 2406.07032 null
2024-06-12 LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection Jiahua Xu et.al. 2406.07023 null
2024-06-11 Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection Junfei Yi et.al. 2406.06999 null
2024-06-10 UnSupDLA: Towards Unsupervised Document Layout Analysis Talha Uddin Sheikh et.al. 2406.06236 null
2024-06-10 UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection Fan Liu et.al. 2406.06230 link
2024-06-10 ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery Xian Sun et.al. 2406.06028 null
2024-06-10 Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024 Jinwoo Ahn et.al. 2406.05963 null
2024-06-10 Open-Vocabulary Part-Based Grasping Tjeard van Oort et.al. 2406.05951 null
2024-06-09 Stealthy Targeted Backdoor Attacks against Image Captioning Wenshu Fan et.al. 2406.05874 link
2024-06-09 Scaling Graph Convolutions for Mobile Vision William Avery et.al. 2406.05850 link
2024-06-09 Mamba YOLO: SSMs-Based YOLO For Object Detection Zeyu Wang et.al. 2406.05835 link
2024-06-09 ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving Chen Ma et.al. 2406.05810 null
2024-06-09 SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention Muhammad Nawfal Meeran et.al. 2406.05802 link
2024-06-07 Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment Venkanna Babu Guthula et.al. 2406.04949 null
2024-06-07 EGOR: Efficient Generated Objects Replay for incremental object detection Zijia An et.al. 2406.04829 null
2024-06-07 UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping Pengju Tian et.al. 2406.04648 null
2024-06-07 UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection Yuchao Wang et.al. 2406.04647 null
2024-06-06 CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset Abdelrahman Abdallah et.al. 2406.04493 link
2024-06-06 DeTra: A Unified Model for Object Detection and Trajectory Forecasting Sergio Casas et.al. 2406.04426 null
2024-06-06 Parameter-Inverted Image Pyramid Networks Xizhou Zhu et.al. 2406.04330 link
2024-06-06 Semmeldetector: Application of Machine Learning in Commercial Bakeries Thomas H. Schmitt et.al. 2406.04050 null
2024-06-06 Frequency-based Matcher for Long-tailed Semantic Segmentation Shan Li et.al. 2406.03917 link
2024-06-06 Instance Segmentation and Teeth Classification in Panoramic X-rays Devichand Budagam et.al. 2406.03747 link
2024-06-05 FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles Cyprien Quéméneur et.al. 2406.03611 link
2024-06-05 LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection Qiang Chen et.al. 2406.03459 link
2024-06-05 Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models Qutub Syed Sha et.al. 2406.03229 null
2024-06-05 Situation Monitor: Diversity-Driven Zero-Shot Out-of-Distribution Detection using Budding Ensemble Architecture for Object Detection Qutub Syed et.al. 2406.03188 null
2024-06-05 Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework Eliraz Orfaig et.al. 2406.03129 null
2024-06-04 Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation Mohamed El Amine Boudjoghra et.al. 2406.02548 link
2024-06-04 SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition Van Minh Nguyen et.al. 2406.02533 null
2024-06-04 GrootVL: Tree Topology is All You Need in State Space Model Yicheng Xiao et.al. 2406.02395 link
2024-06-04 Low-Rank Adaption on Transformer-based Oriented Object Detector for Satellite Onboard Processing of Remote Sensing Images Xinyang Pu et.al. 2406.02385 link
2024-06-04 Radar Spectra-Language Model for Automotive Scene Parsing Mariia Pushkareva et.al. 2406.02158 null
2024-06-04 Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning Heather Doig et.al. 2406.01932 null
2024-06-04 GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer Ding Jia et.al. 2406.01210 link
2024-06-03 Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection Kunpeng Wang et.al. 2406.01127 link
2024-06-03 Visual Car Brand Classification by Implementing a Synthetic Image Dataset Creation Pipeline Jan Lippemeier et.al. 2406.01071 null
2024-06-03 Multi-Object Tracking based on Imaging Radar 3D Object Detection Patrick Palmer et.al. 2406.01011 null
2024-05-31 Power of Cooperative Supervision: Multiple Teachers Framework for Enhanced 3D Semi-Supervised Object Detection Jin-Hee Lee et.al. 2405.20720 link
2024-05-30 On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines Selim Kuzucu et.al. 2405.20459 link
2024-05-30 RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection Fangyi Chen et.al. 2405.19854 link
2024-05-30 Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology Frank A. Ruis et.al. 2405.19822 null
2024-05-30 Fully Test-Time Adaptation for Monocular 3D Object Detection Hongbin Lin et.al. 2405.19682 null
2024-05-30 YotoR-You Only Transform One Representation José Ignacio Díaz Villa et.al. 2405.19629 null
2024-05-29 Enabling Visual Recognition at Radio Frequency Haowen Lai et.al. 2405.19516 null
2024-05-29 Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles Saurabh Pathak et.al. 2405.19179 null
2024-05-29 RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision Jinzhong Wang et.al. 2405.18955 null
2024-05-29 SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving Yiming Cui et.al. 2405.18857 null
2024-05-29 PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram Sifan Zhou et.al. 2405.18734 null
2024-05-28 A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 Pandemic Ioanna Gogou et.al. 2405.18387 link
2024-05-28 Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? Yifan Bai et.al. 2405.18361 null
2024-05-28 Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention Weitai Kang et.al. 2405.18295 null
2024-05-28 DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture Shentong Mo et.al. 2405.17995 link
2024-05-28 Self-supervised Pre-training for Transferable Multi-modal Perception Xiaohao Xu et.al. 2405.17942 null
2024-05-28 Boosting General Trimap-free Matting in the Real-World Image Leo Shan Wenzhang Zhou Grace Zhao et.al. 2405.17916 null
2024-05-28 The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention Xingyu Ding et.al. 2405.17776 null
2024-05-27 Understanding differences in applying DETR to natural and medical images Yanqi Xu et.al. 2405.17677 null
2024-05-27 Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection Shuai Zeng et.al. 2405.17422 link
2024-05-27 Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association Tingwei Liu et.al. 2405.17323 null
2024-05-27 Enhanced Automotive Radar Collaborative Sensing By Exploiting Constructive Interference Lifan Xu et.al. 2405.17297 null
2024-05-27 SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving Avinash Nittur Ramesh et.al. 2405.17030 null
2024-05-27 Collective Perception Datasets for Autonomous Driving: A Comprehensive Review Sven Teufel et.al. 2405.16973 null
2024-05-27 OED: Towards One-stage End-to-End Dynamic Scene Graph Generation Guan Wang et.al. 2405.16925 link
2024-05-27 ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection Ziying Song et.al. 2405.16873 null
2024-05-27 A re-calibration method for object detection with multi-modal alignment bias in autonomous driving Zhihang Song et.al. 2405.16848 null
2024-05-26 A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing Yusaku Ando et.al. 2405.16580 null
2024-05-25 GreenCOD: A Green Camouflaged Object Detection Method Hong-Shuo Chen et.al. 2405.16144 null
2024-05-24 UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes Ted Lentsch et.al. 2405.15688 link
2024-05-24 Multimodal Object Detection via Probabilistic a priori Information Integration Hafsa El Hafyani et.al. 2405.15596 link
2024-05-24 Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection Fan Liu et.al. 2405.15465 null
2024-05-24 Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets Hoàng-Ân Lê et.al. 2405.15394 link
2024-05-24 Towards Global Optimal Visual In-Context Learning Prompt Selection Chengming Xu et.al. 2405.15279 null
2024-05-24 Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection Yajing Liu et.al. 2405.15225 null
2024-05-24 ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models Jingyuan Zhu et.al. 2405.15199 null
2024-05-24 MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method Pan Liao et.al. 2405.15176 null
2024-05-23 Learning to Detect and Segment Mobile Objects from Unlabeled Videos Yihong Sun et.al. 2405.14841 link
2024-05-23 Designing A Sustainable Marine Debris Clean-up Framework without Human Labels Raymond Wang et.al. 2405.14815 link
2024-05-23 Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond Zhechao Wang et.al. 2405.14674 link
2024-05-23 Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment Muhammad Sohail Danish et.al. 2405.14497 link
2024-05-23 YOLOv10: Real-Time End-to-End Object Detection Ao Wang et.al. 2405.14458 link
2024-05-23 Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations Mohammed Baharoon et.al. 2405.14239 link
2024-05-22 Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation Mykhailo Uss et.al. 2405.14024 null
2024-05-22 TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System Diogo Lavado et.al. 2405.13989 null
2024-05-22 Class-Conditional self-reward mechanism for improved Text-to-Image models Safouane El Ghazouali et.al. 2405.13473 link
2024-05-22 Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing Jiarun Ding et.al. 2405.13403 null
2024-05-21 BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once Theodore Zhao et.al. 2405.12971 null
2024-05-21 FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors Shuai Liu et.al. 2405.12601 link
2024-05-21 Active Object Detection with Knowledge Aggregation and Distillation from Large Models Dejie Yang et.al. 2405.12509 link
2024-05-21 Mutual Information Analysis in Multimodal Learning Systems Hadi Hadizadeh et.al. 2405.12456 null
2024-05-20 Multi-View Attentive Contextualization for Multi-View 3D Object Detection Xianpeng Liu et.al. 2405.12200 null
2024-05-20 Bangladeshi Native Vehicle Detection in Wild Bipin Saha et.al. 2405.12150 link
2024-05-20 Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments Jooyong Park et.al. 2405.11855 null
2024-05-20 DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment Jianhong Han et.al. 2405.11765 link
2024-05-20 Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation Runou Yang et.al. 2405.11754 link
2024-05-19 FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention Ziang Guo et.al. 2405.11682 link
2024-05-19 SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization Jialong Guo et.al. 2405.11582 link
2024-05-18 InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images Wuzhou Li et.al. 2405.11293 null
2024-05-18 Visible and Clear: Finding Tiny Objects in Difference Map Bing Cao et.al. 2405.11276 null
2024-05-17 A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model Mingxiang Fu et.al. 2405.10890 null
2024-05-17 DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection Zhe Huang et.al. 2405.10577 null
2024-05-16 Drone-type-Set: Drone types detection benchmark for drone detection and tracking Kholoud AlDosari et.al. 2405.10398 null
2024-05-16 Grounded 3D-LLM with Referent Tokens Yilun Chen et.al. 2405.10370 link
2024-05-16 Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Tianhe Ren et.al. 2405.10300 link
2024-05-16 Towards Task-Compatible Compressible Representations Anderson de Andrade et.al. 2405.10244 link
2024-05-16 SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network Zhaoxu Li et.al. 2405.10148 link
2024-05-16 SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection Mingxuan Liu et.al. 2405.10053 link
2024-05-19 FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection Siliang Ma et.al. 2405.09942 null
2024-05-19 PillarNeXt: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features Xusheng Li et.al. 2405.09828 null
2024-05-16 Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection Feiran Li et.al. 2405.09782 link
2024-05-15 Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation Guo Yachan et.al. 2405.09682 null
2024-05-15 Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels Guozhang Liu et.al. 2405.09024 null
2024-05-14 CLIP with Quality Captions: A Strong Pretraining for Vision Tasks Pavan Kumar Anasosalu Vasu et.al. 2405.08911 null
2024-05-14 Open-Vocabulary Object Detection via Neighboring Region Attention Alignment Sunyuan Qiang et.al. 2405.08593 null
2024-05-14 RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images Zong-Wei Hong et.al. 2405.08483 link
2024-05-14 Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events Xin Wu et.al. 2405.08251 link
2024-05-13 oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving Abdul Hannan Khan et.al. 2405.07698 null
2024-05-13 MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders Xueying Jiang et.al. 2405.07696 null
2024-05-13 Quality-aware Selective Fusion Network for V-D-T Salient Object Detection Liuxin Bao et.al. 2405.07655 link
2024-05-13 Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying Thomas Pöllabauer et.al. 2405.07653 null
2024-05-13 Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering Hakan Yekta Yatbaz et.al. 2405.07600 null
2024-05-13 Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection Dehong Kong et.al. 2405.07595 null
2024-05-13 Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding Houze Liu et.al. 2405.07479 null
2024-05-12 Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception Haoming Chen et.al. 2405.07201 link
2024-05-12 Differentiable Model Scaling using Differentiable Topk Kai Liu et.al. 2405.07194 link
2024-05-12 Resource Efficient Perception for Vision Systems A V Subramanyam et.al. 2405.07166 link
2024-05-10 How to Augment for Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models? Engin Uzun et.al. 2405.06383 null
2024-05-10 Precise Apple Detection and Localization in Orchards using YOLOv5 for Robotic Harvesting Systems Jiang Ziyue et.al. 2405.06260 null
2024-05-13 CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks Nick Nikzad et.al. 2405.05755 null
2024-05-09 Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection Xinran Liua et.al. 2405.05614 null
2024-05-09 The object detection model uses combined extraction with KNN and RF classification Florentina Tatrin Kurniati et.al. 2405.05551 null
2024-05-08 Reviewing Intelligent Cinematography: AI research for camera-based video production Adrian Azzarelli et.al. 2405.05039 null
2024-05-07 A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching Xianlei Long et.al. 2405.04589 null
2024-05-07 DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving Chen Min et.al. 2405.04390 null
2024-05-07 A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields Raiyan Rahman et.al. 2405.04305 null
2024-05-07 ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers Jinke Li et.al. 2405.04299 link
2024-05-07 Deep Event-based Object Detection in Autonomous Driving: A Survey Bingquan Zhou et.al. 2405.03995 null
2024-05-06 BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection Saket S. Chaturvedi et.al. 2405.03884 null
2024-05-06 RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection Thennarasi Balakrishnan et.al. 2405.03541 link
2024-05-06 Low-light Object Detection Pengpeng Li et.al. 2405.03519 null
2024-05-09 Salient Object Detection From Arbitrary Modalities Nianchang Huang et.al. 2405.03352 link
2024-05-06 Modality Prompts for Arbitrary Modality Salient Object Detection Nianchang Huang et.al. 2405.03351 null
2024-05-06 PTQ4SAM: Post-Training Quantization for Segment Anything Chengtao Lv et.al. 2405.03144 link
2024-05-05 Performance Evaluation of Real-Time Object Detection for Electric Scooters Dong Chen et.al. 2405.03039 link
2024-05-05 SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection Kassaw Abraham Mulat et.al. 2405.02906 null
2024-05-07 Adaptive Guidance Learning for Camouflaged Object Detection Zhennan Chen et.al. 2405.02824 null
2024-05-05 PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection Zhaoqi Leng et.al. 2405.02811 null
2024-05-05 Fused attention mechanism-based ore sorting network Junjiang Zhen et.al. 2405.02785 null
2024-05-02 Segmentation-Free Outcome Prediction in Head and Neck Cancer: Deep Learning-based Feature Extraction from Multi-Angle Maximum Intensity Projections (MA-MIPs) of PET Images Amirhosein Toosi et.al. 2405.01756 null
2024-05-02 PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems Walter Zimmer et.al. 2405.01750 null
2024-05-02 Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey Guoping Xu et.al. 2405.01725 link
2024-05-06 SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients Tushar Verma et.al. 2405.01699 null
2024-05-02 Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation Dr. Selva Kumar S et.al. 2405.01310 null
2024-05-02 Towards Consistent Object Detection via LiDAR-Camera Synergy Kai Luo et.al. 2405.01258 link
2024-05-02 Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection Ahmad Khalil et.al. 2405.01108 link
2024-05-01 Object detection under the linear subspace model with application to cryo-EM images Amitay Eldar et.al. 2405.00364 link
2024-04-30 Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Yunhao Ge et.al. 2404.19752 null
2024-04-30 Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning Zhipeng Yuan et.al. 2404.19748 null
2024-04-30 Masked Multi-Query Slot Attention for Unsupervised Object Discovery Rishav Pramanik et.al. 2404.19654 link
2024-04-30 Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World Wen Yin et.al. 2404.19417 null
2024-04-30 UniFS: Universal Few-shot Instance Perception with Point Representations Sheng Jin et.al. 2404.19401 link
2024-04-30 Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection Zhanwei Zhang et.al. 2404.19384 null
2024-04-29 MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection Heitor R. Medeiros et.al. 2404.18849 link
2024-04-29 Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge Rajat K. Doshi et.al. 2404.18665 null
2024-04-29 CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception Yunshuang Yuan et.al. 2404.18617 link
2024-04-29 Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images Wenbin Guan et.al. 2404.18426 null
2024-04-29 Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles Mingi Jeong et.al. 2404.18411 null
2024-04-28 FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method Yanbing Bai et.al. 2404.18245 null
2024-04-28 RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation Oded Bialer et.al. 2404.18150 null
2024-04-27 Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection Farzad Nozarian et.al. 2404.17910 link
2024-04-27 A Hybrid Approach for Document Layout Analysis in Document images Tahira Shehzadi et.al. 2404.17888 null
2024-04-27 BoostRad: Enhancing Object Detection by Boosting Radar Reflections Yuval Haitman et.al. 2404.17861 null
2024-04-26 Inhomogeneous illuminated image enhancement under extremely low visibility condition Libang Chen et.al. 2404.17503 null
2024-04-26 Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection Moussa Kassem Sbeyti et.al. 2404.17427 link
2024-04-26 Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision Cong Fan et.al. 2404.17229 link
2024-04-25 Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach Cristopher McIntyre-Garcia et.al. 2404.17020 link
2024-04-25 Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection Mehmet Kerem Turkcan et.al. 2404.16944 link
2024-04-25 Self-Balanced R-CNN for Instance Segmentation Leonardo Rossi et.al. 2404.16633 link
2024-04-25 Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System Daniel Dworak et.al. 2404.16548 null
2024-04-25 Commonsense Prototype for Outdoor Unsupervised 3D Object Detection Hai Wu et.al. 2404.16493 link
2024-04-25 IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks Zitong Huang et.al. 2404.16331 null
2024-04-25 CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions Haoyuan Li et.al. 2404.16302 link
2024-04-24 AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models Zhiqiang Tang et.al. 2404.16233 null
2024-04-24 Observational parameters of Blue Large-Amplitude Pulsators P. Pietrukowicz et.al. 2404.16089 null
2024-04-26 A Survey on Visual Mamba Hanwei Zhang et.al. 2404.15956 null
2024-04-24 Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks Erh-Chung Chen et.al. 2404.15881 null
2024-04-24 Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection Michael Kösel et.al. 2404.15879 link
2024-04-23 CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection Hongyi Cai et.al. 2404.15451 null
2024-04-23 Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions Xingguang Zhang et.al. 2404.15252 null
2024-04-23 Efficient Transformer Encoders for Mask2Former-style models Manyi Yao et.al. 2404.15244 null
2024-04-23 Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN Sara Dadjouy et.al. 2404.15129 null
2024-04-23 External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection Wen Liang et.al. 2404.15008 null
2024-04-23 ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions Shounak Sural et.al. 2404.14780 null
2024-04-23 Unified Unsupervised Salient Object Detection via Knowledge Transfer Yao Yuan et.al. 2404.14759 link
2024-04-22 CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective Wencheng Zhu et.al. 2404.14109 null
2024-04-22 Benchmarking Multi-Modal LLMs for Testing Visual Deep Learning Systems Through the Lens of Image Mutation Liwen Wang et.al. 2404.13945 null
2024-04-22 NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation Chi Huang et.al. 2404.13921 null
2024-04-22 TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos Atom Scott et.al. 2404.13868 null
2024-04-22 Toward Robust LiDAR based 3D Object Detection via Density-Aware Adaptive Thresholding Eunho Lee et.al. 2404.13852 null
2024-04-21 A Nasal Cytology Dataset for Object Detection and Deep Learning Mauro Camporeale et.al. 2404.13745 null
2024-04-23 Clio: Real-time Task-Driven Open-Set 3D Scene Graphs Dominic Maggio et.al. 2404.13696 link
2024-04-20 FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving Ganesh Sistu et.al. 2404.13443 null
2024-04-20 Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer Quoc Khanh Nguyen et.al. 2404.13417 link
2024-04-19 A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics David Rapado-Rincon et.al. 2404.12963 null
2024-04-19 Language-Driven Active Learning for Diverse Open-Set 3D Object Detection Ross Greer et.al. 2404.12856 link
2024-04-19 ECOR: Explainable CLIP for Object Recognition Ali Rasekh et.al. 2404.12839 null
2024-04-19 A Point-Based Approach to Efficient LiDAR Multi-Task Perception Christopher Lang et.al. 2404.12798 null
2024-04-19 ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation Yu-Hsuan Ho et.al. 2404.12606 null
2024-04-18 The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models Cheng Shi et.al. 2404.11957 link
2024-04-18 Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition Xunsong Li et.al. 2404.11903 null
2024-04-17 TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation Thomas Monninger et.al. 2404.11803 null
2024-04-17 Multimodal 3D Object Detection on Unseen Domains Deepti Hegde et.al. 2404.11764 null
2024-04-17 Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection Deepti Hegde et.al. 2404.11737 null
2024-04-17 Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems Luca Bompani et.al. 2404.11488 link
2024-04-17 EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems Meghana Tedla et.al. 2404.11411 null
2024-04-17 Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness Hangtao Zhang et.al. 2404.11357 null
2024-04-17 Simple In-place Data Augmentation for Surveillance Object Detection Munkh-Erdene Otgonbold et.al. 2404.11226 null
2024-04-19 Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions Chuheng Wei et.al. 2404.11214 null
2024-04-17 GhostNetV3: Exploring the Training Strategies for Compact Models Zhenhua Liu et.al. 2404.11202 null
2024-04-17 How to deal with glare for improved perception of Autonomous Vehicles Muhammad Z. Alam et.al. 2404.10992 null
2024-04-17 Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection Nawfal Guefrachi et.al. 2404.10978 null
2024-04-16 OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery Matthew Inkawhich et.al. 2404.10865 null
2024-04-16 Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark Jiangning Zhang et.al. 2404.10760 link
2024-04-16 Watch Your Step: Optimal Retrieval for Continual Learning at Scale Truman Hickok et.al. 2404.10758 null
2024-04-16 Camera clustering for scalable stream-based active distillation Dani Manjah et.al. 2404.10411 null
2024-04-15 Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets Dai Quoc Tran et.al. 2404.10078 link
2024-04-15 Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stres Aswini Kumar Patra et.al. 2404.10073 null
2024-04-15 VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection Bonan Ding et.al. 2404.09431 null
2024-04-14 TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model Wiktor Mucha et.al. 2404.09254 null
2024-04-14 DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection Lewei Yao et.al. 2404.09216 null
2024-04-14 Coreset Selection for Object Detection Hojun Lee et.al. 2404.09161 null
2024-04-14 Fusion-Mamba for Cross-modality Object Detection Wenhao Dong et.al. 2404.09146 null
2024-04-13 The Snake's Beating Heart? A Millisecond Pulsar Binary in the Galactic Center Radio Filament G359.1$-$0.2 Marcus E. Lower et.al. 2404.09098 null
2024-04-13 BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection Jian Zhang et.al. 2404.08979 null
2024-04-13 Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage Yang Hu et.al. 2404.08936 null
2024-04-12 Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation Yanhao Zheng et.al. 2404.08603 link
2024-04-12 FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation Riza Velioglu et.al. 2404.08582 link
2024-04-12 Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning Girmaw Abebe Tadesse et.al. 2404.08544 null
2024-04-12 MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion Zhe Li et.al. 2404.08406 link
2024-04-12 Overcoming Scene Context Constraints for Object Detection in wild using Defilters Vamshi Krishna Kancharla et.al. 2404.08293 null
2024-04-11 ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model Lifan Jiang et.al. 2404.07773 link
2024-04-11 Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification Ricardo Pereira et.al. 2404.07739 null
2024-04-11 Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns Hakan Yekta Yatbaz et.al. 2404.07685 null
2024-04-11 Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes Poulami Sinhamahapatra et.al. 2404.07664 null
2024-04-11 Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method Tashmoy Ghosh et.al. 2404.07649 null
2024-04-11 GLID: Pre-training a Generalist Encoder-Decoder Vision Model Jihao Liu et.al. 2404.07603 null
2024-04-11 SFSORT: Scene Features-based Simple Online Real-Time Tracker M. M. Morsali et.al. 2404.07553 link
2024-04-11 The Sydney Radio Star Catalogue: properties of radio stars at megahertz to gigahertz frequencies Laura N. Driessen et.al. 2404.07418 null
2024-04-11 Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing Jaemin Kang et.al. 2404.07405 null
2024-04-11 A fine-tuning workflow for automatic first-break picking with deep learning Amir Mardan et.al. 2404.07400 link
2024-04-10 Identification of Fine-grained Systematic Errors via Controlled Scene Generation Valentyn Boreiko et.al. 2404.07045 null
2024-04-10 Accurate Tennis Court Line Detection on Amateur Recorded Matches Sameer Agrawal et.al. 2404.06977 null
2024-04-10 Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data Aakash Kumar et.al. 2404.06715 null
2024-04-10 Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting Hao Lu et.al. 2404.06700 link
2024-04-09 Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping Anas Gouda et.al. 2404.06277 link
2024-04-09 Label-Efficient 3D Object Detection For Road-Side Units Minh-Quan Dao et.al. 2404.06256 null
2024-04-09 Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector Bach Ha et.al. 2404.06219 null
2024-04-09 YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images Chenguang Liu et.al. 2404.06180 link
2024-04-09 Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications Huawei Sun et.al. 2404.06165 null
2024-04-09 Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation Zong-Wei Hong et.al. 2404.06029 null
2024-04-08 Retrieval-Augmented Open-Vocabulary Object Detection Jooyeon Kim et.al. 2404.05687 link
2024-04-08 3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules Maxence Bideaux et.al. 2404.05641 null
2024-04-08 Detecting Every Object from Events Haitian Zhang et.al. 2404.05285 link
2024-04-08 MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues Xiahan Chen et.al. 2404.05280 null
2024-04-08 Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes Yu Sheng et.al. 2404.05164 null
2024-04-08 Better Monocular 3D Detectors with LiDAR from the Past Yurong You et.al. 2404.05139 link
2024-04-07 AirShot: Efficient Few-Shot Detection for Autonomous Exploration Zihan Wang et.al. 2404.05069 link
2024-04-07 Hyperbolic Learning with Synthetic Captions for Open-World Detection Fanjie Kong et.al. 2404.05016 null
2024-04-07 MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection Hou-I Liu et.al. 2404.04910 null
2024-04-07 Few-Shot Object Detection: Research Advances and Challenges Zhimeng Xin et.al. 2404.04799 null
2024-04-05 SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers Weile Li et.al. 2404.04179 link
2024-04-05 Designing Robots to Help Women Martin Cooney et.al. 2404.04123 null
2024-04-04 Is CLIP the main roadblock for fine-grained open-world perception? Lorenzo Bianchi et.al. 2404.03539 link
2024-04-04 DQ-DETR: DETR with Dynamic Query for Tiny Object Detection Yi-Xin Huang et.al. 2404.03507 null
2024-04-05 A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data Iqra Bano et.al. 2404.03493 null
2024-04-04 MonoCD: Monocular 3D Object Detection with Complementary Depths Longfei Yan et.al. 2404.03181 link
2024-04-03 DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection Felix Fent et.al. 2404.03015 link
2024-04-03 ALOHa: A New Measure for Hallucination in Captioning Models Suzanne Petryk et.al. 2404.02904 null
2024-04-03 FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery Safouane El Ghazouali et.al. 2404.02877 link
2024-04-03 HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras Zhongyu Xia et.al. 2404.02517 link
2024-04-04 TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression Ho-Joong Kim et.al. 2404.02405 link
2024-04-05 EGTR: Extracting Graph from Transformer for Scene Graph Generation Jinbae Im et.al. 2404.02072 link
2024-04-03 Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection Jicheng Yuan et.al. 2404.01988 link
2024-04-02 Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method Jyun-An Lin et.al. 2404.01929 null
2024-04-02 Scene Adaptive Sparse Transformer for Event-based Object Detection Yansong Peng et.al. 2404.01882 link
2024-04-02 Semi-Supervised Domain Adaptation for Wildfire Detection JooYoung Jang et.al. 2404.01842 link
2024-04-02 Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection Tahira Shehzadi et.al. 2404.01819 null
2024-04-02 Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs Ioanna Souvatzoglou et.al. 2404.01757 null
2024-04-02 Disentangled Pre-training for Human-Object Interaction Detection Zhuolong Li et.al. 2404.01725 link
2024-04-02 Task Integration Distillation for Object Detectors Hai Su et.al. 2404.01699 null
2024-04-02 Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss Jaeha Kim et.al. 2404.01692 link
2024-03-29 PLoc: A New Evaluation Criterion Based on Physical Location for Autonomous Driving Datasets Ruining Yang et.al. 2403.19893 null
2024-03-29 MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection Ali Behrouz et.al. 2403.19888 null
2024-03-28 DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Donghyun Kim et.al. 2403.19588 link
2024-03-28 OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation Zhenyu Wang et.al. 2403.19580 link
2024-03-28 Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points Tian Ma et.al. 2403.19306 null
2024-03-28 CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection Mikhail Kennerley et.al. 2403.19278 link
2024-03-28 Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration Louie Søs Meyer et.al. 2403.19174 null
2024-03-28 CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation Lingjun Zhao et.al. 2403.19104 null
2024-03-28 A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement Junjie Wen et.al. 2403.19079 null
2024-03-27 Illicit object detection in X-ray images using Vision Transformers Jorgen Cani et.al. 2403.19043 null
2024-03-27 Benchmarking Object Detectors with COCO: A New Path Forward Shweta Singh et.al. 2403.18819 link
2024-03-27 PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations Ehsan Latif et.al. 2403.18721 null
2024-03-27 CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection Jiayi Zhu et.al. 2403.18554 null
2024-03-27 BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection Changshun Wu et.al. 2403.18373 null
2024-03-27 Ship in Sight: Diffusion Models for Ship-Image Super Resolution Luigi Sigillo et.al. 2403.18370 link
2024-03-27 DODA: Diffusion for Object-detection Domain Adaptation in Agriculture Shuai Xiang et.al. 2403.18334 link
2024-03-27 Tracking-Assisted Object Detection with Event Cameras Ting-Kang Yen et.al. 2403.18330 link
2024-03-27 SGDM: Static-Guided Dynamic Module Make Stronger Visual Models Wenjie Xing et.al. 2403.18282 null
2024-03-27 Road Obstacle Detection based on Unknown Objectness Scores Chihiro Noguchi et.al. 2403.18207 null
2024-03-26 State of the art applications of deep learning within tracking and detecting marine debris: A survey Zoe Moorton et.al. 2403.18067 null
2024-03-26 The Solution for the CVPR 2023 1st foundation model challenge-Track2 Haonan Xu et.al. 2403.17702 null
2024-03-26 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Chenhongyi Yang et.al. 2403.17695 link
2024-03-26 UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps Maciej K Wozniak et.al. 2403.17633 link
2024-03-26 SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter Songbur Wong et.al. 2403.17390 null
2024-03-26 Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection Jiacheng Zhang et.al. 2403.17387 null
2024-03-26 AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving Mingfu Liang et.al. 2403.17373 null
2024-03-26 Staircase Localization for Autonomous Exploration in Urban Environments Jinrae Kim et.al. 2403.17330 null
2024-03-25 Co-Occurring of Object Detection and Identification towards unlabeled object discovery Binay Kumar Singh et.al. 2403.17223 null
2024-03-25 Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions Ye Li et.al. 2403.17009 link
2024-03-25 Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance Jingyuan Zhu et.al. 2403.16954 null
2024-03-25 RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection Zhiwei Lin et.al. 2403.16440 link
2024-03-25 ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation Hannah Schieber et.al. 2403.16400 link
2024-03-25 Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks Madhumitha Sakthi et.al. 2403.16338 null
2024-03-24 Cross-domain Multi-modal Few-shot Object Detection via Rich Text Zeyu Shangguan et.al. 2403.16188 link
2024-03-24 Semantic Is Enough: Only Semantic Information For NeRF Reconstruction Ruibo Wang et.al. 2403.16043 null
2024-03-23 Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions Kaiwen Wang et.al. 2403.15786 null
2024-03-25 Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection Hongzhi Gao et.al. 2403.15317 null
2024-03-22 CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking Nicolas Baumann et.al. 2403.15313 link
2024-03-22 IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection Junbo Yin et.al. 2403.15241 link
2024-03-22 SFOD: Spiking Fusion Object Detector Yimeng Fan et.al. 2403.15192 link
2024-03-22 CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition Shaowei Fu et.al. 2403.15183 null
2024-03-22 An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning Víctor Toscano-Durán et.al. 2403.15150 link
2024-03-22 Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection Jiaming Li et.al. 2403.15127 link
2024-03-22 VRSO: Visual-Centric Reconstruction for Static Object Annotation Chenyao Yu et.al. 2403.15026 link
2024-03-21 Deep Active Learning: A Reality Check Edrina Gashi et.al. 2403.14800 null
2024-03-21 Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering Bowen Jiang et.al. 2403.14783 link
2024-03-21 T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Qing Jiang et.al. 2403.14610 link
2024-03-21 UAV-Assisted Maritime Search and Rescue: A Holistic Approach Martin Messmer et.al. 2403.14281 null
2024-03-21 Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection Tim Salzmann et.al. 2403.14270 null
2024-03-21 3D Object Detection from Point Cloud via Voting Step Diffusion Haoran Hou et.al. 2403.14133 null
2024-03-20 EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration Wenjun Huang et.al. 2403.14027 null
2024-03-20 RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition Ziyu Liu et.al. 2403.13805 link
2024-03-20 Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments Yang Yang et.al. 2403.13803 link
2024-03-20 Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization Danqing Ma et.al. 2403.13703 null
2024-03-20 Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments Djamahl Etchegaray et.al. 2403.13556 link
2024-03-20 MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining Di Wang et.al. 2403.13430 link
2024-03-20 Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images Jiawei Zhou et.al. 2403.13375 null
2024-03-20 DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception Yibo Wang et.al. 2403.13304 null
2024-03-19 SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Armen Avetisyan et.al. 2403.13064 null
2024-03-19 TAPTR: Tracking Any Point with Transformers as Detection Hongyang Li et.al. 2403.13042 null
2024-03-19 Wildfire danger prediction optimization with transfer learning Spiros Maggioros et.al. 2403.12871 link
2024-03-19 As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? Anjun Hu et.al. 2403.12693 null
2024-03-19 EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks Ziming Wang et.al. 2403.12574 null
2024-03-19 DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM Yixuan Wu et.al. 2403.12488 link
2024-03-19 TransformMix: Learning Transformation and Mixing Strategies from Data Tsz-Him Cheung et.al. 2403.12429 null
2024-03-19 VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation Hao Wang et.al. 2403.12415 link
2024-03-19 Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition Jielin Qiu et.al. 2403.12339 null
2024-03-18 EffiPerception: an Efficient Framework for Various Perception Tasks Xinhao Xiang et.al. 2403.12317 null
2024-03-18 Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D Benjamín Ojeda-Magaña et.al. 2403.12310 null
2024-03-18 Align and Distill: Unifying and Improving Domain Adaptive Object Detection Justin Kay et.al. 2403.12029 link
2024-03-18 TrajectoryNAS: A Neural Architecture Search for Trajectory Prediction Ali Asghar Sharifi et.al. 2403.11695 null
2024-03-18 Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem Mincheol Chang et.al. 2403.11573 null
2024-03-18 R2SNet: Scalable Domain Adaptation for Object Detection in Cloud-Based Robots Ecosystems via Proposal Refinement Michele Antonazzi et.al. 2403.11567 null
2024-03-18 Continual Forgetting for Pre-trained Vision Models Hongbo Zhao et.al. 2403.11530 link
2024-03-17 V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions Baolu Li et.al. 2403.11371 null
2024-03-17 Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning Jesher Joshua M et.al. 2403.11291 null
2024-03-17 ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models Siyuan Huang et.al. 2403.11289 link
2024-03-19 CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations Yuwei Zhang et.al. 2403.11220 link
2024-03-19 GRA: Detecting Oriented Objects through Group-wise Rotating and Attention Jiangshan Wang et.al. 2403.11127 null
2024-03-17 Self-supervised co-salient object detection via feature correspondence at multiple scales Souradeep Chakraborty et.al. 2403.11107 link
2024-03-15 SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras Yingqi Tang et.al. 2403.10353 link
2024-03-15 Generative Region-Language Pretraining for Open-Ended Object Detection Chuang Lin et.al. 2403.10191 link
2024-03-15 A Hybrid SNN-ANN Network for Event-based Object Detection with Spatial and Temporal Attention Soikat Hasan Ahmed et.al. 2403.10173 null
2024-03-15 CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network Xiaotong Yu et.al. 2403.10104 null
2024-03-15 SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception Yiheng Li et.al. 2403.10036 null
2024-03-14 Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptive Object Detection Atif Belal et.al. 2403.09918 link
2024-03-14 Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization Zhao Wang et.al. 2403.09433 null
2024-03-14 D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection Dinh Phat Do et.al. 2403.09359 link
2024-03-14 Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring Yufei Zhan et.al. 2403.09333 link
2024-03-14 EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection Jiaqing Zhang et.al. 2403.09323 link
2024-03-14 Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection Martin Aubard et.al. 2403.09313 link
2024-03-14 MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion Arul Selvam Periyasamy et.al. 2403.09309 null
2024-03-14 CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification Yiming Ma et.al. 2403.09281 link
2024-03-14 D-YOLO a robust framework for object detection in adverse weather conditions Zihan Chu et.al. 2403.09233 null
2024-03-14 Improving Distant 3D Object Detection Using 2D Box Supervision Zetong Yang et.al. 2403.09230 null
2024-03-14 PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest Jiajun Deng et.al. 2403.09212 null
2024-03-13 MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning Jialv Zou et.al. 2403.08760 link
2024-03-13 PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections Matteo Taiana et.al. 2403.08586 null
2024-03-13 A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product Ao Xiang et.al. 2403.08511 null
2024-03-13 Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks Zongqing Qi et.al. 2403.08499 null
2024-03-13 IAMCV Multi-Scenario Vehicle Interaction Dataset Novel Certad et.al. 2403.08455 null
2024-03-13 Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks Khondoker Murad Hossain et.al. 2403.08208 null
2024-03-12 TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection Hanning Chen et.al. 2403.08108 null
2024-03-12 Aedes aegypti Egg Counting with Neural Networks for Object Detection Micheli Nayara de Oliveira Vicente et.al. 2403.08016 null
2024-03-12 Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference Changmin Jeon et.al. 2403.07598 null
2024-03-12 PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution Honghao Chen et.al. 2403.07589 null
2024-03-12 A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions Quoc-Vinh Lai-Dang et.al. 2403.07542 null
2024-03-12 JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection Hanyu Zhou et.al. 2403.07436 null
2024-03-12 Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection Jiahui Fu et.al. 2403.07372 null
2024-03-12 SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection Hongcheng Zhang et.al. 2403.07284 null
2024-03-12 Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction Alexander Timans et.al. 2403.07263 link
2024-03-11 Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies Nieves Crasto et.al. 2403.07113 link
2024-03-11 LISO: Lidar-only Self-Supervised 3D Object Detection Stefan Baur et.al. 2403.07071 null
2024-03-11 Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head Tiancheng Zhao et.al. 2403.06892 link
2024-03-11 LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations Mohammad Alkhalefi et.al. 2403.06813 null
2024-03-11 Genetic Learning for Designing Sim-to-Real Data Augmentations Bram Vanherle et.al. 2403.06786 link
2024-03-11 Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings Georgios Tsoumplekas et.al. 2403.06631 null
2024-03-11 Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers Alexander H. Berger et.al. 2403.06601 null
2024-03-11 SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection Yuxuan Li et.al. 2403.06534 link
2024-03-11 3D Semantic Segmentation-Driven Representations for 3D Object Detection Hayeon O et.al. 2403.06501 link
2024-03-11 Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection Konyul Park et.al. 2403.06433 link
2024-03-10 Transformer based Multitask Learning for Image Captioning and Object Detection Debolena Basak et.al. 2403.06292 null
2024-03-10 Poly Kernel Inception Network for Remote Sensing Detection Xinhao Cai et.al. 2403.06258 link
2024-03-08 SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection Yahao Lu et.al. 2403.05416 link
2024-03-08 Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery Xavier Bou et.al. 2403.05381 link
2024-03-08 Frequency-Adaptive Dilated Convolution for Semantic Segmentation Linwei Chen et.al. 2403.05369 link
2024-03-08 VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model Junsu Kim et.al. 2403.05346 null
2024-03-08 Improving the Successful Robotic Grasp Detection Using Convolutional Neural Networks Hamed Hosseini et.al. 2403.05211 null
2024-03-08 LanePtrNet: Revisiting Lane Detection as Point Voting and Grouping on Curves Jiayan Cao et.al. 2403.05155 null
2024-03-08 RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features Geonho Bang et.al. 2403.05061 null
2024-03-08 ActFormer: Scalable Collaborative Perception via Active Queries Suozhi Huang et.al. 2403.04968 null
2024-03-07 FriendNet: Detection-Friendly Dehazing Network Yihua Fan et.al. 2403.04443 link
2024-03-07 Effectiveness Assessment of Recent Large Vision-Language Models Yao Jiang et.al. 2403.04306 null
2024-03-07 ACC-ViT : Atrous Convolution's Comeback in Vision Transformers Nabil Ibtehaz et.al. 2403.04200 null
2024-03-07 CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images Guanlin Shen et.al. 2403.04198 link
2024-03-07 Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models Evelyn Mannix et.al. 2403.04125 null
2024-03-07 CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection Gyusam Chang et.al. 2403.03721 null
2024-03-06 Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors Kalibinuer Tiliwalidi et.al. 2403.03674 null
2024-03-06 Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator Wonhyeok Choi et.al. 2403.03468 null
2024-03-06 FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion Hao Wang et.al. 2403.03463 null
2024-03-06 Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection Jiajia Li et.al. 2403.03390 link
2024-03-05 Detecting Concrete Visual Tokens for Multimodal Machine Translation Braeden Bowen et.al. 2403.03075 null
2024-03-05 Loss Design for Single-carrier Joint Communication and Neural Network-based Sensing Charlotte Muth et.al. 2403.02929 null
2024-03-05 Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? Chenqiang Gao et.al. 2403.02818 null
2024-03-05 Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery Akram Zaytar et.al. 2403.02736 null
2024-03-05 FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View Jiawei Hou et.al. 2403.02710 null
2024-03-05 False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy Jiyong Oh et.al. 2403.02639 null
2024-03-05 BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection Yu Chen et.al. 2403.02637 null
2024-03-04 NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function Abdullah Nazhat Abdullah et.al. 2403.02411 link
2024-03-04 COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks Zijian Huang et.al. 2403.02329 null
2024-03-04 Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving Yuxuan Liu et.al. 2403.02037 link
2024-03-02 TUMTraf V2X Cooperative Perception Dataset Walter Zimmer et.al. 2403.01316 link
2024-03-02 Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations Hakan Yekta Yatbaz et.al. 2403.01172 null
2024-03-02 ELA: Efficient Local Attention for Deep Convolutional Neural Networks Wei Xu et.al. 2403.01123 null
2024-03-02 Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images Shufan Pei et.al. 2403.01083 null
2024-03-01 Learning Causal Features for Incremental Object Detection Zhenwei He et.al. 2403.00591 null
2024-03-01 Abductive Ego-View Accident Video Understanding for Safe Driving Perception Jianwu Fang et.al. 2403.00436 null
2024-03-04 DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion Junjie Guo et.al. 2403.00326 link
2024-03-01 YOLO-MED : Multi-Task Interaction Network for Biomedical Images Suizhi Huang et.al. 2403.00245 null
2024-02-29 FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything Safouane El Ghazouali et.al. 2403.00175 link
2024-02-29 LLMs in Political Science: Heralding a New Era of Visual Analysis Yu Wang et.al. 2403.00154 null
2024-02-29 SeMoLi: What Moves Together Belongs Together Jenny Seidenschwarz et.al. 2402.19463 null
2024-02-29 Genie: Smart ROS-based Caching for Connected Autonomous Robots Zexin Li et.al. 2402.19410 null
2024-02-29 ProtoP-OD: Explainable Object Detection with Prototypical Parts Pavlos Rath-Manakidis et.al. 2402.19142 null
2024-02-29 Theoretically Achieving Continuous Representation of Oriented Bounding Boxes Zikai Xiao et.al. 2402.18975 link
2024-02-29 Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching Boxuan Zhang et.al. 2402.18958 null
2024-02-29 Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering Xiang Chen et.al. 2402.18927 null
2024-02-29 A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection Chao Hao et.al. 2402.18922 null
2024-02-29 Privacy-Preserving Autoencoder for Collaborative Object Detection Bardia Azizian et.al. 2402.18864 null
2024-02-29 Debiased Novel Category Discovering and Localization Juexiao Feng et.al. 2402.18821 null
2024-02-28 Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond Ziyun Yang et.al. 2402.18698 null
2024-02-28 UniMODE: Unified Monocular 3D Object Detection Zhuoling Li et.al. 2402.18573 null
2024-02-28 Detection of Micromobility Vehicles in Urban Traffic Videos Khalil Sabri et.al. 2402.18503 link
2024-02-28 Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection Xun Huang et.al. 2402.18493 null
2024-02-28 Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization Deng Li et.al. 2402.18447 null
2024-02-28 Unveiling novel insights into Kirchhoff migration for effective object detection using experimental Fresnel dataset Won-Kwang Park et.al. 2402.18322 null
2024-02-28 Zero-Shot Aerial Object Detection with Visual Description Regularization Zhengqing Zang et.al. 2402.18233 null
2024-02-28 VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation Tao Peng et.al. 2402.18189 link
2024-02-27 SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection Junsu Kim et.al. 2402.17323 null
2024-02-27 A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track Zehui Chen et.al. 2402.17319 null
2024-02-27 Probing Multimodal Large Language Models for Global and Local Semantic Representation Mingxu Tao et.al. 2402.17304 link
2024-02-27 Deployment Prior Injection for Run-time Calibratable Object Detection Mo Zhou et.al. 2402.17207 null
2024-02-26 A NIRCam-dark galaxy detected with the MIRI/F1000W filter in the MIDIS/JADES Hubble Ultra Deep Field Pablo G. Pérez-González et.al. 2402.16942 null
2024-02-26 DEYO: DETR with YOLO for End-to-End Object Detection Haodong Ouyang et.al. 2402.16370 link
2024-02-26 mAPm: multi-scale Attention Pyramid module for Enhanced scale-variation in RLD detection Yunusa Haruna et.al. 2402.16291 null
2024-02-26 Real-Time Vehicle Detection and Urban Traffic Behavior Analysis Based on UAV Traffic Videos on Mobile Devices Yuan Zhu et.al. 2402.16246 null
2024-02-25 Semi-supervised Open-World Object Detection Sahal Shaji Mullappilly et.al. 2402.16013 link
2024-02-24 MMW-Carry: Enhancing Carry Object Detection through Millimeter-Wave Radar-Camera Fusion Xiangyu Gao et.al. 2402.15897 null
2024-02-23 A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends Abolfazl Younesi et.al. 2402.15490 null
2024-02-23 A Universal Method for Solar Filament Detection from H-alpha Observations using Semi-supervised Deep Learning Andrea Diercke et.al. 2402.15407 null
2024-02-23 EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection Zhe Wang et.al. 2402.15272 link
2024-02-22 WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition Lianghui Zhu et.al. 2402.14812 link
2024-02-22 High-Speed Detector For Low-Powered Devices In Aerial Grasping Ashish Kumar et.al. 2402.14591 null
2024-02-22 S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR Jialun Pei et.al. 2402.14461 null
2024-02-22 YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5 Peng Gao et.al. 2402.14309 null
2024-02-21 YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Chien-Yao Wang et.al. 2402.13616 link
2024-02-21 TransGOP: Transformer-Based Gaze Object Prediction Binglu Wang et.al. 2402.13578 link
2024-02-21 Unsupervised learning based object detection using Contrastive Learning Chandan Kumar et.al. 2402.13465 null
2024-02-20 Combining unsupervised and supervised learning in microscopy enables defect analysis of a full 4H-SiC wafer Binh Duong Nguyen et.al. 2402.13353 null
2024-02-20 GOOD: Towards Domain Generalized Orientated Object Detection Qi Bi et.al. 2402.12765 null
2024-02-20 CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning Feng Chen et.al. 2402.12736 null
2024-02-20 YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection Xiaoyu Tang et.al. 2402.12641 link
2024-02-20 Efficient Parameter Mining and Freezing for Continual Object Detection Angelo G. Menezes et.al. 2402.12624 null
2024-02-19 LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks Truong Thanh Hung Nguyen et.al. 2402.12525 link
2024-02-19 UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking Chang Won Lee et.al. 2402.12303 link
2024-02-19 Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling Philip Müller et.al. 2402.11985 link
2024-02-19 SDGE: Stereo Guided Depth Estimation for 360° Camera Sets Jialei Xu et.al. 2402.11791 null
2024-02-19 Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation Bharat Srikishan et.al. 2402.11760 link
2024-02-18 LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection Jingyu Song et.al. 2402.11735 link
2024-02-18 MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection Till Beemelmanns et.al. 2402.11677 link
2024-02-18 VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger Zihao Zhan et.al. 2402.11423 null
2024-02-18 A Multispectral Automated Transfer Technique (MATT) for machine-driven image labeling utilizing the Segment Anything Model (SAM) James E. Gallagher et.al. 2402.11413 null
2024-02-17 GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation Ayan Banerjee et.al. 2402.11401 link
2024-02-17 ReViT: Enhancing Vision Transformers with Attention Residual Connections for Visual Recognition Anxhelo Diko et.al. 2402.11301 link
2024-02-16 AutoGPT+P: Affordance-based Task Planning with Large Language Models Timo Birr et.al. 2402.10778 null
2024-02-16 STF: Spatio-Temporal Fusion Module for Improving Video Object Detection Noreen Anwar et.al. 2402.10752 link
2024-02-16 CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes Ishan Rajendrakumar Dave et.al. 2402.10478 link
2024-02-15 LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition Jinyuan Li et.al. 2402.09989 link
2024-02-15 A Comprehensive Review on Computer Vision Analysis of Aerial Data Vivek Tetarwal et.al. 2402.09781 null
2024-02-14 Few-Shot Object Detection with Sparse Context Transformers Jie Mei et.al. 2402.09315 null
2024-02-14 TDViT: Temporal Dilated Video Transformer for Dense Video Tasks Guanxiong Sun et.al. 2402.09257 link
2024-02-14 Efficient One-stage Video Object Detection by Exploiting Temporal Consistency Guanxiong Sun et.al. 2402.09241 link
2024-02-14 Switch EMA: A Free Lunch for Better Flatness and Sharpness Siyuan Li et.al. 2402.09240 link
2024-02-13 Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection Colin Decourt et.al. 2402.08427 null
2024-02-13 Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss Kei Iino et.al. 2402.08267 null
2024-02-13 Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles Minh Dang Tu et.al. 2402.08251 null
2024-02-12 MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO Shubhabrata Mukherjee et.al. 2402.07894 link
2024-02-12 Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision Georg K. J. Fischer et.al. 2402.07691 null
2024-02-12 AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer Tanmoy Dam et.al. 2402.07680 link
2024-02-12 A Flow-based Credibility Metric for Safety-critical Pedestrian Detection Maria Lyssenko et.al. 2402.07642 null
2024-02-12 Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems Justin Davis et.al. 2402.07415 null
2024-02-10 Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance Raza Imam et.al. 2402.07059 link
2024-02-10 Semantic Object-level Modeling for Robust Visual Camera Relocalization Yifan Zhu et.al. 2402.06951 null
2024-02-09 Neural Rendering based Urban Scene Reconstruction for Autonomous Driving Shihao Shen et.al. 2402.06826 null
2024-02-09 Event-to-Video Conversion for Overhead Object Detection Darryl Hannan et.al. 2402.06805 null
2024-02-09 Transfer learning with generative models for object detection on limited datasets Matteo Paiano et.al. 2402.06784 null
2024-02-09 SWITCH: An Exemplar for Evaluating Self-Adaptive ML-Enabled Systems Arya Marda et.al. 2402.06351 link
2024-02-08 A versatile robotic hand with 3D perception, force sensing for autonomous manipulation Nikolaus Correll et.al. 2402.06018 link
2024-02-08 InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Chengjian Feng et.al. 2402.05937 null
2024-02-08 YOLO-CIANNA: Galaxy detection with deep learning in radio data. I. A new YOLO-inspired source detection method applied to the SKAO SDC1 D. Cornu et.al. 2402.05925 link
2024-02-08 Using YOLO v7 to Detect Kidney in Magnetic Resonance Imaging: A Supervised Contrastive Learning Pouria Yazdian Anari et.al. 2402.05817 null
2024-02-08 Scrapping The Web For Early Wildfire Detection Mateo Lostanlen et.al. 2402.05349 null
2024-02-07 Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration Chaoqun Wang et.al. 2402.04883 null
2024-02-07 STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation Peter Hönig et.al. 2402.04878 link
2024-02-07 Streamlined Hybrid Annotation Framework using Scalable Codestream for Bandwidth-Restricted UAV Object Detection Karim El Khoury et.al. 2402.04673 null
2024-02-07 G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection Fan Wu et.al. 2402.04672 link
2024-02-07 LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors Sheng Jin et.al. 2402.04630 null
2024-02-07 FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models Chuhao Liu et.al. 2402.04555 null
2024-02-06 Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources Jinlong Li et.al. 2402.04273 link
2024-02-06 Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures Alberto Corpas et.al. 2402.04090 null
2024-02-06 YOLOPoint Joint Keypoint and Object Detection Anton Backhaus et.al. 2402.03989 link
2024-02-06 Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory Nicolas Harvey Chapman et.al. 2402.03721 null
2024-02-06 Online Informative Sampling using Semantic Features in Underwater Environments Shrutika Vishal Thengane et.al. 2402.03636 null
2024-02-06 BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection Feng Liu et.al. 2402.03634 link
2024-02-05 Stitching the Spectrum: Semantic Spectrum Segmentation with Wideband Signal Daniel Uvaydov et.al. 2402.03465 link
2024-02-05 HASSOD: Hierarchical Adaptive Self-Supervised Object Detection Shengcao Cao et.al. 2402.03311 link
2024-02-05 ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection Ahmed Ghita et.al. 2402.03235 null
2024-02-05 Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector Yuqian Fu et.al. 2402.03094 link
2024-02-05 Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective Yihao Huang et.al. 2402.02738 null
2024-02-04 Spatio-temporal Prompting Network for Robust Video Feature Extraction Guanxiong Sun et.al. 2402.02574 link
2024-02-04 Gazebo Plants: Simulating Plant-Robot Interaction with Cosserat Rods Junchen Deng et.al. 2402.02570 null
2024-02-04 DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers Oryan Yehezkel et.al. 2402.02554 null
2024-02-03 $\textit{A Contrario}$ Paradigm for YOLO-based Infrared Small Target Detection Alina Ciocarlan et.al. 2402.02288 null
2024-02-03 CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse Cunhan Guo et.al. 2402.02217 null
2024-02-03 Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes Xilai Li et.al. 2402.02096 null
2024-02-02 Dynamic Occupancy Grids for Object Detection: A Radar-Centric Approach Max Peter Ronecker et.al. 2402.01488 null
2024-02-02 Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection Hao Li et.al. 2402.01304 null
2024-02-02 Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection Lennard Bodden et.al. 2402.01287 null
2024-02-02 TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network Yuchan Jie et.al. 2402.01212 null
2024-02-02 A Survey for Foundation Models in Autonomous Driving Haoxiang Gao et.al. 2402.01105 null
2024-02-01 Semantic-Aware and Goal-Oriented Communications for Object Detection in Wireless End-to-End Image Transmission Fatemeh Zahra Safaeipour et.al. 2402.01064 null
2024-02-01 Vehicle Perception from Satellite Bin Zhao et.al. 2402.00703 link
2024-02-01 A Manifold Representation of the Key in Vision Transformers Li Meng et.al. 2402.00534 null
2024-02-01 Night-Rider: Nocturnal Vision-aided Localization in Streetlight Maps Using Invariant Extended Kalman Filtering Tianxiao Gao et.al. 2402.00330 link
2024-02-01 FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation Takuma Yagi et.al. 2402.00293 null
2024-01-31 Capacity Constraint Analysis Using Object Detection for Smart Manufacturing Hafiz Mughees Ahmad et.al. 2402.00243 null
2024-01-31 Improving Object Detection Quality in Football Through Super-Resolution Techniques Karolina Seweryn et.al. 2402.00163 null
2024-01-31 Real-time Traffic Object Detection for Autonomous Driving Abdul Hannan Khan et.al. 2402.00128 null
2024-01-31 Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Qirui Jiao et.al. 2401.17981 null
2024-01-31 MelNet: A Real-Time Deep Learning Algorithm for Object Detection Yashar Azadvatan et.al. 2401.17972 null
2024-01-31 Source-free Domain Adaptive Object Detection in Remote Sensing Images Weixing Liu et.al. 2401.17916 null
2024-01-31 SubPipe: A Submarine Pipeline Inspection Dataset for Segmentation and Visual-inertial Localization Olaya Álvarez-Tuñón et.al. 2401.17907 link
2024-01-31 Do Object Detection Localization Errors Affect Human Performance and Trust? Sven de Witte et.al. 2401.17821 null
2024-01-31 Haris: an Advanced Autonomous Mobile Robot for Smart Parking Assistance Layth Hamad et.al. 2401.17741 null
2024-01-30 AdvGPS: Adversarial GPS for Multi-Agent Perception Attack Jinlong Li et.al. 2401.17499 link
2024-01-30 YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng et.al. 2401.17270 link
2024-01-30 A Bearing-Angle Approach for Unknown Target Motion Analysis Based on Visual Measurements Zian Ning et.al. 2401.17117 null
2024-01-30 LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras Fei Teng et.al. 2401.16712 link
2024-01-30 Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection using Template Matching and CNN Vinícius Yu Okubo et.al. 2401.16688 null
2024-01-30 The Why, When, and How to Use Active Learning in Large-Data-Driven 3D Object Detection for Safe Autonomous Driving: An Empirical Exploration Ross Greer et.al. 2401.16634 null
2024-01-29 SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design Seokju Yun et.al. 2401.16456 link
2024-01-29 Computer Vision for Primate Behavior Analysis in the Wild Richard Vogg et.al. 2401.16424 null
2024-01-29 MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection Yuxue Yang et.al. 2401.16305 link
2024-01-29 Towards Scenario Generalization for Vision-based Roadside 3D Object Detection Lei Yang et.al. 2401.16110 link
2024-01-29 Rectify the Regression Bias in Long-Tailed Object Detection Ke Zhu et.al. 2401.15885 null
2024-01-29 LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection Sifan Zhou et.al. 2401.15865 link
2024-01-29 LCVO: An Efficient Pretraining-Free Framework for Visual Question Answering Grounding Yuhan Chen et.al. 2401.15842 null
2024-01-28 Real-time object detection and robotic manipulation for agriculture using a YOLO-based learning approach Hongyu Zhao et.al. 2401.15785 null
2024-01-27 New Foggy Object Detecting Model Rahul Banavathu et.al. 2401.15455 null
2024-01-27 You Only Look Bottom-Up for Monocular 3D Object Detection Kaixin Xiong et.al. 2401.15319 null
2024-01-26 pLitterStreet: Street Level Plastic Litter Detection and Mapping Sriram Reddy Mandhati et.al. 2401.14719 link
2024-01-26 From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution Ragib Amin Nihal et.al. 2401.14661 null
2024-01-25 UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models Timo Kapsalis et.al. 2401.14379 null
2024-01-25 MultiTest: Physical-Aware Object Insertion for Testing Multi-sensor Fusion Perception Systems Xinyu Gao et.al. 2401.14314 null
2024-01-25 Knowledge Graph Driven UAV Cognitive Semantic Communication Systems for Efficient Object Detection Xi Song et.al. 2401.13995 null
2024-01-24 PLATE: A perception-latency aware estimator Rodrigo Aldana-López et.al. 2401.13596 null
2024-01-24 Deep Learning for Improved Polyp Detection from Synthetic Narrow-Band Imaging Mathias Ramm Haugland et.al. 2401.13315 null
2024-01-24 AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical Attention Network Xiaolin Ma et.al. 2401.13214 null
2024-01-23 Enhancing Object Detection Performance for Small Objects through Synthetic Data Generation and Proportional Class-Balancing Technique: A Comparative Study in Industrial Scenarios Jibinraj Antony et.al. 2401.12729 null
2024-01-23 Pragmatic Communication in Multi-Agent Collaborative Perception Yue Hu et.al. 2401.12694 null
2024-01-23 Small Language Model Meets with Reinforced Vision Vocabulary Haoran Wei et.al. 2401.12503 null
2024-01-23 Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration Yifan Zhang et.al. 2401.12452 null
2024-01-22 OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics Peiqi Liu et.al. 2401.12202 link
2024-01-22 Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy Will LeVine et.al. 2401.12129 link
2024-01-22 A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network Rui Huang et.al. 2401.11914 null
2024-01-22 Large receptive field strategy and important feature extraction strategy in 3D object detection Leichao Cui et.al. 2401.11913 null
2024-01-22 Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis Jiawei Wang et.al. 2401.11874 link
2024-01-22 Rethinking Centered Kernel Alignment in Knowledge Distillation Zikai Zhou et.al. 2401.11824 link

Keypoint Detection

Publish Date Title Authors PDF Code
2024-08-15 Towards Practical Human Motion Prediction with LiDAR Point Clouds Xiao Han et.al. 2408.08202 null
2024-07-31 Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods Xusheng Luo et.al. 2408.00117 null
2024-07-26 SHIC: Shape-Image Correspondences with no Keypoint Supervision Aleksandar Shtedritski et.al. 2407.18907 null
2024-07-25 LION: Linear Group RNN for 3D Object Detection in Point Clouds Zhe Liu et.al. 2407.18232 link
2024-07-22 RADA: Robust and Accurate Feature Learning with Domain Adaptation Jingtai He et.al. 2407.15791 null
2024-07-09 LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition Teng Wang et.al. 2407.06730 null
2024-07-04 PFGS: High Fidelity Point Cloud Rendering via Feature Splatting Jiaxu Wang et.al. 2407.03857 link
2024-07-03 A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes Li Fang et.al. 2407.02830 link
2024-07-02 Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning Chengchao Shen et.al. 2407.02014 link
2024-06-28 Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics Chengrui Gao et.al. 2406.19672 null
2024-07-23 A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking Lorenzo Shaikewitz et.al. 2406.16837 link
2024-06-03 Scale-Free Image Keypoints Using Differentiable Persistent Homology Giovanni Barbarani et.al. 2406.01315 link
2024-06-23 W-Net: A Facial Feature-Guided Face Super-Resolution Network Hao Liu et.al. 2406.00676 null
2024-05-25 Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration Junjie Gao et.al. 2405.16085 null
2024-06-01 Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding Weizhen Liu et.al. 2405.12476 link
2024-05-14 TP3M: Transformer-based Pseudo 3D Image Matching with Reference Liming Han et.al. 2405.08434 null
2024-05-15 Vector-Symbolic Architecture for Event-Based Optical Flow Hongzhi You et.al. 2405.08300 null
2024-05-13 RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration Congjia Chen et.al. 2405.07594 null
2024-05-08 Unsupervised Skin Feature Tracking with Deep Neural Networks Jose Chang et.al. 2405.04943 null
2024-05-07 A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images László Kopácsi et.al. 2405.04650 null
2024-04-30 A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images Wang Zhang et.al. 2404.19311 null
2024-04-25 Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach Tahmim Hossain et.al. 2404.14560 null
2024-04-19 SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers Vandad Davoodnia et.al. 2404.12625 null
2024-04-17 Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images Junbiao Pang et.al. 2404.10985 null
2024-03-28 Towards Long Term SLAM on Thermal Imagery Colin Keil et.al. 2403.19885 link
2024-03-28 Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation Xiao Lin et.al. 2403.19527 link
2024-03-27 RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation Yang Tian et.al. 2403.18259 null
2024-03-18 FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events Xiangyuan Wang et.al. 2403.11662 link
2024-03-05 Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion Meng Zheng et.al. 2403.03217 null
2024-02-22 A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets Chengzhang Yu et.al. 2402.14241 null
2024-02-25 A Feature Matching Method Based on Multi-Level Refinement Strategy Shaojie Zhang et.al. 2402.13488 null
2024-03-05 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data Zhi-Yi Lin et.al. 2402.13172 null
2024-02-25 Region Feature Descriptor Adapted to High Affine Transformations Shaojie Zhang et.al. 2402.09724 null
2024-01-29 Reconstructing Close Human Interactions from Multiple Views Qing Shuai et.al. 2401.16173 link
2024-01-17 To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection Luyi Han et.al. 2401.09336 link
2024-01-08 Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach Huanyu Liu et.al. 2401.03742 link
2024-03-22 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation Li Xu et.al. 2401.00029 null
2023-12-27 Bezier-based Regression Feature Descriptor for Deformable Linear Objects Fangqing Chen et.al. 2312.16502 null
2023-12-24 Residual Learning for Image Point Descriptors Rashik Shrestha et.al. 2312.15471 null
2023-12-22 BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions Elias Marks et.al. 2312.14706 null
2023-12-19 Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation Jiaming Liu et.al. 2312.12480 null
2023-12-19 An effective image copy-move forgery detection using entropy image Zhaowei Lu et.al. 2312.11793 link
2023-12-11 VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data Jian Shi et.al. 2312.08871 link
2023-12-11 Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach Travis Driver et.al. 2312.06865 link

Open-Vocabulary

Publish Date Title Authors PDF Code
2024-08-20 OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding Youjun Zhao et.al. 2408.11030 link
2024-08-20 Open 3D World in Autonomous Driving Xinlong Cheng et.al. 2408.10880 null
2024-08-20 LightMDETR: A Lightweight Approach for Low-Cost Open-Vocabulary Object Detection Training Binta Sow et.al. 2408.10787 null
2024-08-20 Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant Guofeng Mei et.al. 2408.10652 null
2024-08-20 SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition Zebang Cheng et.al. 2408.10500 link
2024-08-18 OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras Muhammad Rameez Ur Rahman et.al. 2408.09424 link
2024-08-17 Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community Jiancheng Pan et.al. 2408.09110 null
2024-08-16 From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning Haoran Ranran Zhang et.al. 2408.08981 null
2024-08-16 Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation Tri Ton et.al. 2408.08591 null
2024-08-15 Towards Flexible Visual Relationship Segmentation Fangrui Zhu et.al. 2408.08305 null
2024-08-15 VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps Senthil Hariharan Arul et.al. 2408.08301 null
2024-08-15 DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions Ryosuke Korekata et.al. 2408.07910 null
2024-08-18 Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space Hyunjee Lee et.al. 2408.07416 null
2024-08-13 Fingerspelling within Sign Language Translation Garrett Tanzer et.al. 2408.07065 null
2024-08-11 An analysis of HOI: using a training-free method with multimodal visual foundation models when only the test set is available, without the training set Chaoyi Ai et.al. 2408.05772 null
2024-08-11 Efficient and Versatile Robust Fine-Tuning of Zero-shot Models Sungyeon Kim et.al. 2408.05749 null
2024-08-09 In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation Dahyun Kang et.al. 2408.04961 link
2024-08-09 ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation Mengcheng Lan et.al. 2408.04883 link
2024-08-07 Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving Amirhosein Chahe et.al. 2408.03516 null
2024-08-05 Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts Andong Tan et.al. 2408.02265 null
2024-08-01 Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation Siyu Jiao et.al. 2408.00744 link
2024-07-31 Open-Vocabulary Audio-Visual Semantic Segmentation Ruohao Guo et.al. 2407.21721 null
2024-07-31 MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection Kuo Wang et.al. 2407.21465 link
2024-07-29 MaskInversion: Localized Embeddings via Optimization of Explainability Maps Walid Bousselham et.al. 2407.20034 null
2024-07-24 DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation Qian Feng et.al. 2407.17348 null
2024-07-25 LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering Simon Boeder et.al. 2407.17310 null
2024-07-24 OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos Debidatta Dwibedi et.al. 2407.17085 null
2024-07-23 SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation Pengfei Chen et.al. 2407.16682 null
2024-07-24 MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues Liyun Zhang et.al. 2407.16552 null
2024-07-18 Which objects help me to act effectively? Reasoning about physically-grounded affordances Anne Kemmeren et.al. 2407.13811 null
2024-07-18 SegPoint: Segment Any Point Cloud via Large Language Model Shuting He et.al. 2407.13761 null
2024-07-18 Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Xiaoyu Zhu et.al. 2407.13642 null
2024-07-18 Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation Pengfei Wang et.al. 2407.13362 null
2024-07-18 OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping Li Meng et.al. 2407.13175 link
2024-07-17 CerberusDet: Unified Multi-Task Object Detection Irina Tolstykh et.al. 2407.12632 link
2024-07-17 ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference Mengcheng Lan et.al. 2407.12442 null
2024-07-17 VEON: Vocabulary-Enhanced Occupancy Prediction Jilai Zheng et.al. 2407.12294 null
2024-07-18 LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction Penghui Du et.al. 2407.11335 link
2024-07-17 Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion Philipp Allgeuer et.al. 2407.11211 null
2024-07-15 OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer Yu Wang et.al. 2407.10655 link
2024-07-15 Evaluating Model Bias Requires Characterizing its Mistakes Isabela Albuquerque et.al. 2407.10633 null
2024-07-13 DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands Zhengshen Zhang et.al. 2407.09899 null
2024-07-13 Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding Ruihuang Li et.al. 2407.09781 null
2024-07-12 DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training Chen Xin et.al. 2407.09174 link
2024-07-12 Open Vocabulary Multi-Label Video Classification Rohit Gupta et.al. 2407.09073 null
2024-07-12 Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing Jun Zhu et.al. 2407.09053 null
2024-07-12 OVExp: Open Vocabulary Exploration for Object-Oriented Navigation Meng Wei et.al. 2407.09016 null
2024-07-12 Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection Xingyu Peng et.al. 2407.08931 link
2024-07-11 Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation Tong Shao et.al. 2407.08268 link
2024-07-10 OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion Hao Wang et.al. 2407.07844 link
2024-07-10 Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data Motoshige Sato et.al. 2407.07595 null
2024-07-12 Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation Hao Fang et.al. 2407.07427 link
2024-07-09 Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization Jeongseok Hyun et.al. 2407.07024 link
2024-07-09 Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge Sriram Yenamandra et.al. 2407.06939 null
2024-07-09 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Yu-Guan Hsieh et.al. 2407.06723 null
2024-07-07 Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image Pengkun Jiao et.al. 2407.05256 null
2024-07-06 A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation Monika Wysoczańska et.al. 2407.05061 null
2024-07-05 CountGD: Multi-Modal Open-World Counting Niki Amini-Naieni et.al. 2407.04619 null
2024-07-03 A Unified Framework for 3D Scene Understanding Wei Xu et.al. 2407.03263 null
2024-07-02 Open Panoramic Segmentation Junwei Zheng et.al. 2407.02685 link
2024-07-01 PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction Xuan Yu et.al. 2407.01349 null
2024-07-01 Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation Zihan Gao et.al. 2407.01220 null
2024-07-01 Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models Takayuki Nishimura et.al. 2407.00985 null
2024-06-29 When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration Philipp Allgeuer et.al. 2407.00518 null
2024-06-28 PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators Kuo-Hao Zeng et.al. 2406.20083 null
2024-07-01 3D Feature Distillation with Object-Centric Priors Georgios Tziafas et.al. 2406.18742 null
2024-06-26 Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps Dicong Qiu et.al. 2406.18115 null
2024-06-24 High-resolution open-vocabulary object 6D pose estimation Jaime Corsetti et.al. 2406.16384 null
2024-07-01 A Simple Framework for Open-Vocabulary Zero-Shot Segmentation Thomas Stegmüller et.al. 2406.16085 null
2024-06-21 Open-vocabulary Pick and Place via Patch-level Semantic Maps Mingxi Jia et.al. 2406.15677 null
2024-06-21 Open-Vocabulary Temporal Action Localization using Multimodal Guidance Akshita Gupta et.al. 2406.15556 null
2024-06-19 StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images Rushikesh Zawar et.al. 2406.13735 null
2024-06-17 V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results Jiaqi Wang et.al. 2406.11739 null
2024-06-17 Understanding Multi-Granularity for Open-Vocabulary Part Segmentation Jiho Choi et.al. 2406.11384 link
2024-06-16 Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP Shuyang Lin et.al. 2406.10961 null
2024-06-14 Open-Vocabulary Semantic Segmentation with Image Embedding Balancing Xiangheng Shan et.al. 2406.09829 link
2024-06-14 Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting Ce Hao et.al. 2406.09767 null
2024-06-13 ImageNet3D: Towards General-Purpose Object-Level 3D Understanding Wufei Ma et.al. 2406.09613 link
2024-06-21 Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024 Peixi Wu et.al. 2406.09201 null
2024-06-13 Auto-Vocabulary Segmentation for LiDAR Points Weijie Wei et.al. 2406.09126 null
2024-06-13 LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions Rumaisa Azeem et.al. 2406.08824 null
2024-06-12 OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding Yinan Deng et.al. 2406.08009 link
2024-06-12 CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting Sichen Jin et.al. 2406.07923 null
2024-06-11 Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph Sergey Linok et.al. 2406.07113 null
2024-06-10 Open-Vocabulary Part-Based Grasping Tjeard van Oort et.al. 2406.05951 null
2024-06-07 USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation Xiaoqi Wang et.al. 2406.05271 null
2024-06-07 OVMR: Open-Vocabulary Recognition with Multi-Modal References Zehong Ma et.al. 2406.04675 link
2024-06-07 FusionBench: A Comprehensive Benchmark of Deep Model Fusion Anke Tang et.al. 2406.03280 link
2024-06-04 Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation Mohamed El Amine Boudjoghra et.al. 2406.02548 link
2024-06-04 OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding Yanmin Wu et.al. 2406.02058 null
2024-06-04 FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping Yuzhou Ji et.al. 2406.01916 null
2024-06-03 ELSA: Evaluating Localization of Social Activities in Urban Streets Maryam Hosseini et.al. 2406.01551 null
2024-06-03 EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding Thanh-Dat Truong et.al. 2406.01429 null
2024-06-02 Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection Yang Cao et.al. 2406.00830 link
2024-06-01 Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection Jiaming Li et.al. 2406.00510 null
2024-05-31 Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding Xiaolong Sun et.al. 2406.00143 null
2024-05-30 OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation Gonca Yilmaz et.al. 2405.20141 null
2024-05-30 RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection Fangyi Chen et.al. 2405.19854 link
2024-05-29 Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-05-29 Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation Zelin Peng et.al. 2405.18840 null
2024-05-28 OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision Junjie Wang et.al. 2405.17913 link
2024-06-03 EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? Boshen Xu et.al. 2405.17719 link
2024-05-27 GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane Yansong Qu et.al. 2405.17596 null
2024-05-26 Map-based Modular Approach for Zero-shot Embodied Question Answering Koya Sakamoto et.al. 2405.16559 null
2024-05-26 CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection Lin Zhu et.al. 2405.16417 link
2024-05-25 DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution Yuzhong Zhao et.al. 2405.16071 link
2024-05-24 Open-Vocabulary SAM3D: Understand Any 3D Scene Hanchen Tai et.al. 2405.15580 null
2024-05-24 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving Boyi Sun et.al. 2405.15286 link
2024-05-23 TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing Teng Xu et.al. 2405.14455 null
2024-05-23 Tuning-free Universally-Supervised Semantic Segmentation Xiaobo Yang et.al. 2405.14294 null
2024-05-19 Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement Igor Morawski et.al. 2405.11478 null
2024-05-17 Open-Vocabulary Spatio-Temporal Action Detection Tao Wu et.al. 2405.10832 null
2024-05-16 When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models Xianzheng Ma et.al. 2405.10255 link
2024-05-16 SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection Mingxuan Liu et.al. 2405.10053 link
2024-05-15 A Survey On Text-to-3D Contents Generation In The Wild Chenhan Jiang et.al. 2405.09431 null
2024-05-14 Open-Vocabulary Object Detection via Neighboring Region Attention Alignment Sunyuan Qiang et.al. 2405.08593 null
2024-05-13 Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM Xiaoyu Chen et.al. 2405.07840 null
2024-05-13 Constructing a BPE Tokenization DFA Martin Berglund et.al. 2405.07671 null
2024-05-10 Are EEG-to-Text Models Working? Hyejeong Jo et.al. 2405.06459 link
2024-05-09 Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control Gunshi Gupta et.al. 2405.05852 link
2024-05-09 DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation Sitian Shen et.al. 2405.05800 null
2024-05-09 RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation Sourav Garg et.al. 2405.05792 null
2024-05-08 OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies Lingdong Kong et.al. 2405.05259 link
2024-05-08 Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving Lingdong Kong et.al. 2405.05258 link
2024-05-08 DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector Kaiyu Li et.al. 2405.04788 link
2024-05-14 Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting Ola Shorinwa et.al. 2405.04378 null
2024-05-03 DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos Wen-Hsuan Chu et.al. 2405.02280 link
2024-05-03 EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer Hanwen Liu et.al. 2405.02165 null
2024-04-30 One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features Trung Thanh Nguyen et.al. 2404.19542 link
2024-04-30 MoST: Multi-modality Scene Tokenization for Motion Prediction Norman Mu et.al. 2404.19531 null
2024-04-28 Garbage Segmentation and Attribute Analysis by Robotic Dogs Nuo Xu et.al. 2404.18112 null
2024-04-29 MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition Zheng Lian et.al. 2404.17113 link
2024-04-23 DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition Haozhe Cheng et.al. 2404.14890 null
2024-04-19 ECOR: Explainable CLIP for Object Recognition Ali Rasekh et.al. 2404.12839 null
2024-04-18 Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds Oliver Lemke et.al. 2404.12440 null
2024-04-18 The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models Cheng Shi et.al. 2404.11957 link
2024-04-17 OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding Edmond Tong et.al. 2404.11000 null
2024-04-16 Watch Your Step: Optimal Retrieval for Continual Learning at Scale Truman Hickok et.al. 2404.10758 null
2024-04-16 Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V Peiyuan Zhi et.al. 2404.10220 null
2024-04-15 Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels Amaya Dharmasiri et.al. 2404.10146 link
2024-04-15 Evolving Interpretable Visual Classifiers with Large Language Models Mia Chiquier et.al. 2404.09941 null
2024-04-15 kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies Zhongrui Gui et.al. 2404.09447 null
2024-04-14 DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection Lewei Yao et.al. 2404.09216 null
2024-04-12 Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation Yanhao Zheng et.al. 2404.08603 link
2024-04-12 Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation Sina Hajimiri et.al. 2404.08181 link
2024-04-11 Transferable and Principled Efficiency for Open-Vocabulary Segmentation Jingxuan Xu et.al. 2404.07448 link
2024-04-10 O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation Muer Tie et.al. 2404.06836 null
2024-04-09 GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation Mukul Khanna et.al. 2404.06609 null
2024-04-09 Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation Luca Barsellotti et.al. 2404.06542 null
2024-04-10 Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection Ting Lei et.al. 2404.06194 link
2024-04-08 Retrieval-Augmented Open-Vocabulary Object Detection Jooyeon Kim et.al. 2404.05687 link
2024-04-08 MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Kunpeng Song et.al. 2404.05674 link
2024-04-07 Hyperbolic Learning with Synthetic Captions for Open-World Detection Fanjie Kong et.al. 2404.05016 null
2024-04-06 Mixed-Query Transformer: A Unified Image Segmentation Architecture Pei Wang et.al. 2404.04469 null
2024-04-05 Open vocabulary keyword spotting through transfer learning from speech synthesis Kesavaraj V et.al. 2404.03914 null
2024-04-04 OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views Francis Engelmann et.al. 2404.03650 null
2024-04-04 Is CLIP the main roadblock for fine-grained open-world perception? Lorenzo Bianchi et.al. 2404.03539 link
2024-04-04 Learning Transferable Negative Prompts for Out-of-Distribution Detection Tianqi Li et.al. 2404.03248 link
2024-04-04 LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity Walid Bousselham et.al. 2404.03214 link
2024-04-03 ALOHa: A New Measure for Hallucination in Captioning Models Suzanne Petryk et.al. 2404.02904 null
2024-04-03 Low-resource neural machine translation with morphological modeling Antoine Nzeyimana et.al. 2404.02392 link
2024-04-02 Segment Any 3D Object with Language Seungjun Lee et.al. 2404.02157 null
2024-04-03 ViTamin: Designing Scalable Vision Models in the Vision-Language Era Jieneng Chen et.al. 2404.02132 link
2024-04-01 OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation Xiongwei Wu et.al. 2404.01409 null
2024-04-02 Open-Vocabulary Federated Learning with Multimodal Prototyping Huimin Zeng et.al. 2404.01232 link
2024-04-01 GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields Yunsong Wang et.al. 2404.00931 link
2024-04-01 From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models Rongjie Li et.al. 2404.00906 link
2024-03-31 Training-Free Semantic Segmentation via LLM-Supervision Wenfang Sun et.al. 2404.00701 null
2024-03-30 Do Vision-Language Models Understand Compound Nouns? Sonal Kumar et.al. 2404.00419 link
2024-03-30 Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation Yuan Wang et.al. 2404.00262 null
2024-03-29 FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models Barbara Toniella Corradini et.al. 2403.20105 null
2024-03-28 OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation Zhenyu Wang et.al. 2403.19580 link
2024-03-27 Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D Mukund Varma T et.al. 2403.18922 null
2024-03-26 Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation Abdelrhman Werby et.al. 2403.17846 null
2024-03-26 OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation Ganlong Zhao et.al. 2403.17334 null
2024-03-22 Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting Jun Guo et.al. 2403.15624 null
2024-03-21 PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model Zheng Zhang et.al. 2403.14598 link
2024-03-21 Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation Jianeng Wang et.al. 2403.14320 null
2024-03-21 Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models Pablo Marcos-Manchón et.al. 2403.14291 link
2024-03-21 Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection Tim Salzmann et.al. 2403.14270 null
2024-03-20 Learning from Models and Data for Visual Grounding Ruozhen He et.al. 2403.13804 null
2024-03-20 Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation Hugues Thomas et.al. 2403.13777 null
2024-03-20 Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments Djamahl Etchegaray et.al. 2403.13556 link
2024-03-19 AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents Jieming Cui et.al. 2403.12835 null
2024-03-19 DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM Yixuan Wu et.al. 2403.12488 link
2024-03-19 CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation Wenqi Zhu et.al. 2403.12455 link
2024-03-19 VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation Hao Wang et.al. 2403.12415 link
2024-03-19 OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation Junhao Cai et.al. 2403.12396 null
2024-03-18 OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation Haochen Jiang et.al. 2403.11796 null
2024-03-17 TAG: Guidance-free Open-Vocabulary Semantic Segmentation Yasufumi Kawano et.al. 2403.11197 link
2024-03-17 MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation Yasufumi Kawano et.al. 2403.11194 link
2024-03-16 N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields Yash Bhalgat et.al. 2403.10997 null
2024-03-16 Unsupervised Collaborative Metric Learning with Mixed-Scale Groups for General Object Retrieval Shichao Kan et.al. 2403.10798 link
2024-03-15 Generative Region-Language Pretraining for Open-Ended Object Detection Chuang Lin et.al. 2403.10191 link
2024-03-15 Do Visual-Language Maps Capture Latent Semantics? Matti Pekkanen et.al. 2403.10117 null
2024-03-14 GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping Yuhang Zheng et.al. 2403.09637 link
2024-03-14 PosSAM: Panoptic Open-vocabulary Segment Anything Vibashan VS et.al. 2403.09620 link
2024-03-14 Renovating Names in Open-Vocabulary Segmentation Benchmarks Haiwen Huang et.al. 2403.09593 null
2024-03-14 Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization Zhao Wang et.al. 2403.09433 null
2024-03-14 OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments Yinan Deng et.al. 2403.09412 link
2024-03-14 Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation Daniel Honerkamp et.al. 2403.08605 link
2024-03-12 Learning Generalizable Feature Fields for Mobile Manipulation Ri-Zhao Qiu et.al. 2403.07563 null
2024-03-12 Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss Xuhua Ren et.al. 2403.07518 null
2024-03-11 Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head Tiancheng Zhao et.al. 2403.06892 link
2024-03-02 A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition Tyler Benster et.al. 2403.05583 link
2024-03-14 OmniCount: Multi-label Object Counting with Semantic-Geometric Priors Anindya Mondal et.al. 2403.05435 null
2024-03-08 Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery Xavier Bou et.al. 2403.05381 link
2024-03-07 Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities Kaiwen Cai et.al. 2403.04908 link
2024-03-06 Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery Wei Zhang et.al. 2403.03790 null
2024-03-06 Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision Yajie Liu et.al. 2403.03707 null
2024-03-05 MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting Fangchen Liu et.al. 2403.03174 null
2024-03-03 Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition Kun-Yu Lin et.al. 2403.01560 link
2024-03-10 Benchmarking Segmentation Models with Mask-Preserved Attribute Editing Zijin Yin et.al. 2403.01231 link
2024-03-01 Multi-modal Attribute Prompting for Vision-Language Models Xin Liu et.al. 2403.00219 null
2024-02-29 DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments Ji Ma et.al. 2402.19007 null
2024-02-29 MOSAIC: A Modular System for Assistive and Interactive Cooking Huaxiaoyue Wang et.al. 2402.18796 null
2024-02-26 CARTE: pretraining and transfer for tabular learning Myung Jun Kim et.al. 2402.16785 link
2024-02-23 OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding Francis Engelmann et.al. 2402.15321 null
2024-02-21 Real-time 3D-aware Portrait Editing from a Single Image Qingyan Bai et.al. 2402.14000 link
2024-02-21 Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation Jialei Chen et.al. 2402.13697 null
2024-02-20 A Touch, Vision, and Language Dataset for Multimodal Alignment Letian Fu et.al. 2402.13232 link
2024-02-19 Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships Sebastian Koch et.al. 2402.12259 link
2024-02-18 Verifiably Following Complex Robot Instructions with Foundation Models Benedict Quartey et.al. 2402.11498 null
2024-02-15 Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment Angelos Zavras et.al. 2402.09816 null
2024-02-14 Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision Zhaoqing Wang et.al. 2402.08960 link
2024-02-20 InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Chengjian Feng et.al. 2402.05937 null
2024-02-15 Open-Vocabulary Calibration for Vision-Language Models Shuoyuan Wang et.al. 2402.04655 link
2024-02-07 OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding Guibiao Liao et.al. 2402.04648 null
2024-02-07 LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors Sheng Jin et.al. 2402.04630 null
2024-02-06 Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience Xilin Jiang et.al. 2402.03710 null
2024-02-05 FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition Xiaohu Huang et.al. 2402.03241 null
2024-02-05 Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector Yuqian Fu et.al. 2402.03094 link
2024-02-02 YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng et.al. 2401.17270 link
2024-01-29 Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors Shiyin Dong et.al. 2401.16459 null
2024-01-29 Spatial-Aware Latent Initialization for Controllable Image Generation Wenqiang Sun et.al. 2401.16157 null
2024-01-29 LCVO: An Efficient Pretraining-Free Framework for Visual Question Answering Grounding Yuhan Chen et.al. 2401.15842 null
2024-01-25 Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Tianhe Ren et.al. 2401.14159 link
2024-01-25 True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning Weihao Tan et.al. 2401.14151 link
2024-01-22 Exploring Simple Open-Vocabulary Semantic Segmentation Zihang Lai et.al. 2401.12217 link
2024-01-22 OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics Peiqi Liu et.al. 2401.12202 link
2024-01-22 HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum) Volodymyr Kuzma et.al. 2401.12048 null
2024-01-31 UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation Qingdong He et.al. 2401.11395 link
2024-01-18 OMG-Seg: Is One Model Good Enough For All Segmentation? Xiangtai Li et.al. 2401.10229 link
2024-01-18 Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation Songhe Deng et.al. 2401.09883 link
2024-01-18 Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation Zesen Cheng et.al. 2401.09732 link
2024-01-17 POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images Antonin Vobecky et.al. 2401.09413 null
2024-01-17 OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality Aditya Sharma et.al. 2401.08973 null
2024-01-16 Robotic Imitation of Human Actions Josua Spisak et.al. 2401.08381 null

Image Captioning

Publish Date Title Authors PDF Code
2024-08-19 The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks Niyar R Barman et.al. 2408.10446 null
2024-08-16 An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation Peiming Guo et.al. 2408.08650 null
2024-08-13 PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology Xiaomin Wu et.al. 2408.07037 null
2024-08-12 Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers Joshua Nathaniel Williams et.al. 2408.06502 null
2024-08-09 Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy and Novel Ensemble Method Uri Berger et.al. 2408.04909 null
2024-08-09 FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers Joshua Nathaniel Williams et.al. 2408.04816 link
2024-08-08 Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs Aliki Anagnostopoulou et.al. 2408.04331 null
2024-08-06 Multitask and Multimodal Neural Tuning for Large Models Hao Sun et.al. 2408.03001 null
2024-08-05 Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection Sajal Aggarwal et.al. 2408.02595 null
2024-08-04 Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI Robert Wolfe et.al. 2408.01959 null
2024-08-03 A Novel Evaluation Framework for Image2Text Generation Jia-Hong Huang et.al. 2408.01723 null
2024-08-02 The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models Simone Caldarella et.al. 2408.01228 null
2024-07-30 AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning Maisha Binte Rashid et.al. 2407.21174 null
2024-07-29 BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues Sara Sarto et.al. 2407.20341 link
2024-07-29 VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks Juhwan Choi et.al. 2407.19795 null
2024-07-26 SWIFT: Semantic Watermarking for Image Forgery Thwarting Gautier Evennou et.al. 2407.18995 null
2024-07-26 HICEScore: A Hierarchical Metric for Image Captioning Evaluation Zequn Zeng et.al. 2407.18589 null
2024-07-26 SPOLRE: Semantic Preserving Object Layout Reconstruction for Image Captioning System Testing Yi Liu et.al. 2407.18512 null
2024-07-23 VisMin: Visual Minimal-Change Understanding Rabiul Awal et.al. 2407.16772 null
2024-07-23 Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models Aristeidis Panos et.al. 2407.16526 null
2024-07-23 Harmonizing Visual Text Comprehension and Generation Zhen Zhao et.al. 2407.16364 null
2024-07-26 Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning Xinwei Liu et.al. 2407.16307 link
2024-07-28 DiffX: Guide Your Layout to Cross-Modal Generative Modeling Zeyu Wang et.al. 2407.15488 link
2024-07-21 VideoGameBunny: Towards vision assistants for video games Mohammad Reza Taesiri et.al. 2407.15295 null
2024-07-20 Downstream-Pretext Domain Knowledge Traceback for Active Learning Beichen Zhang et.al. 2407.14720 null
2024-07-19 On Pre-training of Multimodal Language Models Customized for Chart Understanding Wan-Cyuan Fan et.al. 2407.14506 null
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-18 Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Xiaoyu Zhu et.al. 2407.13642 null
2024-07-17 LookupViT: Compressing visual information to a limited number of tokens Rajat Koner et.al. 2407.12753 null
2024-07-16 Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights Shunqi Mao et.al. 2407.11449 link
2024-07-17 CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation Kalliopi Basioti et.al. 2407.11393 link
2024-07-15 Can Textual Semantics Mitigate Sounding Object Segmentation Preference? Yaoting Wang et.al. 2407.10947 link
2024-07-12 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Jeongho Kim et.al. 2407.09012 null
2024-07-12 15M Multimodal Facial Image-Text Dataset Dawei Dai et.al. 2407.08515 null
2024-07-17 Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation Seonghoon Yu et.al. 2407.07412 link
2024-07-08 Leveraging image captions for selective whole slide image annotation Jingna Qiu et.al. 2407.06363 link
2024-07-08 Pseudo-triplet Guided Few-shot Composed Image Retrieval Bohan Hou et.al. 2407.06001 null
2024-07-08 Negative Results of Image Processing for Identifying Duplicate Questions on Stack Overflow Faiz Ahmed et.al. 2407.05523 null
2024-07-11 Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes Yusuke Hirota et.al. 2407.03623 null
2024-07-02 Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness Khyathi Raghavi Chandu et.al. 2407.01942 null
2024-07-01 Semantic Compositions Enhance Vision-Language Contrastive Learning Maxwell Aladago et.al. 2407.01408 null
2024-06-28 Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review Moseli Mots'oehli et.al. 2407.00252 null
2024-06-28 PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration Yuxuan Sun et.al. 2407.00203 null
2024-06-28 MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment Jihao Liu et.al. 2406.19736 link
2024-06-27 RAVEN: Multitask Retrieval Augmented Vision-Language Learning Varun Nagaraj Rao et.al. 2406.19150 null
2024-07-02 Revisiting Backdoor Attacks against Large Vision-Language Models Siyuan Liang et.al. 2406.18844 null
2024-06-26 MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data William Berman et.al. 2406.18790 null
2024-06-24 Enhancing Scientific Figure Captioning Through Cross-modal Learning Mateo Alejandro Rojas et.al. 2406.17047 null
2024-07-01 A Simple Framework for Open-Vocabulary Zero-Shot Segmentation Thomas Stegmüller et.al. 2406.16085 null
2024-06-22 Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification Honori Udo et.al. 2406.15816 null
2024-06-20 Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? Gregor Geigle et.al. 2406.14492 null
2024-06-20 From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment Yusuke Hirota et.al. 2406.13912 null
2024-06-19 Reinforcing Pre-trained Models Using Counterfactual Images Xiang Li et.al. 2406.13316 null
2024-06-18 Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? Mingqian Feng et.al. 2406.12663 null
2024-06-18 VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Xiang Li et.al. 2406.12384 link
2024-06-17 Composing Object Relations and Attributes for Image-Text Matching Khoi Pham et.al. 2406.11820 null
2024-06-17 LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning Dantong Niu et.al. 2406.11815 null
2024-06-17 MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models Shengkang Wang et.al. 2406.11288 link
2024-06-14 From Pixels to Prose: A Large Dataset of Dense Image Captions Vasu Singla et.al. 2406.10328 null
2024-06-14 OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst Jingtao Cao et.al. 2406.09779 null
2024-06-13 ImageNet3D: Towards General-Purpose Object-Level 3D Understanding Wufei Ma et.al. 2406.09613 link
2024-06-13 Yo'LLaVA: Your Personalized Language and Vision Assistant Thao Nguyen et.al. 2406.09400 null
2024-06-13 Towards Vision-Language Geo-Foundation Model: A Survey Yue Zhou et.al. 2406.09385 link
2024-06-11 Translating speech with just images Dan Oneata et.al. 2406.07133 link
2024-06-11 UVIS: Unsupervised Video Instance Segmentation Shuaiyi Huang et.al. 2406.06908 null
2024-06-10 TRINS: Towards Multimodal Language Models that Can Read Ruiyi Zhang et.al. 2406.06730 null
2024-06-10 VCR: Visual Caption Restoration Tianyu Zhang et.al. 2406.06462 link
2024-06-10 FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model Yebin Lee et.al. 2406.06004 link
2024-06-09 Stealthy Targeted Backdoor Attacks against Image Captioning Wenshu Fan et.al. 2406.05874 link
2024-06-07 Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization Huanhuan Ma et.al. 2406.04756 null
2024-06-06 Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Wenyan Li et.al. 2406.02265 link
2024-06-03 Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model Kezhen Chen et.al. 2406.00977 link
2024-06-01 DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration Nhi Ngoc-Yen Nguyen et.al. 2406.00391 null
2024-06-01 Image Captioning via Dynamic Path Customization Yiwei Ma et.al. 2406.00334 link
2024-05-30 OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation Gonca Yilmaz et.al. 2405.20141 null
2024-05-30 RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection Fangyi Chen et.al. 2405.19854 link
2024-05-29 Multi-Modal Generative Embedding Model Feipeng Ma et.al. 2405.19333 null
2024-05-29 MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification Laura Fieback et.al. 2405.19186 null
2024-05-31 Benchmarking and Improving Detail Image Caption Hongyuan Dong et.al. 2405.19092 link
2024-05-28 Text-only Synthesis for Image Captioning Qing Zhou et.al. 2405.18258 null
2024-05-24 How Culturally Aware are Vision-Language Models? Olena Burda-Lassen et.al. 2405.17475 null
2024-05-25 Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models Shuaishuai Guo et.al. 2405.16011 null
2024-05-23 LG-VQ: Language-Guided Codebook Learning Guotao Liang et.al. 2405.14206 null
2024-05-23 A Survey on Vision-Language-Action Models for Embodied AI Yueen Ma et.al. 2405.14093 null
2024-05-22 CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models Guangzhi Sun et.al. 2405.13684 null
2024-05-25 Class-Conditional self-reward mechanism for improved Text-to-Image models Safouane El Ghazouali et.al. 2405.13473 link
2024-05-21 Towards Retrieval-Augmented Architectures for Image Captioning Sara Sarto et.al. 2405.13127 null
2024-05-16 UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models Sahel Sharifymoghaddam et.al. 2405.10311 null
2024-05-16 ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset Johannes Rückert et.al. 2405.10004 link
2024-05-16 Chameleon: Mixed-Modal Early-Fusion Foundation Models Chameleon Team et.al. 2405.09818 null
2024-05-14 Contextual Emotion Recognition using Large Vision Language Models Yasaman Etesam et.al. 2405.08992 null
2024-05-13 Boostlet.js: Image processing plugins for the web via JavaScript injection Edward Gaibor et.al. 2405.07868 link
2024-05-09 Using Machine Translation to Augment Multilingual Classification Adam King et.al. 2405.05478 null
2024-05-03 LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model Yulin Luo et.al. 2405.02363 link
2024-05-02 Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores Kiyoon Jeong et.al. 2405.01028 link
2024-05-01 Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis Prateek Verma et.al. 2405.00876 null
2024-05-01 The Pyramid of Captions Delong Chen et.al. 2405.00485 null
2024-04-29 Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models Hongyi Zhu et.al. 2404.18746 null
2024-04-28 Semi-supervised Text-based Person Search Daming Gao et.al. 2404.18106 null
2024-04-28 Compressed Image Captioning using CNN-based Encoder-Decoder Framework Md Alif Rahman Ridoy et.al. 2404.18062 null
2024-04-26 Learning text-to-video retrieval from image captioning Lucas Ventura et.al. 2404.17498 null
2024-04-25 OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search Prabhat Agarwal et.al. 2404.16260 link
2024-04-24 FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication Eric Slyman et.al. 2404.16123 null
2024-04-23 Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval Young Kyun Jang et.al. 2404.15516 null
2024-04-23 GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots Simranjit Singh et.al. 2404.15500 null
2024-04-12 FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning Duy Phuong Nguyen et.al. 2404.15182 null
2024-04-21 Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers Georgios Pantazopoulos et.al. 2404.13594 link
2024-04-19 Data Alignment for Zero-Shot Concept Generation in Dermatology AI Soham Gadgil et.al. 2404.13043 null
2024-04-19 MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering Avinash Anand et.al. 2404.12926 null
2024-04-19 The Solution for the CVPR2024 NICE Image Captioning Challenge Longfei Huang et.al. 2404.12739 null
2024-04-16 LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Yuchi Wang et.al. 2404.10763 link
2024-04-15 ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis Aashish Anantha Ramakrishnan et.al. 2404.10141 link
2024-04-15 Bridging Vision and Language Spaces with Assignment Prediction Jungin Park et.al. 2404.09632 link
2024-04-13 On Speculative Decoding for Multimodal Large Language Models Mukul Gagrani et.al. 2404.08856 null
2024-04-12 Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts Övgü Özdemir et.al. 2404.08589 link
2024-04-11 View Selection for 3D Captioning via Diffusion Ranking Tiange Luo et.al. 2404.07984 null
2024-04-11 Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Haotian Zhang et.al. 2404.07973 null
2024-04-09 Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation Luca Barsellotti et.al. 2404.06542 null
2024-04-06 Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation Danpei Zhao et.al. 2404.04608 null
2024-04-04 CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Dongzhi Jiang et.al. 2404.03653 link

(back to top)
