CVPR 2021 Papers and Open-Source Projects Collection (Papers with Code)

机器学习AI算法工程 | 2021-06-03 10:11

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

  • Paper: https://arxiv.org/abs/2012.09760

  • Code: https://github.com/microsoft/MeshTransformer

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

  • Paper: https://arxiv.org/abs/2101.06184

  • Code: https://github.com/tobyperrett/trx

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

  • Paper: https://arxiv.org/abs/2103.16110

  • Code: https://github.com/mczhuge/Kaleido-BERT

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

  • Paper: https://arxiv.org/abs/2104.13682

  • Code: None

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

  • Paper: https://arxiv.org/abs/2104.09224

  • Code: https://github.com/autonomousvision/transfuser

6. Pose Recognition with Cascade Transformers

  • Paper: https://arxiv.org/abs/2104.06976

  • Code: https://github.com/mlpc-ucsd/PRTR

7. Variational Transformer Networks for Layout Generation

  • Paper: https://arxiv.org/abs/2104.02416

  • Code: None

8. LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/

  • Paper: https://arxiv.org/abs/2104.00680

  • Code: https://github.com/zju3dv/LoFTR

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/abs/2012.15840

  • Code: https://github.com/fudan-zvg/SETR

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

  • Paper: https://arxiv.org/abs/2103.16553

  • Code: None

11. Transformer Tracking

  • Paper: https://arxiv.org/abs/2103.15436

  • Code: https://github.com/chenxin-dlut/TransT

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

  • Paper(Oral): None

  • Code: https://github.com/dingmyu/HR-NAS

13. MIST: Multiple Instance Spatial Transformer

  • Paper: https://arxiv.org/abs/1811.10725

  • Code: None

14. Multimodal Motion Prediction with Stacked Transformers

  • Paper: https://arxiv.org/abs/2103.11624

  • Code: https://decisionforce.github.io/mmTransformer

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

  • Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning

  • Code: https://github.com/amzn/image-to-recipe-transformers

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/abs/2103.11681

  • Code: https://github.com/594422814/TransformerTrack

17. Pre-Trained Image Processing Transformer

  • Paper: https://arxiv.org/abs/2012.00364

  • Code: None

18. End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.14503

  • Code: https://github.com/Epiphqny/VisTR

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.09094

  • Code: https://github.com/dddzg/up-detr

20. End-to-End Human Object Interaction Detection with HOI Transformer

  • Paper: https://arxiv.org/abs/2103.04503

  • Code: https://github.com/bbepoch/HoiTransformer

21. Transformer Interpretability Beyond Attention Visualization

  • Paper: https://arxiv.org/abs/2012.09838

  • Code: https://github.com/hila-chefer/Transformer-Explainability

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

  • Paper: None

  • Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None

  • Code: None

24. Line Segment Detection Using Transformers without Edges

  • Paper(Oral): https://arxiv.org/abs/2101.01909

  • Code: None

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

  • Code: None

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

  • Paper(Oral): https://arxiv.org/abs/2101.08833

  • Code: https://github.com/dukebw/SSTVOS

27. Facial Action Unit Detection With Transformers

  • Paper: None

  • Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

  • Paper: None

  • Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

  • Paper: None

  • Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

  • Paper: https://arxiv.org/abs/2012.05292

  • Code: None

31. Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None

  • Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

  • Paper: None

  • Code: None

33. Taming Transformers for High-Resolution Image Synthesis

  • Homepage: https://compvis.github.io/taming-transformers/

  • Paper(Oral): https://arxiv.org/abs/2012.09841

  • Code: https://github.com/CompVis/taming-transformers

34. Self-Supervised Video Hashing via Bidirectional Transformers

  • Paper: None

  • Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

  • Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf

  • Code: None

36. Gaussian Context Transformer

  • Paper: None

  • Code: None

37. General Multi-Label Image Classification With Transformers

  • Paper: https://arxiv.org/abs/2011.14027

  • Code: None

38. Bottleneck Transformers for Visual Recognition

  • Paper: https://arxiv.org/abs/2101.11605

  • Code: None

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

  • Paper(Oral): https://arxiv.org/abs/2011.13922

  • Code: https://github.com/YicongHong/Recurrent-VLN-BERT

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

  • Paper(Oral): https://arxiv.org/abs/2102.06183

  • Code: https://github.com/jayleicn/ClipBERT

41. Self-attention based Text Knowledge Mining for Text Detection

  • Paper: None

  • Code: https://github.com/CVI-SZU/STKM

42. SSAN: Separable Self-Attention Network for Video Representation Learning

  • Paper: None

  • Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

  • Paper(Oral): https://arxiv.org/abs/2103.12731

  • Code: None


Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

  • Paper(Oral): https://arxiv.org/abs/2105.10305

  • Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet


Object Detection

2D Object Detection

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

  • Paper: https://arxiv.org/abs/2105.12971

  • Code: None

PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

  • Paper: https://arxiv.org/abs/2105.12990

  • Code: None

Domain-Specific Suppression for Adaptive Object Detection

  • Paper: https://arxiv.org/abs/2105.03570

  • Code: None

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

  • Paper: https://arxiv.org/abs/2104.06936

  • Code: None

Multi-Scale Aligned Distillation for Low-Resolution Detection

  • Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf

  • Code: https://github.com/Jia-Research-Lab/MSAD

Adaptive Class Suppression Loss for Long-Tail Object Detection

  • Paper: https://arxiv.org/abs/2104.00885

  • Code: https://github.com/CASIA-IVA-Lab/ACSL

VarifocalNet: An IoU-aware Dense Object Detector

  • Paper(Oral): https://arxiv.org/abs/2008.13367

  • Code: https://github.com/hyz-xmaster/VarifocalNet

Scale-aware Automatic Augmentation for Object Detection

  • Paper: https://arxiv.org/abs/2103.17220

  • Code: https://github.com/Jia-Research-Lab/SA-AutoAug

OTA: Optimal Transport Assignment for Object Detection

  • Paper: https://arxiv.org/abs/2103.14259

  • Code: https://github.com/Megvii-BaseDetection/OTA

Distilling Object Detectors via Decoupled Features

  • Paper: https://arxiv.org/abs/2103.14475

  • Code: https://github.com/ggjy/DeFeat.pytorch

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

  • Paper: https://arxiv.org/abs/2011.12450

  • Code: https://github.com/PeizeSun/SparseR-CNN

Positive-Unlabeled Data Purification in the Wild for Object Detection

  • Paper: None

  • Code: None

Instance Localization for Self-supervised Detection Pretraining

  • Paper: https://arxiv.org/abs/2102.08318

  • Code: https://github.com/limbo0000/InstanceLoc

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

  • Paper: https://arxiv.org/abs/2103.04224

  • Code: None

End-to-End Object Detection with Fully Convolutional Network

  • Paper: https://arxiv.org/abs/2012.03544

  • Code: https://github.com/Megvii-BaseDetection/DeFCN

Robust and Accurate Object Detection via Adversarial Learning

  • Paper: https://arxiv.org/abs/2103.13886

  • Code: None

I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

  • Paper: https://arxiv.org/abs/2103.13757

  • Code: None

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

  • Paper: https://arxiv.org/abs/2103.11402

  • Code: None

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

  • Paper: https://arxiv.org/abs/2103.04507

  • Code: https://github.com/VDIGPKU/OPANAS

YOLOF: You Only Look One-level Feature

  • Paper: https://arxiv.org/abs/2103.09460

  • Code: https://github.com/megvii-model/YOLOF

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.09094

  • Code: https://github.com/dddzg/up-detr

General Instance Distillation for Object Detection

  • Paper: https://arxiv.org/abs/2103.02340

  • Code: None

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

  • Homepage: http://rl.uni-freiburg.de/research/multimodal-distill

  • Paper: https://arxiv.org/abs/2103.01353

  • Code: http://rl.uni-freiburg.de/research/multimodal-distill

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

  • Paper: https://arxiv.org/abs/2011.12885

  • Code: https://github.com/implus/GFocalV2

Multiple Instance Active Learning for Object Detection

  • Paper: https://github.com/yuantn/MIAL/raw/master/paper.pdf

  • Code: https://github.com/yuantn/MIAL

Towards Open World Object Detection

  • Paper(Oral): https://arxiv.org/abs/2103.02603

  • Code: https://github.com/JosephKJ/OWOD

Few-Shot Object Detection

Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None

  • Code: None

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

  • Paper: https://arxiv.org/abs/2103.17115

  • Code: https://github.com/hzhupku/DCNet

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

  • Paper: https://arxiv.org/abs/2103.01903

  • Code: None

Few-Shot Object Detection via Contrastive Proposal Encoding

  • Paper: https://arxiv.org/abs/2103.05950

  • Code: https://github.com/MegviiDetection/FSCE

Rotated Object Detection

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

  • Paper: https://arxiv.org/abs/2103.07733

  • Code: https://github.com/csuhan/ReDet


Object Tracking

Single-Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

  • Paper: https://arxiv.org/abs/2104.14545

  • Code: https://github.com/researchmm/LightTrack

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

  • Homepage: https://sites.google.com/view/langtrackbenchmark/

  • Paper: https://arxiv.org/abs/2103.16746

  • Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit

  • Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

  • Paper: https://arxiv.org/abs/2103.14938

  • Code: https://github.com/VISION-SJTU/IoUattack

Graph Attention Tracking

  • Paper: https://arxiv.org/abs/2011.11204

  • Code: https://github.com/ohhhyeahhh/SiamGAT

Rotation Equivariant Siamese Networks for Tracking

  • Paper: https://arxiv.org/abs/2012.13078

  • Code: None

Track to Detect and Segment: An Online Multi-Object Tracker

  • Homepage: https://jialianwu.com/projects/TraDeS.html

  • Paper: https://arxiv.org/abs/2103.08808

  • Code: https://github.com/JialianW/TraDeS

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/abs/2103.11681

  • Code: https://github.com/594422814/TransformerTrack

Transformer Tracking

  • Paper: https://arxiv.org/abs/2103.15436

  • Code: https://github.com/chenxin-dlut/TransT

Multi-Object Tracking

Multiple Object Tracking with Correlation Learning

  • Paper: https://arxiv.org/abs/2104.03541

  • Code: None

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

  • Paper: https://arxiv.org/abs/2012.02337

  • Code: None

Learning a Proposal Classifier for Multiple Object Tracking

  • Paper: https://arxiv.org/abs/2103.07889

  • Code: https://github.com/daip13/LPC_MOT.git

Track to Detect and Segment: An Online Multi-Object Tracker

  • Homepage: https://jialianwu.com/projects/TraDeS.html

  • Paper: https://arxiv.org/abs/2103.08808

  • Code: https://github.com/JialianW/TraDeS


Semantic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.05258

  • Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab

  • Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Rethinking BiSeNet For Real-time Semantic Segmentation

  • Paper: https://arxiv.org/abs/2104.13188

  • Code: https://github.com/MichaelFan01/STDC-Seg

Progressive Semantic Segmentation

  • Paper: https://arxiv.org/abs/2104.03778

  • Code: https://github.com/VinAIResearch/MagNet

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/abs/2012.15840

  • Code: https://github.com/fudan-zvg/SETR

Bidirectional Projection Network for Cross Dimension Scene Understanding

  • Paper(Oral): https://arxiv.org/abs/2103.14326

  • Code: https://github.com/wbhu/BPNet

Cross-Dataset Collaborative Learning for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.11351

  • Code: None

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

  • Paper: https://arxiv.org/abs/2103.06342

  • Code: None

Capturing Omni-Range Context for Omnidirectional Segmentation

  • Paper: https://arxiv.org/abs/2103.05687

  • Code: None

Learning Statistical Texture for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04133

  • Code: None

PLOP: Learning without Forgetting for Continual Semantic Segmentation

  • Paper: https://arxiv.org/abs/2011.11390

  • Code: None

Weakly Supervised Semantic Segmentation

Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

  • Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/

  • Paper: https://arxiv.org/abs/2104.00905

  • Code: None

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.14581

  • Code: None

BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

  • Paper: https://arxiv.org/abs/2103.08907

  • Code: None

Semi-Supervised Semantic Segmentation

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04705

Domain Adaptive Semantic Segmentation

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

  • Paper: https://arxiv.org/abs/2105.00097

  • Code: https://github.com/visinf/da-sac

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

  • Paper: https://arxiv.org/abs/2103.15597

  • Code: https://github.com/shachoi/RobustNet

Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

  • Paper: https://arxiv.org/abs/2103.13041

  • Code: None

MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.05254

  • Code: None

Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04717

  • Code: None

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

  • Paper: https://arxiv.org/abs/2101.10979

  • Code: https://github.com/microsoft/ProDA

Video Semantic Segmentation

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

  • Homepage: https://www.vspwdataset.com/

  • Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf

  • GitHub: https://github.com/sssdddwww2/vspw_dataset_download


Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

  • Paper: https://arxiv.org/abs/2011.09876

  • Code: https://github.com/aliyun/DCT-Mask

Incremental Few-Shot Instance Segmentation

  • Paper: https://arxiv.org/abs/2105.05312

  • Code: https://github.com/danganea/iMTFA

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

  • Paper: https://arxiv.org/abs/2105.03186

  • Code: None

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

  • Paper: https://arxiv.org/abs/2104.08569

  • Code: https://github.com/zhanggang001/RefineMask/

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

  • Paper: https://arxiv.org/abs/2104.05239

  • Code: https://github.com/tinyalpha/BPR

Multi-Scale Aligned Distillation for Low-Resolution Detection

  • Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf

  • Code: https://github.com/Jia-Research-Lab/MSAD

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

  • Homepage: https://bowenc0221.github.io/boundary-iou/

  • Paper: https://arxiv.org/abs/2103.16562

  • Code: https://github.com/bowenc0221/boundary-iou-api

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

  • Paper: https://arxiv.org/abs/2103.12340

  • Code: https://github.com/lkeab/BCNet

Zero-shot instance segmentation (Not Sure)

  • Paper: None

  • Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

  • Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm

  • Code: https://github.com/MinghanLi/STMask

End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.14503

  • Code: https://github.com/Epiphqny/VisTR


Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

  • Homepage: https://cv.snu.ac.kr/research/EOPSN/

  • Paper: https://arxiv.org/abs/2105.08336

  • Code: https://github.com/jd730/EOPSN

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

  • Code: None

Panoptic Segmentation Forecasting

  • Paper: https://arxiv.org/abs/2104.03962

  • Code: https://github.com/nianticlabs/panoptic-forecasting

Fully Convolutional Networks for Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.00720

  • Code: https://github.com/yanwei-li/PanopticFCN

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2103.02584

  • Code: None


Medical Image Segmentation

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

  • Paper: https://arxiv.org/abs/2103.06030

  • Code: https://github.com/liuquande/FedDG-ELCFS

3D Medical Image Segmentation

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

  • Paper(Oral): https://arxiv.org/abs/2103.15954

  • Code: None


Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

  • Paper: https://arxiv.org/abs/2104.10442

  • Code: None


Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

  • Paper: https://arxiv.org/abs/2103.06495

  • Code: https://github.com/FangShancheng/ABINet


Super-Resolution

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

  • Homepage: http://mepro.bjtu.edu.cn/resource.html

  • Paper: https://arxiv.org/abs/2104.06174

  • Code: None

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

  • Paper: https://arxiv.org/abs/2103.04039

  • Code: https://github.com/Xiangtaokong/ClassSR

AdderSR: Towards Energy Efficient Image Super-Resolution

  • Paper: https://arxiv.org/abs/2009.08891

  • Code: None


Dehazing

Contrastive Learning for Compact Single Image Dehazing

  • Paper: https://arxiv.org/abs/2104.09367

  • Code: https://github.com/GlassyWu/AECR-Net

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

  • Paper: None

  • Code: https://github.com/CS-GangXu/TMNet


Image Restoration

Multi-Stage Progressive Image Restoration

  • Paper: https://arxiv.org/abs/2102.02808

  • Code: https://github.com/swz30/MPRNet


Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

  • Paper: https://arxiv.org/abs/2105.02201

  • Code: https://github.com/KumapowerLIU/PD-GAN

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

  • Homepage: https://yzhouas.github.io/projects/TransFill/index.html

  • Paper: https://arxiv.org/abs/2103.15982

  • Code: None


Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: https://arxiv.org/abs/2104.14754

  • Code: https://github.com/naver-ai/StyleMapGAN

  • Demo Video: https://youtu.be/qCapNyRA_Ng

High-Fidelity and Arbitrary Face Editing

  • Paper: https://arxiv.org/abs/2103.15814

  • Code: None

Anycost GANs for Interactive Image Synthesis and Editing

  • Paper: https://arxiv.org/abs/2103.03243

  • Code: https://github.com/mit-han-lab/anycost-gan

PISE: Person Image Synthesis and Editing with Decoupled GAN

  • Paper: https://arxiv.org/abs/2103.04023

  • Code: https://github.com/Zhangjinso/PISE

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

  • Paper: http://raywzy.com/

  • Code: http://raywzy.com/


Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

  • Paper: https://arxiv.org/abs/2105.03236

  • Code: None


Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/

  • Paper: https://arxiv.org/abs/2104.00680

  • Code: https://github.com/zju3dv/LoFTR

Convolutional Hough Matching Networks

  • Homepage: http://cvlab.postech.ac.kr/research/CHM/

  • Paper(Oral): https://arxiv.org/abs/2103.16831

  • Code: None


Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

  • Paper: https://arxiv.org/abs/2103.15149

  • Code: https://github.com/julia0607/Wide-Range-Image-Blending


Datasets

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

  • Paper: https://arxiv.org/abs/2105.09188

  • Code: https://github.com/csjliang/LPTN

  • Dataset: https://github.com/csjliang/LPTN

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

  • Paper: https://arxiv.org/abs/2105.02440

  • Code: https://github.com/VisDrone/DroneCrowd

  • Dataset: https://github.com/VisDrone/DroneCrowd

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

  • Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/

  • Paper(Oral): https://arxiv.org/abs/2104.12690

  • Code: https://github.com/fidler-lab/efficient-annotation-cookbook

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.05258

  • Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab

  • Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Learning To Count Everything

  • Paper: https://arxiv.org/abs/2104.08391

  • Code: https://github.com/cvlab-stonybrook/LearningToCountEverything

  • Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything

Semantic Image Matting

  • Paper: https://arxiv.org/abs/2104.08201

  • Code: https://github.com/nowsyn/SIM

  • Dataset: https://github.com/nowsyn/SIM

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

  • Homepage: http://mepro.bjtu.edu.cn/resource.html

  • Paper: https://arxiv.org/abs/2104.06174

  • Code: None

Visual Semantic Role Labeling for Video Understanding

  • Homepage: https://vidsitu.org/

  • Paper: https://arxiv.org/abs/2104.00990

  • Code: https://github.com/TheShadow29/VidSitu

  • Dataset: https://github.com/TheShadow29/VidSitu

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

  • Homepage: https://www.vspwdataset.com/

  • Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf

  • GitHub: https://github.com/sssdddwww2/vspw_dataset_download

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

  • Homepage: https://vap.aau.dk/sewer-ml/

  • Paper: https://arxiv.org/abs/2103.10895

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

  • Paper: https://arxiv.org/abs/2103.03375

  • Dataset: None

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

  • Homepage: https://github.com/QingyongHu/SensatUrban

  • Paper: http://arxiv.org/abs/2009.03137

  • Code: https://github.com/QingyongHu/SensatUrban

  • Dataset: https://github.com/QingyongHu/SensatUrban

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

  • Paper(Oral): https://arxiv.org/abs/2103.01520

  • Code: https://github.com/Hzzone/MTLFace

  • Dataset: https://github.com/Hzzone/MTLFace

Depth from Camera Motion and Object Detection

  • Paper: https://arxiv.org/abs/2103.01468

  • Code: https://github.com/griffbr/ODMD

  • Dataset: https://github.com/griffbr/ODMD

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

  • Paper: https://arxiv.org/abs/2012.02206

  • Code: https://github.com/daveredrum/Scan2Cap

  • Dataset: https://github.com/daveredrum/ScanRefer

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

  • Paper: https://arxiv.org/abs/2103.01353

  • Code: http://rl.uni-freiburg.de/research/multimodal-distill

  • Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

