Visual Transformer
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
Paper: https://arxiv.org/abs/2012.09760
Code: https://github.com/microsoft/MeshTransformer
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
Paper: https://arxiv.org/abs/2101.06184
Code: https://github.com/tobyperrett/trx
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Paper: https://arxiv.org/abs/2103.16110
Code: https://github.com/mczhuge/Kaleido-BERT
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
Paper: https://arxiv.org/abs/2104.13682
Code: None
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
Paper: https://arxiv.org/abs/2104.09224
Code: https://github.com/autonomousvision/transfuser
6. Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
7. Variational Transformer Networks for Layout Generation
Paper: https://arxiv.org/abs/2104.02416
Code: None
8. LoFTR: Detector-Free Local Feature Matching with Transformers
Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Paper: https://arxiv.org/abs/2103.16553
Code: None
11. Transformer Tracking
Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT
12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
Paper(Oral): None
Code: https://github.com/dingmyu/HR-NAS
13. MIST: Multiple Instance Spatial Transformer
Paper: https://arxiv.org/abs/1811.10725
Code: None
14. Multimodal Motion Prediction with Stacked Transformers
Paper: https://arxiv.org/abs/2103.11624
Code: https://decisionforce.github.io/mmTransformer
15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
17. Pre-Trained Image Processing Transformer
Paper: https://arxiv.org/abs/2012.00364
Code: None
18. End-to-End Video Instance Segmentation with Transformers
Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR
19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr
20. End-to-End Human Object Interaction Detection with HOI Transformer
Paper: https://arxiv.org/abs/2103.04503
Code: https://github.com/bbepoch/HoiTransformer
21. Transformer Interpretability Beyond Attention Visualization
Paper: https://arxiv.org/abs/2012.09838
Code: https://github.com/hila-chefer/Transformer-Explainability
22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
Paper: None
Code: None
23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Paper: None
Code: None
24. Line Segment Detection Using Transformers without Edges
Paper(Oral): https://arxiv.org/abs/2101.01909
Code: None
25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Paper: None
Code: None
26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Paper(Oral): https://arxiv.org/abs/2101.08833
Code: https://github.com/dukebw/SSTVOS
27. Facial Action Unit Detection With Transformers
Paper: None
Code: None
28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
Paper: None
Code: None
29. Lesion-Aware Transformers for Diabetic Retinopathy Grading
Paper: None
Code: None
30. Topological Planning With Transformers for Vision-and-Language Navigation
Paper: https://arxiv.org/abs/2012.05292
Code: None
31. Adaptive Image Transformer for One-Shot Object Detection
Paper: None
Code: None
32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
Paper: None
Code: None
33. Taming Transformers for High-Resolution Image Synthesis
Homepage: https://compvis.github.io/taming-transformers/
Paper(Oral): https://arxiv.org/abs/2012.09841
Code: https://github.com/CompVis/taming-transformers
34. Self-Supervised Video Hashing via Bidirectional Transformers
Paper: None
Code: None
35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
Code: None
36. Gaussian Context Transformer
Paper: None
Code: None
37. General Multi-Label Image Classification With Transformers
Paper: https://arxiv.org/abs/2011.14027
Code: None
38. Bottleneck Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2101.11605
Code: None
39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
Paper(Oral): https://arxiv.org/abs/2011.13922
Code: https://github.com/YicongHong/Recurrent-VLN-BERT
40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Paper(Oral): https://arxiv.org/abs/2102.06183
Code: https://github.com/jayleicn/ClipBERT
41. Self-attention based Text Knowledge Mining for Text Detection
Paper: None
Code: https://github.com/CVI-SZU/STKM
42. SSAN: Separable Self-Attention Network for Video Representation Learning
Paper: None
Code: None
43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None
Image Classification
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
Paper(Oral): https://arxiv.org/abs/2105.10305
Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet
2D Object Detection
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
Paper: https://arxiv.org/abs/2105.12971
Code: None
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
Paper: https://arxiv.org/abs/2105.12990
Code: None
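For reference, several detection entries here modify or accelerate non-maximum suppression. The snippet below is a minimal NumPy sketch of standard greedy NMS, purely illustrative and not the method of PSRR-MaxpoolNMS; the [x1, y1, x2, y2] box layout and the 0.5 threshold are assumptions.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # keep only boxes that overlap the chosen box less than the threshold
        order = order[1:][iou < iou_thresh]
    return keep
```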
Domain-Specific Suppression for Adaptive Object Detection
Paper: https://arxiv.org/abs/2105.03570
Code: None
IQDet: Instance-wise Quality Distribution Sampling for Object Detection
Paper: https://arxiv.org/abs/2104.06936
Code: None
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Adaptive Class Suppression Loss for Long-Tail Object Detection
Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL
VarifocalNet: An IoU-aware Dense Object Detector
Paper(Oral): https://arxiv.org/abs/2008.13367
Code: https://github.com/hyz-xmaster/VarifocalNet
Scale-aware Automatic Augmentation for Object Detection
Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug
OTA: Optimal Transport Assignment for Object Detection
Paper: https://arxiv.org/abs/2103.14259
Code: https://github.com/Megvii-BaseDetection/OTA
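OTA casts label assignment as an optimal transport problem; the entropic OT solver such approaches typically build on is the standard Sinkhorn-Knopp iteration. The sketch below shows the generic algorithm only (cost matrix, marginals, and ε are illustrative assumptions), not the paper's implementation.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=100):
    """Entropic optimal transport via Sinkhorn-Knopp.
    cost: (m, n) cost matrix; a: (m,) source marginal; b: (n,) target marginal.
    Returns the (m, n) transport plan."""
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v + 1e-9)              # scale rows toward marginal a
        v = b / (K.T @ u + 1e-9)            # scale columns toward marginal b
    return u[:, None] * K * v[None, :]      # plan = diag(u) K diag(v)

# Toy usage: distribute mass from 5 "anchors" to 3 "targets" with random costs.
cost = np.random.rand(5, 3)
plan = sinkhorn(cost, np.full(5, 1 / 5), np.full(3, 1 / 3))
```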
Distilling Object Detectors via Decoupled Features
Paper: https://arxiv.org/abs/2103.14475
Code: https://github.com/ggjy/DeFeat.pytorch
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Paper: https://arxiv.org/abs/2011.12450
Code: https://github.com/PeizeSun/SparseR-CNN
Positive-Unlabeled Data Purification in the Wild for Object Detection
Paper: None
Code: None
Instance Localization for Self-supervised Detection Pretraining
Paper: https://arxiv.org/abs/2102.08318
Code: https://github.com/limbo0000/InstanceLoc
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Paper: https://arxiv.org/abs/2103.04224
Code: None
End-to-End Object Detection with Fully Convolutional Network
Paper: https://arxiv.org/abs/2012.03544
Code: https://github.com/Megvii-BaseDetection/DeFCN
Robust and Accurate Object Detection via Adversarial Learning
Paper: https://arxiv.org/abs/2103.13886
Code: None
I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Paper: https://arxiv.org/abs/2103.13757
Code: None
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Paper: https://arxiv.org/abs/2103.11402
Code: None
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
Paper: https://arxiv.org/abs/2103.04507
Code: https://github.com/VDIGPKU/OPANAS
YOLOF: You Only Look One-level Feature
Paper: https://arxiv.org/abs/2103.09460
Code: https://github.com/megvii-model/YOLOF
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr
General Instance Distillation for Object Detection
Paper: https://arxiv.org/abs/2103.02340
Code: None
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Paper: https://arxiv.org/abs/2011.12885
Code: https://github.com/implus/GFocalV2
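Both VarifocalNet and Generalized Focal Loss build on focal-loss-style dense classification. For reference only, here is a minimal sketch of the standard binary focal loss (Lin et al., 2017) with the usual α/γ defaults; it is not either paper's loss.

```python
import numpy as np

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-9):
    """Standard binary focal loss.
    p: predicted probabilities in (0, 1); y: binary targets in {0, 1}."""
    p_t = np.where(y == 1, p, 1 - p)                 # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)     # class-balancing weight
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps)).mean()
```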
Multiple Instance Active Learning for Object Detection
Paper: https://github.com/yuantn/MIAL/raw/master/paper.pdf
Code: https://github.com/yuantn/MIAL
Towards Open World Object Detection
Paper(Oral): https://arxiv.org/abs/2103.02603
Code: https://github.com/JosephKJ/OWOD
Few-Shot Object Detection
Adaptive Image Transformer for One-Shot Object Detection
Paper: None
Code: None
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
Paper: https://arxiv.org/abs/2103.17115
Code: https://github.com/hzhupku/DCNet
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
Paper: https://arxiv.org/abs/2103.01903
Code: None
Few-Shot Object Detection via Contrastive Proposal Encoding
Paper: https://arxiv.org/abs/2103.05950
Code: https://github.com/MegviiDetection/FSCE
Rotated Object Detection
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet
Single/Multi-Object Tracking
Single Object Tracking
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Paper: https://arxiv.org/abs/2103.14938
Code: https://github.com/VISION-SJTU/IoUattack
Graph Attention Tracking
Paper: https://arxiv.org/abs/2011.11204
Code: https://github.com/ohhhyeahhh/SiamGAT
Rotation Equivariant Siamese Networks for Tracking
Paper: https://arxiv.org/abs/2012.13078
Code: None
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
Transformer Tracking
Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT
Multi-Object Tracking
Multiple Object Tracking with Correlation Learning
Paper: https://arxiv.org/abs/2104.03541
Code: None
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Paper: https://arxiv.org/abs/2012.02337
Code: None
Learning a Proposal Classifier for Multiple Object Tracking
Paper: https://arxiv.org/abs/2103.07889
Code: https://github.com/daip13/LPC_MOT.git
Track to Detect and Segment: An Online Multi-Object Tracker
Homepage: https://jialianwu.com/projects/TraDeS.html
Paper: https://arxiv.org/abs/2103.08808
Code: https://github.com/JialianW/TraDeS
Semantic Segmentation
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Rethinking BiSeNet For Real-time Semantic Segmentation
Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg
Progressive Semantic Segmentation
Paper: https://arxiv.org/abs/2104.03778
Code: https://github.com/VinAIResearch/MagNet
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR
Bidirectional Projection Network for Cross Dimension Scene Understanding
Paper(Oral): https://arxiv.org/abs/2103.14326
Code: https://github.com/wbhu/BPNet
Cross-Dataset Collaborative Learning for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.11351
Code: None
Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Paper: https://arxiv.org/abs/2103.06342
Code: None
Capturing Omni-Range Context for Omnidirectional Segmentation
Paper: https://arxiv.org/abs/2103.05687
Code: None
Learning Statistical Texture for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04133
Code: None
PLOP: Learning without Forgetting for Continual Semantic Segmentation
Paper: https://arxiv.org/abs/2011.11390
Code: None
Weakly Supervised Semantic Segmentation
Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
Paper: https://arxiv.org/abs/2104.00905
Code: None
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2103.14581
Code: None
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
Paper: https://arxiv.org/abs/2103.08907
Code: None
Semi-Supervised Semantic Segmentation
Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04705
Domain Adaptive Semantic Segmentation
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
Paper: https://arxiv.org/abs/2105.00097
Code: https://github.com/visinf/da-sac
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Paper: https://arxiv.org/abs/2103.15597
Code: https://github.com/shachoi/RobustNet
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
Paper: https://arxiv.org/abs/2103.13041
Code: None
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
Paper: https://arxiv.org/abs/2103.05254
Code: None
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04717
Code: None
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
Paper: https://arxiv.org/abs/2101.10979
Code: https://github.com/microsoft/ProDA
Video Semantic Segmentation
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download
Instance Segmentation
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Paper: https://arxiv.org/abs/2011.09876
Code: https://github.com/aliyun/DCT-Mask
Incremental Few-Shot Instance Segmentation
Paper: https://arxiv.org/abs/2105.05312
Code: https://github.com/danganea/iMTFA
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
Paper: https://arxiv.org/abs/2105.03186
Code: None
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Paper: https://arxiv.org/abs/2104.08569
Code: https://github.com/zhanggang001/RefineMask/
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Paper: https://arxiv.org/abs/2104.05239
Code: https://github.com/tinyalpha/BPR
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api
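Boundary IoU restricts the usual mask IoU to a thin band around each mask's contour. Below is a minimal sketch of that idea, assuming binary NumPy masks and using binary erosion to approximate the distance-d band; see the official boundary-iou-api above for the exact implementation.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask, d=2):
    """Pixels of `mask` within roughly d pixels of its contour (inner band)."""
    eroded = binary_erosion(mask, iterations=d)
    return mask & ~eroded

def boundary_iou(gt, pred, d=2, eps=1e-9):
    """IoU computed only over the boundary bands of two binary masks."""
    g = boundary_band(gt.astype(bool), d)
    p = boundary_band(pred.astype(bool), d)
    inter = np.logical_and(g, p).sum()
    union = np.logical_or(g, p).sum()
    return inter / (union + eps)
```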
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet
Zero-shot Instance Segmentation (unconfirmed)
Paper: None
Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395
Video Instance Segmentation
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
Code: https://github.com/MinghanLi/STMask
End-to-End Video Instance Segmentation with Transformers
Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR
Panoptic Segmentation
Exemplar-Based Open-Set Panoptic Segmentation Network
Homepage: https://cv.snu.ac.kr/research/EOPSN/
Paper: https://arxiv.org/abs/2105.08336
Code: https://github.com/jd730/EOPSN
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Paper: None
Code: None
Panoptic Segmentation Forecasting
Paper: https://arxiv.org/abs/2104.03962
Code: https://github.com/nianticlabs/panoptic-forecasting
Fully Convolutional Networks for Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
Paper: https://arxiv.org/abs/2103.02584
Code: None
Medical Image Segmentation
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
Paper: https://arxiv.org/abs/2103.06030
Code: https://github.com/liuquande/FedDG-ELCFS
3D Medical Image Segmentation
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
Paper(Oral): https://arxiv.org/abs/2103.15954
Code: None
Scene Text Detection
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Paper: https://arxiv.org/abs/2104.10442
Code: None
Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Paper: https://arxiv.org/abs/2103.06495
Code: https://github.com/FangShancheng/ABINet
Super-Resolution
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Paper: https://arxiv.org/abs/2103.04039
Code: https://github.com/Xiangtaokong/ClassSR
AdderSR: Towards Energy Efficient Image Super-Resolution
Paper: https://arxiv.org/abs/2009.08891
Code: None
Dehazing
Contrastive Learning for Compact Single Image Dehazing
Paper: https://arxiv.org/abs/2104.09367
Code: https://github.com/GlassyWu/AECR-Net
Video Super-Resolution
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Paper: None
Code: https://github.com/CS-GangXu/TMNet
Image Restoration
Multi-Stage Progressive Image Restoration
Paper: https://arxiv.org/abs/2102.02808
Code: https://github.com/swz30/MPRNet
Image Inpainting
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
Paper: https://arxiv.org/abs/2105.02201
Code: https://github.com/KumapowerLIU/PD-GAN
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
Homepage: https://yzhouas.github.io/projects/TransFill/index.html
Paper: https://arxiv.org/abs/2103.15982
Code: None
Image Editing
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Paper: https://arxiv.org/abs/2104.14754
Code: https://github.com/naver-ai/StyleMapGAN
Demo Video: https://youtu.be/qCapNyRA_Ng
High-Fidelity and Arbitrary Face Editing
Paper: https://arxiv.org/abs/2103.15814
Code: None
Anycost GANs for Interactive Image Synthesis and Editing
Paper: https://arxiv.org/abs/2103.03243
Code: https://github.com/mit-han-lab/anycost-gan
PISE: Person Image Synthesis and Editing with Decoupled GAN
Paper: https://arxiv.org/abs/2103.04023
Code: https://github.com/Zhangjinso/PISE
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Paper: http://raywzy.com/
Code: http://raywzy.com/
Image Captioning
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Paper: https://arxiv.org/abs/2105.03236
Code: None
Image Matching
LoFTR: Detector-Free Local Feature Matching with Transformers
Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR
Convolutional Hough Matching Networks
Homepage: http://cvlab.postech.ac.kr/research/CHM/
Paper(Oral): https://arxiv.org/abs/2103.16831
Code: None
Image Blending
Bridging the Visual Gap: Wide-Range Image Blending
Paper: https://arxiv.org/abs/2103.15149
Code: https://github.com/julia0607/Wide-Range-Image-Blending
Datasets
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Paper: https://arxiv.org/abs/2105.09188
Code: https://github.com/csjliang/LPTN
Dataset: https://github.com/csjliang/LPTN
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
Paper(Oral): https://arxiv.org/abs/2104.12690
Code: https://github.com/fidler-lab/efficient-annotation-cookbook
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Learning To Count Everything
Paper: https://arxiv.org/abs/2104.08391
Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
Semantic Image Matting
Paper: https://arxiv.org/abs/2104.08201
Code: https://github.com/nowsyn/SIM
Dataset: https://github.com/nowsyn/SIM
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10619
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Paper: https://arxiv.org/abs/2103.03375
Dataset: None
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
Homepage: https://github.com/QingyongHu/SensatUrban
Paper: http://arxiv.org/abs/2009.03137
Code: https://github.com/QingyongHu/SensatUrban
Dataset: https://github.com/QingyongHu/SensatUrban
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Paper(Oral): https://arxiv.org/abs/2103.01520
Code: https://github.com/Hzzone/MTLFace
Dataset: https://github.com/Hzzone/MTLFace
Depth from Camera Motion and Object Detection
Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer