Visual Transformer
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
Paper: https://arxiv.org/abs/2012.09760
Code: https://github.com/microsoft/MeshTransformer
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
Paper: https://arxiv.org/abs/2101.06184
Code: https://github.com/tobyperrett/trx
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Paper: https://arxiv.org/abs/2103.16110
Code: https://github.com/mczhuge/Kaleido-BERT
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
Paper: https://arxiv.org/abs/2104.13682
Code: None
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
Paper: https://arxiv.org/abs/2104.09224
Code: https://github.com/autonomousvision/transfuser
6. Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
7. Variational Transformer Networks for Layout Generation
Paper: https://arxiv.org/abs/2104.02416
Code: None
8. LoFTR: Detector-Free Local Feature Matching with Transformers
Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Paper: https://arxiv.org/abs/2103.16553
Code: None
11. Transformer Tracking
Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT
12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
Paper(Oral): None
Code: https://github.com/dingmyu/HR-NAS
13. MIST: Multiple Instance Spatial Transformer
Paper: https://arxiv.org/abs/1811.10725
Code: None
14. Multimodal Motion Prediction with Stacked Transformers
Paper: https://arxiv.org/abs/2103.11624
Code: https://decisionforce.github.io/mmTransformer
15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
17. Pre-Trained Image Processing Transformer
Paper: https://arxiv.org/abs/2012.00364
Code: None
18. End-to-End Video Instance Segmentation with Transformers
Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR
19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr
20. End-to-End Human Object Interaction Detection with HOI Transformer
Paper: https://arxiv.org/abs/2103.04503
Code: https://github.com/bbepoch/HoiTransformer
21. Transformer Interpretability Beyond Attention Visualization
Paper: https://arxiv.org/abs/2012.09838
Code: https://github.com/hila-chefer/Transformer-Explainability
22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
Paper: None
Code: None
23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Paper: None
Code: None
24. Line Segment Detection Using Transformers without Edges
Paper(Oral): https://arxiv.org/abs/2101.01909
Code: None
25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Paper: None
Code: None
26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Paper(Oral): https://arxiv.org/abs/2101.08833
Code: https://github.com/dukebw/SSTVOS
27. Facial Action Unit Detection With Transformers
Paper: None
Code: None
28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
Paper: None
Code: None
29. Lesion-Aware Transformers for Diabetic Retinopathy Grading
Paper: None
Code: None
30. Topological Planning With Transformers for Vision-and-Language Navigation
Paper: https://arxiv.org/abs/2012.05292
Code: None
31. Adaptive Image Transformer for One-Shot Object Detection
Paper: None
Code: None
32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
Paper: None
Code: None
33. Taming Transformers for High-Resolution Image Synthesis
Homepage: https://compvis.github.io/taming-transformers/
Paper(Oral): https://arxiv.org/abs/2012.09841
Code: https://github.com/CompVis/taming-transformers
34. Self-Supervised Video Hashing via Bidirectional Transformers
Paper: None
Code: None
35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
Code: None
36. Gaussian Context Transformer
Paper: None
Code: None
37. General Multi-Label Image Classification With Transformers
Paper: https://arxiv.org/abs/2011.14027
Code: None
38. Bottleneck Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2101.11605
Code: None
39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
Paper(Oral): https://arxiv.org/abs/2011.13922
Code: https://github.com/YicongHong/Recurrent-VLN-BERT
40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Paper(Oral): https://arxiv.org/abs/2102.06183
Code: https://github.com/jayleicn/ClipBERT
41. Self-attention based Text Knowledge Mining for Text Detection
Paper: None
Code: https://github.com/CVI-SZU/STKM
42. SSAN: Separable Self-Attention Network for Video Representation Learning
Paper: None
Code: None
43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None
Image Classification
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
Paper(Oral): https://arxiv.org/abs/2105.10305
Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet
2D Object Detection
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
Paper: https://arxiv.org/abs/2105.12971
Code: None
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
Paper: https://arxiv.org/abs/2105.12990
Code: None
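For reference, several detection entries here modify or accelerate non-maximum suppression. The snippet below is a minimal NumPy sketch of standard greedy NMS, purely illustrative and not the method of PSRR-MaxpoolNMS; the [x1, y1, x2, y2] box layout and the 0.5 threshold are assumptions.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # keep only boxes that overlap the chosen box less than the threshold
        order = order[1:][iou < iou_thresh]
    return keep
```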
Domain-Specific Suppression for Adaptive Object Detection
Paper: https://arxiv.org/abs/2105.03570
Code: None
IQDet: Instance-wise Quality Distribution Sampling for Object Detection
Paper: https://arxiv.org/abs/2104.06936
Code: None
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Adaptive Class Suppression Loss for Long-Tail Object Detection
Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL
VarifocalNet: An IoU-aware Dense Object Detector
Paper(Oral): https://arxiv.org/abs/2008.13367
Code: https://github.com/hyz-xmaster/VarifocalNet
Scale-aware Automatic Augmentation for Object Detection
Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug
OTA: Optimal Transport Assignment for Object Detection
Paper: https://arxiv.org/abs/2103.14259
Code: https://github.com/Megvii-BaseDetection/OTA
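OTA casts label assignment as an optimal transport problem; the entropic OT solver such approaches typically build on is the standard Sinkhorn-Knopp iteration. The sketch below shows the generic algorithm only (cost matrix, marginals, and ε are illustrative assumptions), not the paper's implementation.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=100):
    """Entropic optimal transport via Sinkhorn-Knopp.
    cost: (m, n) cost matrix; a: (m,) source marginal; b: (n,) target marginal.
    Returns the (m, n) transport plan."""
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v + 1e-9)              # scale rows toward marginal a
        v = b / (K.T @ u + 1e-9)            # scale columns toward marginal b
    return u[:, None] * K * v[None, :]      # plan = diag(u) K diag(v)

# Toy usage: distribute mass from 5 "anchors" to 3 "targets" with random costs.
cost = np.random.rand(5, 3)
plan = sinkhorn(cost, np.full(5, 1 / 5), np.full(3, 1 / 3))
```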
Distilling Object Detectors via Decoupled Features
Paper: https://arxiv.org/abs/2103.14475
Code: https://github.com/ggjy/DeFeat.pytorch
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Paper: https://arxiv.org/abs/2011.12450
Code: https://github.com/PeizeSun/SparseR-CNN
Positive-Unlabeled Data Purification in the Wild for Object Detection
Paper: None
Code: None
Instance Localization for Self-supervised Detection Pretraining
Paper: https://arxiv.org/abs/2102.08318
Code: https://github.com/limbo0000/InstanceLoc
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Paper: https://arxiv.org/abs/2103.04224
Code: None
End-to-End Object Detection with Fully Convolutional Network
Paper: https://arxiv.org/abs/2012.03544
Code: https://github.com/Megvii-BaseDetection/DeFCN
Robust and Accurate Object Detection via Adversarial Learning
Paper: https://arxiv.org/abs/2103.13886
Code: None
I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Paper: https://arxiv.org/abs/2103.13757
Code: None
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Paper: https://arxiv.org/abs/2103.11402
Code: None
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
Paper: https://arxiv.org/abs/2103.04507
Code: https://github.com/VDIGPKU/OPANAS
YOLOF: You Only Look One-level Feature
Paper: https://arxiv.org/abs/2103.09460
Code: https://github.com/megvii-model/YOLOF
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr
General Instance Distillation for Object Detection
Paper: https://arxiv.org/abs/2103.02340
Code: None
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Paper: https://arxiv.org/abs/2011.12885
Code: https://github.com/implus/GFocalV2
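Both VarifocalNet and Generalized Focal Loss build on focal-loss-style dense classification. For reference only, here is a minimal sketch of the standard binary focal loss (Lin et al., 2017) with the usual α/γ defaults; it is not either paper's loss.

```python
import numpy as np

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-9):
    """Standard binary focal loss.
    p: predicted probabilities in (0, 1); y: binary targets in {0, 1}."""
    p_t = np.where(y == 1, p, 1 - p)                 # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)     # class-balancing weight
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps)).mean()
```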
Multiple Instance Active Learning for Object Detection
Paper: https://github.com/yuantn/MIAL/raw/master/paper.pdf
Code: https://github.com/yuantn/MIAL
Towards Open World Object Detection
Paper(Oral): https://arxiv.org/abs/2103.02603
Code: https://github.com/JosephKJ/OWOD
Few-Shot Object Detection
Adaptive Image Transformer for One-Shot Object Detection
Paper: None
Code: None
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
Paper: https://arxiv.org/abs/2103.17115
Code: https://github.com/hzhupku/DCNet
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
Paper: https://arxiv.org/abs/2103.01903
Code: None
Few-Shot Object Detection via Contrastive Proposal Encoding
Paper: https://arxiv.org/abs/2103.05950
Code: https://github.com/MegviiDetection/FSCE
Rotated Object Detection
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet
Single/Multi-Object Tracking
Single Object Tracking
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Paper: https://arxiv.org/abs/2103.14938
Code: https://github.com/VISION-SJTU/IoUattack
Graph Attention Tracking
Paper: https://arxiv.org/abs/2011.11204
Code: https://github.com/ohhhyeahhh/SiamGAT
Rotation Equivariant Siamese Networks for Tracking
Paper: https://arxiv.org/abs/2012.13078
Code: None
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
Transformer Tracking
Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT
Multi-Object Tracking
Multiple Object Tracking with Correlation Learning
Paper: https://arxiv.org/abs/2104.03541
Code: None
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Paper: https://arxiv.org/abs/2012.02337
Code: None
Learning a Proposal Classifier for Multiple Object Tracking
Paper: https://arxiv.org/abs/2103.07889
Code: https://github.com/daip13/LPC_MOT.git
Track to Detect and Segment: An Online Multi-Object Tracker
Homepage: https://jialianwu.com/projects/TraDeS.html
Paper: https://arxiv.org/abs/2103.08808
Code: https://github.com/JialianW/TraDeS
Semantic Segmentation
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Rethinking BiSeNet For Real-time Semantic Segmentation
Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg
Progressive Semantic Segmentation
Paper: https://arxiv.org/abs/2104.03778
Code: https://github.com/VinAIResearch/MagNet
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR
Bidirectional Projection Network for Cross Dimension Scene Understanding
Paper(Oral): https://arxiv.org/abs/2103.14326
Code: https://github.com/wbhu/BPNet
Cross-Dataset Collaborative Learning for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.11351
Code: None
Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Paper: https://arxiv.org/abs/2103.06342
Code: None
Capturing Omni-Range Context for Omnidirectional Segmentation
Paper: https://arxiv.org/abs/2103.05687
Code: None
Learning Statistical Texture for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04133
Code: None
PLOP: Learning without Forgetting for Continual Semantic Segmentation
Paper: https://arxiv.org/abs/2011.11390
Code: None
Weakly Supervised Semantic Segmentation
Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
Paper: https://arxiv.org/abs/2104.00905
Code: None
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2103.14581
Code: None
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
Paper: https://arxiv.org/abs/2103.08907
Code: None
Semi-Supervised Semantic Segmentation
Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04705
Domain Adaptive Semantic Segmentation
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
Paper: https://arxiv.org/abs/2105.00097
Code: https://github.com/visinf/da-sac
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Paper: https://arxiv.org/abs/2103.15597
Code: https://github.com/shachoi/RobustNet
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
Paper: https://arxiv.org/abs/2103.13041
Code: None
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
Paper: https://arxiv.org/abs/2103.05254
Code: None
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
Paper: https://arxiv.org/abs/2103.04717
Code: None
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
Paper: https://arxiv.org/abs/2101.10979
Code: https://github.com/microsoft/ProDA
Video Semantic Segmentation
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download
Instance Segmentation
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Paper: https://arxiv.org/abs/2011.09876
Code: https://github.com/aliyun/DCT-Mask
Incremental Few-Shot Instance Segmentation
Paper: https://arxiv.org/abs/2105.05312
Code: https://github.com/danganea/iMTFA
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
Paper: https://arxiv.org/abs/2105.03186
Code: None
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Paper: https://arxiv.org/abs/2104.08569
Code: https://github.com/zhanggang001/RefineMask/
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Paper: https://arxiv.org/abs/2104.05239
Code: https://github.com/tinyalpha/BPR
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api
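Boundary IoU restricts the usual mask IoU to a thin band around each mask's contour. Below is a minimal sketch of that idea, assuming binary NumPy masks and using binary erosion to approximate the distance-d band; see the official boundary-iou-api above for the exact implementation.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask, d=2):
    """Pixels of `mask` within roughly d pixels of its contour (inner band)."""
    eroded = binary_erosion(mask, iterations=d)
    return mask & ~eroded

def boundary_iou(gt, pred, d=2, eps=1e-9):
    """IoU computed only over the boundary bands of two binary masks."""
    g = boundary_band(gt.astype(bool), d)
    p = boundary_band(pred.astype(bool), d)
    inter = np.logical_and(g, p).sum()
    union = np.logical_or(g, p).sum()
    return inter / (union + eps)
```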
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet
Zero-shot Instance Segmentation (unconfirmed)
Paper: None
Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395
Video Instance Segmentation
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
Code: https://github.com/MinghanLi/STMask
End-to-End Video Instance Segmentation with Transformers
Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR
Panoptic Segmentation
Exemplar-Based Open-Set Panoptic Segmentation Network
Homepage: https://cv.snu.ac.kr/research/EOPSN/
Paper: https://arxiv.org/abs/2105.08336
Code: https://github.com/jd730/EOPSN
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Paper: None
Code: None
Panoptic Segmentation Forecasting
Paper: https://arxiv.org/abs/2104.03962
Code: https://github.com/nianticlabs/panoptic-forecasting
Fully Convolutional Networks for Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
Paper: https://arxiv.org/abs/2103.02584
Code: None
Medical Image Segmentation
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
Paper: https://arxiv.org/abs/2103.06030
Code: https://github.com/liuquande/FedDG-ELCFS
3D Medical Image Segmentation
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
Paper(Oral): https://arxiv.org/abs/2103.15954
Code: None
Scene Text Detection
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Paper: https://arxiv.org/abs/2104.10442
Code: None
Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Paper: https://arxiv.org/abs/2103.06495
Code: https://github.com/FangShancheng/ABINet
Super-Resolution
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Paper: https://arxiv.org/abs/2103.04039
Code: https://github.com/Xiangtaokong/ClassSR
AdderSR: Towards Energy Efficient Image Super-Resolution
Paper: https://arxiv.org/abs/2009.08891
Code: None
Dehazing
Contrastive Learning for Compact Single Image Dehazing
Paper: https://arxiv.org/abs/2104.09367
Code: https://github.com/GlassyWu/AECR-Net
Video Super-Resolution
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Paper: None
Code: https://github.com/CS-GangXu/TMNet
Image Restoration
Multi-Stage Progressive Image Restoration
Paper: https://arxiv.org/abs/2102.02808
Code: https://github.com/swz30/MPRNet
Image Inpainting
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
Paper: https://arxiv.org/abs/2105.02201
Code: https://github.com/KumapowerLIU/PD-GAN
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
Homepage: https://yzhouas.github.io/projects/TransFill/index.html
Paper: https://arxiv.org/abs/2103.15982
Code: None
Image Editing
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Paper: https://arxiv.org/abs/2104.14754
Code: https://github.com/naver-ai/StyleMapGAN
Demo Video: https://youtu.be/qCapNyRA_Ng
High-Fidelity and Arbitrary Face Editing
Paper: https://arxiv.org/abs/2103.15814
Code: None
Anycost GANs for Interactive Image Synthesis and Editing
Paper: https://arxiv.org/abs/2103.03243
Code: https://github.com/mit-han-lab/anycost-gan
PISE: Person Image Synthesis and Editing with Decoupled GAN
Paper: https://arxiv.org/abs/2103.04023
Code: https://github.com/Zhangjinso/PISE
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Paper: http://raywzy.com/
Code: http://raywzy.com/
Image Captioning
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Paper: https://arxiv.org/abs/2105.03236
Code: None
Image Matching
LoFTR: Detector-Free Local Feature Matching with Transformers
Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR
Convolutional Hough Matching Networks
Homepage: http://cvlab.postech.ac.kr/research/CHM/
Paper(Oral): https://arxiv.org/abs/2103.16831
Code: None
Image Blending
Bridging the Visual Gap: Wide-Range Image Blending
Paper: https://arxiv.org/abs/2103.15149
Code: https://github.com/julia0607/Wide-Range-Image-Blending
Datasets
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Paper: https://arxiv.org/abs/2105.09188
Code: https://github.com/csjliang/LPTN
Dataset: https://github.com/csjliang/LPTN
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
Paper(Oral): https://arxiv.org/abs/2104.12690
Code: https://github.com/fidler-lab/efficient-annotation-cookbook
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Learning To Count Everything
Paper: https://arxiv.org/abs/2104.08391
Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
Semantic Image Matting
Paper: https://arxiv.org/abs/2104.08201
Code: https://github.com/nowsyn/SIM
Dataset: https://github.com/nowsyn/SIM
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10619
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Paper: https://arxiv.org/abs/2103.03375
Dataset: None
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
Homepage: https://github.com/QingyongHu/SensatUrban
Paper: http://arxiv.org/abs/2009.03137
Code: https://github.com/QingyongHu/SensatUrban
Dataset: https://github.com/QingyongHu/SensatUrban
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Paper(Oral): https://arxiv.org/abs/2103.01520
Code: https://github.com/Hzzone/MTLFace
Dataset: https://github.com/Hzzone/MTLFace
Depth from Camera Motion and Object Detection
Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer