Prune and Merge: Efficient Token Compression for Vision Transformer With Spatial ...
Adaptive Complex Wavelet Informed Transformer Operator
Learning Intrinsic Invariance Within Intra-Class for Domain Generalization
TITFormer Combining Textual Modality and Simulating Infrared Modality Based on T...
CLIP AE A Multi Modal Unsupervised Images Enhancement Method Based on High Order...
Facial Action Units as a Joint Dataset Training Bridge for Facial Expression Rec...
Rectangling for Stitched Image via Pixel-Wise Deformation Learning
Efficient Chroma Intra Prediction via Exemplar Colorization Network for Versatil...
S3GAAR: Segmented Spatiotemporal Skeleton Graph Attention for Action Recognition
Cross-Modal Progressive Perspective Matching Network for Remote Sensing Image Te...
Text2Avatar: Articulated 3D Avatar Creation With Text Instructions
Learning Shape-Color Diffusion Priors for Text-Guided 3D Object Generation
A 3D Self Awareness Diffusion Network for Multimodal Classification
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space
Scalable Context Based Facial Emotion Recognition Using Facial Landmarks and Att...
Dual Path Adaptive Channel Attention Network Based on Feature Constraints for Fa...