Efficient Sequence Pooling Model for Sparse Attention with Mixture of Experts

EasyChair Preprint 15262 • 2 pages • Date: October 18, 2024

Abstract

This paper introduces the Efficient Attention-based Model (EAM), a novel architecture designed to reduce the computational and memory overhead of Transformer-based models in sequence modeling tasks. By incorporating sparse attention, token pooling, and a mixture of experts (MoE), the EAM model reduces memory usage and training time without sacrificing accuracy. We evaluate the EAM architecture on sequence classification tasks, comparing it to a standard Transformer model. The results show that the EAM model achieves competitive accuracy while using significantly less memory and computational power, making it a suitable alternative for resource-constrained environments.

Keyphrases: Efficient Attention-based Model, Mixture of Experts, memory efficiency, sequence modeling, sparse attention, transformer
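To make the combination of components concrete, the sketch below shows one way such a block could be assembled: average-pool token pooling to shorten the sequence, windowed (local) sparse attention, and a top-1 gated mixture-of-experts feed-forward layer, followed by a classification head. This is a minimal illustration under assumptions; all class names (TokenPooling, LocalSparseAttention, MoEFeedForward, EAMClassifier), dimensions, window sizes, and the choice of routing scheme are hypothetical and are not taken from the paper's actual implementation.

```python
# Illustrative sketch only: component choices and hyperparameters are assumptions,
# not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenPooling(nn.Module):
    """Shorten the sequence by average-pooling fixed-size windows of tokens."""
    def __init__(self, pool_size: int = 2):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool_size, stride=pool_size)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        return self.pool(x.transpose(1, 2)).transpose(1, 2)


class LocalSparseAttention(nn.Module):
    """Attention restricted to non-overlapping windows (one simple sparse pattern)."""
    def __init__(self, dim: int, num_heads: int = 4, window: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window = window

    def forward(self, x):                      # x: (batch, seq_len, dim)
        b, n, d = x.shape
        pad = (-n) % self.window
        if pad:
            x = F.pad(x, (0, 0, 0, pad))       # pad sequence to a multiple of window
        w = x.shape[1] // self.window
        x = x.reshape(b * w, self.window, d)   # attend within each window only
        out, _ = self.attn(x, x, x)
        return out.reshape(b, w * self.window, d)[:, :n]


class MoEFeedForward(nn.Module):
    """Top-1 gated mixture of experts: each token is routed to one small FFN."""
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 128):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        scores = self.gate(x)
        weights, idx = scores.softmax(-1).max(-1)   # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask]) * weights[mask].unsqueeze(-1)
        return out


class EAMClassifier(nn.Module):
    """Pooling -> sparse attention -> MoE FFN -> mean-pool -> linear classifier."""
    def __init__(self, vocab: int, dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pool = TokenPooling(pool_size=2)
        self.attn = LocalSparseAttention(dim)
        self.moe = MoEFeedForward(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len) int64
        h = self.embed(tokens)
        h = self.pool(h)                       # fewer tokens -> cheaper attention
        h = h + self.attn(h)                   # residual around sparse attention
        h = h + self.moe(h)                    # residual around MoE feed-forward
        return self.head(h.mean(dim=1))        # sequence classification logits


if __name__ == "__main__":
    model = EAMClassifier(vocab=1000)
    logits = model(torch.randint(0, 1000, (8, 128)))
    print(logits.shape)                        # torch.Size([8, 2])
```

In this sketch the memory savings come from two places: pooling halves the number of tokens before attention is applied, and restricting attention to local windows keeps the attention cost linear in sequence length rather than quadratic, while the MoE layer activates only one expert per token.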