Title | Journal | Journal Categories | Citations | Publication Date |
---|---|---|---|---|
G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations | | | | 2023 |
Checkmate: Breaking the memory wall with optimal tensor rematerialization | | | | 2022 |
Accelerating distributed MoE training and inference with Lina | | | | 2023 |
SmartMoE: Efficiently training sparsely-activated models through combining offline and online parallelization | | | | 2023 |
Fast inference from transformers via speculative decoding | | | | 2023 |