4320 | Machine Learning | Easy numeric question | short
Attention Memory Footprint
Question
A full-attention model uses L=1024 tokens and stores one attention score matrix per head in float16. Roughly how much memory does one head's score matrix use?
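One way to reason about this is that a full-attention score matrix has one entry per (query, key) pair, so it holds L × L values, and float16 uses 2 bytes per value. The sketch below works through that arithmetic. It assumes a single sequence (batch size 1), dense storage with no masking or sparsity savings, and no framework overhead; the function name `score_matrix_bytes` is hypothetical and used only for illustration.

```python
def score_matrix_bytes(seq_len: int, bytes_per_element: int = 2) -> int:
    """Bytes needed for one dense seq_len x seq_len score matrix.

    bytes_per_element defaults to 2, the size of a float16 value.
    Assumes batch size 1 and a single head.
    """
    return seq_len * seq_len * bytes_per_element


size = score_matrix_bytes(1024)   # 1024 * 1024 * 2 bytes
size_mib = size / 2**20           # convert bytes to MiB
print(f"{size} bytes = {size_mib} MiB")
```

Under these assumptions the result is about 2 MiB per head. Because the size grows with the square of L, doubling the sequence length would quadruple it.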