4320 · Machine Learning · Easy · Numeric question · short

Attention Memory Footprint

Problem

A full-attention model processes a sequence of L = 1024 tokens and stores one L × L attention score matrix per head in float16. Roughly how much memory, in MB, does one head's score matrix use?


Your answer

Numeric value

Enter only the number itself; the unit is fixed as MB.
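
The arithmetic behind this question can be sketched in a few lines of Python. This is a minimal sketch, not an official solution: it assumes a single sequence (batch size 1), a square L × L score matrix for one head, and 2 bytes per float16 element. The helper name `score_matrix_bytes` is hypothetical and introduced only for illustration.

```python
def score_matrix_bytes(seq_len: int, bytes_per_element: int = 2) -> int:
    """Bytes needed for one head's seq_len x seq_len attention score matrix.

    Assumes batch size 1; float16 uses 2 bytes per element.
    """
    return seq_len * seq_len * bytes_per_element


n_bytes = score_matrix_bytes(1024)

print(n_bytes)           # 2097152 bytes
print(n_bytes / 2**20)   # 2.0 when 1 MB is taken as 2**20 bytes
print(n_bytes / 10**6)   # 2.097152 when 1 MB is taken as 10**6 bytes
```

Under either convention for MB the result rounds to about 2, and because the element count grows as L squared, doubling the sequence length would quadruple this figure.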