4316 · Machine Learning · Easy · Numeric · short
Attention Score Count
Problem
A Transformer layer processes L=256 tokens with H=8 heads. Ignoring the value dimension, how many raw attention score entries are formed across all heads?
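One way to set up the count, sketched under the assumption of plain self-attention over a single sequence with no causal or padding mask: each head compares every query position with every key position, so it forms an L × L score matrix, and the H heads together form H · L² entries. The function name `attention_score_count` and its parameters are hypothetical, introduced only for illustration.

```python
def attention_score_count(seq_len: int, num_heads: int) -> int:
    """Count raw (pre-softmax) attention scores across all heads.

    Assumes unmasked self-attention on one sequence: each head builds
    a seq_len x seq_len matrix of query-key scores.
    """
    scores_per_head = seq_len * seq_len  # one score per (query, key) pair
    return num_heads * scores_per_head


if __name__ == "__main__":
    # L = 256 tokens, H = 8 heads, as in the problem statement.
    print(attention_score_count(256, 8))  # 8 * 256 * 256 = 524288
```

The head dimension does not appear in the count, because each query-key dot product collapses it to a single scalar score.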