4316 · Machine Learning · Easy · Numeric · short

Attention Score Count

Problem

A Transformer layer processes L=256 tokens with H=8 heads. Ignoring the value dimension, how many raw attention score entries are formed across all heads?
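One way to set up the count, assuming standard unmasked self-attention in which every head forms a full L×L matrix of query-key scores (before softmax), is H · L · L. The short sketch below just evaluates that product; the function name `attention_score_entries` is a hypothetical helper introduced for illustration, not part of the problem.

```python
def attention_score_entries(seq_len: int, num_heads: int) -> int:
    """Count raw query-key score entries across all heads.

    Assumption: full (unmasked) self-attention, so each head
    produces a seq_len x seq_len score matrix.
    """
    per_head = seq_len * seq_len   # one score per (query, key) pair
    return num_heads * per_head    # heads are independent, so sum over them


total = attention_score_entries(seq_len=256, num_heads=8)
print(total)  # 8 * 256 * 256 = 524288
```

The head dimension and value dimension do not enter the count, since each score is a single scalar per (query, key) pair regardless of how wide the projected vectors are.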

Your answer

Numeric value

Accepts an integer, decimal, fraction, or arithmetic expression, as permitted by the question bank schema.