Problem 4326 | Machine Learning | Medium | essay, short

Length-Doubling Cost Shock

Problem

A local CNN with window size 7 scales like 7L interactions, while a Transformer attention block scales like L^2 score pairs. If L doubles from 256 to 512, by what factor does each interaction count grow, and which architecture hits the sharper scaling wall?
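One way to check the arithmetic is a small script that evaluates both cost models at the two sequence lengths. This is only a sketch under the problem's stated assumptions (7L interactions for the local CNN, L^2 score pairs for attention); the function names `cnn_interactions`, `attention_pairs`, and `growth_factor` are hypothetical helpers introduced here, not part of the original problem.

```python
def cnn_interactions(seq_len: int, window: int = 7) -> int:
    """Interaction count for a local CNN: each position sees `window` neighbours."""
    return window * seq_len


def attention_pairs(seq_len: int) -> int:
    """Score-pair count for one attention block: every position scores every position."""
    return seq_len * seq_len


def growth_factor(count_fn, old_len: int, new_len: int) -> float:
    """Ratio of the count at new_len to the count at old_len."""
    return count_fn(new_len) / count_fn(old_len)


if __name__ == "__main__":
    old_len, new_len = 256, 512
    for name, fn in [("CNN", cnn_interactions), ("Transformer", attention_pairs)]:
        print(
            f"{name}: {fn(old_len)} -> {fn(new_len)} "
            f"(growth factor {growth_factor(fn, old_len, new_len)})"
        )
```

Running the script prints the raw counts and the growth factor for each architecture, which can be compared directly against the answer fields below. The window size cancels out of the CNN ratio, so only the exponent on L determines each growth factor.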

Your answer

CNN growth factor

Transformer growth factor

Architecture with sharper scaling wall