4326 | Machine Learning | Medium | Short essay
Length-Doubling Cost Shock
Problem
A local CNN with window size 7 scales like 7L interactions, while a Transformer attention block scales like L^2 score pairs. If L doubles from 256 to 512, by what factor does each interaction count grow, and which architecture hits the sharper scaling wall?
Your answer
CNN growth factor
Transformer growth factor
Architecture with sharper scaling wall
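
Worked check
The arithmetic the problem asks for can be verified with a minimal Python sketch. The function names here are hypothetical, and the counts use only the problem's simplified models (7L for the local CNN, L^2 for attention), ignoring channels, heads, and boundary effects.

```python
# Simplified interaction-count models taken from the problem statement.
# Function names are illustrative, not from any library.

def cnn_interactions(length, window=7):
    """Local CNN: each position interacts with `window` neighbors."""
    return window * length

def attention_pairs(length):
    """Attention block: one score for every pair of positions."""
    return length * length

old_len, new_len = 256, 512

# Growth factor = count at the new length / count at the old length.
cnn_factor = cnn_interactions(new_len) / cnn_interactions(old_len)
attn_factor = attention_pairs(new_len) / attention_pairs(old_len)

print(f"CNN:       {cnn_interactions(old_len)} -> {cnn_interactions(new_len)} (x{cnn_factor})")
print(f"Attention: {attention_pairs(old_len)} -> {attention_pairs(new_len)} (x{attn_factor})")
```

Under these assumptions the CNN count goes from 1792 to 3584 (a factor of 2), while the attention count goes from 65536 to 262144 (a factor of 4). Doubling L doubles a linear cost but quadruples a quadratic one, so the Transformer attention block hits the sharper scaling wall.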