Orange's Short Thoughts
00:25 · May 2, 2022 · Mon
https://twitter.com/karlhigley/status/1520229418479894536?s=28&t=ISfEwwcP7fZM8h1-dLf2hg
Twitter
Karl Higley
If you (like me) have wondered what the feed-forward layers in transformer models are actually doing, this is a pretty interesting paper on that topic: arxiv.org/abs/2012.14913