Orange's Short Thoughts (橙子的短想法)
00:25 · May 2, 2022 · Mon
https://twitter.com/karlhigley/status/1520229418479894536?s=28&t=ISfEwwcP7fZM8h1-dLf2hg
Karl Higley
If you (like me) have wondered what the feed-forward layers in transformer models are actually doing, this is a pretty interesting paper on that topic: arxiv.org/abs/2012.14913
 
 