Comments on Machined Learnings: "Attention: Can we formalize it?"

Nikos (2016-01-08):

We are aware of an encouraging result for the case of static attention where the "parts" are features and there is no competition among them (i.e., the attention vector z does not have to sum to 1). This is the same as learning a sparse model, and Andrew Ng's analysis of L1 regularization ( http://ai.stanford.edu/~ang/papers/icml04-l1l2.pdf ) shows that it can exponentially reduce sample complexity, from O(number of features) to O(log(number of features)). At the same time, rotationally invariant methods (cf. the paper above) need O(number of features) samples. When I read the paper, a long time ago, I did not find the analysis very enlightening, but perhaps the ideas in it are headed in the right direction.
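[Editor's sketch] The sample-complexity gap Nikos describes is easy to see empirically. Below is a minimal sketch, not taken from the comment or from Ng's paper: the synthetic data, the scikit-learn Lasso/Ridge estimators, and the regularization strengths are illustrative choices. It fits an L1-regularized model and a rotationally invariant L2-regularized model to data with many features but only a few relevant ones, using far fewer training samples than features.

```python
# Sketch (assumptions): L1 (Lasso) vs. rotationally invariant L2 (Ridge)
# when only k << d features matter and n_train << d.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
d, k, n_train, n_test = 1000, 5, 100, 1000   # many features, few relevant, few samples

w_true = np.zeros(d)
w_true[:k] = rng.normal(size=k)              # sparse ground-truth weights

X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

# Hyperparameters below are illustrative, not tuned.
for name, model in [("L1 (Lasso)", Lasso(alpha=0.05)),
                    ("L2 (Ridge)", Ridge(alpha=1.0))]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.3f}")
```

In runs of this kind the L1-regularized fit typically recovers the sparse signal from a number of samples growing roughly like log(d), while the L2 fit keeps weight on all d features and generalizes poorly until n_train approaches d, which is the qualitative gap in Ng's analysis that the comment cites.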