In the blog you say there is a more efficient way to implement this ("see lecture at the top"). Do you mean the YouTube video at the top?
But there is no code explanation in the video. Do I have to watch the video and implement it myself, or are there any blogs about the more efficient way to implement self-attention?
Thanks a lot!