this is actually very interesting.
i take it you’ve heard of the concept of “mechanistic interpretability”? perhaps you could learn something about your networks by implementing some of that methodology. here’s a glossary. also recommend poking around neelanda’s blog if you want to learn more.
Thanks for sharing! These seem to focus on LLMs/transformers, but since they use MLPs I should be able to find a way to adapt them for my use!
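For anyone curious, here is a rough sketch of what that adaptation could look like on a plain MLP. It uses zero-ablation, a common interpretability technique: zero out one hidden unit at a time and measure how much the output moves. Everything here is an assumption for illustration (a one-hidden-layer ReLU network, pure-Python lists, and the hypothetical names `forward`, `ablation_effects`, and `random_params`); none of it comes from the glossary or any particular library.

```python
import random

def relu(x):
    return [max(0.0, v) for v in x]

def linear(W, b, x):
    # W is a list of rows; computes W @ x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(params, x, ablate=None):
    """One-hidden-layer ReLU MLP. If `ablate` is an index, zero that hidden unit."""
    W1, b1, W2, b2 = params
    h = relu(linear(W1, b1, x))
    if ablate is not None:
        h[ablate] = 0.0  # zero-ablation: remove this unit's contribution
    return linear(W2, b2, h)

def ablation_effects(params, inputs):
    """Mean absolute output change when each hidden unit is zeroed in turn."""
    n_hidden = len(params[0])
    effects = []
    for j in range(n_hidden):
        total = 0.0
        for x in inputs:
            base = forward(params, x)
            ablated = forward(params, x, ablate=j)
            total += sum(abs(a - b) for a, b in zip(base, ablated))
        effects.append(total / len(inputs))
    return effects

def random_params(n_in, n_hidden, n_out, rng):
    # Hypothetical stand-in for a trained network's weights.
    W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return W1, b1, W2, b2

rng = random.Random(0)
params = random_params(n_in=3, n_hidden=4, n_out=2, rng=rng)
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
effects = ablation_effects(params, inputs)
for j, e in enumerate(effects):
    print(f"hidden unit {j}: mean |output change| = {e:.4f}")
```

Units with a larger effect matter more to the output on these inputs. With a real trained network you would swap `random_params` for the actual weights and use inputs from the task's data.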