Playing around with the FOSS game Cataclysm DDA, I felt compelled to parse and connect the C++ and JSON to see relationships and complexity. It’s the first time I’ve really felt motivated to do so. I’m just trying to wrap my head around how some features, like z-levels, mining tools and various actions, are implemented; simple stuff really. I find it challenging to parse something quite this large, so I started scripting a way to track down objects across the codebase to see what is defined in JSON and what is hard-coded. Normal? Obvious? Are there FOSS alternatives for this? I’m basically chaining a bunch of grep commands to print pretty trees with bat.
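For anyone wanting to try the same thing without a grep chain, here is a rough Python sketch of the idea: collect the ids defined in JSON, then look for C++ string literals that match them. It assumes the JSON files hold objects (or arrays of objects) with a string "id" field and that hard-coded references show up as plain string literals in .cpp/.h files; both are assumptions for illustration, and the function names are made up, not anything from the game's own tooling.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Matches a double-quoted string literal on a single line (handles \" escapes).
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')


def collect_json_ids(json_root):
    """Map each string "id" in top-level JSON objects to the files defining it."""
    ids = defaultdict(set)
    for path in Path(json_root).rglob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        entries = data if isinstance(data, list) else [data]
        for entry in entries:
            # Assumption: only plain string ids are tracked here.
            if isinstance(entry, dict) and isinstance(entry.get("id"), str):
                ids[entry["id"]].add(str(path))
    return ids


def find_hardcoded_refs(src_root, known_ids):
    """Return {id: [(file, line_number), ...]} for ids quoted in C++ sources."""
    refs = defaultdict(list)
    for path in Path(src_root).rglob("*"):
        if not path.is_file() or path.suffix not in {".cpp", ".h"}:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for literal in STRING_LITERAL.findall(line):
                if literal in known_ids:
                    refs[literal].append((str(path), lineno))
    return refs
```

Calling `collect_json_ids` on the JSON directory and passing the result to `find_hardcoded_refs` with the source directory gives a first cut of which JSON-defined ids the C++ side names explicitly. Ids that never appear in the result are probably handled generically, though this will miss ids built at runtime or hidden behind macros.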
Gemini has a 1 million token context limit. Also, instead of just giving it the entire source, you can give it a list of files and the ability to query them (e.g. to read an entire file, or to search for usages/definitions of terms).
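To make the "list of files plus the ability to query them" idea concrete, here is a minimal sketch of the three operations as plain Python functions. The class and method names are hypothetical, and how they get registered as tools with a particular model is left out on purpose, since that depends on the provider's API.

```python
from pathlib import Path


class CodebaseTools:
    """Hypothetical read-only helpers a model could call instead of receiving the whole source."""

    def __init__(self, root, max_chars=20000):
        self.root = Path(root).resolve()
        self.max_chars = max_chars  # cap on how much of one file is returned

    def _resolve(self, rel_path):
        """Resolve a relative path and refuse anything outside the root."""
        path = (self.root / rel_path).resolve()
        try:
            path.relative_to(self.root)
        except ValueError:
            raise ValueError(f"path escapes the codebase root: {rel_path}")
        return path

    def list_files(self, suffixes=(".cpp", ".h", ".json")):
        """Return sorted relative paths of files with the given suffixes."""
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file() and p.suffix in suffixes
        )

    def read_file(self, rel_path):
        """Return the file's text, truncated to max_chars."""
        text = self._resolve(rel_path).read_text(encoding="utf-8", errors="replace")
        return text[: self.max_chars]

    def search(self, term, max_hits=50):
        """Return up to max_hits (file, line_number, line) tuples containing term."""
        hits = []
        for rel in self.list_files():
            text = (self.root / rel).read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if term in line:
                    hits.append((rel, lineno, line.strip()))
                    if len(hits) >= max_hits:
                        return hits
        return hits
```

The point of the caps (`max_chars`, `max_hits`) is that each call returns a small slice, so the model pulls in context as it needs it rather than starting with everything loaded.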
In my experience, token limits mean little even with larger context windows. 1 million tokens can easily be taken up by a very small number of complex files. It also doesn’t do a great job of traversing a tree to selectively find context, which seems to be the most limiting factor I’ve run up against when trying to incorporate LLMs into complex and unknown (to me) projects. By the time I’ve sufficiently hunted down and provided the context, I’ve read enough of the codebase to answer most of the questions I was going to ask.
Right, but presumably you can let the AI do that hunting.
Haven’t tried Gemini; it may work. But in my experience with other LLMs, even if the text doesn’t exceed the token limit, they start making more mistakes and behaving strangely more often as the context grows.