If anyone is interested, I can do some work on prototyping an application for this. What I would need is somebody to generate some training data.
First, to deal with the "shadow problem": if someone can generate screenshot pairs of game screens with lighting turned on and lighting turned off, we can use those to train a model to de-shadow an image, producing a texture layer and a shadow layer. This should be done programmatically so the screenshots match up exactly - record a demo of walking around a map, then on playback get the engine to screencap each frame once normally and once with lighting disabled. The same could be done for specular highlights, but it's best to start off basic first.
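To make the idea concrete, here's a minimal sketch of what the training side could look like. Everything here is an assumption on my part: the `lit/` and `unlit/` folder layout, the tiny conv net (a real de-shadowing model would want a U-Net or similar), and the L1 loss are all just placeholders to show the paired-image setup.

```python
# Sketch only: assumes a dataset/ folder with lit/ and unlit/ subfolders
# containing identically-named, pixel-aligned screenshots.
import os
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image

class LitUnlitPairs(Dataset):
    def __init__(self, root):
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "lit")))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        lit = read_image(os.path.join(self.root, "lit", self.names[i])).float() / 255.0
        unlit = read_image(os.path.join(self.root, "unlit", self.names[i])).float() / 255.0
        return lit, unlit

# Toy stand-in for a real de-shadowing network.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(LitUnlitPairs("dataset"), batch_size=4, shuffle=True)

for lit, unlit in loader:
    pred = model(lit)                          # predicted texture (unlit) layer
    loss = nn.functional.l1_loss(pred, unlit)  # L1 tends to keep textures sharp
    opt.zero_grad()
    loss.backward()
    opt.step()
    # One crude way to recover a shadow layer afterwards is the per-pixel
    # ratio lit / max(pred, eps), though that's just a first guess.
```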
I'm not familiar with photogrammetry algorithms, but we can give that a shot. Off the top of my head, we could train another model to add depth information to a screenshot (video would probably work even better, but the single-screenshot case is a simpler starting point), with the dataset generated the same way as above, just with a depth-based shader instead of disabled lighting. Stitching the depth images together into 3D space should then be easier. Otherwise, there are tonnes of other algorithms to look into as well (e.g. https://github.com/natowi/3D-Reconstruction-with-Deep-Learning-Methods ).
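For the stitching part, here's a rough sketch of how a single depth screenshot could be back-projected into a 3D point cloud, assuming a simple pinhole camera model. The field of view and the way the depth shader encodes values are assumptions that would need to match the actual engine.

```python
# Sketch: back-project a depth screenshot into a 3D point cloud (numpy only).
# Assumes eye-space depth values and a pinhole camera; fov_y_deg is a guess
# and would depend on the game's actual projection settings.
import numpy as np

def depth_to_points(depth, fov_y_deg=90.0):
    """depth: (H, W) array of eye-space depth values -> (H*W, 3) points."""
    h, w = depth.shape
    f = (h / 2.0) / np.tan(np.radians(fov_y_deg) / 2.0)  # focal length in pixels
    u, v = np.meshgrid(np.arange(w), np.arange(h))       # pixel coordinates
    x = (u - w / 2.0) * depth / f
    y = (v - h / 2.0) * depth / f
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Point clouds from successive frames could then be merged once the camera
# pose per frame is known - the demo playback could log position and
# orientation directly, which sidesteps pose estimation entirely.
```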
So if anyone wants to take the dataset task on, or if anyone has old gaming hardware to devote to this, post here I guess. For anyone interested in the deep learning side of things, trying to implement some of these algorithms could be a good way to get into AI stuff.