This is some really good, high-effort stuff, of the kind you don't see often in AI discussions. Usually when people are skeptical about existential risk they don't take the time to build up the argument (because they think the whole AI-will-destroy-us thing is stupid and not worth the effort), so it's cool that you did!
I think the two big sticking points here where Yudkowsky (and other AI safety folks like me) will disagree with you are the "Free Will" distinctions and the "how does AI gain consciousness" boxes.
- Free Will: Nobody seriously thinks that AIs will gain "free will", whatever that means, and deviate from their programming because of it. The distinction is not between "has free will" and "follows its programming" so much as between "is programmed in a way that does what we want" and "is programmed in a way that has unforeseen consequences", as you put it. Getting the AI to do what we want isn't trivial: we're very good at making AIs that can do complex things, but we're struggling to make them do things within restrictions we like (see, for example, Bing Chat going off the rails even though it was likely trained with some RLHF).
- Consciousness: I think you're conceptualizing superintelligence and consciousness as a "package deal", where you have to have all the things humans have (self-awareness, emotions, desires, etc) to be able to outsmart humans. The part where you wrote "[assuming that] the AI will recognize it as consciousness and not simply throw it out as unhelpful data" especially seems to imply that consciousness is something we'd need to explicitly program in, or that the AI would need to deliberately recognize to reap its benefits and attain superintelligence.
That's not really what machine learning advancement has looked like recently.
It's more like: you train a machine on meaningful semi-structured data (eg conversations people have on the internet) and you nudge it towards predicting the next bit of data (eg the next word; but it can also be hidden patches of an image, the next frame of a video, etc). The adjustment it makes is called backpropagation; it strengthens the weights that lead to successful predictions and weakens the others, kind of like how dopamine strengthens connections between your neurons when something positive happens to you. (That's what people mean when they talk about AI reward; it's not a literal payment, it's a change to the weights.)
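To make that concrete, here's a toy version of the training loop I'm describing — very much a sketch, using PyTorch with made-up sizes and a trivially simple model, not anything like a real GPT setup:

```python
# Toy next-token-prediction training step (sketch; sizes and model are made up).
import torch
import torch.nn as nn

vocab_size, embed_dim, context = 1000, 64, 32

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),          # token IDs -> vectors
    nn.Flatten(),
    nn.Linear(embed_dim * context, vocab_size),   # scores for "which token comes next?"
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(tokens):
    # tokens: (batch, context + 1) integer IDs; the last column is the "next word" to predict
    inputs, targets = tokens[:, :-1], tokens[:, -1]
    logits = model(inputs)
    loss = loss_fn(logits, targets)   # how wrong was the prediction?
    optimizer.zero_grad()
    loss.backward()                   # backpropagation: assign credit/blame to each weight
    optimizer.step()                  # strengthen/weaken the weights accordingly
    return loss.item()

# Fake batch of token IDs, just to show the call.
print(train_step(torch.randint(0, vocab_size, (8, context + 1))))
```

Real models swap that `nn.Sequential` for a transformer and train on trillions of tokens, but the loop — predict, compare, backpropagate, adjust — is the same.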
Anyway, at first the AI learns very simple patterns, eg that after "Hello, how are", the next word is likely to be "you". It learns to identify idioms, synonyms, grammar, etc. As you increase the scale of your model and continue the training process, it starts to learn more abstract patterns, eg "If what I'm reading is a conversation between Alice and Bob and the last sentence is Alice asking a question, then the next sentence is probably Bob answering that question." It starts to learn actual facts about the world (or at least the world seen through the lens of reading everything ever posted on Reddit). An early model will be able to complete "The capital of France is [_]" and "The biggest museum in Paris is [_]" but won't be able to complete "The biggest museum in the capital of France is [_]", because that exact sentence doesn't show up in its training corpus; a more advanced model will be able to complete it, because it starts having an underlying concept of "France" and "Paris" and "capital" and is capable of generalizing.
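You can poke at this yourself with off-the-shelf models. A quick sketch, assuming the Hugging Face `transformers` library; I'm picking the small `gpt2` checkpoint arbitrarily, and the interesting part is how the third completion changes as you swap in bigger models:

```python
# Sketch: compare completions for "memorized" facts vs a fact that must be composed.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # small model, chosen arbitrarily

prompts = [
    "The capital of France is",
    "The biggest museum in Paris is",
    "The biggest museum in the capital of France is",  # requires composing two facts
]

for prompt in prompts:
    out = generate(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    print(out)
```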
Anyway, my point is, as the scale increases, the model keeps the same training task ("predict the next word"), but to increase its "score" on that task it needs to understand more and more abstract concepts. It starts to understand planning (or more accurately, it starts to understand the concept of "writing down the steps of a plan"), which is why you can greatly improve the performance of an LLM by telling it "write down the steps of your plan before giving your answer". It understands differences of opinion and psychology well enough that it can give you a summary of a conversation that evokes concepts the participants may not have mentioned explicitly. It starts to understand chess notation well enough to play chess. It starts to understand programming, which is why Copilot is an extremely powerful assistant.
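That "write down your steps" trick is literally just a change in how you phrase the prompt. Here's a made-up example (the question and exact wording are mine, not from any particular paper):

```python
# Sketch of "direct answer" vs "write down your plan first" prompting.
question = "A train leaves at 9:40 and the trip takes 2h35m. When does it arrive?"

direct_prompt = f"{question}\nAnswer:"

stepwise_prompt = (
    f"{question}\n"
    "Write down the steps of your plan before giving your final answer.\n"
    "Step 1:"
)

# The second prompt usually does better on multi-step problems: the model's
# next-word prediction now gets to condition on its own intermediate steps
# instead of jumping straight to the answer.
print(direct_prompt)
print(stepwise_prompt)
```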
Note that most of these things aren't especially amazing by themselves (except Copilot; Copilot is sorcery); the amazing thing is that the AI can do all those things without having been trained to do them.
Researchers didn't need to understand chess, or C# programming, or the finer points of jurisprudence for the AI to develop a deeper understanding of them. They just trained the AI on "predict the next token", applied backpropagation, and that process led to the AI developing higher- and higher-level concepts over time.
It's not clear that this process will lead to something we'd recognize as consciousness. But it could lead to an AI that's smarter and faster than us, without that AI being something we'd call "conscious".
That AI wouldn't "want" anything, the same way GPT-4 doesn't currently want anything. But it's extremely easy to stick a small program on top of the AI that makes it behave like something that does want things (this is more or less what OpenAI did with ChatGPT, and more explicitly what people did with AgentGPT). Basically, if you have an AI that does nothing but answer questions, and you want to get an autonomous agent from it, all you have to do is stick a module on top of it that asks the AI "What would an autonomous agent do in this situation?"
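Here's roughly what that "module on top" looks like, as a toy sketch — `ask_llm` and `execute` are hypothetical placeholders for your model and tools, not any real product's API:

```python
# Sketch: wrapping a question-answering model in a loop to get agent-like behavior.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in whatever model you're using")

def execute(action: str) -> str:
    # Real agent wrappers dispatch this to tools (web search, code execution, etc).
    return "stub result"

def run_agent(goal: str, max_steps: int = 10) -> None:
    history = []
    for _ in range(max_steps):
        prompt = (
            f"You are an autonomous agent pursuing this goal: {goal}\n"
            f"Actions taken so far: {history}\n"
            "What single action would an autonomous agent take next? "
            "Reply DONE if the goal is achieved."
        )
        action = ask_llm(prompt)     # the model still just predicts text...
        if action.strip() == "DONE":
            break
        history.append((action, execute(action)))  # ...but the loop turns it into behavior
```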
Anyway, this was a longer post than I anticipated, but I hope I made the core point clear: you don't need to understand higher intelligence to create a model with higher intelligence. As long as you keep increasing your model's scale, and you train it on a corpus of tasks where higher intelligence leads to a better score, the model will get more intelligent over time, and understand more abstract concepts.