I remember reading an English anthropologist who argued that the opposable thumb was the critical feature. No doubt it makes it possible to make and use more complex tools. But for thumbs to do that, brains must get larger. That is problematic enough in itself. But this is taking place at the same time the upright stance is evolving, since the hands with opposable thumbs were being used for things other than walking. For us to stand (not just run) upright, the architecture of the hips and pelvis had to change, narrowing the birth canal. So nature was asking women to deliver large-brained babies through shrunken birth canals.
That problem was solved by delivering offspring BEFORE their skulls were fully developed. But then babies entered the world as incomplete skulls with useless ganglia dangling from them. At least two years had to pass before they were developed enough to survive without constant attention. That meant males had to contribute more than their biologically essential minute or two of sperm delivery. That is where we get the "family."
And the whole thing depends on the coevolution of speech and language, which required altering our air passages in ways that reduced our breathing abilities. Fortunately, developing language made it possible to organize cooperative efforts that compensate for the declining physical abilities of individual organisms. Now we get rudimentary social systems, which language has to develop further so that individual behaviors can be coordinated across distance and time.