It is a natural alternative. The "simpler but more cores" approach works on paper (i.e. potential instruction throughput). In reality it falls apart for a variety of reasons. The most fundamental is
the difficulty of exploiting thread-level parallelism. Complex out-of-order cores do a really good job of improving throughput by finding independent instructions to execute in parallel; parallelism is much easier to extract at the granularity of instructions than at the granularity of cores. Parallel programming is hard. Amdahl's law cannot be avoided except through speculation, so we are back to complexity again.
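To make the Amdahl's law point concrete, here's a minimal sketch (my own illustrative function and numbers, not from the parent comment) of how the sequential fraction caps the speedup of a sea of wimpy cores:

```python
# Amdahl's law: with parallel fraction p and n cores,
# speedup = 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, 64 cores top out
# well below 64x, and adding cores asymptotes at 1/(1-p) = 20x:
print(amdahl_speedup(0.95, 64))    # ~15.4x
print(amdahl_speedup(0.95, 1024))  # ~19.6x
```

No amount of extra cores fixes the `(1 - p)` term; only making the sequential part faster (i.e. complex, speculative cores) does.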
> In reality it falls apart for variety of reasons.
On the contrary, the "many wimpy cores" approach has been very successful on GPUs (and other vector processors such as TPUs). What it hasn't been successful at is running existing software unmodified.
The best solution (the one we use today, in fact) seems to be a hybrid approach. We have a few powerful cores to run sequential code and many weaker units (often vector units; e.g. GPUs and SIMD) to run parallel code. Not all algorithms can be parallelized, so we'll likely always need a few fast sequential cores. But a lot of code can.
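A toy cost model (my own illustrative numbers, not anyone's real hardware) shows why the hybrid wins: the fast core absorbs the sequential fraction while the wide, weak unit absorbs the parallel bulk.

```python
# Toy model: a fast core runs the sequential fraction, n wimpy
# cores run the parallel fraction. Speeds are relative to one
# baseline core; all figures here are purely illustrative.
def hybrid_runtime(par_fraction, fast_core_speed, n_wimpy, wimpy_speed):
    seq = (1.0 - par_fraction) / fast_core_speed
    par = par_fraction / (n_wimpy * wimpy_speed)
    return seq + par

# 90% parallel work: one 4x core plus 128 quarter-speed cores...
hybrid = hybrid_runtime(0.9, 4.0, 128, 0.25)
# ...versus the single 4x core doing everything by itself.
fast_only = hybrid_runtime(0.9, 4.0, 1, 4.0)
print(1.0 / hybrid, 1.0 / fast_only)  # speedups vs. one baseline core
```

Under these made-up numbers the hybrid is several times faster than the fast core alone, yet it still needs that fast core: replace it with a wimpy one and the sequential term dominates again.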
Amdahl's law for sure. We all know most software sucks, so in this vision (simpler cores) we'd take the performance hit and optimize elsewhere. For example, older ARM-based systems do not use speculative execution, and can handle encryption and video transcoding through co-processors.