In some/many cases there is code before the branch that does not influence the branch condition. In that case, would it be possible to architect the cpu such that it could execute the branch 'early'? I'm thinking of a special asm instruction like 'execute the next 4 instructions, and only then do the conditional jump'.
Yes, that was made explicit in the MIPS instruction set with branch (and load) delay slots, and it's implicit in out-of-order processors. As I understand it the branch delay slot paradigm did not pan out as well as was hoped and it has not found its way into many other architectures.
The issue with the branch delay slot is that it assumes there is exactly one clock cycle (I.e. one stage) between fetch and execute. This was true on some of the early 5 stage in order riscs, but hasn't been true for a while. In an high frequency in order design there are maybe a dozen stages/clock cycles which would be hard to fill with delay slots. OoO is even more complicated.
That’s called delay slots. It turns out to be so hard to work out what to put in them that they’re basically useless. Look at the massive flop that was the Itanium before you try it again!
ARMv8-M has a set of extensions for this kind of branch. There is a set of prepare-to-branch instructions as well as counted loop instructions. In order to support a variable-length delay slot, they don't have very many options. In order to handle interrupts, you still need an ordinary branch instruction from the launch point as well. So if you take an interrupt in the middle of the sequence, the ordinary branch instruction still gets executed.
E.g. instead of:
we would have: