I'm probably late to the game with this, but if you don't know about the NEC2 multithreaded engine, now you do.
I ran a parabolic dish sample file in 4nec2 with this engine last night. The reference PC was, I think, a PIII; I'm running a Haswell. The reference PC took 40 minutes, and I was done in 40 seconds, roughly a 60x speedup. Of course, processor speed would need to be accounted for, since not all of that gain comes from multithreading. I'm also running 4nec2 under Wine, but I assume that doesn't affect the multithreading.
Obviously, the biggest payoff from the speed increase would come when you run the optimizer, since it evaluates the model many times over.