Parallel computing is an intricate mix of marketplace requirements, architectural understanding, technology, and cost. One controlling myth is that high-volume commodity processors must, by the nature of things, be the common building blocks for both desktop clients and room-size servers. This myth, and the supporting myth of architectural convergence between clients and servers, should be subject to dispassionate analysis. With only one program counter per processor, conventional processors are becoming increasingly unresponsive in spite of faster clock rates. We will show that reliance on Instruction-Level Parallelism (ILP) for performance drives processor state upward. When this massive state is not distributed across multiple program counters, processors choke on their own expensive context switches, here reconceptualized to reveal their true cost. Within the framework of Little's law from queueing theory, we analyze conventional RISC superscalar processors as a case study of the inadequacy of the class of "ILP" processors. We contrast this with multithreaded processors that exploit both ILP and Thread-Level Parallelism (TLP). As a contribution to parallel programming, we show how data caches in multithreaded architectures can be used to manage speculative state and to perform atomic updates involving multiple variables. There is no "convergence architecture"; there are only divergence architectures.
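As a rough illustration of the Little's law framing (the symbols and the numbers below are our own, not drawn from the analysis itself), the law relates the concurrency a processor must keep in flight to the product of the rate at which it issues long-latency operations and the latency of those operations:

\[
N = \lambda \cdot T
\]

where \(N\) is the number of operations outstanding, \(\lambda\) is the sustained issue rate, and \(T\) is the average latency per operation. For example, sustaining two memory references per cycle against a 100-cycle memory latency requires on the order of \(N = 2 \times 100 = 200\) references in flight; an ILP processor must hold all of that state under a single program counter, whereas a multithreaded processor can spread it across many.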