With the upcoming Tremor release, 0.9.0, we're moving from threads as a basis for ramps and pipelines to async tasks.
Let's talk about why this is significant, what is changing, and how the architecture is changing.
Note that this is not a comprehensive treatise on threads or async tasks.
The Tremor That Was (threads)
Threads are a basic building block of programs that run multiple pieces of code concurrently. The operating system is responsible for coordinating across competing resource demands.
The OS can preempt, pause, and resume threads. We can leverage infinite or tight loops without the risk of completely blocking the system. These guarantees make concurrent code more accessible, with tools like
crossbeam-channels to build upon.
Threads work especially well in use cases where the system and logical concurrency models are well aligned; or, we can map application threads to logical cores on the system being used. Each thread can happily work away on its part of the logic and pass the result on to the next. The one thread per core model is what tremor 0.8 and earlier used. We had a thread for the onramp, a thread for the pipeline, and a thread for the offramp. As the computational cost of decoding, processing, and encoding was often in the same ballpark, this worked exceptionally well. We managed to push up to 400MB/s of JSON through the system this way (including parsing, tremor-script logic, and serialization).
This design can degenerate badly if there are more ramps and pipelines than cores on the system in use. Throughput degrades rapidly (as in up to 2 orders of magnitude worse at 30:1 ratio). At the time of writing this, the deployment model was one pipeline/ramp group on a four-core system, so it worked well in practice.
However, this places a burden on operators having to think about concurrency and parallelism to tune tremor for optimal performance and capacity.
In SMP systems, we observe other undesireable effects: The moment two communicating threads don't share the same underlying cache, performance plummets. This scenario exists when threads reside on two different CPUs or CCXs (thank you AMD for making me learn so much about CPU caches). As long as two communicating threads share the same cache, data shared between them can avoid trips to main memory and cache coherency protocol overheads. When two threads communicate across different caches, reads/writes may catastrophically collide and introduce overheads that drastically reduce overall performance.