Skip to main content

· 6 min read
The Tremor Team

With the upcoming Tremor release, 0.9.0, we're moving from threads as a basis for ramps and pipelines to async tasks.

Let's talk about why this is significant, what is changing, and how the architecture is changing.

Note that this is not a comprehensive treatise on threads or async tasks.

The Tremor That Was (threads)

Threads are a basic building block of programs that run multiple pieces of code concurrently. The operating system is responsible for coordinating across competing resource demands.

The OS can preempt, pause, and resume threads. We can leverage infinite or tight loops without the risk of completely blocking the system. These guarantees make concurrent code more accessible, with tools likecrossbeam-channels to build upon.

Threads work especially well in use cases where the system and logical concurrency models are well aligned; or, we can map application threads to logical cores on the system being used. Each thread can happily work away on its part of the logic and pass the result on to the next. The one thread per core model is what tremor 0.8 and earlier used. We had a thread for the onramp, a thread for the pipeline, and a thread for the offramp. As the computational cost of decoding, processing, and encoding was often in the same ballpark, this worked exceptionally well. We managed to push up to 400MB/s of JSON through the system this way (including parsing, tremor-script logic, and serialization).

This design can degenerate badly if there are more ramps and pipelines than cores on the system in use. Throughput degrades rapidly (as in up to 2 orders of magnitude worse at 30:1 ratio). At the time of writing this, the deployment model was one pipeline/ramp group on a four-core system, so it worked well in practice.

However, this places a burden on operators having to think about concurrency and parallelism to tune tremor for optimal performance and capacity.

In SMP systems, we observe other undesireable effects: The moment two communicating threads don't share the same underlying cache, performance plummets. This scenario exists when threads reside on two different CPUs or CCXs (thank you AMD for making me learn so much about CPU caches). As long as two communicating threads share the same cache, data shared between them can avoid trips to main memory and cache coherency protocol overheads. When two threads communicate across different caches, reads/writes may catastrophically collide and introduce overheads that drastically reduce overall performance.

· One min read
The Tremor Team

We had the great pleasure of spending some time with the crowd of Rust & Tell today. At the meetup, we got the chance to share some of the stories of tremor and Event Processing Cartography. Long story short, if you're curious, the talk was recorded, and we got the slides right here!

· 9 min read
The Tremor Team

Yesterday, we spent the day on a report that our influx parser was slow, and it turns out it was indeed.

This is an exciting topic as a few days ago, we gave a talk at BoBKonf 2020 on this topic, so this is a great opportunity to show some of the topics and our process in action.

All the topics in this blog are links; the main one above this text is to the pull request, the titles of each section link to the commit that implements the topic discussed. Go ahead, click on some, and take a look!

There are two tools worth introducing here that we used during this performance session.

One is perf, which we used with a minimal setup of perf record and perf report. We use this to get a glance at where code is spending time. This is not perfect, but it is quick for decent results.

The other one is criterion, an excellent Rust framework for microbenchmarks based on the haskell framework with the same name. It is so helpful since it allows us to show changes in performance between changes. That makes it perfect for the kind of incremental improvements our process favors.