Hygienic error handling and validation for pipelines

September 24, 2022 · 7 min read

Tremor 2022 Summer Mentee

Introduction

Hello, my name's Carol Geng, a current sophomore pursuing a Bachelor's degree in Computer Science at Texas A&M University. Over Summer 2022, I have contributed to Tremor as part of the LFX Mentorship Program with my mentors Heinz Gies and Matthias Wahl, and this blog will show how valuable and enjoyable the experience was.

About the Project

Tremor is an event processing engine that uses pipelines and connectors for data to be passed through. However, errors could be made by the user when linking the ports of the pipelines, and those errors are not easily displayed to the user. There were cases where the location of the error wasn’t printed, nothing was printed at all, or even cases where the program would work as if everything was all fine. My mentorship project was focused on creating the messages that would be displayed to the user in a clear and concise manner through the console.

The Mentorship Journey

Before this mentorship, I had only worked in open-source a few times on projects and had no knowledge of how to code in Rust. This mentorship therefore allowed me to build on what I already knew while also giving me an opportunity to learn a new language. Throughout this mentorship, I was able to get a start in DevOps and learn how to work on the compiler and source code for a software program while contributing to the user experience of everyone who uses Tremor and whoever will use Tremor in the future.

Starting Tremor

To learn how to contribute to Tremor’s source code, I had to first understand how Tremor worked and what it did. I was introduced to pipelines, ports, and scripts and was allowed to mess around with the code. Tremor’s code allowed the user to go into several commands to process and change the input into the desired output. Through playing around with Tremor, I decided to add a function to reverse a string with Tremor’s pipelines. Because tremor did not have a built-in reverse string function, I went into the source code to implement a reverse function in Rust along with several tests and documentation for the reverse function. This relatively simple function is where I made my first pull request and also where I first started getting familiar with Tremor.

Additionally, I had used other programming languages in my academic career, and therefore did not have any experience with Rust beforehand. This is when I got introduced to Rustlings, which had small exercises to help me with starting with Rust syntax and writing code in Rust. There were also a lot of functions unique to Rust in terms of how the code worked, which included enums, move semantics, and structs. This also included constructs like Ok() and concepts like async/await which I would end up utilizing in my project.

Understanding Code

A large portion of my mentorship focused on understanding the errors that were printed out. There were several different kinds of errors, such as ones that dealt with the console input, console output, pipeline input, and more, which validated whether the input or output existed in the code and where it led to, another connector or the console. While this may seem simple, this ended up leading me into the rabbit hole of the source code as I traversed through numerous files and functions to determine where the error went through and what I could do to the preexisting code to read and perform functions onto the error.

To do a lot of this, I learned the process of error logging. I defined each error there in great detail, such as what it is, what it does currently, what it’s supposed to do, and gradually updated it every time it was needed. Throughout implementing messages to these errors, I also learned more about version control with git. While I did have practice on the basics like git pull and git push, I also learned about git rebase, git merge, and more.

Error Handling

Error handling was the main focus of the project. As my mentor suggested, I focused on one error at a time from the error logs. First, I focused on the worst case- a case where nothing was printed and the program ran as if everything was normal, except there was no output.

pipeline_out_error.troy

# Our main flow
define flow main
flow
  # import the `tremor::connectors` module
  use tremor::connectors;
  use lib::pipelines;

  # create an instance of the console connector
  create connector console from connectors::console;

  # create an instance of the passthrough pipeline
  create pipeline main from pipelines::main;

  # connect the console (STDIN) to our pipeline input
  connect /connector/console/out to /pipeline/main;

  # then connect the pipeline output to the console (STDOUT)

  # no doesn't exist, bad error
  connect /pipeline/main/no to /connector/console/in;

end;
# Deploy the flow so tremor starts it
deploy flow main;

This error focused on the fact that the output did not exist, so to fix the error, I wrote a separate function that checked whether the output existed, then connected it to the ConnectInput structs while sending an error statement and a status report to the system. This process introduced me to focusing on small incremental steps rather than solving the problem as a whole and allowed me to not get overwhelmed. Additionally, outputs had to be added to the Executable Graph struct, which led me to implementing outputs in related functions and building a hashmap for the representation of its graph. With that, the basic problem of catching the errors was solved for that error problem.

After that led to defining what the error was, which involved adding a port struct to both ConnectInput and ConnectOutput to connect inputs. This port struct wasn’t needed before but with changes that were made later on, port proved to be very useful in defining what the port was to be able to be used in other code. Because this field was added to the struct, changes then had to be made to all the functions that used this struct and a port had to be defined in those functions.

Last was implementing where the code was to the user. This focused heavily on adding transmitters and receivers to output the status and mapping the status when the user’s code had errors. The transmitters and receivers, tx and rx had to be added to the ConnectInput and ConnectOutput structs, then weaved together with the other types of code to be defined and used. This involved many functions to be modified in order to return a result. In this process, there were also bugs to be fixed in the preexisting code and tests that had to be corrected in order to make sure tx and rx were properly working. Afterwards, Result::map_err was used in order to map out the results with its parameters through rx if a user inputted errors.

Throughout the programming, I had to search for the files to determine where the code was and what function utilized the next function. This led me to search into files to see where the error travels through and trace where the error goes as I searched through the declarations, definitions, and references of variables and functions. I also got to learn several handy keyboard shortcuts throughout this process as demonstrated from my mentor.

Conclusion

Overall, working with Tremor was extremely fun and valuable as I not only got to contribute to an incredible project, but also got to learn more about programming and the open-source world. I was able to while meeting amazing people to guide me with my work, and I can not imagine it to be any other way.

Support for the ClickHouse database in Tremor

July 7, 2022 · 6 min read

Sasha Pourcelot

Tremor 2022 Spring Mentee

I'm Sasha, a Computer Science student living in south-eastern France. I contributed to Tremor as part of the Database Connector mentorship proposed by the LFX Mentorship Program for Spring 2022. I was mentored by Matthias Wahl and got help from Darach Ennis, Heinz Gies and Ramona Łuczkiewicz.

This blogpost summarizes my work at Tremor as a mentee and shows what could be done next.

About Tremor

Tremor is an event processing system with the goal of allowing users to handle a high volume of messages by viewing them as a stream flowing between different nodes of a graph. Events go in and out of this graph thanks to connectors.

The interactions between each node and the connector configuration are defined using the troy programming language.

Abstract

My mentorship at Tremor was focused on building a connector for the ClickHouse database engine. More specifically, I was focused on the sink part, the one that allows data to flow out of the Tremor application.

ClickHouse is a database management system designed to allow for real-time analysis of high volumes of non-aggregated data. Its initial goal was to power the Yandex.Metrica analytics platform.

The ClickHouse connector

The next subsections describe what I did during the mentorship.

Interacting with a ClickHouse database in Rust

My first goal was to find a way to send requests to a ClickHouse node from a Rust program. It was a good start because it allowed me to experiment on the ClickHouse side without caring about what was happening in Tremor. I spent around three weeks playing on a separate repository named scrabsha/plays-with-clickhouse and getting extensive knowledge from Matthias about how Tremor works from the inside.

I found two Rust crates that could help us send requests to a ClickHouse database: clickhouse and clickhouse-rs. clickhouse is the first one I considered. I had to discard it because it was focused on concrete Rust types and needed to know at compile time what each datatype is composed of. The second crate was a bit more low-level and allowed us to do what we need.

Writing a super simple sink

Once I got every ClickHouse detail right, I started playing with the Tremor side. There were multiple connectors already implemented in Tremor. I picked the simplest ones and started copying parts of it and quickly got something working.

The only challenge I encountered is that Tremor defines its own Value type, representing a value whose type is not really known at compile time, and uses it a lot. It was a bit disconcerting doing such things in Rust, as I felt I was writing dynamically-typed code in a statically-typed language, but I managed to get through it.

Converting Tremor values to ClickHouse values

Once I was able to insert data in a database from Tremor, I started writing a conversion bridge for ClickHouse types, i.e. something that would handle the conversion of native Tremor values to their corresponding ClickHouse types.

Most of the conversions are quite simple: an integer can be obtained from an integer, a string can be obtained from any string, and so on.

Some other conversions were less obvious. For instance, in order to create a ClickHouse IPv4 type, we could either use a string representing the address, or an array of four integers. The goal was to make every ClickHouse type constructible from a Tremor value. I tried to make every conversion as obvious as possible, and to document them as much as possible.

The first implementation of this mapping function was huge. It was about 250 lines of tricky and slightly incorrect code. I managed to rewrite it from scratch using another approach and it got way better.

Working on this specific part led me to open three pull request to the clickhouse-rs crate:

Testing

The conversion function described in the previous subsection was fairly complex. Each possible conversion and cast has been carefully tested in order to ensure that it behaves correctly. These tests were greatly simplified thanks to the use of declarative macro.

Some other tests were focused on testing the sink as a whole, and how it would interact with a ClickHouse database. In this kind of test, we would run ClickHouse Docker containers, create a ClickHouse sink, plug the sink to the container, send some events, and see what has been inserted in the ClickHouse side.

What to do next?

Improving the sink

Automatically getting the table schema

It turns out that ClickHouse has a DESCRIBE TABLE statement, which allows to gather information about each column of a specific table. We currently rely on the end-user to provide us valid information about the table columns. Using this statement would reduce the amount of information we require from them, hence making it simpler to use.

Inserting data concurrently

The current implementation of the ClickHouse container uses a single database connection object, which is reused across insertions. A good way to improve this system is to use multiple concurrent connections, so that we could insert multiple batch of events at once.

A similar pattern has already been implemented in Tremor as part of the elastic connector. As such, most of the required machinery is already there.

The source part

This mentorship was focused on building a sink for ClickHouse. The next step could be to add a source connector, so that coming from a ClickHouse database can be ingested directly into a Tremor application.

ClickHouse has a feature allowing to watch for updates in a given table. This way, we can simulate a stream of data, and make it flow in the Tremor system. Each time some data is inserted in the table, we can retrieve it, convert it into Tremor value and make it flow into the graph of nodes described earlier.

This is something that we totally want to see in Tremor in the future. We can't wait for the WATCH statement to be considered stable.

TDengine colaboration

June 15, 2022 · 2 min read

Tremor Team

The best aspect of open source software is collaboration, not only of systems but of the people involved. It is wonderful to see them come together to take multiple parts to build something bigger out of them.

In this spirit, we had the chance to collaborate on a little project with the crew from TDengine. For those that don’t know them yet, TDengine is an exciting new time-series database focusing on scalability, performance, and using SQL as a way to interact. Better yet using the adapter allows using a number of protocols including the Influx Line Protocol.

This does compliment tremor extremely well, as what we build can be described as an event processing engine focusing on performance and usability, using SQL as a way to manipulate data. I’m pretty sure you can already see how those two pieces fit together.

But, we don’t want to go much into the technical details here the post over at the TDengine blog does a wonderful job at that.

Working with Shuduo on this was a lot of fun, it was a fascinating forth and back of ideas and possible improvements. The integration was super painless with both systems supporting the same protocols, and it was really great to learn some tricks from the experts, like how queries work best in TDengine.

The most interesting part of this is however that both systems not only create the proverbial, larger sum than their parts are but also grow in the process.

And the speed the folks at TDengine do that is amazing, during the time of the collaboration, they not only released an official grafana data source, released a new version of TDengine itself, but also created and published a grafana dashboard for system monitoring that made it extremely easy to add this.

For tremor we have discovered that we’re really starting to miss sliding windows, they have been on the roadmap for a while, but never with any direct requirement other than “it would be nice to have them”. While talking about real-time alerting, sliding windows have significant advantages over tumbling windows, so this collaboration revived that discussion - which means we’ll hopefully add them sooner rather than later.

Automating the tremor release process

June 3, 2022 · 5 min read

Prashant

Tremor 2022 Spring Mentee

It was a pleasant night. I was waiting for LFX to send acceptance/rejection e-mails. And there it was, "Congratulations! You were accepted to CNCF - Tremor" . It was a great and exciting feeling to start this journey! And here I am, at the end of it, writing this blog. It was wonderful, everything that I expected it to be, and even more so in the 3 months! I am writing this blog about my experience in this mentorship.

Introduction

My name's Prashant (Also known as Pimmy on the internet), a 2nd-year university student pursuing my Bachelor's degree in Information Technology. This blog will talk about my project experience in contributing to Tremor as part of LFX Mentorship Program Spring 2022.

The Problem

We all hate manual tasks, don't we? No seriously if anyone loves doing things on their own, it's totally fine. Of course not everything can be automated. But in this case, it was something more tedious. Here's a flowchart for basic explanation:

Flow

This is how it was done, but manually. Each process had to be checked by someone to ensure a smooth sailing. It was quite the work, and so making a release candidate was never easy.

The Approach

The first thing was to divide the tasks into smaller sections and work on this. As my mentors at tremor always used to say, make notes! Keep documenting stuff, really helps. These notes helped me divide the tasks of the current CI process into individual sets of goal, and then I started working on it.

Now I did have to test a lot, 400+ workflows just to get this finally done. So I will explain how the release process works.

Drafting the release

We select which version we want to release, as shown in the code snippet taken from github actions workflow yaml file.

on:
  workflow_dispatch:
    inputs:
      new-version:
        type: choice
        description: "Which version you'd like to release?"
        options:
        - major (_.X.X)
        - minor (X._.X)
        - patch (X.X._)
        - rc (X.X.X-rc)
        - release (removes rc)
        required: true

Extract the version input (we want major, minor, patch, etc without the brackets), and bumping cargo packages, as shown below. As you can see I extracted the old version before the bump, and put it into $GITHUB_ENV , which is creating env variables with these values. Similarly done for new version after the bump. They are needed for creating the PR.

      - name: Extracting version from input
        run: |
          VERSION=$(echo "${{github.event.inputs.new-version}}" | sed 's/ (.*)$//')
          echo "VER=$VERSION" >> $GITHUB_ENV
      - name: Bump new version in TOML files
        run: |
          OLD_VERSION=$(cargo pkgid | cut -d# -f2 | cut -d: -f2)
          echo "OLD=$OLD_VERSION" >> $GITHUB_ENV
          cargo set-version --workspace --bump ${{ env.VER }}
          NEW_VERSION=$(cargo pkgid | cut -d# -f2 | cut -d: -f2)
          echo "NEW=$NEW_VERSION" >> $GITHUB_ENV   

Commit, push, and Pull Request is created automatically with Release tag for the release. From there, the maintainers will do all the necessary reviews, and merge once the CI passes.

Publishing Release

So, the Draft Release pull request is merged, great! It automatically triggers the release workflow, which by the way only works if the PR has the Release tag, and ignores all other. This is achieved using the conditional statement:

 if: github.event.pull_request.merged && contains( github.event.pull_request.labels.*.name, 'Release')

Changelog is automatically extracted using this great workflow action, and the release is made.

      - name: Extract release notes
        id: extract-release-notes
        uses: ffurrer2/extract-release-notes@v1
      - name: Create release
        uses: actions/create-release@v1

To trigger the publish crates workflow, I used this workflow dispatch action as shown below:

      - name: Trigger publish crates workflow
        uses: benc-uk/workflow-dispatch@v1
        with:
          workflow: Publish crates
          token: ${{ secrets.PAT_TOKEN }}

And that's it for the release!

Publishing crates

The Publish crates workflow is now triggered as mentioned in the previous state. There are 4 main crates to be published, and one job to trigger the draft release workflow for tremor-language-server repo (All automated!). Github actions makes it really great to see which job is interconnected.

With all the crates published, including the language-server which follows the exact same process. Tremor has successfully released a new version! Congratulations!

My thoughts

The tremor community has been extremely helpful in guiding me through the entire mentorship. They have this principle of "Never worry, have fun" that will always stay forever with me, and forward in my career. Special mention for Heinz who mentored me throughout the months and helped me. And to the tremor community in general, my thanks to all of them! I didn't know much about github actions or DevOps in general. But now I can confidently say that I can indeed, make processes boring by automating them. I will continue to engage in open source projects, and guide others to the same, cheers!

Tremors - May '22

June 2, 2022 · 4 min read

Gary + The Tremor Team

Welcome, Tremor Enthusiasts! This post is another in a series intended to inform and entertain Tremor Technologists with recent changes in the tremor project. We'll mostly focus these posts on Pull Requests and other notable developments. With these posts, you can stay informed and learn more about the project without having to read pull requests, or wait for release notes.

Have you ever cleaned the house for company, to make it seem like nobody lives there? That’s basically what we’ve been up to getting ready for 0.12.0. It’s a big deal. We're trying to make the codebase spotless with plenty of sweeping up. We also thank our contributors for all the hard work put into automation and new stuff this month!

New Stuff

Most exciting of all: We made the 0.12.0 release! We put a lot of cool stuff in right before releasing, so finish the article before you jump in!

Being able to make a new tremor project has been a pain point from some of our users. We wanted getting started to be as easy as typing a command. Now it is! You can simply call our fancy new command to make a new tremor template. Quick and easy!

Our elasticsearch integrations got some upgrades with support for raw elastic payloads through our connector, and native support for auth between elastic <-> tremor. No more need for custom headers on connectors! Check out the links provided for more details, and we'll update the docs soon.

CI

The CI has been changed heavily in the last month. We have PrimalPimmy on GitHub to thank for much of the contributions. Thank you!

We've tweaked our CI when we create a release that should prevent creating crates unexpectedly. We also trigger many workflows across our projects from a release flow in tremor-runtime. We also decided that publishing in many steps would help to prevent single points of failure.

We can also better provide descriptive options when we create a release. We can also provide better options for publishing.

Lastly, we tweaked how much we use sed when we cut a release. This one is great for a quick example:

cd tremor-script
sed -e "s/^tremor-common = { version = \"${old}\"/tremor-common = { version = \"${new}\"/" -i.release "Cargo.toml"
sed -e "s/^tremor-influx = { version = \"${old}\"/tremor-common = { version = \"${new}\"/" -i.release "Cargo.toml"
sed -e "s/^tremor-value = { version = \"${old}\"/tremor-common = { version = \"${new}\"/" -i.release "Cargo.toml"

Where we used to manually edit each project's .toml file, we now do this as part of our GitHub Actions workflow elsewhere.

Sweeping Up

As we mentioned before, there was a lot of sweeping up we wanted to finish before the newest release.

One of the more interesting fixes we put in was a small bug for tremor run. Something that had gone unnoticed for a while was an upper limit when running tremor scripts. After 150 seconds, the script would time out. A strange and small feature limitation we didn't notice.

We had a lot of work specifically to do on connectors.

Some connectors wrote more events than they should
Edge cases existed where we would acknowledge error events without processing them.
Some bench connectors would not stop as expected.
Show errors on unknown connector types, instead of silently failing.
Make destructive connector actions much more difficult to accidentally stumble upon.

We also removed a limitation we had for code coverage. Where our contributors would have to avoid functionally any drop at all in code coverage, we now allow a .1 percent drop.

If you were a using the tremor api sub command, you should be aware that we've completely deprecated the subcommand.

Bug Fixes

Our bug fixes were pretty few over ht elast month. We had some publishing, connecting, and regular expressions patches. We updated our new code coverage tool, codecov, to more accurately track our coverage. We also found that piping information to tremor would sometimes cause issues and fixed that.

Thank You

The Tremor project as strong as the community around it. Reading this article, making contributions, and generally being involved in the project makes us more successful. Thank you for reading and contributing! See you next time.

Gary, and the Tremor team.

Tremors - April '22

May 3, 2022 · 4 min read

Gary + The Tremor Team

Welcome, Tremor Enthusiasts! This is the first post in a series intended inform and entertain Tremor Technologists with recent changes in the tremor project. We'll mostly focus these posts on Pull Requests and other notable developments. With these posts, you can stay informed and learn more about the project without having to read pull requests, or wait for release notes.

This posts' theme is open and improved communication.

Release Candidates

Where's the next release of Tremor?!

Tremor's release schedule is becoming a bit more regular. We're releasing about twice a year, every 6 to eight months. The next release of Tremor should include an exciting new way to write plugins with the plugin development kit, along with other goodies for faster, well understood tremor scripts and plenty of other performance and bug fixes.

Most notable as an update for us all is that Tremor has a release candidate live. If we're pleased with it, then a new version of tremor is soon to be released!

In this article, Let's dive into three topics: CI, PDK, and performance!

CI

One of my favorite parts of working with tremor, or any project, is using automation. Running a bunch of stuff on my machine can be quicker, but less reliable than a well established automation platform. Tremor makes an effort to improve Continuous integration regularly; and this month is no exception.

One of our active members of the Tremor Community, Pimmy has taken it upon himself to dramatically improve the CI and relase process for tremor-runtime.

The new process will automatically publish a release when we're ready. Using this automatino, we can bump versions appropriately with specially named branches and some fancy GitHub Actions. This lets the Tremor team take full advantage of the seamless GitHub experience from merge -> release, complete with logs and links directly from the Pull Request. We can even extract release notes and bump versions automatically. Once again, truly great work from our community here!

PDK (Plugin Development Kit)

I won't spend too much time on this point, since the lovely marioortizmanero has already taken the time to write out a full blog post or two on the topic. We'll give a full breakdown another time. For now: know that Tremor is currently in the works of creating an easier way for developers to plug their own binaries into the system to run connectors and pipelines.

Performance

Tremor cares a lot about performance. As you may already know from TremorCon 2021's fabulous talks; it was originally created and adopted for performance gains in Wayfair's event processing infrastructure. There are plenty of performance gains to make in a project as large as Tremor, both inside the project and through dependencies.

One such dependency we depend on to parse gigabytes of JSON per second is simd_json. simd_json is a port of the simdjson c++ library into rust. It can not only parse from JSON into Rust, but aid the serde library that can easily serialize and deserialize Rust data structures. That's pretty handy for an event processing project that will have to marshal plenty of events from disparate sources of all data forms! As you might imagine, a project like simd_json comes with a lot of configuration options, and optimizations behind those options.

Further than just the configuration that you might give serde, or simd_json, is the options you may give to the rust compiler. The compiler, by default, compiles for the largest compatability. For best performance on a specific plaform, we make use of the target-feature flag. This will allow us to target specific features available on different platforms and CPUs.

It should be no surprise that we continue this effort within the Tremor team. Know that we also depend on our community, such as a pull request from scarabsha. Sasha idenfitifed additional target-features for the target x86_64-unknown-linux-gnu that speed up the performance of simd-json. There's not a benchmark to show exactly how much faster this made json processing, but we appreciate the fact that it will undoubtedly increase performance for some clients of Tremor.

Thank You

Gary, and the Tremor team.

Collaboration - Using Redpanda with Tremor

December 13, 2021 · 4 min read

Tremor Team

Like all good projects in the open source community, great collaborations start with ad hoc interactions.

Redpanda Rocket

A recent set of discussions between the Tremor maintainers and our community brought Redpanda to our attention.

Some of our community would like to replace their Kafka deployments with Redpanda, a Kafka API-compatible streaming data platform, to realize gains in performance and reduce their total cost of operations. . As we already have Kafka connectivity, enabling this shouldn’t be too complex, right? Let’s find out.

Context

We are always looking out for interesting technology, and recently we read a very interesting article on the Redpanda blog about using WASM as server-side filters for subscription. Being as excitable as we are, we obviously had to give Redpanda a shot.

To our joy this turned out to be painless, Redpanda reuses the Kafka API so our existing connectors for Kafka work out of the box — and as a bonus we could get rid of a whole bunch of docker-compose YAML dancing that we needed to set up Zookeeper.

Alex Gallego, founder of Redpanda, reached out to us and we started experimenting with Tremor and Redpanda in this repo: github.com/tremor-rs/tremor-redpanda

Setting up Tremor and Redpanda

So, let’s get our hands dirty and actually connect Redpanda and tremor in a real-world project.

We have prepared a fully equipped workshop for this occasion. Give it a shot here if you are impatient.

Tremor can flexibly act as a Redpanda/Kafka consumer or producer, make use of auto-commit for offset management or manually commit when events are completely handled by the tremor pipeline. Here we are configuring Tremor only committing offsets when events have been successfully handled.

onramp:
  - id: redpanda-in
    type: kafka
    codec: json
    config:
      brokers:
        - redpanda:9092
      topics:
        - tremor
      group_id: redpanda_es_correlation
      retry_failed_events: false
      rdkafka_options:
        enable.auto.commit: false

This tremor application is reporting success or failure of ingesting the received events into elasticsearch via another Redpanda topic. Configuring this is also straightforward, here we have a Redpanda consumer ready for copy-pasting:

offramp:
  - id: redpanda-out
    type: kafka
    codec: json
    config:
      group_id: tremor-in
      brokers:
        - redpanda:9092
      topic: tremor

Here you go. A fully working setup for orchestrating document ingestion with Redpanda delivering the documents and receiving acknowledgements. For more details check out this Redpanda recipe on our website.

Tremor is designed to be robust when faced with high volumetric data streams. It comes with batteries included for traffic shaping, QoS and data distribution. With those tremor can guarantee at-least-once message delivery. We try to reduce CPU and memory footprint for a given workload and at the same time provide a “just works” experience for operators. And we think we found a soulmate project in Redpanda.

And most importantly, it is working like a charm. In fact we just dropped in Redpanda and expected some hours of troubleshooting, but this hope was cut short by a smooth transition:

104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.828694200+00:00 INFO tremor_runtime::system - Binding onramp tremor://localhost/onramp/redpanda-in/01/out
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.830056300+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Starting kafka onramp
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.831848100+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Subscribing to: ["tremor"]
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.832735600+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Subscription initiated...
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.833342+00:00 INFO tremor_runtime::onramp - Onramp tremor://localhost/onramp/redpanda-in/01/out started.
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.828694200+00:00 INFO tremor_runtime::system - Binding onramp tremor://localhost/onramp/redpanda-in/01/out
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.830056300+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Starting kafka onramp
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.831848100+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Subscribing to: ["tremor"]
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.832735600+00:00 INFO tremor_runtime::source::kafka - [Source::tremor://localhost/onramp/redpanda-in/01/out] Subscription initiated...
104_redpanda_elastic_correlation-tremor_out-1     | 2021-12-10T15:17:01.833342+00:00 INFO tremor_runtime::onramp - Onramp tremor://localhost/onramp/redpanda-in/01/out started.
...

Both in terms of operator friendliness and performance we root for Redpanda.

We started using it in our integration test suite, so users can be 100% sure Redpanda connectivity just works.

Connectors for Streaming to AWS S3

December 4, 2021 · 4 min read

Daksh

Tremor 2021 Fall Mentee

Introduction

Hi folks, I'm Daksh, a senior year CS student at Indian Institute of Technology, Jammu. This blog talks about my project and experience contributing to Tremor as part of LFX Mentorship Program Fall 2021.

Learning about Tremor

I came across rust in early 2020, and I absolutely loved its design, the syntax and how approachable it was to a beginner. I discovered Tremor while looking for open source projects written in rust. Tremor is an event processing system (think kafka) for unstructured data with rich support for structural pattern-matching, filtering and transformation. Over the summers, I did a few minor PR's. Going through the examples and the docs I could set up Tremor and start hacking on!!

My Project

It is very common in event processing to stream data to a persistent storage engine for later processing or archival purposes. My job was to add connectors to stream data to AWS S3. You may find more information in the github issue.

So what is a connector? A connector is the component of an Event Processing System that provides the functionality of communicating with the outside world. This would enable current, and future users of Tremor to now connect and stream events to any endpoint which supports the S3 API.

AWS S3 Connectors

I would explain the sink via an example. To connect to S3, one would require the s3 credentials. Due to lack of support from the sdk only public-secret key credentials are supported (to be extended once the sdk supports other means for credentials). Tremor would read the key names specified in the config from the environment.

s3demo.troy

define flow s3demo
flow
    define connector s3conn from s3 with
    codec="json",
    config={
        "aws_access_token": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "aws_region": "AWS_REGION",
        "bucket": "tremordemo",
        "min_part_size": 5242880,
    }
    end;

    define connector files3 from file with
    code="json",
    config={
        "mode": "read",
        "path": "sample.json",
    },
    preprocessors=["lines"]
    end;

    define pipeline s3pipe
    pipeline
        define script s3Event
            script
            let e = event;
            let $s3 = {
                "key": e.key
                };
            let payload = e.payload;
            payload
            end;

        create script s3Event;

        select event from in into s3Event;
        select event from s3Event into out;

    end;

    create connector s3conn;
    create connector files3;
    create pipeline s3pipe;

    connect /connector/files3 to /pipeline/s3pipe;
    connect /pipeline/s3pipe to /connector/s3conn;

end;

deploy flow s3demo;

sample.json

{"key": "key1", "payload": {"event1": "hello1", "key2":[1,2,3,4,5]} }
{"key": "key2", "payload": {"event3": {"nested Obj": ["vec1", "vec2", "vec3"]}} }
{"key": "key3", "payload": {"event3": null}}

This configuration reads the file sample.json delimited by lines for events. The s3pipe pipeline destructures the line contents to set the data for the object to upload and its key as meta-data. The s3-sink would then upload the data to AWS S3 with key set to $key inside the bucket tremordemo or anything that is given in the config

Sample Working

The sink also has the min_part_size configuration parameter. S3 support uploading larger objects in multiple parts. One can send multiple events with the same key consecutively, and Tremor would append the content of all those events, and whenever the content size gets larger than the min_part_size, a part is uploaded to s3. Whenever the key changes or Tremor stops, the upload for the previous key is completed.

Ending Thoughts

I had a very productive and fun time with the Tremor Community. The Tremor principle of "never worry about it" has helped me to deal with clueless moments during this mentorship. I would like to express my regards and gratitude to Matthias, Heinz, and Darach for giving me this wonderful opportunity and helping me develop as an open-source contributor and as a joyful person. A special thanks to Matthias for being there to clarify my doubts and fix my mistakes and for being really helpful. I would continue to be a part of the Tremor Community and hope to engage with more newcomers to open-source. I would wish to be part of future CNCF events. You may see me around at the Tremor Discord.

Case Study - Dead Code Detection in PHP

November 9, 2021 · 2 min read

Ramona Łuczkiewicz

Identified Need

We have a humongous PHP application - around 20 million lines of code. It’s believed that there are certain parts of the codebase that are no longer used, but it’s hard to reliably find out which ones. Due to the dynamic nature of PHP, attempts at static analysis have failed. We decided that dynamic analysis was needed, and created a PHP extension that logs all the calls to all functions and methods. That’s a lot of data, and we used tremor to aggregate it.

Required Outcome

We need to be able to aggregate a lot of data (the extension sends one message for each HTTP request, which can be at the order of hundreds per second for some servers), counting the number of calls to each function or method in the codebase and send it to our service which does further aggregation and presents the data. High Level Architecture

Characteristics

We used Unix Sockets (which were added as a source to tremor during this project) and then a simple script that aggregates the data. We’re hoping to use custom aggregate functions in the future (there’s an open RFC, waiting for the official voting) to further simplify the pipeline.

Conclusion

We are currently testing the solution with smaller applications, seeing minimal performance impact and no impact on stability. We were able to delete unused code from those applications without affecting their operation.

Case Study - Unified Observability Platform

November 8, 2021 · 6 min read

The Tremor Team

As Wayfair's technology organization modernizes its infrastructure services to meet new volumetric peaks they continually adapt, adopt and atrophy out systems and services.

This is particularly difficult with shared infrastructure and services that are ever-present but seldom seen by many of the engineers in the organization.

The rise of OpenTelemetry provides an opportunity for developers to consolidate on SDKs that are consistent across programming languages and frameworks. For our SRE's and operators it offers ease of integration and ease of migration of services.

Tremor is a core enabling component of Wayfair's migration strategy. Tremor supports but is endpoint, protocol and service agnostic. This allows our operational teams to switch from on-premise to cloud native services with minimal coordination with others.

This also allows switching from in-house to managed or out of the box services supported by cloud providers.

Moving 1000s of developers to a new technology stack when you are operating continuously with no downtime and at a large scale is hard.

Identified Need

Tremor emerged from production needs in existing systems, specifically in the logging and metrics or observability domains in a large hypergrowth and heavily speciated and specialized production environment that operates 24x7x365.

As this infrastructure is migrated from a largely on-premise data-centre based environment to a cloud native environment and the wider technology organization it operates within doubles in size, change is inevitable. You can bet your roadmap on it.

One great emerging de facto standard is the observability sub-domain that cloud-native computing is propelling forward with standards such as the CNCF’s OpenTelemetry.

Tremor added initial support for OpenTelemetry in February 2021.

For decades, observability - whether logging, metrics or distributed tracing has been common. But, lacking in a unified approach, lacking in a core set of interoperable and interworking standards.

OpenTelemetry has good support for capturing and distributing logs, metrics and trace spans. It has sufficient out of the box to suit the most common needs, most of the time. It affords developers an opportunity to rationalize libraries and frameworks around shared concepts, shared understanding and shared effort. This is especially valuable in large, complex production operations such as Wayfair’s eCommerce production estate.

Languages and platforms change over time. System and component infrastructure are continuously evolving. As SaaS environments continue to evolve and innovate new forms of observability; or production operations, system reliability engineers or network operations teams who often need to move faster than the pace of application or infrastructure developers can develop new software and systems - this is a hard problem.

By targeting OpenTelemetry - developers can depend on a consistent set of SDKs for most common observability needs. Production focused teams can depend on a consistent wire-format and protocol allowing them to move from data-centre to the cloud, and rewire the infrastructure and services just in time.

Through OpenTelemetry, we can normalize the concepts, the code and the inner workings of our cloud-based systems and services - whilst introducing far more flexibility than ever before as observability services and software vendors provide a standards-based and interoperable path.

It is a win-win for all concerned.

Required Outcome

High level architecture - current system

In production at Wayfair, logging and metrics data distribution pipelines are all handled by tremor. With OpenTelemetry, distributed tracing can now be standardized across our production estate.

Solution

Standardize an Tremor and OpenTelemetry together.

Characteristics

The introduction of OpenTelemetry allows developers to standardize on Observability in applications, services and code for logging, metrics and distributed tracing. As OpenTelemetry is natively supported in tremor, it means only minor configuration changes to our existing logging and metrics services.

The new OpenTelemetry normal

With OpenTelemetry, distributed tracing can benefit from the same traffic shaping and adaptive rate limiting as the rest of our observability stack. The unification of the source, collector and distribution tiers via kafka provides scalable and recoverable telemetry shipping and distribution.

Tremor adds adaptive load shedding and traffic shaping that are tunable in production. Tremor also allows legacy observability frameworks to co-exist with OpenTelemetry for a gradual migration across the programming languages and frameworks in production use. Finally, tremor enables multiple downstream services to participate in the overall solution.

As a heterogeneous ecosystem of interconnected services - the unified observability platform based on tremor has no preferred upstream or downstream endpoint. It is system and service agnostic. It is bendable.

This allows teams that prefer the GCP ecosystem to normalize to those native services for visualization, debugging and troubleshooting. For our ElasticSearch population, Elastic’s APM may be a better alternative. For other teams, and our operational staff and folk with more ad hoc needs - DataDog may be preferable.

It’s a cloud native decentralized rock’n’roll observability world out there. Getting 5000 and growing engineers to choose a single observability path is impossible. So, as the population cannot bend, the unified observability leans on tremor for this purpose.

As improved services, frameworks and methods are onboarded our tremor-based systems can be incrementally adjusted to meet changing demands.

Insights

This application of tremor does not introduce new features or capabilities per se.

However, it is the first unified tremor-based system that spans the entire observability spectrum. It centralizes common capabilities and facilities for greater operational freedom, whilst decentralizing the point-to-point endpoint connectivity for the widest applicability across our production estate.

As tremor exposes OpenTelemetry as a client, and as an embedded server - it is effectively used to disintermediate, interpose and intework with legacy environments and to standardise on OpenTelemetry.

It does this as an incremental update. Existing users have time to migrate their systems to the new OpenTelemetry-based best-practice. Existing processes, practices and battle-tested systems are maintained.

Operators have better tools to manage the production estate and to tune capacity, performance and cost.

This is also the first tremor-based system where tremor is a key architectural primitive allowing our observability community to bend to the changing needs of our development and operational community with minimal effort, and at short notice.

Conclusion

As tremor expands to new domains such as search, service orchestration and supply-chain and logistics to name but a few - our early adopters in the logging domain have evolved from using tremor as a point solution for traffic shaping - to building our entire observability infrastructure based on tremor.

New domains will extend tremor’s capabilities in multi-participant transaction processing and distributed orchestration. The now unified observability domain will further expand and extract capabilities that enhance modularity and flexibility of tremor to build large distributed systems with a relatively simple and easy to program and growing set of languages designed for large-scale event-based processing.

Introduction​

About the Project

The Mentorship Journey

Starting Tremor​

Understanding Code​

Error Handling​

pipeline_out_error.troy​

Conclusion

About Tremor

Abstract

The ClickHouse connector

Interacting with a ClickHouse database in Rust​

Writing a super simple sink​

Converting Tremor values to ClickHouse values​

Testing​

What to do next?

Improving the sink​

Automatically getting the table schema​

Inserting data concurrently​

The source part​

Introduction​

The Problem​

The Approach​

Drafting the release​

Publishing Release​

Publishing crates​

My thoughts​

New Stuff​

CI​

Sweeping Up​

Bug Fixes​

Thank You​

Release Candidates​

CI​

PDK (Plugin Development Kit)​

Performance​

Thank You​

Context​

Setting up Tremor and Redpanda​

Introduction​

Learning about Tremor​

My Project​

AWS S3 Connectors​

s3demo.troy​

sample.json​

Ending Thoughts​

Identified Need​

Required Outcome​

Characteristics​

Conclusion​

Identified Need​

Required Outcome​

Solution​

Characteristics​

Insights​

Conclusion​

Introduction

Starting Tremor

Understanding Code

Error Handling

pipeline_out_error.troy

Interacting with a ClickHouse database in Rust

Writing a super simple sink

Converting Tremor values to ClickHouse values

Testing

Improving the sink

Automatically getting the table schema

Inserting data concurrently

The source part

Introduction

The Problem

The Approach

Drafting the release

Publishing Release

Publishing crates

My thoughts

New Stuff

CI

Sweeping Up

Bug Fixes

Thank You

Release Candidates

CI

PDK (Plugin Development Kit)

Performance

Thank You

Context

Setting up Tremor and Redpanda

Introduction

Learning about Tremor

My Project

AWS S3 Connectors

s3demo.troy

sample.json

Ending Thoughts

Identified Need

Required Outcome

Characteristics

Conclusion

Identified Need

Required Outcome

Solution

Characteristics

Insights

Conclusion