
6 posts tagged with "cncf"


· 7 min read

Introduction

Hello, my name's Carol Geng, and I'm currently a sophomore pursuing a Bachelor's degree in Computer Science at Texas A&M University. Over Summer 2022, I contributed to Tremor as part of the LFX Mentorship Program with my mentors Heinz Gies and Matthias Wahl, and this blog will show how valuable and enjoyable the experience was.

About the Project

Tremor is an event processing engine that uses pipelines and connectors for data to be passed through. However, users can make mistakes when linking the ports of pipelines, and those errors were not clearly displayed. There were cases where the location of the error wasn't printed, where nothing was printed at all, or even where the program ran as if everything was fine. My mentorship project focused on creating messages that clearly and concisely report these errors to the user on the console.

The Mentorship Journey

Before this mentorship, I had only worked on open-source projects a few times and had no knowledge of how to code in Rust. The mentorship therefore allowed me to build on what I already knew while also giving me an opportunity to learn a new language. Throughout it, I got a start in DevOps and learned how to work on the compiler and source code of a software project, while contributing to the user experience of everyone who uses Tremor today and whoever will use it in the future.

Starting Tremor

To learn how to contribute to Tremor's source code, I first had to understand how Tremor worked and what it did. I was introduced to pipelines, ports, and scripts and was encouraged to play around with the code. Tremor lets the user chain together operations that process and transform input into the desired output. While experimenting, I decided to add a function to reverse a string in Tremor's pipelines. Because Tremor did not have a built-in string-reversal function, I went into the source code to implement one in Rust, along with several tests and documentation for it. This relatively simple function is where I made my first pull request and where I first started getting familiar with Tremor.
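For illustration, here is a minimal sketch of the core logic such a function needs; the real change also registers it as a tremor-script standard-library function, which is omitted here.

/// Reverse a string by characters rather than bytes, so multi-byte
/// UTF-8 characters stay intact.
pub fn reverse(input: &str) -> String {
    input.chars().rev().collect()
}

#[cfg(test)]
mod tests {
    use super::reverse;

    #[test]
    fn reverses_ascii_and_unicode() {
        assert_eq!(reverse("tremor"), "romert");
        assert_eq!(reverse("héllo"), "olléh");
    }
}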

Additionally, I had only used other programming languages in my academic career and therefore had no experience with Rust beforehand. This is when I was introduced to Rustlings, whose small exercises helped me get started with Rust syntax and writing Rust code. There were also many concepts central to how Rust works, including enums, move semantics, and structs, as well as constructs like Ok() and concepts like async/await, which I would end up using in my project.
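As a quick, made-up illustration (not Tremor code) of a few of those concepts, here is a struct, an enum, and a function that returns a Result via Ok() and Err:

// A hypothetical port lookup: a struct, an enum for the error case, and
// a Result returned with Ok()/Err, consumed with a match.
#[derive(Debug)]
struct Port {
    name: String,
}

#[derive(Debug)]
enum ConnectError {
    MissingPort(String),
}

fn lookup_port(name: &str) -> Result<Port, ConnectError> {
    if name == "out" {
        Ok(Port { name: name.to_string() })
    } else {
        Err(ConnectError::MissingPort(name.to_string()))
    }
}

fn main() {
    match lookup_port("no") {
        Ok(port) => println!("found port {port:?}"),
        Err(e) => eprintln!("error: {e:?}"),
    }
}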

Understanding Code

A large portion of my mentorship focused on understanding the errors that were printed out. There were several different kinds of errors, such as ones dealing with console input, console output, pipeline input, and more, which validated whether an input or output existed in the code and where it led: to another connector or to the console. While this may seem simple, it led me down the rabbit hole of the source code as I traversed numerous files and functions to determine where an error passed through and what I could do to the preexisting code to read and act on it.

To do a lot of this, I learned the process of error logging. I defined each error in great detail, noting what it is, what it currently does, and what it is supposed to do, and I gradually updated these notes whenever needed. While implementing messages for these errors, I also learned more about version control with git. I already had practice with the basics like git pull and git push, but I also learned about git rebase, git merge, and more.

Error Handling

Error handling was the main focus of the project. As my mentor suggested, I focused on one error at a time from the error logs. First, I focused on the worst case: one where nothing was printed and the program ran as if everything was normal, except there was no output.

pipeline_out_error.troy
# Our main flow
define flow main
flow
  # import the `tremor::connectors` module
  use tremor::connectors;
  use lib::pipelines;

  # create an instance of the console connector
  create connector console from connectors::console;

  # create an instance of the passthrough pipeline
  create pipeline main from pipelines::main;

  # connect the console (STDIN) to our pipeline input
  connect /connector/console/out to /pipeline/main;

  # then connect the pipeline output to the console (STDOUT)

  # no doesn't exist, bad error
  connect /pipeline/main/no to /connector/console/in;

end;
# Deploy the flow so tremor starts it
deploy flow main;

This error came from the fact that the output did not exist, so to fix it, I wrote a separate function that checked whether the output existed and connected it to the ConnectInput structs while sending an error statement and a status report to the system. This process taught me to focus on small incremental steps rather than solving the problem as a whole, which kept me from getting overwhelmed. Additionally, outputs had to be added to the ExecutableGraph struct, which led me to implement outputs in related functions and build a hashmap representing its graph. With that, the basic problem of catching this particular error was solved.
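Roughly, the idea looks like the sketch below (hypothetical names, not the actual Tremor types): the graph keeps a map of its known output ports, and connecting to a port that is not in that map yields an error instead of silently succeeding.

use std::collections::HashMap;

// Sketch only: the real ExecutableGraph in Tremor is far more involved.
struct Graph {
    // output port name -> index of the node it belongs to
    outputs: HashMap<String, usize>,
}

impl Graph {
    fn connect_output(&self, port: &str) -> Result<usize, String> {
        self.outputs
            .get(port)
            .copied()
            .ok_or_else(|| format!("output port `{port}` does not exist in this pipeline"))
    }
}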

The next step was defining what the error was, which involved adding a port field to both ConnectInput and ConnectOutput to connect inputs. This port field wasn't needed before, but with the changes made later on, it proved very useful for identifying which port was meant so it could be used in other code. Because this field was added to the structs, changes then had to be made to all the functions that used them, and a port had to be supplied in those functions.

Last was surfacing to the user where in their code the error occurred. This focused heavily on adding transmitters and receivers to output the status and on mapping the status when the user's code had errors. The transmitter and receiver, tx and rx, had to be added to the ConnectInput and ConnectOutput structs and then wired into the rest of the code so they could be defined and used. This required modifying many functions to return a Result. In the process, there were also bugs in the preexisting code to fix and tests to correct to make sure tx and rx worked properly. Afterwards, Result::map_err was used to map the results received through rx into errors whenever the user's input was wrong.
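A rough sketch of that tx/rx pattern (using std channels purely for illustration; the names are hypothetical): the connect attempt reports its status through tx, and the caller waits on rx, using Result::map_err to turn any failure into an error it can show the user.

use std::sync::mpsc;

fn connect_with_status() -> Result<(), String> {
    let (tx, rx) = mpsc::channel::<Result<(), String>>();

    // Pretend the connect attempt runs elsewhere and reports back via tx.
    tx.send(Err("output port `no` does not exist".to_string()))
        .map_err(|e| format!("status channel closed: {e}"))?;

    // Wait for the status report and map any failure into a user-facing error.
    rx.recv().map_err(|e| format!("no status received: {e}"))?
}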

Throughout this work, I had to search the files to figure out where the relevant code was and which function called the next. This led me to trace where an error travels, searching through the declarations, definitions, and references of variables and functions. Along the way I also picked up several handy keyboard shortcuts, as demonstrated by my mentor.

Conclusion

Overall, working with Tremor was extremely fun and valuable, as I not only got to contribute to an incredible project but also got to learn more about programming and the open-source world. I was able to do all of this while meeting amazing people who guided me in my work, and I cannot imagine it any other way.

· 5 min read

It was a pleasant night. I was waiting for LFX to send the acceptance/rejection e-mails. And there it was: "Congratulations! You were accepted to CNCF - Tremor". It was a great and exciting feeling to start this journey! And here I am, at the end of it, writing this blog. These 3 months were wonderful, everything I expected and even more. This blog is about my experience in the mentorship.

Introduction

My name's Prashant (also known as Pimmy on the internet), a 2nd-year university student pursuing my Bachelor's degree in Information Technology. This blog talks about my project and experience contributing to Tremor as part of the LFX Mentorship Program Spring 2022.

The Problem

We all hate manual tasks, don't we? No, seriously, if anyone loves doing things by hand, that's totally fine. Of course, not everything can be automated. But in this case, the process was especially tedious. Here's a flowchart with a basic explanation:

(Flowchart: the manual release process)

This is how it was done, but manually. Each step had to be checked by someone to ensure smooth sailing. It was quite a lot of work, so making a release candidate was never easy.

The Approach

The first thing was to divide the work into smaller sections and tackle them one at a time. As my mentors at Tremor always used to say: make notes! Keep documenting things; it really helps. These notes helped me divide the current CI process into individual sets of goals, and then I started working on them.

I did have to test a lot (400+ workflow runs) to finally get this done. Now, let me explain how the release process works.

Drafting the release

  1. We select which version we want to release, as shown in the code snippet taken from the GitHub Actions workflow YAML file.
on:
  workflow_dispatch:
    inputs:
      new-version:
        type: choice
        description: "Which version you'd like to release?"
        options:
          - major (_.X.X)
          - minor (X._.X)
          - patch (X.X._)
          - rc (X.X.X-rc)
          - release (removes rc)
        required: true
  2. Extract the version input (we want major, minor, patch, etc. without the brackets) and bump the Cargo packages, as shown below. As you can see, I extract the old version before the bump and put it into $GITHUB_ENV, which creates environment variables with these values; the same is done for the new version after the bump. They are needed for creating the PR.
      - name: Extracting version from input
        run: |
          VERSION=$(echo "${{github.event.inputs.new-version}}" | sed 's/ (.*)$//')
          echo "VER=$VERSION" >> $GITHUB_ENV
      - name: Bump new version in TOML files
        run: |
          OLD_VERSION=$(cargo pkgid | cut -d# -f2 | cut -d: -f2)
          echo "OLD=$OLD_VERSION" >> $GITHUB_ENV
          cargo set-version --workspace --bump ${{ env.VER }}
          NEW_VERSION=$(cargo pkgid | cut -d# -f2 | cut -d: -f2)
          echo "NEW=$NEW_VERSION" >> $GITHUB_ENV
  3. Commit and push, and a Pull Request is created automatically with the Release label. From there, the maintainers do all the necessary reviews and merge once the CI passes.

Publishing Release

  • So, the Draft Release pull request is merged, great! It automatically triggers the release workflow, which, by the way, only runs if the PR has the Release label and ignores all others. This is achieved using the conditional statement:
 if: github.event.pull_request.merged && contains( github.event.pull_request.labels.*.name, 'Release')
      - name: Extract release notes
        id: extract-release-notes
        uses: ffurrer2/extract-release-notes@v1
      - name: Create release
        uses: actions/create-release@v1
      - name: Trigger publish crates workflow
        uses: benc-uk/workflow-dispatch@v1
        with:
          workflow: Publish crates
          token: ${{ secrets.PAT_TOKEN }}

And that's it for the release!

Publishing crates

The Publish crates workflow is now triggered, as mentioned in the previous step. There are 4 main crates to be published, and one job to trigger the draft release workflow for the tremor-language-server repo (all automated!). GitHub Actions makes it really easy to see how the jobs are interconnected.

(Screenshot: the interconnected jobs of the Publish crates workflow)

With all the crates published, including the language server (which follows the exact same process), Tremor has successfully released a new version! Congratulations!

My thoughts

The Tremor community has been extremely helpful in guiding me through the entire mentorship. They have this principle of "Never worry, have fun" that will stay with me forever, and forward in my career. Special mention to Heinz, who mentored me throughout these months and helped me. And to the Tremor community in general, my thanks to all of them! I didn't know much about GitHub Actions or DevOps in general, but now I can confidently say that I can indeed make processes boring by automating them. I will continue to engage in open-source projects and guide others to do the same. Cheers!

· 4 min read

Introduction

Hi folks, I'm Daksh, a senior-year CS student at the Indian Institute of Technology, Jammu. This blog talks about my project and experience contributing to Tremor as part of the LFX Mentorship Program Fall 2021.

Learning about Tremor

I came across Rust in early 2020, and I absolutely loved its design, its syntax, and how approachable it was for a beginner. I discovered Tremor while looking for open-source projects written in Rust. Tremor is an event processing system (think Kafka) for unstructured data with rich support for structural pattern-matching, filtering, and transformation. Over the summer, I made a few minor PRs. Going through the examples and the docs, I was able to set up Tremor and start hacking!

My Project

It is very common in event processing to stream data to a persistent storage engine for later processing or archival purposes. My job was to add connectors to stream data to AWS S3. You can find more information in the GitHub issue.

So what is a connector? A connector is the component of an event processing system that provides the functionality of communicating with the outside world. This enables current and future users of Tremor to connect and stream events to any endpoint that supports the S3 API.

AWS S3 Connectors

I will explain the sink via an example. To connect to S3, one needs S3 credentials. Due to lack of support in the SDK, only access-key/secret-key credentials are supported (to be extended once the SDK supports other means of authentication). Tremor reads the key names specified in the config from the environment.

s3demo.troy
define flow s3demo
flow
  define connector s3conn from s3 with
    codec="json",
    config={
      "aws_access_token": "AWS_ACCESS_KEY_ID",
      "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
      "aws_region": "AWS_REGION",
      "bucket": "tremordemo",
      "min_part_size": 5242880,
    }
  end;

  define connector files3 from file with
    codec="json",
    config={
      "mode": "read",
      "path": "sample.json",
    },
    preprocessors=["lines"]
  end;

  define pipeline s3pipe
  pipeline
    define script s3Event
    script
      let e = event;
      let $s3 = {
        "key": e.key
      };
      let payload = e.payload;
      payload
    end;

    create script s3Event;

    select event from in into s3Event;
    select event from s3Event into out;
  end;

  create connector s3conn;
  create connector files3;
  create pipeline s3pipe;

  connect /connector/files3 to /pipeline/s3pipe;
  connect /pipeline/s3pipe to /connector/s3conn;
end;

deploy flow s3demo;
sample.json
{"key": "key1", "payload": {"event1": "hello1", "key2":[1,2,3,4,5]} }
{"key": "key2", "payload": {"event3": {"nested Obj": ["vec1", "vec2", "vec3"]}} }
{"key": "key3", "payload": {"event3": null}}

This configuration reads the file sample.json, delimited by lines, for events. The s3pipe pipeline destructures the line contents to set the data for the object to upload and its key as metadata. The s3 sink then uploads the data to AWS S3, with the key taken from the event's $s3 metadata, inside the bucket tremordemo (or whatever bucket is given in the config).

Sample Working

The sink also has the min_part_size configuration parameter. S3 supports uploading larger objects in multiple parts. One can send multiple events with the same key consecutively, and Tremor appends the content of all those events; whenever the accumulated content grows larger than min_part_size, a part is uploaded to S3. Whenever the key changes or Tremor stops, the upload for the previous key is completed.
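A simplified sketch of that buffering logic (hypothetical names, not the actual connector code): events for the current key are appended to a buffer, and whenever the buffer grows past min_part_size, the accumulated bytes are handed back to be uploaded as one part.

struct PartBuffer {
    min_part_size: usize,
    buffer: Vec<u8>,
}

impl PartBuffer {
    /// Append one event's data; return a full part to upload once the
    /// buffer has reached min_part_size, otherwise keep accumulating.
    fn push(&mut self, event_data: &[u8]) -> Option<Vec<u8>> {
        self.buffer.extend_from_slice(event_data);
        if self.buffer.len() >= self.min_part_size {
            // Hand back the accumulated bytes and start a fresh buffer.
            Some(std::mem::take(&mut self.buffer))
        } else {
            None
        }
    }
}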

Ending Thoughts

I had a very productive and fun time with the Tremor community. The Tremor principle of "never worry about it" helped me deal with clueless moments during this mentorship. I would like to express my regards and gratitude to Matthias, Heinz, and Darach for giving me this wonderful opportunity and helping me develop as an open-source contributor and as a joyful person. A special thanks to Matthias for being there to clarify my doubts, fix my mistakes, and always be helpful. I will continue to be a part of the Tremor community and hope to engage with more newcomers to open source. I also hope to be part of future CNCF events. You may see me around on the Tremor Discord.

· 8 min read

Introduction

Hey, I am Rohit Dandamudi from India, about to complete my undergrad in CSE, and I will be working as a Software Engineer soon. I will be sharing my experience at Tremor :)

Main motivation for applying

My work involved writing "Property-based tests for tremor-script", and some of my reasons for applying were:

  • It involved a new type of testing I had never heard of
  • Be part of a sandbox project where I can learn and grow with the community
  • The concept of learning Erlang + Rust was very interesting to me and frankly out of my comfort zone, as a person used to Python and web development in general.

New concepts I learned specific to my work

  • Erlang and Rust
    • My work mostly revolved around Erlang and a little Rust, and I was completely new to this ecosystem; it didn't help that there weren't many resources or an easily accessible community for Erlang.
    • I took this as a challenge and went through various resources to learn Erlang and functional programming in general, and I came to see why this language was chosen for the task at hand. My mentor is very passionate about Erlang and shared his thought process and experience, which helped me broaden my knowledge and taught me how to approach a concept when learning something completely new.
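For readers who have never seen property-based testing: instead of hand-picked test cases, you state a property that must hold for every generated input and let the framework search for counterexamples. Here is a tiny illustration in Rust using the proptest crate (the actual mentorship work used Erlang-based tooling against tremor-script):

use proptest::prelude::*;

proptest! {
    // proptest generates many random (a, b) pairs and checks the property
    // for each one; any failing case is shrunk and reported.
    #[test]
    fn addition_is_commutative(a in any::<i32>(), b in any::<i32>()) {
        prop_assert_eq!(a.wrapping_add(b), b.wrapping_add(a));
    }
}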

· 4 min read

Introduction

Hey folks, I am Nupur Agrawal, a third-year student at the Indian Institute of Technology Roorkee. This blog describes my experience contributing to Tremor, a CNCF sandbox project, in the 2021 spring chapter of the LFX Mentorship Program, under the mentorship of Matthias Wahl, Anup Dhamala, and Heinz Gies.

Project Abstract

Tremor is an event processing system originally designed for the needs of platform engineering and infrastructure. It is built for the users that have a high message volume to deal with and want to build pipelines to process, route, or limit this event stream.

At the beginning of the program, I was given a walkthrough of the project by Matthias, and he patiently explained the components and workings of Tremor to me. Tremor is nicely documented, and the docs are very useful for looking up many things.

My Project

My project's aim was to enable Tremor to receive and send Syslog Protocol messages; syslog is a standard protocol used to send system log or event messages. The goal was to support both the standard IETF format and the old BSD format, via UDP and TCP/TLS. A more detailed description can be found here.

· 13 min read

Introduction

Hello folks! I'm Jigyasa, a final-year computer science engineering student at Indira Gandhi Delhi Technical University for Women, pursuing my Bachelor's in Technology. This blog is about my experience contributing to Tremor as part of the LFX Mentorship program.

Learning about Tremor

Tremor is an event processing system for unstructured data with rich support for structural pattern matching, filtering, and transformation. It is built for users that have a high message volume to deal with and want to build pipelines to process, route, or limit this event stream. It has a scripting language called tremor-script, as well as a query language called tremor-query, or trickle.

I had never worked on an event processing system before this internship. In fact, my first major contribution to open source was through this mentorship program. To get started, my mentor, Darach Ennis, suggested some documents that helped me learn more about it:

/docs/overview/ (deprecated)

/docs/course (deprecated)

Apart from that, learning more about tremor-query and tremor-script and going through the workshops in the docs can be really helpful.

The codebase of Tremor is in Rust, and since I had no prior experience with Rust, I started learning the language.