Blog | Conduit | Data Integration for Production Data Stores

Stream Inspector

December 15, 2022 · 2 min read

In this guide, we are going to learn how to use Conduit's stream inspector. Stream inspection is available via the Conduit UI and the API.

UI

To access the stream inspector through the UI, first navigate to the pipeline which you'd like to inspect. Then, click on the connector in which you're interested. You'll see something similar to this:

stream inspector pipeline view

Click the "Inspect Stream" button to start inspecting the connector. A new pop-up window will show the records:

stream inspector show stream

On the "Stream" tab you'll see the latest 10 records. If you switch to the "Single record" view, only the last record will be shown. You can use the "Pause" button to pause the inspector and stop receiving the latest record(s). The ones that are already shown will be kept so you can inspect them more thoroughly.

API

To access the stream inspector through the API, you'll need a WebSocket client (for example wscat). The URL on which the inspector is available comes in the following format: ws://host:port/v1/connectors/<connector ID>/inspect. For example, if you run Conduit locally with the default settings, you can inspect a connector by running the following command:

$ wscat -c ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect | jq .
{
  "result": {
    "position": "NGVmNTFhMzUtMzUwMi00M2VjLWE2YjEtMzdkMDllZjRlY2U1",
    "operation": "OPERATION_CREATE",
    "metadata": {
      "opencdc.readAt": "1669886131666337227"
    },
    "key": {
      "rawData": "NzQwYjUyYzQtOTNhOS00MTkzLTkzMmQtN2Q0OWI3NWY5YzQ3"
    },
    "payload": {
      "before": {
        "rawData": ""
      },
      "after": {
        "structuredData": {
          "company": "string 1d4398e3-21cf-41e0-9134-3fe012e6d1fb",
          "id": 1534737621,
          "name": "string fbc664fa-fdf2-4c5a-b656-d52cbddab671",
          "trial": true
        }
      }
    }
  }
}

The above command also uses jq to pretty-print the output. You can also use jq to decode Base64-encoded strings, which may represent record positions, keys or payloads:

wscat -c ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect | jq '.result.key.rawData |= @base64d'
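The same decoding can be done in code once messages arrive. What follows is a minimal Node.js sketch, not part of Conduit: inspectUrl, fromBase64 and decodeRecord are hypothetical helper names, and the sketch assumes every inspector message has the shape of the sample output above.

```javascript
// Build the inspector URL for a connector (URL format taken from the guide above).
function inspectUrl(host, port, connectorId) {
  return `ws://${host}:${port}/v1/connectors/${connectorId}/inspect`;
}

// Decode a Base64 string to UTF-8 text; empty or missing values are returned unchanged.
function fromBase64(value) {
  return value ? Buffer.from(value, 'base64').toString('utf8') : value;
}

// Return a copy of an inspector message's record with position and key decoded.
// Assumption: the message looks like the sample output shown above.
function decodeRecord(message) {
  const record = message.result;
  const decoded = { ...record, position: fromBase64(record.position) };
  if (record.key && record.key.rawData !== undefined) {
    decoded.key = { ...record.key, rawData: fromBase64(record.key.rawData) };
  }
  return decoded;
}

// Trimmed copy of the sample record from the guide.
const sample = {
  result: {
    position: 'NGVmNTFhMzUtMzUwMi00M2VjLWE2YjEtMzdkMDllZjRlY2U1',
    operation: 'OPERATION_CREATE',
    key: { rawData: 'NzQwYjUyYzQtOTNhOS00MTkzLTkzMmQtN2Q0OWI3NWY5YzQ3' },
  },
};

console.log(inspectUrl('localhost', 8080, 'pipeline1:destination1'));
console.log(decodeRecord(sample));
```

In a real client, decodeRecord would be called on each parsed WebSocket message instead of on the hard-coded sample.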

How to build a Conduit Connector

April 5, 2022 · 8 min read

@anaptfox

In this article, we are going to walk through, step by step, how to build a Conduit connector.

Conduit connectors communicate with Conduit by writing records into the pipeline (source connector), reading records from the pipeline (destination connector), or both.

For this example, we are going to build an Algolia destination connector. The goal of this connector is to give the user the ability to send data to Algolia. In the context of search engines, this is called indexing. Since Conduit is a generic tool to move data between data infrastructure, with this new connector we can index data from any Conduit Source (PostgreSQL, Kafka, etc.).

You may find this full example on GitHub.

Let's build!

Using Kafka Connect Connectors with Conduit

April 5, 2022 · 3 min read

@anaptfox

The Conduit Kafka Connect Wrapper connector is a special connector that allows you to use Kafka Connect connectors with Conduit. Conduit doesn't come bundled with Kafka Connect connectors, but you can use the wrapper to run any Kafka Connect connector with Conduit.

This connector gives you the ability to:

Easily migrate from Kafka Connect to Conduit.
Remove Kafka as a dependency to move data between data infrastructure.
Leverage a datastore if Conduit doesn't have a native connector.

Since the Conduit Kafka Connect Wrapper itself is written in Java, but most of Conduit's connectors are written in Go, it also serves as a good example of the flexibility of the Conduit Plugin SDK.

Let's begin.

How it works

To use the Kafka Connect wrapper connector, you'll need to:

Clone the conduit-kafka-connect-wrapper repository.
Build the Connector JAR.
Download Kafka Connect JARs and any dependencies you would like to add.
Create a Conduit pipeline.
Add the connector to the pipeline.

How to test Conduit's REST API with Swagger UI

March 9, 2022 · 3 min read

@anaptfox

By default, Conduit ships with a REST API that allows you to automate the creation of data pipelines and connectors. To make it easy to get started with the API, we have provided a Swagger UI to visualize and interact with the Conduit API without having to write any code...yet 😉.

After you start Conduit, if you navigate to http://localhost:8080/openapi/, you will see a page that looks like this:

Then, after you test the API, you can write code to make the equivalent request. For example, here is how you would make a request using the axios Node.js library.

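const axios = require('axios')

// pkgPath, pipeline, pgTable, and pgUrl are placeholders defined elsewhere in your script.
const config = {
    type: 'TYPE_SOURCE',
    plugin: `${pkgPath}/pkg/plugins/pg/pg`,
    pipelineId: pipeline.id,
    config: {
        name: 'pg',
        settings: {
            table: pgTable,
            url: pgUrl,
            cdc: 'false',
        },
    },
}

// Run this inside an async function (or an ES module) so that await is allowed.
const response = await axios.post(`http://localhost:8080/v1/connectors`, config)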

Essentially, the API is everything you'd need to automate pipeline creation. Let's begin.

Starting Conduit

To get started, you need to install and start Conduit. You may even add Conduit to your $PATH.

./conduit

To open the Swagger UI, open your browser and navigate to http://localhost:8080/openapi. This UI allows you to interact with the API and create connectors. It also serves as a reference for the API.

Making a Request

The API lets you manage all parts of Conduit. For example, all we need to create and start a pipeline are these three APIs:

Create Pipelines - POST /v1/pipelines
Create Connectors - POST /v1/connectors
Start/Stop Pipelines - POST /v1/pipelines/{id}/start

Let's use the Swagger UI to create a pipeline.

First, find the create pipeline API, and select "Try it out":

Update the body of the request with your new pipeline details:

In this case, the config describes the name and the description of the new pipeline:

{
    "config": {
        "name": "string",
        "description": "string"
    }
}

Select "Execute" and notice the response of the request:

For every request, you will be able to try it out, see the body of the request, and the expected response.
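Once a request works in the Swagger UI, the same calls can be scripted. The following is a rough Node.js sketch under stated assumptions rather than an official client: createPipelineRequest, startPipelineRequest and send are hypothetical helper names, the global fetch requires Node.js 18 or later, and the create response is assumed to be JSON containing the new pipeline's id.

```javascript
// Base URL of a local Conduit instance running with default settings.
const BASE_URL = 'http://localhost:8080';

// Describe the "create pipeline" request; the body matches the Swagger example above.
function createPipelineRequest(name, description) {
  return {
    url: `${BASE_URL}/v1/pipelines`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ config: { name, description } }),
    },
  };
}

// Describe the "start pipeline" request for an existing pipeline ID.
function startPipelineRequest(pipelineId) {
  return {
    url: `${BASE_URL}/v1/pipelines/${pipelineId}/start`,
    options: { method: 'POST' },
  };
}

// Send a request description and return the parsed JSON body, or null if the body is empty.
async function send(request) {
  const response = await fetch(request.url, request.options);
  if (!response.ok) throw new Error(`HTTP ${response.status} from ${request.url}`);
  const text = await response.text();
  return text ? JSON.parse(text) : null;
}

// Usage sketch, inside an async function and with Conduit running locally:
//   const pipeline = await send(createPipelineRequest('my-pipeline', 'created from Node.js'));
//   ... add connectors with POST /v1/connectors, as in the axios example ...
//   await send(startPipelineRequest(pipeline.id));
```

Keeping the request descriptions separate from the code that sends them makes it easy to print a request and compare it with what the Swagger UI shows.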

What's Next

Now that you know how to try out the API, you can explore Conduit with these other resources:

How to add Conduit to your Path

March 7, 2022 · 2 min read

@anaptfox

Adding Conduit to your path makes it easy to start Conduit locally on your machine. From anywhere in your terminal, you’ll be able to run:

Deploying Conduit to an AWS EC2 instance

February 22, 2022 · 6 min read

Before you get started, note that Conduit should not listen on any port that is accessible from the public Internet. In this guide we'll set up a Conduit instance that can be reached through the public Internet via SSH. We'll start at the top with a fresh AWS Account. In the end, the topology will look like this:

AWS EC2 Topology

Create a new VPC

Start by logging into the AWS console, in your region of choice, and navigate to the VPC service. Click the Create VPC button at the top of the page.

Create VPC

We'll select the Create VPC and More option within the page. This option will create all the default networking needed to create a VPC that can route traffic internally to it and reach out to the internet. But, the VPC will not be able to accept connections from outside the VPC to it. You'll need to add at least one NAT Gateway to the VPC to make that possible. If you don't, you will not be able to access the instance via SSH.

Select NAT Gateway

You can leave the rest of the defaults and click Create VPC. Make sure that once you've completed the VPC setup, you get a copy of the VPC ID. We'll need this in the rest of the setup. We'll use vpc-1234 as a placeholder value in this guide.

Launch a new EC2 Instance in your VPC

With the VPC set up, it's time to launch an instance that'll run Conduit. Navigate to the EC2 Service in the AWS console and create a new instance:

Launch EC2

We recommend using the Amazon Linux 2 AMI, SSD Volume Type as the base for your instance. The AMI should be selected by default. A t2.medium instance is sufficient for most production setups. Your resource requirements might be higher depending on how many systems you'll need to connect.

T2 Medium

In the setup of the EC2 instance, you'll want to click on the Create new key pair button. This will create a new SSH key for access to the instance. Don't forget to give it a name to be able to identify it within AWS.

Create Key Pair

After you hit create, you'll download a .pem file for use on your machine. From here, you'll want to select the VPC that you just created in the previous step. Also, make sure you select a public subnet:

Select VPC

Now you're ready to launch the instance.

Launch Successful

Install Conduit on the Instance

Before we can begin installing Conduit, we need to be able to SSH into the newly created instance. Grab the Public IPv4 DNS address from the EC2 Instance detail page.

EC2 Instance

Let's SSH into the instance. With the newly created Key Pair, follow these commands. Don't forget to change PUBLIC_IP_V4_ADDRESS to the address that you got from the EC2 instance page and change LOCATION_OF_PEM_FILE to the place where the file exists on your machine:

$ export SSH_KEY=LOCATION_OF_PEM_FILE
$ chmod 400 $SSH_KEY
$ ssh -i $SSH_KEY ec2-user@PUBLIC_IP_V4_ADDRESS

We're going to need to download the latest released version of Conduit. You can go to the releases page on the Conduit repo in GitHub and find the version that you need, or you can copy and paste this command on the machine to do it for you:

$ TAG=$(curl -s https://api.github.com/repos/ConduitIO/conduit/releases/latest | grep "tag_name\": \"v[0-9.]*" | grep -oE '[.0-9]+') && curl -o conduit.tgz -L "https://github.com/ConduitIO/conduit/releases/download/v${TAG}/conduit_${TAG}_Linux_x86_64.tar.gz"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 16.1M  100 16.1M    0     0  11.7M      0  0:00:01  0:00:01 --:--:-- 26.3M

What's going on in that command

There's a lot going on in that command. Let's break it down. The first part will find what the latest version is and put it into a variable:

$ TAG=$(curl -s https://api.github.com/repos/ConduitIO/conduit/releases/latest | grep "tag_name\": \"v[0-9.]*" | grep -oE '[.0-9]+')

Once we've got the variable, the next part uses it to download that version's Linux archive from the GitHub release downloads. If you need a different operating system (Windows or Darwin), change that in the file name. A .tgz file will then be downloaded to your machine.

$ curl -o conduit.tgz -L "https://github.com/ConduitIO/conduit/releases/download/v${TAG}/conduit_${TAG}_Linux_x86_64.tar.gz"
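For readers who would rather script this lookup than chain grep calls, the same two steps can be sketched in Node.js. This is an illustration only: versionFromTag, assetUrl and latestAssetUrl are hypothetical helper names, the file-name pattern is copied from the curl command above, and fetch requires Node.js 18 or later.

```javascript
// Strip the leading "v" from a release tag such as "v1.2.3" (an illustrative value).
function versionFromTag(tagName) {
  const match = /^v([0-9.]+)$/.exec(tagName);
  if (!match) throw new Error(`unexpected tag format: ${tagName}`);
  return match[1];
}

// Build the download URL using the file-name pattern from the curl command above.
// Change os (for example to "Darwin") to target another operating system.
function assetUrl(tagName, os = 'Linux', arch = 'x86_64') {
  const version = versionFromTag(tagName);
  return `https://github.com/ConduitIO/conduit/releases/download/v${version}/conduit_${version}_${os}_${arch}.tar.gz`;
}

// Ask the GitHub API for the latest release and return its download URL.
async function latestAssetUrl() {
  const response = await fetch(
    'https://api.github.com/repos/ConduitIO/conduit/releases/latest',
    { headers: { 'User-Agent': 'conduit-install-sketch' } },
  );
  if (!response.ok) throw new Error(`GitHub API returned HTTP ${response.status}`);
  const release = await response.json();
  return assetUrl(release.tag_name);
}

// Usage: latestAssetUrl().then(console.log);
```

Parsing the JSON response avoids depending on how the API happens to format the tag_name line, which the grep approach relies on.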

Finish up Installation

Now that we've got conduit.tgz downloaded, we need to expand the archive:

$ tar zxvf conduit.tgz
x LICENSE.md
x README.md
x conduit

Then we can copy the conduit binary to a place where it can be accessed by the system. We'll put it in /usr/bin.

$ sudo cp conduit /usr/bin

Next we'll create a place for Conduit to store all of its temporary files:

$ sudo mkdir -p /var/lib/conduit
$ sudo chown -R ec2-user /var/lib/conduit

Conduit has now been installed on your EC2 instance but we're not quite done. Let's set up a few more configuration items before we start running pipelines.

Add Conduit to Systemd and Start it!

Any time the machine needs to restart or if the process fails, you'll want the machine to restart Conduit. We'll want to add a startup script to the machine. Make sure that you've created an SSH connection to the instance first. Now you're ready to create the appropriate systemd folders and create the unit file:

$ sudo mkdir -p /usr/local/lib/systemd/system
$ sudo chown -R ec2-user /usr/local/lib/systemd/system

$ printf "[Unit]\nDescription=Conduit daemon\n\n[Service]\nType=simple\nUser=ec2-user\nWorkingDirectory=/var/lib/conduit\nExecStart=/usr/bin/conduit\n\n[Install]\nWantedBy=multi-user.target" >> /usr/local/lib/systemd/system/conduit.service

An example of a systemd script can also be found in the Conduit repo itself in the /scripts directory.
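Because the printf one-liner is hard to read, it can help to see the unit file it writes. The Node.js sketch below renders the same text; renderUnit is a hypothetical helper, and its optional restart setting is an addition for illustration that the guide's unit file does not include. Restart=on-failure is a standard systemd directive that asks systemd to restart a service after it exits with an error.

```javascript
// Render a systemd unit file. The defaults reproduce the printf command above.
// The optional "restart" value is an illustrative addition, not part of the guide.
function renderUnit({
  user = 'ec2-user',
  workingDirectory = '/var/lib/conduit',
  execStart = '/usr/bin/conduit',
  restart = null,
} = {}) {
  const service = [
    '[Service]',
    'Type=simple',
    `User=${user}`,
    `WorkingDirectory=${workingDirectory}`,
    `ExecStart=${execStart}`,
  ];
  if (restart) service.push(`Restart=${restart}`);

  return [
    '[Unit]',
    'Description=Conduit daemon',
    '',
    ...service,
    '',
    '[Install]',
    'WantedBy=multi-user.target',
  ].join('\n');
}

// Print the unit file exactly as the guide's printf writes it.
console.log(renderUnit());
```

Calling renderUnit({ restart: 'on-failure' }) adds a Restart line to the [Service] section, which is one way to have systemd bring Conduit back up if the process fails.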

Now that we have the script ready to go, we can add Conduit to systemd:

$ sudo systemctl enable conduit.service

Great! Conduit is now managed by systemd and will be started automatically whenever the machine boots.

If you are choosing to use a different Linux distro than the one provided by Amazon, you may be working with a different init and system manager. How to add Conduit to other managers is outside the scope of this guide.

Finally, start up Conduit!

$ sudo systemctl start conduit.service

Connect to Conduit

Now that Conduit is running on the EC2 instance within your VPC, you'll need to connect to it via SSH and then forward a port from your machine to the Conduit instance.

Let's start by connecting to the instance. Don't forget to change the PUBLIC_IP_V4_ADDRESS to the one that you got from the EC2 Instance detail page:

$ ssh -i $SSH_KEY -L 8080:localhost:8080 -N ec2-user@PUBLIC_IP_V4_ADDRESS
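If you want to open the tunnel from a script and confirm that it is carrying traffic, the command above can be wrapped as follows. This is a minimal Node.js sketch with hypothetical helper names (tunnelArgs, tunnelIsUp); it assumes Node.js 18 or later for fetch and uses the /openapi/ path from the Swagger UI guide as a page to request.

```javascript
// Build the argument list for the ssh command shown above.
function tunnelArgs(keyPath, host, { localPort = 8080, remotePort = 8080, user = 'ec2-user' } = {}) {
  return ['-i', keyPath, '-L', `${localPort}:localhost:${remotePort}`, '-N', `${user}@${host}`];
}

// Check whether anything answers HTTP on the forwarded port.
// Any HTTP response, even an error status, means the tunnel is carrying traffic.
async function tunnelIsUp(localPort = 8080) {
  try {
    await fetch(`http://localhost:${localPort}/openapi/`);
    return true;
  } catch {
    return false;
  }
}

// Usage sketch:
//   const { spawn } = require('node:child_process');
//   spawn('ssh', tunnelArgs(process.env.SSH_KEY, 'PUBLIC_IP_V4_ADDRESS'), { stdio: 'inherit' });
//   tunnelIsUp().then((up) => console.log(up ? 'port 8080 is answering' : 'not reachable yet'));
```

The check only shows that something responds on the local port, so a different service already bound to 8080 on your machine would also make it return true.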

You're now able to create pipelines!

Conduit Screen

Troubleshooting

If you run into any issues, please join us on Discord or create a discussion on the Conduit repo in GitHub.

Real-time Pipeline: File to File

January 24, 2022 · 3 min read

@anaptfox

In this guide, we will build a data pipeline that moves data between files. This example is a great way to get started with Conduit on a local machine, but it's also the foundation of use cases such as log aggregation.

Kafka to Postgres Conduit Pipeline

Every time data is appended to src.log, it will be moved in real time to dest.log.

Real-time Pipeline: Kafka to PostgreSQL

January 24, 2022 · 4 min read

Dylan Lott

Conduit Engineer

In this guide, we will build a data pipeline that moves data between a Kafka topic and a Postgres table. We will also use Docker to run local instances of Apache Kafka and Postgres.

Kafka to Postgres Conduit Pipeline
