
Diagrams

System Architecture Diagram

```mermaid
graph TD
  subgraph DataSources["Data Sources"]
    A[Web Server Logs]
    B[Social Media Feeds]
    C[Customer Reviews]
  end

  subgraph NiFi
    D[NiFi Cluster]
  end

  subgraph Kafka
    E[Kafka Brokers]
    F[Topic: Web Server Logs]
    G[Topic: Social Media Feeds]
    H[Topic: Customer Reviews]
  end

  subgraph Elasticsearch
    I[Elasticsearch Nodes]
    J[Index: Web Server Logs]
    K[Index: Social Media Feeds]
    L[Index: Customer Reviews]
  end

  subgraph Kibana
    M[Kibana Server]
  end

  A -- HTTP --> D
  B -- API --> D
  C -- HTTP --> D

  D -- Kafka Producer --> E
  E -- hosts --> F
  E -- hosts --> G
  E -- hosts --> H

  F -- Kafka Connect --> I
  G -- Kafka Connect --> I
  H -- Kafka Connect --> I

  I -- Elasticsearch API --> J
  I -- Elasticsearch API --> K
  I -- Elasticsearch API --> L

  M -- Elasticsearch API --> J
  M -- Elasticsearch API --> K
  M -- Elasticsearch API --> L
```

Data Flow Diagram

```mermaid
graph TD
  subgraph DataSources["Data Sources"]
    A[Web Server Logs]
    B[Social Media Feeds]
    C[Customer Reviews]
  end

  subgraph NiFi
    D[NiFi Cluster]
    E[Processor: Extract Fields]
    F[Processor: Convert Timestamps]
    G[Processor: Enrich with GeoIP]
    H[Processor: Merge Records]
    I[Processor: Filter Records]
    J[Processor: Route Records]
  end

  subgraph Kafka
    K[Kafka Brokers]
    L[Topic: Web Server Logs]
    M[Topic: Social Media Feeds]
    N[Topic: Customer Reviews]
  end

  subgraph Elasticsearch
    O[Elasticsearch Nodes]
    P[Index: Web Server Logs]
    Q[Index: Social Media Feeds]
    R[Index: Customer Reviews]
  end

  subgraph Kibana
    S[Kibana Server]
    T[Dashboard: Web Server Logs]
    U[Dashboard: Social Media Feeds]
    V[Dashboard: Customer Reviews]
  end

  A --> D
  B --> D
  C --> D

  D -- Flowfiles --> E
  E -- Flowfiles --> F
  F -- Flowfiles --> G
  G -- Flowfiles --> H
  H -- Flowfiles --> I
  I -- Flowfiles --> J

  J -- Kafka Producer --> K
  K -- hosts --> L
  K -- hosts --> M
  K -- hosts --> N

  L -- Kafka Connect --> O
  M -- Kafka Connect --> O
  N -- Kafka Connect --> O

  O -- Elasticsearch API --> P
  O -- Elasticsearch API --> Q
  O -- Elasticsearch API --> R

  S -- renders --> T
  S -- renders --> U
  S -- renders --> V
```

Sequence Diagram

```mermaid
sequenceDiagram
  participant DS as Data Source
  participant NiFi
  participant Kafka
  participant Elasticsearch
  participant Kibana
  participant Logstash
  participant Filebeat
  participant Beats
  participant API

  DS ->> NiFi: Send data
  Note over NiFi: Data is processed<br/>and transformed
  NiFi ->> Kafka: Send data
  Note over Kafka: Data is stored<br/>in topic partitions
  Kafka ->> Logstash: Forward data
  Note over Logstash: Data is transformed<br/>and enriched
  Logstash ->> Elasticsearch: Index data
  Note over Elasticsearch: Data is stored<br/>in shards
  Elasticsearch ->> Kibana: Serve data
  Note over Kibana: Data is visualized<br/>and analyzed

  Filebeat ->> Logstash: Forward data
  Beats ->> Logstash: Forward data

  API ->> Elasticsearch: Query data
  Elasticsearch ->> API: Return data
```

Tooling Overview

What benefits does Logstash provide when compared to NiFi?

Logstash and NiFi are both data processing and integration tools, but they have different strengths and use cases.

Logstash is a tool for collecting, transforming, and enriching data. It allows you to easily ingest data from various sources, apply filters and transformations to the data, and output it to various destinations. Logstash provides a rich set of plugins and filters that allow you to parse, manipulate, and enrich data in various ways. Logstash is particularly well-suited for text-based data, such as log files, and can be used to extract fields, convert data types, and perform other text-processing tasks.
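The kind of text-processing pipeline described above might look like the following. This is an illustrative sketch, not a configuration from the original text: the log path, grok pattern, and field names are assumptions based on a typical Apache access-log use case.

```conf
# Illustrative Logstash pipeline: parse an Apache access log line,
# normalize its timestamp, and enrich it with GeoIP data.
input {
  file {
    path => "/var/log/apache2/access.log"   # assumed log location
  }
}
filter {
  grok {
    # Extract fields (clientip, timestamp, request, response, ...)
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Convert the string timestamp into a proper date type
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    # Enrich with geographic data derived from the client IP
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]      # assumed local cluster
  }
}
```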

NiFi, by contrast, is a tool for moving and transforming data in real time. It provides a visual interface for building data flows, which can ingest data from various sources, transform and enrich it, and deliver it to various destinations. NiFi offers a wide range of processors and connectors for building complex data flows, and it can handle both structured and unstructured data.

In terms of benefits, Logstash has a strong focus on data parsing, filtering, and transformation, which makes it well-suited for processing text-based data. It has a large number of plugins and filters that make it easy to transform data in a variety of ways. Logstash also has a low learning curve, which makes it easy to get started with.

NiFi, on the other hand, provides a more comprehensive data integration and management platform, with a focus on real-time data processing. Its visual interface makes it easy to build and manage complex data flows, and it has a wide range of processors and connectors that make it easy to integrate with various data sources and destinations. NiFi also provides robust security and data governance features, which make it a good choice for organizations with strict data privacy and compliance requirements.

Overall, both Logstash and NiFi are powerful tools for data integration and processing, and the choice between them will depend on your specific use case and requirements.

What are the strengths and weaknesses of deploying NiFi on a Kubernetes cluster? Is there a good pattern for success?

Deploying NiFi on a Kubernetes cluster has both strengths and weaknesses, which depend on your specific use case and requirements.

Strengths:

  • Scalability: Kubernetes makes it easy to scale up or down the number of NiFi instances depending on the workload. This allows for better resource utilization and reduces costs.
  • Resilience: Kubernetes provides features such as automatic failover and self-healing, which help to ensure high availability and reduce downtime.
  • Portability: Kubernetes provides a consistent deployment environment across different platforms, which makes it easier to move NiFi instances between different environments (e.g., from development to production).
  • Automation: Kubernetes allows for the automation of deployment and management tasks, which reduces the need for manual intervention and improves the efficiency of the deployment process.

Weaknesses:

  • Complexity: Deploying NiFi on a Kubernetes cluster requires knowledge of both NiFi and Kubernetes. This can be challenging for organizations that do not have expertise in both technologies.
  • Resource overhead: Kubernetes introduces additional overhead in terms of resource usage and management, which can affect performance and increase costs.
  • Networking complexity: NiFi requires complex networking configurations, which can be challenging to set up and manage in a Kubernetes environment.

To successfully deploy NiFi on a Kubernetes cluster, it is important to follow some best practices, such as:

  • Use a StatefulSet deployment: This ensures that each NiFi instance has a stable, unique hostname and its own persistent storage, both of which are required for clustering and data retention.
  • Use persistent storage: NiFi requires persistent storage for data retention, and Kubernetes provides several options for this, such as using persistent volumes or cloud storage.
  • Configure resource limits: Kubernetes allows you to set resource limits for NiFi instances, which can help to prevent resource contention and ensure stable performance.
  • Use a load balancer: Kubernetes provides several options for load balancing NiFi instances, such as using a load balancer service or an ingress controller. This helps to distribute the workload evenly across the NiFi instances.
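The practices above can be sketched as a pair of manifests. This is a minimal, hedged example, not a production deployment: the names, image tag, ports, and resource sizes are illustrative assumptions.

```yaml
# Sketch: NiFi as a StatefulSet behind a headless Service.
apiVersion: v1
kind: Service
metadata:
  name: nifi-headless
spec:
  clusterIP: None              # headless: gives each pod a stable DNS name
  selector:
    app: nifi
  ports:
    - name: web
      port: 8443
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nifi
spec:
  serviceName: nifi-headless   # pods become nifi-0.nifi-headless, nifi-1..., etc.
  replicas: 3
  selector:
    matchLabels:
      app: nifi
  template:
    metadata:
      labels:
        app: nifi
    spec:
      containers:
        - name: nifi
          image: apache/nifi:1.21.0   # illustrative version
          ports:
            - containerPort: 8443
          resources:                  # limits help prevent resource contention
            requests: { cpu: "1", memory: 2Gi }
            limits: { cpu: "2", memory: 4Gi }
  volumeClaimTemplates:               # per-instance persistent storage
    - metadata:
        name: nifi-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```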

What are strengths and weaknesses for deploying Kafka on a Kubernetes cluster? Is there a good pattern for success?

Deploying Kafka on a Kubernetes cluster also has strengths and weaknesses, which depend on your specific use case and requirements.

Strengths:

  • Scalability: Kubernetes makes it easy to scale up or down the number of Kafka brokers depending on the workload. This allows for better resource utilization and reduces costs.
  • Resilience: Kubernetes provides features such as automatic failover and self-healing, which help to ensure high availability and reduce downtime.
  • Portability: Kubernetes provides a consistent deployment environment across different platforms, which makes it easier to move Kafka instances between different environments (e.g., from development to production).
  • Automation: Kubernetes allows for the automation of deployment and management tasks, which reduces the need for manual intervention and improves the efficiency of the deployment process.

Weaknesses:

  • Complexity: Deploying Kafka on a Kubernetes cluster requires knowledge of both Kafka and Kubernetes. This can be challenging for organizations that do not have expertise in both technologies.
  • Networking complexity: Kafka requires complex networking configurations, which can be challenging to set up and manage in a Kubernetes environment.
  • Storage requirements: Kafka requires large amounts of persistent storage, which can be costly and difficult to manage in a Kubernetes environment.

To successfully deploy Kafka on a Kubernetes cluster, it is important to follow some best practices, such as:

  • Use a StatefulSet deployment: This ensures that each Kafka broker has a stable, unique hostname and its own persistent storage, both of which are required for clustering and data retention.
  • Use persistent storage: Kafka requires persistent storage for storing messages and metadata, and Kubernetes provides several options for this, such as using persistent volumes or cloud storage.
  • Configure resource limits: Kubernetes allows you to set resource limits for Kafka brokers, which can help to prevent resource contention and ensure stable performance.
  • Use a load balancer: Kubernetes provides several options for load balancing Kafka brokers, such as using a load balancer service or an ingress controller. This helps to distribute the workload evenly across the Kafka brokers.
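In practice, many teams delegate these concerns to an operator rather than writing raw StatefulSets. As one hedged illustration, a Strimzi `Kafka` custom resource covers brokers, listeners, and persistent storage in a few lines; the cluster name, replica counts, and sizes below are assumptions.

```yaml
# Sketch of a Strimzi Kafka custom resource; the operator manages the
# StatefulSet-style brokers, storage, and networking on your behalf.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim   # persistent volumes for messages and metadata
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
```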

I would like to consider integrating NiFi and Kafka into a service mesh as well, such as Istio.

For both NiFi and Kafka, I would like to understand the following in detail:

  • What would this process look like?
  • Does it provide a reasonable benefit?
  • Are there any significant challenges?

Integrating NiFi and Kafka into a service mesh such as Istio can provide benefits such as improved observability, security, and traffic management. However, there are also some challenges that need to be considered.

Integrating NiFi with Istio:

  • Process: To integrate NiFi with Istio, you would need to deploy NiFi as a Kubernetes deployment or statefulset, and then deploy an Istio sidecar container alongside each NiFi container. The sidecar container can then intercept NiFi traffic and enable Istio features such as traffic routing, load balancing, and telemetry. You would also need to configure Istio to recognize NiFi as a service, and apply Istio policies to control traffic between NiFi and other services in the mesh.
  • Benefit: Integrating NiFi with Istio can provide benefits such as improved visibility into NiFi traffic, improved security through mutual TLS and access control policies, and improved traffic management through Istio’s load balancing and routing features.
  • Challenges: One challenge of integrating NiFi with Istio is that NiFi can be sensitive to network latency and packet loss, and the extra proxy hop introduced by the sidecar container adds some of both. To minimize these effects, you can give the sidecar adequate CPU, tune its connection settings, or exclude latency-sensitive ports from sidecar interception. Another challenge is that NiFi has complex networking requirements, which can be difficult to configure in Istio.
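The injection and mTLS steps described above might look like the following sketch. The namespace name is an assumption; the resources shown are standard Istio configuration.

```yaml
# Sketch: enable sidecar injection for a NiFi namespace and require
# mutual TLS between meshed workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: nifi
  labels:
    istio-injection: enabled   # Istio injects an Envoy sidecar into new pods
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: nifi
spec:
  mtls:
    mode: STRICT               # only mTLS traffic is accepted in this namespace
```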

Integrating Kafka with Istio:

  • Process: To integrate Kafka with Istio, you would need to deploy Kafka as a Kubernetes deployment or statefulset, and then deploy an Istio sidecar container alongside each Kafka broker container. The sidecar container can then enable Istio features such as traffic routing, load balancing, and telemetry. You would also need to configure Istio to recognize Kafka as a service, and apply Istio policies to control traffic between Kafka and other services in the mesh.
  • Benefit: Integrating Kafka with Istio can provide benefits such as improved visibility into Kafka traffic, improved security through mutual TLS and access control policies, and improved traffic management through Istio’s load balancing and routing features.
  • Challenges: One challenge of integrating Kafka with Istio is that Kafka has complex networking requirements, which can be challenging to configure in Istio. For example, Kafka requires specific ports to be open for communication between brokers, and Istio’s default network policies may interfere with this communication. Another challenge is that Istio’s mTLS encryption can introduce additional latency and overhead, which can affect Kafka’s performance.
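One common mitigation for the broker-port and latency concerns above is to keep latency-sensitive inter-broker traffic out of the sidecar entirely, using Istio's traffic annotations. The sketch below is illustrative: the image, pod name, and port assignments are assumptions (9093 is shown as the replication listener, following a common convention).

```yaml
# Sketch: exclude the inter-broker port from Envoy interception while
# client traffic on 9092 stays in the mesh.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-0
  annotations:
    traffic.sidecar.istio.io/excludeInboundPorts: "9093"   # replication bypasses Envoy
    traffic.sidecar.istio.io/excludeOutboundPorts: "9093"
spec:
  containers:
    - name: kafka
      image: bitnami/kafka:3.4   # illustrative image
      ports:
        - containerPort: 9092    # client traffic, still meshed
        - containerPort: 9093    # inter-broker traffic, not intercepted
```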

NiFi hostname challenges

I am seeing that Apache NiFi poses some challenges to get it running on Istio. These challenges come from the clustering requirements it has. For example, there is a requirement that cluster components must be uniquely addressable using cluster-wide host names. This requirement conflicts with Istio’s requirement that workloads bind and listen on localhost / 127.0.0.1 within the pod. What are your recommendations for these challenges?

Source: https://istio.io/v1.7/faq/applications/

To overcome this challenge, one potential solution is to use an Istio Gateway to route traffic to the NiFi service. The gateway can be configured to use a fixed IP address, which can then be used as the unique cluster-wide hostname for NiFi. This IP address can be assigned to a Kubernetes service that forwards traffic to the NiFi pods. By using the Istio gateway, you can still take advantage of Istio’s traffic management and security features while addressing NiFi’s clustering requirements.
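The gateway approach described above might be expressed as follows. This is a hedged sketch: the hostname, service name, and ports are assumptions, and TLS passthrough is shown because NiFi typically terminates its own TLS.

```yaml
# Sketch: an Istio Gateway plus a VirtualService forwarding external
# traffic to the NiFi service by SNI.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: nifi-gateway
spec:
  selector:
    istio: ingressgateway        # use Istio's default ingress gateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: PASSTHROUGH        # let NiFi terminate its own TLS
      hosts:
        - "nifi.example.com"     # assumed external hostname
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: nifi
spec:
  hosts:
    - "nifi.example.com"
  gateways:
    - nifi-gateway
  tls:
    - match:
        - port: 443
          sniHosts: ["nifi.example.com"]
      route:
        - destination:
            host: nifi-headless  # assumed in-cluster service name
            port:
              number: 8443
```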

Another option is to use a custom DNS configuration that maps the unique hostnames used by NiFi to the local IP address within the pod. This would require modifying the DNS configuration within the Kubernetes cluster to ensure that the unique hostnames used by NiFi are resolved to the correct IP addresses.
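One way to implement that custom DNS mapping is a CoreDNS rewrite rule in the cluster's `coredns` ConfigMap. The sketch below is an assumption-laden illustration (the `nifi.internal` hostname and service names are hypothetical), trimmed to the relevant lines of a typical Corefile.

```yaml
# Sketch: rewrite a custom NiFi hostname onto the pod's in-cluster name.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        # Map the custom NiFi hostname to the headless-Service pod name
        rewrite name nifi-0.nifi.internal nifi-0.nifi-headless.nifi.svc.cluster.local
        kubernetes cluster.local in-addr.arpa ip6.arpa
        forward . /etc/resolv.conf
        cache 30
    }
```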

It’s also worth noting that the most recent versions of NiFi include support for running in a Kubernetes environment, which may alleviate some of the challenges associated with integrating with Istio.

Istio sequence diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant NiFi
    participant Kafka
    participant IstioIngressGateway
    participant IstioPilot
    participant IstioMixer
    participant IstioCitadel
    participant IstioSidecar
    participant KubernetesAPI
    participant Node01
    participant Node02

    Client->>+Gateway: HTTPS Request
    Gateway->>+IstioIngressGateway: HTTPS Request
    IstioIngressGateway->>-Gateway: HTTPS Response
    Gateway->>+IstioPilot: Envoy Proxy Request
    IstioPilot->>-Gateway: Envoy Proxy Response
    Gateway->>+IstioMixer: Envoy Proxy Request
    IstioMixer->>-Gateway: Envoy Proxy Response
    Gateway->>+IstioCitadel: Envoy Proxy Request
    IstioCitadel->>-Gateway: Envoy Proxy Response
    Gateway->>+NiFi: HTTPS Request
    NiFi->>-Gateway: HTTPS Response
    Gateway->>+Kafka: Kafka Request
    Kafka->>-Gateway: Kafka Response
    Gateway->>+IstioSidecar: Envoy Proxy Request
    IstioSidecar->>+KubernetesAPI: Kubernetes API Request
    KubernetesAPI->>+Node01: Pod-to-Pod Traffic
    Node01->>-KubernetesAPI: Pod-to-Pod Traffic Response
    KubernetesAPI->>+Node02: Pod-to-Pod Traffic
    Node02->>-KubernetesAPI: Pod-to-Pod Traffic Response
    IstioSidecar->>-Gateway: Envoy Proxy Response
    Gateway->>-Client: HTTPS Response
```

In this diagram, we start with a client making an HTTPS request that is routed through the gateway and the Istio Ingress Gateway. The request is then intercepted by the Envoy proxy in the gateway, which consults Istio Pilot, Mixer, and Citadel for routing and security purposes. The request is then forwarded to the NiFi service and Kafka service through the gateway. Note that Pilot, Mixer, and Citadel are shown as separate control-plane components for illustration; since Istio 1.5 these functions have been consolidated into the single istiod binary, and Mixer has since been removed entirely.

When the request is sent to Kafka, it is forwarded through the gateway to the Kafka service. In addition, the request is intercepted by the Istio sidecar proxy, which makes a request to the Kubernetes API server to forward the request to the correct Kafka pod. This pod-to-pod traffic is routed through the nodes in the Kubernetes cluster before being returned to the Istio sidecar proxy, which forwards the response through the gateway to the client.

It’s worth noting that this is just one potential data flow through a complex network with NiFi, Kafka, and Istio. The specific implementation may vary depending on the specific use case and configuration of the network.

Using virtual services

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant NiFi
    participant Kafka
    participant IstioIngressGateway
    participant IstioPilot
    participant IstioMixer
    participant IstioCitadel
    participant IstioSidecar
    participant KubernetesAPI
    participant Node01
    participant Node02

    Client->>+Gateway: HTTPS Request
    Gateway->>+IstioIngressGateway: HTTPS Request
    IstioIngressGateway->>-Gateway: HTTPS Response
    Gateway->>+IstioPilot: Envoy Proxy Request
    IstioPilot->>-Gateway: Envoy Proxy Response
    Gateway->>+IstioMixer: Envoy Proxy Request
    IstioMixer->>-Gateway: Envoy Proxy Response
    Gateway->>+IstioCitadel: Envoy Proxy Request
    IstioCitadel->>-Gateway: Envoy Proxy Response
    Gateway->>+IstioSidecar: Virtual Service Request
    IstioSidecar->>-Gateway: Virtual Service Response
    Gateway->>+NiFi: Virtual Service Request
    NiFi->>-Gateway: Virtual Service Response
    Gateway->>+Kafka: Virtual Service Request
    Kafka->>-Gateway: Virtual Service Response
    Gateway->>+IstioSidecar: Envoy Proxy Request
    IstioSidecar->>+KubernetesAPI: Kubernetes API Request
    KubernetesAPI->>+Node01: Pod-to-Pod Traffic
    Node01->>-KubernetesAPI: Pod-to-Pod Traffic Response
    KubernetesAPI->>+Node02: Pod-to-Pod Traffic
    Node02->>-KubernetesAPI: Pod-to-Pod Traffic Response
    IstioSidecar->>-Gateway: Envoy Proxy Response
    Gateway->>-Client: HTTPS Response
```

In this updated diagram, we’re using Istio virtual services to route traffic to NiFi and Kafka. The request from the client still goes through the gateway and the Istio Ingress Gateway. The request is then intercepted by the Envoy proxy in the gateway, which sends requests to Istio Pilot, Mixer, and Citadel for routing and security purposes.

Once the request reaches the Istio sidecar proxy, it is intercepted and forwarded to the appropriate virtual service. If the request is intended for NiFi, the request is forwarded through the gateway to the NiFi virtual service. If the request is intended for Kafka, the request is forwarded through the gateway to the Kafka virtual service.

As in the previous diagram, when the request is sent to Kafka it passes through the gateway to the Kafka virtual service, and the Istio sidecar proxy consults the Kubernetes API server to locate the correct Kafka pod. The pod-to-pod traffic crosses the nodes of the Kubernetes cluster before the response returns through the sidecar proxy and the gateway to the client.
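A virtual-service definition for the Kafka leg might look like the following sketch. Because Kafka speaks a binary protocol, Istio routes it as opaque TCP; the namespace, service name, and port are assumptions.

```yaml
# Sketch: route Kafka client traffic as plain TCP through the mesh.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: kafka
spec:
  hosts:
    - kafka.kafka.svc.cluster.local   # assumed in-cluster service name
  tcp:
    - match:
        - port: 9092                  # Kafka client port
      route:
        - destination:
            host: kafka.kafka.svc.cluster.local
            port:
              number: 9092
```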


Last update: 28 April 2023
Created: 28 April 2023