Tuesday, December 31, 2019

3. KAFKA : Sample Program1


Example 1: 

Producer - sends a million messages (Java program)
Consumer - receives a million messages (Java program)
https://www.youtube.com/watch?v=1Og9n9FJteM

########################################################################

STEP1:

https://git-scm.com/download/win
   -> Git-2.24.1.2-64-bit.exe
   -> Add C:\Program Files\Git\bin to the PATH

                       

STEP2:

Get sample kafka program from GIT:
https://github.com/mapr-demos/kafka-sample-programs
   -> cd kafka/SampleProgram/
   -> git clone https://github.com/mapr-demos/kafka-sample-programs.git
   -> //It creates a folder called kafka/SampleProgram/kafka-sample-programs

                             


STEP3:

Apache Kafka download - kafka.apache.org/downloads.html
   -> Scala 2.13  - kafka_2.13-2.4.0.tgz
   -> http://apachemirror.wuchna.com/kafka/2.4.0/kafka_2.13-2.4.0.tgz
   -> Unzip kafka_2.13-2.4.0.tgz
   -> cd kafka_2.13-2.4.0/

                         


   -> START ZOOKEEPER:
$ bin/zookeeper-server-start.sh config/zookeeper.properties

   -> START KAFKA SERVER:
$ bin/kafka-server-start.sh config/server.properties &

   (the .sh scripts need a Unix-style shell such as Git Bash; on plain Windows use the .bat equivalents under bin\windows\)

   -> CREATE TOPIC:
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic fast-messages
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic summary-markers

   -> Check the list of topics available in the Kafka server
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
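
Topics can also be created and listed from Java via the AdminClient API instead of the shell scripts. A minimal sketch (the class name TopicSetup is only for illustration; it assumes the broker is on localhost:9092 - note the AdminClient talks to the broker, not to ZooKeeper):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Arrays;
import java.util.Properties;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // create both topics: 1 partition, replication factor 1
            admin.createTopics(Arrays.asList(
                    new NewTopic("fast-messages", 1, (short) 1),
                    new NewTopic("summary-markers", 1, (short) 1))).all().get();
            // print every topic the broker knows about
            admin.listTopics().names().get().forEach(System.out::println);
        }
    }
}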


    -> BUILD PRODUCER/CONSUMER program
$ cd ../kafka-sample-programs
$ mvn package   //compiles and creates the target/ folder

                           

                   
    -> Run PRODUCER program
$ target/kafka-example producer

    -> Run CONSUMER program
$ target/kafka-example consumer
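
The actual producer/consumer source is in the cloned kafka-sample-programs repo; the real program differs in details (message format, use of the summary-markers topic). As a minimal sketch of roughly what the two modes do:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class MillionMessages {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        if (args[0].equals("producer")) {
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // send a million messages to the fast-messages topic
                for (int i = 0; i < 1_000_000; i++) {
                    producer.send(new ProducerRecord<>("fast-messages", "msg-" + i));
                }
            } // close() flushes any buffered messages
        } else {
            props.put("group.id", "demo-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("fast-messages"));
                // keep polling and print whatever arrives
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> r : records) {
                        System.out.println(r.offset() + ": " + r.value());
                    }
                }
            }
        }
    }
}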






2. KAFKA ARCHITECTURE




export http_proxy=www-**************.com:80
export https_proxy=www-*************.com:80
wget www.google.com   //quick check that the proxy works

//download Kafka directly on the workstation
wget http://www-us.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz

//inside kafka_2.13-2.4.0\config
zookeeper.properties     //ZooKeeper can also be downloaded and run independently
dataDir=/tmp/zookeeper   //where ZooKeeper physically keeps all znode information (default is under /tmp)
clientPort=2181          //the port ZooKeeper listens on


server.properties //one server.properties per broker
broker.id=0 //mandatory: must be unique per broker; the minimum required property
listeners=PLAINTEXT://:9092 //9092 is the port producers and consumers communicate on
log.dirs=/tmp/kafka-logs
log.retention.hours=168
log.segment.bytes=1073741824   //once a segment reaches this size, Kafka closes it and rolls over to a new segment file
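
For illustration (values are hypothetical), a second broker on the same machine would get its own copy of server.properties with at least these values changed:

broker.id=1                     //must be unique across brokers
listeners=PLAINTEXT://:9093     //a different port on the same machine
log.dirs=/tmp/kafka-logs-1      //a separate log directory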

1. KAFKA INTRODUCTION


Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation.


- An independent tool; nothing to do with HDFS
- Runs on its own cluster
- Once messages are received by Kafka, they are stored on multiple machines (much like HDFS stores data), but on Kafka's own cluster.



- Kafka acts as a receiver:
###################################################################
Requirement : Capturing the stream:

Expectation : The receiver must run all the time
              Data must be processed in time

Normal Spark Receiver/Flume:
-----------------------------------------------------------
- A normal Spark receiver runs on commodity hardware; if it goes down another one comes up, but in the meantime nothing is there to capture the incoming data.
- If the incoming data rate is high, a normal Spark receiver will burst (the same thing happened to Indian Railways, who built on Flume).
- Runs on one machine (JVM). We can run 2 receivers, but both receive the same data, so there are duplicate copies and more cluster resources are used.
- If the receiver is configured for every 5 sec, processing must finish within those 5 sec so it can take the next batch; but if processing itself takes 10 sec then ...

Kafka/RabbitMQ:
-------------------------------------------
Kafka - 
- It is NOT master/slave
- Fault tolerance: it replicates messages
- Whatever message we send is stored at the back end in binary on the file system
- Able to handle millions of messages per second easily (LinkedIn gets millions per second)
- Stream processing is available (Kafka Streams / Confluent): Kafka later started shipping a processing engine too (much like Spark), so a little processing can be done in Kafka itself - see the sketch after this list.
- Retention: by default Kafka preserves streams for 7 days, stored as byte arrays. A message is NOT removed when a consumer consumes it; it is removed only when retention kicks in:
  - retention time is over
  - retention size is exceeded
- Messages can be anything (csv, txt, avro, ...):
      Header
      ----------

      Body
###################################################################

1 Tweet (1 msg must be converted to a byte array)  ->  Kafka (always receives byte arrays)  ->  byte array  ->  Consumer - poll (time: 100 ms, size: 100 KB)

If the message size is > 100 KB => the poll throws an error, so we need to increase the allowed message/fetch size.
If the message size is < 100 KB => other messages are also fetched along with it.
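
As a minimal sketch, the 100 ms / 100 KB knobs above correspond to the consumer's poll timeout and its max.partition.fetch.bytes setting (the values and names below are illustrative; on recent broker versions an oversized record degrades the fetch rather than always throwing a hard error):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SizedPollConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sized-poll-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // cap on data fetched per partition per request (the "100 KB" in the notes);
        // raise this (and the broker's message.max.bytes) for bigger messages
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 100 * 1024);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("fast-messages"));
            // wait at most 100 ms for data (the "Time : 100ms" in the notes)
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
            System.out.println("fetched " + records.count() + " records");
        }
    }
}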