Flume vs Kafka

Abhijeet dhumal, 6/27/2015 | Labels: Big Data, Data ingestion, difference, Flume, Kafka, kafka vs flume

ID	KAFKA	FLUME
1	Kafka is a publish-subscribe messaging system that offers strong durability, scalability, and fault tolerance.	Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store, such as HDFS.

2	Kafka provides back pressure to prevent a broker from overflowing.	Flume/Flume NG does not provide equivalent functionality.

3	With Kafka, consumers pull data, so each consumer has and manages its own read pointer (offset). This allows a large number of consumers to read each Kafka topic, each pulling data at its own pace. With this, you could deliver your event streams to HBase, Cassandra, Storm, Hadoop, and an RDBMS all in parallel.	To get data out of Flume, you use a sink, which writes to your target store (HDFS, HBase, Cassandra, etc.). Flume will retry connections to your sinks if they are offline. Because Flume pushes data, sending the same data to two data stores takes extra work, such as configuring a replicating channel selector that copies each event into one channel per destination, each drained by its own sink.

4	With Kafka 0.8+, your event data is replicated. If you lose a broker node, the other brokers take up the slack and deliver your events without loss.	With Flume/Flume NG and a file channel, if you lose an agent node, you lose access to the events buffered on it until you recover that disk. The database (JDBC) channel in Flume is reported to be too slow for production use at volume.

5	Kafka provides only messaging.	Flume provides a number of prebuilt collectors.

6	Kafka's main use case is distributed publish-subscribe messaging. Most of the development effort goes into letting subscribers read exactly the messages they are interested in, and into making sure the distributed system is scalable and reliable under many different conditions. It was not written to stream data specifically into Hadoop, and using it to read and write data to Hadoop is significantly more challenging than it is with Flume.	Flume's main use case is ingesting data into Hadoop. It is tightly integrated with Hadoop's monitoring system, file system, file formats, and utilities such as Morphlines. Much of the Flume development effort goes into maintaining compatibility with Hadoop. Flume's design of sources, sinks, and channels means it can also be used to move data flexibly between other systems, but its most important feature is its Hadoop integration.

7	Use Kafka if you need a highly reliable and scalable enterprise messaging system to connect many systems, one of which is Hadoop.	Use Flume if you have non-relational data sources, such as log files, that you want to stream into Hadoop.
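To make the pull model in row 3 concrete, here is a minimal in-memory sketch, assuming a single partition and no networking. `ToyLog` and `ToyConsumer` are hypothetical names for illustration, not Kafka's client API: the log is append-only, and each consumer keeps its own offset, so a fast reader and a slow reader never interfere with each other.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical append-only log standing in for a single Kafka partition."""

    records: list = field(default_factory=list)

    def append(self, record):
        # Records are only ever appended; reading never removes them.
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record


@dataclass
class ToyConsumer:
    """Hypothetical consumer that owns its read pointer (offset)."""

    name: str
    offset: int = 0

    def poll(self, log, max_records):
        # Pull up to max_records, starting from this consumer's own offset.
        batch = log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)  # advance only this consumer's pointer
        return batch


log = ToyLog()
for event in ["e0", "e1", "e2", "e3", "e4"]:
    log.append(event)

hbase_loader = ToyConsumer("hbase")
hadoop_loader = ToyConsumer("hadoop")

fast_batch = hbase_loader.poll(log, max_records=5)   # reads everything at once
slow_batch = hadoop_loader.poll(log, max_records=2)  # reads at its own pace
```

In real Kafka, consumers store their offsets durably (in ZooKeeper, or in Kafka itself in newer versions) so a restarted consumer can resume where it left off; this sketch keeps them only in memory.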
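For contrast, the Flume side of row 3 can be sketched as a push model, assuming one in-memory buffer per destination. `ToyChannel`, `replicate`, and `drain` are hypothetical illustrations, not Flume's API; in a real agent, this fan-out is configured with a replicating channel selector in the agent's properties file rather than written by hand.

```python
from collections import deque


class ToyChannel:
    """Hypothetical FIFO buffer standing in for a Flume channel."""

    def __init__(self):
        self.events = deque()

    def put(self, event):
        self.events.append(event)

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self.events.popleft() if self.events else None


def replicate(event, channels):
    # Push model: the source copies every event into every channel,
    # in the spirit of Flume's replicating channel selector.
    for channel in channels:
        channel.put(event)


def drain(channel, store):
    # A sink removes events from its own channel and writes them to one store.
    while (event := channel.take()) is not None:
        store.append(event)


hdfs_channel = ToyChannel()
hbase_channel = ToyChannel()
for event in ["e0", "e1", "e2"]:
    replicate(event, [hdfs_channel, hbase_channel])

hdfs_store, hbase_store = [], []
drain(hdfs_channel, hdfs_store)
drain(hbase_channel, hbase_store)
```

Each additional destination means another channel holding a full copy of every event, whereas a Kafka consumer adds only a read pointer over data the broker already stores.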
