Flume vs Kafka

ID | Kafka | Flume
1. Kafka: Kafka is a publish-subscribe messaging system that offers strong durability, scalability, and fault tolerance.
   Flume: Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store, such as HDFS.
   
2. Kafka: Kafka provides back pressure to prevent a broker from overflowing.
   Flume: Flume/Flume NG does not provide any such functionality.
   
3. Kafka: With Kafka, consumers pull data, so each consumer has and manages its own read pointer. This allows a large number of consumers to read each Kafka topic, each pulling data at its own pace. With this, you can deliver your event streams to HBase, Cassandra, Storm, Hadoop, and an RDBMS, all in parallel.
   Flume: To get data out of Flume, you use a sink, which writes to your target store (HDFS, HBase, Cassandra, etc.). Flume will retry connections to your sinks if they are offline. Because Flume pushes data, you have to do some extra work to sink the same data into two data stores.
   
5. Kafka: Since Kafka 0.8, your event data is replicated. If you lose a broker node, the other brokers take up the slack and deliver your events without loss.
   Flume: With Flume/Flume NG and a File channel, if you lose the node hosting that channel, you lose access to its events until you recover that disk. Flume's database (JDBC) channel is reported to be too slow for production use at high volume.
   
6. Kafka: Kafka provides only messaging.
   Flume: Flume provides a number of pre-built collectors.
   
7. Kafka: Kafka's main use case is distributed publish-subscribe messaging. Most of its development effort goes into letting subscribers read exactly the messages they are interested in, and into making sure the distributed system is scalable and reliable under many different conditions. It was not written specifically to stream data into Hadoop, and using it to read and write data to Hadoop is significantly more challenging than it is with Flume.
   Flume: Flume's main use case is ingesting data into Hadoop. It is tightly integrated with Hadoop's monitoring system, file system, file formats, and utilities such as Morphlines, and much of Flume's development effort goes into maintaining compatibility with Hadoop. Flume's design of sources, sinks, and channels means it can also move data flexibly between other systems, but its key feature is its Hadoop integration.
   
8. Kafka: Use Kafka if you need a highly reliable and scalable enterprise messaging system to connect many systems, one of which is Hadoop.
   Flume: Use Flume if you have non-relational data sources, such as log files, that you want to stream into Hadoop.
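
Item 3's pull model can be sketched with a toy in-memory log: each event is appended once, and every consumer keeps its own offset, so a fast reader and a slow reader move through the same data independently. This is a simplified illustration, not Kafka's client API; `ToyLog`, `ToyConsumer`, and the consumer names are hypothetical, and real Kafka adds partitions, retention, and committed offsets on top of this idea.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical in-memory, append-only event log (stands in for one Kafka topic partition)."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset assigned to the new event

    def read(self, offset, max_events):
        # Reading returns a slice and never removes events from the log.
        return self.events[offset:offset + max_events]


@dataclass
class ToyConsumer:
    """Hypothetical consumer that owns and advances its own read pointer (offset)."""
    name: str
    log: ToyLog
    offset: int = 0

    def poll(self, max_events):
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)  # only this consumer's pointer moves
        return batch


log = ToyLog()
for i in range(5):
    log.append(f"event-{i}")

# Two independent readers of the same log, pulling at different rates.
hbase_loader = ToyConsumer("hbase-loader", log)
hadoop_loader = ToyConsumer("hadoop-loader", log)

print(hbase_loader.poll(4))   # ['event-0', 'event-1', 'event-2', 'event-3']
print(hadoop_loader.poll(2))  # ['event-0', 'event-1']
print(hbase_loader.offset, hadoop_loader.offset)  # 4 2
```

Because reading does not consume events, adding another destination is just another consumer with its own offset; nothing upstream has to change.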

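Item 3 also notes that, because Flume pushes data, sending the same events to two stores takes extra work, and item 7 mentions Flume's design of sources, channels, and sinks. A common approach in Flume is to have the source write each event into one channel per destination (the replicating channel selector), with a separate sink draining each channel. The following is a toy Python model of that pattern, not Flume's API: `ToyChannel`, `ToySink`, and `replicating_source` are hypothetical names, and the target stores are plain lists.

```python
from collections import deque


class ToyChannel:
    """Hypothetical bounded buffer between a source and a sink (stands in for a Flume channel)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        if len(self.queue) >= self.capacity:
            raise OverflowError("channel full")
        self.queue.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self.queue.popleft() if self.queue else None


class ToySink:
    """Hypothetical sink: drains one channel and pushes events into a target store (a list here)."""

    def __init__(self, channel):
        self.channel = channel
        self.store = []

    def process(self):
        while (event := self.channel.take()) is not None:
            self.store.append(event)


def replicating_source(events, channels):
    # Modeled on a replicating channel selector: every event is copied into every channel.
    for event in events:
        for channel in channels:
            channel.put(event)


# One channel and one sink per destination store.
hdfs_channel, hbase_channel = ToyChannel(100), ToyChannel(100)
hdfs_sink, hbase_sink = ToySink(hdfs_channel), ToySink(hbase_channel)

replicating_source(["log line 1", "log line 2"], [hdfs_channel, hbase_channel])
hdfs_sink.process()
hbase_sink.process()
print(hdfs_sink.store)   # ['log line 1', 'log line 2']
print(hbase_sink.store)  # ['log line 1', 'log line 2']
```

Here the fan-out is decided on the write side: adding a third store means adding another channel and sink and wiring the source to it.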