Flume vs Kafka
| ID | Kafka | Flume |
|---|---|---|
| 1 | Kafka is a publish-subscribe messaging system that offers strong durability, scalability, and fault tolerance. | Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store, such as HDFS. |
| 2 | Kafka provides back pressure to prevent overflowing a broker. | Flume/Flume NG does not provide such functionality. |
| 3 | With Kafka you pull data, so each consumer manages its own read pointer. This allows a large number of consumers on each Kafka queue, each pulling data at its own pace; you could deliver your event streams to HBase, Cassandra, Storm, Hadoop, and an RDBMS all in parallel. | To get data out of Flume, you use a sink, which writes to your target store (HDFS, HBase, Cassandra, etc.). Flume will retry connections to your sinks if they are offline. Because Flume pushes data, you have to do some interesting work to sink data to two data stores. |
| 4 | With Kafka 0.8+ you get replication of your event data. If you lose a broker node, others will take up the slack and deliver your events without loss. | With Flume/Flume NG and a file channel, if you lose a node you will lose access to those events until you recover that disk. The database channel with Flume is reported to be too slow for any production use case at volume. |
| 5 | Kafka provides just messaging. | Flume provides a number of pre-built collectors. |
| 6 | Kafka's main use case is as a distributed publish-subscribe messaging system. Most of the development effort goes into letting subscribers read exactly the messages they are interested in, and into keeping the distributed system scalable and reliable under many different conditions. It was not written to stream data specifically into Hadoop, and using it to read and write data to Hadoop is significantly more challenging than it is with Flume. | Flume's main use case is ingesting data into Hadoop. It is tightly integrated with Hadoop's monitoring system, file system, file formats, and utilities such as Morphlines, and a lot of the Flume development effort goes into maintaining compatibility with Hadoop. Flume's design of sources, sinks, and channels means it can flexibly move data between other systems as well, but the Hadoop integration is its key feature. |
| 7 | Use Kafka if you need a highly reliable and scalable enterprise messaging system connecting multiple systems, one of which may be Hadoop. | Use Flume if you have non-relational data sources, such as log files, that you want to stream into Hadoop. |
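On the Flume side, the source/channel/sink design from rows 3 and 5 is all declared in an agent configuration file. The sketch below tails a log file into HDFS; the agent, source, channel, and sink names and all paths are hypothetical and would need to match your environment:

```properties
# Hypothetical Flume NG agent: tail an application log and push events into HDFS.
agent1.sources  = tailSrc
agent1.channels = memCh
agent1.sinks    = hdfsSink

# Source: read new lines from a log file (exec source).
agent1.sources.tailSrc.type = exec
agent1.sources.tailSrc.command = tail -F /var/log/app/app.log
agent1.sources.tailSrc.channels = memCh

# Channel: buffer events in memory between source and sink.
agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 10000

# Sink: write events to date-partitioned directories in HDFS.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

Note how the sink pushes to exactly one target; delivering the same events to a second store means wiring up additional channels and sinks, which is the "interesting work" mentioned in row 3.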
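The pull model in row 3 is the crux of the difference: the broker keeps an append-only log, and each consumer owns its read pointer, so many consumers can read the same data at different speeds. Here is a minimal in-memory sketch of that idea (a toy illustration, not Kafka's actual API or storage):

```python
class Log:
    """Append-only message log, like a single Kafka partition."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)


class Consumer:
    """Each consumer manages its own read pointer (offset) into the log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        # Read at most max_records messages starting at our own offset,
        # then advance the offset; other consumers are unaffected.
        batch = self.log.messages[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


log = Log()
for i in range(5):
    log.append(f"event-{i}")

fast = Consumer(log)
slow = Consumer(log)

print(fast.poll())    # the fast consumer reads all 5 events at once
print(slow.poll(2))   # the slow consumer reads only the first 2
print(slow.poll(2))   # and later picks up exactly where it left off
```

Because the broker never deletes a message just because one consumer read it, you can fan the same event stream out to HBase, Storm, and Hadoop in parallel, each at its own pace.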
About me
I am a Java programmer with an avid interest in big data, Hadoop & the Internet of Things.