A Short Introduction to Kafka

What is Kafka?

Kafka is used to handle real-time streams of data: to collect big data, to do real-time analysis, or both. It is used with in-memory microservices to provide durability, and it can feed events to CEP (complex event processing) systems and to IoT/IFTTT-style automation systems.

Kafka is often used in real-time streaming data architectures to provide real-time analytics. Because Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system, it is used in cases where JMS, RabbitMQ, and AMQP may not even be considered because of volume and responsiveness requirements.

Kafka's high throughput, reliability, and replication make it applicable for things like tracking service calls (it tracks every call) or tracking IoT sensor data, where a traditional MOM (message-oriented middleware) might not be considered.

Kafka can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark for real-time ingestion, analysis, and processing of streaming data. Kafka is also used as a data stream to feed Hadoop big data lakes. Kafka brokers support massive message streams for low-latency follow-up analysis in Hadoop or Spark.

Kafka is operationally simple. It is easy to set up and use, and it is easy to reason about how Kafka works. However, the main reason Kafka is so popular is its excellent performance. It has other useful characteristics as well, but so do other messaging systems.

Kafka performs well and is stable. It provides reliable durability, a flexible publish-subscribe/queue model that scales well with any number of consumer groups, robust replication, tunable consistency guarantees for producers, and preserved ordering at the shard level (the Kafka topic partition).
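The partition-level ordering and consumer-group behaviour mentioned above can be illustrated with a toy in-memory model. This is only a sketch and not the Kafka client API: ToyTopic, send, and poll are hypothetical names, and a simple byte-sum hash stands in for whatever partitioner a real producer would use.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic split into partitions (not Kafka's API)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list, so order is kept per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read offset for every partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def partition_for(self, key):
        # Assumption: a byte-sum hash; it only needs to be deterministic.
        return sum(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        # Records with the same key always land in the same partition.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group):
        # Return every record this group has not seen yet, then advance
        # the group's offsets. Other groups are unaffected.
        records = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[group][p]
            records.extend(log[start:])
            self.offsets[group][p] = len(log)
        return records


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.send("sensor-a", i)
    topic.send("sensor-b", i * 10)

analytics = topic.poll("analytics")
archive = topic.poll("archive")

# Values for one key come back in the order they were sent.
sensor_a_values = [v for k, v in analytics if k == "sensor-a"]
```

Both groups receive all six records independently, and a second poll by the same group returns nothing new. Ordering is only guaranteed within a partition, which is why records that must stay in sequence share a key.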

In addition, Kafka works well with systems that have data streams to process, and it enables those systems to aggregate, transform, and load data into other stores. But none of those characteristics would matter if Kafka were slow. The most important reason Kafka is popular is its exceptional performance.

posted Dec 26, 2017 by Madhavi Latha

Related Articles

What is Apache SINGA?

SINGA is an Apache Incubating project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.

SINGA was initiated by the DB System Group at National University of Singapore in 2014, in collaboration with the database group of Zhejiang University.

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many built-in layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning.
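The difference between partitioning on the batch dimension and on the feature dimension can be shown with plain arithmetic on a single linear layer. The sketch below is not SINGA code: the function names are hypothetical, nested lists stand in for tensors, and the "workers" are just loop iterations. It only shows that both ways of splitting the work reproduce the unpartitioned result.

```python
def matmul(x, w):
    """Multiply a (batch x in) matrix by an (in x out) matrix, as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def batch_partitioned(x, w, workers):
    # Batch dimension: each worker gets a slice of the rows (samples)
    # and a full copy of the weights; outputs are stacked row-wise.
    chunk = -(-len(x) // workers)  # ceiling division
    out = []
    for i in range(0, len(x), chunk):
        out.extend(matmul(x[i:i + chunk], w))
    return out


def feature_partitioned(x, w, workers):
    # Feature dimension: each worker gets the whole batch but only a slice
    # of the weight columns (output features); outputs are joined column-wise.
    n_out = len(w[0])
    chunk = -(-n_out // workers)
    out = [[] for _ in x]
    for j in range(0, n_out, chunk):
        w_slice = [row[j:j + chunk] for row in w]
        for full_row, part in zip(out, matmul(x, w_slice)):
            full_row.extend(part)
    return out


x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 samples, 2 input features
w = [[1, 0, 2], [0, 1, 3]]             # 2 inputs -> 3 output features

reference = matmul(x, w)
by_batch = batch_partitioned(x, w, workers=2)
by_feature = feature_partitioned(x, w, workers=2)
```

A hybrid scheme would combine the two, splitting both the rows of x and the columns of w.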

A further goal is to make SINGA easy to use. It is non-trivial for programmers to develop and train models with deep and complex structures. Distributed training adds to the burden on programmers, e.g., data and model partitioning and network communication. Hence it is essential to provide an easy-to-use programming model so that users can implement their deep learning models and algorithms without much awareness of the underlying distributed platform.


What is Seaborn?

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Some of the functionality that seaborn offers:

  • A dataset-oriented API for examining relationships between multiple variables
  • Specialized support for using categorical variables to show observations or aggregate statistics
  • Options for visualizing univariate or bivariate distributions and for comparing them between subsets of data
  • Automatic estimation and plotting of linear regression models for different kinds of dependent variables
  • Convenient views onto the overall structure of complex datasets
  • High-level abstractions for structuring multi-plot grids that let you easily build complex visualizations
  • Concise control over matplotlib figure styling with several built-in themes
  • Tools for choosing color palettes that faithfully reveal patterns in your data

Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.

Example Code

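import seaborn as sns
# Load the example "tips" dataset as a DataFrame.
tips = sns.load_dataset("tips")
# One scatter panel per value of "time"; color and marker encode "smoker",
# and point size encodes the party size.
sns.relplot(x="total_bill", y="tip", col="time",
            hue="smoker", style="smoker", size="size",
            data=tips)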


What is Mlpack Library?

mlpack is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. 

This is done by providing a set of command-line executables which can be used as black boxes, and a modular C++ API for expert users and researchers to easily make changes to the internals of the algorithms.

As a result of this approach, mlpack outperforms competing machine learning libraries by large margins; see the BigLearning workshop paper and the benchmarks for details.

mlpack is developed by contributors from around the world. It is released free of charge under the 3-clause BSD License. (Versions older than 1.0.12 were released under the GNU Lesser General Public License, LGPL, version 3.)

mlpack was originally presented at the BigLearning workshop of NIPS 2011 and later published in the Journal of Machine Learning Research, with version 3 being published in the Journal of Open Source Software.

mlpack bindings for R are provided by the RcppMLPACK project.

Currently mlpack supports the following algorithms:

  • Collaborative Filtering
  • Decision stumps (one-level decision trees)
  • Density Estimation Trees
  • Euclidean Minimum Spanning Trees
  • Gaussian Mixture Models (GMMs)
  • Hidden Markov Models (HMMs)
  • Kernel Principal Component Analysis (KPCA)
  • K-Means Clustering
  • Least-Angle Regression (LARS/LASSO)
  • Linear Regression
  • Local Coordinate Coding
  • Locality-Sensitive Hashing (LSH)
  • Logistic Regression
  • Max-Kernel Search
  • Naive Bayes Classifier
  • Nearest neighbor search with dual-tree algorithms
  • Neighbourhood Components Analysis (NCA)
  • Non-negative Matrix Factorization (NMF)
  • Principal Components Analysis (PCA)
  • Independent Component Analysis (ICA)
  • Rank-Approximate Nearest Neighbor (RANN)
  • Simple Least-Squares Linear Regression (and Ridge Regression)
  • Sparse Coding, Sparse dictionary learning
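To give a feel for the simplest entry in the list, here is a rough sketch of a decision stump on one numeric feature. It is written in plain Python for illustration and does not use or mirror mlpack's C++ API; fit_stump and predict are hypothetical names, and the sketch assumes 0/1 labels and at least two distinct feature values.

```python
def fit_stump(xs, ys):
    """Find the threshold on a single feature that misclassifies the fewest points.

    xs: list of numbers, ys: list of 0/1 labels.
    Returns (threshold, label_if_below, label_if_at_or_above, errors).
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    for t in candidates:
        # Try both ways of assigning labels to the two sides of the split.
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if x < t else above) != y for x, y in zip(xs, ys))
            if best is None or errors < best[3]:
                best = (t, below, above, errors)
    return best


def predict(stump, x):
    # Apply the single learned split to a new value.
    threshold, below, above, _ = stump
    return below if x < threshold else above


xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
stump = fit_stump(xs, ys)
```

On this toy data the two classes are cleanly separated, so the stump finds a split with zero training errors.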


What is PyShark?

PyShark is a Python wrapper for tshark, the Wireshark command-line interface, so all of the Wireshark decoders are available to PyShark. It allows Python packet parsing using Wireshark dissectors.

There are quite a few Python packet-parsing modules. This one is different because it does not actually parse any packets itself; it simply uses tshark's ability to export XML and relies on that parsing.

This package allows parsing from a capture file or a live capture, using all the Wireshark dissectors you have installed. It has been tested on Windows and Linux.
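To make the XML-export idea concrete, the sketch below reads a small hand-written sample shaped like tshark's PDML output (the format produced by tshark -T pdml) using only the standard library. It is not PyShark's implementation: parse_packets is a hypothetical helper, and the sample XML is an illustrative fragment, not real capture output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like tshark's PDML (XML) export; illustrative only.
SAMPLE_PDML = """
<pdml>
  <packet>
    <proto name="eth">
      <field name="eth.dst" show="aa:bb:cc:dd:ee:ff"/>
      <field name="eth.src" show="00:de:ad:be:ef:00"/>
    </proto>
    <proto name="ip">
      <field name="ip.version" show="4"/>
      <field name="ip.ttl" show="1"/>
    </proto>
  </packet>
</pdml>
"""


def parse_packets(pdml_text):
    """Turn PDML text into a list of {layer: {field: value}} dictionaries."""
    packets = []
    for packet in ET.fromstring(pdml_text).iter("packet"):
        layers = {}
        # Each <proto> element is one dissected layer of the packet.
        for proto in packet.findall("proto"):
            layers[proto.get("name")] = {
                field.get("name"): field.get("show")
                for field in proto.iter("field")
            }
        packets.append(layers)
    return packets


packets = parse_packets(SAMPLE_PDML)
print(packets[0]["ip"]["ip.ttl"])
```

The dissection itself is done by tshark; the Python side only has to walk the resulting layers and fields.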

Example Code for Reading a File

>>> import pyshark
>>> cap = pyshark.FileCapture('/tmp/mycapture.cap')
>>> cap
<FileCapture /tmp/mycapture.cap>
>>> print(cap[0])
Packet (Length: 698)
Layer ETH:
        Destination: aa:bb:cc:dd:ee:ff
        Source: 00:de:ad:be:ef:00
        Type: IP (0x0800)
Layer IP:
        Version: 4
        Header Length: 20 bytes
        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        Total Length: 684
        Identification: 0x254f (9551)
        Flags: 0x00
        Fragment offset: 0
        Time to live: 1
        Protocol: UDP (17)
        Header checksum: 0xe148 [correct]
