Abstract:
This paper gives an overview of Apache Flink, an open source stream processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala [1]. In this paper I provide some insights into how Apache Flink works and how efficiently it performs compared with other available data processing tools. The paper also explains why such tools are needed and why they are important for making data-driven decisions and extracting knowledge from Big Data, for which several techniques, such as Hadoop and MapReduce, have been developed over time. Finally, it outlines what a developer or data processing manager should be aware of when using or deploying Apache Flink to process large volumes of data arriving at a high rate.
Introduction
Big Data refers to vast amounts of data. In technical terms, it is data whose volume exceeds the storage and processing capacity of traditional database systems. Taken together, such data forms a structure so complex that iterating over it and extracting useful information becomes difficult. The need for distributed data processing frameworks is therefore growing rapidly, driven by the increasing demand for data analysis.
Two well-known data processing frameworks that provide APIs for both batch and stream processing are Apache Flink and Apache Spark. This paper explains how Apache Flink works and compares it with other tools such as Apache Spark and Apache Beam. It also reports statistics gathered by running the word count example on single-node clusters of Apache Flink and Apache Spark. Finally, an easy-to-read tabular comparison of different data processing frameworks is provided, which can help readers decide which tool suits their application.
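The benchmarks above use the word count example. To make clear what that workload computes, the following is a minimal single-machine sketch in plain Java (no Flink dependency) of the same tokenize, group and count logic. In Flink's bundled WordCount example these steps roughly correspond to the flatMap, keyBy and sum operators on a stream; the class name WordCountSketch and the method countWords are hypothetical names chosen for illustration, not part of any Flink API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class: a local, non-distributed word count.
public class WordCountSketch {

    // Counts word occurrences across all input lines.
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // "flatMap" step: lower-case each line and split it on non-word characters
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // drop empty tokens produced by leading punctuation or blank lines
                .filter(word -> !word.isEmpty())
                // "keyBy" + "sum" step: group by the word itself and count occurrences;
                // a TreeMap keeps the result sorted so the output is deterministic
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("To be, or not to be", "that is the question");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(countWords(lines));
    }
}
```

A distributed engine such as Flink or Spark runs the same logical steps, but partitions the input across workers and shuffles records by key so that all occurrences of a word are summed on the same node.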
