Hadoop becomes even better when good components are connected to it. Apache Spark is a data processing engine designed for both batch and streaming workloads; it runs on top of existing Hadoop clusters to provide enhanced and additional functionality.
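As a taste of what a Spark job on such a cluster looks like, here is a minimal batch word-count sketch in Scala. The HDFS path and application name are illustrative assumptions; on a real cluster the master would be supplied by spark-submit (for example, --master yarn).

// Minimal sketch: Spark reading a file from HDFS and counting words in batch mode.
import org.apache.spark.sql.SparkSession

object BatchWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BatchWordCount")
      .getOrCreate()                                        // master set externally by spark-submit

    val lines = spark.read.textFile("hdfs:///data/sample.txt")  // hypothetical input path
    import spark.implicits._
    val counts = lines
      .flatMap(_.split("\\s+"))                             // split each line into words
      .groupByKey(identity)                                 // group identical words together
      .count()                                              // word -> occurrence count

    counts.show()
    spark.stop()
  }
}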
This project gives participants hands-on exposure to building an end-to-end Big Data Analytics pipeline from scratch: analyze batch data to provide insights into energy usage patterns over the last couple of years, identify faulty meters that require attention, and flag consumers drawing energy over their sanctioned load.
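The sketch below shows, under assumed column names (meter_id, usage_kwh, load_kw, sanctioned_load_kw) and an assumed input path, how such batch checks could be expressed in Spark with Scala; the real project dataset may be shaped quite differently.

// Illustrative sketch only: schema and path are assumptions, not the actual project data.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MeterAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MeterAnalysis").getOrCreate()

    val readings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/meter_readings.csv")               // hypothetical batch input

    // Consumers drawing more than their sanctioned load.
    val overLoad = readings.filter(col("load_kw") > col("sanctioned_load_kw"))

    // A simple proxy for faulty meters: meters that never report positive usage.
    val faulty = readings.groupBy("meter_id")
      .agg(max("usage_kwh").as("max_kwh"))
      .filter(col("max_kwh") <= 0)

    overLoad.show()
    faulty.show()
    spark.stop()
  }
}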
Apache Kafka is an open-source stream-processing software platform developed at LinkedIn and donated to the Apache Software Foundation, written in Scala and Java. It is a distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on the abstraction of a distributed commit log.
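To make the commit-log abstraction concrete, here is a minimal Scala sketch that publishes one event to Kafka using the official Java client; the broker address and topic name are placeholder assumptions.

// Minimal sketch: publishing an event to Kafka from Scala via the Java client.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object MeterEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")        // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Each record is appended to the topic's partitioned, distributed commit log.
    producer.send(new ProducerRecord[String, String]("meter-events", "meter-42", "reading=3.7kWh"))
    producer.flush()
    producer.close()
  }
}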
This is an advanced training course on some of the key Big Data projects: YARN, Hive, HBase, Spark Core, Spark SQL, Spark Streaming, Kafka Core, Kafka Connect, Kafka Streams, NiFi, Druid and Apache Atlas. During the course, participants will also learn the Scala programming language.
Participants will learn how to work with HDFS; understand the purpose of HBase and use it with other ecosystem projects; create tables and insert, read and delete data in HBase; gain an all-round understanding of Kudu and its role in the Hadoop ecosystem; and understand the role of Spark.
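As a preview of the HBase objectives, the following Scala sketch creates a table and then inserts, reads and deletes a row through the standard HBase Java client; the table name and column family are assumptions chosen for illustration.

// Illustrative sketch: basic HBase table creation and CRUD from Scala.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Delete, Get, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HBaseCrudSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                  // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val tableName = TableName.valueOf("meters")             // assumed table name

    // Create the table with a single column family "d" if it does not exist yet.
    val admin = connection.getAdmin
    if (!admin.tableExists(tableName)) {
      admin.createTable(
        TableDescriptorBuilder.newBuilder(tableName)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
          .build())
    }

    val table = connection.getTable(tableName)

    // Insert a cell.
    val put = new Put(Bytes.toBytes("meter-42"))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("usage_kwh"), Bytes.toBytes("3.7"))
    table.put(put)

    // Read it back.
    val result = table.get(new Get(Bytes.toBytes("meter-42")))
    println(Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("usage_kwh"))))

    // Delete the row.
    table.delete(new Delete(Bytes.toBytes("meter-42")))

    table.close()
    admin.close()
    connection.close()
  }
}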