Spark with Python | Event in Bengaluru | Townscript

Spark with Python

Dec 02 - 31 '19 | 12:00 PM (IST)

Event Information

Introduction to Big Data and Distributed Computing :
Big data analysis is the future. This section of the course will help you understand the need for distributed computation.
Introduction to data.
Data Science: a vision.
Introduction to Big Data.
Parallel computation.
Problems with parallel computation.
Traditional parallel computation systems.

Hadoop :
Introduction to Hadoop.
Hadoop Components.
HDFS and its architecture.
HDFS Commands
◦ mkdir
◦ ls 
◦ rmdir and rm 
◦ copyFromLocal 
◦ put 
◦ cat 
◦ copyToLocal 
◦ get 
◦ touchz 
◦ mv 
◦ cp 
◦ distcp 
◦ etc.
fsimage and edits log files.
Hadoop property files.
Introduction to MapReduce.
Shortcomings of MapReduce.

Python Refresher :
Introduction to Python 
Jupyter
Python variables and Data Type.
Operators in Python.
Introduction to interactive mode and script-based programming
Python Collections (lists, dictionaries, etc.)
Control Flow and looping in Python
Functions in Python (declaration, definition, types and calling)
Object oriented Python.
NumPy

Spark Introduction :
Introduction to Spark.
Spark and Hadoop (Similarities and Differences)
Spark Execution (Master-Slave System, Driver, Cluster Manager and Executors)
Spark Shell
Resilient Distributed Dataset (RDD)

Operations On RDD :
Creation of RDD
Transformation and Action Introduction
Lazy evaluation
Some Important Transformations :
filter
map
flatMap
distinct
sample
union
intersection
subtract
cartesian
Some Important Actions :
first
take
top
reduce
fold
aggregate
foreach
count
collect
Creation of Paired RDD
Some Important Transformations on Pair RDD :
combineByKey
mapValues
groupByKey
reduceByKey
sortByKey
subtractByKey
Joins and their types
cogroup
Some Important Actions on Pair RDD :
lookup
collectAsMap
countByKey
Hands-on practice with all the functions
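
The lazy evaluation topic listed above can be illustrated without a cluster. What follows is a plain-Python analogy built on generators, not Spark code: the helpers `lazy_map` and `lazy_filter` are hypothetical stand-ins for RDD transformations, and `list(...)` plays the role of an action such as `collect`. The sketch only aims to show that describing a pipeline does no work until an action asks for results.

```python
# Plain-Python analogy for lazy evaluation. These helpers are hypothetical
# stand-ins for RDD transformations and are NOT part of the Spark API.
calls = []  # records every element that actually gets processed


def lazy_map(func, items):
    """Analogue of a transformation: a generator that runs nothing until consumed."""
    for item in items:
        calls.append(item)
        yield func(item)


def lazy_filter(pred, items):
    """Analogue of a transformation that keeps only matching elements."""
    return (item for item in items if pred(item))


# Building the pipeline only describes the work to be done.
pipeline = lazy_filter(lambda x: x % 2 == 0, lazy_map(lambda x: x * x, range(1, 6)))
calls_before_action = len(calls)

# Consuming the pipeline is the analogue of an action such as collect().
result = list(pipeline)
calls_after_action = len(calls)
```

Inspecting `calls_before_action` and `calls_after_action` shows that the elements are touched only once the pipeline is consumed, which is the behaviour the transformation and action split in Spark is built around.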

Fault Tolerance and Persistence :
RDD lineage
Persistence
Benefits of persistence
Optimizing Spark programs
Introduction to partitioning
Inbuilt partitioners (Hash and Range)
Benefits of partitioning
groupByKey and reduceByKey comparison
Spark broadcast variables and accumulators
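
The groupByKey and reduceByKey comparison in the list above comes down to how many records cross the shuffle. The sketch below is a plain-Python simulation, not Spark code: the two lists in `partitions` are made-up data standing in for RDD partitions, and the functions `group_then_sum` and `reduce_by_key` are hypothetical names. It assumes the usual description of the two operations, where reduceByKey combines values inside each partition before shuffling and groupByKey does not.

```python
from collections import defaultdict

# Hypothetical data: two "partitions" of (word, 1) pairs held in plain lists.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]


def group_then_sum(parts):
    """groupByKey-style: every record crosses the shuffle, then values are summed."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, len(shuffled)


def reduce_by_key(parts):
    """reduceByKey-style: combine inside each partition before the shuffle."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())  # one record per key per partition
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


grouped_totals, grouped_shuffled = group_then_sum(partitions)
reduced_totals, reduced_shuffled = reduce_by_key(partitions)
```

Both approaches give the same totals, but `reduced_shuffled` is smaller than `grouped_shuffled`, which is the usual argument for preferring reduceByKey when the goal is an aggregate per key.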

IO in Spark :
Text files
CSV files
JSON
Data from HDFS

Spark Streaming :
Introduction to Spark Streaming
Transformations
Reading from HDFS
Window Concept
Push-based and pull-based receivers
Kafka integration with Streaming.
Performance

SparkSQL :
Introduction to SparkSQL
SparkSQL data types
DataFrame: an introduction.
Creation of a dataframe.
Summary statistics on DataFrame.
Aggregation on Given Data.
Data joining.
SparkSQL and SQL
Introduction to Hive.
Using data from Hive and HiveQL.
Optimizing SparkSQL code.

Spark Code Deployment and Cluster Managers :
Submitting Spark code in local mode
Submitting Spark code on the Standalone cluster manager.
Submitting Spark code on YARN
Submitting Spark code on Mesos

Note : Every part of the course includes hands-on exercises. A number of objective questions will help you test your understanding.

Projects :

Project 1 : Spark core can be used for data preparation and aggregation. Aggregation will be implemented using Spark core APIs.
For data aggregation, MovieLens data will be used.

Project 2 : Implementing word frequency visualization on streaming data using Kafka and Spark Streaming integration.

Project 3 : Implementation of a moving average using SparkSQL.
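
A moving average is usually written in SQL as a window function. The sketch below is not the course's project code: it runs the window query against Python's built-in `sqlite3` module so that it works without a Spark installation (it assumes SQLite 3.25 or newer, which added window functions). The table name `readings`, its columns and the five sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical daily values; the table and column names are illustrative only.
rows = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]

# Three-row moving average: the current row plus the two rows before it.
query = """
    SELECT day,
           AVG(value) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM readings
    ORDER BY day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
moving_averages = conn.execute(query).fetchall()
conn.close()
```

Spark SQL supports the same `AVG(...) OVER (ORDER BY ... ROWS BETWEEN ...)` window clause, so in a Spark session a query of this shape would typically be run with `spark.sql(query)` after registering a DataFrame as a temporary view named `readings`.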

Project 4 : Data preprocessing, data manipulation and aggregation using SparkSQL. It will be done using real-time data.

Venue

BTM 2nd Stage
773, 3rd Floor, 7th Cross, 16th Main, Bengaluru, India
Walsoul Pvt Lt
Joined on Apr 10, 2019