PySpark for Big Data Course
PySpark is a general-purpose, in-memory, distributed processing
engine that allows you to process data efficiently in a distributed
fashion. Applications running on PySpark can be up to 100x faster than
traditional MapReduce-based systems. PySpark is especially well suited
to building data ingestion pipelines.
- Interactive training for better learning
- Pre-evaluation, so you learn only what you need to learn
- Experienced and certified trainers
- Convenient weekday and weekend batches, with a demo session available
- Class timings arranged flexibly to suit both trainee and trainer
- Access to recordings of the sessions you have attended
Introduction:
- Starting the Course
Validate and Explore:
- 12. Dataframe Essentials Concept Review Quiz
- 13. A little something to keep you going....
- 14. Read, Write and Validate Dataframes Code Along Activity
- 15. Read, Write and Validate Data HW
- 16. Read, Write and Validate Data HW Solutions Code Review
- 17. A little something to keep you going....
- 18. Search and Filter Dataframes Code Along Activity
- 19. Search and Filter Dataframes HW
- 20. Search and Filter Dataframes HW Solution Code Review
- 21. A little something to keep you going....
- 22. SQL Options in Spark/PySpark Code Along Activity
- 23. SQL Options in Spark/PySpark HW
- 24. SQL Options in Spark/PySpark HW Solutions
Aggregate:
- 28. Manipulating Dataframes Code Along Activity
- 29. Manipulating Dataframes HW
- 30. Manipulating Dataframes HW Solution
- 31. A little something to keep you going....
- 32. Aggregating Data in Dataframes Code Along Activity
- 33. Aggregating Data in Dataframes HW
- 34. Aggregating Data in Dataframes HW Solution
- 35. A little something to keep you going....
- 36. Joining and Appending Dataframes Code Along Activity
- 37. Joining and Appending Dataframes HW
- 38. Joining and Appending Dataframes HW Solution Code Review
- 39. A little something to keep you going....
- 40. Handling Missing Data in Dataframes Code Along Activity
- 41. Handling Missing Data in Dataframes HW
- 42. Handling Missing Data in Dataframes HW Solution
- 43. Dataframe Essentials Coding Master Review
Introduction to MLlib:
Classification in MLlib:
- 48. Introduction to Classification in MLlib Concept Review
- 49. Classification in MLlib Quiz
- 50. Classification in MLlib Quiz
- 51. Classification in MLlib Code Along Part 1: Data Formatting and Transformations
- 52. Classification in MLlib Code Review Part 2.0: Train and Evaluate Models [Intro]
- 53. Classification in MLlib Code Review Part 2.0: Train and Evaluate Models [Intro]
- 54. Classification in MLlib Code Review Part 2.2: Train & Test Models [1 vs Rest]
- 55. A little something to keep you going....
- 56. Classification in MLlib Code Review Part 2.3: Train & Test Models [Multilayer PC]
- 57. Classification in MLlib Code Review Part 2.4: Train & Test Models [Naive Bayes]
- 58. Classification in MLlib Code Review Part 2.5: Train & Test Models [Linear SVM]
- 59. Classification in MLlib Code Review Part 2.6: Train & Test Models [Decision Tree]
- 60. Classification in MLlib Code Review Part 2.7: Train & Test Models [Random Forest]
- 61. Classification in MLlib Code Review Part 2.8: Train & Test Models [GBT]
- 62. Classification Project
- 63. Remember to be creative with this project!
- 64. Classification Project Solution
Kafka UI:
Natural Language Processing in MLlib:
- 66. Introduction to Natural Language Processing Quiz
- 67. Natural Language Processing Concept Review [Part 1: Feature Transformers]
- 68. Natural Language Processing Concept Review [Part 2: Feature Extractors]
- 69. Natural Language Processing Feature Extractors Quiz
- 70. Natural Language Processing Code Along Activity Part 1: Data Prep
- 71. Natural Language Processing Code Along Activity Part 2: Vectorize, Train & Eval
- 72. Natural Language Processing Project
- 73. Natural Language Processing Project Solution
Regression in MLlib:
- 74. Regression in MLlib Concept Review
- 75. Regression in PySpark's MLlib
- 76. Regression in MLlib Code Review Introduction
- 77. Regression in MLlib Code Review Part 1: Data Prep
- 78. Regression in MLlib Code Review Part 2.0: Linear Regression
- 79. A little something to keep you going....
- 80. Regression in MLlib Code Review Part 2.1: Decision Tree Regression
- 81. Regression in MLlib Code Review Part 2.2: Random Forest Regression
- 82. Regression in MLlib Code Review Part 2.3: Gradient Boosted Tree Regression
- 83. A little something to keep you going....
- 84. BONUS: Add loop functions to your regression training and evaluation script
- 85. Regression Project
- 86. And finally... have FUN with this project and LOVE what you do!
- 87. Regression Project Solution Code Along Activity
Clustering in PySpark:
- 88. Intro to Clustering in MLlib Concept Review
- 89. Clustering Concept Review Quiz
- 90. K-Means & Bisecting K-Means in MLlib Code Along Activity
- 91. Latent Dirichlet Allocation in MLlib Code Along Activity
- 92. A little something to keep you going....
- 93. Gaussian Mixture Modeling in MLlib Code Along Activity
- 94. Clustering Project Introduction
- 95. Clustering Project Solution Code Review
Frequent Pattern Mining in MLlib:
- 96. Frequent Pattern Mining in MLlib Concept Review
- 97. Frequent Pattern Mining Concept Quiz
- 98. Frequent Pattern Mining Code Along Activity [Part 1: FPGrowth]
- 99. Frequent Pattern Mining Code Along Activity [Part 2: PrefixSpan]
- 100. A little something to keep you going....
- 101. Frequent Pattern Mining Project Introduction
- 102. Frequent Pattern Mining Project Solution Code Review
Apache Spark is an open-source, real-time, in-memory cluster processing framework. It is used in streaming analytics systems such as bank fraud detection and recommendation systems. Python, meanwhile, is a general-purpose, high-level programming language with a wide range of libraries supporting diverse types of applications. PySpark combines the two: it provides a Python API for Spark that lets you harness the simplicity of Python and the power of Apache Spark in order to tame Big Data.
You will have lifetime access to the Support Team, available 24/7. The team will help you resolve queries during and after the course.
You will never miss a lecture at MITS. You can choose either of two options:
- View the recorded session of the class, available in your LMS.
- Attend the missed session in any other live batch.
To help you in this endeavor, we have added a resume-builder tool to your LMS. You can now create a winning resume in just three easy steps. You will have unlimited access to these templates across different roles and designations. All you need to do is log in to your LMS and click on the "create your resume" option.
Yes, lifetime access to the course material is provided once you have enrolled in the course.
We limit the number of participants in a live session to maintain quality standards, so participation in a live class without enrollment is unfortunately not possible. However, you can go through the sample class recording, which will give you a clear insight into how the classes are conducted, the quality of the instructors, and the level of interaction in a class.
All instructors at MITS are industry practitioners with a minimum of 10-12 years of relevant IT experience. They are subject-matter experts, trained by MITS to provide an excellent learning experience to participants.
RDD stands for Resilient Distributed Dataset, the building block of Apache Spark. An RDD is the fundamental data structure of Apache Spark: an immutable, distributed collection of objects. Each dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster.
PySpark is not a language. PySpark is the Python API for Apache Spark, with which Python developers can leverage the power of Apache Spark and create in-memory processing applications. It was developed to cater to Spark's large Python community.