# Python® for Data Science for Beginners

(PYTHON-DS.AE1) / ISBN: 978-1-64459-462-9

This course includes

Lessons

TestPrep

Hands-on Lab

#### Lessons

23+ Lessons | 40+ Exercises | 170+ Quizzes | 76+ Flashcards | 76+ Glossary terms

#### TestPrep

#### Hands-on Lab

30+ LiveLabs | 15+ Video tutorials | 24+ Minutes


# Here's what you will learn

### Lessons 1: Introduction

- About This Course
- False Assumptions
- Icons Used in This Course
- Where to Go from Here

### Lessons 2: Discovering the Match between Data Science and Python

- Defining the Sexiest Job of the 21st Century
- Creating the Data Science Pipeline
- Understanding Python’s Role in Data Science
- Learning to Use Python Fast

### Lessons 3: Introducing Python’s Capabilities and Wonders

- Why Python?
- Working with Python
- Performing Rapid Prototyping and Experimentation
- Considering Speed of Execution
- Visualizing Power
- Using the Python Ecosystem for Data Science

### Lessons 4: Setting Up Python for Data Science

- Considering the Off-the-Shelf Cross-Platform Scientific Distributions
- Installing Anaconda on Windows
- Installing Anaconda on Linux
- Installing Anaconda on Mac OS X
- Downloading the Datasets and Example Code

### Lessons 5: Working with Google Colab

- Defining Google Colab
- Getting a Google Account
- Working with Notebooks
- Performing Common Tasks
- Using Hardware Acceleration
- Executing the Code
- Viewing Your Notebook
- Sharing Your Notebook
- Getting Help

### Lessons 6: Understanding the Tools

- Using the Jupyter Console
- Using Jupyter Notebook
- Performing Multimedia and Graphic Integration

### Lessons 7: Working with Real Data

- Uploading, Streaming, and Sampling Data
- Accessing Data in Structured Flat-File Form
- Sending Data in Unstructured File Form
- Managing Data from Relational Databases
- Interacting with Data from NoSQL Databases
- Accessing Data from the Web

### Lessons 8: Conditioning Your Data

- Juggling between NumPy and pandas
- Validating Your Data
- Manipulating Categorical Variables
- Dealing with Dates in Your Data
- Dealing with Missing Data
- Slicing and Dicing: Filtering and Selecting Data
- Concatenating and Transforming
- Aggregating Data at Any Level

### Lessons 9: Shaping Data

- Working with HTML Pages
- Working with Raw Text
- Using the Bag of Words Model and Beyond
- Working with Graph Data

### Lessons 10: Putting What You Know in Action

- Contextualizing Problems and Data
- Considering the Art of Feature Creation
- Performing Operations on Arrays

### Lessons 11: Getting a Crash Course in MatPlotLib

- Starting with a Graph
- Setting the Axis, Ticks, Grids
- Defining the Line Appearance
- Using Labels, Annotations, and Legends

### Lessons 12: Visualizing the Data

- Choosing the Right Graph
- Creating Advanced Scatterplots
- Plotting Time Series
- Plotting Geographical Data
- Visualizing Graphs

### Lessons 13: Stretching Python’s Capabilities

- Playing with Scikit-learn
- Performing the Hashing Trick
- Considering Timing and Performance
- Running in Parallel on Multiple Cores

### Lessons 14: Exploring Data Analysis

- The EDA Approach
- Defining Descriptive Statistics for Numeric Data
- Counting for Categorical Data
- Creating Applied Visualization for EDA
- Understanding Correlation
- Modifying Data Distributions

### Lessons 15: Reducing Dimensionality

- Understanding SVD
- Performing Factor Analysis and PCA
- Understanding Some Applications

### Lessons 16: Clustering

- Clustering with K-means
- Performing Hierarchical Clustering
- Discovering New Groups with DBScan

### Lessons 17: Detecting Outliers in Data

- Considering Outlier Detection
- Examining a Simple Univariate Method
- Developing a Multivariate Approach

### Lessons 18: Exploring Four Simple and Effective Algorithms

- Guessing the Number: Linear Regression
- Moving to Logistic Regression
- Making Things as Simple as Naïve Bayes
- Learning Lazily with Nearest Neighbors

### Lessons 19: Performing Cross-Validation, Selection, and Optimization

- Pondering the Problem of Fitting a Model
- Cross-Validating
- Selecting Variables Like a Pro
- Pumping Up Your Hyperparameters

### Lessons 20: Increasing Complexity with Linear and Nonlinear Tricks

- Using Nonlinear Transformations
- Regularizing Linear Models
- Fighting with Big Data Chunk by Chunk
- Understanding Support Vector Machines
- Playing with Neural Networks

### Lessons 21: Understanding the Power of the Many

- Starting with a Plain Decision Tree
- Making Machine Learning Accessible
- Boosting Predictions

### Lessons 22: Ten Essential Data Resources

- Discovering the News with Subreddit
- Getting a Good Start with KDnuggets
- Locating Free Learning Resources with Quora
- Gaining Insights with Oracle’s Data Science Blog
- Accessing the Huge List of Resources on Data Science Central
- Learning New Tricks from the Aspirational Data Scientist
- Obtaining the Most Authoritative Sources at Udacity
- Receiving Help with Advanced Topics at Conductrics
- Obtaining the Facts of Open Source Data Science from Masters
- Zeroing In on Developer Resources with Jonathan Bower

### Lessons 23: Ten Data Challenges You Should Take

- Meeting the Data Science London + Scikit-learn Challenge
- Predicting Survival on the Titanic
- Finding a Kaggle Competition that Suits Your Needs
- Honing Your Overfit Strategies
- Trudging Through the MovieLens Dataset
- Getting Rid of Spam E-mails
- Working with Handwritten Information
- Working with Pictures
- Analyzing Amazon.com Reviews
- Interacting with a Huge Graph

# Hands-on Lab Activities

### Conditioning Your Data

- Checking the Version of pandas
- Creating Categorical Variables
- Finding the Missing Data
- Encoding Missingness
- Sorting and Shuffling
- Creating n-grams
- Calculating TF-IDF
- Modifying Graphs Using NetworkX
- Creating an Adjacency Matrix Using NetworkX
- Defining a Plot
- Creating a Line Plot
- Creating a Legend
- Creating a Pie Chart
- Creating a Scatterplot
- Creating an Undirected Graph
- Using Parallel Coordinates
- Calculating Descriptive Statistics
- Visualizing the Validation Curve
- Visualizing a Subset of Images
- Adding New Cases and Variables

### Shaping Data

- Extracting a Telephone Number

### Putting What You Know in Action

- Using Vectorization
- Performing Matrix Multiplication

### Stretching Python’s Capabilities

- Building a Predictor

### Exploring Data Analysis

- Loading the Iris Dataset

### Reducing Dimensionality

- Creating a NumPy Array

### Clustering

- Understanding Centroid-Based Algorithms

### Exploring Four Simple and Effective Algorithms

- Using K-Nearest Neighbors and PCA

### Performing Cross-Validation, Selection, and Optimization

- Loading the Boston Housing Dataset

### Understanding the Power of the Many

- Optimizing the Depth of a Decision Tree