Exploring R
Evolution of R
Programming Features of R
R for Machine Learning
R for Data Analysis
Application of R
R vs. Python vs. SAS
R vs. Excel vs. Tableau
Install R base on Windows
Install R Studio on Windows
Install R base on Ubuntu
Install R Studio on Ubuntu
R Starter
First R Program
Working with R Packages
R Workspace and R Sessions
Manage working directory
Customize RStudio
RStudio Debugger
RStudio History and Environment variables
R Syntax
R Variables
R Data Types & Structures
R Arithmetic Operators
R Logical Operators
R If Statement
R If…Else Statement
If…Else If…Else Statement
R for loop
R while loop
R repeat loop
R String Construction
R String Manipulation Functions
Creating Character Strings
R Functions
R built-in functions
Working with Vectors
R Vector Indexing
R Vector Modification
R Arithmetic Vector Operations
R Lists
Access List elements (List Slicing)
List modification
R Matrix construction
Access Matrix elements
R Matrix Modification
R Matrix Operations
R Array Construction
Accessing Array Elements
Manipulating Array Elements
R Data Frames
Data Extraction
Data Frame Expansion
R Built-in Data frames
R Factors
Manage Factor levels
Factor Functions
R Contingency Tables
R Data Visualization
R – Charts and Graphs
R Density Plot
R Strip Charts
R Boxplots
R Violin Plots
R Bar Charts
R Pie Charts
R Area Plots
R Time Series
Graphics with ggplot2
ggplot2 Structure
ggplot2 Bar Charts
ggplot2 Pie Chart
ggplot2 Area Plot
ggplot2 Histogram
ggplot2 Scatter Plot
ggplot2 Box Plot
Mean & Median
Standard Deviation
Normal Distribution
Correlation
T-Tests
Chi-Square Test
ANOVA Test
Survival Analysis
Data Pre-processing and Missing Value Analysis
Missing data treatment
Missing value analysis with the mice package
Outlier Analysis
Problems with outliers
Outlier Detection
Outlier Treatment
Simple Linear Regression
Mathematical Computation
Linear Regression in R
A complete Simple Regression Analysis
Multiple Linear Regression
Mathematical Analysis
Model Interpretation
A complete Multiple Regression Analysis
Logistic Regression
Mathematical Computation in R
Logistic Regression in R
Heart Risk Analysis using LR
Support Vector Machine
Heart Risk Analysis using SVM
Decision Trees
Random Forest
K-means Clustering
Big Data Analytics using RHadoop
RHadoop Packages
rJava: Low-Level R to Java Interface
rhdfs: Integrate R with HDFS
rmr2: MapReduce job in R
plyrmr: Data Manipulation with MapReduce job
rhbase: Integrate HBase with R
Environment setup for RHadoop
Getting Started with RHadoop
Data Pre-processing
In this chapter, we will discuss Missing Value Analysis and Outlier Analysis and their treatment in detail.
Missing Value Analysis
In general, missing data means incomplete data: whatever the reason, some values are absent. In statistics, missing data, or missing values, occur when no data value is stored for a variable in an observation. In R, missing or undefined values are represented by:
NA: Not Available. In a data set, missing data points are represented by NA.
NULL: The null object, used to indicate that an object itself is absent. NULL has length zero, so it cannot mark a single missing element inside a vector.
NaN: Not a Number, produced by undefined arithmetic such as 0/0. Note that is.na() also returns TRUE for NaN.
Inf: Like NaN, Inf is produced by numerical computation, for example 1/0. Inf is not NA, and it is not merely a very large number: it represents positive infinity, which is larger than any finite number (-Inf is its negative counterpart).
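The short R sketch below shows how these four values behave with base R's test functions is.na(), is.nan(), is.finite(), is.infinite() and is.null(). The names x and y are illustrative only.

```r
# A numeric vector holding an ordinary number, NA, NaN (0/0) and Inf (1/0)
x <- c(1, NA, 0/0, 1/0)

is.na(x)        # FALSE  TRUE  TRUE FALSE  (NA and NaN both count as missing)
is.nan(x)       # FALSE FALSE  TRUE FALSE  (only NaN)
is.finite(x)    #  TRUE FALSE FALSE FALSE  (NA, NaN and Inf are all non-finite)
is.infinite(x)  # FALSE FALSE FALSE  TRUE  (only Inf)

# NULL is the absence of an object, not a missing element
y <- NULL
is.null(y)              # TRUE
length(y)               # 0
length(c(1, NULL, 2))   # 2: NULL simply disappears when combined into a vector
```

Because is.na() treats NaN as missing but not Inf, infinite values produced by a computation have to be checked separately, for example with is.finite().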
Why Are Values Missing?
In this section, we explain the reasons for missing data.
Missing Data Mechanisms
Missing data are classified by the mechanism that caused the values to be missing:
Missing Completely at Random (MCAR): The probability that a value is missing is unrelated both to the value itself and to the values of any other variable.
Example: 5% of the records in a medical survey are removed at random.
Missing at Random (MAR): The probability that a value is missing does not depend on the missing value itself, but it does depend on some of the observed data.
Example: People who come from poor families might be less inclined to answer questions about drug use, so whether drug use is reported depends on family income, which is observed.
Missing Not at Random (MNAR): The probability that a value is missing depends on the missing value itself, or on other variables that were not observed.
Example: Students skipped the question on drug use because they feared that they would be expelled from school.
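To make the three mechanisms concrete, the R sketch below simulates them on hypothetical data. The variables income and drug_use, the missingness rates (5% and 50%), the income cut-off of 40 and the 0.75 quantile threshold are all illustrative assumptions, not values from a real survey. The script then compares the mean of the observed values with the true mean under each mechanism.

```r
set.seed(42)  # make the random draws reproducible
n <- 1000

# Hypothetical, fully known data: lower income goes with higher drug use
income   <- rnorm(n, mean = 50, sd = 15)
drug_use <- 10 - 0.1 * income + rnorm(n, sd = 2)

# MCAR: every value has the same 5% chance of being missing
mcar <- drug_use
mcar[runif(n) < 0.05] <- NA

# MAR: missingness depends on the observed income
# (income below 40: 50% chance of no answer, otherwise 5%)
mar <- drug_use
mar[runif(n) < ifelse(income < 40, 0.50, 0.05)] <- NA

# MNAR: missingness depends on the unobserved value itself
# (the heaviest quarter of users: 50% chance of no answer, otherwise 5%)
mnar <- drug_use
mnar[runif(n) < ifelse(drug_use > quantile(drug_use, 0.75), 0.50, 0.05)] <- NA

# True mean versus the mean of the values that remain observed
round(c(true = mean(drug_use),
        MCAR = mean(mcar, na.rm = TRUE),
        MAR  = mean(mar,  na.rm = TRUE),
        MNAR = mean(mnar, na.rm = TRUE)), 2)
```

Under MCAR the observed mean should stay close to the true mean. Under MAR and MNAR, heavier users are more likely to be missing in this simulation, so the mean of the remaining values is pulled downward.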
Problems with Missing Data
The main problems we face with missing data in data analysis are: