Sample data
The data used in this tutorial are the following random vectors. Copy and paste them into your R console to run the examples below.
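If you do not have the vectors at hand, stand-ins of the same kind can be generated as sketched below. The names x, y and group, the seed and the distributions are all assumptions, so plots made with these values will not match the tutorial's figures.

```r
# Hypothetical sample data: NOT the tutorial's original vectors
set.seed(1)                        # make the random draws reproducible
x <- rnorm(100)                    # numeric variable
y <- 2 * x + rnorm(100)            # response correlated with x
# Grouping variable with three fixed levels
group <- factor(sample(c("A", "B", "C"), 100, replace = TRUE),
                levels = c("A", "B", "C"))
```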
Scatter plot by group
If you have a grouping variable, you can create a scatter plot by group: pass the variable (as a factor) to the col argument of the plot function, and each group will be displayed in a different color.
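A minimal sketch with hypothetical data (the vectors and the three-level group are assumptions). When a factor is passed to col, R uses its integer codes (1, 2, 3, ...), which index the current palette(), so each level gets its own color.

```r
# Hypothetical data; replace with your own vectors
set.seed(1)
x <- rnorm(60)
y <- x + rnorm(60)
group <- factor(rep(c("A", "B", "C"), each = 20))

# The factor's integer codes select colors from palette()
plot(x, y, col = group, pch = 19,
     main = "Scatter plot by group")

# Legend colors use the same integer codes as the points
legend("topleft", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```

Because the colors come from palette(), the exact shades depend on the R version and on any earlier call to palette().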
Change the default colors
If you want to change the default colors, you can create a vector of colors and pass it to the col argument of the plot function.
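One way to do this is sketched below, assuming hypothetical x, y and group vectors; the color names are arbitrary. Indexing a character vector by a factor uses the factor's integer codes, so every point receives the color assigned to its group.

```r
# Hypothetical data; replace with your own vectors
set.seed(1)
x <- rnorm(60)
y <- x + rnorm(60)
group <- factor(rep(c("A", "B", "C"), each = 20))

# One color per level, in the order of levels(group)
my_cols <- c("darkorange", "steelblue", "forestgreen")

# Map each observation to the color of its group
point_cols <- my_cols[group]

plot(x, y, col = point_cols, pch = 19,
     main = "Scatter plot with custom colors")
legend("topleft", legend = levels(group), col = my_cols, pch = 19)
```

An alternative is to change the global palette with palette(), but indexing a color vector keeps the change local to this one plot.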
Preliminary tasks
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.
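For a quick start, a sketch using readr::read_csv() is shown below. To stay self-contained it first writes a small hypothetical CSV file to a temporary path; in practice you would pass the path to your own file instead.

```r
library(readr)

# Write a small hypothetical CSV file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1.5, 2.3, 3.1), group = c("A", "B", "A")),
          path, row.names = FALSE)

# read_csv() guesses the column types and returns a tibble
my_data <- read_csv(path)
my_data
```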
Info
This analysis has been performed using R statistical software (ver. 3.2.4).
How do outliers affect data?
Outliers in data can distort predictions and reduce accuracy if you don't detect and handle them appropriately, especially in regression models.
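To make this concrete, the sketch below fits a simple linear model to hypothetical data twice: once as generated, and once after adding 100 to the response of the last observation. The data and the size of the shift are assumptions chosen for illustration.

```r
# Hypothetical data: a linear trend plus noise
set.seed(123)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

fit_clean <- lm(y ~ x)

# Shift the response of the last observation by 100 to create one outlier
y_out <- y
y_out[20] <- y_out[20] + 100
fit_outlier <- lm(y_out ~ x)

# Compare intercept and slope with and without the outlier
rbind(clean = coef(fit_clean), with_outlier = coef(fit_outlier))
```

Because least-squares estimates are linear in the response, the change does not depend on the noise: the slope rises by 100 * (20 - 10.5) / 665, about 1.43, and the intercept falls by 10, where 10.5 is the mean of x and 665 is the sum of squared deviations of x = 1, ..., 20 from it.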
Is an observation an outlier?
Declaring an observation an outlier based on just one (rather unimportant) feature could lead to unrealistic inferences. When you have to decide whether an individual entity (represented by a row or observation) is an extreme value, it is better to consider the features (X's) that matter collectively.
What is an outlier in a continuous variable?
For a given continuous variable, outliers are the observations that lie more than 1.5 * IQR below the first quartile or above the third quartile, where IQR, the interquartile range, is the difference between the 75th and 25th percentiles. Look at the points outside the whiskers in the box plot below.
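The rule can be sketched in a few lines on a small hypothetical sample: compute the quartiles, build the lower and upper fences, and keep the values outside them.

```r
# Hypothetical sample with one extreme value
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 30)

q1  <- quantile(x, 0.25)         # 25th percentile (first quartile)
q3  <- quantile(x, 0.75)         # 75th percentile (third quartile)
iqr <- IQR(x)                    # q3 - q1

lower <- q1 - 1.5 * iqr          # lower fence
upper <- q3 + 1.5 * iqr          # upper fence

outliers <- x[x < lower | x > upper]
outliers                         # [1] 30

# Points beyond the whiskers are drawn individually
boxplot(x, main = "Points beyond the whiskers")
```

boxplot() computes its hinges with fivenum(), which can differ slightly from quantile() on some samples, so the two approaches may occasionally flag different points. On this sample they agree.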
What is Cook's distance?
Cook’s distance is a measure computed with respect to a given regression model and is therefore affected only by the X variables included in the model (and by the response), not by variables left out of it. But what does Cook’s distance mean? It measures the influence of each data point (row) on the predicted outcome, that is, how much the fitted values change when that observation is left out of the fit.
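A minimal sketch with hypothetical data: stats::cooks.distance() computes the measure for an lm fit, and the loop recomputes it from the leave-one-out definition D_i = sum_j (yhat_j - yhat_j(i))^2 / (p * s^2), where p is the number of coefficients and s^2 the residual variance of the full model. The data and the injected outlier are assumptions for illustration.

```r
# Hypothetical data: a linear relationship plus noise
set.seed(42)
n <- 30
d <- data.frame(x = runif(n, 0, 10))
d$y <- 1 + 0.5 * d$x + rnorm(n)
d$y[n] <- d$y[n] + 8               # inject one hypothetical outlier

fit <- lm(y ~ x, data = d)
cooksd <- cooks.distance(fit)      # built-in computation

# Leave-one-out definition: refit without row i and measure how much
# all n fitted values move, scaled by p * s^2 from the full model
p  <- length(coef(fit))            # number of estimated coefficients
s2 <- summary(fit)$sigma^2         # residual variance of the full model
manual <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  sum((fitted(fit) - predict(fit_i, newdata = d))^2) / (p * s2)
})

# The three most influential observations
head(sort(cooksd, decreasing = TRUE), 3)
```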
R plot pch symbols: the different point shapes available in R
Different plotting symbols are available in R. The graphical argument used to specify point shapes is pch.
Plotting symbols
The point symbols commonly used in R are shown in the figure below:
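A chart similar to that figure can be drawn with the sketch below, which plots pch values 0 to 25 on a two-row grid with each code printed above its symbol. The layout and colors are assumptions, not the original figure's code.

```r
pch_codes <- 0:25                  # the 26 standard point symbols

# Lay the symbols out on a 13 x 2 grid
col_pos <- pch_codes %% 13
row_pos <- -(pch_codes %/% 13)

plot(col_pos, row_pos, pch = pch_codes, cex = 2,
     bg = "lightblue",             # fill color, used only by pch 21 to 25
     xlim = c(-0.5, 12.5), ylim = c(-1.6, 0.6),
     axes = FALSE, xlab = "", ylab = "",
     main = "pch values 0 to 25")

# Print each code above its symbol
text(col_pos, row_pos, labels = pch_codes, pos = 3, offset = 1)
```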
Examples
The following arguments can be used to change the color and the size of the points:
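As an illustration, the sketch below combines four standard graphical parameters for points: col (outline color), bg (fill color, honored only by the filled symbols pch = 21 to 25), cex (size relative to the default) and lwd (line width of the symbol). The data and the chosen values are hypothetical.

```r
# Hypothetical data
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)

# Filled circles with a dark blue outline and a gold fill,
# twice the default size and drawn with thicker lines
plot(x, y, pch = 21, col = "darkblue", bg = "gold",
     cex = 2, lwd = 2,
     main = "Customized points")
```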