Pages:
1 page/≈275 words
Sources:
No Sources
Level:
APA
Subject:
IT & Computer Science
Type:
Statistics Project
Language:
English (U.S.)
Document:
MS Word
Topic:
Data Visualization using R (Statistics Project Sample)
Instructions:
Data Visualization
Quantitative Assignment Option
Preliminary steps:
• Literature and materials:
• Wickham & Grolemund (2017)
• ggplot2 Cheatsheet
• R Cheatsheet
• Presentation slides for weeks 9 & 10
• If you have not yet installed the tidyverse package, do so by typing the following
command into the RStudio console:
• Load the tidyverse package into RStudio, which comes with ggplot2 pre-installed:
• Download the ldt.RData file from OPAL and load it into your RStudio environment, using
either the file manager or by typing the code below into the console:¹
The file will be saved in your RStudio environment as ldt.
¹ If loading the dataset manually, make sure to replace FILE/PATH with the location of the file on your computer.
NB: If you’re a Windows user, don’t use backslashes as directory separators! Indicate the file location using forward
slashes (e.g., C:/FILE/PATH/ldt.RData instead of C:\FILE\PATH\ldt.RData) or double backslashes
(C:\\FILE\\PATH\\ldt.RData).
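The commands for the preliminary steps did not survive in this copy. A minimal sketch of what they might look like follows, assuming the standard install.packages(), library(), and load() calls. FILE/PATH is the placeholder from the footnote, and the variable name ldt_path is introduced here only for illustration, so the load step is guarded and runs only once a real path has been filled in.

```r
# Install the tidyverse only if it is not already available.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Attach the tidyverse, which includes ggplot2.
library(tidyverse)

# Placeholder path taken from the footnote; replace FILE/PATH with the real
# location. Forward slashes also work on Windows.
ldt_path <- "FILE/PATH/ldt.RData"

# load() restores the objects stored in the file under their saved names,
# so the dataset should appear in the environment as `ldt`.
if (file.exists(ldt_path)) {
  load(ldt_path)
} else {
  message("File not found; edit ldt_path to point at ldt.RData.")
}
```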
Task 3
Consider the following dataset:
• marvel.csv: data on characters from the Marvel universe (NOTE: the dataset uses the
semicolon ‘;’, not the comma, as separator!)
Load the dataset into RStudio. Explore your chosen dataset visually. Try out a variety of plot types
on a variety of different variables.
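Because marvel.csv uses semicolons, a comma-based reader would put each row into a single column. The following is a minimal sketch using read_delim() from readr with an explicit delimiter. Since the real file is not reproduced here, the example first writes a tiny stand-in file whose column names (name, appearances, year) and values are invented for illustration.

```r
library(readr)

# Hypothetical stand-in for marvel.csv: invented columns and values,
# written to a temporary file so the example runs on its own.
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("name;appearances;year",
    "Hero A;120;1963",
    "Hero B;45;1975",
    "Hero C;8;1991"),
  toy_path
)

# delim = ";" tells readr to split fields on semicolons instead of commas.
marvel <- read_delim(toy_path, delim = ";", show_col_types = FALSE)

# Quick structural check before plotting.
print(dim(marvel))
print(names(marvel))
```

For the actual assignment, the corresponding call would likely be read_delim("marvel.csv", delim = ";"). read_csv2() also assumes semicolons, but it additionally treats the comma as the decimal mark.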
Choose your two favorite (or what you feel to be the most informative) plots. Make sure the plots
have a meaningful title and the axes meaningful labels. Describe and interpret the plots in a few
sentences. Paste the code for them as well as the resulting plots below.
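The title and axis-label requirement can be met with the labs() function in ggplot2. The sketch below shows one histogram and one scatter plot on a small made-up data frame. The object names and the columns appearances and year are assumptions for illustration, not the real marvel.csv schema.

```r
library(ggplot2)

# Hypothetical toy data; the column names are assumptions, not the real
# marvel.csv schema.
toy <- data.frame(
  appearances = c(5, 12, 30, 48, 150, 400),
  year        = c(1962, 1964, 1975, 1983, 1991, 2004)
)

# Histogram of one numeric variable, with a title and axis labels via labs().
hist_plot <- ggplot(toy, aes(x = appearances)) +
  geom_histogram(bins = 5) +
  labs(
    title = "Distribution of appearances (toy data)",
    x = "Number of appearances",
    y = "Number of characters"
  )

# Scatter plot of two numeric variables, labelled the same way.
scatter_plot <- ggplot(toy, aes(x = year, y = appearances)) +
  geom_point() +
  labs(
    title = "Appearances by year (toy data)",
    x = "Year",
    y = "Number of appearances"
  )

print(hist_plot)
print(scatter_plot)
```

Swapping geom_histogram() or geom_point() for another geom (for example geom_boxplot() or geom_bar()) is a quick way to try out other plot types on the same variables.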
Tip
To create a new variable conditioned by another variable, you can use the mutate() function,²
e.g.:
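The example that belongs here is missing from this copy. A minimal sketch of what it could look like follows, using the starwars dataset that ships with dplyr. The BMI formula (mass in kg divided by squared height in metres, with height converted from cm) is an assumption rather than the assignment's original code.

```r
library(dplyr)

# starwars ships with dplyr; height is recorded in cm and mass in kg.
# Assumed formula: BMI = mass / (height in metres)^2.
starwars <- starwars %>%
  mutate(BMI = mass / (height / 100)^2)

# Inspect the new column next to the variables it was derived from.
starwars %>%
  select(name, height, mass, BMI) %>%
  head()
```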
The above code creates a new variable called BMI for the starwars dataset, which is based on
the existing variables mass and height. The result (a copy of the dataset now containing the new
variable) is assigned back to the starwars dataset variable to save it. Please note that the
mutate() function is part of the tidyverse package.
² https://dplyr.tidyverse.org/reference/mutate.html
Content:
Outliers & Missing Data
Student’s name:
Professor’s name:
Course number:
Date:
The histogram shows how often each number of appearances occurs in the dataset. The first bar is the tallest, indicating that the values it covers occur most frequently. The remaining bars are shorter, meaning that those values occur less often. Most of the data are concentrated between 0 and 2000 appearances.
The scatter plot places the number of "appearances" on the x-axis and the "year" in which those appearances occurred on the y-axis. The individual data points form clusters, which suggests that certain in...