Real-World Dataset and Not Fictitious Dataset (Statistics Project Sample)
Each team must comprise 4-5 people
• You must use a real-world dataset, not a fictitious one
• Select a dataset from any of the following portals
• You can pick multiple datasets and use join functions in SQL, Excel, or Tableau to create a compelling business problem
• Your dataset must contain at least 15 columns
• You should write a report, a dashboard, and a presentation
• Here’s a sample report: “Making Brooklyn road safer” (see the attachment on e-conestoga)
• You can get free slides from here: https://24slides.com/templates/view/data-tables-graphs-charts/survey-results-powerpoint-template
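For teams that prefer a scripted workflow, the join and the 15-column check described above can also be sketched in Python with pandas. This is only an illustration: the brief itself names SQL, Excel, and Tableau, and the table names, column names, and values below are hypothetical toy data, not a real portal extract.

```python
import pandas as pd

# Hypothetical toy tables standing in for two files from a data portal.
collisions = pd.DataFrame({
    "collision_id": [1, 2, 3],
    "zip_code": ["11201", "11215", "11201"],
    "injured": [0, 2, 1],
})
neighborhoods = pd.DataFrame({
    "zip_code": ["11201", "11215"],
    "neighborhood": ["Area A", "Area B"],
})

# A left join keeps every collision row, even if its ZIP code has no match.
merged = collisions.merge(neighborhoods, on="zip_code", how="left")

MIN_COLUMNS = 15  # minimum column count stated in the brief


def meets_column_requirement(df, minimum=MIN_COLUMNS):
    """Return True when the dataset has at least `minimum` columns."""
    return df.shape[1] >= minimum


print(merged.shape)
print(meets_column_requirement(merged))
```

The toy result has only four columns, so the check fails here; a real combined dataset would need to reach 15.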
How to write your report
The data analysis report has two very important features:
• It is organized in a way that makes it easy for different audiences to skim/fish through it to find the topics and the level of detail that are of interest to them.
• The writing is as invisible/unremarkable as possible, so that the content of the analysis is what the reader remembers, not distracting quirks or tics in the writing.
Examples of distractions include:
• Extra sentences, overly formal or flowery prose, or at the other extreme overly casual or overly brief prose.
• Grammatical and spelling errors.
• Placing the data analysis in too broad or too narrow a context for the questions of interest to your primary audience.
• Focusing on process rather than reporting procedures and outcomes.
• Getting bogged down in technical details, rather than presenting what is necessary to properly understand your conclusions on substantive questions of interest to the primary audience.
Structure of your report
Now let's consider the basic outline of the data analysis report in more detail:
1. Introduction. Good features for the Introduction include:
• Summary of the study and data, as well as any relevant substantive context, background, or framing issues.
• The “big questions” answered by your data analyses, and summaries of your conclusions about these questions.
• Brief outline of remainder of paper.
• For example: “Making Brooklyn safe” using traffic data analysis
2. Retrieving Data or Data source: State where you downloaded the data. Explain the metadata and attributes of your dataset. See example:
• age (numeric)
• job : type of job (categorical: “admin”, “blue-collar”, “entrepreneur”, “housemaid”, “management”, “retired”, “self-employed”, “services”, “student”, “technician”, “unemployed”, “unknown”)
Predicted variable (desired target):
• y: has the client subscribed to a term deposit? (binary: “1” means “Yes”, “0” means “No”)
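A quick way to draft an attribute list like the one above is to let pandas summarize each column. The sketch below is an assumption about how one might do it, not part of the assignment: the helper name describe_attributes is hypothetical and the four rows are toy values that only borrow the column names from the example.

```python
import pandas as pd

# Toy rows using the example's column names; the values are made up.
df = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "job": ["admin", "technician", "retired", "student"],
    "y": [0, 1, 1, 0],
})


def describe_attributes(frame):
    """Build a small data dictionary: one row per column."""
    rows = []
    for name in frame.columns:
        col = frame[name]
        if col.nunique() == 2:
            kind = "binary"
        elif pd.api.types.is_numeric_dtype(col):
            kind = "numeric"
        else:
            kind = "categorical"
        rows.append({
            "attribute": name,
            "type": kind,
            "distinct_values": col.nunique(),
            "missing": int(col.isna().sum()),
        })
    return pd.DataFrame(rows)


summary = describe_attributes(df)
print(summary)
```

The output is a starting point only; the report should still explain in words what each attribute means.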
4. Data Exploration: Have at least 5 insights in your report
In this format there is a single Body section, usually called “Analysis”, and then there is a subsection for each question raised in the introduction, usually taken in the same order as in the introduction (general to specific, decreasing order of importance, etc.).
Within each subsection, statistical method, analyses, and conclusion would be described (for each question). For example:
Here’s a sample of an analysis
2.1 Question or Metric 1: For example: How many calories do people eat at Chipotle?
2.2 Question or Metric 2: For example: Which bands come to ACL Fest?
2.3 Question or Metric 3: For example: Exploring Austin food critics
percentage of no subscription is 88.73458288821988
percentage of subscription 11.265417111780131
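Percentages like the two above could be produced along these lines. This is a minimal sketch that assumes the target is stored as a 0/1 column named y; the ten values are toy data, so the figures it prints differ from the ones in the example.

```python
import pandas as pd

# Toy target column: 8 "no" (0) and 2 "yes" (1); not the real dataset.
y = pd.Series([0] * 8 + [1] * 2, name="y")

# Share of each class, expressed as a percentage of all rows.
percentages = y.value_counts(normalize=True) * 100
no_pct = percentages.get(0, 0.0)
yes_pct = percentages.get(1, 0.0)

print(f"percentage of no subscription is {no_pct:.2f}")
print(f"percentage of subscription is {yes_pct:.2f}")
```

Rounding to two decimals, as done here, is usually easier to read in a report than the full floating-point output.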
Our classes are imbalanced, and the ratio of no-subscription to subscription instances is 89:11. Before we go ahead to balance the classes, let’s do some more exploration.
1. The average age of customers who bought the term deposit is higher than that of the customers who didn’t.
2. The pdays (days since the customer was last contacted) is understandably lower for the customers who bought it. The lower the pdays, the better the memory of the last call and hence the better chances of a sale.
3. Surprisingly, campaigns (number of contacts or calls made during the current campaign) are lower for customers who bought the term deposit.
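Comparisons like the three above typically come from averaging each numeric column within each target class. The sketch below assumes columns named age, pdays, campaign, and y (the column name campaign is an assumption) and uses toy values chosen only to mirror the direction of the findings.

```python
import pandas as pd

# Toy data: three non-subscribers (y=0) and two subscribers (y=1).
df = pd.DataFrame({
    "age": [30, 40, 35, 50, 60],
    "pdays": [20, 30, 25, 6, 3],
    "campaign": [3, 4, 2, 1, 2],
    "y": [0, 0, 0, 1, 1],
})

# Mean of each numeric column, computed separately for each class of y.
group_means = df.groupby("y")[["age", "pdays", "campaign"]].mean()
print(group_means)
```

Each row of group_means is one class, so reading down a column shows whether subscribers sit higher or lower on that attribute.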
1. Other organizational formats are possible too. Whatever the format, it is useful to provide one or two well-chosen tables or graphs per question in the body of the report, for two reasons: First, graphical and tabular displays can convey your points more efficiently than words; and second, your “skimming” audiences will be more likely to have their eye caught by an interesting graph or table than by running text. However, too much graphical/tabular material will break up the flow of the text and become distracting; so extras should be moved to the Appendix.
2. Use the export function to attach snippets of your dashboard to your report
One or more appendices are the place to put details and ancillary materials. These might include such items as:
• Technical descriptions of (unusual) statistical procedures
• Detailed tables or computer output
• Figures that were not central to the arguments presented in the body of the report
• Computer code used to obtain results.
In all cases, and especially in the case of computer code, it is a good idea to add some text sentences as comments or annotations, to make it easier for the uninitiated reader to follow what you are doing. It is often difficult to find the right balance between what to put in the appendix and what to put in the body of the paper. Generally, you should put just enough in the body to make the point, and refer the reader to specific sections or page numbers in the appendix for additional graphs, tables and other details.
Amazon Prime TV Shows and Movies Research for marketing and Revenue Increase
Name of Student
Name of Instructor
Name of Institution
Amazon Prime TV Shows and Movies Research for marketing and Revenue Increase
The data we used for this project is titles.csv, which contains the titles of movies and TV shows on Amazon Prime and their performance on various metrics. The project seeks to answer the following questions: which movies and shows are the most popular on Amazon Prime, which films and TV shows have the highest scores and a high number of votes, and which production countries' movies and TV shows are more popular among viewers. The marketing team can use the answers to explore ways to convert online visitors and the public into subscribers.
The data was acquired from https://datasetsearch.research.google.com/ and supplied by Kaggle.com. The data was acquired in May 2022 and is available in the United States. The dataset has fifteen columns and 9,871 rows.
The attributes of the