
SCM 200 Project 1

The Lending Club is a lending company based in San Francisco, CA. It connects borrowers with investors through an online marketplace and has made its loan data from 2007-2011 publicly available. I have put the data under Project files, and it is also available here (you DO NOT HAVE TO CLICK ON THIS LINK): https://www.lendingclub.com/info/download-data.act... Please review the data together with its data dictionary (a separate Excel sheet that explains what each variable means). You just got an interview as an analyst for the Lending Club, and the client wants you to analyze this large amount of information.

1. Start by making initial observations of the data. What types of variables are present? Is there anything that catches your eye? A good analyst checks the data carefully. See the Quartz Guide to Bad Data on our slides.

2. Use at least two ways to summarize the qualitative data in the data set, using frequency distributions and the graphs/charts we have used in class for Chapter 2.

3. Do the same with the quantitative data. These four summaries (two qualitative, two quantitative) should cover different aspects of the data set. Interpret your results.

4. Pick two of the graphs you chose above and describe the shapes of their distributions.

5. Why did you choose the graphs you used? Do they have any benefits over the alternatives?

6. Now I want you to take two variables you think might be related. Create a scatterplot, find the covariance and correlation, and interpret the results.

7. For the two examples you chose in Step 4, give me the measure of central tendency you feel best fits each data set. Then find their sample variances.

8. Create a box plot for me for one of the examples.

9. Depending on the distribution you get for Step 8, tell me the limits within which the observations lie at 2 standard deviations from the mean (mean ± 2 standard deviations). What does this mean in relation to the variable?

Finally, give me a summary of what you have discovered from this data set as a whole. You want the Lending Club to know that you are very interested in working with them, so give them something to think about. Due October 12th. Submit on Canvas. Send me all of your work, whether done in Excel or any other tool, in one document, including the formulas/code used. DO NOT handwrite the calculations. We will discuss tools in class.
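
Because the project asks for the formulas and code used, here is a minimal Python sketch of the calculations in steps 2 and 6 through 9, using only the standard library. The function names are hypothetical helpers and the lists hold made-up toy values, not Lending Club data; in practice each list would hold one column, such as loan_amnt or annual_inc, read from the data file.

```python
import math
import statistics
from collections import Counter


def frequency_table(labels):
    """Count and relative frequency of each category, most common first (step 2)."""
    counts = Counter(labels)
    n = len(labels)
    return {label: (count, count / n) for label, count in counts.most_common()}


def sample_covariance(x, y):
    """s_xy = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)  (step 6)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson r = s_xy / (s_x * s_y), always between -1 and 1 (step 6)."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


def box_plot_stats(x):
    """Five-number summary plus 1.5 * IQR outlier fences (step 8)."""
    q1, med, q3 = statistics.quantiles(x, n=4)  # Python's default 'exclusive' method
    iqr = q3 - q1
    return {
        "min": min(x), "q1": q1, "median": med, "q3": q3, "max": max(x),
        "lower_fence": q1 - 1.5 * iqr, "upper_fence": q3 + 1.5 * iqr,
    }


def two_sd_interval(x):
    """Limits mean ± 2 sample SDs and the share of observations inside them (step 9)."""
    m, s = statistics.mean(x), statistics.stdev(x)
    low, high = m - 2 * s, m + 2 * s
    share_inside = sum(low <= v <= high for v in x) / len(x)
    return low, high, share_inside


# Hypothetical toy values for illustration only, not taken from the Lending Club file.
grades = ["A", "B", "B", "C", "A", "B", "D", "C"]
loan_k = [2, 4, 4, 4, 5, 5, 7, 9]            # loan amounts, $ thousands
income_k = [24, 30, 35, 40, 48, 52, 60, 75]  # annual incomes, $ thousands

print("grade frequencies:", frequency_table(grades))
print("mean:", statistics.mean(loan_k), "median:", statistics.median(loan_k))
print("sample variance:", statistics.variance(loan_k))
print("covariance:", sample_covariance(loan_k, income_k))
print("correlation:", correlation(loan_k, income_k))
print("box plot:", box_plot_stats(loan_k))
print("mean ± 2 SD:", two_sd_interval(loan_k))
```

Printing both the mean and the median helps with step 7: if a variable is strongly right-skewed, as income-like amounts often are, the median is usually the more representative center. For step 9, roughly 95% of observations fall within mean ± 2 SD when the distribution is approximately bell-shaped (the empirical rule), while Chebyshev's theorem guarantees at least 75% for any distribution. Quartile conventions differ between tools, so Excel's quartile functions may not reproduce these figures exactly.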

SCM 200 Project 1

Supply Chain Management

The Lending Club

Initial Observations of the Data

The Lending Club data has 42,535 rows (observations) and about 100 columns describing the features of each observation. The data can be used to assess whether to grant a loan to a member given the member's characteristics, and a predictive model could be built from it.
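
Counting rows and columns is the first check behind the figures above. A minimal Python sketch using the standard `csv` module follows; `table_shape` is a hypothetical helper, and it assumes the download may contain non-data lines (a title line before the header, notes after the table), so `skip_lines` and the field-count filter are assumptions to verify against the actual file.

```python
import csv
import io


def table_shape(f, skip_lines=0):
    """Return (observations, variables) for a CSV file object.

    skip_lines: number of non-data lines before the header row (an assumption to
    adjust for the actual download). Blank rows and rows whose field count differs
    from the header, such as free-text notes, are not counted. Rows with missing
    values still have every field (as empty strings), so they are counted.
    """
    for _ in range(skip_lines):
        f.readline()
    reader = csv.reader(f)
    header = next(reader)
    n_obs = sum(1 for row in reader if len(row) == len(header))
    return n_obs, len(header)


# Hypothetical toy extract standing in for the real file (made-up values).
SAMPLE = """Title line that is not part of the table
id,loan_amnt,funded_amnt,funded_amnt_inv,annual_inc
1,10000,10000,9975,50000
2,3000,3000,3000,36000

3,7500,7000,7000,62000
Footer note
"""

print(table_shape(io.StringIO(SAMPLE, newline=""), skip_lines=1))  # (3, 5)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object; `path`, the encoding, and the number of lines to skip are assumptions that depend on the download.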

The data contain both numeric and categorical variables. Numeric variables include loan_amnt, funded_amnt, funded_amnt_inv, and annual_inc, which record loan amounts (applied for, funded, and funded by investors) and the borrower's annual income. Some data points are missing, possibly because of gaps in data entry. The missing values have been left blank rather than filled with substitute values, and it is hard to infer from the remaining data what they should be. There are no duplicate observations, and the date features are consistently formatted for ease of analysis. The units of the features are well defined; for example, interest rates are given as percentages, and features expressed as rates are well documented. The categorical variables are well represented, with labels that clearly describe what they mean.
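
The checks described above (missing values, duplicate observations, numeric versus categorical variables) can be reproduced with code rather than by eye. The minimal Python sketch below uses only the standard library and rests on stated assumptions: a cell counts as missing when it is blank, and a column counts as numeric when every non-blank value parses as a number once a trailing % is removed. `profile`, `is_number`, and the toy rows are hypothetical illustrations, not Lending Club data; the toy rows include one deliberate duplicate so the check has something to find.

```python
import csv
import io
from collections import Counter


def is_number(value):
    """True if the text parses as a number, allowing a trailing % (e.g. '10.65%')."""
    try:
        float(value.strip().rstrip("%"))
        return True
    except ValueError:
        return False


def profile(f):
    """Per-column missing counts and inferred type, plus the number of duplicate rows."""
    reader = csv.DictReader(f)
    columns = reader.fieldnames or []
    # Treat blank cells, and cells absent from short rows (None), as missing.
    rows = [[(r.get(c) or "").strip() for c in columns] for r in reader]
    report = {}
    for i, col in enumerate(columns):
        present = [row[i] for row in rows if row[i] != ""]
        kind = "numeric" if present and all(is_number(v) for v in present) else "categorical"
        report[col] = {"missing": len(rows) - len(present), "type": kind}
    # A duplicate is any row identical to an earlier one across every column.
    counts = Counter(tuple(row) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return report, duplicates


# Hypothetical toy rows (made-up values); the last row deliberately repeats the third.
SAMPLE = """id,loan_amnt,int_rate,term,annual_inc
1,10000,10.65%,36 months,50000
2,3000,15.27%,60 months,
3,7500,13.49%,36 months,62000
3,7500,13.49%,36 months,62000
"""

report, duplicates = profile(io.StringIO(SAMPLE, newline=""))
for col, info in report.items():
    print(f"{col}: {info['type']}, {info['missing']} missing")
print("duplicate rows:", duplicates)
```

The type inference is only a first pass: an identifier such as `id` parses as numeric here but should be treated as a label, so each column's role is best confirmed against the data dictionary.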

