LAB PROJECT #1
An understanding of the consumer decision-making process and of the patterns of consumer choice in supermarkets is important information for the management of supermarket chains around the country, for example Kroger, Walmart, and Food City, all of which operate food stores in the metropolitan area of Knoxville, Tennessee. Many other chains operate elsewhere around the country.
Consumer typologies for food have become more complex as customers take health considerations into account. This poses a new challenge for supermarket chains in keeping the most frequently purchased items in stock. Food preferences are also culturally bound and socially influenced.
Consider the SUPERMARKETS_DATA.xlsx data set, collected on n = 150 customers and p = 15 variables (or items). The variables are described as follows.
- x1 = Monthly amount spent (amspent)
- x2 = Meat expenditure (meat)
- x3 = Fish expenditure (fish)
- x4 = Vegetable expenditure (vegetabl)
- x5 = % spent on own-brand products (ownbrand)
- x6 = Owns a car (car)
- x7 = % spent on organic food (organic)
- x8 = Vegetarian (Vegitari)
- x9 = Household size (housesiz)
- x10 = Number of kids (kids)
- x11 = Weekly TV watching, in hours (TV)
- x12 = Weekly radio listening, in hours (Radio)
- x13 = Surfs the web (Web)
- x14 = Yearly household income (Income)
- x15 = Age of respondent (Age)
QUESTIONS:
(a) Identify the type of each of the above variables and classify the variables accordingly. Is this a mixed-type data set? Why? Explain briefly.
(b) Consider only the continuous variables in this data set and carry out an exploratory data analysis to investigate their distributional structure visually. Construct their histograms. What do you observe? Do the distributions of these variables follow a normal curve?
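One possible way to approach the histogram comparison is sketched below. It is only an illustration: the sample `x` is synthetic (a hypothetical stand-in for one continuous column, not the real data), and overlaying a fitted normal curve together with sample skewness and kurtosis is just one of several reasonable ways to judge normality. The functions used require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for one continuous column (e.g., amspent).
% Replace x with the real column read from SUPERMARKETS_DATA.xlsx.
rng('default')                           % for reproducibility
x = lognrnd(5, 0.5, 150, 1);             % synthetic right-skewed sample, n = 150

figure;
histogram(x, 'Normalization', 'pdf');    % histogram scaled to unit area
hold on
grid_x = linspace(min(x), max(x), 200);
% Normal curve using the sample mean and standard deviation
plot(grid_x, normpdf(grid_x, mean(x), std(x)), 'LineWidth', 1.5);
hold off
xlabel('x'); ylabel('Density');
legend('Histogram', 'Normal curve');

sk = skewness(x);                        % close to 0 for symmetric data
ku = kurtosis(x);                        % close to 3 for normal data
fprintf('skewness = %.3f, kurtosis = %.3f\n', sk, ku);
```

A histogram that departs clearly from the overlaid curve, or skewness and kurtosis far from 0 and 3, suggests the normal curve is a poor description of that variable.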
(c) Use the following MATLAB commands to obtain the bandwidth (the smoothing parameter) of the kernel density estimate for each of the continuous variables.
rng('default') % for reproducibility
% Input data from an Excel file
SupData = xlsread('SUPERMARKETS_DATA.xlsx');
amspent = SupData(:,3); % You can change this to read the other continuous variables
[f,xi,bw] = ksdensity(amspent);
figure;
plot(xi,f)
bw % print the bandwidth
Provide the plot of the density estimate for each of the continuous variables, along with its bandwidth. What do you observe?
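To see what the bandwidth controls, the following sketch compares the default `ksdensity` bandwidth with a halved and a doubled value. The sample `x` is synthetic (a hypothetical stand-in for a real column), and the factors 1/2 and 2 are arbitrary choices made purely for illustration.

```matlab
% Hypothetical stand-in for one continuous column; replace with real data.
rng('default')                                   % for reproducibility
x = lognrnd(5, 0.5, 150, 1);                     % synthetic sample, n = 150

[f, xi, bw] = ksdensity(x);                      % default bandwidth
f_narrow = ksdensity(x, xi, 'Bandwidth', bw/2);  % less smoothing
f_wide   = ksdensity(x, xi, 'Bandwidth', 2*bw);  % more smoothing

figure;
plot(xi, f, '-', xi, f_narrow, '--', xi, f_wide, ':', 'LineWidth', 1.2);
legend(sprintf('default bw = %.2f', bw), 'bw / 2', '2 * bw');
xlabel('x'); ylabel('Estimated density');
```

A smaller bandwidth follows the sample more closely and produces a bumpier curve, while a larger one smooths away detail. The bandwidth is in the units of the variable, so values are not directly comparable across variables measured on different scales.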
(d) If you were to assume a probability distribution for each of the continuous variables, estimate the parameters of that distribution (e.g., the mean and standard deviation) for each variable and summarize your results in a table. Use the MATLAB mle (maximum likelihood estimation) command, shown here for the Monthly Amount Spent (amspent) variable under several candidate distributions.
[PHAT, PCI] = mle(amspent,'distribution','normal');
[PHAT, PCI] = mle(amspent,'distribution','lognormal');
[PHAT, PCI] = mle(amspent,'distribution','exponential');
[PHAT, PCI] = mle(amspent,'distribution','gamma');
[PHAT, PCI] = mle(amspent,'distribution','weibull');
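Because each call above overwrites PHAT and PCI, it can help to loop over the candidate distributions and keep every result. The sketch below is one way to do that. The sample `x` is synthetic (a hypothetical stand-in for amspent), and the cell arrays `est` and `ci` are illustrative names, not part of the assignment.

```matlab
% Hypothetical stand-in for amspent; replace with the real column.
rng('default')                         % for reproducibility
x = lognrnd(5, 0.5, 150, 1);           % synthetic positive-valued sample

names = {'normal', 'lognormal', 'exponential', 'gamma', 'weibull'};
est = cell(size(names));               % parameter estimates per distribution
ci  = cell(size(names));               % 95% confidence intervals (2-by-k)

for k = 1:numel(names)
    [est{k}, ci{k}] = mle(x, 'distribution', names{k});
    fprintf('%-12s estimates: %s\n', names{k}, mat2str(est{k}, 5));
    fprintf('%-12s CI lower:  %s\n', '', mat2str(ci{k}(1, :), 5));
    fprintf('%-12s CI upper:  %s\n', '', mat2str(ci{k}(2, :), 5));
end
```

The number of parameters differs by distribution: the exponential has a single parameter (its mean), while the others have two. Note also that the lognormal, exponential, gamma, and Weibull fits require strictly positive data.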
(e) Consider the Monthly Amount Spent (amspent) as your response variable and construct a new ANOVA data structure with the different supermarkets as your groups. Provide this data in Excel format and carry out your ANOVA test of the null hypothesis that the mean monthly amount spent is equal across the k supermarkets,
H0: μ1 = μ2 = ... = μk versus H1: at least one mean differs,
at α (alpha) = 5%. Do you reject or fail to reject the null hypothesis? Construct a boxplot to show graphically which supermarkets are similar in terms of revenue earned.
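A one-way ANOVA with a grouped boxplot could be set up along the following lines. Everything about the data here is an assumption made for illustration: the three numeric group labels, the group sizes, and the synthetic response values do not come from the real data set.

```matlab
% Synthetic illustration only: 3 hypothetical supermarkets, 50 customers each.
rng('default')                          % for reproducibility
g = repelem((1:3)', 50);                % group label (1, 2, or 3) per customer
y = 300 + 20*g + 40*randn(150, 1);      % synthetic monthly amount spent

% One-way ANOVA; 'off' suppresses the default table and figure windows
[p, tbl, stats] = anova1(y, g, 'off');
alpha = 0.05;
reject = p < alpha;                     % true means reject equal group means
fprintf('p-value = %.4g, reject H0 at 5%%: %d\n', p, reject);

figure;
boxplot(y, g);                          % one box per supermarket
xlabel('Supermarket (hypothetical label)');
ylabel('Monthly amount spent');
```

If the null hypothesis is rejected, `multcompare(stats)` can be used as a follow-up to see which pairs of groups differ.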
(f) What can you say about the scales of these variables? Construct bar charts for the categorical variables.
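For a categorical variable, a bar chart of category counts can be produced roughly as follows. The 0/1 coding and the labels 'No' and 'Yes' are assumptions for illustration, and the values are synthetic. Check how each categorical variable is actually coded in the spreadsheet before reusing this.

```matlab
% Synthetic stand-in for a binary variable such as car ownership,
% assuming a hypothetical 0 = No, 1 = Yes coding.
rng('default')                              % for reproducibility
car = double(rand(150, 1) < 0.7);           % synthetic 0/1 values, n = 150

c = categorical(car, [0 1], {'No', 'Yes'}); % attach readable labels
counts = countcats(c);                      % number of customers per category

figure;
bar(counts);
set(gca, 'XTickLabel', categories(c));
xlabel('Owns a car'); ylabel('Number of customers');
```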
NOTE: Please do not distribute this data set.