Project #1: Data Summary
The data stored in the file nc2005birth1000.MTW are a random sample of 1000 birth records taken by the North Carolina State Center for Health and Environmental Statistics in 2005. Of particular interest will be incidents of low infant birth weight. Low birth weight is defined as less than 2500 grams. Over the course of the semester we will investigate the relationship of several variables with low birth weight and with each other. The goal of this assignment will be to summarize the variables in this data set both graphically and numerically. The variables in this study are:
Variable Label | Description |
PLURAL | Number of children born of the pregnancy |
SEX | Sex of child (1=Male, 2=Female) |
FAGE | Age of father (years) |
MAGE | Age of mother (years) |
WEEKS | Completed Weeks of Gestation (weeks) |
VISITS | Number of prenatal visits |
MARITAL | Marital status (1=married, 2=not married) |
RACEMOM | Race of Mother (0=Other Non-white, 1=White, 2=Black, 3=American Indian, 4=Chinese, 5=Japanese, 6=Hawaiian, 7=Filipino, 8=Other Asian or Pacific Islander) |
HISPMOM | Mother of Hispanic origin (C=Cuban, M=Mexican, N=Non-Hispanic, O=Other and Unknown Hispanic, P=Puerto Rican, S=Central/South American, U=Not Classifiable) |
GAINED | Weight gained during pregnancy (pounds) |
LOWBW | 0=infant was not low birth weight, 1=infant was low birth weight |
TPOUNDS | Weight of child (pounds) |
SMOKE | 0=mother did not smoke during pregnancy, 1=mother did smoke during pregnancy |
MATURE | 0=mother is age 34 or younger, 1=mother is 35 or older |
PREMIE | 0=infant was not premature, 1=infant was premature (premature defined as 36 weeks or sooner) |
The goal of this assignment is to obtain summary statistics for the variables in the data set. This is an important activity of most statistical studies. In your report, clearly label all tables and when appropriate give the units of measure. The components of the assignment are given below. Be sure your presentation is clear and organized. The use of tables is required.
- Provide the appropriate numerical summary for each of the variables. This entails determining if you will create a frequency table or a table with the mean, standard deviation, min, Q1, median, Q3, and maximum (7 number summary).
- Create a histogram for the variables FAGE, MAGE, WEEKS, GAINED, and TPOUNDS. Describe the shape of the distribution. Is the mean or the median a better measure of center for each of these variables?
- Construct side-by-side boxplots for the variable of TPOUNDS for the two groups of smokers and nonsmokers. Interpret your graph. Does the boxplot indicate a difference in the distribution of baby weight for smoking and nonsmoking mothers? Calculate the 7 number summary of TPOUNDS for the smoking and nonsmoking mothers. What do these statistics indicate?
- Suppose a friend of yours has given birth to a 10.3 pound baby. Would you consider this baby “heavy”? Why or why not? In writing explain your reasoning. Another friend of yours had a baby and gained approximately 30 pounds during her pregnancy. Explain to her why she should not be too depressed over this occurrence. Comment on the percentage of women who reported smoking during pregnancy and the reliability of the responses.
- Lastly, propose three other variables you would like to investigate in regard to weight of the infant. Give three explicit questions you would ask the mother prior to delivery and explain why you want to know that information.
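The two kinds of numerical summary named in the first component (a frequency table for a categorical variable, and the 7 number summary for a numerical one) can be sketched in Python as a cross-check on the Minitab output. This is only an illustration under stated assumptions: the function name `seven_number_summary` and the toy values are hypothetical and are not taken from nc2005birth1000.MTW, and the quartiles use Python's default "exclusive" method, which may or may not match your software's quartile rule exactly.

```python
import statistics
from collections import Counter


def seven_number_summary(values):
    """Return mean, standard deviation, min, Q1, median, Q3, and max."""
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3)
    # using the default "exclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Hypothetical birth weights in pounds (toy values, not the real data).
toy_weights = [5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
summary = seven_number_summary(toy_weights)

# Hypothetical SEX codes (1=Male, 2=Female): a categorical variable gets
# a frequency table instead of a mean and standard deviation.
toy_sex = [1, 2, 2, 1, 2]
frequency = Counter(toy_sex)

print(summary)
print(dict(frequency))
```

Coded variables such as SEX, MARITAL, or SMOKE are stored as numbers, but the numbers are only labels, so a frequency table is the sensible summary for them.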
This is what I have so far, and I do not know how to word it.
Shardena Thompson
Question 1
Categorical
Numerical
Question 2
5 Histograms
Fage
The histograms for FAGE, MAGE, and GAINED are skewed to the right, with outliers on the high end. The median is the better measure of center for these variables because the mean is pulled toward the outliers and does not represent the data accurately. The histograms for WEEKS and TPOUNDS are skewed to the left, with outliers on the low end, so the median is again the better measure of center for the same reason.
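The claim that the mean gets "pulled" by a skewed tail while the median does not can be checked with a small example. The ages below are hypothetical toy values chosen to have one large value on the right; they are not from the data set.

```python
import statistics

# Hypothetical ages in years (toy values): most cluster in the 20s,
# with a single large value forming a right tail.
ages = [22, 24, 25, 26, 27, 28, 58]

mean_age = statistics.mean(ages)      # pulled toward the large value
median_age = statistics.median(ages)  # stays at the middle observation

print(mean_age, median_age)
```

Here the mean lands above six of the seven values, while the median still sits in the middle of the cluster, which is the reasoning behind preferring the median for skewed variables.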
Question 3
The boxplots look similar to each other. However, the nonsmokers have more outliers on the low end of the TPOUNDS scale, ranging from about 1 to 4 pounds, than the smokers, whose low outliers range from about 1 to 3 pounds. The nonsmokers also have some outliers on the high end of the TPOUNDS scale, whereas the smokers have none. Both distributions are skewed to the left, but the nonsmokers show a wider spread than the smokers. Because of the left skew and the outliers, the median is the better measure of center; the mean is pulled toward the outliers and does not represent the data accurately.
Question 4
(10.3 - 7.0997)/1.5089 = 2.1209
I would not consider this baby heavy. The baby's weight is about 2.12 SD above the mean, which is not more than 3 SD away from the mean.
(30 - 30.326)/14.241 = -0.023
The woman's weight gain is not even close to being 3 SD above the mean. It is slightly below the mean, so there is no reason to be depressed at all. The percentage of women who reported not smoking is 87.39%. The data may be unreliable because of response bias.
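The two z-score calculations above can be reproduced with a short Python sketch. The means and standard deviations are the ones quoted in this answer; the helper name `z_score` is just an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


# Baby weight: 10.3 pounds, using the TPOUNDS mean and SD quoted above.
baby_z = z_score(10.3, 7.0997, 1.5089)

# Weight gained: 30 pounds, using the GAINED mean and SD quoted above.
gain_z = z_score(30, 30.326, 14.241)

print(round(baby_z, 4), round(gain_z, 3))
```

A positive z-score means the value is above the mean and a negative one means it is below, so the sign alone already shows that a 30 pound gain is slightly under average.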
Question 5
- Were you taking any drugs, or were you exposed to any?
- What was your weight prior to pregnancy?
- Were you exposed to radiation or any infection?