INSIGHTS INTO DATA

 

GOALS

1.   Represent data graphically.

2.   Describe data numerically.

3.   Describe the relationship between two variables.

4.   Identify the degree of correlation between variables.

5.   Describe a linear relationship with an equation of a straight line.

6.   Design, conduct and analyze ways of gathering data: surveys, simulations, experiments.

7.   Use random samples in gathering data.

8.   Analyze representations of data.

9.   Draw conclusions based on given data sets and representations of data.

10.  Recognize possible bias in sample surveys.

11.  Determine whether representations of data (numerical and visual) are appropriate.

12.  Become aware of the questions that should be asked when analyzing data and representations of data.

 

SECTION SUMMARIES

 

SECTION A:  PATTERNS IN DATA

1.       Some conclusions you draw from a graph may be obvious, such as the presence of clusters of data or outliers.

2.       Other conclusions may be more complex, such as a description of a typical data point.

3.       Careful examination of a graph may raise new questions requiring more research to provide answers.

 

SECTION B:  SELECTING SAMPLES

1.       A population is a group of people or a set of objects you want to gather information about.

2.       When taking a sample, it’s important to do so randomly, so each member of the population has an equal chance of being selected.

3.       You can also collect data by designing and running an experiment or simulation.

4.       Sampling bias should be avoided.  Some possible causes of bias are choosing the sample incorrectly, neglecting to account for the people who do not respond, and letting interviewers select the people they want to interview.
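
Example (random sampling).  A minimal sketch of drawing a simple random sample with Python's standard random module.  The population of student ID numbers, the sample size, and the helper name simple_random_sample are assumptions made up for illustration; they are not part of the unit.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical population: student ID numbers 1 through 200.
population = list(range(1, 201))

# Pick 10 students to survey.
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Letting the random number generator choose, rather than an interviewer, is what guards against the selection bias described in point 4.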

 

SECTION C:  INTERPRETING GRAPHS

1.        Data must be reliable and presented appropriately in order to draw accurate conclusions.

2.        Pictorial graphs, line graphs, bar graphs, histograms, scatter plots, and box plots are only a few of the ways to present data.

3.        Data may be misrepresented if one or more of the following occurs:

a.       the graph’s axes are scaled improperly

b.       the origin is excluded from the graph

c.       three dimensional pictures are used inappropriately

d.       numbers that should not be compared are compared

e.       pictures that do not fit the numbers are used

 

SECTION D:  USING PLANT GROWTH DATA

1.       Mean, median, and mode each give a numerical description of the “typical” or “normal” value of a data set.

2.       A plot over time allows you to look for trends in the day-to-day change of the data.

3.       A histogram gives a general picture of the data, allowing you to see clusters and gaps in the data.

4.       A box plot summarizes the data with five values:  the minimum, the first quartile, the median, the third quartile, and the maximum.  It shows the big picture, but you lose the small details.
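
Example (describing data numerically).  A minimal sketch that computes the mean, median, and mode, plus the five values a box plot is drawn from.  The plant heights are made-up numbers.  The quartiles here are found as the medians of the lower and upper halves of the sorted data, which is one common classroom convention; calculators and software may use slightly different rules and give slightly different quartiles.

```python
from statistics import mean, median, multimode

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical plant heights in centimeters.
heights = [2, 3, 3, 4, 5, 6, 7, 8, 9]

print("mean:", round(mean(heights), 2))
print("median:", median(heights))
print("mode(s):", multimode(heights))
print("five-number summary:", five_number_summary(heights))
```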

 

SECTION E:  CORRELATING DATA

1.        Scatter plots show information about pairs of data and whether a relationship exists between them (look at the trend).

2.        Points close to forming a straight line show a strong correlation.  Scattered points show a weak correlation.

3.        A strong correlation does not mean there is a cause-effect relationship.  Other tests are needed to see whether a change in one variable (the independent variable, on the x-axis) causes a change in the other (the dependent variable, on the y-axis).
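
Example (degree of correlation).  One common way to put a number on how close points come to a straight line is the Pearson correlation coefficient r, which lies between -1 and 1.  This summary does not name a specific measure, so treating r as the measure is an assumption, and the data below are made up.  Values of r near 1 or -1 indicate a strong linear correlation; values near 0 indicate a weak one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys_strong = [2.1, 3.9, 6.2, 7.8, 10.1]  # points nearly on a line
ys_weak = [5, 1, 4, 2, 6]               # scattered points

print("strong:", round(pearson_r(xs, ys_strong), 3))
print("weak:", round(pearson_r(xs, ys_weak), 3))
```

Even an r very close to 1 only describes the pattern in the points; it does not show that one variable causes the other.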

SECTION F:  LINES THAT SUMMARIZE DATA

1.        If the relationship between two variables appears to be linear, a line can be found to describe it.

2.        Straight lines can be used to predict unknown values, check existing values, and compare different values.

3.        The slope of the line can be expressed in terms of the data, and can lead to statements such as ‘When the __________ increases by __________, the __________ increases or decreases by __________.’

4.        A best-fit line drawn by hand captures the trend of the data.  The least squares linear regression line minimizes the sum of the squared vertical distances from the points to the line, and it passes through the point formed by the means of the two variables.  The median-fit line uses the medians of groups of data points as representative points.  A curve is used to describe data whose rate of change is not constant, such as growth.
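
Example (least squares line).  A minimal sketch of finding the least squares regression line y = intercept + slope * x from paired data, using the standard formulas.  The day and height values are made up, and the function name least_squares_line is only illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: day number and plant height in centimeters.
days = [1, 2, 3, 4, 5]
heights = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(days, heights)
predicted_day_6 = intercept + slope * 6  # use the line to predict an unknown value

print(f"When the day increases by 1, the height increases by {slope:.1f} cm.")
print(f"Predicted height on day 6: {predicted_day_6:.1f} cm")
```

The first printed sentence is an instance of the fill-in statement in point 3, with the slope supplying the amount of change.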

 

 

 

MY NOTES   (you may want to make some notes here to remember how to generate random numbers, create a histogram, draw a box plot, or find the different lines of best fit)