One of the projects I am working on aims to characterize how women adjust psychologically to genetic testing results. The outcomes are fairly standard for health psychology and research with cancer patients; examples include depressive symptoms, positive affect (are you a happy person?), negative affect (are you a moody person?), and intrusive thoughts (are you bothered by thoughts or concerns about cancer?).
Outcomes are measured before receiving genetic testing results, after, and at several follow-up points. After characterizing adjustment trajectories, our second aim is to see what predicts these trajectories. In a way, this is a ‘pie-in-the-sky’ goal. Perhaps it would be more accurate to say our second aim is to examine whether each of a number of theoretically relevant psychological constructs predicts adjustment trajectories. By no means will we account for all individual variability.
Two factors that may be relevant are perceptions of cancer-related threat and control. This study has six items (two sets of three), one set intended to measure each construct. Because the study was longitudinal, we have these items at four different time points. However, to my knowledge these items have never been officially published. That is, they have been used in our lab and related labs, but they are not an official, public scale.
Most scales in psychology are scored by taking the mean or sum of participants’ responses to several questions; in this case, the mean or sum of the three items in each set. However, the validity of this approach relies on several assumptions. For example, if all the items measure the same underlying construct, they should all be positively related to each other. If they measure one and only one construct, all the inter-item correlations should be similar (because all similarity between items should be driven by a single underlying, or latent, factor).
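To make the scoring and the positive-inter-item-correlation assumption concrete, here is a minimal sketch on simulated data. The item names, sample size, and noise level are all made up for illustration; the real study's items are not public.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated stand-in data: three "threat" items driven by one shared
# latent factor (item names and parameters here are hypothetical).
n = 200
latent = rng.normal(size=n)
items = pd.DataFrame({
    f"threat{i}": latent + rng.normal(scale=0.8, size=n) for i in (1, 2, 3)
})

# The usual scale score: the mean of each participant's item responses.
items["threat_scale"] = items[["threat1", "threat2", "threat3"]].mean(axis=1)

# If the items tap a single construct, the inter-item correlations
# should all be positive and of similar magnitude.
inter_item = items[["threat1", "threat2", "threat3"]].corr()
print(inter_item.round(2))
```

Because all three simulated items share one latent factor, the off-diagonal correlations come out positive and roughly equal, which is the pattern a one-construct scale should show.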
That is enough to get us started, but if we want to use the same items in different groups of people or different contexts, it also becomes important that the inter-item correlations remain consistent across time or groups. If they do not, this implies that the construct being measured is shifting over time or that you are actually measuring different constructs.
Let’s make this more concrete. For example, suppose we asked people to rate this statement: “I feel I can control my cancer”. Perhaps some people count medical treatment as something they are doing to control cancer, while other people do not. In the former case, the item might correlate strongly with “I feel I have a competent medical team”, while in the latter case one might expect almost no correlation between perceived cancer control and perceived quality of the medical team. Thus, patterns of correlations, and whether or not they are consistent, can give us clues into whether different groups of participants, or the same participants at different times, are interpreting the questions in the same way. Does it guarantee that they are? No. However, it is a good indicator.
These are just a few examples of the logic behind psychological measurement. So, well into the post: what’s in a correlation matrix? I love data visualization, so rather than showing you a giant table, here is a heatmap of the correlation matrix, separated into blocks by time. The size of each dot indicates the proportion of data present; this is useful as a quick gauge of how reliable each correlation is.
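For readers who want to build something similar, here is a sketch of the two quantities such a heatmap encodes, on simulated data with missingness: the pairwise-complete correlations (cell color) and the proportion of participants observed on both items (dot size). The item names, sample size, and missingness rate are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Toy stand-in: six items (three "control", three "threat") at one
# time point, with some responses missing; item names are hypothetical.
n = 150
latent_c = rng.normal(size=n)
latent_t = rng.normal(size=n)
df = pd.DataFrame({
    **{f"ctl{i}": latent_c + rng.normal(scale=1.0, size=n) for i in (1, 2, 3)},
    **{f"thr{i}": latent_t + rng.normal(scale=1.0, size=n) for i in (1, 2, 3)},
})
df = df.mask(rng.random(df.shape) < 0.15)  # knock out ~15% of responses

# Color of each heatmap cell: pairwise-complete correlations
# (pandas drops the missing pairs cell by cell by default).
corr = df.corr()

# Size of each dot: proportion of participants with both items observed.
present = df.notna().astype(int)
prop_present = present.T @ present / len(df)

print(corr.round(2))
print(prop_present.round(2))
```

From here, any dot-matrix plotting routine (e.g., a matplotlib scatter over a grid) can map `corr` to color and `prop_present` to dot area.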
In each block are the six items measuring perceived control and threat. The blocks on the diagonal contain the correlations within a time point, and the off-diagonal blocks contain the correlations between time points. You can easily see that there are two sets of three items, and that the pattern holds up over time.
By plotting the correlations among all time points simultaneously, it is also possible to see the stability of each item (i.e., the within-item correlation over time, sometimes also called test-retest reliability). You can also see how similar or dissimilar the between-item correlations are over time. That is, are the patterns of correlations among items hypothesized to measure the same construct stable over time?
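Test-retest reliability itself reduces to a single correlation. A minimal sketch, again on invented data: one item measured on two occasions, driven by a stable person-level trait plus occasion-specific noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for test-retest reliability: one item measured on two
# occasions; the trait is stable, the noise is occasion-specific.
n = 200
trait = rng.normal(size=n)
item_t1 = trait + rng.normal(scale=0.7, size=n)
item_t2 = trait + rng.normal(scale=0.7, size=n)

# Stability is just the within-item correlation across occasions.
retest_r = np.corrcoef(item_t1, item_t2)[0, 1]
print(f"test-retest r = {retest_r:.2f}")
```

The more of the item's variance the stable trait accounts for relative to occasion noise, the closer this correlation sits to 1.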
Of course, this visual method is a rough first pass. Formal statistical analyses exist, such as confirmatory factor analysis, which tests whether the proposed factor structure is a good fit to the data (here, the data are the correlations, not the raw data points). It is also possible to use multiple-group confirmatory factor analysis to test not only the factor structure but also whether it is consistent over time. Essentially, you fit the same model at each time point and then test how similar or different the estimated values are across time points.
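A full multiple-group CFA is best left to a dedicated SEM package (lavaan in R, or semopy in Python). But the underlying idea, comparing the same correlation structure across time points, can be sketched descriptively in a few lines. Everything here is simulated: the same three hypothetical items at two time points, generated so the inter-item structure is identical across time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Toy stand-in: the same three items at two time points, generated so
# the true inter-item structure does not change across time.
n = 180

def simulate_timepoint():
    latent = rng.normal(size=n)
    return pd.DataFrame({
        f"item{i}": latent + rng.normal(scale=0.9, size=n) for i in (1, 2, 3)
    })

corr_t1 = simulate_timepoint().corr()
corr_t2 = simulate_timepoint().corr()

# A crude descriptive check of consistency: the largest element-wise
# difference between the two time points' correlation matrices.
max_diff = np.abs(corr_t1.values - corr_t2.values).max()
print(f"largest correlation difference across time: {max_diff:.2f}")
```

This is only a descriptive summary, not a test; the CFA machinery adds a formal model and a statistical comparison of the parameter estimates, which is what you would use for publication.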
To summarize, here are the rough conclusions I would draw from this graph.
- There are indeed two dimensions to these six items, and they are the hypothesized ones.
- Within a dimension, the correlations are moderate, suggesting the items are measuring something similar, but they are not so high that any item appears to be redundant.
- The two dimensions have only a small-to-modest negative correlation.
- Scores on each of the items are moderately to strongly correlated over time. The correlations over time do not decrease much, suggesting that the constructs being measured are relatively stable over time.
- The patterns within a dimension are fairly consistent over time. There is some fluctuation, but it is not dramatic. This is evidence in support of the notion that people interpret the questions similarly over time.
- One item, “prc7”, belongs to the second dimension but has a moderate, negative correlation with the first dimension. This is present at all time points, but it is strongest at the final time point.