Using Canonical Correlation for Index Construction with Aggregated Data

Thursday, July 17, 2014: 8:45 AM
Room: 416
Oral Presentation
Saskia Maria FUCHS, Social Science, Kiel University, Kiel, Germany
Peter GRAEFF, Institute of Social Sciences, Christian-Albrechts University Kiel, Kiel, Germany
Indices constructed from aggregated data are frequently used in macro-level or multilevel studies. There are several statistical approaches to constructing such macro indices (as used by World Bank econometricians or by psychometricians who deal with cross-country research questions).

For measuring multi-faceted phenomena at the macro level (such as happiness, corruption or freedom), one would consider indices based on sub components that refer to the same phenomenon. Specifically, it seems important that the sub components measure different aspects of the same phenomenon but ideally nothing else (convergent validity). Statistical methods applied for index construction usually take convergent validity into account. This implies that most macro indices are constructed with high reliability. To preserve validity fully, one should also consider how the measurement differs from that of other constructs or phenomena (discriminant validity). This is typically not considered in econometric approaches to index construction. It is also only seldom addressed in psychometric approaches to index construction.

We suggest canonical correlation as a procedure for macro index construction because it allows for both reliability and validity, for example by safeguarding discriminant validity.

We show the advantages of canonical correlation for macro index construction by referring to the seldom scrutinized social phenomenon of personal freedom. One of the most common applications of canonical correlation analysis is the reduction of dimensionality (Anderson 1984), which is a crucial matter in index construction when sub components are also concerned.

The construction process is compared to factor analysis results (e.g. the role of eigenvalues in both procedures, Burt 1948). It is shown that the two methods do not necessarily yield the same results. In sum, canonical correlation is a more flexible tool that comes with stricter assumptions but also with a clearer concept of what should be excluded from the index.
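
The role of eigenvalues in the two procedures can be illustrated with a minimal sketch. It is not the authors' procedure: the data are simulated, the names `canonical_correlations`, `X`, `Y` and `latent` are hypothetical, and the factor-analytic side is approximated by the eigenvalues of the correlation matrix of one indicator block (as in principal component extraction). The squared canonical correlations are taken as the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, and the first canonical variate of `X` serves as a candidate index.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return canonical correlations and X-side weights for two indicator blocks."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    k = min(X.shape[1], Y.shape[1])
    rho = np.sqrt(np.clip(eigvals.real[order][:k], 0.0, 1.0))
    weights = eigvecs.real[:, order][:, :k]
    return rho, weights


# Simulated macro data (assumption): 200 units, one shared latent phenomenon.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(3)])
Y = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(2)])

rho, weights = canonical_correlations(X, Y)

# Candidate index: first canonical variate of the X block.
index = (X - X.mean(axis=0)) @ weights[:, 0]

# Comparison: eigenvalues of the correlation matrix of X alone.
factor_eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print("canonical correlations:", rho)
print("correlation-matrix eigenvalues of X:", factor_eigvals)
```

The two sets of eigenvalues answer different questions: those of the correlation matrix describe variance shared within one block of sub components, while the canonical ones describe association between two blocks. This is one way to see why the resulting indices need not coincide.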