1. Introduction
This is the first post in a two-part series about using quantitative methods to develop typologies. The first part deals with Principal Components Analysis; the second will be about Clustering. I use the topic of development cooperation as a practical example.
Typologies are analytical tools used to build concepts and classify observations according to the attributes they exhibit across a range of variables. Research on development cooperation uses typologies of aid donors to describe similarities and differences in the way states and other actors provide assistance to developing countries. In the professional discourse, we find labels such as "Scandinavian donors", "Arab donors", and more. A major distinction is also drawn between the donors of the OECD's Development Assistance Committee (DAC) and providers of South-South Cooperation. These categories are usually not derived from statistical methods; they are the product of qualitative comparisons and a historically grown, domain-specific vocabulary.
I do not present a full overview of the typologies and concepts used in the academic literature on development cooperation. Instead, I focus on showing how quantitative methods can inform typologies. Most importantly, I do not argue that running some "fancy" quantitative method is sufficient on its own to develop a robust typology. The methods I present here can only complement qualitative research guided by domain expertise.
In the first part of this project, I apply Principal Components Analysis (PCA) to a data set about aid donors. The main idea behind PCA is to reduce the dimensionality of a data set while retaining most of the information (i.e. the variation) in the original data. This is achieved by creating new variables, the principal components (PCs), as linear combinations of the original ones. The resulting PCs are uncorrelated and ordered so that the first few contain most of the variation present in the original variables.
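A minimal sketch of the idea, using NumPy on randomly generated stand-in data (not the OECD data, and not the code from the notebook): after standardising, a singular value decomposition yields the PCs; their variances (the eigenvalues) come out in descending order, and the covariance matrix of the PC scores is diagonal, i.e. the PCs are uncorrelated. The helper name `pca` and the data-generating step are illustrative assumptions.

```python
import numpy as np

def pca(X):
    """PCA on standardised data via SVD.

    Returns the eigenvalues (variance of each PC), the eigenvectors
    (one column per PC) and the scores (observations on the PCs).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per column
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = S ** 2 / Z.shape[0]           # SVD returns these in descending order
    scores = Z @ Vt.T                           # new variables: linear combinations of the old ones
    return eigenvalues, Vt.T, scores

# Hypothetical stand-in data: 47 "donors" x 27 correlated "variables"
rng = np.random.default_rng(0)
latent = rng.normal(size=(47, 3))
X = latent @ rng.normal(size=(3, 27)) + 0.5 * rng.normal(size=(47, 27))

eigenvalues, components, scores = pca(X)
cov_scores = np.cov(scores, rowvar=False, bias=True)

print(np.round(eigenvalues[:5], 2))                    # variance of the first five PCs
print(np.allclose(cov_scores, np.diag(eigenvalues)))   # True: the PCs are uncorrelated
```

Because the data are standardised, the eigenvalues sum to the number of original variables (27 here), which is what makes "share of explained variance" a natural yardstick later on.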
How can PCA help us with developing a typology? For starters, we can use lower-dimensional data to visualise aid donors in two- or three-dimensional space. This would allow us to visually identify groups of donors that are closer together than others. A meaningful interpretation of the resulting lower-dimensional space, however, is not guaranteed. The quality of representation depends on how much of the original information is retained in the first few dimensions of the transformed data set. Let’s find out how far PCA can take us in developing a typology of aid donors.
If you want to follow the specific steps of the analysis presented in this post, I have put a Jupyter notebook in my blog repository on GitHub.
2. The data
I use data from the OECD Development Assistance Committee (DAC). The data set I’ve built has a few limitations. First, it focuses on DAC members, although it also includes a group of "Participants" (non-member countries that cooperate closely with the DAC) and some other countries that share data with the DAC. It therefore excludes some of the big players that do not report data to the DAC, such as China, India and Brazil. Second, I focus on Official Development Assistance (ODA). ODA is an important source of development finance for the poorest developing countries, but it is not the only source and, for many developing countries, not the most important one. Finally, I only include bilateral official donors (i.e. not private foundations or the EU institutions).
Using 2020 data from the OECD’s statistics website, I create variables based on the following criteria:
- ODA as share of Gross National Income (GNI)
- the share of a donor’s ODA going to different regions of the world, and the degree of regional focus
- poverty focus (i.e. share of ODA going to the poorest countries)
- the share of ODA going to specific groups of countries, such as fragile states
- the share of ODA going to different sectors (e.g. health, education, agriculture, humanitarian aid)
- the share of ODA by type of aid (e.g. project-type interventions, technical assistance, budget support)
- the share of ODA by delivery channel (e.g. public sector, NGOs, multilateral organisations)
- the share of multilateral aid in total ODA and the share of earmarked funding to multilateral organisations
- the share of Country-Programmable Aid (CPA) in total bilateral ODA
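Most of these variables are shares of a donor's total. A minimal sketch of that conversion, assuming a hypothetical table of absolute ODA amounts by donor and sector (the donor names, sectors and figures are made up, and the denominators used in the notebook, which may differ by variable, are not reproduced here):

```python
import numpy as np

# Hypothetical example: ODA in USD million by donor (rows) and sector (columns)
sectors = ["health", "education", "agriculture", "humanitarian"]
donors = ["Donor A", "Donor B", "Donor C"]
amounts = np.array([
    [120.0, 80.0, 40.0, 60.0],
    [10.0, 5.0, 0.0, 35.0],
    [300.0, 150.0, 150.0, 0.0],
])

# Share of each donor's ODA going to each sector, in percent of that donor's total
shares = 100 * amounts / amounts.sum(axis=1, keepdims=True)

for donor, row in zip(donors, shares):
    print(donor, dict(zip(sectors, np.round(row, 1))))
```

Working with shares rather than absolute amounts keeps large and small donors comparable, so the PCA picks up differences in how donors allocate aid rather than how much they spend.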
Chart 1 provides a first look at some of the data. The 47 donors depicted in chart 1 belong to the groups of DAC members, Participants, and other countries (names appear as in the original data). They spend different absolute amounts of ODA (note that the x-axis is log10-transformed) and different shares of their respective GNI (y-axis). The United Nations target is for rich countries to spend 0.7 percent of their GNI on foreign aid. The chart shows four "outliers" that considerably surpass this target with ODA above 1 percent of GNI (Turkey, Sweden, Norway and Luxembourg); across all donors, the median ODA share of GNI is just over 0.25 percent. For a more detailed overview of the other variables, I refer to the exploratory section of the notebook.
Source: OECD Stats.
You can hover over the charts presented in this post to get additional information or zoom into them for a more detailed view.
3. PCA: Gaining a three-dimensional view of aid donors
For the first round of PCA, I use a data set with 47 rows (donors) and 27 columns (variables). I standardise the data (zero mean and unit variance for each variable) before running the PCA. Chart 2 shows the resulting 3D scatter plot of the first three PCs. You can rotate the plot and change the perspective by clicking and dragging with the mouse or touchpad.
Own work based on OECD Stats
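Standardisation matters because PCA is driven by variance: without scaling, a variable with a wide numeric range dominates the first PC regardless of how informative it is. The sketch below illustrates this with three made-up variables loosely modelled on the post's criteria (randomly generated and independent of each other, not the OECD data); the helper `first_pc` is hypothetical, and dividing by the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np

def first_pc(X, standardise):
    """Return (eigenvector of PC1, share of total variance explained by PC1)."""
    Z = X - X.mean(axis=0)
    if standardise:
        Z = Z / Z.std(axis=0)            # unit variance per column (ddof=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0], S[0] ** 2 / np.sum(S ** 2)

rng = np.random.default_rng(1)
n = 47
# Hypothetical stand-in variables, all in percent but on very different ranges
oda_gni = rng.uniform(0.05, 1.2, n)           # ODA as % of GNI
health_share = rng.uniform(0, 30, n)          # % of ODA going to health
multilateral_share = rng.uniform(10, 90, n)   # % of ODA that is multilateral
X = np.column_stack([oda_gni, health_share, multilateral_share])

raw_loadings, raw_ratio = first_pc(X, standardise=False)
std_loadings, std_ratio = first_pc(X, standardise=True)
print("unscaled PC1:", np.round(np.abs(raw_loadings), 3), "explained:", round(raw_ratio, 2))
print("scaled PC1:  ", np.round(np.abs(std_loadings), 3), "explained:", round(std_ratio, 2))
```

Unscaled, PC1 is essentially the multilateral-share column because it has by far the largest variance; after scaling, each variable enters with unit variance and no variable dominates merely because of its range.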
Chart 2 gives us an interesting perspective on the donor landscape. We can indeed use this plot to identify groups of donors that are close to each other. For instance, Sweden and Norway are very close together at the far negative end of the first PC. Although the DAC, Participant, and Other groups mix to some extent, DAC donors are concentrated in some areas. A major DAC cluster can be found around the United Kingdom (Canada, the Netherlands, Switzerland, Ireland, Finland). Japan and South Korea are also relatively close to each other, as are some of the Arab countries (Saudi Arabia, Qatar, the United Arab Emirates). Among the DAC members, Greece is the farthest away from the DAC’s centre of gravity. Cyprus and Kazakhstan are close to each other, but very distant from the rest. The Central, Eastern, and South-Eastern European countries also tend to be located in the same area.
But can we really trust that these three dimensions give us a truthful representation of where the donors are situated in relation to each other? To find out, we need to dig a bit deeper into the results of the PCA.
4. Interpretation of the results
As noted in the introduction, the appeal of PCA is that the principal components capture the variation of the original variables in such a way that the first few are sufficient to represent most of the original data. To know how much information the first three PCs actually contain, we can look at their explained variance.
Own work based on OECD Stats
Chart 3 shows the cumulative explained variance. The first three PCs explain just over 50% of the variance in our data set. Capturing half of the variance of the 27 original variables in only three new variables does not sound bad, but it is less than one would hope for when the goal is a meaningful 3D scatter plot. Let’s take a closer look at how well the individual donors and original variables are represented.
4.1 Donors: Can we really trust that the points in three-dimensional space represent our donors?
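The cumulative curve in chart 3 is the running sum of each PC's share of the total variance. A sketch on stand-in data of the same shape (47 x 27, randomly generated, so the numbers will not match the post's); the 80% target is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data with the same shape as the post's data (47 x 27)
X = rng.normal(size=(47, 3)) @ rng.normal(size=(3, 27)) + rng.normal(size=(47, 27))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigenvalues = np.linalg.svd(Z, compute_uv=False) ** 2 / Z.shape[0]
explained = eigenvalues / eigenvalues.sum()   # explained variance ratio of each PC
cumulative = np.cumsum(explained)             # what a cumulative chart plots

print("first three PCs:", round(cumulative[2] * 100, 1), "%")
# Smallest number of PCs whose cumulative share reaches 80 %
n_for_80 = int(np.searchsorted(cumulative, 0.8) + 1)
print("PCs needed for 80 %:", n_for_80)
```

Reading off the third value of `cumulative` gives exactly the "first three PCs" figure discussed above.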
We can measure the quality of a donor’s representation with the squared cosine (cos2) between its original vector (measured from the centre of gravity of the data) and its projection onto the new axes. Values lie between 0 and 1; the closer to 1, the more information about the donor is retained in a PC. In chart 4, I show the sum of the cos2 values over the first three axes of the PCA. The red line marks an arbitrary threshold (here 0.6) above which we might consider a donor well enough represented in the 3D scatter plot.
Own work based on OECD Stats
With a threshold of 0.6, only 18 of the 47 donors are represented well enough to be interpreted. For the other donors, we would need to include more dimensions from the PCA to get a robust idea of where they are really positioned. As for our objective of developing a typology, we can only draw limited conclusions from the 3D plot: we see, for example, that Sweden and Norway are indeed likely to be close together at the far negative side of the first axis, because their cos2 values are above the threshold. Overall, however, the 3D plot gives us an incomplete view of the donor landscape, with most donors not represented well enough.
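The cos2 values can be computed directly from the PC scores: for donor i and PC k, cos2 is the squared coordinate on PC k divided by the donor's squared distance from the centre of the standardised data, so summed over all PCs it equals 1. A sketch on randomly generated stand-in data (not the OECD data), using the 0.6 threshold from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data (47 donors x 27 variables), not the OECD data
X = rng.normal(size=(47, 3)) @ rng.normal(size=(3, 27)) + rng.normal(size=(47, 27))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S                                  # donor coordinates on all PCs

# cos2: squared coordinate on each PC / squared distance of the donor to the centre
sq_dist = np.sum(Z ** 2, axis=1, keepdims=True)
cos2 = scores ** 2 / sq_dist

quality_3d = cos2[:, :3].sum(axis=1)            # quality of representation in the 3D plot
well_represented = np.flatnonzero(quality_3d > 0.6)
print(len(well_represented), "of", len(Z), "donors are above the 0.6 threshold")
```

A donor close to the centre can have a low cos2 even if its absolute coordinates are small, which is why cos2 rather than distance is used to judge whether a point's position in the plot can be trusted.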
4.2 Variables: What do the first three dimensions actually mean?
We can use correlation circles with a radius of 1 to plot the correlations of the original variables with the PCs on two-dimensional planes, as in charts 5 (PCs 1 and 2) and 6 (PCs 1 and 3). The original variables appear as arrows starting at the origin and pointing towards the circle's circumference. The better a variable is represented by the two PCs of a plane, the closer its arrowhead comes to the circle. We can use the directions of the arrows to interpret the 3D plot.
Here I only show variables that have a correlation of at least 0.5 with at least one of the PCs. You can hover over the arrowheads to read the variable names.
Own work based on OECD Stats
Own work based on OECD Stats
Towards positive values of PC1 and negative values of PC2, we find donors that spend ODA on in-donor expenses (ODA spent in the donor countries themselves, e.g. on refugees) and focus on Europe as a region. Donors in this area include Greece, Romania, Bulgaria and Slovenia. Towards the positive side of PC2, we find donors that channel much of their ODA through the public sector. If we stay in the positive part of PC2 and move towards the origin of PC1, we come to donors that rely especially on project-type interventions, particularly in transport, communication and energy. Here we find donors like Japan, South Korea and Turkey. In exactly the opposite area, we see donors with a high share of multilateral aid (Cyprus and Latvia).
Moving further into the negative direction of PC1, we also see a somewhat stronger correlation with ODA to fragile states. Donors like Saudi Arabia and the United Arab Emirates are found in this area. The poverty focus becomes stronger towards the far negative side of PC1, which is also reflected in a regional focus on Sub-Saharan Africa. Luxembourg, for example, is located here. Moving towards the negative side of PC2, we see more cooperation with NGOs and civil society as well as a greater inclination towards budget support and pooled funding (e.g. Norway and Sweden).
PC3 is less clear-cut. This is to some extent expected, as each subsequent component explains less of the original data. Still, PC3 explains over 10% of the variance. The positive side of this axis is associated with donors active in South and Central Asia, especially in humanitarian aid. These also tend to be donors with a strong regional focus (i.e. mostly present in only one region of the world). Kazakhstan and Turkey are at the far end of this spectrum. The negative side of PC3 is less informative.
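The arrow coordinates in a correlation circle are the correlations between each standardised variable and each PC, which equal the variable's eigenvector entry multiplied by the square root of the PC's eigenvalue. A sketch on randomly generated stand-in data; the variable names are placeholders, and applying the 0.5 cut-off to absolute correlations on the first three PCs is an assumption about how the charts were filtered:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data; variable names are placeholders
X = rng.normal(size=(47, 3)) @ rng.normal(size=(3, 27)) + rng.normal(size=(47, 27))
names = [f"var_{j}" for j in range(X.shape[1])]
Z = (X - X.mean(axis=0)) / X.std(axis=0)

_, S, Vt = np.linalg.svd(Z, full_matrices=False)
eigenvalues = S ** 2 / Z.shape[0]

# For standardised data, corr(variable j, PC k) = eigenvector_jk * sqrt(eigenvalue_k).
# Row j holds the arrow-tip coordinates of variable j on every PC.
corr = Vt.T * np.sqrt(eigenvalues)

# Keep variables with |r| >= 0.5 on at least one of the first three PCs
shown = [name for name, r in zip(names, corr[:, :3]) if np.max(np.abs(r)) >= 0.5]
print(len(shown), "variables would be drawn")
```

The squared correlations of a variable summed over all PCs equal 1, which is why an arrow can never leave the unit circle and why an arrow close to the circle means the plane captures that variable well.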
5. What next?
We have to take the results of the above PCA with a grain of salt. Building on what we just learned, we can adjust our approach, for example by changing the variables and/or the donors included in the PCA. Selecting donors and variables requires iterating over the whole PCA process several times, excluding donors that contribute excessively to the PCA or variables that turn out to be less important than initially thought. In this post, I’ll do one more round of PCA with the same variables but a different set of donors.
In the notebook’s exploratory data analysis, we can see that some donors appear more often as extreme values than others. If we look at how "extreme" the different groups (DAC, Participant, Other) are on average, we note that, perhaps unsurprisingly, countries in the "Other" and "Participant" groups are more often outliers than DAC members. This could mean either that the DAC’s standards really have a harmonising effect, or that members define the standards according to their own practices in the first place.
Outliers are not automatically disqualified from a PCA. We have to look at the contributions of each donor to the different axes. Contribution in this sense refers to each donor's share of the total variance of a PC. The donors in chart 7 are ordered by their contribution to the first PCA axis. Contributions matter most for the first PC, as it accounts for the largest share of the explained variance. We observe that all contributions to the first PC remain below 10%. The picture changes for the other dimensions: the highest contributions come from Cyprus and Kazakhstan on PC3 (ca. 20 and 25 percent respectively).
Own work based on OECD Stats
Observations far from the centre of gravity tend to contribute more. But what would count as an excessive contribution? With 47 donors contributing equally, each donor would account for ca. 2.13% (i.e. 100/47) of each component. Overall, the contributions could be considered acceptable, especially since the largest ones relate to PC3. We notice, however, that most of the above-average contributions come from non-DAC donors. As a result, DAC members get squeezed closer together. One idea for another round of PCA is therefore to look only at the group of DAC members to identify intra-DAC similarities and differences more clearly.
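Assuming every donor carries the same weight 1/n (the usual default), the contribution of donor i to PC k is its squared score divided by n times the PC's eigenvalue, so the contributions to each PC sum to 100% and their average is 100/n. A sketch on randomly generated stand-in data, not the OECD data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data (47 donors x 27 variables), not the OECD data
X = rng.normal(size=(47, 3)) @ rng.normal(size=(3, 27)) + rng.normal(size=(47, 27))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
n = Z.shape[0]

U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S                    # donor coordinates on all PCs
eigenvalues = S ** 2 / n          # variance of each PC

# Contribution of donor i to PC k, in percent, with equal weights 1/n
contrib = 100 * scores ** 2 / (n * eigenvalues)

expected = 100 / n                # about 2.13 % for 47 donors
above_avg_pc1 = np.flatnonzero(contrib[:, 0] > expected)
print(f"average contribution: {expected:.2f} %")
print(len(above_avg_pc1), "donors contribute more than average to PC1")
```

Since contributions grow with the squared score, donors far from the centre of gravity naturally dominate, which is what the comparison with the 100/n benchmark is meant to flag.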
Own work based on OECD Stats
I will not go into all the details again this time. Here are the main changes compared to the previous PCA:
- The DAC-only 3D plot has gained in explained variance: the first three components now capture more than 57 percent of the total variance (PC1 alone almost one third).
- The quality of representation of the different donors is still highly unequal. The colour scale indicates how well a donor is represented, with red points well represented and blue points poorly represented. We can feel reasonably confident about the positions of 11 of the DAC donors in the 3D space. This allows us to make at least some judgements about a potential typology: e.g. Sweden, Norway and Denmark are very likely close to each other; Japan and South Korea are similar to each other on PC1 and PC2 only; Greece, Slovenia, Poland and the Slovak Republic can be considered a group at the opposite end of axis 1 from the Scandinavian countries. Germany is also above the 0.6 threshold for quality of representation.
- The correlation circles (see notebook) show that the axes shift only a little compared to the previous PCA, though the correlations are stronger (closer to the circle line). The negative side of PC1 could still be interpreted as pointing towards "progressive" donors: i.e. a strong poverty focus, a focus on Sub-Saharan Africa, and the use of budget support and pooled funding. The opposite side of the same axis still represents in-donor expenditure and a focus on Europe. PC2 separates donors using project-type interventions and the public sector, especially in economic sectors (transport, communication, energy) in Asia (positive values), from donors that use the multilateral system and work with NGOs and civil society (negative values). PC3 still provides the least information: its positive side shows some correlation with regional concentration, its negative side with the Middle East and North Africa.
- Changing the set of donors also changes which donors contribute most to the PCA. The shape of the point cloud is clearly influenced by the positions of Japan and Greece; e.g. a large number of donors almost form a line along axis 1 in the negative range of PC2, opposite Japan.
6. Conclusion
As for the objective of developing a typology, some donors have emerged as likely belonging to the same groups, despite the many caveats mentioned. Although the picture is still incomplete, the PCA already lends support to commonly used labels like "Scandinavian donors", to the similarity of Japan and South Korea as donors, and to a group of "Eastern European" donors as sensible categorisations.
There is certainly much more we could do to make the PCA work better for creating a good 3D representation of aid donors. In particular, the selection of variables and donors could be worked out more carefully. Making better choices about which variables and donors to use as active inputs to the PCA and which to treat merely as illustrative (supplementary) data can go a long way towards increasing the explained variance of the first three components. This process should be guided by a sound theory of what a typology of aid donors should do. Above all, variables should not simply be tweaked to make the method happy.
Otherwise, we have to accept that we may need more than three PCs, and different methods, to make more sense of the data. This is what I will do in the second part of this project: I will take the results of the two PCAs from this post (all donors and DAC-only) and use more of the PCs to perform clustering.
Go to the second part of this article here.