Analyzing Campaign Contributions

Last week, I took a closer look at the April FEC quarterly reports from the 14 major Democratic candidates competing for the presidential nomination in 2020. Specifically, I looked at which candidates Asian donors were giving to and which candidates were raising the most from Asian contributions. Check out the first post on Asian contributions overall here and the second one, which looks at specific ethnic groups, here. I hope this post helps explain how I analyzed the contribution data and what kinds of caveats should be considered. If you have any questions, please send me an email or message me on Twitter!

Asian American Contributions (April 2019 FEC)

Asian American Contributions by Ethnic Group (April 2019 FEC)

Basic steps of the analysis

Here are the basic steps of the analysis, from data collection to final estimates. Most of the cleaning and analysis was done in R, which I strongly recommend to everyone :) #Tidyverse4life.

  1. Scrape April 2019 quarterly filings for each candidate into a database [1]
  2. Subset to only those classified as individual contributions [2]
  3. Geocode contributor information [3]
  4. Estimate race/ethnicity of contributor
  5. Subset to Asian contributions and estimate ethnicity for the top 6 Asian ethnic groups in the US [4]
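As a rough illustration of step 2, here is a minimal Python sketch (the actual analysis was done in R). It assumes each contribution record has already been loaded as a dictionary and that individuals carry the FEC entity type code "IND". Apart from `entity_type`, every field name and all of the sample rows are hypothetical.

```python
# Hypothetical sample rows; only the `entity_type` field is named in this post.
contributions = [
    {"contributor_name": "DOE, JANE", "entity_type": "IND", "amount": 250.0},
    {"contributor_name": "EXAMPLE PAC", "entity_type": "PAC", "amount": 1000.0},
    {"contributor_name": "LEE, SAM", "entity_type": "IND", "amount": 50.0},
]


def keep_individuals(rows, individual_code="IND"):
    """Return only the rows whose entity_type marks an individual contributor.

    Assumes "IND" is the code for individuals; rows with a missing
    entity_type are dropped rather than guessed at.
    """
    return [row for row in rows if row.get("entity_type") == individual_code]


individuals = keep_individuals(contributions)
for row in individuals:
    print(row["contributor_name"], row["amount"])
```

Dropping rows with a missing entity type is a conservative choice for this sketch; a real pipeline may want to inspect those rows instead.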

What are the caveats with this data?

Itemized individual contributions

Due to FEC reporting requirements, the only data publicly available for analysis are “itemized individual contributions”. These are the contributions that are required to be reported to the FEC due to the amount (>$200).

Doesn’t this mean that your analysis excludes small donors?

Yes, but not entirely.

In recent years, many Democratic campaigns have turned to ACTBLUE to help with fundraising, particularly among small donors (NYT, Wired). Since ACTBLUE is defined as a political action committee (PAC), it is subject to slightly different reporting requirements, which gives us a peek at small donors. When a donor gives to a particular candidate through ACTBLUE’s platform, that contribution is still counted as an individual contribution, but ACTBLUE acts as an intermediary that then delivers the contribution to the candidate’s campaign committee. In 2018, FiveThirtyEight published an excellent piece about ACTBLUE that leveraged this data to look at small donor contribution patterns.

Taking Andrew Yang’s filing as an example, we can see that his principal campaign committee, “Friends of Andrew Yang”, reported raising over $1.7 million in individual contributions during this time period, ~$342K of which are itemized individual contributions. Looking closer at the itemized contributions, we can see that many of them were given through ACTBLUE and that some are well under $200. So, looking back at my posts, the $119,440 that Andrew Yang raised from Asian American contributions is estimated from the itemized individual contributions only, since we don’t have any data on the contributors who are grouped into unitemized individual contributions.
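To make the scale of that gap concrete, here is a back-of-the-envelope calculation in Python using the rounded figures quoted above. The inputs are approximations ("over 1.7 million" and "~342K"), so the resulting shares are rough.

```python
# Rounded figures quoted in this post (approximations, not exact filing totals).
total_individual = 1_700_000  # all individual contributions ("over 1.7 million")
itemized = 342_000            # itemized individual contributions ("~342K")
asian_estimate = 119_440      # estimated Asian American contributions (itemized only)

# Everything that is not itemized has no contributor-level data.
unitemized = total_individual - itemized

# Share of individual dollars that the surname analysis can see at all.
itemized_share = itemized / total_individual

# Share of the visible (itemized) dollars estimated to come from Asian donors.
asian_share_of_itemized = asian_estimate / itemized

print(f"Unitemized dollars: ${unitemized:,}")
print(f"Itemized share of individual dollars: {itemized_share:.1%}")
print(f"Estimated Asian share of itemized dollars: {asian_share_of_itemized:.1%}")
```

With these rounded inputs, only about a fifth of the individual dollars are itemized, so the Asian American estimate describes roughly that slice and says nothing about the remaining unitemized dollars.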

Caveat #1: We can’t look at unitemized contributions, but that doesn’t mean we can’t look at contributions made by smaller donors.

Ethnic surname analysis

Ethnic surname analysis is a technique for imputing the race/ethnicity of a person using their last name (surname) and Census data on surnames by racial/ethnic group. Relatively recently, scholars have improved on this method by incorporating more information and using geocoded demographic information. This particular flavor of ethnic surname analysis has been called by different names, but the version that I’m most familiar with is “Bayesian Improved Surname Geocoding” or BISG.
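The core idea behind BISG can be sketched in a few lines of Python. The sketch assumes the usual formulation: start from the probability of each group given the surname, multiply by the probability of living in the contributor's geography given the group, and renormalize (this treats surname and geography as independent given race/ethnicity). All of the probabilities below are made-up toy numbers, not Census figures, and this is not the R package used for the analysis.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(race) is proportional to
        P(race | surname) * P(geography | race)
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has non-zero probability for this record.")
    return {race: value / total for race, value in unnormalized.items()}


# Toy inputs (hypothetical): surname-based prior for one contributor...
surname_prior = {"asian": 0.60, "white": 0.30, "other": 0.10}
# ...and the share of each group's population living in the contributor's tract.
tract_given_race = {"asian": 0.0020, "white": 0.0005, "other": 0.0010}

posterior = bisg_posterior(surname_prior, tract_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this toy case the contributor's tract is one where the Asian population is relatively concentrated, so the geographic information pushes the Asian probability above the surname-only value.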

The caveats associated with this approach are beyond the scope of this post, but I have listed several links at the bottom that are helpful for getting an overview. One caveat that I will mention here is the potential for contributors to be miscoded due to interracial marriage. According to estimates from Pew, nearly 3 in 10 Asian newlyweds were married to someone of a different race/ethnicity in 2015, the highest of any racial group. Unfortunately, this is something that we don’t have a good solution for, so it is important to keep it in mind when using these methods.

That being said, it is worth noting that this type of analysis has been used in health research (Elliott et al. 2008; Adjaye-Gbewonyo et al. 2014; Grundmeier et al. 2015), political science (Imai & Khanna 2016; Fraga 2016), and even by the Consumer Financial Protection Bureau to help enforce fair lending laws. In addition, Catalist, a data vendor that often supplies Democratic campaigns (and academic researchers) with enriched voter file data, bases its approach to modeling race/ethnicity on this method as well (Hersh 2015; Fraga 2018) [5].

Caveat #2: Using ethnic surname analysis to group donors is a widely used technique for imputing race/ethnicity on administrative records, but there are important caveats (e.g. rates of interracial marriage) to be aware of.

Tools for Analysis

List of scholarship on ethnic surname analysis

  • Adjaye‐Gbewonyo, D., Bednarczyk, R. A., Davis, R. L., & Omer, S. B. (2014). Using the Bayesian Improved Surname Geocoding Method (BISG) to create a working classification of race and ethnicity in a diverse managed care population: a validation study. Health Services Research, 49(1), 268-283.

  • Imai, K., & Khanna, K. (2016). Improving ecological inference by predicting individual ethnicity from voter registration records. Political Analysis, 24(2), 263-272.

  • Elliott, M. N., Fremont, A., Morrison, P. A., Pantoja, P., & Lurie, N. (2008). A new method for estimating race/ethnicity and associated disparities where administrative records lack self‐reported race/ethnicity. Health Services Research, 43(5p1), 1722-1736.

  • Fiscella, K., & Fremont, A. M. (2006). Use of geocoding and surname analysis to estimate race and ethnicity. Health Services Research, 41(4p1), 1482-1500.

  • Fraga, B. L. (2016). Candidates or districts? Reevaluating the role of race in voter turnout. American Journal of Political Science, 60(1), 97-122.

  • Fraga, B. L. (2018). The Turnout Gap: Race, Ethnicity, and Political Inequality in a Diversifying America. Cambridge University Press.

  • Grundmeier, R. W., Song, L., Ramos, M. J., Fiks, A. G., Elliott, M. N., Fremont, A., … & Localio, R. (2015). Imputing missing race/ethnicity in pediatric electronic health records: reducing bias with use of US census location and surname data. Health Services Research, 50(4), 946-960.

  • Hersh, E. D. (2015). Hacking the Electorate: How Campaigns Perceive Voters. Cambridge University Press.

  • Lauderdale, D. S., & Kestenbaum, B. (2000). Asian American ethnic identification by surname. Population Research and Policy Review, 19(3), 283-300.


  1. You can manually download them, but I would strongly recommend using Fec-Loader.
  2. Using the entity_type field in the contribution record.
  3. Ideally, you want the smallest geography possible (i.e. census block). For this analysis, I geocoded each contribution record to the census tract. The R package I used also allows for county and place.
  4. I used an Asian detailed origin surname list from Lauderdale & Kestenbaum.
  5. See Appendix A3 for a discussion on Estimating Race/Ethnicity with Catalist Data.
Sono Shah
Ph.D. Candidate