A Replication of Karlan and List (2007)

Author

Dominic Schenone

Published

January 6, 2026

Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out roughly 50,000 fundraising letters to potential donors (50,083 observations in the replication data), randomly assigning each recipient to either a control letter or a matching-grant letter. Within the matching-grant group, the match ratio, the match threshold, and the suggested donation amount were also varied. They published the results of this experiment in the American Economic Review in 2007. The article and supporting data are available from the AEA website and from Innovations for Poverty Action as part of Harvard’s Dataverse.

This project seeks to replicate the key findings from Karlan and List (2007), explore extensions of their analysis, and reflect on the implications of their design and results. Throughout, I aim to verify treatment effects on donation rates and amounts, assess baseline balance, and explore how match ratios influence giving behavior.

Data

# Import pandas and load the replication data as df
import pandas as pd
df = pd.read_stata('karlan_list_2007.dta')

# Inspect the basic structure of the data
# Shape (rows, columns)
print("The Shape of our Data: ", df.shape)
print("The Different DataTypes of our columns")

# Note: df.info() prints its summary directly and returns None,
# which is why "Info:  None" appears after the summary in the output
print("Info: ", df.info())
print("First 5 Rows", df.head())
print("DataTypes of columns: ", df.dtypes)

The Shape of our Data:  (50083, 51)
The Different DataTypes of our columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50083 entries, 0 to 50082
Data columns (total 51 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   treatment           50083 non-null  int8    
 1   control             50083 non-null  int8    
 2   ratio               50083 non-null  category
 3   ratio2              50083 non-null  int8    
 4   ratio3              50083 non-null  int8    
 5   size                50083 non-null  category
 6   size25              50083 non-null  int8    
 7   size50              50083 non-null  int8    
 8   size100             50083 non-null  int8    
 9   sizeno              50083 non-null  int8    
 10  ask                 50083 non-null  category
 11  askd1               50083 non-null  int8    
 12  askd2               50083 non-null  int8    
 13  askd3               50083 non-null  int8    
 14  ask1                50083 non-null  int16   
 15  ask2                50083 non-null  int16   
 16  ask3                50083 non-null  int16   
 17  amount              50083 non-null  float32 
 18  gave                50083 non-null  int8    
 19  amountchange        50083 non-null  float32 
 20  hpa                 50083 non-null  float32 
 21  ltmedmra            50083 non-null  int8    
 22  freq                50083 non-null  int16   
 23  years               50082 non-null  float64 
 24  year5               50083 non-null  int8    
 25  mrm2                50082 non-null  float64 
 26  dormant             50083 non-null  int8    
 27  female              48972 non-null  float64 
 28  couple              48935 non-null  float64 
 29  state50one          50083 non-null  int8    
 30  nonlit              49631 non-null  float64 
 31  cases               49631 non-null  float64 
 32  statecnt            50083 non-null  float32 
 33  stateresponse       50083 non-null  float32 
 34  stateresponset      50083 non-null  float32 
 35  stateresponsec      50080 non-null  float32 
 36  stateresponsetminc  50080 non-null  float32 
 37  perbush             50048 non-null  float32 
 38  close25             50048 non-null  float64 
 39  red0                50048 non-null  float64 
 40  blue0               50048 non-null  float64 
 41  redcty              49978 non-null  float64 
 42  bluecty             49978 non-null  float64 
 43  pwhite              48217 non-null  float32 
 44  pblack              48047 non-null  float32 
 45  page18_39           48217 non-null  float32 
 46  ave_hh_sz           48221 non-null  float32 
 47  median_hhincome     48209 non-null  float64 
 48  powner              48214 non-null  float32 
 49  psch_atlstba        48215 non-null  float32 
 50  pop_propurban       48217 non-null  float32 
dtypes: category(3), float32(16), float64(12), int16(4), int8(16)
memory usage: 8.9 MB
Info:  None
First 5 Rows    treatment  control    ratio  ratio2  ratio3      size  size25  size50  \
0          0        1  Control       0       0   Control       0       0   
1          0        1  Control       0       0   Control       0       0   
2          1        0        1       0       0  $100,000       0       0   
3          1        0        1       0       0  Unstated       0       0   
4          1        0        1       0       0   $50,000       0       1   

   size100  sizeno  ... redcty  bluecty    pwhite    pblack  page18_39  \
0        0       0  ...    0.0      1.0  0.446493  0.527769   0.317591   
1        0       0  ...    1.0      0.0       NaN       NaN        NaN   
2        1       0  ...    0.0      1.0  0.935706  0.011948   0.276128   
3        0       1  ...    1.0      0.0  0.888331  0.010760   0.279412   
4        0       0  ...    0.0      1.0  0.759014  0.127421   0.442389   

   ave_hh_sz  median_hhincome    powner  psch_atlstba  pop_propurban  
0       2.10          28517.0  0.499807      0.324528            1.0  
1        NaN              NaN       NaN           NaN            NaN  
2       2.48          51175.0  0.721941      0.192668            1.0  
3       2.65          79269.0  0.920431      0.412142            1.0  
4       1.85          40908.0  0.416072      0.439965            1.0  

[5 rows x 51 columns]
DataTypes of columns:  treatment                 int8
control                   int8
ratio                 category
ratio2                    int8
ratio3                    int8
size                  category
size25                    int8
size50                    int8
size100                   int8
sizeno                    int8
ask                   category
askd1                     int8
askd2                     int8
askd3                     int8
ask1                     int16
ask2                     int16
ask3                     int16
amount                 float32
gave                      int8
amountchange           float32
hpa                    float32
ltmedmra                  int8
freq                     int16
years                  float64
year5                     int8
mrm2                   float64
dormant                   int8
female                 float64
couple                 float64
state50one                int8
nonlit                 float64
cases                  float64
statecnt               float32
stateresponse          float32
stateresponset         float32
stateresponsec         float32
stateresponsetminc     float32
perbush                float32
close25                float64
red0                   float64
blue0                  float64
redcty                 float64
bluecty                float64
pwhite                 float32
pblack                 float32
page18_39              float32
ave_hh_sz              float32
median_hhincome        float64
powner                 float32
psch_atlstba           float32
pop_propurban          float32
dtype: object

Description

Variable Definitions

Variable	Description
`treatment`	Treatment
`control`	Control
`ratio`	Match ratio
`ratio2`	2:1 match ratio
`ratio3`	3:1 match ratio
`size`	Match threshold
`size25`	$25,000 match threshold
`size50`	$50,000 match threshold
`size100`	$100,000 match threshold
`sizeno`	Unstated match threshold
`ask`	Suggested donation amount
`askd1`	Suggested donation was highest previous contribution
`askd2`	Suggested donation was 1.25 x highest previous contribution
`askd3`	Suggested donation was 1.50 x highest previous contribution
`ask1`	Highest previous contribution (for suggestion)
`ask2`	1.25 x highest previous contribution (for suggestion)
`ask3`	1.50 x highest previous contribution (for suggestion)
`amount`	Dollars given
`gave`	Gave anything
`amountchange`	Change in amount given
`hpa`	Highest previous contribution
`ltmedmra`	Small prior donor: last gift was less than median $35
`freq`	Number of prior donations
`years`	Number of years since initial donation
`year5`	At least 5 years since initial donation
`mrm2`	Number of months since last donation
`dormant`	Already donated in 2005
`female`	Female
`couple`	Couple
`state50one`	State tag: 1 for one observation of each of 50 states; 0 otherwise
`nonlit`	Nonlitigation
`cases`	Court cases from state in 2004-5 in which organization was involved
`statecnt`	Percent of sample from state
`stateresponse`	Proportion of sample from the state who gave
`stateresponset`	Proportion of treated sample from the state who gave
`stateresponsec`	Proportion of control sample from the state who gave
`stateresponsetminc`	stateresponset - stateresponsec
`perbush`	State vote share for Bush
`close25`	State vote share for Bush between 47.5% and 52.5%
`red0`	Red state
`blue0`	Blue state
`redcty`	Red county
`bluecty`	Blue county
`pwhite`	Proportion white within zip code
`pblack`	Proportion black within zip code
`page18_39`	Proportion age 18-39 within zip code
`ave_hh_sz`	Average household size within zip code
`median_hhincome`	Median household income within zip code
`powner`	Proportion house owner within zip code
`psch_atlstba`	Proportion who finished college within zip code
`pop_propurban`	Proportion of population urban within zip code

Balance Test

As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.

# Variables included in the balance check. None of them is an outcome variable,
# so they are suitable for testing whether the treatment and control populations differ:
# mrm2: months since last donation
# hpa: highest previous amount donated
# freq: number of prior donations
from scipy.stats import t
import numpy as np

def compute_ci(var, group_var="treatment", conf_level=0.95):
    a = df[df[group_var] == 1][var].dropna()
    b = df[df[group_var] == 0][var].dropna()

    x_a, x_b = a.mean(), b.mean()
    s_a, s_b = a.std(), b.std()
    n_a, n_b = len(a), len(b)

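    diff = x_a - x_b
    # Unequal-variance (Welch) standard error of the difference in means
    se = np.sqrt((s_a**2 / n_a) + (s_b**2 / n_b))
    # Conservative degrees of freedom: smaller group size minus one
    dof = min(n_a, n_b) - 1
    t_star = t.ppf((1 + conf_level) / 2, df=dof)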

    margin = t_star * se
    lower, upper = diff - margin, diff + margin

    print(f"{var} | Mean Diff: {diff:.4f} | CI: ({lower:.4f}, {upper:.4f})")

# Run CI calculations for a few variables
for variable in ["mrm2", "hpa", "freq"]:
    compute_ci(variable)

mrm2 | Mean Diff: 0.0137 | CI: (-0.2107, 0.2381)
hpa | Mean Diff: 0.6371 | CI: (-0.6498, 1.9239)
freq | Mean Diff: -0.0120 | CI: (-0.2238, 0.1998)

Balance Check Interpretation

To assess balance across treatment and control groups, I computed 95% confidence intervals for the difference in means on several baseline characteristics. The intervals for mrm2, hpa, and freq all include zero, suggesting that the groups were well-balanced and no statistically significant differences were present. This supports the validity of the random assignment.
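
With samples this large, the t critical value is practically the same as the normal one, so the reported interval can be sanity-checked from the mean difference and its standard error alone. The sketch below uses only the standard library. The helper name `normal_ci` is hypothetical, the normal approximation is an assumption that would not suit small samples, and the standard error of about 0.1145 for mrm2 is the value printed in the t-test output later in this post.

```python
from statistics import NormalDist

def normal_ci(diff, se, conf_level=0.95):
    """Return (lower, upper) for diff +/- z* x se using the normal approximation."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    margin = z_star * se
    return diff - margin, diff + margin

# mrm2: mean difference 0.0137, standard error about 0.1145 (from this post's output)
lower, upper = normal_ci(0.0137, 0.1145)
print(f"mrm2 95% CI (normal approx.): ({lower:.4f}, {upper:.4f})")
```

The result should land very close to the interval printed by `compute_ci`, because the t distribution with tens of thousands of degrees of freedom is nearly normal.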

Next, I check whether a linear regression of mrm2 on the treatment indicator reproduces the t-test result at the 95% confidence level.

import numpy as np
import statsmodels.formula.api as smf

# --- T-test (unequal-variance standard error, computed by hand) ---
a = df[df["treatment"] == 1]["mrm2"].dropna()
b = df[df["treatment"] == 0]["mrm2"].dropna()

mean_diff = a.mean() - b.mean()
se = np.sqrt((a.var(ddof=1)/len(a)) + (b.var(ddof=1)/len(b)))
t_stat = mean_diff / se

print("T-Test result for mrm2:")
print(f"  Mean difference: {mean_diff:.4f}")
print(f"  SE: {se:.4f}")
print(f"  t-stat: {t_stat:.4f}")

# --- Linear Regression ---
model = smf.ols("mrm2 ~ treatment", data=df).fit()
print("\nLinear Regression result for mrm2:")
print(model.summary())

T-Test result for mrm2:
  Mean difference: 0.0137
  SE: 0.1145
  t-stat: 0.1195

Linear Regression result for mrm2:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   mrm2   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                   0.01428
Date:                Tue, 06 Jan 2026   Prob (F-statistic):              0.905
Time:                        20:53:01   Log-Likelihood:            -1.9585e+05
No. Observations:               50082   AIC:                         3.917e+05
Df Residuals:                   50080   BIC:                         3.917e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     12.9981      0.094    138.979      0.000      12.815      13.181
treatment      0.0137      0.115      0.119      0.905      -0.211       0.238
==============================================================================
Omnibus:                     8031.352   Durbin-Watson:                   2.004
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12471.135
Skew:                           1.163   Prob(JB):                         0.00
Kurtosis:                       3.751   Cond. No.                         3.23
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

To verify the balance of baseline variable mrm2, I conducted both a two-sample t-test and a linear regression with treatment as the sole predictor. Both approaches yielded nearly identical estimates:

Mean difference: 0.0137

Standard error: ~0.115

t-statistic: 0.119
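
The agreement between the two sets of numbers is not a coincidence. With a single 0/1 regressor, the OLS slope is the difference in group means, and the OLS t-statistic equals the pooled-variance two-sample t-statistic. The sketch below checks this on a handful of made-up values (hypothetical toy data, not the experimental data), using the closed-form simple-regression formulas rather than statsmodels.

```python
from math import sqrt
from statistics import mean, variance

# Toy data (hypothetical values, not from the experiment)
treated = [12.0, 15.0, 9.0, 14.0, 10.0]
control = [11.0, 8.0, 13.0, 10.0]
n1, n0 = len(treated), len(control)

# Two-sample t-test with pooled variance
diff = mean(treated) - mean(control)
pooled_var = ((n1 - 1) * variance(treated) + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n0))

# OLS of y on a 0/1 indicator d, using the simple-regression formulas
y = treated + control
d = [1.0] * n1 + [0.0] * n0
d_bar, y_bar = mean(d), mean(y)
sxx = sum((di - d_bar) ** 2 for di in d)
slope = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sxx
intercept = y_bar - slope * d_bar
resid_var = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y)) / (len(y) - 2)
t_ols = slope / sqrt(resid_var / sxx)

print(f"difference in means = {diff:.4f}, OLS slope = {slope:.4f}")
print(f"pooled t = {t_pooled:.4f}, OLS t = {t_ols:.4f}")
```

The hand-computed test in this post uses unequal-variance standard errors instead, so its standard error and t-statistic can differ from the regression's in later decimal places.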

These consistent results reflect the fact that a linear regression on a single binary treatment indicator is mathematically equivalent to a pooled-variance two-sample t-test. The hand-computed test above uses unequal-variance standard errors, so small differences in later decimal places are possible. This approach mirrors the analysis used in Table 1 of Karlan and List (2007), which demonstrates the success of randomization in balancing observable characteristics across groups.

Experimental Results

Charitable Contribution Made

First, I analyze whether a matching offer increases the response rate, that is, the share of recipients who make a donation.

import seaborn as sns
import matplotlib.pyplot as plt

# Calculate mean donation rate by group
donation_rates = df.groupby("treatment")["gave"].mean().reset_index()
donation_rates["group"] = donation_rates["treatment"].map({0: "Control", 1: "Treatment"})

# Create barplot
sns.barplot(data=donation_rates, x="group", y="gave")
plt.title("Proportion of People Who Donated")
plt.ylabel("Proportion Donated")
plt.xlabel("Group")
plt.ylim(0, donation_rates["gave"].max() + 0.01)
plt.grid(True)
plt.show()

from scipy.stats import ttest_ind
import statsmodels.formula.api as smf

# T-test: donation rates (binary outcome)
gave_ttest = ttest_ind(df[df["treatment"] == 1]["gave"],
                       df[df["treatment"] == 0]["gave"])
print("T-test result for 'gave' (donation made):", gave_ttest)

# Bivariate regression: gave ~ treatment
gave_reg = smf.ols("gave ~ treatment", data=df).fit()
print(gave_reg.summary())

T-test result for 'gave' (donation made): TtestResult(statistic=3.101361000543946, pvalue=0.0019274025949016982, df=50081.0)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   gave   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     9.618
Date:                Tue, 06 Jan 2026   Prob (F-statistic):            0.00193
Time:                        20:53:02   Log-Likelihood:                 26630.
No. Observations:               50083   AIC:                        -5.326e+04
Df Residuals:                   50081   BIC:                        -5.324e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0179      0.001     16.225      0.000       0.016       0.020
treatment      0.0042      0.001      3.101      0.002       0.002       0.007
==============================================================================
Omnibus:                    59814.280   Durbin-Watson:                   2.005
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4317152.727
Skew:                           6.740   Prob(JB):                         0.00
Kurtosis:                      46.440   Cond. No.                         3.23
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

I performed a t-test to compare the donation rates between individuals who received a matching donation offer and those who did not. The test yielded a statistically significant result (p = 0.0019), indicating that those who received the match offer were more likely to donate. The absolute difference in rates is small (from about 1.8% to about 2.2%), but a gap of this size would be very unlikely if the offer had no effect. This suggests that even a modest behavioral nudge, such as the promise of a matched gift, can influence charitable giving, in line with Table 2a of Karlan and List (2007), where matched donations are shown to increase response rates.
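
Because `gave` is a 0/1 outcome, the same comparison can be framed as a two-proportion z-test, which needs only the number of letters and the number of donors in each group. The sketch below is illustrative only: the counts are round, hypothetical numbers chosen to give rates near 1.8% and 2.2%, and are not taken from the data. The function name `two_proportion_z` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x_a, n_a, x_b, n_b):
    """z statistic and two-sided p-value for H0: equal proportions (pooled SE)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: (donors, letters) for treatment and control
z, p_value = two_proportion_z(735, 33_400, 300, 16_700)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

With group sizes in the tens of thousands, this test and the t-test on the binary outcome should lead to essentially the same conclusion.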

Probit Model Interpretation

I used a probit regression to model the likelihood that a person made a charitable donation as a function of whether they were offered a matching donation. The coefficient on the treatment variable was positive and statistically significant (z = 3.11, p = 0.002), suggesting that individuals who received the matching offer were more likely to donate.

import statsmodels.api as sm

# Prepare data for probit
X = sm.add_constant(df["treatment"])  # Add intercept
y = df["gave"]

# Fit probit model
probit_model = sm.Probit(y, X).fit()
print(probit_model.summary())

Optimization terminated successfully.
         Current function value: 0.100443
         Iterations 7
                          Probit Regression Results                           
==============================================================================
Dep. Variable:                   gave   No. Observations:                50083
Model:                         Probit   Df Residuals:                    50081
Method:                           MLE   Df Model:                            1
Date:                Tue, 06 Jan 2026   Pseudo R-squ.:               0.0009783
Time:                        20:53:03   Log-Likelihood:                -5030.5
converged:                       True   LL-Null:                       -5035.4
Covariance Type:            nonrobust   LLR p-value:                  0.001696
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -2.1001      0.023    -90.073      0.000      -2.146      -2.054
treatment      0.0868      0.028      3.113      0.002       0.032       0.141
==============================================================================

Why Use a Probit Model?

Probit regression is more appropriate for binary outcomes like donation (gave = 0/1) than OLS, because it models probabilities between 0 and 1 using the cumulative normal distribution. It’s a standard choice in applied microeconomics and matches the method used by Karlan & List (2007).

Although the raw coefficient from a probit model isn’t directly interpretable as a probability, the significant result aligns with earlier t-tests and linear regression findings. This confirms that the treatment has a measurable effect on donation behavior. In essence, people respond to matching offers — even small nudges like this can meaningfully influence giving.

I also computed the marginal effect of the treatment, which expresses how much the match offer changes the probability of donating, in percentage points.

# Get marginal effects
marginal_effects = probit_model.get_margeff()
print(marginal_effects.summary())

       Probit Marginal Effects       
=====================================
Dep. Variable:                   gave
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
treatment      0.0043      0.001      3.104      0.002       0.002       0.007
==============================================================================

Being offered a matching donation increases the probability of giving by approximately 0.43 percentage points (e.g., from 1.8% to ~2.2%). While small, this is statistically significant and meaningful in a high-volume fundraising context.
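
The link between the probit coefficients and these probabilities can be made explicit by pushing the fitted index through the standard normal CDF. The sketch below copies the two coefficients from the probit output above and computes the implied donation probability for each group. It reports the discrete change between the two groups, which is a slightly different quantity from the derivative-based `dydx` effect that `get_margeff()` returns, so a small gap between the two is expected.

```python
from statistics import NormalDist

# Probit coefficients copied from the regression output above
const, beta_treatment = -2.1001, 0.0868

std_normal = NormalDist()  # mean 0, standard deviation 1
p_control = std_normal.cdf(const)
p_treated = std_normal.cdf(const + beta_treatment)
discrete_change = p_treated - p_control

print(f"P(gave | control)   = {p_control:.4f}")
print(f"P(gave | treatment) = {p_treated:.4f}")
print(f"difference          = {discrete_change:.4f}")
```

The implied probabilities should come out near 1.8% and 2.2%, matching the group donation rates estimated by the linear regression.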

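NOTE: The linear regression results appear to replicate Table 3, column 1 of the paper. The raw probit coefficient (0.0868) does not, even though Table 3 indicates that its results come from probit regressions. The probit marginal effect (0.0043), however, is close to the linear regression estimate (0.0042).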

Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

Match Ratio Interpretation (T-test + Regression)

I conducted a series of t-tests to examine whether higher match ratios (2:1 and 3:1) led to higher donation rates compared to a 1:1 match. None of the comparisons were statistically significant. This aligns with the findings of Karlan and List (2007), who similarly reported that while matched donations increased giving compared to no match, larger match ratios did not consistently lead to higher response rates. This suggests that the presence of a match, rather than its magnitude, may be the primary behavioral driver.

from scipy.stats import ttest_ind

# 1:1 vs 2:1
gave_1to1 = df[df["ratio"] == 1]["gave"]
gave_2to1 = df[df["ratio"] == 2]["gave"]
ttest_2vs1 = ttest_ind(gave_2to1, gave_1to1)
print("2:1 vs 1:1 match rate:", ttest_2vs1)

# 2:1 vs 3:1
gave_3to1 = df[df["ratio"] == 3]["gave"]
ttest_3vs2 = ttest_ind(gave_3to1, gave_2to1)
print("3:1 vs 2:1 match rate:", ttest_3vs2)

# (Optional) 1:1 vs 3:1
ttest_3vs1 = ttest_ind(gave_3to1, gave_1to1)
print("3:1 vs 1:1 match rate:", ttest_3vs1)

2:1 vs 1:1 match rate: TtestResult(statistic=0.96504713432247, pvalue=0.33453168549723933, df=22265.0)
3:1 vs 2:1 match rate: TtestResult(statistic=0.05011583793874515, pvalue=0.9600305283739325, df=22261.0)
3:1 vs 1:1 match rate: TtestResult(statistic=1.0150255853798622, pvalue=0.3101046637086672, df=22260.0)

Why Regression Might Reveal More

A single regression estimates all of the match-ratio contrasts in one model with a common (pooled) error variance, which makes the pattern of effects easier to read than a set of separate pairwise t-tests. What each coefficient means, however, depends on which observations form the omitted baseline group.

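# Create ratio indicators (ratio2 and ratio3 already exist in the data; they are recomputed here)
df["ratio1"] = (df["ratio"] == 1).astype(int)
df["ratio2"] = (df["ratio"] == 2).astype(int)
df["ratio3"] = (df["ratio"] == 3).astype(int)

# Regression using the manual indicators.
# Note: this is fit on the full sample, so the omitted baseline is every row with
# ratio2 == 0 and ratio3 == 0, i.e. the control group together with the 1:1 group.
manual_reg = smf.ols("gave ~ ratio2 + ratio3", data=df).fit()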
print(manual_reg.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   gave   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     4.117
Date:                Tue, 06 Jan 2026   Prob (F-statistic):             0.0163
Time:                        20:53:03   Log-Likelihood:                 26629.
No. Observations:               50083   AIC:                        -5.325e+04
Df Residuals:                   50080   BIC:                        -5.323e+04
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0190      0.001     22.306      0.000       0.017       0.021
ratio2         0.0036      0.002      2.269      0.023       0.000       0.007
ratio3         0.0037      0.002      2.332      0.020       0.001       0.007
==============================================================================
Omnibus:                    59815.856   Durbin-Watson:                   2.005
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4317637.927
Skew:                           6.741   Prob(JB):                         0.00
Kurtosis:                      46.443   Cond. No.                         3.16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In this regression, I examined whether the size of the match ratio influenced the likelihood of donating. Because the model is fit on the full sample with only ratio2 and ratio3 as regressors, the omitted baseline is not the 1:1 group alone but the control group and the 1:1 group combined. The intercept of 0.0190 reflects this: it lies between the control rate of about 0.018 and the overall treatment rate of about 0.022. Relative to that pooled baseline, the 2:1 and 3:1 groups show small but statistically significant increases in donation rates, around +0.36 to +0.37 percentage points. This does not contradict the pairwise t-tests, which compared the match groups only with one another and found no significant differences. The regression coefficients largely pick up the difference between receiving a match and receiving none.

The results therefore do not show that larger match ratios raise giving beyond a 1:1 match. Isolating the effect of the ratio would require restricting the regression to the treatment group, or adding ratio1 so that the control group alone is the baseline, and then comparing the ratio coefficients. Taken together with the t-tests, the evidence is consistent with Karlan and List (2007): the presence of a match appears to matter more than its magnitude.
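
One thing worth checking in a specification like this is which rows make up the omitted category. In a regression on group indicators, the intercept is the mean outcome over every row where all included indicators are zero, and each coefficient is that group's mean minus the baseline mean. The sketch below reproduces those quantities directly from group means on made-up 0/1 outcomes (hypothetical groups and values, not the experimental data), first with control rows in the sample and then with the sample restricted to matched groups.

```python
from statistics import fmean

# Hypothetical 0/1 donation outcomes per group (100 letters each)
groups = {
    "control": [1] * 2 + [0] * 98,   # 2% give
    "1:1":     [1] * 3 + [0] * 97,   # 3% give
    "2:1":     [1] * 3 + [0] * 97,   # 3% give
    "3:1":     [1] * 3 + [0] * 97,   # 3% give
}

# Full sample, regressors ratio2 and ratio3: baseline = control + 1:1 rows
baseline_full = fmean(groups["control"] + groups["1:1"])
coef_ratio2_full = fmean(groups["2:1"]) - baseline_full

# Treated rows only, same regressors: baseline = 1:1 rows
baseline_treated = fmean(groups["1:1"])
coef_ratio2_treated = fmean(groups["2:1"]) - baseline_treated

print(f"full sample:  baseline = {baseline_full:.3f}, ratio2 coef = {coef_ratio2_full:.3f}")
print(f"treated only: baseline = {baseline_treated:.3f}, ratio2 coef = {coef_ratio2_treated:.3f}")
```

In this toy setup the 2:1 group gives at exactly the same rate as the 1:1 group, yet the full-sample coefficient is positive because the baseline is pulled down by the control rows.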

Size of Charitable Contribution

In this subsection, I analyze the effect of the matching offer on the size of the charitable contribution.

Below is the T-Test:

from scipy.stats import ttest_ind

# T-test comparing average donation amount (including 0s)
t_amount = ttest_ind(df[df["treatment"] == 1]["amount"],
                     df[df["treatment"] == 0]["amount"])
print("T-test on donation amount:")
print(t_amount)

T-test on donation amount:
TtestResult(statistic=1.8605020225753781, pvalue=0.06282038947470686, df=50081.0)

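The t-test gives t ≈ 1.86 with p ≈ 0.063, so the difference in average donation amount between the two groups is not statistically significant at the 5% level.

Next, I run a bivariate linear regression of the donation amount on treatment status: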

import statsmodels.formula.api as smf

# OLS regression of donation amount on treatment
amount_reg = smf.ols("amount ~ treatment", data=df).fit()
print(amount_reg.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 amount   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     3.461
Date:                Tue, 06 Jan 2026   Prob (F-statistic):             0.0628
Time:                        20:53:03   Log-Likelihood:            -1.7946e+05
No. Observations:               50083   AIC:                         3.589e+05
Df Residuals:                   50081   BIC:                         3.589e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.8133      0.067     12.063      0.000       0.681       0.945
treatment      0.1536      0.083      1.861      0.063      -0.008       0.315
==============================================================================
Omnibus:                    96861.113   Durbin-Watson:                   2.008
Prob(Omnibus):                  0.000   Jarque-Bera (JB):        240735713.635
Skew:                          15.297   Prob(JB):                         0.00
Kurtosis:                     341.269   Cond. No.                         3.23
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Overall Donation Amount Interpretation

I analyzed whether people in the treatment group gave more money, on average, than those in the control group. Both a t-test and a linear regression yielded nearly identical results: treatment group participants donated about $0.15 more, but the difference was not statistically significant (p ≈ 0.063). This suggests that while there may be a small increase in donation amount due to the treatment, the effect is weak and uncertain when including non-donors in the analysis.

import statsmodels.formula.api as smf

# Filter to only people who donated
donors = df[df["amount"] > 0]

# Run regression on amount given, conditional on donating
donation_reg = smf.ols("amount ~ treatment", data=donors).fit()
print(donation_reg.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 amount   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                    0.3374
Date:                Tue, 06 Jan 2026   Prob (F-statistic):              0.561
Time:                        20:53:03   Log-Likelihood:                -5326.8
No. Observations:                1034   AIC:                         1.066e+04
Df Residuals:                    1032   BIC:                         1.067e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     45.5403      2.423     18.792      0.000      40.785      50.296
treatment     -1.6684      2.872     -0.581      0.561      -7.305       3.968
==============================================================================
Omnibus:                      587.258   Durbin-Watson:                   2.031
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5623.279
Skew:                           2.464   Prob(JB):                         0.00
Kurtosis:                      13.307   Cond. No.                         3.49
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Conditional Donation Amount Interpretation

I restricted the sample to only those who donated and regressed the donation amount on treatment status. Donors in the treatment group gave, on average, $1.67 less than donors in the control group, but the difference is not statistically significant (p = 0.561; 95% confidence interval from about -$7.31 to $3.97). This suggests that while the match offer may have influenced the decision to donate, there is no evidence that it changed how much was given among those who did donate.

Causality Caveat

This regression is not strictly causal because it conditions on having donated, which is a post-treatment outcome. The donor subsample is selected on a variable that the treatment itself may have influenced, so treated and control donors are no longer guaranteed to be comparable by randomization. This can introduce selection bias, so while the analysis is informative, it should be interpreted descriptively rather than causally.
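To make the selection problem concrete, here is a small simulation under invented assumptions. Each person has a hypothetical latent "generosity" trait, the match lowers the bar for donating, and the amount a donor gives depends only on generosity. None of these quantities come from the paper. By construction, the treatment has no effect on how much any individual gives, yet the conditional-on-donating comparison still shows a gap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

treatment = rng.integers(0, 2, size=n)          # random assignment
generosity = rng.normal(0.0, 1.0, size=n)       # hypothetical latent trait

# Assumption: the match lowers the generosity needed to donate,
# so it pulls in marginal donors who are less generous on average.
threshold = np.where(treatment == 1, 1.8, 2.0)
gave = generosity > threshold

# Assumption: the gift size depends on generosity only, never on treatment.
amount = np.where(gave, 20.0 + 15.0 * generosity, 0.0)

share_treat = gave[treatment == 1].mean()
share_ctrl = gave[treatment == 0].mean()

# Compare average gifts among donors only (conditioning on a post-treatment outcome).
cond_treat = amount[gave & (treatment == 1)].mean()
cond_ctrl = amount[gave & (treatment == 0)].mean()
cond_gap = cond_treat - cond_ctrl

print(f"response rate: treatment {share_treat:.4f}, control {share_ctrl:.4f}")
print(f"mean gift among donors: treatment {cond_treat:.2f}, control {cond_ctrl:.2f}")
print(f"conditional gap: {cond_gap:.2f}")
```

The conditional gap comes out negative even though no one's gift was changed by the treatment. It reflects who donates, not how much the match makes them give, which is why the coefficient in the donor-only regression should not be read as a causal effect.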

Histogram Interpretation

I created histograms of donation amounts among donors in the treatment and control groups. Both groups show similar right-skewed distributions, with most donations clustering around $25 to $75. The red dashed line marks the mean donation, which is slightly higher in the control group (about $45.54 versus $43.87, matching the regression coefficients above). These visuals reinforce the regression results: while the treatment may have influenced donation likelihood, it did not lead to higher amounts among those who gave.

import seaborn as sns
import matplotlib.pyplot as plt

# Filter to donors only
donors = df[df["amount"] > 0]

# --- Histogram for Treatment Group ---
sns.histplot(data=donors[donors["treatment"] == 1], x="amount", bins=30, kde=False)
plt.axvline(donors[donors["treatment"] == 1]["amount"].mean(), color='red', linestyle='--', label="Mean")
plt.title("Donation Amounts – Treatment Group")
plt.xlabel("Donation Amount")
plt.ylabel("Count")
plt.legend()
plt.show()

# --- Histogram for Control Group ---
sns.histplot(data=donors[donors["treatment"] == 0], x="amount", bins=30, kde=False)
plt.axvline(donors[donors["treatment"] == 0]["amount"].mean(), color='red', linestyle='--', label="Mean")
plt.title("Donation Amounts – Control Group")
plt.xlabel("Donation Amount")
plt.ylabel("Count")
plt.legend()
plt.show()

Simulation Experiment

As a reminder of how the t-statistic “works,” in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made.

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

Law of Large Numbers

To illustrate the Law of Large Numbers, I simulated 10,000 paired draws, one control outcome and one treatment outcome per draw, and plotted the cumulative average of their differences. Early on the average fluctuates substantially, but it settles toward the true treatment effect of 0.004 as draws accumulate. This demonstrates how, with large enough samples, the sample average converges to the population mean, which is what makes estimates from large randomized experiments reliable.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Simulate donation outcomes (1 = gave, 0 = didn't give)
control_draws = np.random.binomial(1, 0.018, 10000)
treatment_draws = np.random.binomial(1, 0.022, 10000)

# Calculate the difference in each trial
diffs = treatment_draws - control_draws

# Cumulative average of differences
cum_avg = np.cumsum(diffs) / (np.arange(1, len(diffs)+1))

# Plot the cumulative average
plt.plot(cum_avg, color='orange')
plt.axhline(0.004, color='red', linestyle='--', label="True Effect (0.004)")
plt.title("Cumulative Average of Treatment - Control Differences")
plt.xlabel("Sample Size")
plt.ylabel("Cumulative Average")
plt.legend()
plt.grid(True)
plt.show()
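Because the true effect is small relative to the noise in a single draw, convergence can look slow on a plot. As a numerical complement, the sketch below extends the same simulation to 1,000,000 paired draws and prints the running average at a few checkpoints. The checkpoint values and the use of NumPy's `default_rng` generator are my choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
p_control, p_treatment = 0.018, 0.022
true_effect = p_treatment - p_control

# One control and one treatment outcome per draw; each difference is -1, 0, or 1.
n_draws = 1_000_000
diffs = rng.binomial(1, p_treatment, n_draws) - rng.binomial(1, p_control, n_draws)

# Running average of the differences after 1, 2, ..., n_draws draws.
cum_avg = np.cumsum(diffs) / np.arange(1, n_draws + 1)

for n in [100, 10_000, 1_000_000]:
    err = abs(cum_avg[n - 1] - true_effect)
    print(f"n = {n:>9,}: running average = {cum_avg[n - 1]: .5f}, abs. error = {err:.5f}")
```

The error at the final checkpoint should be far smaller than the effect itself, while the earlier checkpoints can still miss by a noticeable margin.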

Central Limit Theorem

I simulated 1,000 samples at each of four sample sizes (n = 50, 200, 500, 1000) and plotted the sampling distribution of the mean difference in donation rates. At n = 50 the expected number of donors per group is only about one, so the distribution is coarse and lumpy. As the sample size increases, it becomes smoother, more symmetric, and more tightly concentrated around the true treatment effect of 0.004. This visualizes the Central Limit Theorem: for any population distribution with finite variance, the sampling distribution of the mean becomes approximately normal as the sample size grows.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

sample_sizes = [50, 200, 500, 1000]
np.random.seed(42)

for n in sample_sizes:
    sim_means = []
    for _ in range(1000):
        control_sample = np.random.binomial(1, 0.018, n)
        treatment_sample = np.random.binomial(1, 0.022, n)
        sim_means.append(np.mean(treatment_sample - control_sample))

    # Plot histogram of sampling distribution
    plt.figure()
    sns.histplot(sim_means, bins=30)
    plt.title(f"Sampling Distribution of Mean Difference (n = {n})")
    plt.xlabel("Mean Difference")
    plt.ylabel("Frequency")
    plt.grid(True)
    plt.show()
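The histograms can be checked against theory. For independent Bernoulli samples of size n per group, the standard error of the difference in sample proportions is sqrt((p0(1 - p0) + p1(1 - p1)) / n). The sketch below compares that value with the standard deviation of simulated differences. Using 20,000 replications and drawing Binomial(n, p) counts directly, rather than n separate Bernoulli draws, are my choices for speed and are not part of the analysis above.

```python
import numpy as np

rng = np.random.default_rng(42)
p_control, p_treatment = 0.018, 0.022
reps = 20_000  # replications per sample size (an arbitrary choice)

results = {}
for n in [50, 200, 500, 1000]:
    # A Binomial(n, p) count divided by n is the mean of n Bernoulli(p) draws.
    control_rate = rng.binomial(n, p_control, size=reps) / n
    treatment_rate = rng.binomial(n, p_treatment, size=reps) / n
    diffs = treatment_rate - control_rate

    # Theoretical standard error of the difference in proportions.
    theoretical_se = np.sqrt(
        (p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) / n
    )
    results[n] = (diffs.mean(), diffs.std(ddof=1), theoretical_se)
    print(f"n = {n:>4}: mean diff = {diffs.mean(): .4f}, "
          f"simulated SD = {diffs.std(ddof=1):.4f}, theoretical SE = {theoretical_se:.4f}")
```

Even at n = 1000 the theoretical standard error is roughly 0.006, which is larger than the 0.004 effect. This is one way to see why an experiment of this kind needs tens of thousands of letters to detect a difference in response rates.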