What is a Regression Table?
A regression table (or regression output) summarizes the results of a regression analysis. It presents the estimated relationships between variables, their statistical significance, and model fit statistics in a structured format.
"A regression table tells you which variables matter, how much they matter, and how confident you can be about those conclusions."
Part 1: Anatomy of a Regression Table
1.1 Standard Components
```
┌──────────────────────────────────────────────────────────────────────┐
│                           REGRESSION TABLE                           │
├──────────────────────────────────────────────────────────────────────┤
│ Dependent Variable: Sales (in $1000)                                 │
│ Method: Ordinary Least Squares (OLS)                                 │
│ Sample Size: 1,000 observations                                      │
├────────────────┬─────────────┬────────────┬─────────────┬────────────┤
│ Variable       │ Coefficient │ Std. Error │ t-statistic │ p-value    │
├────────────────┼─────────────┼────────────┼─────────────┼────────────┤
│ Intercept      │      50.234 │      2.145 │       23.42 │ 0.000 ***  │
│ (Constant)     │             │            │             │            │
├────────────────┼─────────────┼────────────┼─────────────┼────────────┤
│ Advertising    │       2.345 │      0.123 │       19.07 │ 0.000 ***  │
│ ($1000)        │             │            │             │            │
├────────────────┼─────────────┼────────────┼─────────────┼────────────┤
│ Price          │      -1.567 │      0.089 │      -17.61 │ 0.000 ***  │
│ ($)            │             │            │             │            │
├────────────────┼─────────────┼────────────┼─────────────┼────────────┤
│ Store Size     │       0.876 │      0.234 │        3.74 │ 0.000 ***  │
│ (sq ft)        │             │            │             │            │
├────────────────┼─────────────┼────────────┼─────────────┼────────────┤
│ Location       │       5.432 │      1.876 │        2.90 │ 0.004 **   │
│ (Urban = 1)    │             │            │             │            │
├────────────────┴─────────────┴────────────┴─────────────┴────────────┤
│ R-squared: 0.782              Adjusted R-squared: 0.781              │
│ F-statistic: 892.4            Prob (F-statistic): 0.000              │
│ AIC: 4523.6                   BIC: 4548.2                            │
└──────────────────────────────────────────────────────────────────────┘
```
1.2 Key Components Explained
| Component | What It Tells You | Interpretation |
|---|---|---|
| Coefficient | Change in dependent variable per 1-unit change in predictor | Direction and magnitude of relationship |
| Standard Error | Sampling variability of coefficient | Smaller = more precise estimate |
| t-statistic | Coefficient / Standard Error | Tests if coefficient ≠ 0 |
| p-value | Probability of a result at least this extreme if the true coefficient were zero | < 0.05 = statistically significant (by convention) |
| R-squared | Proportion of variance explained | 0-1, higher = better fit |
| Adjusted R² | R² penalized for number of predictors | Better for model comparison |
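The arithmetic linking these columns is easy to verify by hand: the t-statistic is the coefficient divided by its standard error, the p-value comes from the t distribution, and the confidence interval is the coefficient plus or minus a critical value times the standard error. A minimal sketch using the Advertising row of the example table above (the residual degrees of freedom here are illustrative):

```python
from scipy import stats

# Numbers from the Advertising row of the example table
coef, se, df_resid = 2.345, 0.123, 995  # df_resid = n - k - 1 (illustrative)

# t-statistic is just coefficient / standard error
t_stat = coef / se

# Two-sided p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)

# 95% confidence interval: coef ± t_crit * se
t_crit = stats.t.ppf(0.975, df_resid)
ci = (coef - t_crit * se, coef + t_crit * se)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Reproducing 19.07 for the t-statistic confirms how the table's columns fit together.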
Part 2: Creating Regression Tables
2.1 Simple Linear Regression
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Sample data
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'advertising': np.random.uniform(0, 100, n),
    'price': np.random.uniform(20, 80, n),
    'store_size': np.random.uniform(500, 5000, n),
    'location': np.random.choice([0, 1], n, p=[0.7, 0.3])
})

# Create dependent variable with known relationships
df['sales'] = (50 +
               2.5 * df['advertising'] -
               1.2 * df['price'] +
               0.05 * df['store_size'] +
               10 * df['location'] +
               np.random.normal(0, 15, n))

# Method 1: Using statsmodels (comprehensive output)
def create_regression_table_statsmodels(df, formula):
    """Create a regression table using statsmodels."""
    model = smf.ols(formula, data=df).fit()

    # Print the full summary
    print(model.summary())

    # Extract components
    results = {
        'coefficients': model.params,
        'std_errors': model.bse,
        't_values': model.tvalues,
        'p_values': model.pvalues,
        'conf_int': model.conf_int(),
        'r_squared': model.rsquared,
        'adj_r_squared': model.rsquared_adj,
        'f_statistic': model.fvalue,
        'f_pvalue': model.f_pvalue,
        'aic': model.aic,
        'bic': model.bic,
        'nobs': model.nobs
    }
    return model, results

# Run regression
formula = 'sales ~ advertising + price + store_size + location'
model, results = create_regression_table_statsmodels(df, formula)

# Create formatted table
def format_regression_table(model):
    """Create a nicely formatted regression table."""
    # Get coefficients and statistics
    coef = model.params
    se = model.bse
    t = model.tvalues
    p = model.pvalues
    ci = model.conf_int()
    ci_low, ci_high = ci[0], ci[1]

    # Create DataFrame
    table = pd.DataFrame({
        'Coefficient': coef,
        'Std. Error': se,
        't-statistic': t,
        'p-value': p,
        'CI (2.5%)': ci_low,
        'CI (97.5%)': ci_high
    })

    # Add significance stars
    def significance_stars(p):
        if p < 0.001:
            return '***'
        elif p < 0.01:
            return '**'
        elif p < 0.05:
            return '*'
        else:
            return ''

    table['Sig'] = table['p-value'].apply(significance_stars)
    table['Coefficient'] = table['Coefficient'].map('{:.3f}'.format) + table['Sig']
    table = table.drop('Sig', axis=1)

    # Format numbers
    for col in ['Std. Error', 't-statistic', 'CI (2.5%)', 'CI (97.5%)']:
        table[col] = table[col].map('{:.3f}'.format)
    table['p-value'] = table['p-value'].map('{:.4f}'.format)
    return table

reg_table = format_regression_table(model)
print("\n=== REGRESSION TABLE ===\n")
print(reg_table)
print("\nModel Fit Statistics:")
print(f"R-squared: {model.rsquared:.4f}")
print(f"Adjusted R-squared: {model.rsquared_adj:.4f}")
print(f"F-statistic: {model.fvalue:.2f} (p={model.f_pvalue:.4f})")
print(f"AIC: {model.aic:.1f}")
print(f"BIC: {model.bic:.1f}")
print(f"Observations: {int(model.nobs)}")
```
2.2 Multiple Regression with Comparison Tables
```python
def create_model_comparison_table(df, models_dict):
    """
    Create a comparison table for multiple regression models.

    models_dict: dict with model names as keys and formulas as values
    """
    results = []
    for model_name, formula in models_dict.items():
        model = smf.ols(formula, data=df).fit()

        # Extract key statistics
        result = {
            'Model': model_name,
            'R²': model.rsquared,
            'Adj. R²': model.rsquared_adj,
            'AIC': model.aic,
            'BIC': model.bic,
            'F-stat': model.fvalue,
            'F p-value': model.f_pvalue,
            'N': model.nobs
        }

        # Add coefficients
        for var in model.params.index:
            result[f'coef_{var}'] = model.params[var]
            result[f'p_{var}'] = model.pvalues[var]
        results.append(result)

    comparison_df = pd.DataFrame(results)

    # Format for display
    for col in ['R²', 'Adj. R²']:
        comparison_df[col] = comparison_df[col].map('{:.4f}'.format)
    return comparison_df

# Create multiple models
models = {
    'Model 1 (Simple)': 'sales ~ advertising',
    'Model 2 (Add price)': 'sales ~ advertising + price',
    'Model 3 (Add store)': 'sales ~ advertising + price + store_size',
    'Model 4 (Full)': 'sales ~ advertising + price + store_size + location'
}
comparison = create_model_comparison_table(df, models)
print("\n=== MODEL COMPARISON ===\n")
print(comparison[['Model', 'R²', 'Adj. R²', 'AIC', 'BIC', 'N']])
```
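When comparing models this way, adjusted R² is the fairer yardstick because it applies an explicit penalty for extra predictors: adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A small self-contained illustration with hypothetical R² values, showing how a negligible R² gain from one more predictor can still lower the adjusted figure:

```python
def adjusted_r2(r2, n, k):
    # Penalize R² for the number of predictors k, given sample size n
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 200
a3 = adjusted_r2(0.780, n, 3)  # model with 3 predictors
a4 = adjusted_r2(0.781, n, 4)  # 4 predictors, only a tiny R² gain
print(a3, a4)  # the 4-predictor model has the *lower* adjusted R²
```

The same trade-off is what AIC and BIC formalize with likelihood-based penalties.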
Part 3: Interpreting Regression Tables
3.1 Coefficient Interpretation
```python
def interpret_coefficients(model):
    """Provide plain-English interpretations of coefficients."""
    interpretations = []
    for var in model.params.index:
        if var == 'Intercept':
            interp = f"When all predictors are zero, the predicted value is {model.params[var]:.2f}"
            interpretations.append(interp)
        else:
            coef = model.params[var]
            p_val = model.pvalues[var]
            ci_low, ci_high = model.conf_int().loc[var]

            # Direction
            direction = "increases" if coef > 0 else "decreases"

            # Significance
            if p_val < 0.001:
                sig = "highly significant"
            elif p_val < 0.01:
                sig = "significant at the 1% level"
            elif p_val < 0.05:
                sig = "significant at the 5% level"
            else:
                sig = "not statistically significant"

            interp = (f"For each 1-unit increase in {var}, sales {direction} "
                      f"by {abs(coef):.2f} units (95% CI: [{ci_low:.2f}, {ci_high:.2f}]). "
                      f"This effect is {sig} (p={p_val:.4f}).")
            interpretations.append(interp)
    return interpretations

# Print interpretations
print("\n=== COEFFICIENT INTERPRETATIONS ===\n")
for interp in interpret_coefficients(model):
    print(interp)
```
3.2 Standardized Coefficients (Beta Weights)
```python
def get_standardized_coefficients(model, X):
    """
    Calculate standardized coefficients (beta weights).

    Standardizing both X and y puts all coefficients on the same scale,
    allowing comparison of variable importance across different units.
    """
    from sklearn.preprocessing import StandardScaler

    # Standardize features and outcome
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    y = model.model.endog
    y_scaled = (y - y.mean()) / y.std()

    # Fit model on standardized data (no intercept needed: everything is centered)
    model_scaled = sm.OLS(y_scaled, X_scaled).fit()

    # Create comparison
    coef_orig = model.params[1:]  # Exclude intercept
    coef_std = model_scaled.params

    comparison = pd.DataFrame({
        'Variable': coef_orig.index,
        'Original Coefficient': coef_orig.values,
        'Standardized Coefficient': coef_std,
        '|Beta|': np.abs(coef_std)
    }).sort_values('|Beta|', ascending=False)

    print("\n=== STANDARDIZED COEFFICIENTS (BETA WEIGHTS) ===\n")
    print("Higher |Beta| = stronger influence on outcome\n")
    print(comparison)
    return comparison

beta_comparison = get_standardized_coefficients(
    model, df[['advertising', 'price', 'store_size', 'location']])
```
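The equivalence behind beta weights can be checked on synthetic data: for a single predictor, the slope from regressing z-scores on z-scores equals the raw slope rescaled by sd(x)/sd(y) (which is also the correlation). A self-contained sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 500)
y = 3.0 * x + rng.normal(0, 20, 500)

# Unstandardized slope via least squares
b = np.polyfit(x, y, 1)[0]

# Beta weight two ways: regress z-scores, or rescale the raw slope
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta_direct = np.polyfit(zx, zy, 1)[0]
beta_rescaled = b * x.std() / y.std()

print(beta_direct, beta_rescaled)  # identical up to floating point
```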
3.3 Marginal Effects
```python
def calculate_marginal_effects(model, df, variables):
    """
    Calculate marginal effects in interpretable units by comparing
    predictions at the sample means and at the means plus one unit.
    """
    marginal_effects = []
    base = df.mean()

    for var in variables:
        # Prediction with every variable at its mean
        row = pd.DataFrame([base])
        pred_mean = model.predict(row)[0]

        # Prediction with this variable increased by 1 unit
        row[var] = row[var] + 1
        pred_plus = model.predict(row)[0]

        marginal_effect = pred_plus - pred_mean
        marginal_effects.append({
            'Variable': var,
            'Marginal Effect': marginal_effect,
            'Interpretation': f"A 1-unit increase in {var} is associated with "
                              f"a {marginal_effect:.2f} unit change in sales"
        })
    return pd.DataFrame(marginal_effects)

# Calculate marginal effects
variables = ['advertising', 'price', 'store_size', 'location']
marginal_df = calculate_marginal_effects(model, df[variables], variables)
print("\n=== MARGINAL EFFECTS ===\n")
print(marginal_df)
```
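For a purely linear model, this prediction-difference computation reproduces the coefficients exactly: the marginal effect of a 1-unit change is the coefficient itself (marginal effects only diverge from coefficients in nonlinear models such as logit). A quick check with synthetic data and plain least squares (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 100, 300)
x2 = rng.uniform(20, 80, 300)
y = 50 + 2.5 * x1 - 1.2 * x2 + rng.normal(0, 15, 300)

# Fit y = b0 + b1*x1 + b2*x2 by least squares
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Marginal effect of x1: prediction at the means vs. means + 1 unit
means = X.mean(axis=0)
plus = means.copy()
plus[1] += 1.0
effect = plus @ b - means @ b

print(effect, b[1])  # equal: the marginal effect IS the coefficient
```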
Part 4: Advanced Regression Table Features
4.1 Robust Standard Errors
```python
def regression_with_robust_se(df, formula):
    """
    Fit a regression and compare regular vs. robust
    (heteroscedasticity-consistent) standard errors.
    """
    model = smf.ols(formula, data=df).fit()

    # Calculate robust standard errors (HC3)
    robust_se = model.get_robustcov_results(cov_type='HC3')

    # Create comparison table
    comparison = pd.DataFrame({
        'Variable': model.params.index,
        'Coefficient': model.params.values,
        'Std. Error (Regular)': model.bse.values,
        'Std. Error (Robust)': robust_se.bse,
        't (Regular)': model.tvalues.values,
        't (Robust)': robust_se.tvalues,
        'p (Regular)': model.pvalues.values,
        'p (Robust)': robust_se.pvalues
    })

    print("\n=== ROBUST STANDARD ERRORS COMPARISON ===\n")
    print(comparison.to_string())
    return model, robust_se

model_robust, robust_results = regression_with_robust_se(df, formula)
```
4.2 Logistic Regression Table
```python
# Create binary outcome
df['high_sales'] = (df['sales'] > df['sales'].median()).astype(int)

# Logistic regression
logit_model = smf.logit('high_sales ~ advertising + price + store_size + location',
                        data=df).fit()

def format_logistic_table(model):
    """Format logistic regression results with odds ratios."""
    # Get coefficients and odds ratios
    coef = model.params
    se = model.bse
    p = model.pvalues
    odds_ratio = np.exp(coef)
    ci_low = np.exp(model.conf_int()[0])
    ci_high = np.exp(model.conf_int()[1])

    # Create table
    table = pd.DataFrame({
        'Coefficient': coef,
        'Std. Error': se,
        'p-value': p,
        'Odds Ratio': odds_ratio,
        'OR 95% CI Low': ci_low,
        'OR 95% CI High': ci_high
    })

    # Add significance stars
    table['Sig'] = table['p-value'].apply(lambda x: '***' if x < 0.001 else
                                          '**' if x < 0.01 else
                                          '*' if x < 0.05 else '')

    print("\n=== LOGISTIC REGRESSION TABLE ===\n")
    print(table.round(4))

    # Interpret odds ratios
    print("\n=== ODDS RATIO INTERPRETATION ===\n")
    for var in table.index:
        if var != 'Intercept':
            or_val = odds_ratio[var]
            change = "higher" if or_val > 1 else "lower"
            interp = (f"{var}: per 1-unit increase, the odds of high sales are "
                      f"multiplied by {or_val:.2f} ({change} odds)")
            print(interp)
    return table

logit_table = format_logistic_table(logit_model)
```
4.3 Mixed Effects Models
```python
# Create group structure
df['region'] = np.random.choice(['North', 'South', 'East', 'West'], size=len(df))

# Mixed effects model (random intercept by region)
mixed_model = smf.mixedlm('sales ~ advertising + price + store_size + location',
                          df,
                          groups=df['region']).fit()

def format_mixed_model_table(model):
    """Format mixed effects model results."""
    print("\n=== MIXED EFFECTS MODEL RESULTS ===\n")
    print(model.summary())

    # Extract random effects
    print("\n=== RANDOM EFFECTS VARIANCE ===\n")
    re_variance = model.cov_re
    print(f"Random intercept variance: {re_variance.iloc[0, 0]:.4f}")
    print(f"Residual variance: {model.scale:.4f}")

    # Intraclass correlation (ICC)
    icc = re_variance.iloc[0, 0] / (re_variance.iloc[0, 0] + model.scale)
    print(f"\nIntraclass Correlation (ICC): {icc:.3f}")
    print(f"{icc*100:.1f}% of variance explained by region differences")
    return model

mixed_results = format_mixed_model_table(mixed_model)
```
Part 5: Visualizing Regression Results
5.1 Coefficient Plot
```python
import matplotlib.pyplot as plt

def plot_coefficients(model, title="Regression Coefficients"):
    """Create a coefficient plot with confidence intervals."""
    # Extract coefficients and CIs (excluding the intercept)
    coef = model.params[1:]
    ci = model.conf_int().iloc[1:]
    ci_low, ci_high = ci[0], ci[1]

    # Create DataFrame for plotting
    coef_df = pd.DataFrame({
        'Variable': coef.index,
        'Coefficient': coef.values,
        'CI_Lower': ci_low.values,
        'CI_Upper': ci_high.values
    })

    # Sort by coefficient magnitude
    coef_df = coef_df.sort_values('Coefficient')

    # Plot
    fig, ax = plt.subplots(figsize=(10, 6))
    y_pos = np.arange(len(coef_df))
    ax.errorbar(coef_df['Coefficient'], y_pos,
                xerr=[coef_df['Coefficient'] - coef_df['CI_Lower'],
                      coef_df['CI_Upper'] - coef_df['Coefficient']],
                fmt='o', capsize=5, capthick=2, markersize=8,
                color='steelblue', ecolor='gray')
    ax.axvline(x=0, color='red', linestyle='--', alpha=0.5)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(coef_df['Variable'])
    ax.set_xlabel('Coefficient Estimate')
    ax.set_title(title)
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

plot_coefficients(model)
```
5.2 Predicted Values vs. Actual
```python
def plot_model_fit(model, df, dependent_var):
    """Plot predicted vs. actual values and residual diagnostics."""
    from scipy import stats

    # Get predictions
    df_pred = df.copy()
    df_pred['predicted'] = model.predict()
    df_pred['residuals'] = df_pred[dependent_var] - df_pred['predicted']

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Actual vs Predicted
    axes[0].scatter(df_pred['predicted'], df_pred[dependent_var], alpha=0.5)
    axes[0].plot([df_pred[dependent_var].min(), df_pred[dependent_var].max()],
                 [df_pred[dependent_var].min(), df_pred[dependent_var].max()],
                 'r--', alpha=0.5)
    axes[0].set_xlabel('Predicted Values')
    axes[0].set_ylabel('Actual Values')
    axes[0].set_title(f'Actual vs Predicted\nR² = {model.rsquared:.3f}')

    # Residuals vs Predicted
    axes[1].scatter(df_pred['predicted'], df_pred['residuals'], alpha=0.5)
    axes[1].axhline(y=0, color='r', linestyle='--')
    axes[1].set_xlabel('Predicted Values')
    axes[1].set_ylabel('Residuals')
    axes[1].set_title('Residual Plot')

    # Q-Q plot for residuals
    stats.probplot(df_pred['residuals'], dist="norm", plot=axes[2])
    axes[2].set_title('Q-Q Plot (Normality Check)')

    plt.tight_layout()
    plt.show()
    return df_pred

df_pred = plot_model_fit(model, df, 'sales')
```
Part 6: Exporting Regression Tables
6.1 To CSV and Excel
```python
def export_regression_tables(model, filename='regression_results'):
    """Export regression results to CSV and Excel."""
    # Create coefficient table
    coef_table = pd.DataFrame({
        'Variable': model.params.index,
        'Coefficient': model.params.values,
        'Std_Error': model.bse.values,
        't_stat': model.tvalues.values,
        'p_value': model.pvalues.values,
        'CI_2.5': model.conf_int()[0].values,
        'CI_97.5': model.conf_int()[1].values
    })

    # Create fit statistics table
    fit_stats = pd.DataFrame({
        'Statistic': ['R-squared', 'Adjusted R-squared', 'F-statistic',
                      'F p-value', 'AIC', 'BIC', 'Observations'],
        'Value': [model.rsquared, model.rsquared_adj, model.fvalue,
                  model.f_pvalue, model.aic, model.bic, model.nobs]
    })

    # Export to Excel with formatting
    with pd.ExcelWriter(f'{filename}.xlsx', engine='xlsxwriter') as writer:
        coef_table.to_excel(writer, sheet_name='Coefficients', index=False)
        fit_stats.to_excel(writer, sheet_name='Fit_Statistics', index=False)

        # Auto-adjust column widths
        for sheet_name, table in [('Coefficients', coef_table),
                                  ('Fit_Statistics', fit_stats)]:
            worksheet = writer.sheets[sheet_name]
            for i, col in enumerate(table.columns):
                max_len = max(table[col].astype(str).map(len).max(), len(col)) + 2
                worksheet.set_column(i, i, max_len)

    print(f"Tables exported to {filename}.xlsx")

    # Also export to CSV
    coef_table.to_csv(f'{filename}_coefficients.csv', index=False)
    fit_stats.to_csv(f'{filename}_fit_stats.csv', index=False)
    return coef_table, fit_stats

# Export results
coef_table, fit_stats = export_regression_tables(model)
```
6.2 To LaTeX for Academic Papers
```python
def to_latex_table(model, caption="Regression Results", label="tab:regression"):
    """Generate LaTeX code for a regression table."""
    coef = model.params
    se = model.bse
    p = model.pvalues

    # Significance stars
    stars = []
    for p_val in p:
        if p_val < 0.001:
            stars.append('$^{***}$')
        elif p_val < 0.01:
            stars.append('$^{**}$')
        elif p_val < 0.05:
            stars.append('$^{*}$')
        else:
            stars.append('')

    # Build LaTeX table
    latex = []
    latex.append('\\begin{table}[htbp]')
    latex.append('\\centering')
    latex.append('\\caption{' + caption + '}')
    latex.append('\\label{' + label + '}')
    latex.append('\\begin{tabular}{lccc}')
    latex.append('\\hline')
    latex.append('Variable & Coefficient & Std. Error & p-value \\\\')
    latex.append('\\hline')
    for var, c, s, p_val, star in zip(coef.index, coef, se, p, stars):
        latex.append(f'{var} & {c:.3f}{star} & {s:.3f} & {p_val:.4f} \\\\')
    latex.append('\\hline')
    latex.append(f'R-squared & \\multicolumn{{3}}{{c}}{{{model.rsquared:.4f}}} \\\\')
    latex.append(f'Adj. R-squared & \\multicolumn{{3}}{{c}}{{{model.rsquared_adj:.4f}}} \\\\')
    latex.append(f'Observations & \\multicolumn{{3}}{{c}}{{{int(model.nobs)}}} \\\\')
    latex.append('\\hline')
    latex.append('\\end{tabular}')
    latex.append('\\end{table}')

    latex_code = '\n'.join(latex)

    # Save to file
    with open('regression_table.tex', 'w') as f:
        f.write(latex_code)
    print("LaTeX table saved to regression_table.tex")
    return latex_code

# Generate LaTeX table
latex_code = to_latex_table(model)
print(latex_code)
```
Part 7: Diagnostic Tests from Regression Tables
7.1 Assumption Checks
```python
def regression_diagnostics(model):
    """Perform diagnostic tests for regression assumptions."""
    from scipy.stats import shapiro, jarque_bera
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.stats.diagnostic import het_breuschpagan
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    residuals = model.resid
    diagnostics = {}

    # 1. Normality of residuals
    _, shapiro_p = shapiro(residuals)
    jb_stat, jb_p = jarque_bera(residuals)  # scipy returns (statistic, p-value)
    diagnostics['Normality'] = {
        'Shapiro-Wilk p': shapiro_p,
        'Jarque-Bera p': jb_p,
        'Conclusion': 'Normal' if shapiro_p > 0.05 else 'Non-normal'
    }

    # 2. Autocorrelation (Durbin-Watson)
    dw = durbin_watson(residuals)
    diagnostics['Autocorrelation'] = {
        'Durbin-Watson': dw,
        'Conclusion': 'No autocorrelation' if 1.5 < dw < 2.5 else 'Autocorrelation present'
    }

    # 3. Heteroscedasticity (Breusch-Pagan)
    bp_lm, bp_p, bp_f, bp_f_p = het_breuschpagan(residuals, model.model.exog)
    diagnostics['Heteroscedasticity'] = {
        'Breusch-Pagan p': bp_p,
        'Conclusion': 'Homoscedastic' if bp_p > 0.05 else 'Heteroscedastic'
    }

    # 4. Multicollinearity (VIF)
    vif_data = []
    for i in range(1, model.model.exog.shape[1]):  # Exclude intercept
        vif = variance_inflation_factor(model.model.exog, i)
        vif_data.append({
            'Variable': model.model.exog_names[i],
            'VIF': vif,
            'Conclusion': 'OK' if vif < 10 else 'High multicollinearity'
        })
    diagnostics['Multicollinearity'] = pd.DataFrame(vif_data)

    # Print results
    print("=== REGRESSION DIAGNOSTICS ===\n")
    print("1. Normality of Residuals:")
    print(f"   Shapiro-Wilk p-value: {diagnostics['Normality']['Shapiro-Wilk p']:.4f}")
    print(f"   Jarque-Bera p-value: {diagnostics['Normality']['Jarque-Bera p']:.4f}")
    print(f"   → {diagnostics['Normality']['Conclusion']}\n")
    print("2. Autocorrelation:")
    print(f"   Durbin-Watson: {diagnostics['Autocorrelation']['Durbin-Watson']:.3f}")
    print(f"   → {diagnostics['Autocorrelation']['Conclusion']}\n")
    print("3. Heteroscedasticity:")
    print(f"   Breusch-Pagan p-value: {diagnostics['Heteroscedasticity']['Breusch-Pagan p']:.4f}")
    print(f"   → {diagnostics['Heteroscedasticity']['Conclusion']}\n")
    print("4. Multicollinearity (VIF):")
    print(diagnostics['Multicollinearity'].to_string())
    return diagnostics

# Run diagnostics
diagnostics = regression_diagnostics(model)
```
Part 8: Common Pitfalls and Best Practices
8.1 Pitfalls to Avoid
```python
def common_regression_pitfalls():
    """Examples of common mistakes when interpreting regression tables."""
    pitfalls = {
        "1. Confusing Correlation with Causation":
            "A coefficient shows association, not causation",
        "2. Ignoring Multicollinearity":
            "High correlation between predictors inflates standard errors",
        "3. Overinterpreting p-values":
            "p > 0.05 doesn't mean 'no effect' - it means insufficient evidence",
        "4. Extrapolating Beyond Data Range":
            "Predictions outside the range of observed data are unreliable",
        "5. Ignoring Model Assumptions":
            "Violations of assumptions invalidate significance tests",
        "6. Cherry-Picking Results":
            "Running many models and reporting only significant ones",
        "7. Confusing Statistical with Practical Significance":
            "Very small effects can be significant with large n",
        "8. Omitting Important Variables":
            "Omitted-variable bias distorts coefficients"
    }
    print("=== COMMON REGRESSION PITFALLS ===\n")
    for pitfall, explanation in pitfalls.items():
        print(f"{pitfall}:")
        print(f"   {explanation}\n")

common_regression_pitfalls()
```
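Pitfall 8, omitted-variable bias, is easy to demonstrate with a simulation. In the sketch below (all data and coefficients made up for illustration), a confounder z drives both x and y; regressing y on x alone inflates the estimated effect, while controlling for z recovers the true slope of 1.0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                       # confounder (e.g., market size)
x = 0.8 * z + rng.normal(size=n)             # predictor correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x is 1.0

def ols_slopes(X, y):
    # Plain least squares with an intercept; returns the slope estimates
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_short = ols_slopes(x, y)                           # omits z
b_long = ols_slopes(np.column_stack([x, z]), y)      # controls for z

print(b_short[0])  # biased well above 1.0
print(b_long[0])   # close to the true 1.0
```

The short regression's slope absorbs part of z's effect (here roughly 1 + 2·cov(x, z)/var(x) ≈ 1.98), which is exactly what "distorts coefficients" means in the pitfall list.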
8.2 Best Practices Checklist
```python
def regression_best_practices():
    """Checklist for regression analysis best practices."""
    checklist = {
        "Before Running Regression": [
            "✓ Understand the business/research question",
            "✓ Check data quality and missing values",
            "✓ Explore relationships visually (scatter plots)",
            "✓ Handle outliers appropriately",
            "✓ Check for multicollinearity"
        ],
        "During Model Building": [
            "✓ Use domain knowledge for variable selection",
            "✓ Consider interactions and non-linearities",
            "✓ Split data for validation",
            "✓ Use appropriate standard errors (robust if needed)",
            "✓ Test multiple specifications"
        ],
        "Interpreting Results": [
            "✓ Report both coefficients and standard errors",
            "✓ Include confidence intervals",
            "✓ Interpret magnitude (practical significance)",
            "✓ Report model fit statistics",
            "✓ Acknowledge limitations"
        ],
        "Reporting": [
            "✓ Present the full regression table",
            "✓ Include diagnostic test results",
            "✓ Explain assumptions and violations",
            "✓ Provide code for reproducibility",
            "✓ Discuss causal vs. correlational interpretation"
        ]
    }
    print("=== REGRESSION BEST PRACTICES CHECKLIST ===\n")
    for category, items in checklist.items():
        print(f"{category}:")
        for item in items:
            print(f"   {item}")
        print()

regression_best_practices()
```
Summary: Quick Reference Card
```python
# Quick Reference for Regression Table Components
quick_reference = {
    "Coefficient": {
        "What": "Change in Y per 1-unit change in X",
        "Check": "Sign (+ or -) indicates direction",
        "Magnitude": "Practical importance"
    },
    "Standard Error": {
        "What": "Uncertainty in coefficient estimate",
        "Check": "Small SE = precise estimate",
        "Rule": "Coefficient / SE = t-statistic"
    },
    "p-value": {
        "What": "Probability of a result at least this extreme if the true coefficient were zero",
        "Thresholds": {
            "p < 0.001": "*** Highly significant",
            "p < 0.01": "** Significant at the 1% level",
            "p < 0.05": "* Significant at the 5% level",
            "p >= 0.05": "Not statistically significant"
        }
    },
    "R-squared": {
        "What": "Proportion of variance explained",
        "Range": "0 to 1 (higher = better fit)",
        "Note": "Adjusted R² penalizes for extra variables"
    },
    "Confidence Interval": {
        "What": "Range of plausible coefficient values",
        "Interpret": "In repeated samples, 95% of such intervals capture the true coefficient",
        "Check": "Does it include zero?"
    }
}

# Print quick reference
for component, info in quick_reference.items():
    print(f"\n{component}:")
    for key, value in info.items():
        print(f"   {key}: {value}")
```
Key Takeaway: Regression tables are the cornerstone of statistical inference in data science. They provide a complete picture of the relationships between variables, including the strength, direction, precision, and significance of each effect. Mastering regression table interpretation enables you to:
- Quantify relationships with confidence
- Identify important predictors with statistical rigor
- Communicate findings effectively to stakeholders
- Validate assumptions and diagnose problems
- Compare models objectively
- Make data-driven decisions with quantified uncertainty
Remember: A regression table tells a story about your data—learn to read it critically, interpret it carefully, and present it clearly.