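In this post I use the Grunfeld dataset (available in statsmodels.datasets) to demonstrate how to estimate fixed effects models.
The data contain 20 years (1935-1954) of observations for each of 11 companies: IBM, General Electric, US Steel, Atlantic Refining, Diamond Match, Westinghouse, General Motors, Goodyear, Chrysler, Union Oil and American Steel.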
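The model is:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \alpha_i + \gamma_t + \varepsilon_{it}$$
where the firm-specific factor $\alpha_i$ is called entity_effects and the time factor $\gamma_t$ is called time_effects.
In dummy-variable form the model reads as follows, where $D_{j,it}$ is the dummy for firm $j$ (1 if observation $it$ belongs to firm $j$) and $T_{s,it}$ is the dummy for year $s$ (1 if $t = s$):
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \sum_{j=1}^{11} \alpha_j D_{j,it} + \sum_{s=1}^{20} \gamma_s T_{s,it} + \varepsilon_{it}$$
Because each full set of dummies sums to one, one dummy must be dropped from one of the two sets (and one from each set when an intercept is included) to avoid perfect collinearity.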
1. Import the libraries
from statsmodels.datasets import grunfeld
from linearmodels.panel import PanelOLS
import pandas as pd
import statsmodels.formula.api as smf
2. Load the panel data
data = grunfeld.load_pandas().data
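# Use a (firm, year) MultiIndex: entity level first, time level second.
# drop=False also keeps firm and year as ordinary columns for the formulas below.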
data = data.set_index(["firm","year"],drop=False)
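PanelOLS expects the data to carry a two-level MultiIndex with the entity level first and the time level second, which is what set_index(["firm","year"]) produces. A minimal sketch of a sanity check for that structure, run on a small hypothetical 3-firm x 4-year frame rather than on Grunfeld itself (for Grunfeld the same check should report 11 firms, 20 years and a balanced panel):

```python
import pandas as pd

# Hypothetical toy stand-in for the Grunfeld frame: 3 firms x 4 years
toy = pd.DataFrame({
    "firm": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "year": [1935.0, 1936.0, 1937.0, 1938.0] * 3,
    "invest": range(12),
})
# Entity level first, time level second; drop=False keeps the columns as well
toy = toy.set_index(["firm", "year"], drop=False)

n_entities = toy.index.get_level_values("firm").nunique()
n_periods = toy.index.get_level_values("year").nunique()
# Group on the index level explicitly, since a column with the same name also exists
obs_per_firm = toy.groupby(level="firm").size()
balanced = obs_per_firm.nunique() == 1 and len(toy) == n_entities * n_periods
print(n_entities, n_periods, balanced)
```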
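3. Entity (firm) fixed effects
The model is:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \alpha_i + \varepsilon_{it}$$
where the firm-specific factor $\alpha_i$ is called entity_effects.
In dummy-variable form, where $D_{j,it}$ is the dummy for firm $j$:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \sum_{j=1}^{11} \alpha_j D_{j,it} + \varepsilon_{it}$$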
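Why the dummy-variable regression and the firm-demeaned regression give the same slopes can be checked numerically. The sketch below uses synthetic data with the Grunfeld shape (11 x 20); the firm effects, coefficients and noise are made up for illustration, and plain numpy least squares stands in for PanelOLS and smf.ols:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 11, 20                               # same shape as Grunfeld: 11 firms x 20 years
firm = np.repeat(np.arange(N), T)           # entity code 0..N-1 for each row
alpha = rng.normal(0, 50, N)                # hypothetical firm effects
X = rng.normal(size=(N * T, 2))             # stand-ins for value and capital
beta_true = np.array([0.11, 0.31])
y = X @ beta_true + alpha[firm] + rng.normal(size=N * T)

# (a) LSDV: regress y on X plus one dummy per firm (no intercept)
D = (firm[:, None] == np.arange(N)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)
beta_lsdv = coef_lsdv[:2]

# (b) Within estimator: subtract each firm's time mean, then OLS without intercept
def demean_by(v, groups):
    """Subtract group means; assumes groups are integer codes 0..G-1."""
    means = np.array([v[groups == g].mean(axis=0) for g in np.unique(groups)])
    return v - means[groups]

beta_within, *_ = np.linalg.lstsq(demean_by(X, firm), demean_by(y, firm), rcond=None)
print(beta_lsdv, beta_within)
```

The agreement is an instance of the Frisch-Waugh-Lovell theorem: partialling out the firm dummies is the same as subtracting firm means.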
(1) PanelOLS
# Entity fixed effects: array interface
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True)
results_fe = res_fe.fit()
print(results_fe)
# Entity fixed effects: formula interface
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)
The array-based and formula-based versions return identical results:
PanelOLS Estimation Summary
================================================================================
Dep. Variable: invest R-squared: 0.7667
Estimator: PanelOLS R-squared (Between): 0.8223
No. Observations: 220 R-squared (Within): 0.7667
Date: Wed, Jul 20 2022 R-squared (Overall): 0.8132
Time: 15:55:39 Log-likelihood -1167.4
Cov. Estimator: Unadjusted
F-statistic: 340.08
Entities: 11 P-value 0.0000
Avg Obs: 20.000 Distribution: F(2,207)
Min Obs: 20.000
Max Obs: 20.000 F-statistic (robust): 340.08
P-value 0.0000
Time periods: 20 Distribution: F(2,207)
Avg Obs: 11.000
Min Obs: 11.000
Max Obs: 11.000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
capital 0.3100 0.0165 18.744 0.0000 0.2774 0.3426
value 0.1101 0.0113 9.7461 0.0000 0.0879 0.1324
==============================================================================
F-test for Poolability: 49.207
P-value: 0.0000
Distribution: F(10,207)
Included effects: Entity
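(2) smf.ols
# OLS with firm dummy variables (LSDV); the string column firm is treated as categorical
res_ols = smf.ols('invest ~ value + capital + firm', data=data)
# Equivalent, with explicit categorical coding:
#res_ols = smf.ols('invest ~ value + capital + C(firm)', data=data)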
results_ols = res_ols.fit()
print(results_ols.summary())
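The results are below. The value and capital coefficients, their standard errors and the log-likelihood match PanelOLS exactly; R² differs (0.946 vs 0.7667) because smf.ols also credits the variation explained by the firm dummies, whereas PanelOLS reports R² after the effects are removed.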
OLS Regression Results
==============================================================================
Dep. Variable: invest R-squared: 0.946
Model: OLS Adj. R-squared: 0.943
Method: Least Squares F-statistic: 302.6
Date: Wed, 20 Jul 2022 Prob (F-statistic): 4.77e-124
Time: 17:33:36 Log-Likelihood: -1167.4
No. Observations: 220 AIC: 2361.
Df Residuals: 207 BIC: 2405.
Df Model: 12
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
Intercept -20.5782 11.298 -1.821 0.070 -42.852 1.695
firm[T.Atlantic Refining] -94.0243 17.164 -5.478 0.000 -127.862 -60.186
firm[T.Chrysler] -7.2309 17.338 -0.417 0.677 -41.413 26.951
firm[T.Diamond Match] 14.0102 15.944 0.879 0.381 -17.422 45.443
firm[T.General Electric] -214.9912 25.461 -8.444 0.000 -265.188 -164.795
firm[T.General Motors] -49.7209 48.280 -1.030 0.304 -144.905 45.463
firm[T.Goodyear] -66.6363 16.379 -4.068 0.000 -98.927 -34.346
firm[T.IBM] -2.5820 16.379 -0.158 0.875 -34.873 29.709
firm[T.US Steel] 122.4829 25.960 4.718 0.000 71.304 173.662
firm[T.Union Oil] -45.9660 16.357 -2.810 0.005 -78.215 -13.717
firm[T.Westinghouse] -36.9683 17.309 -2.136 0.034 -71.093 -2.843
value 0.1101 0.011 9.746 0.000 0.088 0.132
capital 0.3100 0.017 18.744 0.000 0.277 0.343
==============================================================================
Omnibus: 35.893 Durbin-Watson: 1.079
Prob(Omnibus): 0.000 Jarque-Bera (JB): 243.455
Skew: 0.297 Prob(JB): 1.36e-53
Kurtosis: 8.119 Cond. No. 2.98e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.98e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
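The same estimates can also be obtained with the within transformation (time demeaning): subtract each firm's mean over time from every variable, then run OLS on the demeaned data without an intercept.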
# Work on a separate frame so that `data` used in later sections keeps its firm/year columns
data_w = grunfeld.load_pandas().data
# Set the index (drop=True here: firm and year become index levels only)
data_w = data_w.set_index(["firm","year"])
# Subtract each firm's time mean from the dependent and explanatory variables
for col in ['invest','value','capital']:
    data_w[col + '_w'] = data_w[col] - data_w.groupby(level='firm')[col].transform('mean')
# Estimate the demeaned equation by OLS without an intercept
results_man = smf.ols('invest_w ~ 0 + value_w + capital_w', data_w).fit()
print(results_man.summary())
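The results are below. The coefficients equal the PanelOLS estimates; the standard errors are slightly smaller because this regression does not deduct the 11 estimated firm means from its residual degrees of freedom (218 instead of 207).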
OLS Regression Results
=======================================================================================
Dep. Variable: invest_w R-squared (uncentered): 0.767
Model: OLS Adj. R-squared (uncentered): 0.765
Method: Least Squares F-statistic: 358.2
Date: Wed, 20 Jul 2022 Prob (F-statistic): 1.28e-69
Time: 17:58:17 Log-Likelihood: -1167.4
No. Observations: 220 AIC: 2339.
Df Residuals: 218 BIC: 2346.
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
value_w 0.1101 0.011 10.002 0.000 0.088 0.132
capital_w 0.3100 0.016 19.236 0.000 0.278 0.342
==============================================================================
Omnibus: 35.893 Durbin-Watson: 1.079
Prob(Omnibus): 0.000 Jarque-Bera (JB): 243.455
Skew: 0.297 Prob(JB): 1.36e-53
Kurtosis: 8.119 Cond. No. 1.74
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
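The gap between the t-statistics here and in PanelOLS (19.236 vs 18.744 for capital) comes entirely from the residual degrees of freedom, 220 - 2 = 218 vs 220 - 11 - 2 = 207: 18.744 / sqrt(207/218) ≈ 19.236. A numpy sketch on synthetic, Grunfeld-shaped data (all values hypothetical) confirms that demeaned standard errors computed with the corrected degrees of freedom equal the dummy-regression ones:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 11, 20, 2
firm = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, k))
y = X @ np.array([0.11, 0.31]) + rng.normal(0, 30, N)[firm] + rng.normal(size=N * T)

def ols_se(Z, yv, df_resid):
    """OLS coefficients and classical standard errors for a given residual dof."""
    beta, *_ = np.linalg.lstsq(Z, yv, rcond=None)
    resid = yv - Z @ beta
    sigma2 = resid @ resid / df_resid
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))

# Within transformation: subtract firm means
Xm = X - np.array([X[firm == g].mean(axis=0) for g in range(N)])[firm]
ym = y - np.array([y[firm == g].mean() for g in range(N)])[firm]

_, se_naive = ols_se(Xm, ym, N * T - k)       # 218 dof, like smf.ols on demeaned data
_, se_fe = ols_se(Xm, ym, N * T - N - k)      # 207 dof, counting the 11 firm means
D = (firm[:, None] == np.arange(N)).astype(float)
_, se_lsdv = ols_se(np.column_stack([X, D]), y, N * T - N - k)
print(se_naive / se_fe)                       # constant ratio sqrt(207/218)
```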
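4. Time fixed effects
The model is:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \gamma_t + \varepsilon_{it}$$
where the time factor $\gamma_t$ is called time_effects.
In dummy-variable form, where $T_{s,it}$ is the dummy for year $s$:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \sum_{s=1}^{20} \gamma_s T_{s,it} + \varepsilon_{it}$$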
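Time fixed effects work the same way with the roles swapped: subtracting each year's cross-sectional mean is equivalent to including a dummy for every year. A sketch with pandas groupby(...).transform('mean') on synthetic Grunfeld-shaped data (the year shocks and coefficients are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
N, T = 11, 20
df = pd.DataFrame({
    "firm": np.repeat(np.arange(N), T),
    "year": np.tile(np.arange(1935, 1935 + T), N),
    "value": rng.normal(size=N * T),
    "capital": rng.normal(size=N * T),
})
gamma = rng.normal(0, 20, T)  # hypothetical year shocks
df["invest"] = (0.12 * df["value"] + 0.22 * df["capital"]
                + gamma[df["year"] - 1935] + rng.normal(size=N * T))

cols = ["invest", "value", "capital"]
# Time demeaning: subtract the per-year (cross-sectional) mean of each variable
dm = df[cols] - df.groupby("year")[cols].transform("mean")
beta_time, *_ = np.linalg.lstsq(dm[["value", "capital"]].to_numpy(),
                                dm["invest"].to_numpy(), rcond=None)

# Year-dummy regression: all 20 year dummies, no intercept
Y = pd.get_dummies(df["year"], dtype=float).to_numpy()
Z = np.column_stack([df[["value", "capital"]].to_numpy(), Y])
beta_dummy = np.linalg.lstsq(Z, df["invest"].to_numpy(), rcond=None)[0][:2]
print(beta_time, beta_dummy)
```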
(1) PanelOLS
# Time fixed effects: array interface
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, time_effects=True)
results_fe = res_fe.fit()
print(results_fe)
# Time fixed effects: formula interface
res_fe = PanelOLS.from_formula('invest ~ value + capital + TimeEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)
The array-based and formula-based versions return identical results:
PanelOLS Estimation Summary
================================================================================
Dep. Variable: invest R-squared: 0.8109
Estimator: PanelOLS R-squared (Between): 0.8720
No. Observations: 220 R-squared (Within): 0.7273
Date: Wed, Jul 20 2022 R-squared (Overall): 0.8481
Time: 17:40:21 Log-likelihood -1298.8
Cov. Estimator: Unadjusted
F-statistic: 424.46
Entities: 11 P-value 0.0000
Avg Obs: 20.000 Distribution: F(2,198)
Min Obs: 20.000
Max Obs: 20.000 F-statistic (robust): 424.46
P-value 0.0000
Time periods: 20 Distribution: F(2,198)
Avg Obs: 11.000
Min Obs: 11.000
Max Obs: 11.000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
capital 0.2166 0.0299 7.2436 0.0000 0.1577 0.2756
value 0.1158 0.0060 19.434 0.0000 0.1040 0.1275
==============================================================================
F-test for Poolability: 0.2419
P-value: 0.9996
Distribution: F(19,198)
Included effects: Time
(2) smf.ols
# OLS with year dummy variables
res_ols = smf.ols('invest ~ value + capital + C(year)', data=data)
results_ols = res_ols.fit()
print(results_ols.summary())
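The results are below. The value and capital estimates, their standard errors and the log-likelihood again match PanelOLS; only R² differs (0.822 vs 0.8109), because smf.ols also counts the variation explained by the year dummies.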
OLS Regression Results
==============================================================================
Dep. Variable: invest R-squared: 0.822
Model: OLS Adj. R-squared: 0.803
Method: Least Squares F-statistic: 43.55
Date: Wed, 20 Jul 2022 Prob (F-statistic): 1.27e-62
Time: 17:41:37 Log-Likelihood: -1298.8
No. Observations: 220 AIC: 2642.
Df Residuals: 198 BIC: 2716.
Df Model: 21
Covariance Type: nonrobust
=====================================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------
Intercept -21.6815 28.354 -0.765 0.445 -77.597 34.234
C(year)[T.1936.0] -15.1865 39.884 -0.381 0.704 -93.839 63.466
C(year)[T.1937.0] -30.8415 39.958 -0.772 0.441 -109.640 47.957
C(year)[T.1938.0] -25.9640 39.882 -0.651 0.516 -104.611 52.683
C(year)[T.1939.0] -51.2476 39.902 -1.284 0.201 -129.936 27.441
C(year)[T.1940.0] -27.5208 39.911 -0.690 0.491 -106.226 51.184
C(year)[T.1941.0] -2.0012 39.928 -0.050 0.960 -80.739 76.737
C(year)[T.1942.0] -0.3563 39.990 -0.009 0.993 -79.216 78.504
C(year)[T.1943.0] -18.7958 39.997 -0.470 0.639 -97.671 60.079
C(year)[T.1944.0] -19.4973 39.991 -0.488 0.626 -98.360 59.366
C(year)[T.1945.0] -29.7423 40.002 -0.744 0.458 -108.627 49.142
C(year)[T.1946.0] -6.1207 40.033 -0.153 0.879 -85.066 72.825
C(year)[T.1947.0] -4.3649 40.312 -0.108 0.914 -83.860 75.130
C(year)[T.1948.0] -2.8025 40.508 -0.069 0.945 -82.686 77.081
C(year)[T.1949.0] -25.2951 40.683 -0.622 0.535 -105.522 54.932
C(year)[T.1950.0] -24.9390 40.767 -0.612 0.541 -105.332 55.454
C(year)[T.1951.0] -9.4694 40.792 -0.232 0.817 -89.912 70.973
C(year)[T.1952.0] -3.8273 41.134 -0.093 0.926 -84.944 77.289
C(year)[T.1953.0] 4.0537 41.589 0.097 0.922 -77.961 86.068
C(year)[T.1954.0] -9.3916 42.268 -0.222 0.824 -92.744 73.961
value 0.1158 0.006 19.434 0.000 0.104 0.128
capital 0.2166 0.030 7.244 0.000 0.158 0.276
==============================================================================
Omnibus: 33.290 Durbin-Watson: 0.341
Prob(Omnibus): 0.000 Jarque-Bera (JB): 134.793
Skew: 0.482 Prob(JB): 5.37e-30
Kurtosis: 6.711 Cond. No. 3.42e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.42e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
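5. Entity and time fixed effects
The model is:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \alpha_i + \gamma_t + \varepsilon_{it}$$
where the firm-specific factor $\alpha_i$ is called entity_effects and the time factor $\gamma_t$ is called time_effects.
In dummy-variable form, where $D_{j,it}$ is the dummy for firm $j$ and $T_{s,it}$ is the dummy for year $s$:
$$invest_{it} = \beta_1 value_{it} + \beta_2 capital_{it} + \sum_{j=1}^{11} \alpha_j D_{j,it} + \sum_{s=1}^{20} \gamma_s T_{s,it} + \varepsilon_{it}$$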
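For a balanced panel such as Grunfeld (every firm observed in all 20 years), both sets of effects can be removed in one step with the two-way within transformation $\ddot y_{it} = y_{it} - \bar y_{i\cdot} - \bar y_{\cdot t} + \bar y_{\cdot\cdot}$. The sketch below, on synthetic data with made-up effects, checks that OLS on the two-way demeaned data reproduces the dummy-variable slopes. For unbalanced panels this simple formula no longer gives the exact projection, so dummies or an iterative method are needed instead.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 11, 20
firm = np.repeat(np.arange(N), T)   # rows ordered firm-major, year-minor
year = np.tile(np.arange(T), N)
X = rng.normal(size=(N * T, 2))
y = (X @ np.array([0.12, 0.35]) + rng.normal(0, 50, N)[firm]
     + rng.normal(0, 20, T)[year] + rng.normal(size=N * T))

def two_way_demean(v):
    """v - firm mean - year mean + grand mean (valid for balanced panels only)."""
    v = v.reshape(N, T, -1)
    out = (v - v.mean(axis=1, keepdims=True)      # firm means over years
             - v.mean(axis=0, keepdims=True)      # year means over firms
             + v.mean(axis=(0, 1), keepdims=True))  # grand mean
    return out.reshape(N * T, -1)

beta_2w, *_ = np.linalg.lstsq(two_way_demean(X), two_way_demean(y).ravel(), rcond=None)

# LSDV: all 11 firm dummies + 19 year dummies (first year dropped to avoid collinearity)
D_firm = (firm[:, None] == np.arange(N)).astype(float)
D_year = (year[:, None] == np.arange(1, T)).astype(float)
coef = np.linalg.lstsq(np.column_stack([X, D_firm, D_year]), y, rcond=None)[0]
print(beta_2w, coef[:2])
```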
(1) PanelOLS
# Entity + time fixed effects: array interface
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True,time_effects=True)
results_fe = res_fe.fit()
print(results_fe)
# Entity + time fixed effects: formula interface
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + TimeEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)
The array-based and formula-based versions return identical results:
PanelOLS Estimation Summary
================================================================================
Dep. Variable: invest R-squared: 0.7253
Estimator: PanelOLS R-squared (Between): 0.7637
No. Observations: 220 R-squared (Within): 0.7566
Date: Wed, Jul 20 2022 R-squared (Overall): 0.7625
Time: 17:46:42 Log-likelihood -1153.0
Cov. Estimator: Unadjusted
F-statistic: 248.15
Entities: 11 P-value 0.0000
Avg Obs: 20.000 Distribution: F(2,188)
Min Obs: 20.000
Max Obs: 20.000 F-statistic (robust): 248.15
P-value 0.0000
Time periods: 20 Distribution: F(2,188)
Avg Obs: 11.000
Min Obs: 11.000
Max Obs: 11.000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
capital 0.3514 0.0210 16.696 0.0000 0.3099 0.3930
value 0.1167 0.0129 9.0219 0.0000 0.0912 0.1422
==============================================================================
F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)
Included effects: Entity, Time
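The "F-test for Poolability" line compares the fixed-effects model with a pooled OLS fit. Assuming it is the standard restricted-versus-unrestricted F-test (its degrees of freedom, F(29,188), match that construction: 10 firm + 19 year restrictions and 220 - 32 residual degrees of freedom), the statistic can be reproduced on synthetic data. The helper ssr and all data values below are hypothetical:

```python
import numpy as np

def ssr(Z, y):
    """Sum of squared OLS residuals of y on Z."""
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ b
    return r @ r

rng = np.random.default_rng(4)
N, T, k = 11, 20, 2
firm = np.repeat(np.arange(N), T)
year = np.tile(np.arange(T), N)
X = rng.normal(size=(N * T, k))
y = X @ np.array([0.12, 0.35]) + rng.normal(0, 5, N)[firm] + rng.normal(size=N * T)

const = np.ones((N * T, 1))
D_firm = (firm[:, None] == np.arange(1, N)).astype(float)   # 10 firm dummies
D_year = (year[:, None] == np.arange(1, T)).astype(float)   # 19 year dummies

ssr_r = ssr(np.column_stack([const, X]), y)                  # restricted: pooled OLS
ssr_u = ssr(np.column_stack([const, X, D_firm, D_year]), y)  # unrestricted: two-way FE
q = D_firm.shape[1] + D_year.shape[1]                        # 29 restrictions
df_resid = N * T - N - (T - 1) - k                           # 220 - 11 - 19 - 2 = 188
F = ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
print(q, df_resid, F)
```

The same construction gives F(10,207) for entity effects alone and F(19,198) for time effects alone, matching the two earlier summaries.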
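The same two-way model can also be specified by absorbing one set of effects and entering the other set as explicit dummy variables:
# Entity + time fixed effects, array interface: firm dummies as regressors, time effects absorbed
exog = data[['value','capital','firm']]
res_fe = PanelOLS(data['invest'], exog, time_effects=True)  # the string column firm is expanded into 10 dummies for the 11 firms
results_fe = res_fe.fit()
print(results_fe)
# Entity + time fixed effects, array interface: year dummies as regressors, entity effects absorbed
year = pd.Categorical(data.year)  # convert the numeric year into a categorical variable
data['year'] = year
exog = data[['value','capital','year']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True)  # the categorical year is expanded into 19 dummies for the 20 years
results_fe = res_fe.fit()
print(results_fe)
# Entity + time fixed effects, formula interface (firm dummies + TimeEffects)
res_fe = PanelOLS.from_formula('invest ~ value + capital + firm + TimeEffects', data=data)  # drawback: the formula has no intercept, so all 11 firm dummies are created
results_fe = res_fe.fit()
print(results_fe)
# Entity + time fixed effects, formula interface (EntityEffects + year dummies)
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + C(year)', data=data)  # drawback: all 20 year dummies are created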
results_fe = res_fe.fit()
print(results_fe)
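The drawback noted in the last two formula variants is a collinearity problem: when a dummy is created for every firm (or every year) and the other dimension's effects are absorbed, the demeaned dummies sum to zero, so one of them is redundant. A small numpy check on the 11 x 20 layout, where subtracting year means stands in for absorbing time effects (an assumption about what the absorption does, stated here for illustration):

```python
import numpy as np

N, T = 11, 20
firm = np.repeat(np.arange(N), T)
year = np.tile(np.arange(T), N)
D_firm = (firm[:, None] == np.arange(N)).astype(float)   # all 11 firm dummies

# Absorbing time effects: subtract each year's cross-sectional mean
year_means = np.array([D_firm[year == t].mean(axis=0) for t in range(T)])
D_firm_dm = D_firm - year_means[year]

rank_raw = np.linalg.matrix_rank(D_firm)
rank_dm = np.linalg.matrix_rank(D_firm_dm)
print(rank_raw, rank_dm)  # expected: 11 10
```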
(2) smf.ols
# OLS with both firm and year dummy variables
res_ols = smf.ols('invest ~ value + capital + firm + C(year)', data=data)
results_ols = res_ols.fit()
print(results_ols.summary())
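The results are below. The value and capital estimates, their standard errors and the log-likelihood match the two-way PanelOLS results, while R² (0.953 vs 0.7253) also includes the variation explained by the dummies.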
OLS Regression Results
==============================================================================
Dep. Variable: invest R-squared: 0.953
Model: OLS Adj. R-squared: 0.945
Method: Least Squares F-statistic: 122.1
Date: Wed, 20 Jul 2022 Prob (F-statistic): 5.20e-108
Time: 17:47:55 Log-Likelihood: -1153.0
No. Observations: 220 AIC: 2370.
Df Residuals: 188 BIC: 2479.
Df Model: 31
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
Intercept 18.0876 18.656 0.970 0.334 -18.715 54.890
firm[T.Atlantic Refining] -112.5008 17.752 -6.337 0.000 -147.520 -77.482
firm[T.Chrysler] -13.5993 17.540 -0.775 0.439 -48.199 21.001
firm[T.Diamond Match] 16.4928 15.692 1.051 0.295 -14.462 47.448
firm[T.General Electric] -241.0850 28.000 -8.610 0.000 -296.319 -185.851
firm[T.General Motors] -101.7696 55.177 -1.844 0.067 -210.615 7.075
firm[T.Goodyear] -77.9628 16.435 -4.744 0.000 -110.383 -45.543
firm[T.IBM] -6.4573 16.271 -0.397 0.692 -38.554 25.640
firm[T.US Steel] 100.5492 28.438 3.536 0.001 44.450 156.648
firm[T.Union Oil] -56.7936 16.403 -3.462 0.001 -89.151 -24.436
firm[T.Westinghouse] -41.7165 17.483 -2.386 0.018 -76.204 -7.229
C(year)[T.1936.0] -16.9592 21.518 -0.788 0.432 -59.407 25.488
C(year)[T.1937.0] -36.3756 22.364 -1.627 0.106 -80.492 7.741
C(year)[T.1938.0] -35.6237 21.162 -1.683 0.094 -77.370 6.122
C(year)[T.1939.0] -63.0994 21.505 -2.934 0.004 -105.522 -20.677
C(year)[T.1940.0] -39.8248 21.626 -1.842 0.067 -82.486 2.836
C(year)[T.1941.0] -16.4878 21.529 -0.766 0.445 -58.957 25.982
C(year)[T.1942.0] -17.9993 21.275 -0.846 0.399 -59.967 23.968
C(year)[T.1943.0] -37.7724 21.415 -1.764 0.079 -80.016 4.471
C(year)[T.1944.0] -38.3201 21.459 -1.786 0.076 -80.652 4.012
C(year)[T.1945.0] -49.5395 21.687 -2.284 0.023 -92.322 -6.757
C(year)[T.1946.0] -27.7544 21.866 -1.269 0.206 -70.888 15.379
C(year)[T.1947.0] -34.8775 21.589 -1.616 0.108 -77.464 7.709
C(year)[T.1948.0] -38.3307 21.734 -1.764 0.079 -81.204 4.542
C(year)[T.1949.0] -65.2008 21.901 -2.977 0.003 -108.404 -21.998
C(year)[T.1950.0] -67.3877 22.028 -3.059 0.003 -110.841 -23.935
C(year)[T.1951.0] -54.8346 22.437 -2.444 0.015 -99.095 -10.574
C(year)[T.1952.0] -56.4890 22.819 -2.475 0.014 -101.504 -11.474
C(year)[T.1953.0] -58.5126 23.819 -2.457 0.015 -105.500 -11.525
C(year)[T.1954.0] -81.7939 24.204 -3.379 0.001 -129.540 -34.047
value 0.1167 0.013 9.022 0.000 0.091 0.142
capital 0.3514 0.021 16.696 0.000 0.310 0.393
==============================================================================
Omnibus: 32.466 Durbin-Watson: 0.988
Prob(Omnibus): 0.000 Jarque-Bera (JB): 180.276
Skew: 0.311 Prob(JB): 7.14e-40
Kurtosis: 7.391 Cond. No. 3.92e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.92e+04. This might indicate that there are
strong multicollinearity or other numerical problems.