
Python Econometrics | Panel Data Regression (Part 2)

大邓和他的Python | 2023-02-10 22:57

In this article I use the Grunfeld dataset (available in statsmodels.datasets) to demonstrate how to estimate fixed effects models.

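The data contain 20 years of observations for each of 11 firms: IBM, General Electric, US Steel, Atlantic Refining, Diamond Match, Westinghouse, General Motors, Goodyear, Chrysler, Union Oil, and American Steel.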

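The model is:

$$y_{it} = x_{it}\beta + \alpha_i + \gamma_t + \varepsilon_{it}$$

Here $\alpha_i$ is the firm-specific factor, called entity_effects, and $\gamma_t$ is the time factor, called time_effects.

Equivalently, the model can be written with dummy variables as shown below, where $D_j$ is the dummy variable for firm $j$ and $T_s$ is the dummy variable for year $s$ (with one dummy omitted to avoid perfect collinearity):

$$y_{it} = x_{it}\beta + \sum_{j} \alpha_j D_j + \sum_{s} \gamma_s T_s + \varepsilon_{it}$$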

1. Import the required libraries

from statsmodels.datasets import grunfeld
from linearmodels.panel import PanelOLS
import pandas as pd
import statsmodels.formula.api as smf

2. Load the panel data

data = grunfeld.load_pandas().data
# Set a (firm, year) index; drop=False also keeps firm and year as ordinary columns
data = data.set_index(["firm","year"],drop=False)

3. Entity fixed effects

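The model is:

$$y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$$

Here $\alpha_i$ is the firm-specific factor, called entity_effects.

Equivalently, the model can be written with dummy variables as shown below, where $D_j$ is the dummy variable for firm $j$:

$$y_{it} = x_{it}\beta + \sum_{j} \alpha_j D_j + \varepsilon_{it}$$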
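
Entity fixed effects can be estimated either by including one dummy per firm or by subtracting each firm's time average from every variable (the within transformation); both routes give the same slope coefficients. The following is a minimal numpy sketch of that equivalence on a small synthetic balanced panel. The data, the panel size, and all variable names are assumptions made for illustration; this is not the Grunfeld data.

```python
import numpy as np

# Synthetic balanced panel (NOT the Grunfeld data): 5 firms observed for 8 years.
rng = np.random.default_rng(0)
n_firms, n_years = 5, 8
firm = np.repeat(np.arange(n_firms), n_years)   # firm id of each row (firm-major order)
x = rng.normal(size=(firm.size, 2))             # two regressors
alpha = rng.normal(scale=3.0, size=n_firms)     # firm-specific intercepts
y = x @ np.array([0.5, -1.0]) + alpha[firm] + rng.normal(scale=0.1, size=firm.size)

# (a) Dummy-variable form: regress y on x plus one dummy column per firm.
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
beta_lsdv = np.linalg.lstsq(np.hstack([x, dummies]), y, rcond=None)[0][:2]

# (b) Within form: subtract each firm's time average, then regress without a constant.
x_w = x - x.reshape(n_firms, n_years, 2).mean(axis=1)[firm]
y_w = y - y.reshape(n_firms, n_years).mean(axis=1)[firm]
beta_within = np.linalg.lstsq(x_w, y_w, rcond=None)[0]

print(beta_lsdv)
print(beta_within)
```

The two printed vectors agree up to floating-point error, which is why the dummy-variable regression and the entity-effects estimator report the same coefficients on value and capital.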

(1) PanelOLS

# Entity fixed effects: array-based
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True)
results_fe = res_fe.fit()
print(results_fe)

# Entity fixed effects: formula-based
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)

The array-based and formula-based calls return the same results, shown below:

                         PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 invest   R-squared:                        0.7667
Estimator:                   PanelOLS   R-squared (Between):              0.8223
No. Observations:                 220   R-squared (Within):               0.7667
Date:                Wed, Jul 20 2022   R-squared (Overall):              0.8132
Time:                        15:55:39   Log-likelihood                   -1167.4
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      340.08
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,207)
Min Obs:                       20.000                                           
Max Obs:                       20.000   F-statistic (robust):             340.08
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,207)
Avg Obs:                       11.000                                           
Min Obs:                       11.000                                           
Max Obs:                       11.000                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
capital        0.3100     0.0165     18.744     0.0000      0.2774      0.3426
value          0.1101     0.0113     9.7461     0.0000      0.0879      0.1324
==============================================================================

F-test for Poolability: 49.207
P-value: 0.0000
Distribution: F(10,207)

Included effects: Entity
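
The degrees of freedom of the poolability F-test follow from simple counting: the numerator is the number of effect parameters minus one, and the denominator is the number of observations minus the regressors and the effect parameters. The helper below is a hypothetical bookkeeping function written for this article (it is not part of linearmodels); it reproduces the F(10,207) reported here as well as the F(19,198) and F(29,188) reported for the time and two-way models later on.

```python
def poolability_df(n_obs, n_entities, n_periods, n_regressors,
                   entity_effects=False, time_effects=False):
    """Return (numerator df, denominator df) of the poolability F-test."""
    n_effects = 0
    if entity_effects:
        n_effects += n_entities
    if time_effects:
        n_effects += n_periods
    if entity_effects and time_effects:
        n_effects -= 1  # one dummy is redundant when both sets are present
    df_num = n_effects - 1                       # restrictions: all effects are equal
    df_den = n_obs - n_regressors - n_effects    # residual degrees of freedom
    return df_num, df_den


# Grunfeld panel: 220 observations, 11 firms, 20 years, 2 regressors
print(poolability_df(220, 11, 20, 2, entity_effects=True))
print(poolability_df(220, 11, 20, 2, time_effects=True))
print(poolability_df(220, 11, 20, 2, entity_effects=True, time_effects=True))
```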

(2) smf.ols

# OLS with firm dummy variables
res_ols = smf.ols('invest ~ value + capital + firm', data=data)
#res_ols = smf.ols('invest ~ value + capital + C(firm)', data=data)
results_ols = res_ols.fit()
print(results_ols.summary())

The results are as follows:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 invest   R-squared:                       0.946
Model:                            OLS   Adj. R-squared:                  0.943
Method:                 Least Squares   F-statistic:                     302.6
Date:                Wed, 20 Jul 2022   Prob (F-statistic):          4.77e-124
Time:                        17:33:36   Log-Likelihood:                -1167.4
No. Observations:                 220   AIC:                             2361.
Df Residuals:                     207   BIC:                             2405.
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   -20.5782     11.298     -1.821      0.070     -42.852       1.695
firm[T.Atlantic Refining]   -94.0243     17.164     -5.478      0.000    -127.862     -60.186
firm[T.Chrysler]             -7.2309     17.338     -0.417      0.677     -41.413      26.951
firm[T.Diamond Match]        14.0102     15.944      0.879      0.381     -17.422      45.443
firm[T.General Electric]   -214.9912     25.461     -8.444      0.000    -265.188    -164.795
firm[T.General Motors]      -49.7209     48.280     -1.030      0.304    -144.905      45.463
firm[T.Goodyear]            -66.6363     16.379     -4.068      0.000     -98.927     -34.346
firm[T.IBM]                  -2.5820     16.379     -0.158      0.875     -34.873      29.709
firm[T.US Steel]            122.4829     25.960      4.718      0.000      71.304     173.662
firm[T.Union Oil]           -45.9660     16.357     -2.810      0.005     -78.215     -13.717
firm[T.Westinghouse]        -36.9683     17.309     -2.136      0.034     -71.093      -2.843
value                         0.1101      0.011      9.746      0.000       0.088       0.132
capital                       0.3100      0.017     18.744      0.000       0.277       0.343
==============================================================================
Omnibus:                       35.893   Durbin-Watson:                   1.079
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              243.455
Skew:                           0.297   Prob(JB):                     1.36e-53
Kurtosis:                       8.119   Cond. No.                     2.98e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.98e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
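
In the dummy-variable regression each firm coefficient is measured relative to the omitted baseline firm, which is American Steel here (the one firm without a dummy in the table). A firm's own intercept is therefore the Intercept plus its dummy coefficient. The sketch below does this arithmetic with the coefficients copied from the summary above, rounded as printed; the dictionary names are illustrative.

```python
# Coefficients copied from the OLS summary above (rounded as printed).
intercept = -20.5782
firm_coef = {
    "Atlantic Refining": -94.0243,
    "Chrysler": -7.2309,
    "Diamond Match": 14.0102,
    "General Electric": -214.9912,
    "General Motors": -49.7209,
    "Goodyear": -66.6363,
    "IBM": -2.5820,
    "US Steel": 122.4829,
    "Union Oil": -45.9660,
    "Westinghouse": -36.9683,
}

# The baseline firm has no dummy, so its intercept is the Intercept itself.
effects = {"American Steel": intercept}
for name, coef in firm_coef.items():
    effects[name] = round(intercept + coef, 4)

# Print the firms from the largest to the smallest firm-specific intercept.
for name, value in sorted(effects.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {value:10.4f}")
```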

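The same coefficients can also be obtained by subtracting each firm's time average from the variables (the within transformation).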

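data_w = grunfeld.load_pandas().data
# Set the index; drop=True (the default) this time. A separate name, data_w, keeps the
# earlier data frame (with its firm and year columns) available for the later sections.
data_w = data_w.set_index(["firm","year"])
# Demean the dependent and explanatory variables: subtract each firm's time average
firm_mean = data_w.groupby(level='firm').transform('mean')
data_w['invest_w'] = data_w['invest'] - firm_mean['invest']
data_w['value_w'] = data_w['value'] - firm_mean['value']
data_w['capital_w'] = data_w['capital'] - firm_mean['capital']

# Estimate by OLS on the demeaned variables, without a constant
results_man = smf.ols('invest_w ~ 0 + value_w + capital_w', data_w).fit()
print(results_man.summary())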

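The results are as follows. The coefficients match those above, but the standard errors are slightly smaller because OLS on the demeaned data uses 218 residual degrees of freedom instead of 207; it does not account for the 11 firm means that were estimated.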

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:               invest_w   R-squared (uncentered):                   0.767
Model:                            OLS   Adj. R-squared (uncentered):              0.765
Method:                 Least Squares   F-statistic:                              358.2
Date:                Wed, 20 Jul 2022   Prob (F-statistic):                    1.28e-69
Time:                        17:58:17   Log-Likelihood:                         -1167.4
No. Observations:                 220   AIC:                                      2339.
Df Residuals:                     218   BIC:                                      2346.
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
value_w        0.1101      0.011     10.002      0.000       0.088       0.132
capital_w      0.3100      0.016     19.236      0.000       0.278       0.342
==============================================================================
Omnibus:                       35.893   Durbin-Watson:                   1.079
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              243.455
Skew:                           0.297   Prob(JB):                     1.36e-53
Kurtosis:                       8.119   Cond. No.                         1.74
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
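
The t-statistics in this table (10.002 and 19.236) differ from the PanelOLS ones (9.7461 and 18.744) only through the residual degrees of freedom. Both regressions have the same residuals, so the standard errors differ by a constant factor. The short sketch below is a check on the printed numbers rather than a new estimation: it rescales the PanelOLS t-statistics by that factor.

```python
import math

n_obs, n_regressors, n_firms = 220, 2, 11
df_naive = n_obs - n_regressors            # 218, used by OLS on the demeaned data
df_fe = n_obs - n_regressors - n_firms     # 207, used by PanelOLS and the dummy regression

# The residual variance is RSS / df, so standard errors scale with sqrt(1 / df)
# and t-statistics scale with sqrt(df).
inflate = math.sqrt(df_naive / df_fe)

# t-statistics reported by PanelOLS with entity effects
t_fe = {"value": 9.7461, "capital": 18.744}
t_naive = {name: round(t * inflate, 3) for name, t in t_fe.items()}
print(t_naive)  # {'value': 10.002, 'capital': 19.236}
```

To get correct standard errors from the demeaned regression, multiply its standard errors by sqrt(218/207), or use PanelOLS or the dummy-variable regression directly.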

4. Time fixed effects

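The model is:

$$y_{it} = x_{it}\beta + \gamma_t + \varepsilon_{it}$$

Here $\gamma_t$ is the time factor, called time_effects.

Equivalently, the model can be written with dummy variables as shown below, where $T_s$ is the dummy variable for year $s$:

$$y_{it} = x_{it}\beta + \sum_{s} \gamma_s T_s + \varepsilon_{it}$$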

(1) PanelOLS

# Time fixed effects: array-based
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, time_effects=True)
results_fe = res_fe.fit()
print(results_fe)

# Time fixed effects: formula-based
res_fe = PanelOLS.from_formula('invest ~ value + capital + TimeEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)

The array-based and formula-based calls return the same results, shown below:

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 invest   R-squared:                        0.8109
Estimator:                   PanelOLS   R-squared (Between):              0.8720
No. Observations:                 220   R-squared (Within):               0.7273
Date:                Wed, Jul 20 2022   R-squared (Overall):              0.8481
Time:                        17:40:21   Log-likelihood                   -1298.8
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      424.46
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,198)
Min Obs:                       20.000                                           
Max Obs:                       20.000   F-statistic (robust):             424.46
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,198)
Avg Obs:                       11.000                                           
Min Obs:                       11.000                                           
Max Obs:                       11.000                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
capital        0.2166     0.0299     7.2436     0.0000      0.1577      0.2756
value          0.1158     0.0060     19.434     0.0000      0.1040      0.1275
==============================================================================

F-test for Poolability: 0.2419
P-value: 0.9996
Distribution: F(19,198)

Included effects: Time

(2) smf.ols

# OLS with year dummy variables
res_ols = smf.ols('invest ~ value + capital + C(year)', data=data)
results_ols = res_ols.fit()
print(results_ols.summary())

The results are as follows:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 invest   R-squared:                       0.822
Model:                            OLS   Adj. R-squared:                  0.803
Method:                 Least Squares   F-statistic:                     43.55
Date:                Wed, 20 Jul 2022   Prob (F-statistic):           1.27e-62
Time:                        17:41:37   Log-Likelihood:                -1298.8
No. Observations:                 220   AIC:                             2642.
Df Residuals:                     198   BIC:                             2716.
Df Model:                          21                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept           -21.6815     28.354     -0.765      0.445     -77.597      34.234
C(year)[T.1936.0]   -15.1865     39.884     -0.381      0.704     -93.839      63.466
C(year)[T.1937.0]   -30.8415     39.958     -0.772      0.441    -109.640      47.957
C(year)[T.1938.0]   -25.9640     39.882     -0.651      0.516    -104.611      52.683
C(year)[T.1939.0]   -51.2476     39.902     -1.284      0.201    -129.936      27.441
C(year)[T.1940.0]   -27.5208     39.911     -0.690      0.491    -106.226      51.184
C(year)[T.1941.0]    -2.0012     39.928     -0.050      0.960     -80.739      76.737
C(year)[T.1942.0]    -0.3563     39.990     -0.009      0.993     -79.216      78.504
C(year)[T.1943.0]   -18.7958     39.997     -0.470      0.639     -97.671      60.079
C(year)[T.1944.0]   -19.4973     39.991     -0.488      0.626     -98.360      59.366
C(year)[T.1945.0]   -29.7423     40.002     -0.744      0.458    -108.627      49.142
C(year)[T.1946.0]    -6.1207     40.033     -0.153      0.879     -85.066      72.825
C(year)[T.1947.0]    -4.3649     40.312     -0.108      0.914     -83.860      75.130
C(year)[T.1948.0]    -2.8025     40.508     -0.069      0.945     -82.686      77.081
C(year)[T.1949.0]   -25.2951     40.683     -0.622      0.535    -105.522      54.932
C(year)[T.1950.0]   -24.9390     40.767     -0.612      0.541    -105.332      55.454
C(year)[T.1951.0]    -9.4694     40.792     -0.232      0.817     -89.912      70.973
C(year)[T.1952.0]    -3.8273     41.134     -0.093      0.926     -84.944      77.289
C(year)[T.1953.0]     4.0537     41.589      0.097      0.922     -77.961      86.068
C(year)[T.1954.0]    -9.3916     42.268     -0.222      0.824     -92.744      73.961
value                 0.1158      0.006     19.434      0.000       0.104       0.128
capital               0.2166      0.030      7.244      0.000       0.158       0.276
==============================================================================
Omnibus:                       33.290   Durbin-Watson:                   0.341
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              134.793
Skew:                           0.482   Prob(JB):                     5.37e-30
Kurtosis:                       6.711   Cond. No.                     3.42e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.42e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

5. Entity fixed effects + time fixed effects

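The model is:

$$y_{it} = x_{it}\beta + \alpha_i + \gamma_t + \varepsilon_{it}$$

Here $\alpha_i$ is the firm-specific factor, called entity_effects, and $\gamma_t$ is the time factor, called time_effects.

Equivalently, the model can be written with dummy variables as shown below, where $D_j$ is the dummy variable for firm $j$ and $T_s$ is the dummy variable for year $s$ (with one dummy omitted to avoid perfect collinearity):

$$y_{it} = x_{it}\beta + \sum_{j} \alpha_j D_j + \sum_{s} \gamma_s T_s + \varepsilon_{it}$$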
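
For a balanced panel such as this one, the two-way model can also be estimated by double demeaning: subtract the firm average and the year average from each variable and add back the overall average. The following numpy sketch checks that this gives the same slopes as a regression on firm and year dummies. It uses a small synthetic panel (not the Grunfeld data), and the function name two_way_demean is hypothetical.

```python
import numpy as np

# Synthetic balanced panel (NOT the Grunfeld data): 6 firms observed for 7 years.
rng = np.random.default_rng(1)
n_firms, n_years = 6, 7
firm = np.repeat(np.arange(n_firms), n_years)   # firm-major row order
year = np.tile(np.arange(n_years), n_firms)
x = rng.normal(size=(firm.size, 2))
alpha = rng.normal(scale=3.0, size=n_firms)     # firm effects
gamma = rng.normal(scale=2.0, size=n_years)     # year effects
noise = rng.normal(scale=0.1, size=firm.size)
y = x @ np.array([0.5, -1.0]) + alpha[firm] + gamma[year] + noise


def two_way_demean(a):
    """a_it - firm mean - year mean + overall mean (balanced, firm-major panel)."""
    cube = a.reshape(n_firms, n_years, -1)
    out = (cube
           - cube.mean(axis=1, keepdims=True)        # firm averages over years
           - cube.mean(axis=0, keepdims=True)        # year averages over firms
           + cube.mean(axis=(0, 1), keepdims=True))  # overall average
    return out.reshape(a.shape)


# (a) Double demeaning, then OLS without a constant
beta_within = np.linalg.lstsq(two_way_demean(x), two_way_demean(y), rcond=None)[0]

# (b) Dummy-variable regression: all firm dummies plus year dummies (first year dropped)
firm_d = (firm[:, None] == np.arange(n_firms)).astype(float)
year_d = (year[:, None] == np.arange(1, n_years)).astype(float)
beta_lsdv = np.linalg.lstsq(np.hstack([x, firm_d, year_d]), y, rcond=None)[0][:2]

print(beta_within)
print(beta_lsdv)
```

The shortcut relies on the panel being balanced; with missing firm-year cells the simple double demeaning no longer reproduces the dummy-variable estimates.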

(1) PanelOLS

# Entity + time fixed effects: array-based
exog = data[['value','capital']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True,time_effects=True)
results_fe = res_fe.fit()
print(results_fe)

# Entity + time fixed effects: formula-based
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + TimeEffects', data=data)
results_fe = res_fe.fit()
print(results_fe)

The array-based and formula-based calls return the same results, shown below:

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 invest   R-squared:                        0.7253
Estimator:                   PanelOLS   R-squared (Between):              0.7637
No. Observations:                 220   R-squared (Within):               0.7566
Date:                Wed, Jul 20 2022   R-squared (Overall):              0.7625
Time:                        17:46:42   Log-likelihood                   -1153.0
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      248.15
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,188)
Min Obs:                       20.000                                           
Max Obs:                       20.000   F-statistic (robust):             248.15
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,188)
Avg Obs:                       11.000                                           
Min Obs:                       11.000                                           
Max Obs:                       11.000                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
capital        0.3514     0.0210     16.696     0.0000      0.3099      0.3930
value          0.1167     0.0129     9.0219     0.0000      0.0912      0.1422
==============================================================================

F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time

The same model can also be written in the following ways:

# Entity + time fixed effects, array-based: firm dummies plus time effects
exog = data[['value','capital','firm']]
res_fe = PanelOLS(data['invest'], exog, time_effects=True)  # 10 dummies are created for the 11 firms
results_fe = res_fe.fit()
print(results_fe)

# Entity + time fixed effects, array-based: entity effects plus year dummies
year = pd.Categorical(data.year)  # convert the numeric year to a categorical
data['year'] = year
exog = data[['value','capital','year']]
res_fe = PanelOLS(data['invest'], exog, entity_effects=True)  # 19 dummies are created for the 20 years
results_fe = res_fe.fit()
print(results_fe)

# Entity + time fixed effects, formula-based (firm dummies + TimeEffects)
res_fe = PanelOLS.from_formula('invest ~ value + capital + firm + TimeEffects', data=data)  # drawback: 11 dummies are created for the 11 firms
results_fe = res_fe.fit()
print(results_fe)

# Entity + time fixed effects, formula-based (EntityEffects + year dummies)
res_fe = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + C(year)', data=data)  # drawback: 20 dummies are created for the 20 years
results_fe = res_fe.fit()
print(results_fe)
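
The comments in the code above distinguish between creating one dummy per category and creating one fewer. The pandas sketch below illustrates the difference on a toy column; the firm labels are hypothetical, and this is only an illustration of the two encodings, not a description of how linearmodels builds its dummies internally.

```python
import pandas as pd

# Toy data: three hypothetical firms, two observations each
firms = pd.Series(["A", "B", "C", "A", "B", "C"], name="firm")

full = pd.get_dummies(firms)                      # one column per firm
reduced = pd.get_dummies(firms, drop_first=True)  # first firm becomes the baseline

print(list(full.columns))     # ['A', 'B', 'C']
print(list(reduced.columns))  # ['B', 'C']
```

With a constant or another complete set of effects in the model, the full encoding is perfectly collinear, so the reduced encoding (k-1 dummies for k categories) is the one that identifies the coefficients.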

(2) smf.ols

# OLS with both firm and year dummy variables
res_ols = smf.ols('invest ~ value + capital + firm + C(year)', data=data)
results_ols = res_ols.fit()
print(results_ols.summary())

The results are as follows:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 invest   R-squared:                       0.953
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     122.1
Date:                Wed, 20 Jul 2022   Prob (F-statistic):          5.20e-108
Time:                        17:47:55   Log-Likelihood:                -1153.0
No. Observations:                 220   AIC:                             2370.
Df Residuals:                     188   BIC:                             2479.
Df Model:                          31                                         
Covariance Type:            nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                    18.0876     18.656      0.970      0.334     -18.715      54.890
firm[T.Atlantic Refining]  -112.5008     17.752     -6.337      0.000    -147.520     -77.482
firm[T.Chrysler]            -13.5993     17.540     -0.775      0.439     -48.199      21.001
firm[T.Diamond Match]        16.4928     15.692      1.051      0.295     -14.462      47.448
firm[T.General Electric]   -241.0850     28.000     -8.610      0.000    -296.319    -185.851
firm[T.General Motors]     -101.7696     55.177     -1.844      0.067    -210.615       7.075
firm[T.Goodyear]            -77.9628     16.435     -4.744      0.000    -110.383     -45.543
firm[T.IBM]                  -6.4573     16.271     -0.397      0.692     -38.554      25.640
firm[T.US Steel]            100.5492     28.438      3.536      0.001      44.450     156.648
firm[T.Union Oil]           -56.7936     16.403     -3.462      0.001     -89.151     -24.436
firm[T.Westinghouse]        -41.7165     17.483     -2.386      0.018     -76.204      -7.229
C(year)[T.1936.0]           -16.9592     21.518     -0.788      0.432     -59.407      25.488
C(year)[T.1937.0]           -36.3756     22.364     -1.627      0.106     -80.492       7.741
C(year)[T.1938.0]           -35.6237     21.162     -1.683      0.094     -77.370       6.122
C(year)[T.1939.0]           -63.0994     21.505     -2.934      0.004    -105.522     -20.677
C(year)[T.1940.0]           -39.8248     21.626     -1.842      0.067     -82.486       2.836
C(year)[T.1941.0]           -16.4878     21.529     -0.766      0.445     -58.957      25.982
C(year)[T.1942.0]           -17.9993     21.275     -0.846      0.399     -59.967      23.968
C(year)[T.1943.0]           -37.7724     21.415     -1.764      0.079     -80.016       4.471
C(year)[T.1944.0]           -38.3201     21.459     -1.786      0.076     -80.652       4.012
C(year)[T.1945.0]           -49.5395     21.687     -2.284      0.023     -92.322      -6.757
C(year)[T.1946.0]           -27.7544     21.866     -1.269      0.206     -70.888      15.379
C(year)[T.1947.0]           -34.8775     21.589     -1.616      0.108     -77.464       7.709
C(year)[T.1948.0]           -38.3307     21.734     -1.764      0.079     -81.204       4.542
C(year)[T.1949.0]           -65.2008     21.901     -2.977      0.003    -108.404     -21.998
C(year)[T.1950.0]           -67.3877     22.028     -3.059      0.003    -110.841     -23.935
C(year)[T.1951.0]           -54.8346     22.437     -2.444      0.015     -99.095     -10.574
C(year)[T.1952.0]           -56.4890     22.819     -2.475      0.014    -101.504     -11.474
C(year)[T.1953.0]           -58.5126     23.819     -2.457      0.015    -105.500     -11.525
C(year)[T.1954.0]           -81.7939     24.204     -3.379      0.001    -129.540     -34.047
value                         0.1167      0.013      9.022      0.000       0.091       0.142
capital                       0.3514      0.021     16.696      0.000       0.310       0.393
==============================================================================
Omnibus:                       32.466   Durbin-Watson:                   0.988
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              180.276
Skew:                           0.311   Prob(JB):                     7.14e-40
Kurtosis:                       7.391   Cond. No.                     3.92e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.92e+04. This might indicate that there are
strong multicollinearity or other numerical problems.


