Author: 严小样儿
Source: 统计与数据分析实战
dropna is a method of the pandas DataFrame that removes missing values. Its basic parameters are:
dropna(self, axis=0, how='any', subset=None, inplace=False)
Let's go through them one by one.
# Preview the mock data
> df
Out[1]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
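The article does not show how df was created. A minimal sketch that reproduces the preview might look like the following. The construction is an assumption, in particular the empty string used for the blank name in row 1 (inferred from the fact that this row survives the subset example further down).

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the mock data shown above.
# Assumption: the blank name in row 1 is an empty string, not NaN.
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# Count the missing values (NaN / NaT) in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Under this assumption, name has one missing value, age two, birthday three, and gender none, which is consistent with the results in the rest of the walkthrough.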
# Without any arguments
> df.dropna()
Out[2]:
    name   age    birthday gender
0   Alan  17.0  1999-01-25   male
2  Black  18.0  1997-02-07   male
# how='any' (the default): drop rows that contain at least one missing value
> df.dropna(how = 'any')
Out[3]:
    name   age    birthday gender
0   Alan  17.0  1999-01-25   male
2  Black  18.0  1997-02-07   male
# how='all': drop only rows in which every value is missing
> df.dropna(how = 'all')
Out[4]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
# Restricted to the age and birthday columns, row 1 is entirely missing, so it is dropped
> df.iloc[:,1:3].dropna(how = 'all')
Out[5]:
    age    birthday
0  17.0  1999-01-25
2  18.0  1997-02-07
3   NaN  2000-01-18
4  25.0         NaT
5  22.0         NaT
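To make the difference between how='any' and how='all' easier to compare, the three calls above can be put side by side. This is only a sketch: make_df is a hypothetical helper, and the empty string for row 1's name is an assumption about the original data.

```python
import numpy as np
import pandas as pd


def make_df():
    """Hypothetical rebuild of the article's mock data (row 1 name assumed '')."""
    return pd.DataFrame({
        "name": ["Alan", "", "Black", "Cici", "David", np.nan],
        "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
        "birthday": pd.to_datetime(
            ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
        ),
        "gender": ["male", "female", "male", "female", "male", "female"],
    })


df = make_df()

# 'any': a single missing value is enough to drop the row.
kept_any = df.dropna(how="any").index.tolist()

# 'all': a row is dropped only if every value in it is missing.
# gender is never missing here, so no row qualifies.
kept_all = df.dropna(how="all").index.tolist()

# On the age/birthday slice, row 1 has nothing but missing values.
kept_slice_all = df.iloc[:, 1:3].dropna(how="all").index.tolist()

print(kept_any)        # [0, 2]
print(kept_all)        # [0, 1, 2, 3, 4, 5]
print(kept_slice_all)  # [0, 2, 3, 4, 5]
```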
# Drop by column: every column that contains a missing value is removed
> df.dropna(axis = 1)
Out[6]:
   gender
0    male
1  female
2    male
3  female
4    male
5  female
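The axis and how parameters can be combined. The sketch below, which again relies on a hypothetical make_df helper and an assumed empty string for row 1's name, shows that axis=1 with the default how='any' leaves only gender, while axis=1 with how='all' keeps every column because none of them is entirely missing.

```python
import numpy as np
import pandas as pd


def make_df():
    """Hypothetical rebuild of the article's mock data (row 1 name assumed '')."""
    return pd.DataFrame({
        "name": ["Alan", "", "Black", "Cici", "David", np.nan],
        "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
        "birthday": pd.to_datetime(
            ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
        ),
        "gender": ["male", "female", "male", "female", "male", "female"],
    })


df = make_df()

# Drop every column that has at least one missing value.
cols_any = df.dropna(axis=1).columns.tolist()

# axis="columns" is an alias for axis=1.
cols_alias = df.dropna(axis="columns").columns.tolist()

# Drop only columns that are missing in every row (none in this data).
cols_all = df.dropna(axis=1, how="all").columns.tolist()

print(cols_any)    # ['gender']
print(cols_alias)  # ['gender']
print(cols_all)    # ['name', 'age', 'birthday', 'gender']
```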
# Drop rows that have a missing value in any of the specified columns
> df.dropna(subset = ['name','gender'])
Out[7]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
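subset only limits which columns are inspected; the surviving rows still carry all columns. It can also be combined with how. The following is a sketch under the same assumptions as before (hypothetical make_df helper, row 1's name assumed to be an empty string, which pandas does not treat as missing).

```python
import numpy as np
import pandas as pd


def make_df():
    """Hypothetical rebuild of the article's mock data (row 1 name assumed '')."""
    return pd.DataFrame({
        "name": ["Alan", "", "Black", "Cici", "David", np.nan],
        "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
        "birthday": pd.to_datetime(
            ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
        ),
        "gender": ["male", "female", "male", "female", "male", "female"],
    })


df = make_df()

# Only name and gender are checked; row 5 (name is NaN) is dropped.
# Row 1 stays because an empty string is not a missing value.
by_name_gender = df.dropna(subset=["name", "gender"])

# Drop a row only when BOTH age and birthday are missing (row 1).
by_age_birthday = df.dropna(subset=["age", "birthday"], how="all")

print(by_name_gender.index.tolist())   # [0, 1, 2, 3, 4]
print(by_age_birthday.index.tolist())  # [0, 2, 3, 4, 5]
print(by_age_birthday.columns.tolist())  # all four columns are kept
```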
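Note that none of the operations above modify the original data; each call returns a new DataFrame. To change the original DataFrame itself, you must pass inplace=True.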
# Check the original data again to see whether it has changed
> df
Out[8]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
# Pass inplace=True to modify the original data
> df.dropna(inplace = True)
> df
Out[9]:
    name   age    birthday gender
0   Alan  17.0  1999-01-25   male
2  Black  18.0  1997-02-07   male
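The contrast between the two modes can be checked with a short script. This is a sketch only: make_df is a hypothetical helper and the empty string in row 1 is an assumption about the original data.

```python
import numpy as np
import pandas as pd


def make_df():
    """Hypothetical rebuild of the article's mock data (row 1 name assumed '')."""
    return pd.DataFrame({
        "name": ["Alan", "", "Black", "Cici", "David", np.nan],
        "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
        "birthday": pd.to_datetime(
            ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
        ),
        "gender": ["male", "female", "male", "female", "male", "female"],
    })


df = make_df()

# Default (inplace=False): a new DataFrame comes back, df is untouched.
cleaned = df.dropna()
rows_before = len(df)

# inplace=True: df itself is modified. The call is used for its side
# effect, so its return value is deliberately not assigned to df.
df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_after)  # 6 2
print(df.index.tolist())        # [0, 2]
```

Reassigning the result, as in df = df.dropna(), reaches the same end state and is often easier to follow in longer scripts.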