Motivation

map and filter are among the most useful functions for processing iterables, but nesting them makes code cluttered and much harder to read.

arr = [1, 2, 3, 4, 5]

# Keep the even numbers in arr, then multiply each of them by 2
list(map(lambda x: x*2, filter(lambda x:x%2==0, arr)))

[4, 8]

The iterable example above can instead be written with the | operator from the pipe library, which chains several operations together.

from pipe import select, where

arr = [1, 2, 3, 4, 5]

list(arr
    | where(lambda x: x%2==0)
    | select(lambda x: x*2))

[4, 8]

What is pipe?

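pipe is a Python library for pipeline operations: it lets the individual steps (functions) of a data-processing task connect one after another like a pipeline (an assembly line), together completing the task.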

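I like pipe because it makes code that processes iterables clean and much more readable. The examples below will help you pick up pipe quickly. First, install pipe (the leading ! runs the command from a Jupyter notebook cell; omit it in a terminal):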

!pip3 install pipe

where

Filters the items of an iterable, keeping those for which the function returns True.

from pipe import where

arr = [1, 2, 3, 4, 5]

# Keep only the even numbers
list(arr | where(lambda x: x%2==0))

[2, 4]

select

Applies an operation to every item of an iterable.

from pipe import select

arr = [1, 2, 3, 4, 5]

# Multiply each number in arr by 2
list(arr | select(lambda x: x*2))

[2, 4, 6, 8, 10]
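Every example wraps the pipeline in list(). As far as I know, that is because pipe's where and select return lazy iterators (like the built-in filter and map), and list() is what actually pulls the items through. The sketch below uses plain generator expressions as stand-ins for where and select (an assumption for illustration, not pipe's own code) and logs each call, showing that building the pipeline runs nothing and that each item then flows through both stages before the next one starts.

```python
log = []


def is_even(x):
    log.append(f"where {x}")   # record each predicate call
    return x % 2 == 0


def double(x):
    log.append(f"select {x}")  # record each selector call
    return x * 2


# Generator expressions stand in for where/select: building them runs nothing
evens = (x for x in [1, 2, 3, 4, 5] if is_even(x))
doubled = (double(x) for x in evens)
print(log)  # []

# list() pulls each item through both stages in turn
result = list(doubled)
print(result)  # [4, 8]
print(log)  # ['where 1', 'where 2', 'select 2', 'where 3', 'where 4', 'select 4', 'where 5']
```

Without the final list() you would only get an iterator object back, and none of the logging calls would have happened.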

You may now be wondering: since Python already has map and filter, why use pipe's select and where?

Because with pipes you can append one operation after another, as many times as you like!

from pipe import select, where

arr = [1, 2, 3, 4, 5]

list(arr
     | where(lambda x: x%2==0)  # keep the even numbers in arr
     | select(lambda x: x*2)    # multiply each even number by 2
    )

[4, 8]
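How can arr | where(...) work at all, when a list has no | operator for such objects? When the left operand does not support |, Python calls the right operand's reflected method __ror__, and pipe relies on this hook. The following is a minimal sketch built on a hypothetical class MiniPipe (a simplification, not pipe's actual source): calling where(pred) only stores pred, and the filtering happens in __ror__ once the iterable arrives.

```python
class MiniPipe:
    """Hypothetical, simplified stand-in for pipe's Pipe class."""

    def __init__(self, func, *args, **kwargs):
        self.func = func      # function taking the iterable as its first argument
        self.args = args      # extra arguments bound via where(...), select(...)
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # where(pred) returns a new MiniPipe with pred bound for later use
        return MiniPipe(self.func, *args, **kwargs)

    def __ror__(self, iterable):
        # `iterable | p` ends up here because lists and generators
        # do not define | for MiniPipe objects
        return self.func(iterable, *self.args, **self.kwargs)


@MiniPipe
def where(iterable, predicate):
    return (x for x in iterable if predicate(x))


@MiniPipe
def select(iterable, selector):
    return (selector(x) for x in iterable)


arr = [1, 2, 3, 4, 5]
result = list(arr | where(lambda x: x % 2 == 0) | select(lambda x: x * 2))
print(result)  # [4, 8]
```

Because | is left-associative, the expression is evaluated as (arr | where(...)) | select(...): each stage receives the previous stage's iterator, which is what makes the chaining above possible.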

Unfolding iterables

chain

With nested iterables, one of the harder tasks is flattening them.

from pipe import chain

nested = [[1,2,[3]], [4, 5]]

list(nested | chain)

[1, 2, [3], 4, 5]
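chain removes exactly one level of nesting. The standard library produces the same one-level result with itertools.chain.from_iterable; my understanding is that pipe's chain is a thin wrapper around it, but treat that as an assumption.

```python
from itertools import chain

nested = [[1, 2, [3]], [4, 5]]

# One level of flattening: the inner [3] is yielded unchanged
flat_once = list(chain.from_iterable(nested))
print(flat_once)  # [1, 2, [3], 4, 5]
```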

Even after this, the data is not fully flattened: the inner [3] is still a list. To handle deeply nested data, use traverse.

traverse

traverse flattens nested objects recursively.

from pipe import traverse

nested = [[1,2,[3]], [4, 5]]

list(nested | traverse)

[1, 2, 3, 4, 5]
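The recursion behind this can be sketched in a few lines with a hypothetical helper, flatten_deep (an illustration, not pipe's source). The one subtlety is strings: they are iterable, and a one-character string yields itself, so a naive recursion would never end. The sketch treats str and bytes as leaves; check how your pipe version handles strings before relying on that behaviour.

```python
from collections.abc import Iterable


def flatten_deep(items):
    """Hypothetical helper: recursively yield the leaves of nested iterables."""
    for item in items:
        # Strings are iterable too; treat them as leaves, otherwise a
        # one-character string would recurse forever
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_deep(item)
        else:
            yield item


print(list(flatten_deep([[1, 2, [3]], [4, 5]])))       # [1, 2, 3, 4, 5]
print(list(flatten_deep([["apple", ["orange"]], 4])))  # ['apple', 'orange', 4]
```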

Now let's extract the price from each dict (some prices are lists) and flatten the result:

from pipe import traverse, select

fruits = [
    {"name": "apple", "price": [2, 5]},
    {"name": "orange", "price": 4},
    {"name": "grape", "price": 5}
]

list(fruits
    | select(lambda fruit: fruit["price"])
    | traverse)

[2, 5, 4, 5]

groupby

Sometimes you need to split the items of a list into groups; groupby does that.

from pipe import select, groupby

list(
    (1, 2, 3, 4, 5, 6, 7, 8, 9)
    | groupby(lambda x: "偶数" if x%2==0 else "奇数")  # "偶数" = even, "奇数" = odd
    | select(lambda x: {x[0]: list(x[1])})
)

[{'偶数': [2, 4, 6, 8]}, {'奇数': [1, 3, 5, 7, 9]}]

In the code above, groupby splits the numbers into an even group and an odd group. On its own, groupby produces the following:

[('偶数', <itertools._grouper at 0x10bd54550>),
 ('奇数', <itertools._grouper at 0x10bd4d350>)]

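Next, select turns each (key, group) tuple into a dictionary, where

the first element of the tuple becomes the key;
the second element (the group iterator, materialized with list) becomes the value.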

[{'偶数': [2, 4, 6, 8]}, {'奇数': [1, 3, 5, 7, 9]}]

Cool! To keep only the values greater than 2, we add a where inside select:

from pipe import select, groupby, where

list(
    (1, 2, 3, 4, 5, 6, 7, 8, 9)
    | groupby(lambda x: "偶数" if x%2==0 else "奇数")  # "偶数" = even, "奇数" = odd
    | select(lambda x: {x[0]: list(x[1]
                                   | where(lambda y: y > 2)  # keep values > 2
                                  )
                       }
            )
)

[{'偶数': [4, 6, 8]}, {'奇数': [3, 5, 7, 9]}]
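One detail worth knowing: the standard library's itertools.groupby only merges consecutive items that share a key, so on 1 to 9 it would produce nine alternating groups. The two-group output above suggests that pipe's groupby sorts the input by key before grouping (with Python's default string ordering, "偶数" sorts before "奇数", matching the order shown). The sketch below reproduces both behaviours with the standard library only; the helper parity is hypothetical.

```python
from itertools import groupby


def parity(x):
    # Hypothetical key function: English labels instead of "偶数"/"奇数"
    return "even" if x % 2 == 0 else "odd"


numbers = range(1, 10)

# itertools.groupby alone only merges *consecutive* items with equal keys
unsorted_groups = [(key, list(group)) for key, group in groupby(numbers, parity)]
print(len(unsorted_groups))  # 9

# Sorting by the same key first gives exactly one group per key
sorted_groups = {key: list(group) for key, group in groupby(sorted(numbers, key=parity), parity)}
print(sorted_groups)  # {'even': [2, 4, 6, 8], 'odd': [1, 3, 5, 7, 9]}
```

Each group is converted with list() immediately, because a group iterator becomes unusable once groupby advances to the next key.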

dedup

Removes duplicates from a list, optionally comparing items by a key.

from pipe import dedup

arr = [1, 2, 2, 3, 4, 5, 6, 6, 7, 9, 3, 3, 1]

list(arr | dedup)

[1, 2, 3, 4, 5, 6, 7, 9]

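This may not look new, since Python's built-in set can already remove duplicates. However, dedup can take a key function that decides which items count as duplicates, and it keeps the first occurrence of each in the original order.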

For example, keep one element less than 5 and one element greater than or equal to 5:

from pipe import dedup

arr = [1, 2, 2, 3, 4, 5, 6, 6, 7, 9, 3, 3, 1]

list(arr | dedup(lambda x: x < 5))  # key is True for x < 5, else False; the first item per key is kept

[1, 5]
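The behaviour of dedup can be sketched with a set of keys seen so far, using a hypothetical function dedup_sketch (an illustration, not pipe's source). Unlike set(arr), which does not guarantee any particular order, the sketch yields the first item for each key in the order it appears. Keys are stored in a set, so the key function must return hashable values.

```python
def dedup_sketch(iterable, key=lambda x: x):
    """Hypothetical re-implementation: keep the first item for each key, in order."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:  # first time this key appears
            seen.add(k)
            yield item


arr = [1, 2, 2, 3, 4, 5, 6, 6, 7, 9, 3, 3, 1]
print(list(dedup_sketch(arr)))                     # [1, 2, 3, 4, 5, 6, 7, 9]
print(list(dedup_sketch(arr, key=lambda x: x < 5)))  # [1, 5]
```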

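The key also works on more complex items. Here we deduplicate the fruits by name (keeping the first "orange"), take their counts, and keep only the integer counts:

from pipe import dedup, select, where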

data = [
    {"name": "apple", "count": 2},
    {"name": "orange", "count": 4},
    {"name": "grape", "count": None},
    {"name": "orange", "count": 7}
]

list(
    data
    | dedup(key=lambda fruit: fruit["name"])
    | select(lambda fruit: fruit["count"])
    | where(lambda count: isinstance(count, int))
)

[2, 4]
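
Besides pipe, the plydata library offers similar functionality; if you are interested, see the earlier post "plydata | a >> pipe operator for data manipulation".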

