Fuzzy-matching strings in lists in Python: performance

2023-07-16 19:17, www.caominkang.com

Writing vectorized operations and avoiding explicit loops can significantly improve speed.

Import the necessary packages

from fuzzywuzzy import fuzz
import pandas as pd
import numpy as np

Create a data frame from the first list

# Strings we want to find near-matches for
dataframecolumn = pd.DataFrame(["apple", "tb"])
dataframecolumn.columns = ['Match']

Create a data frame from the second list

# Candidate strings to compare against
pare = pd.DataFrame(["adfad", "apple", "asple", "tab"])
pare.columns = ['pare']

Merge: build the Cartesian product of the two frames by introducing a shared key column

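# A constant key on both sides makes the merge pair every row with every row
dataframecolumn['Key'] = 1
pare['Key'] = 1
bined_dataframe = dataframecolumn.merge(pare, on="Key", how="left")
# Drop pairs where both strings are identical
bined_dataframe = bined_dataframe[~(bined_dataframe.Match == bined_dataframe.pare)]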
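The key-based merge pairs every row of the first frame with every row of the second, then drops the pairs whose two strings are identical. As a rough illustration of the same idea without pandas, the sketch below builds those pairs with `itertools.product` from the standard library. The names `matches`, `candidates`, `all_pairs`, and `pairs` are made up for this example.

```python
from itertools import product

matches = ["apple", "tb"]
candidates = ["adfad", "apple", "asple", "tab"]

# Every (match, candidate) combination: 2 * 4 = 8 pairs.
all_pairs = list(product(matches, candidates))

# Drop pairs whose two strings are identical, like the ~(Match == pare) filter.
pairs = [(m, c) for m, c in all_pairs if m != c]

print(len(all_pairs), len(pairs))
```

In pandas 1.2 and later, `merge(..., how="cross")` should give the same Cartesian product without the helper `Key` column.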

Vectorize the scoring function

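def partial_match(x, y):
    # fuzz.ratio returns an integer similarity score from 0 to 100
    return fuzz.ratio(x, y)

# np.vectorize lets the function take whole columns as arguments;
# it is a convenience wrapper and still calls partial_match once per pair
partial_match_vector = np.vectorize(partial_match)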
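To make the score less of a black box: when the optional python-Levenshtein speedup is not installed, fuzzywuzzy falls back to `difflib.SequenceMatcher`, and `fuzz.ratio` is roughly that matcher's ratio scaled to 0-100. The sketch below reproduces that calculation with the standard library only. `approx_ratio` is a hypothetical helper name, and it is an approximation of the library's behavior, not its actual source.

```python
from difflib import SequenceMatcher

def approx_ratio(a: str, b: str) -> int:
    """Similarity on a 0-100 scale: 100 * 2 * matched_chars / total_chars."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# "apple" vs "asple": 4 matched characters out of 10 in total
print(approx_ratio("apple", "asple"))  # 80
# "tb" vs "tab": 2 matched characters out of 5 in total
print(approx_ratio("tb", "tab"))       # 80
```

Both pairs land exactly on 80, which is why a threshold of 80 has to be inclusive to keep them.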

Apply the vectorized function and keep only the pairs whose score meets the threshold

# Score every remaining pair
bined_dataframe['score'] = partial_match_vector(bined_dataframe['Match'], bined_dataframe['pare'])
# Keep pairs with a score of at least 80
bined_dataframe = bined_dataframe[bined_dataframe.score >= 80]
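For small lists, a similar threshold filter can be sketched with the standard library alone. This is a swapped-in technique, not the fuzzywuzzy approach used here: `difflib.get_close_matches` takes a `cutoff` between 0 and 1, so 0.8 plays the role of the score threshold of 80. The variable names are illustrative.

```python
from difflib import get_close_matches

matches = ["apple", "tb"]
candidates = ["adfad", "apple", "asple", "tab"]

result = {}
for m in matches:
    # Candidates with similarity >= 0.8, best first; n caps how many are returned.
    close = get_close_matches(m, candidates, n=len(candidates), cutoff=0.8)
    # Exclude exact duplicates, as the pandas version does.
    result[m] = [c for c in close if c != m]

print(result)
```

This version still loops in Python, so it is meant to show the thresholding logic rather than to be fast.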

Result

+-------+-----+-------+-------+
| Match | Key | pare  | score |
+-------+-----+-------+-------+
| apple | 1   | asple | 80    |
| tb    | 1   | tab   | 80    |
+-------+-----+-------+-------+

