You can use pandas.Series.str.split just like you would use split normally. Split on the string '::', then index into the list that split creates:
>>> import pandas as pd
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
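If you want to paste this into a script instead of an interactive session, a minimal sketch looks like the code below. The `n=1` argument is an optional addition that the answer above does not use: it stops splitting after the first `'::'`. That leaves the `.str[0]` result unchanged and skips work on the rest of the string.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}
)

# .str.split returns a Series of lists; n=1 (optional, not in the original
# answer) splits at the first '::' only.
parts = df["text"].str.split("::", n=1)

# .str[0] indexes into each list and keeps the vendor part.
df["text_new"] = parts.str[0]

print(df)
```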
Here's a non-pandas solution:
>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new  text_new1
0  vendor a::ProductA  vendor a   vendor a
1  vendor b::ProductA  vendor b   vendor b
2  vendor a::Productb  vendor a   vendor a
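A close plain-Python variant swaps in `str.partition`, which splits only at the first `'::'` and always returns a 3-tuple `(head, sep, tail)`, so `[0]` is the text before the separator. This is an alternative of my own, not part of the answer above. The sketch also shows one practical difference between the two approaches: the list comprehension calls a string method on every element and raises on a missing value, while the `.str` accessor returns NaN for missing values.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}
)

# partition splits at the first '::' only; element [0] is the vendor part.
df["text_new1"] = [x.partition("::")[0] for x in df["text"]]
print(df["text_new1"].tolist())

# With a missing value, x.partition would raise AttributeError inside the
# comprehension, whereas the .str accessor yields NaN for that row.
s = pd.Series(["vendor a::ProductA", None])
result = s.str.split("::").str[0]
print(result.tolist())
```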
EDIT: Here's a step-by-step breakdown of what's happening in the pandas solution above:
# Select the pandas.Series object you want
>>> df['text']
0    vendor a::ProductA
1    vendor b::ProductA
2    vendor a::Productb
Name: text, dtype: object

# The pandas.Series.str accessor lets us apply "normal" string methods
# (like split) to each element of a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>

# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0    [vendor a, ProductA]
1    [vendor b, ProductA]
2    [vendor a, Productb]
Name: text, dtype: object

# Using the pandas.Series.str accessor again, we can index into
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>

# Now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0    vendor a
1    vendor b
2    vendor a
Name: text, dtype: object
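If you also want the part after `'::'`, `str.split` accepts `expand=True`, which turns the lists from the split step into the columns of a new DataFrame (labelled `0` and `1`) instead of a Series of lists. The sketch below assumes you want both parts; the column names `vendor` and `product` are hypothetical choices for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}
)

# expand=True returns a DataFrame instead of a Series of lists;
# n=1 splits at the first '::' only, so there are at most two columns.
split_cols = df["text"].str.split("::", n=1, expand=True)

# Hypothetical column names chosen for this example.
df["vendor"] = split_cols[0]
df["product"] = split_cols[1]
print(df)
```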
I'd recommend checking out the pandas.Series.str docs or, better yet, Working with Text Data in pandas.