Pandas str.replace with regex doubles results? [duplicate]

Let’s say I have this pandas Series:

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]))'
0    1
1    2
2    3
3    4
dtype: object

I’d like to “wrap” the strings “1”, “2”, “3”, “4” so that each is prefixed with “a” and suffixed with “b”; that is, I want to get “a1b”, “a2b”, “a3b”, “a4b”. So I tried Series.str.replace (https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html):

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.*)", r"a\1b", regex=True))'
0    a1bab
1    a2bab
2    a3bab
3    a4bab
dtype: object

So I did get “1” wrapped into “a1b”, but then “ab” is appended one more time?

(Trying this regex on regex101.com, I noticed that I get the same “ghost copies” of “ab” at the end if the g flag is enabled, so maybe pandas .str.replace somehow enables it? But the docs say the default for pandas .str.replace is flags=0.)

How can I get the entire contents of a column cell “wrapped” in only those characters that I want?

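(.*) can also match the empty string, so after it consumes the whole value it matches once more, with nothing, at the end of the string, and that empty match is replaced with “ab” as well. .str.replace replaces every match by default (n=-1), which behaves like the g flag on regex101. Change (.*) to (.+), which requires at least one character: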

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.+)", r"a\1b", regex=True))'
0    a1b
1    a2b
2    a3b
3    a4b
dtype: object
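
The doubled “ab” can be reproduced without pandas. The sketch below uses only the standard-library re module, on the assumption that .str.replace on an object-dtype Series hands each string to re.sub (other string dtypes may use a different regex engine). The variable names are made up for illustration.

```python
import re

value = "1"

# (.*) finds two matches: the whole string, then an empty match at the end.
matches = re.findall(r"(.*)", value)

# re.sub replaces both matches, so the replacement text shows up twice
# (this is the behaviour of Python 3.7 and later).
doubled = re.sub(r"(.*)", r"a\1b", value)

# (.+) needs at least one character, so the trailing empty match is skipped.
wrapped = re.sub(r"(.+)", r"a\1b", value)

print(matches, doubled, wrapped)
```

The first replacement turns “1” into “a1b”; the second turns the empty match into “ab”, which is where the extra copy comes from.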

A possible solution:

import pandas as pd
s = pd.Series(range(1, 5))
'a' + s.astype(str) + 'b'  # element-wise string concatenation, no regex involved

Output:

0    a1b
1    a2b
2    a3b
3    a4b
dtype: object
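
One difference worth knowing: concatenation wraps every value, including empty strings, while a (.+) regex leaves empty strings untouched. If you want to stay with a regex and still wrap empty strings, anchoring the pattern or limiting the replacement count are two options. Below is a minimal sketch with the standard-library re module; the helper names are hypothetical, and in pandas the count would presumably correspond to the n parameter of .str.replace.

```python
import re

def wrap_anchored(text):
    # ^ and $ pin the match to the whole string, so there is no room
    # for a second, empty match after the first one.
    return re.sub(r"^(.*)$", r"a\1b", text)

def wrap_once(text):
    # count=1 stops re.sub after the first replacement.
    return re.sub(r"(.*)", r"a\1b", text, count=1)

def wrap_nonempty(text):
    # (.+) never matches an empty string, so "" is returned unchanged.
    return re.sub(r"(.+)", r"a\1b", text)

for sample in ["1", ""]:
    print(repr(sample), wrap_anchored(sample), wrap_once(sample), repr(wrap_nonempty(sample)))
```

Which variant fits depends on whether an empty cell should become “ab” or stay empty.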
