I want to remove URLs from a column in a DataFrame. The column I am interested in is called comment, and an example entry in comment is:
|comment |
|:--------------------------------------:|
| """Drone Strikes Up 432 Percent Under Donald Trump"" by Joe Wolverton, II, J.D. <a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a><br/>""Trump is weighing major escalation in Yemen's devastating war<br/>The war has already killed at least 10,000, displaced 3 million, and left millions more at risk of famine.""<br/>" |
The entry above shows the issue I am trying to solve. I want to completely remove:
<a href=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" title=""https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-strikes-up-432-percent-under-donald-trump"" target=""_blank"">https://www.thenewamerican.com/usnews/foreign-policy/item/25604-drone-st...</a>
I’ve tried:
df['comment'] = df['comment_body'].replace(r'https\S+', ' ', regex=True).replace(r'www\S+', ' ', regex=True).replace(r'http\S+', ' ', regex=True)
However, this leaves me with:
href title targetblank com
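The leftover `href title targetblank com` fragments suggest that only the URL strings are being matched, while the surrounding `<a ...>` markup survives. One possible approach, sketched here under assumptions, is to remove the whole anchor element first and then catch any bare URLs. The sample rows and `example.com` addresses are hypothetical stand-ins for the real data, and `comment_body` is assumed to be the source column, as in the attempt above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data.
df = pd.DataFrame({
    "comment_body": [
        'J.D. <a href="https://example.com/item" '
        'title="https://example.com/item" '
        'target="_blank">https://example.com/it...</a>'
        '<br/>"Trump is weighing"<br/>',
        "see www.example.com/page now",
    ]
})

# Whole anchor element, from "<a" through "</a>".
# (?is): ignore case and let "." span newlines; ".*?" is non-greedy so
# several links in one comment are removed one at a time.
anchor = r"(?is)<a\b[^>]*>.*?</a>"

# Bare URLs that are not wrapped in a tag.
bare_url = r"(?i)\b(?:https?://|www\.)\S+"

df["comment"] = (
    df["comment_body"]
    .str.replace(anchor, " ", regex=True)
    .str.replace(bare_url, " ", regex=True)
)
```

This leaves the `<br/>` tags in place. A regex is only a rough tool for HTML, so if the comments contain more varied markup, an HTML parser may be a more robust choice.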