I'm not good at writing regex, so I asked ChatGPT for help, but nothing it has given me works. I think I need an expert's help with this, so I'm hoping someone here can offer a solution.
Let me briefly explain my situation. I have a markdown JS library that converts URLs to HTML links, but it doesn't support URLs of the form google.com and www.google.com because these two formats don't include the protocol. So I need to identify these two URL formats with a regex and then replace them with something like https://google.com and https://www.google.com so that the library can convert them to HTML links.
The regex provided by ChatGPT either matches the domain part of, say, [email protected], or matches nothing at all, or, when the URL is wrapped in quotes, parentheses, or brackets, includes those characters in the match, as in "google.com". Sometimes it also matches the part before the @ in email addresses.
Here’s my code:
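function matchURLs(text) {
  // First alternative: URLs starting with http://, https://, or www.
  // Second alternative: bare domains, skipped when an @ appears later on the line.
  const regex = /\b(?:(?:https?:\/\/|www\.)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(?:\/\S*)?|(?!.*@)[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:\/\S*)?)\b/g;
  const matches = text.match(regex);
  if (matches?.length > 0) {
    matches.forEach(match => {
      // Prepend the protocol only when the match does not already have one.
      if (!match.startsWith('http')) {
        text = text.replace(match, `https://${match}`);
      }
    });
  }
  return text;
}
matchURLs(text)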
These are the strings I’m passing to the function to match:
links like "google.com" are working
links "http://google.com" are working
links "https://google.com" are working
links "www.google.com" are working
links "http://www.google.com" are working
links "https://www.google.com" are working
email "[email protected]" are working
[email protected]
customer/[email protected]
[email protected]
this.is.a.very.long.email.address@subdomain.of.a.really.long.domain.name.com
And the output is this:
[
"google.com",
"http://google.com",
"https://google.com",
"www.google.com",
"http://www.google.com",
"https://www.google.com",
"google.com",
"exa4mple.com",
"exa5mple.com",
"mail.server",
"exa7mple.com",
"subdomain.of",
"a.really",
"long.domain",
"name.com"
]
Can someone please help me improve this regex so that it matches only the URLs? It should match just the first 6 strings, and it must not match email addresses or the domain part inside them.
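For reference, here is a rough sketch of the behavior I'm after, not a finished solution. It assumes the runtime supports regex lookbehind (ES2018), and the function name `addProtocol` and the `example.com` addresses are made up for illustration. The idea is to refuse a match that starts right after `@`, `.`, `/`, or a word character, and to refuse one whose following characters run straight into an `@`. It would still treat things like file.txt as a domain, so it is not a complete URL grammar.

```javascript
// Hypothetical helper: prepends https:// to bare domains and www. URLs,
// leaving email addresses and URLs that already have a protocol untouched.
function addProtocol(text) {
  const urlPattern = new RegExp(
    // Must not start mid-token, after "@", or after a protocol's "//".
    "(?<![\\w@.\\/:+-])" +
    // Optional protocol (capture group 1).
    "(https?:\\/\\/)?" +
    // One or more dot-separated labels followed by a TLD of 2+ letters.
    "((?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,})" +
    // Reject if the token continues into an "@" (local part of an email).
    "(?![\\w.+-]*@)" +
    // Optional path, stopping at whitespace, quotes, ")" or "]".
    "(\\/[^\\s\"')\\]]*)?",
    "g"
  );
  // Only add the protocol when group 1 did not match.
  return text.replace(urlPattern, (match, protocol) =>
    protocol ? match : `https://${match}`
  );
}

console.log(addProtocol('links like "google.com" are working'));
// links like "https://google.com" are working
console.log(addProtocol('links "http://www.google.com" are working'));
// links "http://www.google.com" are working
console.log(addProtocol('email "user@example.com" are working'));
// email "user@example.com" are working
```

Doing the replacement inside the `replace` callback also avoids calling `text.replace(match, ...)` once per match, which only replaces the first occurrence of each string.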