I want to catch the phone number from a text using regex.
Examples:
I have this regex which finds the phone number very well:
^((\(?\+45\)?)?)(\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2})$
and it catches all the numbers below well.
But I cannot catch the “tel.”, “tlf”, “mobil:”, etc. that may come before the number. Also, if a letter comes right after the last digit, the regex no longer matches the number, but it should.
These examples are not covered:
tel.: +45 09827374, +45 89895867, some kind of text...
mobil: +45 20802020, +45 20802001,
tlf.: +45 5555 1212
tlf: +4567890202Girrafe
If helpful, I found this regex:
'\btlf\b\D*([\d\s]+\d)'
which extracts the number together with the “tlf” prefix and stops before the first letter that follows the digits.
So I tried to combine them and I obtained this but it doesn’t work:
\b(tlf|mobil|telephone|mobile|tel)\b\D*(^((\(?\+45\)?)?)(\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2})$)
Expected output:
- for input: "tel.: +45 09827374, +45 89895867, some kind of text..."
  -> output: "tel.: +45 09827374" and "+45 89895867"
- for input: "mobil: +45 20802020, +45 20802001,"
  -> output: "mobil: +45 20802020" and "+45 20802001" (or "mobil: +45 20802020, +45 20802001" is ok too)
- for input: "tlf +45 5555 1212"
  -> output: "tlf +45 5555 1212"
- for input: "tlf: +4567890202Girrafe"
  -> output: "tlf: +4567890202"
- for input: "+4567890202"
  -> output: "+4567890202"
Can you help me please?
If you want the full match only:
(?:\b(?:tlf|mobile?|tel(?:ephone)?)[.:\s]+)?(?:\(\+45\)|\+45)?\s*\d{2}(?:\s?\d{2}){3}(?!\d)
The pattern matches:
- (?: Start of a non-capture group
- \b A word boundary to prevent a partial word match
- (?:tlf|mobile?|tel(?:ephone)?) Match one of the alternatives
- [.:\s]+ Match 1+ occurrences of either . or : or a whitespace char
- )? Close the non-capture group and make it optional
- (?:\(\+45\)|\+45)? Optionally match either (+45) or +45
- \s*\d{2}(?:\s?\d{2}){3} Match optional whitespace chars and 2 digits, then 3 more pairs of digits, each with an optional whitespace char before it (8 digits in total)
- (?!\d) Negative lookahead, assert not a digit directly to the right
See a regex demo
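As a rough illustration, the pattern above can be tried against the inputs from the question with Python's re module. This is only a sketch: the names PHONE_RE and find_numbers are made up for this example, and the strip() call is an addition to drop the leading whitespace that \s* can pull into a match when there is no +45 prefix.

```python
import re

# Pattern copied from the answer; PHONE_RE is a hypothetical name.
PHONE_RE = re.compile(
    r"(?:\b(?:tlf|mobile?|tel(?:ephone)?)[.:\s]+)?"   # optional label
    r"(?:\(\+45\)|\+45)?\s*\d{2}(?:\s?\d{2}){3}(?!\d)"  # optional +45, 8 digits
)

def find_numbers(text):
    """Return every full match in text, without surrounding whitespace."""
    return [m.group(0).strip() for m in PHONE_RE.finditer(text)]

samples = [
    "tel.: +45 09827374, +45 89895867, some kind of text...",
    "mobil: +45 20802020, +45 20802001,",
    "tlf.: +45 5555 1212",
    "tlf: +4567890202Girrafe",
    "+4567890202",
]
for line in samples:
    print(find_numbers(line))
```

For the first sample this should print the labelled number and the bare second number as two separate list items, which is the shape of output the question asks for.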
You don’t have to use a single regexp: you could match the tel: etc. text first and then just match every phone number, e.g. using GNU awk and POSIX EREs instead of PCREs:
$ awk -v FPAT='[+]45[[:space:]]*[0-9][0-9[:space:]]+[0-9]' '
    match($0,/^(tel|mobil|tlf)\.?:/,a) {
        printf "%s ", a[0]
        for (i=1; i<=NF; i++) {
            print $i
        }
    }
' file
tel.: +45 09827374
+45 89895867
mobil: +45 20802020
+45 20802001
tlf.: +45 5555 1212
tlf: +4567890202
You can do the same with any awk with just a bit more code:
$ awk 'match($0,/^(tel|mobil|tlf)\.?:/) {
    printf "%s ", substr($0,1,RLENGTH)
    while ( match($0,/[+]45[ \t]*[0-9][0-9 \t]+[0-9]/) ) {
        print substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
tel.: +45 09827374
+45 89895867
mobil: +45 20802020
+45 20802001
tlf.: +45 5555 1212
tlf: +4567890202
and I’m sure you could do the same in python, perl, ruby or whatever similar tool you like.
IMO it’s better to have a couple of small, simple regexps in your code than one long, complicated one.
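For what it's worth, a Python version of the same two-step idea might look roughly like this. It is a sketch, not a drop-in replacement: LABEL_RE, NUMBER_RE and extract are names invented here, and the two regexps are straightforward translations of the awk ones (so, like them, the number regexp does not check for exactly 8 digits).

```python
import re

# Step 1: the label at the start of the line (same alternatives as the awk script).
LABEL_RE = re.compile(r"^(?:tel|mobil|tlf)\.?:")
# Step 2: every +45 number; digits may be separated by blanks.
NUMBER_RE = re.compile(r"\+45[ \t]*[0-9][0-9 \t]+[0-9]")

def extract(line):
    """Return the numbers on a labelled line, with the label glued to the first one."""
    label = LABEL_RE.match(line)
    if label is None:
        return []  # like the awk script, unlabelled lines are skipped
    numbers = NUMBER_RE.findall(line)
    if numbers:
        numbers[0] = f"{label.group(0)} {numbers[0]}"
    return numbers

for line in [
    "tel.: +45 09827374, +45 89895867, some kind of text...",
    "tlf: +4567890202Girrafe",
]:
    print("\n".join(extract(line)))
```

Note that a bare "+4567890202" line with no label returns nothing here, matching the awk behaviour; dropping the label check would be the obvious change if such lines should be included.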
Please edit your question to add the expected outcomes for examples.
Try
(?:\b(tlf|mobil|telephone|mobile|tel)\b)?[^\w\n]*((?:\(\+45\)|\+45)?\s*\d{2}\s?\d{2}\s?\d{2}\s?\d{2})(?!\d)
regex101.com/r/XiYYkZ/1
@PM77-1 I edited
@AriadneR. If you don’t want that, you can be more specific about what can come after the tel., like this: regex101.com/r/mW6rPV/1 Or as a single match without groups: regex101.com/r/Ep0nWs/1
works like a charm! thank you so much!