How to distinguish numbers from ip addresses in regular expressions?


For example, consider 5.56 and 183.55.0.144. A pattern like /\d+\.\d+/ matches 5.56, but it also matches 183.55 and 0.144. Is there a way for a regexp to match only standalone numbers and not parts of IP addresses? I tried to use lookaheads, but I can't figure out what form they should take.
Here's a set of examples I try with:
<some_text> 5.56 <some_text>
<some_text> 183.55.0.144 <some_text>
4544445555.6877878487874
1.75.
How can I match 1, 3 and 4 without matching parts of 2?
I tried something like:
\d+\.\d+
\d+\.\d(?!\.)
(?<!\.)\d+\.\d+ (very close...)
(?<!\.)\d+\.\d(?!\.)
(?!(?:[0-9]{1,3}\.){3}[0-9]{1,3})\d+\.\d+(?!\.)\d+ (very close)
\d+\.(?!(?:\.\d+){2})\d+
(?<!(?:\.\d){2})\d+\.(?!(?:\.\d+){2})\d+
And many different forms like these.
Google gives something like these:
[-+]?[0-9]*\.?[0-9]+ or [+-]?[0-9]+[.][0-9]*([e][+-]?[0-9]+)?
And many other variations, but they all match parts of the IP address.
P.S. Sorry for my bad English.

Solution by @Andy Ray: [^\.0-9](\d+(\.\d+)?)[^\.0-9]
Matches in next cases:
<some_text> 5.56 <some_text>
<some_text> 183.55.0.144 <some_text>
4544445555.6877878487874
1.75.
127.0.0.1
555 (Doesn't match if there is nothing at the end of a number )
Mine: (?<!\.)(?:\d+\.)(?!\d+\.\d+)\d+
<some_text> 5.56 <some_text>
<some_text> 183.55.0.144 <some_text>
4544445555.6877878487874
1.75.
127.0.0.1
555
Another way is to replace all IP addresses with some magic placeholder, run your number regex, then put the addresses back (c) @Andy Ray
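The placeholder idea can be sketched in JavaScript roughly as follows. This is a minimal sketch, not Andy Ray's code: the function name extractDecimals and the loose ipLike pattern (four dot-separated groups of 1 to 3 digits, with no range check) are assumptions made for illustration. Because only the numbers are extracted here, the IP-like tokens are blanked out rather than restored afterwards.

```javascript
// Hypothetical helper: blank out IP-like tokens first, then match decimals.
function extractDecimals(text) {
  // Loose IPv4 shape (an assumption): groups are not checked to be <= 255.
  const ipLike = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g;
  const withoutIps = text.replace(ipLike, " ");
  // With IP-like tokens gone, the naive pattern no longer hits their parts.
  return withoutIps.match(/\d+\.\d+/g) || [];
}

console.log(extractDecimals("<some_text> 5.56 <some_text>"));         // [ '5.56' ]
console.log(extractDecimals("<some_text> 183.55.0.144 <some_text>")); // []
console.log(extractDecimals("1.75."));                                // [ '1.75' ]
```

If the original text is needed afterwards, the addresses would have to be stored and substituted back, which this sketch does not do.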

Another option for the example data might be:
(?<!\S)\d+\.\d+\b(?!\.\d)
Explanation
(?<!\S) Negative lookbehind, assert a whitespace boundary to the left
\d+\.\d+\b Match 1+ digits . 1+ digits
(?!\.\d) negative lookahead, assert not . followed by a digit to the right
See a regex101 demo.
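As a quick check in JavaScript, the pattern above can be run against the question's examples. This is only a sketch: the names decimalNotIp and findDecimals are made up for illustration, and the lookbehind requires an engine with ES2018 support.

```javascript
// Pattern copied from the answer above; lookbehind needs ES2018+.
const decimalNotIp = /(?<!\S)\d+\.\d+\b(?!\.\d)/g;

// Hypothetical helper: return every match, or an empty array if none.
function findDecimals(text) {
  return text.match(decimalNotIp) || [];
}

console.log(findDecimals("<some_text> 5.56 <some_text>"));         // [ '5.56' ]
console.log(findDecimals("<some_text> 183.55.0.144 <some_text>")); // []
console.log(findDecimals("4544445555.6877878487874"));             // [ '4544445555.6877878487874' ]
console.log(findDecimals("1.75."));                                // [ '1.75' ]
```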

Using a library to match or verify IP addresses, so that such strings can be discarded, is the reliable way. But let's try this one for your purpose:
/(?<!\.[0-9])(?<!\.) ([0-9]+)\.([0-9]+) (?![0-9]*\.[0-9])/x;
It works as asked in the question, and works for a number of other cases that I tested with.
With the requirements articulated in comments, we can go the "safe" way and build an alternation pattern with all possibilities. The comments say:
In my case there is no need for floats like ".75" or "+50"; they are only shown in examples.
Regarding the surrounding symbols: they are text, space, period at the end, or nothing.
First, the regex itself
(?: ^ | [a-zA-Z\s] ) ([0-9]+)\.([0-9]+) (?: [a-zA-Z\s] | (?:\.(?:[^0-9]|$)) | $ )
In a Perl program for testing, and laid out for easier reading
use warnings;
use strict;
use feature 'say';
my @tt = ( # test strings from the question
'<some_text> 5.56 <some_text>',
'<some_text> 183.55.0.144 <some_text>',
'4544445555.6877878487874',
'1.75.'
);
push @tt, @ARGV; # add strings if given on command line
for (@tt) {
say "--- $_"; # print current test-string
say for # print captures, one per line
m{ (?: ^ | [a-zA-Z\s] ) # beginning of string or letter/space
([0-9]+) \. ([0-9]+) # decimal number as expected, nums captured
(?: [a-zA-Z\s] # letter/space
| (?: \. # or period followed by
(?: [^0-9]|$)) # non-number or end-of-string
| $ ) # or end-of-string
}xg;
}
# The other approach
#for (@tt) {
# say for /(?<!\.[0-9])(?<!\.) ([0-9]+)\.([0-9]+) (?![0-9]*\.[0-9])/xg;
#}
I run this with an additional test-string, as
> perl prog.pl "a 0.23 is not .230 nor 12.23. But 22.33.44 is a-no"
and that prints
--- <some_text> 5.56 <some_text>
5
56
--- <some_text> 183.55.0.144 <some_text>
--- 4544445555.6877878487874
4544445555
6877878487874
--- 1.75.
1
75
--- a 0.23 is not .230 nor 12.23. But 22.33.44 is a-no
0
23
12
23
Matched are 5.56, 4544445555.6877878487874, and 1.75 from the question, and 0.23 and 12.23 from the input string, with the pair of numbers comprising each float captured and printed.
If the whole float is to be captured instead, change ([0-9]+)\.([0-9]+) to ([0-9]+\.[0-9]+).
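Since the question is tagged javascript, here is a rough port of the alternation pattern. JavaScript has no /x flag, so the whitespace and comments are removed; the names alternationRe and captureDecimalParts are made up for this sketch.

```javascript
// The Perl pattern above, with the /x layout collapsed for JavaScript.
const alternationRe =
  /(?:^|[a-zA-Z\s])([0-9]+)\.([0-9]+)(?:[a-zA-Z\s]|\.(?:[^0-9]|$)|$)/g;

// Hypothetical helper: return [integerPart, fractionPart] for each match.
function captureDecimalParts(text) {
  return [...text.matchAll(alternationRe)].map((m) => [m[1], m[2]]);
}

console.log(captureDecimalParts("<some_text> 183.55.0.144 <some_text>"));
// []
console.log(captureDecimalParts("a 0.23 is not .230 nor 12.23. But 22.33.44 is a-no"));
// [ [ '0', '23' ], [ '12', '23' ] ]
```

One caveat of this form: the delimiter after a match is consumed, so in a string like "1.2 3.4" the second number has no leading delimiter left to match and is skipped. The lookaround variant does not consume its context and avoids that.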

You can use lookarounds in conjunction with word-boundaries:
\b(?<!\d\.)\d+\.\d+\b(?!\.\d)
or, in place of the last word-boundary, making the dot optional in the lookahead:
\b(?<!\d\.)\d+\.\d+(?!\.?\d)
or, where possessive quantifiers are available (Perl, for example; JavaScript does not support them), using one:
\b(?<!\d\.)\d+\.\d++(?!\.\d)
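The second variant (optional dot in the lookahead) runs as-is in JavaScript engines with lookbehind support. A small sketch, with the names boundaryDecimal and findWithBoundaries invented for illustration:

```javascript
// Second variant from above; the lookbehind needs ES2018+.
const boundaryDecimal = /\b(?<!\d\.)\d+\.\d+(?!\.?\d)/g;

// Hypothetical helper: return all matches, or an empty array.
function findWithBoundaries(text) {
  return text.match(boundaryDecimal) || [];
}

console.log(findWithBoundaries("a 0.23 is not .230 nor 12.23. But 22.33.44 is a-no"));
// [ '0.23', '12.23' ]
console.log(findWithBoundaries("127.0.0.1")); // []
```

The (?!\.?\d) lookahead does double duty: it rejects a following ".digit" (another dotted group) and also stops \d+ from backtracking to a shorter fraction.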

Related

Javascript regex to match non repeating 9 digit number [duplicate]

I want to find 10 digit numbers with no repeat digits, for example:
1123456789 //fail, there are two 1's
6758951230 //fail, there are two 5's
6789012345 //pass, each digit occurs once only.
At the moment I am using a regex, but it only matches 10-digit numbers (it doesn't check for duplicates). I am using this regex:
[0-9]{10}
Can this be done with regex or is there a better way to achieve this?

This regex works:
^(?!.*(.).*\1)\d{10}$
This uses an anchored negative look ahead with a back reference to assert that there are no repeating characters.
See a live demo working with your examples.
In java:
if (str.matches("^(?!.*(.).*\\1)\\d{10}"))
// number passes
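The same check in JavaScript might look like the sketch below. The function names are hypothetical, and the Set-based version is added only as a non-regex cross-check, not something taken from the answer.

```javascript
// Same pattern as above; the explicit $ is needed because test() is unanchored.
const uniqueTenDigits = /^(?!.*(.).*\1)\d{10}$/;

// Hypothetical helper wrapping the regex.
function hasTenUniqueDigits(s) {
  return uniqueTenDigits.test(s);
}

// Non-regex alternative: exactly 10 digits and 10 distinct characters.
function hasTenUniqueDigitsSet(s) {
  return /^\d{10}$/.test(s) && new Set(s).size === 10;
}

console.log(hasTenUniqueDigits("1123456789")); // false
console.log(hasTenUniqueDigits("6758951230")); // false
console.log(hasTenUniqueDigits("6789012345")); // true
```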

Try this one (?:([0-9])(?!.*\1)){10}, this will work if you're validating numbers one at a time.
This should work, (?:([0-9])(?!\d*\1)){10}, to search for each occurrence of a unique 10-digit sequence, but it will fail with 12345678901234567890: it will find the last valid part, 1234567890, instead of ignoring it.
Source and explanations: https://stackoverflow.com/a/12870549/1366360

Here's a short and efficient regex, with less backtracking due to the lazy quantifier (the ?).
Works for any length of input:
!/(.).*?\1/.test(number)
Examples:
!/(.).*?\1/.test(1234567890) // true
!/(.).*?\1/.test(1234567490) // false - note that it also works for repeated chars which are not adjacent.
Demo
- checks for repeated digits
- opposite of what you want, because rubular doesn't allow a !

lancemanfv regex reference https://stackoverflow.com/a/12870549/1366360 is a great one, but the suggested regex is slightly off.
Instead try
^(?:([0-9])(?!.*\1)){10}$
This will match any string that consists of exactly 10 digits that are all different.
If you want to check whether a longer string contains a 10-digit number with all digits different (and extract it), use this
((?:([0-9])(?!.*\2)){10})*
You can then use a numbered reference to extract the matching number.

Works every time (I see this question) -
Revised to define group 10 before the (?! \10 ) assertion. \1-\9 are always considered backrefs (for \10 and above, the parentheses must appear before the reference).
So I made them all the same as well.
Note: this can be used to find a floating (substring) number of 10 unique digits. It requires no anchors.
FYI: with Perl, the \g{#} (or \k'name') syntax could be used before the group is defined, no matter what the group number is.
# "(?:((?!\\1)1)|((?!\\2)2)|((?!\\3)3)|((?!\\4)4)|((?!\\5)5)|((?!\\6)6)|((?!\\7)7)|((?!\\8)8)|((?!\\9)9)|((?!\\10)0)){10}"
(?:
( # (1)
(?! \1 )
1
)
| ( # (2)
(?! \2 )
2
)
| ( # (3)
(?! \3 )
3
)
| ( # (4)
(?! \4 )
4
)
| ( # (5)
(?! \5 )
5
)
| ( # (6)
(?! \6 )
6
)
| ( # (7)
(?! \7 )
7
)
| ( # (8)
(?! \8 )
8
)
| ( # (9)
(?! \9 )
9
)
| ( # (10)
(?! \10 )
0
)
){10}

Regex that allows a single whitespace in the middle but with character limit.

Excuse my ignorance, but I really need help with this. I need this regex: [A-Za-z0-9]+\s?[A-Za-z0-9]+ (a username that allows a single whitespace in the middle, but not at the beginning or at the end), but with the total number of characters limited to a minimum of 3 and a maximum of 30.
I have tried to adapt this answer using negative lookaheads, but so far it is not working.
It has to be a regex, it can't use jQuery or anything else.

You may use a positive lookahead here:
^(?=.{3,30}$)[A-Za-z0-9]+(?:\s[A-Za-z0-9]+)?$
See the regex demo.
Details:
^ - start of string
(?=.{3,30}$) - there can be 3 to 30 chars (other than linebreak, replace the . with [A-Za-z0-9\s] to be more specific)
[A-Za-z0-9]+ - 1+ alphanumeric chars
(?:\s[A-Za-z0-9]+)? - an optional (1 or 0) occurrence of:
\s - whitespace
[A-Za-z0-9]+ - 1+ alphanumeric symbols
$ - end of string.
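A short JavaScript sketch of how this pattern behaves; the names usernameRe and isValidUsername are made up for illustration.

```javascript
// Pattern from the answer above: the lookahead enforces the 3 to 30 length,
// the rest enforces "alphanumerics with at most one inner whitespace".
const usernameRe = /^(?=.{3,30}$)[A-Za-z0-9]+(?:\s[A-Za-z0-9]+)?$/;

// Hypothetical helper.
function isValidUsername(name) {
  return usernameRe.test(name);
}

console.log(isValidUsername("username123"));   // true
console.log(isValidUsername("user name123"));  // true
console.log(isValidUsername("user name 123")); // false (two spaces)
console.log(isValidUsername(" username123"));  // false (leading space)
console.log(isValidUsername("ab"));            // false (too short)
```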

You can use:
(?=^[A-Za-z0-9]+\s?[A-Za-z0-9]+$).{3,30}
See a demo on regex101.com. It will match:
username123 # this one
user name 123 # this one not (two spaces!)
user name123 # this one
u sername123 # this one
 username123 # this one not (space in the beginning!)

Can a regular expression match a character at the beginning OR end of the string (but not both)?

I'm writing a regular expression to validate euro currency strings. It allows several different formats, since some locales use decimal points for thousands separators, some use spaces, some put the € at the beginning and some put the € at the end. Here's what I've come up with:
/^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/
This is working for the following tests:
valid:
123 456,78
123.456,78
€6.954.231
€ 896.954.231
16.954.231 €
12 346 954 231€
€10,03
10,03
1,39
,03
0,10
€10567,01
€ 0,01
€1 234 567,89
€1.234.567,89
invalid
1,234
€ 1,1
50#,50
123,#€
€€500
0001
€ ,001
€0,001
12.34,56
123456.123.123456
One problem with this is it validates a string with the euro symbol on both ends, e.g. €123€. This is probably acceptable for my purposes, but is there a way to make a compact RegEx that only allows that character at one end and not both, or do I just have to write one that's twice as long, checking first for a valid string with optional € at the beginning and then a valid string with optional € at the end?
UPDATE
The one in the accepted answer still has a few false positives. I ended up writing a function that takes several options to customize the validator. It's the isCurrency function in this library. Still uses the lookahead to avoid certain edge cases, which was the key to answering this question.

With lookahead this would work
^(?!€*$)(€ ?(?!.*€)(?=,?\d))?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$
See: https://regex101.com/r/aR4xR8/8
@Necreaux deserves the credit for pointing at lookahead first!

Depending on your regex engine you might be able to do this with a negative lookahead.
^€(?!(.*€))

You can use this pattern:
^
(?=(.)) # you capture the first character in a lookahead
(?:€[ ]?)?
(?:
[1-9][0-9]{0,2}
(?:
([ .]) [0-9]{3} (?: \2 [0-9]{3})*
|
[0-9]*
)
(?:,[0-9]{2})?
|
0?,[0-9]{2}
)
(?:
[ ]?
(?!\1)€ # you test if the first character is not an €
)?
$
online demo
The idea is to capture the first character and to test if it isn't the same at the end.
To use it with javascript you need to remove the formatting:
var re = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;
About this approach: its only advantage is brevity. If you want performance, the best way is to write out the two possibilities literally:
var re = /^(?:€ ?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\1[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})|(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?€)?)$/;
It's longer to write, but it reduces the regex engine's work.
With regex engines that support conditional subpatterns like PCRE, you can write this:
\A
(€ ?)?
(?:
[1-9][0-9]{0,2}
(?: ([ .]) [0-9]{3} (?:\2[0-9]{3})* | [0-9]*)
(?:,[0-9]{2})?
|
0?,[0-9]{2}
)
(?(1)| ?€)
\z
Where (?(1)| ?€) is an if..then..else: (?(condition)true|false) that checks if the capture group 1 is defined.

You can split your regex into two parts and combine them with '|':
one for anything starting with € and the other for € at the end.
/(^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?$)|(^\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$)/
Edit:
Sorry, I missed your last sentence.
I think the easiest is to write the regex twice as long.

This is the closest I've been able to come. It uses negative lookahead to make sure that the string doesn't begin and end with the euro symbol €:
^(?!€.*€$)€?\s*(0|[1-9][0-9]{0,2})?([. ]?[0-9]{3})*(,[0-9]{2})?\s*€?$
See Regex 101 Demo here for full explanation and examples. As you can see it passes all of your tests, but it lets a couple of bad ones through. I'm sure the digit portion can be tweaked so that it works for you. The part that makes sure there are not two euro symbols is just this:
^(?!€.*€$)€?\s*<digit validation goes here>\s*€?$
Negative lookahead makes sure the string doesn't start and end with the euro symbol, then it checks for optional euro symbol at start followed by an arbitrary # of spaces, validates the digits, then checks for an arbitrary # of spaces and a euro symbol at the end.
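The template can be tried in JavaScript with a stand-in for the digit validation. In this sketch the helper name euroAtOneEndOnly and the deliberately simplified digit pattern \d+(?:,\d{2})? are assumptions; the full currency rules from the question are not reproduced.

```javascript
// Hypothetical helper: wrap any digit pattern in the "€ at one end only" template.
function euroAtOneEndOnly(digitPattern) {
  return new RegExp("^(?!€.*€$)€?\\s*(?:" + digitPattern + ")\\s*€?$");
}

// Simplified placeholder for the digit part (an assumption, not the real rules).
const simpleEuro = euroAtOneEndOnly("\\d+(?:,\\d{2})?");

console.log(simpleEuro.test("€123"));    // true
console.log(simpleEuro.test("123 €"));   // true
console.log(simpleEuro.test("€123€"));   // false
console.log(simpleEuro.test("€ 10,03")); // true
```

The negative lookahead only rules out the "both ends" case; everything else is still decided by the digit pattern that is plugged in.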

Allowing the RegEx to start with a period or minus sign

I've spent hours trying to find a solution. I have the following RegEx:
(?=.)^(([1-9][0-9]{0,8})|([0-9]))?(\.[0-9]{1,2})?$
I want to add the ability for the first character to match a minus sign but still match the rest of the stated RegEx.
I need these to work:
.0
.34
-.34
-30.0
-33.03
-34
-2
I need these to fail:
-04.4
043
3.
-
$34.33
1234567890.23
(any non-numeric character)
Thank you for your assistance.

You can use this regex:
^-?(?:[1-9][0-9]{0,8}(?:\.[0-9]{1,2})?|\.[0-9]{1,2})$
RegEx Demo
EDIT: If you want to allow 0.45 as valid input then use:
^-?(?:[1-9][0-9]{0,8}(?:\.[0-9]{1,2})?|0*\.[0-9]{1,2})$

Adding the optional -? at the correct place should do the trick.
Also I'm fairly sure you don't need all these capturing groups (see demo here):
^-?(?=.)(?:[1-9][0-9]{0,8}|0)?(?:\.[0-9]{1,2})?$
^-? # optional leading -
(?=.) # followed by at least one character
(?: # non capturing group
[1-9][0-9]{0,8} # number without leading 0
| # or
0 # single 0
)? # integer part is optional
(?:\.[0-9]{1,2})?$ # decimal part
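To try the one-line pattern in JavaScript against the lists from the question, something like the following sketch could be used; signedDecimal and isSignedDecimal are hypothetical names.

```javascript
// One-line form of the pattern explained above.
const signedDecimal = /^-?(?=.)(?:[1-9][0-9]{0,8}|0)?(?:\.[0-9]{1,2})?$/;

// Hypothetical helper.
function isSignedDecimal(s) {
  return signedDecimal.test(s);
}

const shouldPass = [".0", ".34", "-.34", "-30.0", "-33.03", "-34", "-2"];
const shouldFail = ["-04.4", "043", "3.", "-", "$34.33", "1234567890.23"];

console.log(shouldPass.every(isSignedDecimal));          // true
console.log(shouldFail.some(isSignedDecimal));           // false
```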

This should work for you
^(?:-[1-9.]{1}[0-9]*|\.|0\.)\.?[0-9]{0,2}$
Demo

Javascript Regexp for all numeric and decimal point format

I'd like to write a JavaScript validation that accepts all numeric and decimal point formats.
For example :
1,000,000.00 is OK
1.000.000,00 is OK
1.000,000.00 is not OK
1.000,000,00 is not OK
1,000.000,00 is not OK
1,000.000.00 is not OK
Here is what I have so far:
/^[1-9][0-9]{0,2}(,[0-9]{3})*(\.[0-9]{2})?$/ is only valid for 1,000,000.00, not for 1.000.000,00
How can I validate both formats?
Updated :
What if the thousands separators are not compulsory, such as:
1000000.00 is OK or
1000000,00 is OK

Assuming that the decimal part and thousands separators are mandatory, not optional, and that 0 is not an allowed value (as suggested by your examples and your regex):
^[1-9]\d{0,2}(?:(?:,\d{3})*\.\d{2}|(?:\.\d{3})*,\d{2})$
As a verbose regex:
^ # Start of string
[1-9]\d{0,2} # 1-3 digits, no leading zero
(?: # Match either...
(?:,\d{3})* # comma-separated triple digits
\.\d{2} # plus dot-separated decimals
| # or...
(?:\.\d{3})* # dot-separated triple digits
,\d{2} # plus comma-separated decimals
) # End of alternation
$ # End of string
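JavaScript has no verbose mode, so the compact form is what would actually be used. A minimal sketch against the question's examples, with the names separatedNumber and isSeparatedNumber invented for illustration:

```javascript
// Compact form of the verbose regex above.
const separatedNumber = /^[1-9]\d{0,2}(?:(?:,\d{3})*\.\d{2}|(?:\.\d{3})*,\d{2})$/;

// Hypothetical helper.
function isSeparatedNumber(s) {
  return separatedNumber.test(s);
}

console.log(isSeparatedNumber("1,000,000.00")); // true
console.log(isSeparatedNumber("1.000.000,00")); // true
console.log(isSeparatedNumber("1.000,000.00")); // false
console.log(isSeparatedNumber("1,000.000,00")); // false
console.log(isSeparatedNumber("1000000.00"));   // false, separators are mandatory here
```

Because the two separator styles live in separate alternatives, a string cannot mix them, which is what rejects the "not OK" examples.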

Here is the regex that you want..
^(([1-9]\d{0,2}(((\.\d{3})*(,\d{2})?)|((,\d{3})*(\.\d{2})?)))|(0(\.|,)\d{1,2})|([1-9]\d+((\.|,)\d{1,2})?))$
This link shows that it can handle all cases:
http://regexr.com?2tves

The best way to look at a regular expression this big is to blow it up to
a very large font and split it on the alternatives (|)
var s='1,000,000.00';// tests
var result= /(^\d+([,.]\d+)?$)/.test(s) || // no thousand separator
/((^\d{1,3}(,\d{3})+(\.\d+)?)$)/.test(s) || // comma thousand separator
/((^\d{1,3}(\.\d{3})+(,\d+)?)$)/.test(s); // dot thousand separator
alert(result)
Put together, it's a brute:
function validDelimNum2(s){
var rx=/(^\d+([,.]\d+)?$)|((^\d{1,3}(,\d{3})+(\.\d+)?)$)|((^\d{1,3}(\.\d{3})+(,\d+)?)$)/;
return rx.test(s);
}
//tests
var N= [
'10000000',
'1,000,000.00',
'1.000.000,00',
'1000000.00',
'1000000,00',
'1.00.00',
'1,00,00',
'1.000,00',
'1000,000.00'
]
var A= [];
for(var i= 0, L= N.length; i<L; i++){
A.push(N[i]+'='+validDelimNum2(N[i]));
}
A.join('\n')
/* returned values
10000000=true
1,000,000.00=true
1.000.000,00=true
1000000.00=true
1000000,00=true
1.00.00=false
1,00,00=false
1.000,00=true
1000,000.00=false
*/

The simplest (though not most elegant by far) method would be to write analogous RE for another case and join them with 'OR', like this:
/^(([1-9][0-9]{0,2}(,[0-9]{3})*(\.[0-9]{2})?)|([1-9][0-9]{0,2}(\.[0-9]{3})*(,[0-9]{2})?))$/
UPDATE
A little cleaned up version
/^[1-9]\d{0,2}(((,\d{3})*(\.\d{2})?)|((\.\d{3})*(,\d{2})?))$/

You can replace the , and \. with [,.] to accept either in either location. It would also make 1,000.000.00 OK though.
It's harder to make the regexp behave like that in JavaScript when you can't use lookbehinds (they were only added in ES2018):
/^(0|0[.,]\d{2}|[1-9]\d{0,2}((,(\d{3}))*(\.\d{2})?|(\.(\d{3}))*(,\d{2})?))$/
/^ #anchor to the first char of string
( #start group
0 # 0
| # or
0[.,] # 0 followed by a . or ,
\d{2} # 2 digits
| # or
[1-9] #match 1-9
\d{0,2} #0-2 additional digits
( #start group
(,(\d{3}))* # match , and 3 digits zero or more times
(\.\d{2})? # match . and 2 digits zero or one
| # or
(\.(\d{3}))* # match . and 3 digits zero or more times
(,\d{2})? # match , and 2 digits zero or one time
) #end group
) #end group
$/ #anchor to end of string
http://jsfiddle.net/AC3Bm/
