So the only valid digits are arabic numbers but arabic script numbers are not a valid digit?
Some people writing Regex implementations have that opinion. I’ve refrained from saying mine.
If we want programming to be inclusive then doesn’t that make sense to also include the arabic script number?
Maybe. IMO, number tests should be chosen/implemented based on the project’s requirements. If you want to include every Unicode character or string pattern anyone’s ever used to convey a numeric value, that would be a long and growing list. Arguably, it’s impossible: the word “elf” means a number if interpreted as German for “eleven” but not if interpreted as English for 🧝.
Yeah, but “elf” are not digits. Digits are a symbol abstracted from the language itself. Does 5 and V convey different meanings in the context of digits? And yeah, I can see why they would argue about the implementation because inclusivity is important. Especially when designing a language implementation. If you are designing it wrong, it will be very hard to extend it in the future. But for application level implementation, go nuts.
As I said, a digit is a symbol. Much like how we use letters to compose words, digits are used to construct numbers. When you start to repeat or reuse the symbol then it is no longer a singular symbol (what regex \d does). Hence my comments on why arabic script are one of the understandable debates since i18n is a valid concern as much as a11y is.
You are right, “elf” is a stretch, it does not make sense to parse it as a number. But in some languages, the string “15 240,5” is just how a number is written (yes, that’s a U+2009 THIN SPACE, you can’t stop me from using it as a thousand separator in German). Obviously, despite having a , on their numpads, German programmers still expect computers to parse numbers with decimal dots and interpret commas as list values.
Alright, maybe you misunderstood the term digits with numbers. When parsing a digit, you do not attach semantic yet to the building blocks. A \d regex parser does not care that the string “555” is not equivalent to “VVV”. All it cares about is that there is the digit “5” or “V”. In the same vein, regex parser should not try to parse IV as a single symbol.
It’s not just digits. Nobody is expecting it to understand language yet but the parser is-number still returns true for "2e3" or "0x0F". It tells you whether the string can be interpreted as a real numeric value.
Yeah, hence is-“number”. But we were talking about regex are we. A number representation can use digits but it can also not. Much like how you make a number using the word “elf”.
Some people writing Regex implementations have that opinion. I’ve refrained from saying mine.
Maybe. IMO, number tests should be chosen/implemented based on the project’s requirements. If you want to include every Unicode character or string pattern anyone’s ever used to convey a numeric value, that would be a long and growing list. Arguably, it’s impossible: the word “elf” means a number if interpreted as German for “eleven” but not if interpreted as English for 🧝.
Yeah, but “elf” are not digits. Digits are a symbol abstracted from the language itself. Does 5 and V convey different meanings in the context of digits? And yeah, I can see why they would argue about the implementation because inclusivity is important. Especially when designing a language implementation. If you are designing it wrong, it will be very hard to extend it in the future. But for application level implementation, go nuts.
But elf=B=11. Kinda depends on context if 11 is a digit
As I said, a digit is a symbol. Much like how we use letters to compose words, digits are used to construct numbers. When you start to repeat or reuse the symbol then it is no longer a singular symbol (what regex \d does). Hence my comments on why arabic script are one of the understandable debates since i18n is a valid concern as much as a11y is.
You are right, “elf” is a stretch, it does not make sense to parse it as a number. But in some languages, the string “15 240,5” is just how a number is written (yes, that’s a
U+2009 THIN SPACE
, you can’t stop me from using it as a thousand separator in German). Obviously, despite having a,
on their numpads, German programmers still expect computers to parse numbers with decimal dots and interpret commas as list values.I feel like there shoul be an ISO/DIN to define this.
Alright, maybe you misunderstood the term digits with numbers. When parsing a digit, you do not attach semantic yet to the building blocks. A \d regex parser does not care that the string “555” is not equivalent to “VVV”. All it cares about is that there is the digit “5” or “V”. In the same vein, regex parser should not try to parse IV as a single symbol.
It’s not just digits. Nobody is expecting it to understand language yet but the parser
is-number
still returnstrue
for"2e3"
or"0x0F"
. It tells you whether the string can be interpreted as a real numeric value.Yeah, hence is-“number”. But we were talking about regex are we. A number representation can use digits but it can also not. Much like how you make a number using the word “elf”.