We hide and abstract things in programming all the time.
For example, the proper (RFC-compliant) regex for an email is very complicated and often implemented the wrong way.
As somebody who has written a lot of regexes, I'd rather use this library, which has the correct abstraction, than google the regex for http-urls for the 512th time.
This library doesn't help you here though. It doesn't bring you any abstractions on top of regexes (e.g. parsing URLs/emails). This is merely a different syntax for matching parts of a string.
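To make the email point concrete: the full RFC 5322 address grammar is enormous, which is why hand-rolled versions so often get it wrong. A minimal Python sketch of the pragmatic alternative most code settles for; the pattern below is deliberately permissive and illustrative only, not RFC-compliant:

    import re

    # Deliberately permissive check -- NOT RFC 5322 compliant. It only
    # asserts "something@something.something", which is often all an
    # application needs before sending a confirmation mail anyway.
    SIMPLE_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def looks_like_email(address: str) -> bool:
        return SIMPLE_EMAIL.match(address) is not None

    print(looks_like_email("alice@example.com"))  # True
    print(looks_like_email("not an email"))       # False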
I hope no one is seriously suggesting that regular expressions should be used in production programs for tasks such as email address verification and HTTP parsing. Besides being incredibly slow, they are hard to read and inferior to application-specific parsers (for example, some non-RFC-compliant email addresses are actually valid, and dealing with whitespace in HTML is a nightmare).
For the interactive case, the fact that they can be written quickly is what makes them so helpful, and an abstraction library could take that advantage away.
3. Sometimes a regex is faster than paying the overhead of a full parser, so wouldn't the choice depend on context? In other words, regexes are not always slower, true?
4. Wouldn't some abstraction libraries utilize regexes under the hood? Would that be wrong in your view?
P.S. Some languages offer a mode for very readable regexes, e.g. separating each component onto its own line with a comment.
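For instance, Python's re.VERBOSE flag (Perl and others have an /x equivalent) ignores insignificant whitespace in the pattern, so each component can sit on its own line with a comment. A small sketch; the URL pattern here is illustrative, not a complete grammar:

    import re

    # re.VERBOSE ignores insignificant whitespace in the pattern, so each
    # component can be spelled out on its own line with a comment.
    url = re.compile(r"""
        (?P<scheme>https?)    # scheme: http or https
        ://
        (?P<host>[^/\s:]+)    # host name
        (?::(?P<port>\d+))?   # optional port
        (?P<path>/\S*)?       # optional path
    """, re.VERBOSE)

    m = url.match("https://example.com:8080/index.html")
    print(m.group("host"), m.group("port"))  # example.com 8080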
Ah, understood. I was reading "valid" as "well-formed", without knowing whether the address really works (i.e., it could have been deleted), whereas you rightly point out that it more reasonably means "it works." Thank you, that makes sense.
Simply put, some sites don't enforce the full set of RFC rules, so some people actually have non-RFC-compliant email addresses that are valid.
How can you 'compile' a regular expression?
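In most engines, compiling just means translating the pattern string into the engine's internal form (bytecode for a small matching VM, in CPython's case) once up front, so repeated matches skip that step. A minimal sketch with Python's standard re module:

    import re

    # re.compile translates the pattern into the engine's internal form
    # once; pattern.search then reuses it on every call instead of
    # re-parsing the pattern string each time.
    pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

    for line in ("born 1984-06-15", "no date here"):
        print(bool(pattern.search(line)))  # True, then False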
For very simple regular expressions they can be decently fast, but as soon as you start pulling out the more complicated expressions needed for parsing, they get slower. Even simple repeats can carry a lot of overhead if not used correctly; have a look at "Looking Inside The Regex Engine" at http://www.regular-expressions.info/repeat.html. An equivalent parser doesn't need to do any backtracking, and it cares about the structure rather than the surface formatting.

For example, I've seen an application use regular expressions for HTML parsing. After spending a while figuring out what it actually did, I found the source HTML had changed its whitespace, but not the DOM structure, which broke the regular expressions.
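The repeat overhead is easy to demonstrate. A minimal sketch using CPython's backtracking re engine; exact timings vary by machine, but the failure time roughly doubles with each extra character:

    import re
    import time

    # Nested quantifiers like (a+)+ force a backtracking engine to try an
    # exponential number of ways to split the input before it can report
    # that the match fails.
    evil = re.compile(r"^(a+)+$")

    for n in (20, 23, 26):
        text = "a" * n + "!"   # the trailing "!" guarantees failure
        start = time.perf_counter()
        evil.match(text)
        print(n, f"{time.perf_counter() - start:.3f}s")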
As for my reasoning above, I think a lot of 'abstraction' libraries would be faster if they operated directly on the data instead of just compiling everything down to regular expressions. The beauty of regular expressions is the speed at which they can be written.