Wednesday, October 1, 2008

Regular Expressions


Hi, again…

Regular expressions are extremely useful. You can think of them as “wildcards on steroids” (I like that phrase; I found it here: http://www.regular-expressions.info/index.html). They save you a lot of time when you search inside a string and want to return the matches for a pattern, or when you want to do replacements based on those matches. I don’t really remember when I started using regular expressions, but now I use them quite often. So here are some tips I’ve found so far, in JavaScript:

(If you already know a lot about regular expressions, please go directly to my seventh tip; maybe you will find it useful.)

1. You can match a specific character using its hexadecimal index in the character set, e.g. \xA9 matches the copyright symbol in the Latin-1 character set.

This one is really useful when you’re working with Unicode, but there are several things to consider. Keep in mind that Unicode may encode a single grapheme as two code points, e.g. “à” could be stored as “a” followed by a combining grave accent (U+0300), so a regular expression that expects the single character could fail mysteriously, and it could be hard to find out why.
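Here is a small sketch of both points. It assumes an engine that supports String.prototype.normalize (which is newer than this post), and the variable names are only illustrative:

```javascript
// \xA9 is the hexadecimal escape for the copyright sign (U+00A9).
var copyright = /\xA9/;
var hasCopyright = copyright.test("\u00A9 2008 Example");

// The same visible "à" written in two different ways.
var precomposed = "\u00E0"; // one code point
var decomposed = "a\u0300"; // "a" followed by a combining grave accent

// A pattern that expects the single precomposed character.
var pattern = /^\xE0$/;
var matchesPrecomposed = pattern.test(precomposed);
var matchesDecomposed = pattern.test(decomposed); // fails: two code points

// Normalizing to NFC first turns the decomposed form into one code point.
var matchesAfterNormalize = pattern.test(decomposed.normalize("NFC"));
```

Normalizing the input before matching is just one possible way to deal with this; the important part is to know that both forms exist.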

2. Use shorthand character classes when possible, e.g. use \d instead of [0-9], and \w to match any word character (alphanumeric characters plus underscore). The regular expression will be more readable.
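To make the difference in readability concrete, here is a minimal sketch; the sample strings are made up for illustration:

```javascript
var text = "Posted on 2008-10-01";

// The same date pattern, written with explicit ranges and with shorthand.
var longForm = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;
var shortForm = /\d{4}-\d{2}-\d{2}/;

var fromLongForm = text.match(longForm)[0];
var fromShortForm = text.match(shortForm)[0];

// \w matches letters, digits and the underscore, so "=" and "!" split the words.
var words = "user_name42 = ok!".match(/\w+/g);
```

Both patterns find “2008-10-01”, and words ends up holding “user_name42” and “ok”.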

3. Keep in mind that “.” matches (almost) any character. According to the ECMAScript specification, “.” does not match line breaks, and that is how Firefox behaves; IE, however, appears to match line breaks too. This difference is really important to remember. If you need to match any character including line breaks in every browser, use [\s\S] instead of “.”.
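One way to avoid depending on how “.” treats line breaks is the [\s\S] character class, which matches any character at all. A minimal sketch, with a made-up two-line string:

```javascript
var text = "first line\nsecond line";

// "." stops at the line break in engines that follow the specification,
// so this pattern cannot bridge the two lines there.
var dotMatch = text.match(/first.*second/);

// [\s\S] means "whitespace or non-whitespace", i.e. any character,
// including line breaks, regardless of the browser.
var anyMatch = text.match(/first[\s\S]*second/);
var matchedText = anyMatch ? anyMatch[0] : null;
```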

4. Use capture groups if you plan to do something with the results. You define a group using “(” and “)”, as in (q)(u): this matches a q followed by a u and gives you two capturing groups. To refer to them in a replacement string, use $1 and $2. Please note that group indexes begin at 1; index 0 holds the entire match, not a specific group.
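As a quick sketch of capture groups in action, here is a date being rearranged with replace and inspected with exec; the date string and variable names are only for illustration:

```javascript
var date = "2008-10-01";
var pattern = /(\d{4})-(\d{2})-(\d{2})/;

// In the replacement string, $1, $2 and $3 refer to the captured groups.
var swapped = date.replace(pattern, "$3/$2/$1");

// exec returns an array: index 0 is the entire match,
// indexes 1..n are the capture groups.
var result = pattern.exec(date);
var wholeMatch = result[0];
var year = result[1];
var month = result[2];
var day = result[3];
```

Here swapped becomes “01/10/2008”, wholeMatch is the full “2008-10-01”, and year, month and day hold the three groups.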

5. Lazy repetition is useful but it can be expensive, as in <.+?>, which matches any HTML tag (invalid tags included!). If you know the exact set of characters to expect, it is better to use <[^<>]+>; it will perform better than the “.” character.
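The two patterns can be compared side by side. This is only a sketch with made-up input; it shows what each pattern matches, not how fast it runs:

```javascript
var html = "<p>Hello <b>world</b></p>";

// On well-formed markup both patterns find the same four tags.
var lazyTags = html.match(/<.+?>/g);
var negatedTags = html.match(/<[^<>]+>/g);

// With a stray "<" in the text the results differ: the lazy version
// keeps scanning forward until it reaches the first ">".
var messy = "1 < 2 <i>x</i>";
var lazyMessy = messy.match(/<.+?>/g);       // first match is "< 2 <i>"
var negatedMessy = messy.match(/<[^<>]+>/g); // first match is "<i>"
```

So besides avoiding the step-by-step expansion of the lazy quantifier, the negated character class is also stricter about what counts as a tag.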

6. You can create a regular expression in JavaScript in two ways:

a. Using literal notation: /(q)(?=u)(u)/g

b. Using the constructor: var regx = new RegExp("(q)(?=u)(u)", "g");

Keep in mind that with the second way the pattern and the flags are strings, so you need to be careful with special characters like “\”.

E.g.:

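/((ht|f)tp(s?):\/\/)?((w){3}\.)?(\w)+((\.)(([a-zA-Z])+\/?)?)+/g

var reg = new RegExp("((ht|f)tp(s?)://)?((w){3}\\.)?(\\w)+((\\.)(([a-zA-Z])+/?)?)+", "g");

Please note that “\.” in the literal notation becomes “\\.” in the constructor notation (and “\w” becomes “\\w”), because the backslash must itself be escaped inside a string. The “/”, on the other hand, only needs to be escaped in the literal notation.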
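A shorter pattern makes the escaping rule easier to see. In this sketch (the pattern and file name are made up) the literal and the constructor produce the same expression, and the last line shows what happens when the backslashes are not doubled:

```javascript
// The same pattern in both notations.
var literal = /(\w+)\.(\w+)/g;
var built = new RegExp("(\\w+)\\.(\\w+)", "g");
var sameSource = literal.source === built.source;

var fileParts = "photo.jpg".match(built);

// Without doubling, the string literal swallows the backslashes:
// "\w" becomes "w" and "\." becomes ".", so the pattern is (w+).(w+).
var broken = new RegExp("(\w+)\.(\w+)", "g");
var brokenSource = broken.source;
var brokenMatch = "photo.jpg".match(broken); // no run of "w" characters here
```

The broken version does not throw any error; it is simply a different regular expression, which is why this mistake can be hard to spot.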

7. Do yourself a big favor (a really big favor, even if you’re a regular expression guru like my friend Paolo, who is really good with regular expressions!) and consider using this free tool:

http://www.gskinner.com/RegExr/


This online tool is as simple as it is impressive. It is really complete, and the regular expressions created there work quite well in JavaScript; you can build your regular expressions and test them on the fly! (This tool comes to you courtesy of my good friend Hans, who recommended it to me, and now I simply can’t imagine working with regular expressions without it.)

There is also a desktop version located here: http://www.gskinner.com/RegExr/desktop/ (it needs Adobe AIR installed).

Or use this other tool:

http://osteele.com/tools/rework/


It is really impressive too, has a lot of features, and is programmed in JavaScript, but I don’t know why, I prefer the first one :P, maybe because it looks nicer.

So that’s all so far.
