Importing bookmark.htm with Regular Expressions
The one good thing about Netscape is the bookmark.htm file. If you have ever
tried to copy all your URL files to a disk, you know how much longer that takes,
and as for uploading a URL file, forget about it.
Here we will be looking at how to extract the bookmarks from the file.
Getting the href from an anchor tag is quite easy. Here we are also going to
extract the path to the bookmark, which makes things a little more difficult.
The regular expression we use to do this is massive.
Dim r As New System.Text.RegularExpressions.Regex("(HREF=""(?<href>[^""]+)""[\w\W]*?ADD_DATE=""(?<add_date>[^""]+)""[\w\W]*?LAST_VISIT=""(?<last_visit>[^""]+)""[\w\W]*?LAST_MODIFIED=""(?<last_modified>[^""]+)""[\w\W]*?>(?<title>[^<]+)<)|(<H3[\w\W]*?>(?<folder>[^<]+)<)|(</DL>(?<back>[^p]+)p>)")
However, do not worry; it is actually three straightforward expressions joined
together. This is an example of using | (or) to select one of the three
patterns in which we are interested:
- an anchor tag, containing information about the bookmark
- a folder title, indicating entry into a new folder
- the literal </DL><p>, indicating the end of a folder
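
To make the three alternatives easier to see, here is a minimal sketch that builds the same expression from three named pieces and walks the matches, keeping track of the current folder path. The module name, the variable names (anchorPattern, folderPattern, backPattern, path) and the sample markup are illustrative assumptions, not part of the original listing. It also assumes a compiler that supports generics and line continuation with an underscore (VB 2005 or later).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BookmarkWalk
    Sub Main()
        ' Made-up sample in the bookmark.htm layout (assumption, for illustration only).
        Dim nl As String = Environment.NewLine
        Dim html As String = _
            "<DL><p>" & nl & _
            "<DT><H3 ADD_DATE=""1"">News</H3>" & nl & _
            "<DL><p>" & nl & _
            "<DT><A HREF=""http://example.com/"" ADD_DATE=""1"" LAST_VISIT=""2"" LAST_MODIFIED=""3"">Example</A>" & nl & _
            "</DL><p>" & nl & _
            "</DL><p>"

        ' The three alternatives from the article, written out separately.
        Dim anchorPattern As String = _
            "HREF=""(?<href>[^""]+)""[\w\W]*?ADD_DATE=""(?<add_date>[^""]+)""" & _
            "[\w\W]*?LAST_VISIT=""(?<last_visit>[^""]+)""" & _
            "[\w\W]*?LAST_MODIFIED=""(?<last_modified>[^""]+)""" & _
            "[\w\W]*?>(?<title>[^<]+)<"
        Dim folderPattern As String = "<H3[\w\W]*?>(?<folder>[^<]+)<"
        Dim backPattern As String = "</DL>(?<back>[^p]+)p>"

        ' Joining them with | gives the same expression as the single long line.
        Dim r As New Regex("(" & anchorPattern & ")|(" & folderPattern & ")|(" & backPattern & ")")

        ' Folder names from the outermost folder down to the current one.
        Dim path As New List(Of String)()

        For Each m As Match In r.Matches(html)
            If m.Groups("href").Success Then
                ' An anchor tag: report the bookmark with its folder path.
                Console.WriteLine("{0}: {1} -> {2}", _
                    String.Join("/", path.ToArray()), _
                    m.Groups("title").Value, _
                    m.Groups("href").Value)
            ElseIf m.Groups("folder").Success Then
                ' A folder title: step into the new folder.
                path.Add(m.Groups("folder").Value)
            ElseIf m.Groups("back").Success Then
                ' End of a folder: step back out, ignoring the outermost list.
                If path.Count > 0 Then path.RemoveAt(path.Count - 1)
            End If
        Next
    End Sub
End Module
```

Testing which named group succeeded tells you which of the three alternatives produced the match. Note that the anchor alternative, as written, expects every bookmark to carry ADD_DATE, LAST_VISIT and LAST_MODIFIED in that order, and that the match is case-sensitive because no RegexOptions are passed.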
Once you have extracted the bookmarks, you can display them in your own format,
or go that little bit further and check that none of them are dead links.
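
One possible way to check for dead links is sketched below. It is only an assumption about how such a check might look: the helper name IsAlive, the HEAD request and the five-second timeout are illustrative choices, and it relies on the classic System.Net.HttpWebRequest class, which newer .NET versions mark as obsolete.

```vb
Imports System
Imports System.Net

Module LinkCheck
    ' Hypothetical helper: True when the server answers a HEAD request
    ' without an error status, False when the request fails.
    Function IsAlive(ByVal url As String) As Boolean
        Try
            ' TryCast yields Nothing for non-HTTP schemes such as ftp: or file:.
            Dim request As HttpWebRequest = TryCast(WebRequest.Create(url), HttpWebRequest)
            If request Is Nothing Then Return False

            request.Method = "HEAD"   ' ask for headers only, not the page body
            request.Timeout = 5000    ' milliseconds; an arbitrary choice

            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Return CInt(response.StatusCode) < 400
            End Using
        Catch ex As WebException
            ' Raised for timeouts, DNS failures and 4xx/5xx responses.
            Return False
        Catch ex As UriFormatException
            ' The bookmark did not contain a well-formed URL.
            Return False
        Catch ex As NotSupportedException
            ' The URL uses a scheme with no registered handler, such as javascript:.
            Return False
        End Try
    End Function
End Module
```

You would call IsAlive with the value of the href group for each bookmark. Some servers reject HEAD requests, so a False result is a hint that the link deserves a second look rather than proof that it is dead.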