By far the most common task involving URLs is converting relative URLs to absolute ones. The new_abs( ) method does all the hard work:
$abs_url = URI->new_abs(rel_url, base_url);
If rel_url is actually an absolute URL, base_url is ignored. This lets you pass all URLs from a document through new_abs( ), rather than trying to work out which are relative and which are absolute. So if you process the HTML at http://www.oreilly.com/catalog/ and you find a link to pperl3/toc.html, you can get the full URL like this:
$abs_url = URI->new_abs('pperl3/toc.html', 'http://www.oreilly.com/catalog/');
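To make the "pass everything through new_abs( )" idea concrete, here is a minimal sketch. The list of links is made up for illustration (only the first one comes from the text above); it mixes a relative link, a host-relative path, and an already absolute URL, and resolves them all against the same base.

```perl
use strict;
use warnings;
use URI;

my $base = 'http://www.oreilly.com/catalog/';

# Illustrative links, as might be found together in one page.
my @links = (
    'pperl3/toc.html',                 # relative to the base
    '/catalog/pperl3/',                # absolute path on the same host
    'http://www.perl.com/index.html',  # already absolute; the base is ignored
);

# No need to test which links are relative: new_abs() handles both cases.
my @resolved = map { URI->new_abs($_, $base)->as_string } @links;

print "$_\n" for @resolved;
# http://www.oreilly.com/catalog/pperl3/toc.html
# http://www.oreilly.com/catalog/pperl3/
# http://www.perl.com/index.html
```

The third link comes back unchanged, which is what lets the same call be applied to every link in a document.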
Another example:
use URI;
my $base_url = "http://w3.thing.int/stuff/diary.html";
my $rel_url = "../minesweeper_hints/";
my $abs_url = URI->new_abs($rel_url, $base_url);
print $abs_url, "\n";

http://w3.thing.int/minesweeper_hints/
You can even call the canonical( ) method that we discussed earlier on the object that new_abs( ) returns, to get the normalized absolute representation of a URL. So if you're parsing possibly relative, oddly escaped URLs in a document (each in $href, such as you'd get from an <a href="..."> tag), the expression to remember is this:
$new_abs = URI->new_abs($href, $abs_base)->canonical;
You'll see this expression come up often in the rest of the book.
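As a rough sketch of how that expression might sit inside a link-processing loop: the @hrefs values below are hypothetical stand-ins for what an HTML parser would hand you, and the %seen hash is just one possible way to skip duplicates once the URLs are normalized.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URL and href values, for illustration only.
my $abs_base = 'http://w3.thing.int/stuff/diary.html';
my @hrefs = (
    '../minesweeper_hints/',           # relative
    'HTTP://W3.Thing.INT:80/stuff/',   # absolute, but oddly written
    'http://w3.thing.int/stuff/',      # same resource as the previous one
);

my (%seen, @canon);
for my $href (@hrefs) {
    # Resolve against the base, then normalize.
    my $new_abs = URI->new_abs($href, $abs_base)->canonical;

    # URI objects stringify, so they can be used directly as hash keys.
    next if $seen{$new_abs}++;

    push @canon, $new_abs;
    print "$href => $new_abs\n";
}
```

Because the second and third hrefs normalize to the same string, only two URLs end up in @canon.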
Copyright © 2002 O'Reilly & Associates. All rights reserved.