Converting Relative URLs to Absolute (Perl & LWP)

4.4. Converting Relative URLs to Absolute

By far the most common task involving URLs is converting relative URLs to absolute ones. The new_abs( ) method does all the hard work:

$abs_url = URI->new_abs(rel_url, base_url);

If rel_url is actually an absolute URL, base_url is ignored. This lets you pass all URLs from a document through new_abs( ), rather than trying to work out which are relative and which are absolute. So if you process the HTML at http://www.oreilly.com/catalog/ and you find a link to pperl3/toc.html, you can get the full URL like this:

$abs_url = URI->new_abs('pperl3/toc.html', 'http://www.oreilly.com/catalog/');
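To illustrate the point about passing every link through new_abs( ), here is a minimal sketch that resolves a mixed list of relative and absolute links against the same base. The contents of @links (including the www.example.com address) are made-up values for illustration, not links taken from the real page:

```perl
use strict;
use warnings;
use URI;

my $base_url = 'http://www.oreilly.com/catalog/';

# Hypothetical links, as they might appear in the page's HTML.
my @links = (
    'pperl3/toc.html',                    # relative to the base's directory
    '/perl/',                             # relative to the server root
    'http://www.example.com/index.html',  # already absolute
);

# new_abs( ) returns a URI object; as_string turns it into a plain string.
my @abs_urls = map { URI->new_abs($_, $base_url)->as_string } @links;

print "$_\n" for @abs_urls;
```

The first two links are resolved against the base, giving http://www.oreilly.com/catalog/pperl3/toc.html and http://www.oreilly.com/perl/, while the third comes back unchanged because it is already absolute.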

Another example:

use URI;
my $base_url = "http://w3.thing.int/stuff/diary.html";
my $rel_url  = "../minesweeper_hints/";
my $abs_url  = URI->new_abs($rel_url, $base_url);
print $abs_url, "\n";

This prints:

http://w3.thing.int/minesweeper_hints/

You can even call the canonical( ) method that we discussed earlier on the result of new_abs( ), to get the normalized absolute form of a URL. So if you're parsing possibly relative, oddly escaped URLs in a document (each in $href, such as you'd get from an <a href="..."> tag), the expression to remember is this:

$new_abs = URI->new_abs($href, $abs_base)->canonical;

You'll see this expression come up often in the rest of the book.
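As a rough sketch of what that expression does in practice, the following wraps it in a small helper and applies it to two oddly written links. The helper name absolutize, the base URL, and the sample links are all hypothetical, chosen only to show normalization of case, default port, and escapes:

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve $href against $abs_base, then normalize.
sub absolutize {
    my ($href, $abs_base) = @_;
    return URI->new_abs($href, $abs_base)->canonical->as_string;
}

# Illustrative base with an uppercase host and an explicit default port.
my $abs_base = 'http://WWW.Example.COM:80/docs/index.html';

# A relative link with a needlessly escaped "~" (%7e).
my $first  = absolutize('%7ejoe/notes.html', $abs_base);

# An absolute link with an uppercase scheme and a lowercase escape.
my $second = absolutize('HTTP://Example.ORG/a%2fb', $abs_base);

print "$first\n$second\n";
```

Here $first should come out as http://www.example.com/docs/~joe/notes.html: the host is lowercased, the default port is dropped, and %7e is unescaped. $second should come out as http://example.org/a%2Fb: the escaped slash stays escaped, but its hex digits are uppercased.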