The way you submit form data with LWP depends on whether the form uses the GET or the POST method. If it's a GET form, you construct a URL with encoded form data (possibly using the $url->query_form( ) method) and call $browser->get( ). If it's a POST form, you call $browser->post( ) and pass a reference to an array of form parameters. We cover POST later in this chapter.
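To make the difference concrete, here is a small sketch that builds (but does not send) one request of each kind with HTTP::Request::Common's GET and POST functions. As far as we know, LWP::UserAgent's get( ) and post( ) convenience methods pass their arguments to these same functions. The POST address www.example.com/cgi-bin/search is a hypothetical form, used only for illustration.

```perl
#!/usr/bin/perl -w
use strict;
use URI;
use HTTP::Request::Common qw(GET POST);

my @form = ('city' => 'Dulce', 'state' => '', 'zip' => '');

# GET form: the encoded pairs become the URL's query string.
my $url = URI->new('http://www.census.gov/cgi-bin/gazetteer');
$url->query_form(@form);
my $get_req = GET $url;

# POST form (hypothetical URL): the same pairs go in the request body.
my $post_req = POST 'http://www.example.com/cgi-bin/search', \@form;

print $get_req->method,  ' ', $get_req->uri,  "\n";
print $post_req->method, ' ', $post_req->uri, "\n";
print '  body: ', $post_req->content,      "\n";
print '  type: ', $post_req->content_type, "\n";
```

Nothing goes over the network here; printing the request objects is just a way to see where each method puts the form data.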
If you know everything about the GET form ahead of time, and you know everything about what you'd be typing (say, you always search for the name "Dulce"), you know the URL! Because the same data from the same GET form always makes for the same URL, you can just hardcode that:
$resp = $browser->get( 'http://www.census.gov/cgi-bin/gazetteer?city=Dulce&state=&zip=' );
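A hardcoded GET URL is nothing more than the form's action URL plus its encoded name/value pairs. This short sketch uses URI's query_form, called with no arguments, to decode the URL above back into the three form fields it carries.

```perl
#!/usr/bin/perl -w
use strict;
use URI;

my $url = URI->new('http://www.census.gov/cgi-bin/gazetteer?city=Dulce&state=&zip=');
my %form = $url->query_form;   # decode the query string into name => value pairs
print 'Action: ', $url->scheme, '://', $url->host, $url->path, "\n";
foreach my $name (qw(city state zip)) {
  print "  $name = '$form{$name}'\n";
}
```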
And if there is a great big URL in which only one thing ever changes, you could just drop in the value, after URL-encoding it:
use URI::Escape ('uri_escape');
$resp = $browser->get( 'http://www.census.gov/cgi-bin/gazetteer?city='
  . uri_escape($city) . '&state=&zip=' );
Note that you should not simply interpolate a raw unencoded value, like this:
$resp = $browser->get( 'http://www.census.gov/cgi-bin/gazetteer?city='
  . $city .  # wrong!
  '&state=&zip=' );
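The problem with doing it that way is that you have no real assurance that $city's value doesn't need URL encoding. You may "know" that no town name ever needs escaping, but plenty do: "Agua Dulce" contains a space, and an & in a value would be read as the start of the next form field. It's better to escape every value anyway.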
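Here is a sketch of what can go wrong, using the hypothetical town name "Lewis & Clark". It builds the URL both ways, then decodes each one back into form pairs with URI's query_form, roughly as a form handler on the server would.

```perl
#!/usr/bin/perl -w
use strict;
use URI;
use URI::Escape ('uri_escape');

my $city = 'Lewis & Clark';   # hypothetical town name with a space and an ampersand
my $base = 'http://www.census.gov/cgi-bin/gazetteer?city=';
my $raw  = $base . $city . '&state=&zip=';               # wrong!
my $safe = $base . uri_escape($city) . '&state=&zip=';   # city=Lewis%20%26%20Clark
# Decode each URL back into name/value pairs:
my %raw_form  = URI->new($raw)->query_form;
my %safe_form = URI->new($safe)->query_form;
print "raw:  city = '$raw_form{city}'\n";    # the '&' ends the value early
print "safe: city = '$safe_form{city}'\n";   # the full name survives
```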
If you're piecing together the parts of URLs and you find yourself calling uri_escape more than once per URL, then you should use the next method, query_form, which is simpler for URLs with lots of variable data.
The tidiest way to submit GET form data is to make a new URI object, then add in the form pairs using the query_form method, before performing a $browser->get($url) request:
$url->query_form(name => value, name => value, ...);
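Two more details of query_form are handy with real forms: a value given as an array reference repeats the field name once per element (as a multi-select list would), and calling query_form with no arguments returns the current pairs in order. The search URL in this sketch is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;
use URI;

# Hypothetical search form with a field that can take several values.
my $url = URI->new('http://www.example.com/cgi-bin/search');
$url->query_form(
  'city'  => 'Dulce',
  'state' => ['NM', 'TX'],   # an array ref repeats the field name
);
print $url, "\n";   # http://www.example.com/cgi-bin/search?city=Dulce&state=NM&state=TX
# With no arguments, query_form returns the current pairs, in order:
my @pairs = $url->query_form;
print join(', ', @pairs), "\n";   # city, Dulce, state, NM, state, TX
```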
For example:
use URI;
my $url = URI->new( 'http://www.census.gov/cgi-bin/gazetteer' );
my($city,$state,$zip) = ("Some City","Some State","Some Zip");
$url->query_form(
  # All form pairs:
  'city'  => $city,
  'state' => $state,
  'zip'   => $zip,
);
print $url, "\n"; # so we can see it
Prints:
http://www.census.gov/cgi-bin/gazetteer?city=Some+City&state=Some+State&zip=Some+Zip
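Notice that query_form writes each space as +, whereas uri_escape writes it as %20. This sketch builds the same URL both ways from the values above. The strings differ, but decoded back into pairs they carry identical form data, which is what a form handler sees.

```perl
#!/usr/bin/perl -w
use strict;
use URI;
use URI::Escape ('uri_escape');

my ($city, $state, $zip) = ("Some City", "Some State", "Some Zip");
my $base = 'http://www.census.gov/cgi-bin/gazetteer';
# By hand: one uri_escape call per value (spaces become %20).
my $by_hand = URI->new($base . '?city=' . uri_escape($city)
  . '&state=' . uri_escape($state) . '&zip=' . uri_escape($zip));
# With query_form: spaces become '+'.
my $by_form = URI->new($base);
$by_form->query_form('city' => $city, 'state' => $state, 'zip' => $zip);
print "$by_hand\n$by_form\n";
# Decoded, both URLs carry exactly the same name/value pairs.
my @hand_pairs = $by_hand->query_form;
my @form_pairs = $by_form->query_form;
print "same pairs\n" if join("\0", @hand_pairs) eq join("\0", @form_pairs);
```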
From this, it's easy to write a small program (shown in Example 5-1) to perform a request on this URL and use some simple regexps to extract the data from the HTML.
#!/usr/bin/perl -w
# gazetteer.pl - query the US Census Gazetteer database
use strict;
use URI;
use LWP::UserAgent;
die "Usage: $0 \"That Town\"\n" unless @ARGV == 1;
my $name = $ARGV[0];
my $url = URI->new('http://www.census.gov/cgi-bin/gazetteer');
$url->query_form( 'city' => $name, 'state' => '', 'zip' => '');
print $url, "\n";
my $response = LWP::UserAgent->new->get( $url );
die "Error: ", $response->status_line
 unless $response->is_success;
extract_and_sort($response->content);
sub extract_and_sort {  # A simple data extractor routine
  die "No <ul>...</ul> in content"
   unless $_[0] =~ m{<ul>(.*?)</ul>}s;
  my @pop_and_town;
  foreach my $entry (split /<li>/, $1) {
    # Town name in <strong>, followed by its type, e.g. "(cdp)"
    next unless $entry =~ m{^<strong>(.*?)</strong>(.*?)<br>}s;
    my $town = "$1 $2";
    next unless $entry =~ m{^Population \(.*?\): (\d+)<br>}m;
    # Right-justify the population so a string sort orders it numerically
    push @pop_and_town, sprintf "%10s %s\n", $1, $town;
  }
  print reverse sort @pop_and_town;
}
Then run it from a prompt:
% perl gazetteer.pl Dulce
http://www.census.gov/cgi-bin/gazetteer?city=Dulce&state=&zip=
      2438 Dulce, NM (cdp)
       794 Agua Dulce, TX (city)
       136 Guayabo Dulce Barrio, PR (county subdivision)
% perl gazetteer.pl IEG
http://www.census.gov/cgi-bin/gazetteer?city=IEG&state=&zip=
   2498016 San Diego County, CA (county)
   1886748 San Diego Division, CA (county subdivision)
   1110549 San Diego, CA (city)
     67229 Boca Ciega Division, FL (county subdivision)
      6977 Rancho San Diego, CA (cdp)
      6874 San Diego Country Estates, CA (cdp)
      5018 San Diego Division, TX (county subdivision)
      4983 San Diego, TX (city)
      1110 Diego Herna]Ndez Barrio, PR (county subdivision)
       912 Riegelsville, PA (county subdivision)
       912 Riegelsville, PA (borough)
       298 New Riegel, OH (village)
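Because extract_and_sort depends entirely on the page's markup, it can help to exercise it offline against a small hand-written sample. In this sketch the HTML is invented to fit the regexps in Example 5-1 (the real Census pages may have looked different), and the routine's printed output is captured in a string through an in-memory filehandle.

```perl
#!/usr/bin/perl -w
use strict;
# Invented sample markup, shaped to match extract_and_sort's regexps.
my $sample_html = <<'HTML';
<html><body>
<ul>
<li><strong>Agua Dulce, TX</strong>(city)<br>
Population (sample): 794<br>
<li><strong>Dulce, NM</strong>(cdp)<br>
Population (sample): 2438<br>
</ul>
</body></html>
HTML
# Capture what extract_and_sort prints, instead of sending it to STDOUT.
open my $fh, '>', \my $output or die "Can't open in-memory file: $!";
my $old_fh = select $fh;
extract_and_sort($sample_html);
select $old_fh;
close $fh;
print $output;
sub extract_and_sort {  # copied from Example 5-1
  die "No <ul>...</ul> in content"
   unless $_[0] =~ m{<ul>(.*?)</ul>}s;
  my @pop_and_town;
  foreach my $entry (split /<li>/, $1) {
    next unless $entry =~ m{^<strong>(.*?)</strong>(.*?)<br>}s;
    my $town = "$1 $2";
    next unless $entry =~ m{^Population \(.*?\): (\d+)<br>}m;
    push @pop_and_town, sprintf "%10s %s\n", $1, $town;
  }
  print reverse sort @pop_and_town;
}
```

The sample lists the smaller town first, so the output also confirms that the padded string sort puts the larger population on top.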
Copyright © 2002 O'Reilly & Associates. All rights reserved.