Individual Tokens (Perl & LWP)

7.3.1. Checking Image Tags

Example 7-1 complains about any img tags in a document that are missing alt, height, or width attributes:

Example 7-1. Check <img> tags

while(my $token = $stream->get_token) {
  if($token->[0] eq 'S' and $token->[1] eq 'img') {
    my $i = $token->[2]; # attributes of this img tag
    my @lack = grep !exists $i->{$_}, qw(alt height width);
    print "Missing for ", $i->{'src'} || "????", ": @lack\n" if @lack;
  }
}

When run on an HTML stream (whether from a file or a string), this prints one line for each offending tag, something like:

Missing for liza.jpg: height width
Missing for aimee.jpg: alt
Missing for laurie.jpg: alt height width
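
The loop in Example 7-1 takes $stream for granted. As a minimal sketch of a complete run, assuming HTML::TokeParser from CPAN is installed, the check can be wrapped in a subroutine that builds the stream from a string. The name missing_img_attrs and the sample markup are invented for this illustration and are not part of the original example:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return one report line per <img> tag lacking alt, height, or width.
# (missing_img_attrs is a hypothetical helper name for this sketch.)
sub missing_img_attrs {
  my ($html) = @_;
  # Passing a scalar reference makes HTML::TokeParser parse the string itself
  my $stream = HTML::TokeParser->new(\$html)
    or die "Couldn't create a token stream";
  my @report;
  while (my $token = $stream->get_token) {
    next unless $token->[0] eq 'S' and $token->[1] eq 'img';
    my $i = $token->[2]; # attributes of this img tag
    my @lack = grep !exists $i->{$_}, qw(alt height width);
    push @report, "Missing for " . ($i->{'src'} || "????") . ": @lack"
      if @lack;
  }
  return @report;
}

# Sample markup, made up for this sketch
my $sample = <<'END_HTML';
<p><img src="liza.jpg" alt="Liza">
<img src="aimee.jpg" height="100" width="80">
<img src="ok.jpg" alt="fine" height="10" width="10"></p>
END_HTML

print "$_\n" for missing_img_attrs($sample);
```

Returning the report lines instead of printing them inside the loop keeps the check reusable, for instance when several documents are checked in one run.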

Identifying images has many applications: you could make HEAD requests to ensure that the image URLs are valid, or make a GET request to fetch each image and use Image::Size from CPAN to check or insert the height and width attributes.
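
The first of those applications might look roughly like the following sketch. It assumes that HTML::TokeParser, LWP::UserAgent, and URI are installed and that you know the URL the document came from, so that relative src values can be made absolute. The subroutine names img_urls and report_bad_images are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::UserAgent;
use URI;

# Collect the absolute URL of every <img src="..."> in $html.
# $base is the URL the document was fetched from (an assumption of this sketch).
sub img_urls {
  my ($html, $base) = @_;
  my $stream = HTML::TokeParser->new(\$html)
    or die "Couldn't create a token stream";
  my @urls;
  while (my $token = $stream->get_token) {
    next unless $token->[0] eq 'S' and $token->[1] eq 'img';
    my $src = $token->[2]{'src'};
    next unless defined $src and length $src;  # skip img tags with no src
    push @urls, URI->new_abs($src, $base)->as_string;
  }
  return @urls;
}

# HEAD each image URL and report the ones that fail.
# This part needs network access, so it is defined here but not called.
sub report_bad_images {
  my ($html, $base) = @_;
  my $ua = LWP::UserAgent->new(timeout => 15);
  for my $url (img_urls($html, $base)) {
    my $response = $ua->head($url);
    print "Bad image $url: ", $response->status_line, "\n"
      unless $response->is_success;
  }
}
```

Keeping the URL collection separate from the network requests means the parsing half can be tried on a string without touching the network.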

7.3.2. HTML Filters

The following loop acts as a filter: it prints every token's original text unchanged, except that each img tag is replaced by the value of its alt attribute (or by nothing, if it has none):

while (my $token = $stream->get_token) {
  if ($token->[0] eq 'S') {
    if ($token->[1] eq 'img') {
      print $token->[2]{'alt'} || '';
    } else {
      print $token->[4];
    }
  }
  elsif($token->[0] eq 'E' ) { print $token->[2] }
  elsif($token->[0] eq 'T' ) { print $token->[1] }
  elsif($token->[0] eq 'C' ) { print $token->[1] }
  elsif($token->[0] eq 'D' ) { print $token->[1] }
  elsif($token->[0] eq 'PI') { print $token->[2] }
}

So, for example, a document consisting just of this:

<!-- new entry -->
<p>Dear Diary,
<br>This is me &amp; my balalaika, at BalalaikaCon 1998:
<img src="mybc1998.jpg" alt="BC1998!  WHOOO!"> Rock on!</p>

is then spat out as this:

<!-- new entry -->
<p>Dear Diary,
<br>This is me &amp; my balalaika, at BalalaikaCon 1998:
BC1998!  WHOOO! Rock on!</p>
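
To try the filter end to end, the same token handling can be wrapped in a subroutine that takes HTML as a string and returns the filtered copy. This is only a sketch, assuming HTML::TokeParser is installed; the name imgs_to_alt and the sample markup are invented for this illustration:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a copy of $html in which each <img> tag is replaced by its alt text.
# (imgs_to_alt is a hypothetical helper name for this sketch.)
sub imgs_to_alt {
  my ($html) = @_;
  my $stream = HTML::TokeParser->new(\$html)
    or die "Couldn't create a token stream";
  my $out = '';
  while (my $token = $stream->get_token) {
    my $type = $token->[0];
    if ($type eq 'S') {
      # Start-tags: alt text for img, the tag's original source otherwise
      $out .= $token->[1] eq 'img' ? ($token->[2]{'alt'} || '') : $token->[4];
    }
    elsif ($type eq 'E' or $type eq 'PI') { $out .= $token->[2] }
    else { $out .= $token->[1] }  # T, C, and D tokens keep their text in slot 1
  }
  return $out;
}

# Sample markup, made up for this sketch
print imgs_to_alt('<p>Me: <img src="me.jpg" alt="a photo of me"> the end</p>'), "\n";
```

Building the result in a string instead of printing each token makes it easy to write the output to a file or to compare it against the input.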

