Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 83 (Jul 2006)
[Suggested title: ``Progress Bars for Download'']
If you're like me, you spend a lot of time getting things over your net connection, downloading them to your desktop machine (or in my case, my laptop, which is my only machine). One of the things I find myself doing frequently is watching the output of curl as it keeps me up to date on how much has been downloaded, and how much longer it'll take to do the rest.
I recently stumbled across Term::ProgressBar. This CPAN module can draw a labeled progress bar, to show how much of a task has been completed. The bar has a major part drawn with nice = characters, labeled by percentage, and a little flying * that shows the percentage complete within each one of the = steps. The bar is drawn in such a way that successive invocations overwrite the previous one, creating the illusion that it's just ``growing'' as we make progress.
One of the nice features of Term::ProgressBar is that it also notes the times when the update is called, and can give an estimate of how many hours, minutes, and seconds it'll be before the task is complete. This is done automatically without any work on the caller's part, except for requesting the option. It was this particular feature that had me think that I could emulate what curl does during a download with a nice little progress bar. I knew I could hook those values of the download-in-progress with an LWP::UserAgent content callback, and the result is in [listing one, below].
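Before walking through the listing, here's a minimal, standalone sketch of Term::ProgressBar by itself. The label, the count of 100 work units, and the sleep standing in for real work are all made up purely for illustration:

  #!/usr/bin/perl -w
  use strict;

  use Term::ProgressBar;

  # 100 imaginary units of work, labeled "Demo", with the linear time estimate
  my $bar = Term::ProgressBar->new({ name  => 'Demo',
                                     count => 100,
                                     ETA   => 'linear' });

  for my $done (1 .. 100) {
      select(undef, undef, undef, 0.1);   # stand-in for real work (0.1 second nap)
      $bar->update($done);                # redraw the bar at $done out of 100
  }

Run from a terminal, that draws a single bar that grows from 0% to 100%, with the estimated time remaining filled in after the first few updates.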
Line 1 declares my path to Perl, and enables warnings throughout the program. I still use -w instead of use warnings, mostly because I'm lazy and habitual. The problem with using -w is that it enables warnings globally, even for code I didn't write or test. With use warnings, only the files (or smaller lexical scope) in which it appears will have warnings enabled.
Line 2 enables the standard Perl restrictions, disabling arbitrary barewords and symbolic references, and requiring simple variables to be declared lexically.
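As a quick illustration of the two pragmas together (the variable name here is invented): -w on the shebang line turns warnings on everywhere, while the pragma form applies only to the file or block that declares it.

  use strict;      # no barewords, no symbolic refs, variables must be declared
  use warnings;    # warnings for this file (or enclosing block) only

  my $name;
  print "Hello, $name\n";   # warns: Use of uninitialized value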
Lines 4 through 6 pull in the three modules I've installed from CPAN. Term::ProgressBar provides the progress bar described earlier. URI and LWP::UserAgent are part of Bundle::LWP, the useful collection of modules to deal with everything about the web except for CGI.
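If these modules aren't installed yet, something like the following (run from the shell) pulls them down from the CPAN; the exact steps depend on how your CPAN shell is configured:

  perl -MCPAN -e 'install Bundle::LWP'
  perl -MCPAN -e 'install Term::ProgressBar'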
Line 8 creates my virtual user agent in $ua, acting as a client for HTTP transactions. I'll be using this user agent object as I might use a browser, telling it to fetch a particular URL. Various configuration options exist for an LWP::UserAgent object, such as what kind of browser it tells the server it might be; however, I've left all the settings at the default because, yes, I'm lazy.
Note that some servers care about the browser identification, and I might want to go back and reconfigure this user agent to have it pretend to be a certain version of Internet Explorer or Firefox to access certain ``restricted pages''. Yes, the server trusts an arbitrary string sent by the browser, and some sites use that string to control access. How silly.
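For example, pretending to be a different browser is a one-line change via the agent method; the agent string and URL below are made up for illustration:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  $ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");  # lie about who we are
  my $response = $ua->get("http://www.example.com/restricted/");
  print $response->status_line, "\n";   # did the disguise work?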
Lines 10 through 55 loop once for each URL specified on the command line. The loop is exited when @ARGV is finally empty, which happens eventually because the first item of @ARGV is shifted off into $url in line 11. Line 12 shows the URL that we're currently trying to download.
Lines 14 to 18 try to figure out a suitable local filename for the downloaded information. I wanted to emulate curl -O by taking the last component of the path as the name, so I pulled out the URI module to do the parsing.

Line 14 creates a URI object from the requested URL. Line 15 grabs just the path part of the URL. That's the section after the host, but before the optional query string. Line 16 removes everything from the path up to the final (or only) slash. At this point, $path is a candidate for a filename. However, if it's empty (the path ends in a slash, for example), I force it to be download instead in line 17. And finally, because I want it to be a new file, I just add X in front of the name until the file doesn't exist locally in line 18.
Yes, that's a pretty hokey chunk of code there, but it was good enough for the few samples with which I used it.
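To see those steps in isolation, here's the same filename logic applied to a made-up URL:

  use URI;

  my $uri  = URI->new("http://www.example.com/files/report.pdf?version=3");
  my $path = $uri->path;                    # "/files/report.pdf" (no host, no query)
  $path =~ s{.*/}{};                        # "report.pdf"
  $path = "download" unless length $path;   # fallback when the path ends in a slash
  $path = "X$path" while -e $path;          # prepend X until the name is unused
  print "$path\n";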
Once I have a filename, line 20 opens up a handle to that file, using a lexical filehandle and a 3-argument open, which works fine on modern Perl versions, but probably won't work if you haven't upgraded Perl since 1998.
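For contrast, here's the modern style next to the older one it replaces (the filename here is hypothetical):

  use strict;
  use warnings;

  my $path = "download";   # hypothetical filename

  # lexical filehandle plus three-argument open: the mode is its own argument
  open my $outhandle, ">", $path or die "Cannot create $path: $!";
  print {$outhandle} "some data\n";
  close $outhandle;

  # the late-1990s equivalent: a global bareword handle and a two-argument open
  # open OUTHANDLE, ">$path" or die "Cannot create $path: $!";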
Lines 22 to 24 create the progress bar object. I'm selecting a label of Download, which seems appropriate at this point, along with an initial guess at the total size of 1024 bytes. Later, I'll be updating this amount with either a better guess, or the actual byte count as reported by the server. Finally, I'm also enabling estimated-time mode, using linear approximation (the only choice possible).
Line 26 establishes $output. I'll be using this to count the bytes downloaded so far, so I'll start with 0.

Line 27 defines a boolean flag, $target_is_set, initially false. When I've seen a good length from the server, I'll use it as the final target value for the upper bound, and set this to true. This keeps me from having to repeatedly check for the value on each iteration, which seemed wasteful.
Line 28 holds the number of bytes I should see before updating the bar again. On each bar update, I'm told how long to wait before a half-second would have passed in terms of bytes downloaded. By paying attention to this value, I can optimize the number of calls I make to update the bar.
Lines 29 to 50 perform the download, by calling the get method against the user agent. Line 30 defines the desired URL for this request.

Lines 31 to 50 define a content callback. Normally, as the LWP::UserAgent object is fetching the reply, the ``content'' is loaded into the object, and available only when the entire response has been seen (by calling the content method on the response object). However, we can define a callback subroutine which will be called as each chunk is observed from the server.
In this case, as each chunk is observed, we'll get a call to our subroutine beginning in line 31 (an anonymous subroutine is being used here). The subroutine will be passed three values: the chunk of data that has been read ($chunk in line 32), the response object as constructed so far ($response), and the protocol handler object ($protocol). I'm not using the $protocol object at all, but the other two are very important.
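Stripped of the progress bar, a content callback by itself looks something like this (the URL is made up); each chunk arrives as the first argument to the anonymous subroutine:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  $ua->get("http://www.example.com/big-file.tar.gz",
           ":content_cb" => sub {
               my ($chunk, $response, $protocol) = @_;
               print "got ", length($chunk), " more bytes\n";
           });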
Lines 34 to 41 attempt to update the total-bytes target, unless we've already done this for this download (noted because $target_is_set is set). Line 35 reaches into the response object, looking for the content_length header in the web server's response. If that's been provided, we can get a clearer idea of the percentage of the content seen so far.
If we know, we'll set the content length as the target in line 36, noting that we've already done that in line 37. However, for many downloads (especially those created dynamically), the server has no idea how many bytes it will eventually send. So in line 39, I fake up a target that is everything seen so far, plus the chunk we've just seen, plus perhaps one more chunk just like it. It's wrong, but there's no right value anyway, and we keep seeing the value as ``almost there''.
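Restating that logic from the listing, with the arithmetic spelled out (note that $output has not yet been incremented for the current chunk at this point):

  unless ($target_is_set) {
      if (my $cl = $response->content_length) {
          $bar->target($cl);          # the server told us the real total
          $target_is_set = 1;         # don't bother checking again
      } else {
          # no Content-Length: guess "everything seen so far, plus this chunk,
          # plus one more chunk just like it" -- so we always look "almost there"
          $bar->target($output + 2 * length $chunk);
      }
  }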
Once I've updated the target, it's time to actually write the data that has been seen. I update the total bytes seen so far in line 43, and then print the data to the handle in line 44. Actually, I could eliminate the $output variable by using -s on the filehandle every time I need it, since those numbers should be the same. However, that would be making an operating system request repeatedly for information that I can easily calculate, so why not just calculate it?
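That alternative would look like this inside the callback, at the cost of a stat call per chunk (and, an assumption worth noting, it only matches $output exactly if the handle has been flushed):

  use IO::Handle;                    # so we can call methods on the lexical handle
  $outhandle->autoflush(1);          # otherwise buffered bytes wouldn't show up yet
  my $bytes_so_far = -s $outhandle;  # same number as $output, via the filesystem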
Lines 46 to 48 update the bar. Initially, $next_so_far is 0, so we call this method on the first chunk of data we see. That will draw the initial bar with the initial guess of maximum bytes (possibly an accurate value directly from the server), and leave room for a ``time remaining'' value that will be updated after a few more calls. The return value from the update will modify $next_so_far, giving us the suggestion to not call update again until we've seen that many bytes. As mentioned earlier, this is an optimization so that the bar is updated roughly every half second, based on the calls seen previously for this progress bar. I could completely ignore this value, and just call update on each chunk, and the result would be similar, although a lot more output would be generated.
Once the download is complete, the call to get in line 29 returns, and we move on with the next step of the program. I want the bar to read ``100% downloaded'' when I'm done. I know the total length in $output, so I call target in line 52 to say ``yes, this number of bytes is 100%''. I also call update to say ``yes, I've seen exactly this many bytes'' in line 53. And I'm done with that file!
So there you have it: emulating curl's download time and percentage using Term::ProgressBar. Hopefully, you've seen enough to add progress bars to your own applications. Also, check out Tk::ProgressBar and CGI::ProgressBar in the CPAN for graphic and web-based applications, and Smart::Comments for automatically adding progress bars to your loops.
Until next time, enjoy!
LISTING
=1=     #!/usr/bin/perl -w
=2=     use strict;
=3=
=4=     use Term::ProgressBar;
=5=     use URI;
=6=     use LWP::UserAgent;
=7=
=8=     my $ua = LWP::UserAgent->new;
=9=
=10=    while (@ARGV) {
=11=      my $url = shift;
=12=      print "$url:\n";
=13=
=14=      my $uri = URI->new($url);
=15=      my $path = $uri->path;
=16=      $path =~ s{.*/}{};
=17=      $path = "download" unless length $path;
=18=      $path = "X$path" while -e $path;
=19=
=20=      open my $outhandle, ">", $path or die "Cannot create $path: $!";
=21=
=22=      my $bar = Term::ProgressBar->new({ name => 'Download',
=23=                                         count => 1024,
=24=                                         ETA => 'linear'});
=25=
=26=      my $output = 0;
=27=      my $target_is_set = 0;
=28=      my $next_so_far = 0;
=29=      $ua->get
=30=        ($url,
=31=         ":content_cb" => sub {
=32=           my ($chunk, $response, $protocol) = @_;
=33=
=34=           unless ($target_is_set) {
=35=             if (my $cl = $response->content_length) {
=36=               $bar->target($cl);
=37=               $target_is_set = 1;
=38=             } else {
=39=               $bar->target($output + 2 * length $chunk);
=40=             }
=41=           }
=42=
=43=           $output += length $chunk;
=44=           print {$outhandle} $chunk;
=45=
=46=           if ($output >= $next_so_far) {
=47=             $next_so_far = $bar->update($output);
=48=           }
=49=
=50=         });
=51=
=52=      $bar->target($output);
=53=      $bar->update($output);
=54=
=55=    }