Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 13 (Jun 2000)
[Suggested title: Moving your news service]
About a half a year ago in this column I talked about how my ISP was looking at the performance of their news server, and I wrote a program to see just how bad the news service was compared to the other local ISPs using Deja as a baseline. Well, the ISP just got bought out by a big national chain, and they decided not to fight the spotty news service any more, and just convert over to everyone using the conglomerate's big service.
But the problem with moving from one news server to another is that the article numbers are not in sync, so a ``.newsrc'' file will have the right newsgroups, but the wrong ``already read'' marks. And since I read a lot of newsgroups, I don't have time to re-read existing articles, and I don't want to just throw away any new articles because I might miss something.
The solution is a bit complicated and has extensive bookkeeping requirements, but that's what computers are for, and Perl in particular. What you need to do is mark as read any article you've already seen. Messages are uniquely identified by a ``Message ID'', and you can get that mapped into article numbers via the appropriate ``XHDR'' request to the NNTP server.
So, basically, for every subscribed newsgroup, we fetch the message IDs of the last 500 articles from the new server. (500 being the maximum number of unread articles per group I'd care to face in any event.) Then, we fetch the last 1500 or so message IDs from the old server. Then, for every message ID I know about on the new server, I see if I've already read it on the old server, and if so, mark it read in the new newsrc.
The newsrc file is the classic rn format. Most modern newsreaders can import and export this format, so it's a nice least-common-denominator of exchange. And there's a good module or two in the CPAN to deal with this format as well.
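For readers who haven't looked inside one, each newsrc line is a group name, then ``:'' (subscribed) or ``!'' (unsubscribed), then a comma-separated list of article numbers and ranges already read. The following is only a rough sketch of pulling one such line apart; parse_newsrc_line is a hypothetical helper made up for illustration, and the real work in this column is left to News::Newsrc.

```perl
use strict;
use warnings;

# Hypothetical helper: split one newsrc line into
# (group name, subscribed flag, arrayref of [low, high] ranges).
sub parse_newsrc_line {
    my ($line) = @_;
    my ($group, $mark, $list) = $line =~ /^(\S+)([:!])\s*(.*)$/
        or return;
    my @ranges = map {
        my ($lo, $hi) = split /-/, $_, 2;
        [ $lo, defined $hi ? $hi : $lo ];    # a lone number is a one-article range
    } grep { length } split /\s*,\s*/, $list;
    return ($group, $mark eq ':' ? 1 : 0, \@ranges);
}

my ($group, $subscribed, $ranges) =
    parse_newsrc_line("comp.lang.perl.misc: 1-1200,1205,1210-1300");
print "$group subscribed=$subscribed\n";
print join(",", map { "$_->[0]-$_->[1]" } @$ranges), "\n";
```

The sample line is invented; a line that doesn't look like a group entry simply yields an empty list.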
There was one additional requirement, just to make this even more interesting. My newsreading and general information processing is on yet another ISP from where either the old or new news servers are located. So I use ssh tunneling to go back to the shell account machine of the old ISP to get to its news server, and also to get to the new server at the takeover ISP's machine, which permits access only to its customers, so I can't reach it from my computation server ISP. Almost as bad as trying to figure out those spy novels with all the odd names, but most of the time this is transparent to me. However, I had to use ssh tunneling to get to both the old and new news servers, although the program was set up so that it could also run locally on the old ISP shell machine.
It's a mess, but it works. After my ISP converted, I had a fairly nice looking new newsrc with all my previously read articles punched out already. And the program to do this all is in [the listing below], which goes as follows.
Line 2 turns on strict mode - needed for every program that is longer than 10 lines or used more than 10 minutes. In this case, the first applies but not the second, since I hope I'm not changing servers frequently.
Line 3 unbuffers standard output. There's not a lot of output from this program, and I want to see it as it comes along.
Lines 5, 6, and 7 pull in the modules we'll need. Net::NNTP comes from the CPAN, and lets us talk to NNTP servers. News::Newsrc also comes from the CPAN, and provides parsing and updating of ``newsrc''-format files. IO::File is a core module installed with Perl, and lets us have generic filehandles as objects.
Lines 9 through 23 provide the most-likely-to-be-tweaked settable variables. As always, I'm providing my programs not as ``ready-to-run'' robust programs, but as snippets for your own inspiration (steal the ideas, not the code). However, since I'll probably brush the dust off this program in another year or so when this ISP merges with another one, I'll make it easy to remember my thinking by providing a distinct configuration area.
Here, $DST_MAX is the maximum number of unread articles we're willing to tolerate on the new server. You could probably crank this up to 20000 or so if you wanted to be sure to read everything the new server has to offer, but if you have a lot of groups, the bigger numbers will mean slower operations. (I had about 120 subscribed newsgroups, and it took about 10 minutes to process at my value of 500 here, if that means anything.) $SRC_MAX is how many articles to map in the old news server. Because articles arrive in a scrambled order, this should be a number bigger than $DST_MAX to ensure that we don't miss an article number mapping on the old server that we'll need.
$OLD and $NEW are the old and new news server hosts, respectively. I presumed that I'd always be using port 119 (the NNTP port) on the addresses, although I see that it wouldn't be hard to parameterize that. No sense in making everything too flexible for such an infrequently used program!
$VIA is used when I need to ssh-tunnel the connections. It's the hostname of the shell machine at the old ISP. (Please note that these are not the real hostnames... the comm suffix should be enough of a clue not to try them.) If $VIA is false (such as 0, undef, or the empty string), tunneling won't be used, so this is an optional step. However, if it's used, we need to select two hopefully unused port numbers for the local tunnel ports, and those are given in $VIA_OLD_PORT and $VIA_NEW_PORT.
Finally, $VERBOSE says how noisy to be. If we turn on all the noise, we get a fairly complete description of where we are in the process, and what we've accomplished. However, a $VERBOSE of 0 is just fine if you don't like peering under the hood.
Lines 25 to 33 set up the tunnel if needed. For this to work, I have to have ssh trained to accept connections from my workhorse ISP to my newsreading ISP, which I needed to do for my newsreader anyway. The crucial parts are the selection of the tunnels (the -L parameters), the command to run (sleep 60), and the additional sleep for 5 seconds after firing off the ssh to let everything warm up. The sleep 60 is executed on the remote host, and needs to be longer than it takes for my program to connect to the local tunnel ports. Once the connections are established, the remote command can terminate without any problem.
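As an illustration of how the tunnel arguments fit together, here is a small sketch that only builds the ssh argument list and prints it, without running anything. The tunnel_command helper and its parameter names are made up for this example, and the hostnames are the same placeholders used in the column. It builds a list rather than one joined string, which is a different choice from the listing: with the list form of system no shell is involved, and since -f already puts ssh in the background there is no need for a trailing ``&''.

```perl
use strict;
use warnings;

# Hypothetical helper: build (but do not run) an ssh command that
# forwards local ports to remote host:port pairs via a shell machine.
sub tunnel_command {
    my (%arg) = @_;
    my @cmd = ("ssh", "-f", "-q");
    for my $fwd (@{ $arg{forwards} }) {
        my ($local_port, $remote_host, $remote_port) = @$fwd;
        push @cmd, "-L", "$local_port:$remote_host:$remote_port";
    }
    # The remote sleep keeps the session open long enough to connect.
    push @cmd, $arg{via}, "sleep", $arg{hold_seconds};
    return @cmd;
}

my @cmd = tunnel_command(
    via          => "shell.old-isp.comm",     # placeholder hostname
    hold_seconds => 60,
    forwards     => [
        [ 42001, "news.old-isp.comm",      119 ],
        [ 42002, "news.big-mega-isp.comm", 119 ],
    ],
);
print "@cmd\n";
# To open the tunnel for real, one could call system(@cmd) and check its result.
```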
$SRC_NNTP and $DST_NNTP, defined in lines 34 and 35, set up the hostname and port number (if needed) for the old and new news servers. Lines 37 and 38 attempt the connection to those servers, dying if things are bad.
Lines 40 and 41 create News::Newsrc objects to hold the newsrc for the old server and the newsrc for the new server. Line 42 sets aside a place for lines from the old newsrc that aren't really about subscribed or unsubscribed newsgroups - apparently, News::Newsrc blows up on these.
Lines 44 to 50 grab the old newsrc information into the newsrc object. As you can tell, this is pretty inflexible, grabbing the file directly from my home directory. Maybe this should have been a parameter, but I don't care, because the job got done. @extra_lines gets all the stuff that's not about a newsgroup, while the remaining lines are sucked into the newsrc object.
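To make the split concrete, here is a minimal sketch that applies the same pattern as lines 48 and 49 of the listing to a few invented lines. The sample data and the $is_group variable are assumptions for illustration only.

```perl
use strict;
use warnings;

# Invented sample of what a .newsrc might contain.
my @all = (
    "options -n\n",                     # not a group line
    "comp.lang.perl.misc: 1-1200\n",    # subscribed group
    "alt.test! 1-50\n",                 # unsubscribed group
    "\n",                               # blank line
);

# Same pattern as the listing: group name, then ':' or '!', then whitespace.
my $is_group = qr/^\S+[:!]\s/;
my @group_lines = grep {  /$is_group/ } @all;
my @extra_lines = grep { !/$is_group/ } @all;

printf "%d group lines, %d extra lines\n",
    scalar @group_lines, scalar @extra_lines;
```

Only the group lines would be handed to the newsrc object; the extra lines are kept aside and printed back out untouched at the end.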
Lines 52 to 90 do the bulk of the job. For every newsgroup mentioned in the old newsrc, we loop once with $group set to that group. A large eval block protects us from premature death on any particular newsgroup, giving us instead a group that won't be transferred to the new newsrc.
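The per-group protection can be shown in isolation. This sketch uses invented group names and a deliberately failing group; it also uses the ``eval { ...; 1 } or do { ... }'' idiom rather than the ``warn $@ if $@'' form in the listing, which is a stylistic substitution and not what the program itself does.

```perl
use strict;
use warnings;

my @groups = ("comp.lang.perl.misc", "alt.broken", "comp.unix.shell");
my (@done, @failed);

for my $group (@groups) {
    eval {
        # Stand-in for the real per-group work; one group fails on purpose.
        die "Cannot get info for $group\n" if $group eq "alt.broken";
        push @done, $group;
        1;                       # signal success to the 'or do' below
    } or do {
        push @failed, $group;    # this group is skipped, the loop goes on
        warn $@;
    };
}
print "done: @done\nfailed: @failed\n";
```

A failure in one group costs only that group; the groups after it are still processed.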
Line 54 determines if it's a subscribed newsgroup, and if so, sends us through the bulk of lines 55 to 81 (described in a moment). If not, we skip down to lines 83 to 87 and mark the group as unsubscribed in line 84. Line 85 grabs the lowest article number still active on the news server, and line 87 ensures that we don't try reading any article number below that. (Most newsreaders do the equivalent already, but I'm trying to make an accurate newsrc here.)
Now, back to the harder part. Line 58 gets the info from the old server about the article number range present in the group. Line 60 computes a range not to exceed $SRC_MAX items for which we must get a ``message-id-to-article-number'' map constructed. Line 61 creates a hash from the hashref returned by calling the NNTP XHDR operation for all the message IDs in the given article number range. Sure, you can get this info one article at a time, but the XHDR command is very fast since it reads directly from the .overview file that most news servers now maintain. The result is that we have a hash called %src_msgid_to_art that we can feed a message ID and get back the article number. Since we can then see if this article number has already been read, we'll be able to tell if we should mark it as having been read in the new newsrc. Lines 65 and 66 do the same thing for the other direction, figuring out what message IDs correspond to which article numbers in the new news server.
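The inversion in line 61 relies on reverse applied to a hash in list context, which swaps keys and values. Here is a small sketch with invented data in the shape that Net::NNTP's xhdr returns (a hashref keyed by article number); no server is contacted.

```perl
use strict;
use warnings;

# Invented data, shaped like the hashref xhdr("Message-Id", ...) returns:
# article number => message ID.
my $art_to_msgid = {
    1001 => '<a@example.com>',
    1002 => '<b@example.com>',
    1003 => '<c@example.com>',
};

# Flattening the hash to a list and reversing it turns
# (number, id, number, id, ...) into (id, number, id, number, ...).
my %msgid_to_art = reverse %$art_to_msgid;

print $msgid_to_art{'<b@example.com>'}, "\n";   # prints 1002
```

One caveat of this trick: if two article numbers ever carried the same message ID, only one of them would survive the inversion.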
And then it's time for the heavy bookkeeping. Lines 68 to 80 check each article in the new server for its message ID number (line 70). If that same article (line 73) has been read on the old server (line 76), we mark it as read on the new server (line 78). Not rocket science, but a lot of details to get right. At this point, we're not talking to either of the servers - all of the information is in hashes in memory.
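Stripped of the servers and the newsrc objects, the bookkeeping looks roughly like the following sketch. All data here is invented, and the plain hashes %src_read and %dst_read stand in for the News::Newsrc objects used in the listing.

```perl
use strict;
use warnings;

# Invented sample data; every name and number is hypothetical.
my %src_msgid_to_art = ('<a@x>' => 501, '<b@x>' => 502, '<c@x>' => 503);
my %src_read         = (501 => 1, 503 => 1);    # read on the old server
my %dst_art_to_msgid = (
    71 => '<c@x>', 72 => '<a@x>', 73 => '<b@x>', 74 => '<d@x>',
);

my %dst_read;
for my $dst_art (sort { $a <=> $b } keys %dst_art_to_msgid) {
    my $msgid   = $dst_art_to_msgid{$dst_art};
    my $src_art = $src_msgid_to_art{$msgid};
    next unless defined $src_art;       # never seen on the old server
    next unless $src_read{$src_art};    # seen there, but not read
    $dst_read{$dst_art} = 1;            # same article, already read
}
print join(",", sort { $a <=> $b } keys %dst_read), "\n";   # prints 71,72
```

Article 73 stays unread because its counterpart (502) was never read, and 74 stays unread because the old server never had it.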
Line 81 then marks as read anything below the articles we've considered. This means we can never have more than roughly $DST_MAX articles unread.
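The window arithmetic from lines 65 and 81 can be checked on its own. In this sketch clamp_low is a hypothetical helper and the server range is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: raise the low end of an article range so that
# it sits no more than $max below the high-water mark.
sub clamp_low {
    my ($low, $high, $max) = @_;
    return $low < $high - $max ? $high - $max : $low;
}

my $DST_MAX = 500;
my ($low, $high) = (12_000, 15_000);    # invented server range
my $dst_low = clamp_low($low, $high, $DST_MAX);

# Everything below the window is treated as already read.
print "consider $dst_low..$high, mark 1..", $dst_low - 1, " as read\n";
```

With these numbers the window is 14500..15000, which is inclusive at both ends and so spans $DST_MAX + 1 article numbers.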
And now that we're all done, line 93 dumps the result! I could have made it save the new newsrc directly, but I'm running this program inside a window that I can cut-n-paste, so it didn't matter.
So there you have it. I wish you the luxury of never having to move from one news server to another, but at least if you have this program and a short period of overlap, it'll ease the pain a bit when you must move.
This is my last column for Linux Magazine that will be strictly about Perl. Next month, I'll begin writing about general webmaster topics. Of course, most of those topics will probably involve some Perl solution as some code snippet, but I'll be able to look at some non-Perl things as well. I hope you enjoy the new format as much as you've said you've liked this column in the past. Until next time, go forth and be Perl-y!
Listings
        =1=     #!/usr/bin/perl
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use Net::NNTP;
        =6=     use News::Newsrc;
        =7=     use IO::File;
        =8=     
        =9=     ## config
        =10=    
        =11=    my $DST_MAX = 500;
        =12=    my $SRC_MAX = $DST_MAX * 3;
        =13=    
        =14=    my $OLD = "news.old-isp.comm";
        =15=    my $NEW = "news.big-mega-isp.comm";
        =16=    
        =17=    my $VIA = "shell.old-isp.comm";
        =18=    my $VIA_OLD_PORT = 42001;
        =19=    my $VIA_NEW_PORT = 42002;
        =20=    
        =21=    my $VERBOSE = 2;                # 0 quiet, 1 expected errors, 2 noisy
        =22=    
        =23=    ## end config
        =24=    
        =25=    system join " ",
        =26=      "ssh -f -q",
        =27=      "-L $VIA_OLD_PORT:$OLD:119",
        =28=      "-L $VIA_NEW_PORT:$NEW:119",
        =29=      "$VIA",
        =30=      "exec sleep 60",
        =31=      "&",
        =32=      "sleep 5" if $VIA;
        =33=    
        =34=    my $SRC_NNTP = $VIA ? "localhost:$VIA_OLD_PORT" : $OLD;
        =35=    my $DST_NNTP = $VIA ? "localhost:$VIA_NEW_PORT" : $NEW;
        =36=    
        =37=    my $src = Net::NNTP->new($SRC_NNTP) or die "src: $!";
        =38=    my $dst = Net::NNTP->new($DST_NNTP) or die "dst: $!";
        =39=    
        =40=    my $src_rc = News::Newsrc->new or die "Cannot new newsrc for src";
        =41=    my $dst_rc = News::Newsrc->new or die "Cannot new newsrc for dst";
        =42=    my @extra_lines = ();
        =43=    
        =44=    {
        =45=      my $newsrc = IO::File->new("$ENV{HOME}/.newsrc", "r")
        =46=        or die "Cannot open .newsrc: $!";
        =47=      my @all = <$newsrc>;
        =48=      @extra_lines = grep !/^\S+[:!]\s/, @all;
        =49=      $src_rc->_scan(join "", grep /^\S+[:!]\s/, @all); # dies if fail
        =50=    }
        =51=    
        =52=    for my $group ($src_rc->groups) {
        =53=      eval {
        =54=        if ($src_rc->subscribed($group)) {
        =55=          print "subscribed to $group\n" if $VERBOSE > 1;
        =56=          $dst_rc->subscribe($group);
        =57=    
        =58=          (undef, my $src_low, my $src_high) = $src->group($group)
        =59=            or die "Cannot get info for src $group\n";
        =60=          $src_low = $src_high - $SRC_MAX if $src_low < $src_high - $SRC_MAX;
        =61=          my %src_msgid_to_art = reverse %{$src->xhdr("Message-Id", "$src_low-$src_high")};
        =62=          (undef, my $dst_low, my $dst_high) = $dst->group($group)
        =63=            or die "Cannot get info for dst $group\n";
        =64=    
        =65=          $dst_low = $dst_high - $DST_MAX if $dst_low < $dst_high - $DST_MAX;
        =66=          my %dst_art_to_msgid = %{$dst->xhdr("Message-Id", "$dst_low-$dst_high")};
        =67=    
        =68=          for my $dst_art ($dst_low..$dst_high) {
        =69=            eval {
        =70=              my $msgid = $dst_art_to_msgid{$dst_art} or
        =71=                die "no msgid for $dst_art in $group at dst\n";
        =72=              ## next;
        =73=              my $src_art = $src_msgid_to_art{$msgid} or
        =74=                die "no art for $msgid in $group at src\n";
        =75=              ## next;
        =76=              next unless $src_rc->marked($group,$src_art);
        =77=              print "mapping $msgid from $src_art to $dst_art\n" if $VERBOSE > 1;
        =78=              $dst_rc->mark($group, $dst_art);
        =79=            }; warn $@ if $@ and $VERBOSE;
        =80=          }
        =81=          $dst_rc->mark_range($group, 1, $dst_low - 1);
        =82=        } else {
        =83=          print "unsubscribed to $group\n" if $VERBOSE > 1;
        =84=          $dst_rc->unsubscribe($group);
        =85=          (undef, my $dst_low, my $dst_high) = $dst->group($group)
        =86=            or die "Cannot get info for dst $group\n";
        =87=          $dst_rc->mark_range($group, 1, $dst_low - 1) if $dst_low;
        =88=        }
        =89=      }; warn $@ if $@ and $VERBOSE;
        =90=    }
        =91=    
        =92=    print "==== RESULT ====\n";
        =93=    print @extra_lines, $dst_rc->_dump;