Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, before their editors applied their creativity.


Linux Magazine Column 93 (May 2007)

[suggested title: ``Fingering myself with Twitter'']

Back in the early days of the Internet, the nets were a bit less hostile and people were a lot more open. For example, simply by typing finger merlyn@teleport.com, you could get information on my real name, mail stop, and even what project I was working on. For me, the best part of the finger response was the plan. In theory, this was the intended multi-line plan to implement the one-line project, but in practice, my friends and I usually used this as a personal soapbox, putting whatever we felt like there.

When I started touring around the US (and later, the world) to consult and to teach, I started updating my ``dot plan'' file on a regular basis to indicate the duration and general location of my future trips. This was in hopes that someone would notice that I would be in their city and perhaps invite me to a karaoke session or at least out for a meal. In practice, I can count the number of people who sought me out on the basis of my dot plan file on two hands. But it was worth a try.

As I moved from ISP to ISP, I eventually ended up at an ISP that didn't support the finger protocol any more. For me, that was the end of an era. However, I still wanted a quick way to notify people where I might be, without going to the formality of updating a web page with all the right HTML. One day, it occurred to me that I could just have my web server deliver my dot-plan file as a URL, and so now it does exactly that, at:

   http://www.stonehenge.com/merlyn/dot-plan.txt

It's a plain text file, with bits of interesting info about me. I have both my forward schedule and my sorta-current status in the file.

Recently, I stumbled across the Twitter service: an interesting ``free for now and who knows how they'll make money'' project. After registering (for free) and authenticating my cell phone and instant-message client, I can now send one-line messages announcing, on a more-or-less constant basis, where I am and what I'm doing. In theory, my friends (and stalkers) can sign up for alerts delivered to their IM client or cell phone to keep track of me. In practice, I don't think anyone wants to be that bored with the trivial nature of my life. I guess we'll wait and see.

But it occurred to me that the Twitter updates I was now making were actually equivalent to the ``current status'' area of the dot-plan file. So, instead of updating my dot-plan frequently, I decided to link the two together: a frequent cron job would suck the data down from Twitter and edit it into my dot-plan file.

I also noticed that I could get an RSS or JSON feed of my recent Twitter activity, at least according to their API description. With a bit of ingenuity, I was able to determine that:

  http://twitter.com/statuses/user_timeline/92623.json

would return some number of recent twitter updates about me in JSON format. I like JSON format, because the alternative (RSS) requires some sort of XML Parser, which seemed a bit heavyweight for this application. The result comes back as a series of items like this:

  {"created_at":"Sun Mar 11 19:05:53 +0000 2007",
   "text":"futzing around the house, trying to come up with an idea for the overdue magazine article for linux magazine",
   "relative_created_at":"about 2 hours ago",
   "id":6713281,
   "user":
     {"screen_name":"merlyn",
      "name":"Randal L. Schwartz",
      "description":"Just another Perl hacker (and geekcruiser)",
      "location":"Here!",
      "url":"http:\/\/www.stonehenge.com\/merlyn\/",
      "id":92623
     }
   }
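
Decoding a response like this yields an arrayref of hashrefs. Here's a minimal sketch using JSON::PP, which ships with modern perls (the listing below uses the older JSON.pm jsonToObj interface, but the resulting data structure is the same); the sample content is a trimmed-down, made-up stand-in for the real feed:

```perl
use JSON::PP qw(decode_json);

# a trimmed-down, invented sample of the Twitter response
my $content = '[{"created_at":"Sun Mar 11 19:05:53 +0000 2007",
                 "text":"futzing around the house",
                 "relative_created_at":"about 2 hours ago",
                 "user":{"screen_name":"merlyn"}}]';

my $object = decode_json($content);
print $object->[0]{text}, "\n";               # futzing around the house
print $object->[0]{user}{screen_name}, "\n";  # merlyn
```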

Using the JSON parser from the CPAN, I was able to turn this into a nice Perl data structure. I decided I wanted to limit the items coming back from Twitter to a certain date range. No problem: just parse that ``created_at'' item. But after looking at a few DateTime things, I scratched my head. That date-time format is not any RFC or ISO standard! Oops. Maybe the Twitter folks will read this article and update their date to something from an RFC, instead of just making things up.
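
Had I needed to parse that nonstandard format anyway, a regex plus the core Time::Local module would have done it. This is only a sketch, and it assumes the offset is always +0000, which is all I've seen Twitter send:

```perl
use Time::Local qw(timegm);

my %mon;
@mon{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

my $stamp = "Sun Mar 11 19:05:53 +0000 2007";
my $epoch;
if ($stamp =~ /^\w+ (\w+) (\d+) (\d+):(\d+):(\d+) \+0000 (\d+)$/) {
  # timegm wants sec, min, hour, mday, month (0-based), year
  $epoch = timegm($5, $4, $3, $2, $mon{$1}, $6 - 1900);
}
print "$epoch\n";   # 1173639953
```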

In the meantime, though, I couldn't wait, because I was working against a deadline. (I wasn't kidding about the ``overdue'' part of the message.) I then noticed that the relative_created_at field wasn't a bad basis for a selection: I'd permit minutes, hours, and days, but anything older than that was probably too old to be useful. My final choice was to take three to eight items from the list, stopping at the first item that was a month old or older.

To get the item into my dot-plan file, I decided to use an in-place edit. Although in-place edits are typically used from the perl command-line via the -i switch, they can also be activated from within the program, which worked out quite nicely here.
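
Here's the technique in miniature, against a throwaway scratch file (the filename is made up for the demonstration):

```perl
# create a scratch file to edit
open my $fh, '>', 'demo.txt' or die "create: $!";
print $fh "hello\nworld\n";
close $fh;

{
  local *ARGV;        # protect @ARGV, $ARGV, and the ARGV filehandle
  local $^I = '~';    # enable in-place editing; original saved as demo.txt~
  @ARGV = ('demo.txt');
  while (<>) {
    print uc;         # whatever we print becomes the new file contents
  }
}
# demo.txt now holds "HELLO\nWORLD\n"; demo.txt~ holds the original
```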

And the resulting program is in [the listing below]. Lines 1 and 2 begin nearly every program I write, selecting the path to Perl on my system, and enabling the usual collection of compile-time and run-time strictures.

Line 4 defines the Twitter URL that fetches my JSON recent status feed. To get this URL, I went to the RSS Feed link at http://twitter.com/merlyn, and through a wild guess, changed the .rss suffix to .json. I got this idea from looking at http://twitter.com/help/api, which doesn't describe how to get just my personal feed, but lists a bunch of URLs that end in .json, so I just guessed it would work. Hey, Twitter is new, so I expect the docs to be a bit immature.

Line 5 computes the path to a file named .plan in my home directory, using my favorite glob trick. The location of my home directory is not trivial, because it could be $ENV{HOME} or $ENV{LOGDIR}, depending on the flavor of Unix. But glob always knows which one it is, and can even compute the home directories of other users using ~otheruser/.... The only tricky part is that I have to use glob in a list context or else it'll remember the state of ``names delivered so far'', causing my program to fail if used in a loop.
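
For example (assuming a normal Unix account where the home directory is known to the system):

```perl
# list context: all matches delivered at once, no iterator state left behind
my ($home)    = glob "~";          # e.g. /home/merlyn
my ($dotplan) = glob "~/.plan";    # e.g. /home/merlyn/.plan

# scalar context would instead iterate: each call returns the *next*
# match, then undef -- surprising inside a loop or subroutine
print "$dotplan\n";
```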

Lines 7 and 8 pull in the modules that I'm using from the CPAN. The JSON module interprets the JSON response, and the LWP::Simple takes care of the necessary HTTP fetch.

Line 10 fetches the response from the Twitter server. The result of get is either the content body, or undef if something went wrong. Unfortunately, with this very simple interface, we don't know exactly what went wrong. But I don't really care: this is not a critical service, so I'll just die, and I'll notice the failure as the email that cron sends me.

Line 11 turns the JSON content into a Perl data structure. In this case, we end up with an arrayref of items in newest-to-oldest order. Each item is a hashref that looks like the data presented above, including the nested hash for the details about me (which I ignore, because I already know about me).

Line 13 establishes a place to accumulate the selected items of interest, initially empty.

Lines 14 to 19 loop over the items from the data structure, in newest-to-oldest order, by looking at the index of each item. I need the index because my inclusion criteria reference the number of items selected so far. Note that the loop ends at $#$object, which is the highest index of the array referenced by the arrayref in $object.

Line 15 pulls out the selected item, adding it to @selected. From the original item, I'm using a hash slice to pull out the relative_created_at and text values. These are then turned into a two-element array, and a reference to that array is pushed onto the end of @selected.
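
The hash slice on a hashref is worth a closer look; here's the same move against a made-up item:

```perl
my @selected;
my $item = {
  relative_created_at => "about 2 hours ago",
  text                => "testing the slice",
  id                  => 6713281,
};

# @$item{...} pulls several values out of the hashref at once
push @selected, [@$item{qw(relative_created_at text)}];

print $selected[-1][0], "\n";   # about 2 hours ago
print $selected[-1][1], "\n";   # testing the slice
```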

Line 16 ensures that I'll have at least 3 items, presuming there are at least 3 items returned from Twitter. By using next, I skip over the tests in the remainder of the loop that might terminate the loop early. Keep in mind that $item is the index number, which is 0-based, so the expression is true on the first two iterations, but false afterwards.

Line 17 similarly limits the number of items selected from the data. Again, keep in mind that $item is 0-based, so when the value is 7, we've already pushed 8 values into @selected.

Line 18 stops the selection at the first ``too old'' item, if we're still between 3 and 8 items, by looking at the text of the relative time stamp. Based on experimentation, the text contains only one time unit, such as ``seconds'', ``minutes'', ``hours'', ``days'', or ``months'', so if it's not one of the first four, it's too old.
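
Here's that freshness test from line 18 of the listing, applied to a few representative (made-up) relative timestamps:

```perl
for my $stamp ("8 minutes ago", "about 2 hours ago",
               "3 days ago", "about 1 month ago") {
  my $verdict = $stamp =~ /second|minute|hour|day/ ? "keep" : "too old";
  print "$stamp: $verdict\n";
}
# 8 minutes ago: keep
# about 2 hours ago: keep
# 3 days ago: keep
# about 1 month ago: too old
```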

Of course, after you describe something carefully to someone else, you can always see better ways of doing things. I could have also written this loop as:

  my @selected;
  for my $item (@$object) {
    push @selected, [@$item{qw(relative_created_at text)}];
    next if @selected < 3;
    last if @selected >= 8;
    last unless $selected[-1][0] =~ .....
  }

Wow, that looks far cleaner. However, I left the old code in to show you how it could also be done. But back to the listing.

Line 21 takes the selected items and merges them together with a blank line between them into a single $text variable. This variable will be used in the regex replacement below.
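
With a couple of made-up items, the merge looks like this:

```perl
my @selected = (
  ["about 2 hours ago", "futzing around the house"],
  ["1 day ago",         "packing for the next trip"],
);

# one "stamp: text" line per item, blank lines between items
my $text = join "\n\n", map "$_->[0]: $_->[1]", @selected;
print $text, "\n";
# about 2 hours ago: futzing around the house
#
# 1 day ago: packing for the next trip
```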

Lines 23 to 32 set up a local scope so that I can establish an in-place edit without disturbing the rest of the program. Of course, the rest of the program here is pretty small and would not be affected by these lines, but out of respect for the cut-n-pasters of the world, I'll write it in a way that pretends it's part of a large program.

Line 24 localizes the @ARGV array, as well as the $ARGV scalar and ARGV filehandle. These items are all altered during an in-place edit, so we need to make sure our alterations don't leak out to the rest of the code.

Line 25 puts the readline operation into slurp mode, so that the entire file is read by the first input operation. For something as short as my dot-plan file, that's just fine, and it simplifies the editing step.

Line 26 enables in-place editing by setting $^I to tilde. When my program has been run, the previous dot-plan will be in ~/.plan~, which my backup scripts will ignore, but I can always look at to see what happened.

Line 27 sets up the list of files to process as being just my dot-plan file.

Lines 28 to 31 read the entire file contents into $_, and perform the necessary edit, writing the updated contents back out. Note the /m modifier on the substitution, which enables ^ to anchor after embedded newlines as well as at the beginning of the string. I'm replacing everything between a line that begins with Twitter and a line that begins with Future with my Twitter strings.
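
To see that substitution at work, here it is against a tiny, invented .plan fragment:

```perl
my $text = "about 2 hours ago: writing the column";

my $plan = <<'END';
Twitter status:

stale entries to be replaced

Future travel:
  May: somewhere warm
END

# replace everything between the Twitter line and the Future line
$plan =~ s/^(Twitter.*\n)(?:[\s\S]*?)^(Future.*\n)/$1\n$text\n\n$2/m;
print $plan;
# Twitter status:
#
# about 2 hours ago: writing the column
#
# Future travel:
#   May: somewhere warm
```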

And there it is, my ``new tech meets old tech'' bridge. Twitter updates are now placed into a dot-plan file, which can be delivered via the web to any interested party. All I need to do is run this from a cron job (every 20 minutes or so), and my stalkers will be able to carefully plot my day-to-day travels. Have fun watching my boring life. Until next time, enjoy!

Listing

        =1=     #!/usr/bin/perl
        =2=     use strict;
        =3=     
        =4=     my $URL = "http://twitter.com/statuses/user_timeline/92623.json";
        =5=     my ($DOTPLAN) = glob "~/.plan";
        =6=     
        =7=     use JSON qw(jsonToObj);
        =8=     use LWP::Simple qw(get);
        =9=     
        =10=    defined (my $content = get $URL) or die "Cannot fetch $URL";
        =11=    my $object = jsonToObj($content);
        =12=    
        =13=    my @selected;
        =14=    for my $item (0..$#$object) {
        =15=      push @selected, [@{$object->[$item]}{qw(relative_created_at text)}];
        =16=      next if $item < 2;            # always include at least three items
        =17=      last if $item >= 7;           # never more than eight items
        =18=      last unless $selected[-1][0] =~ /second|minute|hour|day/; # stop at first oldish item
        =19=    }
        =20=    
        =21=    my $text = join "\n\n", map "$_->[0]: $_->[1]", @selected;
        =22=    
        =23=    {
        =24=      local *ARGV;                  # isolate diamond
        =25=      local $/;                     # slurp mode
        =26=      local $^I = "~";              # in place edit
        =27=      @ARGV = $DOTPLAN;             # select the file to edit
        =28=      while (<>) {
        =29=        s/^(Twitter.*\n)(?:[\s\S]*?)^(Future.*\n)/$1\n$text\n\n$2/m;
        =30=        print;
        =31=      }
        =32=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.