Article 7876 of comp.lang.perl: Xref: feenix.metronet.com comp.lang.perl:7876 Path: feenix.metronet.com!news.utdallas.edu!wupost!howland.reston.ans.net!spool.mu.edu!caen!msuinfo!netnews.upenn.edu!mjd From: mjd@saul.cis.upenn.edu (Mark-Jason Dominus) Newsgroups: comp.lang.perl Subject: perl code to read and parse GNU `info' files Message-ID: Date: 12 Nov 93 16:48:11 GMT Sender: news@netnews.upenn.edu Organization: University of Pennsylvania Lines: 134 Nntp-Posting-Host: saul.cis.upenn.edu This is very hacky, but seems work, at least in a limited domain. To use the functions: Call &open_info_file(INFO_FILENAME); to open the filehandle `INFO' to the named info file. Then call &get_next_node; repeatedly to read the next node in the info file; the variables $info_file $info_node $info_prev $info_next $info_up are set if the corresponding fields appear in the node's header, and if the node has a menu, it is loaded into %info_menu. When `get_next-node' returns false, you have reached end-of-file or there has been an error. It's normal for an info file to be split into parts; these functions understand that. I am using it here to provide help-by-keyword: We have a texinfo users' guide, and if a user enters the command `help foo' at the shell prompt, a perl program based on these functions reads the users' guide info files, finds the main index, looks up `foo' in the index, and starts up the info browser pointing to the relevant node, if there is one. Almost everything about these functions could be improved. My answer to any criticism of the form ``Your functions are bad, because...'' is `Yes!'. I post them because I am so lazy that I hope to con interested people into improving them for me, and because I am so humble I don't mind the whole world seeing my cruddy code. Caveat delector. # Read next node into global variables. Assumes that file pointer is # positioned at the header line that starts a node. Leaves file # pointer positioned at header line of next node. # Programmer: note that nodes are separated by a "\n\037\n" sequence. # Reutrn true on success, false on failure sub read_next_node { undef %info_menu; $_ = ; # Header line if (eof(INFO)) { return &start_next_part && &read_next_node; } ($info_file) = /File:\s*([^,]*)/; ($info_node) = /Node:\s*([^,]*)/; ($info_prev) = /Prev:\s*([^,]*)/; ($info_next) = /Next:\s*([^,]*)/; ($info_up) = /Up:\s*([^,]*)/; $_ = until /^(\* Menu:|\037)/ || eof(INFO); if (eof(INFO)) { return &start_next_part; } elsif (/^\037/) { return 1; # end of node, so return success. } # read menu local($key, $ref); while () { return 1 if /^\037/; # end of node, menu is finished, success. next unless /^\* \S/; # next unless lines is a menu item if (/^\* ([^:]*)::/) { $key = $ref = $1; } elsif (/^\* ([^:]*):\s*([^.]*)[.]/) { ($key, $ref) = ($1, $2); } else { print STDERR "Couldn't parse menu item\n\t$_"; next; } $info_menu{$key} = $ref; } # end-of-file also terminates the node successfully. # start up the next file before continuing. &start_next_part; return 1; } # Discard commentary before first node of info file sub start_info_file { $_ = until (/^\037/ || eof(INFO)); return &start_next_part if (eof(INFO)) ; return 1; } # Look for next part of multi-part info file. # Return 0 (normal failure) if it isn't there---that just means # we ran out of parts. die on some other kind of failure. sub start_next_part { local($path, $basename, $ext); if ($info_filename =~ /\//) { ($path, $basename) = ( $info_filename =~ /^(.*)\/(.*)$/ ); } else { $basename = $info_filename; $path = ""; } if ($basename =~ /-\d*$/) { ($basename, $ext) = ($basename =~ /^([^-]*)-(\d*)$/); } else { $ext = 0; } $ext++; $info_filename = "$path/$basename-$ext"; close(INFO); if (! (open(INFO, "$info_filename")) ) { if ($! eq "No such file or directory") { return 0; } else { die "Couldn't open $info_filename: $!"; } } return &start_info_file; } sub open_info_file { ($info_filename) = @_; (open(INFO, "$info_filename")) || die "Couldn't open $info_filename: $!"; return &start_info_file; } -- -- If you never did, / you should. / These things are fun / and fun is good. Mark-Jason Dominus mjd@central.cis.upenn.edu