Article 3457 of comp.lang.perl:
Xref: feenix.metronet.com comp.lang.perl:3457
Newsgroups: comp.lang.perl
Path: feenix.metronet.com!news.utdallas.edu!tamsun.tamu.edu!cs.utexas.edu!math.ohio-state.edu!cyber1.cyberstore.ca!van-bc!vanbc.wimsey.com!cs.ubc.ca!uw-beaver!fluke!inc
From: inc@tc.fluke.COM (Gary Benson)
Subject: Re: Wordperfect to ASCII conversion with perl?
Message-ID: <1993Jun16.145908.4191@tc.fluke.COM>
Keywords: wordperfect
Organization: John Fluke Mfg. Co., Inc., Everett, WA
References:
Date: Wed, 16 Jun 1993 14:59:08 GMT
Lines: 57

In article jpw@sansfoy.lib.Virginia.EDU (John Price-Wilkin) writes:

>We have more than 1500 collection-specific guides to portions of our
>Library's Rare Books and Special Collections in WordPerfect and would
>like to make ASCII versions available on our gopher. Would it possible to
>use perl to remove the WP header information and reformat paragraphs to
>groups of 70 char lines?

Absolutely this would be possible; and not just that, but highly
desirable! Perl is ideally suited to this kind of business, and if
others haven't already urged you to do so, let me be the first. I use
perl all the time to whip up little tools for translating from SAF
(Some Arbitrary Format) to real true ASCII text.

>Aside from things like CTRL-Z, form feeds, and diacritics, are there other
>coding concerns? Any help on this would be appreciated. Right now we're
>reduced to scrapping together spare moments for the conversion, and the
>first few hundred have been tedious.

I can imagine. I'd like to help with more than just encouragement, but
I'll have to leave this one to a WordPerfect aficionado. All I know
about the product is that it uses 8-bit ASCII, at least for the special
German characters with umlauts, and the eszett.
Here is a tiny piece of perl that will convert these to common English
transliterations:

# German substitutions - 8-bit WordPerfect ascii to common sequences
while (<>) {
    s/\201/ue/g;    # u-umlaut
    s/\204/ae/g;    # a-umlaut
    s/\224/oe/g;    # o-umlaut
    s/\232/Ue/g;    # U-umlaut
    s/\341/ss/g;    # eszett
    print;
}

>John Price-Wilkin
>jpw@virginia.edu
>jpw@sansfoy.lib.virginia.edu (NeXTMail)

The hardest part is simply defining the parameters -- once you do that,
perl will scream through those files and dump out ASCII faster than
scat!

ps: I seem to recall someone saying that WordPerfect uses control codes
to indicate bolding, centering, underlining and so on. If that is
indeed the case, you have to decide how your perl program will handle
these things, since only centering can be properly represented in
ASCII...

Please post your results! Many others need this, I am sure!

-- 
Gary Benson-_-_-_-_-_-_-_-_-_-inc@sisu.fluke.com_-_-_-_-_-_-_-_-_-_-_-_-_-_-
Freedom is just chaos with better lighting. -Alan Dean Foster


Article 3592 of comp.lang.perl:
Xref: feenix.metronet.com comp.lang.perl:3592
Path: feenix.metronet.com!news.ecn.bgu.edu!wupost!cs.utexas.edu!uunet!psgrain!ee.und.ac.za!tplinfm
From: barrett@lucy.ee.und.ac.za (Alan Barrett)
Newsgroups: comp.lang.perl
Subject: Re: Wordperfect to ASCII conversion with perl?
Date: 21 Jun 1993 10:35:08 +0200
Organization: Elec. Eng., Univ. Natal, Durban, S. Africa
Lines: 21
Message-ID: <203rrs$8n8@lucy.ee.und.ac.za>
References:
NNTP-Posting-Host: lucy.ee.und.ac.za

In article jpw@sansfoy.lib.Virginia.EDU (John Price-Wilkin) writes:
> We have more than 1500 collection-specific guides to portions of our
> Library's Rare Books and Special Collections in WordPerfect and would
> like to make ASCII versions available on our gopher. Would it possible to
> use perl to remove the WP header information and reformat paragraphs to
> groups of 70 char lines? Aside from things like CTRL-Z, form feeds, and
> diacritics, are there other coding concerns?

Try wp2x (from comp.sources.misc volume 22). It purports to be able to
convert from WordPerfect to several other formats, but I have an idea
that it was restricted to WordPerfect version 4.x. There might be a
newer version somewhere.

Or you could try wp2latex (ask archie). It does a passable job of
converting WordPerfect version 5.1 documents to LaTeX format.

If you don't get usable output from either of these, you should at
least get some ideas on how to decode the WordPerfect format.

--apb (Alan Barrett)
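
The two pieces of the original request that the thread touches on in perl
terms (transliterating the German characters and refilling paragraphs to
70 columns) could be combined roughly as follows. This is only a minimal
sketch under stated assumptions, not a WordPerfect converter: it assumes
the 8-bit bytes follow IBM code page 437, and it assumes the WP header
and formatting codes have already been removed by some other means. The
names to_ascii and wrap_paragraph are hypothetical helpers made up for
this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Byte-to-ASCII table for the German characters discussed in the thread.
# ASSUMPTION: the input bytes follow IBM code page 437 (octal escapes).
my %translit = (
    "\201" => 'ue',    # u-umlaut
    "\204" => 'ae',    # a-umlaut
    "\224" => 'oe',    # o-umlaut
    "\232" => 'Ue',    # U-umlaut
    "\341" => 'ss',    # eszett
);

# Replace the mapped 8-bit bytes, then delete CTRL-Z and form feeds.
# (Hypothetical helper; WP header and formatting codes are NOT handled.)
sub to_ascii {
    my ($text) = @_;
    $text =~ s/([\201\204\224\232\341])/$translit{$1}/g;
    $text =~ tr/\032\014//d;
    return $text;
}

# Greedy fill: rewrap one paragraph so that no line exceeds $width
# columns. A single word longer than $width is left on its own line.
sub wrap_paragraph {
    my ($para, $width) = @_;
    my @lines;
    my $line = '';
    for my $word (split ' ', $para) {
        if ($line eq '') {
            $line = $word;
        } elsif (length($line) + 1 + length($word) <= $width) {
            $line .= " $word";
        } else {
            push @lines, $line;
            $line = $word;
        }
    }
    push @lines, $line if $line ne '';
    return join("\n", @lines) . "\n";
}

# Act as a filter only when file names are given on the command line.
if (@ARGV) {
    local $/ = '';    # paragraph mode: records end at blank lines
    while (my $para = <>) {
        print wrap_paragraph(to_ascii($para), 70), "\n";
    }
}
```

Saved under some name of your choosing and run with one or more
already-stripped files as arguments, it would write the refilled text to
standard output. Transliterating before wrapping matters, since "ue" is
one column wider than the byte it replaces.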