Catdoc version 0.9x

The current release of catdoc is 0.93.3.

Older releases in the 0.92.x, 0.91.x and 0.90.x series are still available, but there is no reason to prefer them to the current one.

Download

  • Source distribution for all platforms
  • MS-DOS distribution with executable included
  • Browse CVS repository
    MD5 checksums:
    c021d2e30318bea063133191122676e5  catdoc-0.93.3.tar.gz
    afbde32d1593c7e8eaf42f4ba5460b90  catdoc-0.93.3.zip
    Forget about release 0.93.2. It was buggy.
    e08eb3c709de8d6dc54df03cd79a3192  catdoc-0.93.1.tar.gz
    90d2fba000463f12a267e758fd2fb35d  catdoc-0.93.1.zip
    23c98aa829cf69aeb5e96d81a70cb84f  catdoc-0.93.tar.gz
    bdd96bc3629dc6400dc3e4da093f2807  catdoc-0.93.zip
    460ee1aaaa34363b2cfb56748a14a55d  catdoc-0.91.6.tar.gz
    67974a635c143b03124987889cd6434c  catdoc-0.91.6.zip
    1314239a3d9c9c7bfda608dbcdc33e3f  catdoc-0.91.5.tar.gz
    bfc146724ad45ba1287eb5466882670a  catdoc-0.91.4.tar.gz
    84f9fea198f71bec66c6bed2a612e86c  catdoc-0.91.3.tar.gz
    edfaedb7b60ff6336b03f67c16dd4c60  catdoc-0.91.2.tar.gz
    6d44fb20f2fb2365fbc26e5753b4a8bf  catdoc-0.91.1.tar.gz
    13fc1cafdd7f2733a28ff4b0e28a52bd  catdoc-0.91.tar.gz
    54f3b3789d241a346378a13023f624b7  catdoc-0.90.3.tar.gz
    2d90577df365408051e489ab93c051c2  catdoc-0.90.3.zip
    
    If you verify checksums under DOS or Windows, make sure that your md5sum utility opens files in binary mode. DOS-based md5sum utilities typically have a special command-line switch for this.
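
    If no trustworthy md5sum utility is at hand, the check can be scripted. The following is a minimal sketch in Python, not part of the catdoc distribution; the helper names md5_of_file and matches are made up for illustration. The point it shows is the "rb" mode, which reads the archive as raw bytes so that no newline or end-of-file translation can change the digest.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it as raw bytes."""
    digest = hashlib.md5()
    # "rb" is the important part: binary mode, no newline or Ctrl-Z translation.
    with open(path, "rb") as f:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

    For example, matches("catdoc-0.93.3.tar.gz", "c021d2e30318bea063133191122676e5") should return True for an intact download of the current release.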

    Documentation

    The catdoc distribution includes the man page in troff source form, along with PostScript and plain-text versions of it. HTML versions of the man pages for catdoc(1), wordview and xls2csv are available here. See also the catdoc FAQ below.

    Status of this release

    Catdoc 0.90 is a complete rewrite from scratch. It has been tested at least on MS-DOS, Linux, BSDI and Solaris.

    I kindly ask my users to contribute replacement and substitution maps for their beloved characters.

    The current version of the substitution maps can be downloaded separately. So if you are using catdoc beta1, you don't need to download the whole distribution or even recompile: just get ascii.rpl and tex.rpl and replace the ones provided in the distribution.

    Bug and success reports are also welcome.

    What to do if catdoc doesn't read your Word file correctly

    1. Q: I've compiled catdoc and decided to test it before installing, but it complains about a missing charset file although all the files are in place.
      A: Catdoc is not designed to work without proper installation. You can work around this by creating a ${HOME}/.catdocrc file and specifying the path to the charsets in it (see the manual for syntax), but you'll also need to create symlinks to the ascii.spc and ascii.rpl files in this directory, named ascii.specchars and ascii.replchars respectively. It is simpler to let make install do it for you.
    2. Q: catdoc does something strange with my accented characters.
      A: Have you specified the correct input and output charsets? By default, catdoc comes with Cyrillic charsets configured in, which is probably not what you want if you are not Russian. See the charset correspondence table. Note also that Word files almost never use ISO8859-* charsets. They use cp* charsets, which have additional punctuation characters in the range 0x80-0x9F. Catdoc will probably find a reasonable substitution for them if it knows the proper charset of the document.
    3. Q: I've successfully compiled catdoc-0.90.2 on SunOS 4.x, but it doesn't output any meaningful text.
      A: Catdoc uses the %x format specifier to read charsets and substitution maps, but on SunOS 4.x %x doesn't handle a leading 0x in hex numbers. It should be replaced by %i everywhere it occurs in the functions read_charset and read_substmap. This was addressed in 0.90.3.
    4. Q: Catdoc breaks lines in arbitrary places and eats characters at the end of the line.
      A: Running MS-DOS, aren't you? This is a bug in the isspace implementation in Turbo C: it treats all chars with the eighth bit set as whitespace. This is (hopefully) fixed in 0.90.3.
    5. Q: Catdoc doesn't work at all. It just complains about some missing file, but the file is in the same directory as the executable.
      A: You are running an MS-DOS system, aren't you?
      pkunzip on MS-DOS has the crazy default behavior of putting all the files in the archive into one directory, without reproducing the directory structure stored in the archive. Unpack with
      pkunzip -d catdoc.zip
      
      and all will be OK. Support files should go in a special subdirectory, not where the executable resides. If you don't agree with me, you can override this in the catdoc.rc file.
    6. If there are a few lines of garbage on screen, try the -u switch. Catdoc doesn't detect Word 8 automatically (suggestions welcome).
    7. If there is a lot of garbage on screen (it seems that catdoc just dumps the file to stdout), try the -b switch. Maybe you are trying to read a broken file, or a file from a very old version of Word that doesn't have a correct OLE signature.
    8. If catdoc segfaults mysteriously, first try to recompile catdoc without optimization (remove -O from FLAGS in the Makefile). There are known problems with some versions of gcc on some platforms, HP/UX 9.x being one of them. If that doesn't help, this is a bug. Write a bug report if you cannot find it yourself, or send me a patch if you have fixed it.
    9. If you get a screen full of question marks, or text where letters are mixed in random order but words and paragraphs look sensible, you are probably using an incorrect input or output charset. Play with the -s and -d options, maybe using wordview.
    10. Catdoc replaces some non-alphanumeric characters with question marks.
      1. Find out which UNICODE characters are unsupported. You can do so by comparing the second column in the charset files for your input and output charsets. Typically all characters with codes above 0x2000 are suspicious.
      2. Edit your substitution map file (ascii.replchars/ascii.rpl or tex.replchars/tex.rpl) and add the correct replacement sequences according to the UNICODE name of each character.
      3. Send me a patch to be included in the next beta version.
    11. Catdoc produces incorrect TeX commands.
      1. Find out which substitution map contains the incorrect sequence. There are only two files named tex.something in the catdoc library directory, so it should be easy.
      2. Edit this file with your favorite text editor and fix it.
      3. Send me a patch.
      If you don't have access to the catdoc library directory, copy these files into your home directory and override the substitution map location in your ~/.catdocrc.

      After submitting a patch to me, persuade your sysadmin to upgrade catdoc.

    Where to get additional charset definitions

    ISO to Unicode mappings
      Directly usable by catdoc.
    Various Microsoft codepages
      Which you can expect to find in Word files.
    Apple codepages
      For those troubled with files from Word for Macintosh.
    KOI8-R and KOI8-U mappings to UNICODE
      These are not available on the unicode.org site for some reason, so they are provided locally.
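
    To see why the choice between an ISO charset and a Microsoft codepage matters, here is a small illustration in Python (not catdoc code; the sample bytes and the describe helper are invented for this example). It decodes the same bytes from the 0x80-0x9F range once as cp1252 and once as ISO-8859-1, using the codecs that ship with Python.

```python
import unicodedata

# Sample bytes: curly quotes, an ellipsis and a dash, all in 0x80-0x9F.
raw = b"\x93quoted\x94 text\x85 and a dash \x97 here"


def describe(data, charset):
    """List (byte, code point, UNICODE name) for every byte >= 0x80.

    Only meaningful for single-byte charsets, where each byte is one character.
    """
    rows = []
    for byte in data:
        if byte < 0x80:
            continue
        ch = bytes([byte]).decode(charset)
        # Control characters have no name, so supply a placeholder.
        rows.append((byte, ord(ch), unicodedata.name(ch, "<control>")))
    return rows


for charset in ("cp1252", "iso-8859-1"):
    print(charset)
    for byte, code, name in describe(raw, charset):
        print("  0x%02X -> U+%04X %s" % (byte, code, name))
```

    Read as cp1252, the four bytes become punctuation with code points above 0x2000. Read as ISO-8859-1, they become unnamed control characters, and no substitution map can turn those into anything sensible.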

    A list of UNICODE character names, which can be helpful for those who wish to extend the substitution and replacement maps, can be obtained from ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt.

    The format of this file is described in the corresponding ReadMe.
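
    For looking up names in bulk, a few lines of Python are enough. This is a sketch rather than a catdoc tool, and the function names are made up. It assumes the usual layout of that file: one character per line, fields separated by semicolons, the code point in hex (without a 0x prefix) in the first field and the character name in the second.

```python
def load_unicode_names(lines):
    """Build {code_point: name} from UnicodeData-style lines.

    Assumes semicolon-separated fields with the hex code point first
    and the character name second; other fields are ignored.
    """
    names = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        names[int(fields[0], 16)] = fields[1]
    return names


def name_of(names, code_point):
    """Return the name for a code point, or a placeholder if it is not listed."""
    return names.get(code_point, "(not listed)")
```

    Open the downloaded file in text mode, pass the file object to load_unicode_names, and call name_of(names, 0x2014) to find out what a question mark in catdoc output was supposed to be.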

