Article 11356 of comp.lang.perl:
Path: feenix.metronet.com!news.utdallas.edu!convex!cs.utexas.edu!howland.reston.ans.net!math.ohio-state.edu!jussieu.fr!univ-lyon1.fr!swidir.switch.ch!scsing.switch.ch!news.dfn.de!news.coli.uni-sb.de!sbusol.rz.uni-sb.de!mpi-sb.mpg.de!uwe
From: uwe@mpi-sb.mpg.de (Uwe Waldmann)
Newsgroups: comp.lang.perl
Subject: Re: Redefining \w and \b possible?
Date: 9 Mar 1994 18:36:38 GMT
Organization: Max-Planck-Institut fuer Informatik
Lines: 27
Distribution: world
Message-ID: <2ll4vmINN963@sbusol.rz.uni-sb.de>
References: <1994Mar9.125522.20435@nntp.nta.no>
Reply-To: uwe@mpi-sb.mpg.de
NNTP-Posting-Host: mpii02005.ag2.mpi-sb.mpg.de
Originator: uwe@mpii02005

In article <1994Mar9.125522.20435@nntp.nta.no>, Stein Kulseth wrote:
> Here in Norway we are blessed/cursed with three extra vowels.
> When doing pattern matching on Norwegian text it would be very
> nice to have \b and \w accept these as letters. Is this possible?

No, as far as I know (unless Larry has changed it in the meantime).

> If not, how can I write a search pattern that will match Norwegian
> word boundaries at either end and anywhere within a string?

# (a) Put a \000 before and after every word:
s/([A-Za-z0-9_\305\306\330\345\346\370]+)/\000$1\000/g;

# (b) Check for \000 instead of \b.
# For example, s/\b([A-Z])\b/"$1"/g becomes:
s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;

# (c) Don't forget to remove all \000's after you are done:
s/\000//g;

If you have several substitutions in a row, be careful to check
that every word boundary remains marked by a \000. It may even be
necessary to repeat steps (c)+(a) in between to readjust them.

-- 
Uwe Waldmann, Max-Planck-Institut fuer Informatik
Im Stadtwald, D-66123 Saarbruecken, Germany
Phone: +49 681 302-5431, Fax: +49 681 302-5401, E-Mail: uwe@mpi-sb.mpg.de
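
For anyone who wants to try steps (a) to (c) as one piece, here is a
minimal sketch that wraps them in three subroutines. The subroutine
names and the sample sentence are made up for illustration and are not
from the post. It assumes the text is ISO 8859-1 bytes (the octal codes
used in the post) and contains no \000 bytes of its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Word characters as given in the post: ASCII \w plus the ISO 8859-1
# codes for the Norwegian letters (upper- and lowercase).
my $WORD = qr/[A-Za-z0-9_\305\306\330\345\346\370]+/;

# Step (a): put a \000 before and after every word.
# (mark_words is a hypothetical name.)
sub mark_words {
    my ($text) = @_;
    $text =~ s/($WORD)/\000$1\000/g;
    return $text;
}

# Step (b): the example substitution from the post, with \000 in place
# of \b. It puts double quotes around every word that is a single
# uppercase letter and keeps the \000 markers in place.
sub quote_single_capitals {
    my ($text) = @_;
    $text =~ s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;
    return $text;
}

# Step (c): remove all markers once the substitutions are done.
sub unmark_words {
    my ($text) = @_;
    $text =~ s/\000//g;
    return $text;
}

# Sample input (an assumption, not from the post), written with the
# same octal escapes so the script itself stays plain ASCII.
my $line = "\305 er en elv, A og \330l";
print unmark_words(quote_single_capitals(mark_words($line))), "\n";
```

Because step (b) writes the markers back around $1, further
substitutions can be chained before step (c) without redoing step (a),
as long as none of them adds or removes word characters next to a
marker.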