A question about sed (once again, locale affecting character ordering)

The file wk:
northwest NW Joel Craig 3.0 .98 3 4
western WE Sharon Kelly 5.3 .97 5 23
southwest SW Chris Foster 2.7 .8 2 18
southern SO May Chin 5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7 4 17
eastern EA Susan Beal 4.4 .84 5 20
northeast NE TJ Nichols 5.1 .94 3 13
north NO Val Shultz 4.5 .89 5 9

central CT Sheri Watson 5.7 .94 5 13

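Why doesn't the command sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]* \)/ \2 \1/g' wk produce the same result as sed 's/ \([a-zA-Z]\+\) \([a-zA-Z]\+\)\( \+[0-9]\)/ \2 \1\3/' wk?
Can't "an uppercase letter followed by a lowercase letter" be used as the distinguishing feature? Do I have to rely on the space and the digit? The output I want is: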
northwest NW Craig Joel    3.0 .98 3 4
western   WE Kelly Sharon  5.3 .97 5 23
southwest SW Foster Chris  2.7 .8  2 18
southern  SO Chin May      5.1 .95 4 15
southeast SE Johnson Derek 4.0 .7  4 17
eastern   EA Beal Susan    4.4 .84 5 20
northeast NE Nichols TJ    5.1 .94 3 13
north     NO Shultz Val    4.5 .89 5 9
central   CT Watson Sheri  5.7 .94 5 13

      
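Please try to distill the key question and describe it in precise, concise terms. Posting a pile of content plus one long RE like this is a test of other people's patience.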
FYI:
Quote:
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; cat wk
northwest NW Joel Craig    3.0 .98 3 4
western   WE Sharon Kelly  5.3 .97 5 23
southwest SW Chris Foster  2.7 .8  2 18
southern  SO May Chin      5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7  4 17
eastern   EA Susan Beal    4.4 .84 5 20
northeast NE TJ Nichols    5.1 .94 3 13
north     NO Val Shultz    4.5 .89 5 9
central   CT Sheri Watson  5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; sed 's/ \([a-zA-Z]\+\) \([a-zA-Z]\+\)\( \+[0-9]\)/ \2 \1\3/' wk
northwest NW Craig Joel    3.0 .98 3 4
western   WE Kelly Sharon  5.3 .97 5 23
southwest SW Foster Chris  2.7 .8  2 18
southern  SO Chin May      5.1 .95 4 15
southeast SE Johnson Derek 4.0 .7  4 17
eastern   EA Beal Susan    4.4 .84 5 20
northeast NE Nichols TJ    5.1 .94 3 13
north     NO Shultz Val    4.5 .89 5 9
central   CT Watson Sheri  5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest NW Craig Joel    3.0 .98 3 4
western   WE Kelly Sharon  5.3 .97 5 23
southwest SW Foster Chris  2.7 .8  2 18
southern  SO Chin May      5.1 .95 4 15
southeast SE Johnson Derek 4.0 .7  4 17
eastern   EA Beal Susan    4.4 .84 5 20
northeast NE TJ Nichols    5.1 .94 3 13
north     NO Shultz Val    4.5 .89 5 9
central   CT Watson Sheri  5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; bye
      
sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]* \) / \2 \1/g'
sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
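Why do these give different results? Isn't the only difference that the * version has one extra [a-z] in front compared with the + version?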
...
Quote:
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; cat wk
northwest NW Joel Craig    3.0 .98 3 4
western   WE Sharon Kelly  5.3 .97 5 23
southwest SW Chris Foster  2.7 .8  2 18
southern  SO May Chin      5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7  4 17
eastern   EA Susan Beal    4.4 .84 5 20
northeast NE TJ Nichols    5.1 .94 3 13
north     NO Val Shultz    4.5 .89 5 9
central   CT Sheri Watson  5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]* \) / \2 \1/g' wk
northwest NW Craig  Joel  3.0 .98 3 4
western   WE Kelly  Sharon5.3 .97 5 23
southwest SW Foster  Chris2.7 .8  2 18
southern  SO Chin  May    5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7  4 17
eastern   EA Beal  Susan  4.4 .84 5 20
northeast NE TJ Nichols    5.1 .94 3 13
north     NO Shultz  Val  4.5 .89 5 9
central   CT Watson  Sheri5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]*\) / \2 \1 /g' wk
northwest NW Craig Joel    3.0 .98 3 4
western   WE Kelly Sharon  5.3 .97 5 23
southwest SW Foster Chris  2.7 .8  2 18
southern  SO Chin May      5.1 .95 4 15
southeast SE Johnson Derek 4.0 .7  4 17
eastern   EA Beal Susan    4.4 .84 5 20
northeast NE TJ Nichols    5.1 .94 3 13
north     NO Shultz Val    4.5 .89 5 9
central   CT Watson Sheri  5.7 .94 5 13
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=3901 $?=0] ; bye
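
The difference between the two commands in the session above appears to come from where the space sits, not from `*` versus `+`. In the first command the second group, `\([A-Z][a-z]* \)`, captures a trailing space and the pattern then requires one more space after it. A match therefore needs at least two spaces after the last name, and the replacement ` \2 \1` moves the captured space between the names while dropping the second one. A minimal sketch, forcing `LC_ALL=C` so that the ranges are plain ASCII; the function names `swap_inside` and `swap_outside` are made up for illustration:

```shell
# Hypothetical helpers; both read lines on stdin.
swap_inside() {
    # Group 2 captures a trailing space, and one more literal space must follow it.
    LC_ALL=C sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]* \) / \2 \1/g'
}

swap_outside() {
    # The space stays outside group 2 and is restored by the replacement.
    LC_ALL=C sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]*\) / \2 \1 /g'
}

# Two spaces after "Kelly": matches, but the spacing is mangled.
printf 'western   WE Sharon Kelly  5.3 .97 5 23\n' | swap_inside
# Only one space after "Johnson": no match, line is left unchanged.
printf 'southeast SE Derek Johnson 4.0 .7  4 17\n' | swap_inside
# Space outside the group: swapped cleanly.
printf 'southeast SE Derek Johnson 4.0 .7  4 17\n' | swap_outside
```

This is consistent with the quoted output, where `Kelly  Sharon5.3` loses its separator and the `Derek Johnson` line is untouched by the first command.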
      
Thanks, moderator. It looks like something is wrong with my system. I'm using Red Hat Linux 9.
Neither sed 's/ \([A-Z][a-z][a-z]*\) \([A-Z][a-z]* \) / \2 \1/g' wk
nor sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk produces the correct result.
Both give:
northwest Joel NW Craig 3.0 .98 3 4
western Sharon WE Kelly 5.3 .97 5 23
southwest Chris SW Foster 2.7 .8 2 18
southern May SO Chin 5.1 .95 4 15
Frustrating.
It only works if I also append \( \+[0-9]\) to the pattern.
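
One plausible reading of why the extra `\( \+[0-9]\)` helps: it pins the match to the two words immediately before the first number, so the pattern no longer has to tell uppercase from lowercase. A sketch under that assumption, written with the POSIX interval `\{1,\}` in place of the GNU-specific `\+`; the function name `swap_before_number` is hypothetical:

```shell
# Hypothetical helper; reads lines on stdin.
swap_before_number() {
    # Two alphabetic words followed by spaces and a digit:
    # swap the words and put the spaces-plus-digit tail back via \3.
    sed 's/ \([a-zA-Z]\{1,\}\) \([a-zA-Z]\{1,\}\)\( \{1,\}[0-9]\)/ \2 \1\3/'
}

printf 'northwest NW Joel Craig 3.0 .98 3 4\n' | swap_before_number
```

Starting at `NW Joel` the match fails because `Craig`, not a digit, comes next, so the first position that succeeds is `Joel Craig 3`.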
Which version of sed do you have? How is your locale set?
Quote:
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=29459 $?=0] ; sed --version
GNU sed version 4.1.4
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=29459 $?=0] ; locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=29459 $?=0] ; bye
      
[zm@hangkar zm]$ sed --version
GNU sed version 4.0.5
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
[zm@hangkar zm]$ locale
LANG=zh_CN.GB18030
LC_CTYPE="zh_CN.GB18030"
LC_NUMERIC="zh_CN.GB18030"
LC_TIME="zh_CN.GB18030"
LC_COLLATE="zh_CN.GB18030"
LC_MONETARY="zh_CN.GB18030"
LC_MESSAGES="zh_CN.GB18030"
LC_PAPER="zh_CN.GB18030"
LC_NAME="zh_CN.GB18030"
LC_ADDRESS="zh_CN.GB18030"
LC_TELEPHONE="zh_CN.GB18030"
LC_MEASUREMENT="zh_CN.GB18030"
LC_IDENTIFICATION="zh_CN.GB18030"
LC_ALL=      
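That shouldn't be a problem. I'll find a Red Hat 9 machine some day and try it.
I found a Red Hat 9 machine and tried it, and there really is a problem:
Quote: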
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; cat /etc/redhat-release
Red Hat Linux release 9 (Shrike)
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; sed --version
GNU sed version 4.0.5
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest Joel NW Craig 3.0 .98 3 4
western Sharon WE Kelly 5.3 .97 5 23
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; LC_ALL=zh_CN.GB18030 sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest Joel NW Craig 3.0 .98 3 4
western Sharon WE Kelly 5.3 .97 5 23
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; LC_ALL=zh_CN.GBK sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest Joel NW Craig 3.0 .98 3 4
western Sharon WE Kelly 5.3 .97 5 23
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; LC_ALL=zh_CN.GB2312 sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest Joel NW Craig 3.0 .98 3 4
western Sharon WE Kelly 5.3 .97 5 23
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; LC_ALL=C sed 's/ \([A-Z][a-z]\+\) \([A-Z][a-z]\+\) / \2 \1 /g' wk
northwest NW Craig Joel 3.0 .98 3 4
western WE Kelly Sharon 5.3 .97 5 23
-(dearvoid@LinuxEden:Forum)-(~/tmp)-
[$$=25950 $?=0] ; bye
As you can see, the result is correct when LC_ALL is set to C, so this may be a locale-related bug.
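
A likely explanation, offered as an assumption and not a confirmed diagnosis: outside the C locale, range expressions such as `[A-Z]` and `[a-z]` may follow the locale's collation order, in which uppercase and lowercase letters can be interleaved. `NW` could then match `[A-Z][a-z]\+`, which would account for `NW` and `Joel` being swapped on Red Hat 9. Two common ways to sidestep this are to force `LC_ALL=C` for the one command, as in the session above, or to use the POSIX character classes `[[:upper:]]` and `[[:lower:]]`. The function name `swap_names` is hypothetical:

```shell
# Hypothetical helper; reads lines on stdin.
swap_names() {
    # Character classes name "uppercase" and "lowercase" directly
    # and do not rely on the order of characters inside a range.
    sed 's/ \([[:upper:]][[:lower:]]\{1,\}\) \([[:upper:]][[:lower:]]\{1,\}\) / \2 \1 /g'
}

printf 'northwest NW Joel Craig 3.0 .98 3 4\n' | swap_names

# Alternative: keep the ranges but pin the locale for this one command.
printf 'northwest NW Joel Craig 3.0 .98 3 4\n' |
    LC_ALL=C sed 's/ \([A-Z][a-z]\{1,\}\) \([A-Z][a-z]\{1,\}\) / \2 \1 /g'
```

Setting `LC_ALL=C` on the command line affects only that sed invocation, so the rest of the shell session keeps its locale.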