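[Help] Processing reference formats with Perl: I have written part of the program and would really appreciate advice from the experts. It will benefit everyone!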


The original file looks like this:
   author = {Zaehres, Holm and Scholer, Hans R.},
   title = {Induction of Pluripotency: From Mouse to Human},
   journal = {Cell},
   volume = {131},
   number = {5},
   pages = {834-835},
   abstract = {In this issue of Cell, Takahashi et al. (2007) transfer their seminal work on somatic cell reprogramming from the mouse to human. By overexpressing the transcription factor quartet of Oct4, Sox2, Klf4, and c-Myc in adult human fibroblasts, they successfully isolate human pluripotent stem cells that resemble human embryonic stem cells by all measured criteria. This is a significant turning point in nuclear reprogramming research with broad implications for generating patient-specific pluripotent stem cells for research and therapeutic applications.},
   year = {|2007|}
}

   author = {Patil, P. S. and Hung, S. C.},
   title = {Total Synthesis of Phosphatidylinositol Dimannoside: A Cell-Envelope Component of Mycobacterium tuberculosis},
   journal = {Chemistry},
   note = {Journal article
Chemistry (Weinheim an der Bergstrasse, Germany)
Chemistry. 2008 Dec 22.},
   year = {|2008|}
}
...
...
I want to turn it into this:
...
...
@article{Patil2008
  author = {Patil, P. S. and Hung, S. C.},
   title = {Total Synthesis of Phosphatidylinositol Dimannoside: A Cell-Envelope Component of Mycobacterium tuberculosis},
   journal = {Chemistry},
   note = {Journal article
Chemistry (Weinheim an der Bergstrasse, Germany)
Chemistry. 2008 Dec 22.},
   year = {|2008|}
}
...
...

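The line marked in bold red is what needs to be added: @article{ goes in front of every entry, followed by the first author's surname and then the year (bold blue).
How should I go about this?
I have written part of it: it extracts the first author's surname and the year, but I do not know how to go back and modify the part of the file that has already been read.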
#!/usr/bin/perl -w
use strict;
my $file = $ARGV[0];
open(FH, '<', $file) or die "Cannot open $file: $!";
while (<FH>) {
    if (/author\s*=\s*\{(.*)$/) {
        my @author = split /,/, $1;   # $author[0] is then the first author's surname
    } elsif (/year\s*=\s*\{\|(\d+)\|\}/) {
        my $year = $1;                # the year
    }
}
close(FH);
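Additions from the experts are welcome! Please note that this problem comes up when importing EndNote references into JabRef, because the bib file exported by EndNote lacks these parts. Any suggestions would help a lot of people!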

This should be fairly simple with a module such as Parse::RecDescent. Parsing it by hand is of course also possible.
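
For the hand-parsing route, a minimal sketch might look like the following. It assumes, as in the sample above, that every entry ends at a line containing only } and that the year is wrapped as {|2008|}. The helper names split_entries, make_key and add_article_keys are made up for illustration. Note that standard BibTeX puts a comma after the citation key; the sketch emits exactly the header line requested in the first post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the export into entries; an entry is assumed to end at a line
# that contains nothing but "}". Blank lines between entries are dropped.
sub split_entries {
    my ($text) = @_;
    my @entries;
    my $buf = '';
    for my $line (split /^/m, $text) {
        next if $buf eq '' && $line !~ /\S/;   # skip blank separator lines
        $buf .= $line;
        if ($line =~ /^\s*\}\s*$/) {           # closing brace ends the entry
            push @entries, $buf;
            $buf = '';
        }
    }
    push @entries, $buf if $buf ne '';         # unterminated trailing entry
    return @entries;
}

# Build "SurnameYear" from the author and year fields of one entry.
sub make_key {
    my ($entry) = @_;
    my ($surname) = $entry =~ /^\s*author\s*=\s*\{\s*([^,\s}]+)/m;
    my ($year)    = $entry =~ /^\s*year\s*=\s*\{\|?(\d+)\|?\}/m;
    $surname = 'Unknown' unless defined $surname;   # fallback is an assumption
    $year    = ''        unless defined $year;
    return $surname . $year;
}

# Put the "@article{Key" header line in front of every entry.
sub add_article_keys {
    my ($text) = @_;
    return join "\n",
        map { '@article{' . make_key($_) . "\n" . $_ } split_entries($text);
}

my $sample = <<'END';
   author = {Patil, P. S. and Hung, S. C.},
   journal = {Chemistry},
   year = {|2008|}
}
END

print add_article_keys($sample);
```

Working on whole entries rather than single lines means the key is built only once the complete entry is known, which is one way around the "how do I change what I have already read" problem.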


QUOTE:
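Originally posted by bmechuangye on 2008-12-24 22:52
Please note that this problem comes up when importing EndNote references into JabRef, because the bib file exported by EndNote lacks these parts

I understood everything before it, but not this sentence.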


CODE:
#!/usr/bin/perl -w
use strict;

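my @article = ();   # lines of the entry currently being read
my $auth;           # surname of the first author
my $year;
while (<DATA>) {
    # Blank lines between entries are printed unchanged.
    if (!@article && /^\s*$/) { print; next }

    push @article, $_;
    if (/author\s*=\s*\{(\w+)/) {
        $auth = $1;
    }
    if (/year\s*=\s*\{\|([^|]+)\|/) {
        $year = $1;
    }

    # A line holding only "}" ends the entry: print the header, then the entry.
    if (/^\s*\}\s*$/) {
        print "   \@article\{$auth$year\n";
        print @article;   # not "@article", which would join the lines with spaces
        @article = ();    # clear the buffer for the next entry
    }
}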
__DATA__
   author = {Zaehres, Holm and Scholer, Hans R.},
   title = {Induction of Pluripotency: From Mouse to Human},
   journal = {Cell},
   volume = {131},
   number = {5},
   pages = {834-835},
   abstract = {In this issue of Cell, Takahashi et al. (2007) transfer their seminal work on somatic cell reprogramming from the mouse to human. By overexpressing the transcription factor quartet of Oct4, Sox2, Klf4, and c-Myc in adult human fibroblasts, they successfully isolate human pluripotent stem cells that resemble human embryonic stem cells by all measured criteria. This is a significant turning point in nuclear reprogramming research with broad implications for generating patient-specific pluripotent stem cells for research and therapeutic applications.},
       year = {|2007|}
}

   author = {Patil, P. S. and Hung, S. C.},
   title = {Total Synthesis of Phosphatidylinositol Dimannoside: A Cell-Envelope Component of Mycobacterium tuberculosis},
   journal = {Chemistry},
   note = {Journal article
   Chemistry (Weinheim an der Bergstrasse, Germany)
   Chemistry. 2008 Dec 22.},
   year = {|2008|}
}

#!/usr/bin/perl

use strict;
use warnings;

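my @data;       # lines of the current entry, without the closing "}"
my $flag = 0;   # 1 while inside an entry
while (<DATA>) {
        $flag = 1 if /^\s*author/;
        $flag = 0 if /^\s*\}/;
        push @data, $_ if $flag;

        if ($flag == 0 && @data) {
                $_ = join '', @data;
                # $1 = first author's surname, $2 = year. Literal braces are
                # escaped because newer Perl versions may complain about bare ones.
                if (/author\s*=\s*\{(\w+),.*\{\|(\d+)\|\}/s) {
                        print q(@article{), $1, $2, "\n", $_, "}\n";
                } else {
                        print $_, "}\n";   # no author/year found: leave the entry as is
                }
                @data = ();   # clear the buffer for the next entry
        }
}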
__DATA__
author = {Zaehres, Holm and Scholer, Hans R.},
   title = {Induction of Pluripotency: From Mouse to Human},
   journal = {Cell},
   volume = {131},
   number = {5},
   pages = {834-835},
   abstract = {In this issue of Cell, Takahashi et al. (2007) transfer their seminal work on somatic cell reprogramming from the mouse to human. By overexpressing the transcription factor quartet of Oct4, Sox2, Klf4, and c-Myc in adult human fibroblasts, they successfully isolate human pluripotent stem cells that resemble human embryonic stem cells by all measured criteria. This is a significant turning point in nuclear reprogramming research with broad implications for generating patient-specific pluripotent stem cells for research and therapeutic applications.},
   year = {|2007|}
}

   author = {Patil, P. S. and Hung, S. C.},
   title = {Total Synthesis of Phosphatidylinositol Dimannoside: A Cell-Envelope Component of Mycobacterium tuberculosis},
   journal = {Chemistry},
   note = {Journal article
Chemistry (Weinheim an der Bergstrasse, Germany)
Chemistry. 2008 Dec 22.},
   year = {|2008|}
}


QUOTE:
Originally posted by ynchnluiti on 2008-12-25 00:36

I understood everything before it, but not this sentence.

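ynchnluiti, you really are an expert, thank you! EndNote is a powerful commercial reference manager for Windows (from Thomson), while JabRef is a cross-platform, open-source reference manager that is well worth recommending and can hold its own against EndNote. Both are very capable tools for managing references and supporting writing. On Linux, however, the references previously managed in EndNote have to be re-imported into JabRef before they can be used for writing papers in LaTeX. Following the method described online, the export produces the format I listed above, but the missing parts have to be filled in by a program. This only covers @article, of course. Entries whose missing type should be @book, @thesis and so on are not handled yet, and a file that mixes them is more troublesome to process.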
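
For the mixed case, one possible starting point is to guess the type from the fields an entry contains. This is only a sketch built on an assumption: that the export carries the usual BibTeX fields (journal for articles, school for theses, publisher for books). The function name guess_type is made up, and the real EndNote output may need different rules. Standard BibTeX has phdthesis and mastersthesis rather than a plain thesis type, so the sketch uses phdthesis as a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess a BibTeX entry type from the fields present in one entry.
# Heuristic assumed for illustration; nothing in the export guarantees it.
sub guess_type {
    my ($entry) = @_;
    return 'article'   if $entry =~ /^\s*journal\s*=/m;
    return 'phdthesis' if $entry =~ /^\s*school\s*=/m;
    return 'book'      if $entry =~ /^\s*publisher\s*=/m;
    return 'misc';     # fallback when no known field is present
}

my $entry = <<'END';
   author = {Patil, P. S. and Hung, S. C.},
   journal = {Chemistry},
   year = {|2008|}
}
END

# The header line then uses the guessed type instead of a fixed "article".
print '@', guess_type($entry), "{Patil2008\n", $entry;
```

Entries that fall through to misc are the ones worth checking by hand after the import.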
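Thanks, cobrawgl! Thumbs up.
I only just noticed that ynchnluiti (andy) replied to me in the small hours, thank you very much! Do get to bed earlier though, it is better for your health!
How should I read this line: if (!@article && /^\s*$/) { print; next}?
Also, your two programs contain @article = (); and @data = (); respectively. What are they for? In the first program I tried removing the @article() inside the while loop and it still seemed to work.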


QUOTE:
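Originally posted by bmechuangye on 2008-12-25 08:52

ynchnluiti, you really are an expert, thank you! EndNote is a powerful commercial reference manager for Windows (from Thomson), while JabRef is a cross-platform, open-source reference manager that is well worth recommending and can hold its own against EndNote. For managing references and supporting writing, they are really ...

I learned something new.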


QUOTE:
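Originally posted by bmechuangye on 2008-12-25 19:39
Thanks, cobrawgl! Thumbs up.
I only just noticed that ynchnluiti (andy) replied to me in the small hours, thank you very much! Do get to bed earlier though, it is better for your health!
How should I read this line: if (!@article && /^\s*$/) { print; next}?
Also, your two programs contain @a ...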

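@article is the array that holds the lines of the current reference (one line per array element).
@article = (); empties the array, because the current reference has been processed and printed.

if (!@article && /^\s*$/) { print; next} checks whether the array is empty (no new reference has been read in yet); if so, and the current line is blank, the line is printed as is.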
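
The check described above can be tried in isolation. The sketch below wraps it in a helper named is_separator, which is a made-up name and not part of the program in the thread; it only illustrates that an array is false in boolean context while it has no elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True when no entry is being collected (empty buffer) and the line is blank.
sub is_separator {
    my ($buffer, $line) = @_;
    return (!@$buffer && $line =~ /^\s*$/) ? 1 : 0;
}

my @article;                                  # empty buffer
print "between entries: ", is_separator(\@article, "\n"), "\n";

push @article, "   author = {Patil, P. S. and Hung, S. C.},\n";
print "inside an entry: ", is_separator(\@article, "\n"), "\n";

@article = ();                                # flush done, buffer empty again
print "after clearing:  ", is_separator(\@article, "\n"), "\n";
```

A blank line is therefore passed straight through only while the buffer is empty; once an entry has started, blank lines are collected with it.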


QUOTE:
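Originally posted by bmechuangye on 2008-12-25 19:39
Thanks, cobrawgl! Thumbs up.
removing the @article() inside the while loop also works

If you remove the @article = (); inside while(){}, the output will contain duplicates, won't it?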
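
A small experiment makes the duplication visible. This is a stripped-down sketch of the buffer-and-flush loop, not the original program: flush_entries is a hypothetical helper that returns the lines it would print, with a switch for clearing the buffer or not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect lines into a buffer and "print" the buffer at every closing brace.
# When $clear is false the buffer is never emptied.
sub flush_entries {
    my ($clear, @lines) = @_;
    my (@article, @out);
    for my $line (@lines) {
        push @article, $line;
        if ($line =~ /^\s*\}\s*$/) {
            push @out, @article;          # stands in for the print
            @article = () if $clear;
        }
    }
    return @out;
}

my @lines   = ("author = {A},\n", "}\n", "author = {B},\n", "}\n");
my @with    = flush_entries(1, @lines);
my @without = flush_entries(0, @lines);

print "with clearing:    ", scalar(@with),    " lines\n";
print "without clearing: ", scalar(@without), " lines\n";
```

Without the reset, the first entry is still in the buffer when the second one is flushed, so it comes out twice.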