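My problem is with documents that contain numbered lists (items starting with 1, 2, 3, …). My Perl script cannot retrieve the numbers: I only get the text content, not the numbering.
Please suggest how to convert a numbered list to plain text so that both the numbering and the text are preserved.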
Solution

My blog post, Extract bullet lists from PowerPoint slides using Perl and Win32::OLE, shows how to do this with PowerPoint. It turns out the task is a bit simpler with Word.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature 'say';

    use Carp qw( croak );
    use Const::Fast;
    use Path::Class;
    use Try::Tiny;

    use Win32::OLE;
    use Win32::OLE::Const ('Microsoft.Word');
    use Win32::OLE::Enum;

    # Make Win32::OLE croak on errors instead of failing silently
    $Win32::OLE::Warn = 3;

    run(@ARGV);

    sub run {
        my $docfile = shift;
        # Croaks if it cannot resolve
        $docfile = file($docfile)->absolute->resolve;

        my $word = get_word();
        my $doc = $word->Documents->Open(
            {
                FileName           => "$docfile",
                ConfirmConversions => 0,
                AddToRecentFiles   => 0,
                Revert             => 0,
                ReadOnly           => 1,
            }
        );

        # Walk every paragraph in the document
        my $pars = Win32::OLE::Enum->new($doc->Paragraphs);

        while (my $par = $pars->Next) {
            print_paragraph($par);
        }
    }

    sub print_paragraph {
        my $par = shift;
        my $range = $par->Range;
        my $fmt = $range->ListFormat;
        # ListString holds the rendered number or bullet, e.g. "1." or "a."
        my $bullet = $fmt->ListString;
        my $text = $range->Text;

        # Not a list item: print the text as-is
        unless ($bullet) {
            say $text;
            return;
        }

        # Prefix one ">" per nesting level
        my $level = $fmt->ListLevelNumber;
        say ">" x $level, join(' ', $bullet, $text);
        return;
    }

    sub get_word {
        my $word;
        # Attach to a running instance of Word if there is one
        try { $word = Win32::OLE->GetActiveObject('Word.Application') }
        catch { croak $_ };
        return $word if $word;

        # Otherwise start Word, and quit it when the object is destroyed
        $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit });
        return $word if $word;

        croak sprintf('Cannot start Word: %s', Win32::OLE->LastError);
    }
Given the following Word document:
it produces the output:
    This is a document
    >1. This is a numbered list
    >2. Second item in the numbered list
    >3. Third one
    Back to normal paragraph.
    >>a. Another list
    >>b. Yup, here comes the second item
    >>c. Not so sure what to put here
    >>>i. Sub-item
The Object Browser is indispensable.
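One detail worth handling when adapting the script: in Word, the text of a paragraph range normally ends with the paragraph mark (a carriage return, `\r`), so each printed line carries a stray `\r`. Below is a minimal sketch of a pure formatting helper, kept separate from the OLE calls so it can be tried without Word. The name `format_paragraph` and its argument order are hypothetical and not part of the original answer; it also tests the list string with `length` rather than plain truthiness, which is a small deviation from the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original answer): build one output line
# from the values the script reads via Range and ListFormat.
#   $text   - paragraph text, as returned by $range->Text
#   $bullet - list string, as returned by $fmt->ListString ('' if not a list)
#   $level  - nesting level, as returned by $fmt->ListLevelNumber
sub format_paragraph {
    my ($text, $bullet, $level) = @_;

    # Strip the trailing paragraph mark (and any line feed) from the text
    $text =~ s/[\r\n]+\z//;

    # Ordinary paragraph: no number or bullet to prepend
    return $text unless defined $bullet && length $bullet;

    # List item: one ">" per level, then the number and the text
    return ('>' x $level) . join(' ', $bullet, $text);
}

print format_paragraph("Second item in the numbered list\r", '2.', 1), "\n";
# prints: >2. Second item in the numbered list
```

In `print_paragraph`, the two `say` calls could then be replaced by a single `say format_paragraph($text, $bullet, $fmt->ListLevelNumber);`, assuming the helper is defined in the same file.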