gawk: using FS to split a record into individual characters

If the field separator is the empty string, each character becomes a separate field:

$ echo hello | awk -F '' -v OFS=, '{$1 = NF OFS $1} 1'
5,h,e,l,l,o

However, if FS is a regular expression that can match zero times, the same behavior does not occur:

$ echo hello | awk -F ' *' -v OFS=, '{$1 = NF OFS $1} 1'
1,hello

Does anyone know why this happens? I could not find anything about it in the gawk manual. Is FS="" just a special case?

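What I most want to understand is why the second case does not split the record into more fields. It looks as if awk treats FS=" *" as though a match had to consume at least one space.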

Solution

Interesting question!

I just pulled the source of gnu-awk 4.1.0, and I think the answer is in the file field.c.

line 371:
 * re_parse_field --- parse fields using a regexp.
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is a regular
 * expression -- either user-defined or because RS=="" and FS==" "
 */
static long
re_parse_field(lo...

Look at this line (line 425):

if (REEND(rp,scan) == RESTART(rp,scan)) {   /* null match */

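This is the branch that the <space>* pattern from your question ends up in. Because <space>* can match the empty string, every match attempt inside "hello" succeeds with length zero. In this null-match branch the implementation does not increment nf; in other words, it treats the whole line as a single field. Note that this function is also used by do_split().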
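To make the null-match case concrete, here is a minimal sketch of that loop. It is a simplified model and not gawk's actual code: the names match_space_star and count_fields_space_star are made up for illustration, the regex " *" is hard-coded instead of going through the regex engine, and multibyte handling and the up_to limit are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the leftmost match of the regex " *"
 * starting at s. Because " *" can match the empty string, the leftmost
 * match always begins at s itself and is as long as the run of spaces. */
static size_t match_space_star(const char *s, const char *end)
{
    size_t len = 0;
    while (s + len < end && s[len] == ' ')
        len++;
    return len;
}

/* Simplified model of regexp field splitting with FS = " *".
 * Returns the number of fields found in record. */
long count_fields_space_star(const char *record)
{
    const char *scan = record;
    const char *end = record + strlen(record);
    long nf = 0;

    while (scan < end) {
        size_t len = match_space_star(scan, end);
        if (len == 0) {          /* null match: step over one character,  */
            scan++;              /* but do NOT start a new field          */
            if (scan == end) {   /* record exhausted: emit the last field */
                nf++;
                break;
            }
            continue;
        }
        nf++;                    /* real separator: close the current field */
        scan += len;
        if (scan == end)         /* trailing separator: one empty field */
            nf++;
    }
    return nf;
}
```

Under this model count_fields_space_star("hello") is 1 and count_fields_space_star("f o o") is 3: only matches that actually consume spaces end a field, which is why "hello" stays in one piece.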

First, if FS is the null string, gawk makes each character its own field. The gawk documentation states this clearly, and we can see it in the code as well:

line 613:
 * null_parse_field --- each character is a separate field
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is the null string.
 */
static long
null_parse_field(long up_to,
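The behavior that comment describes can be sketched as follows. This is a byte-oriented model for illustration only; the function name split_chars is hypothetical, and the real function also has to deal with multibyte characters and the up_to limit.

```c
#include <string.h>

/* Model of "each character is a separate field": copy every byte of
 * record into out, joined by ofs, and return the number of fields.
 * out must have room for at least 2 * strlen(record) + 1 bytes. */
long split_chars(const char *record, char ofs, char *out)
{
    long nf = 0;

    for (const char *p = record; *p != '\0'; p++) {
        if (nf > 0)
            *out++ = ofs;   /* separator before every field but the first */
        *out++ = *p;        /* the field is exactly one byte */
        nf++;
    }
    *out = '\0';
    return nf;
}
```

For the record "hello" with ofs set to ',' this model returns 5 and fills out with "h,e,l,l,o", the same shape as the output of the first command in the question.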

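If FS is a single character other than space, awk does not treat it as a regular expression. This is also mentioned in the documentation, and it shows in the code: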

line 667:
 * sc_parse_field --- single character field separator
 *
 * This is called both from get_field() and from do_split()
 * via (*parse_field)().  This variation is for when FS is a single character
 * other than space.
 */
static long
sc_parse_field(l

If we read the function, we see that it does no regular expression matching at all.
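As a rough illustration of splitting on a literal character, here is a small model. The function name count_fields_char is hypothetical and the code is not taken from gawk; it only shows that the separator is compared byte for byte, so a character such as '.' is not interpreted as a regex wildcard.

```c
/* Model of single-character field splitting: sep is compared literally,
 * no regular expression is involved. Returns the number of fields. */
long count_fields_char(const char *record, char sep)
{
    long nf;

    if (*record == '\0')
        return 0;                 /* an empty record has no fields */

    nf = 1;                       /* a non-empty record has at least one */
    for (const char *p = record; *p != '\0'; p++) {
        if (*p == sep)
            nf++;                 /* every separator starts a new field */
    }
    return nf;
}
```

With this model count_fields_char("a.b.c", '.') is 3, while count_fields_char("abc", '.') is 1, because the dot only separates where a literal dot appears.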

The comments on re_parse_field() and sc_parse_field() tell us that do_split() calls them too. That explains why the following command prints 1 rather than 3:

kent$ echo "foo" | awk '{split($0, a, / */); print length(a)}'
1

Note: to keep this post from getting too long, I have not pasted the complete code here. It can be found at:

http://git.savannah.gnu.org/cgit/gawk.git/

Original article: https://outofmemory.cn/yw/1049458.html
