[Worth a Read] Variable Types in Linux C Programs, Explained

Background
From the Compiler's Perspective
From the ELF Binary's Perspective
From the Runtime Perspective

Background

I recently noticed that colleagues on my project team are still not quite clear about what the C static keyword does.

Plenty of articles explain a keyword's usage from the syntax angle, but those explanations are often dry and hard to remember.

This article instead takes a hands-on approach to static, and to several other kinds of C variables, in the hope that it helps.

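The static keyword restricts the scope of a variable or function. That scope can be a file (the name is visible only inside that source file) or a function (the variable is visible only inside that function, while keeping its value between calls).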

From the Compiler's Perspective

Suppose a feature is implemented across several files, as follows:

$ cat print.h
extern void print(char *str);
extern char *hello;

$ cat x.c
#include "print.h"

int main(void)
{
  print(hello);
  return 0;
}

$ cat print.c
#include <stdio.h>

char *hello = "hello";

void print(char *str)
{
  printf("%s\n", str);
}

Compile and run:

$ gcc -m32 -o hello x.c print.h print.c
$ ./hello
hello

If a function or variable that has to be accessed across files like this is instead defined as static:

$ cat print.c
#include <stdio.h>

static char *hello = "hello";

static void print(char *str)
{
  printf("%s\n", str);
}

$ gcc -m32 -o hello x.c print.h print.c
/tmp/ccetJaG2.o: In function `main':
x.c:(.text+0x12): undefined reference to `hello'
x.c:(.text+0x1b): undefined reference to `print'
collect2: error: ld returned 1 exit status

From the ELF Binary's Perspective

First, compile print.c into an intermediate relocatable object file:

$ gcc -m32 -c -o print.o print.c

With static:

$ readelf -s print.o | egrep "hello$|print$"
     6: 00000000     4 OBJECT  LOCAL  DEFAULT    3 hello
     7: 00000000    23 FUNC    LOCAL  DEFAULT    1 print

Without static:

$ readelf -s print.o | egrep "hello$|print$"
     9: 00000000     4 OBJECT  GLOBAL DEFAULT    3 hello
    10: 00000000    23 FUNC    GLOBAL DEFAULT    1 print

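The LOCAL and GLOBAL bindings show directly how static controls whether a variable or function can be accessed from outside the file.

Once static is added, the symbol is no longer visible outside the file.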

For comparison, here is the output of another tool, nm. With static:

$ nm print.o | egrep "hello$|print$"
00000000 d hello
00000000 t print

Without static:

$ nm print.o | egrep "hello$|print$"
00000000 D hello
00000000 T print

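The four letters above form two upper/lower-case pairs: they mark LOCAL (lowercase) and GLOBAL (uppercase) symbols in the data and text sections. "hello" is data, while "print", being a function, lives in the code section.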

man nm:
"D" "d" The symbol is in the initialized data section.
"T" "t" The symbol is in the text (code) section.
If lowercase, the symbol is usually local; 
if uppercase, the symbol is global (external). 
There are however a few lowercase symbols 
that are shown for special global symbols ("u", "v" and "w")

nm is worth mentioning because symbol-table files such as the Linux kernel's System.map are often used for debugging, and that file is in fact generated with nm.

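One more binding is worth covering: WEAK. It behaves like GLOBAL (no static), except that another function or variable with the same name may be defined elsewhere, and that definition overrides the WEAK one: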

$ cat print.c
#include <stdio.h>

__attribute__((weak)) char *hello = "hello";

__attribute__((weak)) void print(char *str)
{
  printf("%s\n", str);
}

$ cat x.c
#include "print.h"

char *hello = "hello, world";

int main(void)
{
  print(hello);
  return 0;
}

$ ./hello
hello, world

In other words, WEAK allows multiple definitions of the same variable or function. If the symbol is neither WEAK nor LOCAL (via static), a second definition is rejected by the linker:

$ gcc -m32 -o hello x.c print.h print.c
/tmp/ccMO5y0A.o:(.data+0x0): multiple definition of `hello'
/tmp/ccj8KK1s.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status

This technique is used widely in the kernel, usually so that an architecture-specific optimized function can replace a generic one:

$ grep __weak -ur ./ --include "*.c" | wc -l
413

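To summarize:

static: LOCAL binding, shown by nm as d / t, not visible outside the file.
no qualifier: GLOBAL binding, shown by nm as D / T, visible to other files; a second definition of the same name is a link error.
__attribute__((weak)): WEAK binding, visible to other files, and overridden by a non-weak definition of the same name.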

From the Runtime Perspective

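The sections above looked at what static means for file-level variables and functions, from the compiler's and the binary's point of view. Now let's turn to variables inside a function and see how they differ depending on whether they are declared static.

For comparison, other kinds of variables are included as well: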

$ cat hello.c
#include <stdio.h>

static int m;
static int n = 1000;
int a;
int b = 10000;

static int hello(void)
{
    static int i;
    static int j = 10;
    int x;
    int y = 100;
    register int z = 33;

    printf("i = %d, addr of i = %p\n", i, &i);
    printf("j = %d, addr of j = %p\n", j, &j);
    printf("x = %d, addr of x = %p\n", x, &x);
    printf("y = %d, addr of y = %p\n", y, &y);
    printf("z = %d, in register, no addr\n", z);

    return 0;
}

int main(int argc, char *argv[])
{
    printf("argc = %d, addr of argc = %p\n", argc, &argc);
    printf("argv = %s, addr of argv = %p\n", argv[0], argv);
    printf("m = %d, addr of m = %p\n", m, &m);
    printf("n = %d, addr of n = %p\n", n, &n);
    printf("a = %d, addr of a = %p\n", a, &a);
    printf("b = %d, addr of b = %p\n", b, &b);

    hello();

    return 0;
}

$ gcc -m32 -o hello hello.c
$ ./hello
argc = 1, addr of argc = 0xffd91f60
argv = ./hello, addr of argv = 0xffd91ff4
m = 0, addr of m = 0x804a030
n = 1000, addr of n = 0x804a020
a = 0, addr of a = 0x804a038
b = 10000, addr of b = 0x804a024
i = 0, addr of i = 0x804a034
j = 10, addr of j = 0x804a028
x = -143124200, addr of x = 0xffd91f24
y = 100, addr of y = 0xffd91f28
z = 33, in register, no addr

The binary itself confirms this:

$ readelf -S hello | grep 804a | tail -2
  [24] .data PROGBITS 0804a018 001018 000014 00  WA 0 0 4
  [25] .bss  NOBITS   0804a02c 00102c 000010 00  WA 0 0 4

$ readelf -s hello |  egrep " m$| n$| a$| b$| i| j| x$| y$"
    36: 0804a030     4 OBJECT  LOCAL  DEFAULT   25 m
    37: 0804a020     4 OBJECT  LOCAL  DEFAULT   24 n
    39: 0804a034     4 OBJECT  LOCAL  DEFAULT   25 i.2021
    40: 0804a028     4 OBJECT  LOCAL  DEFAULT   24 j.2022
    54: 0804a024     4 OBJECT  GLOBAL DEFAULT   24 b
    67: 0804a038     4 OBJECT  GLOBAL DEFAULT   25 a

A few more notes:

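1. A variable declared with register is one the compiler is asked to keep in a register. C does not allow taking the address of such a variable, so there is no memory address to print. In this build the variable really does live in a register rather than in memory.

This can be confirmed in the generated assembly: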

$ gcc -m32 -S -o hello.s hello.c
$ grep 33 hello.s
  movl  $33, %ebx

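2. Variables declared static inside a function (i and j) get a suffix appended to their names in the symbol table. This lets several functions use the same variable name, since each such variable is visible only inside its own function (across repeated calls).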

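3. Non-static variables inside a function, as well as function arguments, go through the stack in this 32-bit build. They are visible only within the function (arguments are shared between caller and callee) and not from outside, so they do not appear in the symbol table.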

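4. On argument passing: if the calling convention is changed explicitly, for example by declaring the function with __attribute__((fastcall)), some of the arguments are passed in registers instead.

main is an exception, though. Its caller (__libc_start_main) passes arguments on the stack by default, so changing main's calling convention would leave it reading the wrong data.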

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/langs/607503.html
