j***y posts: 2074 | 1 On pages 193-194, the book discusses the following issue:
---
Signedness of char. In C and C++, it is not specified whether the char data type is signed or unsigned. This can lead to trouble when combining chars and ints, such as in code that calls the int-valued routine getchar(). If you say
? char c; /* should be int */
? c = getchar();
the value of c will be between 0 and 255 if char is unsigned, and between -128 and 127 if char is signed, for the almost universal configuration of 8-bit characters on a two's complement machine. This has implications if the character is to be used as an array subscript or if it is to be tested against EOF, which usually has value -1 in stdio.
For instance, we had developed this code in Section 6.1 after fixing a few boundary conditions in the original version. The comparison s[i] == EOF will always fail if char is unsigned:
? int i;
? char s[MAX];
?
? for (i = 0; i < MAX-1; i++)
?     if ((s[i] = getchar()) == '\n' || s[i] == EOF)
?         break;
? s[i] = '\0';
When getchar returns EOF, the value 255 (0xFF, the result of converting -1 to unsigned char) will be stored in s[i]. If s[i] is unsigned, this will remain 255 for the comparison with EOF, which will fail.
Even if char is signed, however, the code isn't correct. The comparison will succeed at EOF, but a valid input byte of 0xFF will look just like EOF and terminate the loop prematurely. So regardless of the sign of char, you must always store the return value of getchar in an int for comparison with EOF.
Here is how to write the loop portably:
int c, i;
char s[MAX];

for (i = 0; i < MAX-1; i++) {
    if ((c = getchar()) == '\n' || c == EOF)
        break;
    s[i] = c;
}
s[i] = '\0';
---
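The portable loop above can be packaged as a small function and tried on input that contains a real 0xFF byte. This is only a sketch: `read_line` is a hypothetical name that does not come from the book, and it reads from a `FILE *` with `getc` instead of `getchar` so that it can be exercised on any stream.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp into s, storing at most
   max-1 bytes plus a terminating '\0'. Returns the number of bytes stored. */
int read_line(FILE *fp, char *s, int max)
{
    int c, i;               /* c is int so EOF stays distinct from 0xFF */

    if (max <= 0)
        return 0;           /* no room even for the terminator */

    for (i = 0; i < max - 1; i++) {
        c = getc(fp);
        if (c == '\n' || c == EOF)
            break;
        s[i] = (char)c;     /* narrow only after the EOF test */
    }
    s[i] = '\0';
    return i;
}
```

Calling `read_line(stdin, buf, sizeof buf)` behaves like the book's loop with `MAX` equal to `sizeof buf`; a 0xFF byte in the input is stored like any other byte and does not end the line.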
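At first glance this seems reasonable, but if the machine uses sign extension when converting char to int (see page 44 of K&R's The C Programming Language), wouldn't there still be a problem?
Suppose the file really does contain a 0xFF byte. After getchar() reads it, the value has to be converted to int before it is assigned to c; with sign extension, wouldn't it still become -1? Then the loop would stop before reaching the end of the file.
Is my understanding correct? | m*********2 posts: 701 | 2 wow, good point.
yea, i think what the author is saying is: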
EOF == -1
0xFF == 255.
that's why you want to use int.
it's large enough to differentiate whether it's -1 or 255.
the short answer is:
getchar() returns int.
and c is int.
so, you are comparing integers, NOT char.
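A sketch of why sign extension does not come into play here. The C standard specifies that `fgetc` (and therefore `getchar`) obtains each byte as an `unsigned char` converted to `int`, so a 0xFF byte arrives as 255; sign extension only applies once a value has been stored in a signed `char`. Both function names below are hypothetical, and the second one assumes 8-bit chars and the usual wrap-around conversion.

```c
#include <stdio.h>

/* Models what getchar() does with a byte it has read:
   the byte is treated as unsigned char, then widened to int. */
int widen_like_getchar(unsigned char byte)
{
    return byte;            /* always in 0..UCHAR_MAX, never negative */
}

/* Models the buggy path: the int result is first squeezed into a
   signed char, and only then widened back to int (sign extension).
   Converting an out-of-range value to signed char is
   implementation-defined; common compilers wrap it modulo 256. */
int widen_through_signed_char(int value)
{
    signed char sc = (signed char)value;
    return sc;
}
```

With these, `widen_like_getchar(0xFF)` is 255 and therefore distinct from `EOF`, while `widen_through_signed_char(0xFF)` typically comes out as -1, which is the collision the book warns about when the result is stored in a `char`.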
| j***y posts: 2074 | 3
But this is not true when the system uses sign extension in casting char to int.
With sign extension, 0xFF is the same as -1, isn't it?
| m*********2 posts: 701 | 4 getchar() returns int.
| c****p posts: 6474 | 5 No......
getchar() returns an int, so EOF comes back as -1 and a 0xff byte comes back as 255.
So there is no problem.
| w*******s posts: 138 | 6 The book is right.
EOF is (int)-1
unsigned char c = EOF;
c is 0xff
c == EOF // false
(int)c == 255 // same whether or not sign extension is used
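The snippet above can be wrapped into two small functions to check it on a given machine. This is a sketch under the assumptions that `EOF` is -1 and that `unsigned char` is 8 bits wide; the function names are made up for illustration.

```c
#include <stdio.h>

/* Store EOF in an unsigned char, then compare it with EOF again.
   c is promoted to a non-negative int and EOF is negative, so the
   comparison is false and the function returns 0. */
int stored_eof_equals_eof(void)
{
    unsigned char c = (unsigned char)EOF;
    return c == EOF;
}

/* The value the stored byte has once it is widened back to int.
   unsigned char is zero-extended, so no sign extension occurs. */
int stored_eof_as_int(void)
{
    unsigned char c = (unsigned char)EOF;
    return (int)c;
}
```

Under the stated assumptions, `stored_eof_equals_eof()` returns 0 and `stored_eof_as_int()` returns 255, matching the two comment lines in the post.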
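| j***y posts: 2074 | 7 Thanks, everyone. I had unthinkingly assumed that getchar() returns a char. Silly of me.
By the way, can a text file contain a byte like 0xFF (somewhere other than the end of the file)? | m*********2 posts: 701 | 8 yea....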
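EOF is not a character stored in the file; it is a special int value that getchar() returns, and its exact value is platform-dependent.
so, it doesn't have to be -1 (the C standard only requires a negative int), and it is not the byte 0xFF
and, welcome to a programmer's life.
You are always f*cking working WITH other people's bugs (in design or
implementation)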
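On the question of whether a 0xFF byte can appear in the middle of a file: it can (in ISO 8859-1 text, for example, 0xFF encodes the letter ÿ), and end-of-file is a condition reported by the I/O functions, not a byte stored in the file. A minimal sketch, with the hypothetical helper `count_ff_bytes`, that reads a whole stream and counts such bytes:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes equal to 0xFF in fp, reading
   until getc() reports EOF. Returns -1 if the loop ended because of a
   read error instead of a real end of file. */
long count_ff_bytes(FILE *fp)
{
    long n = 0;
    int c;                  /* int, so 0xFF (255) and EOF stay distinct */

    while ((c = getc(fp)) != EOF)
        if (c == 0xFF)
            n++;

    return ferror(fp) ? -1 : n;
}
```

After the call, `feof(fp)` is nonzero if the stream really ended, which is how a caller can tell end-of-file apart from a read error.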
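| j***y posts: 2074 | 9 I used to fix bugs for a living, but since quitting my job I have been sitting at home for more than half a year, which is depressing. I'm brushing up on the fundamentals now. |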