No. The int to int * truncation does not happen, and the cast does not matter. What matters is that `int *p` is 64 bits wide on a 64-bit system, so the compiler just moves the value returned in RAX into wherever p lives. Nothing is truncated, even without the cast. Look at the generated assembly and you'll see what I mean.
Clang is too clever and gives malloc the correct implicit declaration even when you don't have an explicit prototype:
test.c:10:12: warning: implicitly declaring library function 'malloc' with type 'void *(unsigned long)'
[-Wimplicit-function-declaration]
I think that calling a function through a declaration that doesn't match its actual definition is undefined behavior, so this is a legal way for the compiler to resolve it, but not every compiler will do so. Older compilers tended to treat malloc as just another function and wouldn't do this.
I was able to replicate the older behavior by wrapping malloc in my own function. In one file:
int main()
{
    int* p;
    p = (int*)wrapped_malloc(sizeof(int));
    *p = 10;
    return 0;
}
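The other file isn't shown here. As a rough sketch, assuming the wrapper does nothing more than forward to malloc (the body below is a guess, only the name wrapped_malloc comes from the code above), it could look like this:

```c
#include <stdlib.h>

/* Assumed contents of the second file: a plain forwarding wrapper.
 * Because the file with main() never declares this function, an old-style
 * compiler treats the call there as returning int. */
void *wrapped_malloc(size_t size)
{
    return malloc(size);
}
```

Keeping it in its own translation unit means the compiler cannot see the real return type when it compiles main.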
Compiling this together with the file that defines wrapped_malloc results in a crash. The relevant bit of assembly is:
0000000100000f4f movslq %eax, %rdi
So it is indeed keeping only the lower 32 bits of the returned pointer: movslq sign-extends %eax into a 64-bit register, and whatever was in the upper half of %rax is lost.
The page looks pretty old (the IA-64 reference sure is dated, anyway) so I'd guess that it was referring to an older compiler that didn't have a special case for malloc like this.
Right, C compilers will automatically know it's part of libc. You didn't need to do the separate file thing; you could just have the wrapper and it will trigger the movslq, or a cltq.
The question asks about IA-64 (Itanium) and IA-32. You're talking about the RAX register which is x86-64.
If you call malloc without a prototype in scope, bad things can happen. Just because it happens to work out OK with the platform and compiler that you tested doesn't mean that it will work everywhere or that it will keep working in the future.
From the way he talks about it, it sounds as if he is referring to x86-64. Many people mistake IA-64 for x86-64, which is why I assumed that's what he meant. Not to mention that IA-32 refers to x86, which he clearly seems to misunderstand.
I'll have to check if it's still true today, but Windows/the VC++ CRT certainly used to have no qualms about handing out pointers to memory below the 4GByte mark. So if you have a problem like this, it can go undetected for quite some time...
(Don't know about Linux. 64-bit OS X binaries usually start with a 4GByte section at 0, so the bottom 4GBytes simply isn't available.)