I thought 32-bit addressing gave 4GB of addresses…is there some sort of flag that's taking one bit? Not trying to be a smartass, just curious about the discrepancy.
Does it really make sense for them to live in the heap rather than in the disk buffers?
I presume that you're not hitting a melt-down from CPU time, so I'd assume you'd get better performance by letting the system handle paging in data from the disk and figuring out which stuff needs to be in memory (buffers) and which can live out on disk.
Also one thing to watch out for when you make the 64-bit jump is that (internally pointer-heavy) applications in dynamic languages tend to use significantly more memory on 64-bit platforms. davidw talked about this some here:
It does give 4 GB of addressable space, but many things are memory mapped by the OS, and so on a 32-bit system you end up with anywhere from 2.8 to 3.5 GB of addressable memory.
So by 2GB he meant usable, not addressable? I understand that addressable RAM != 4GB due to graphics card memory, BIOS, etc. but 2GB is way less than you could get on a 32-bit server. I just wanted to understand if there's something I'm missing.
The OS sometimes reserves quite a chunk. For example, Windows XP really only has 2.25GB available. Putting more RAM in is pretty useless unless you switch to 64-bit.
No. The Windows and Linux kernels are hardwired to reserve a huge chunk of virtual addresses for kernel memory space. Windows can be toggled between reserving 2GB and 1GB (the latter of which Linux does by default). I assume this is so that a system memory address can be identified by testing a couple of bits, and the minimum size is presumably limited by the chunk of addresses eaten up by memory-mapped devices like video cards.
It's common for dynamic languages to embed typing information in pointers as an optimization. For example, CLISP uses at least 2 bits to distinguish between common types. That way fixnum numbers can be recognized and added without slow memory accesses.
The result is that you get fewer bits for the address. Hence less addressable memory.
Actually, no. These tag bits are usually stored in the lowest bits, which are zero for all pointers (you would be mad not to align your data structures to the four- or eight-byte boundaries your hardware uses for memory access). So you get the full width for pointers, but reduced width for your fixnums, because you have to set one of the least significant bits of the machine word to one to distinguish it from a pointer.
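To make the low-bit tagging idea concrete, here is a minimal sketch in C. All names here (value, TAG_MASK, make_fixnum and friends) are made up for illustration, and the one-bit layout is an assumption; this is not how CLISP or any particular runtime actually lays out its words.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical one-word value: either an aligned pointer or a tagged fixnum. */
typedef uintptr_t value;

#define TAG_MASK ((uintptr_t)1)   /* lowest bit: 1 = fixnum, 0 = pointer */

static int is_fixnum(value v) { return (v & TAG_MASK) != 0; }

/* Shift the integer up one bit and set the tag: n is stored as 2n + 1.
   The fixnum therefore has one bit less range than a machine word. */
static value make_fixnum(intptr_t n) { return ((uintptr_t)n << 1) | TAG_MASK; }

/* Undo the encoding: (2n + 1 - 1) / 2 = n. Assumes two's complement. */
static intptr_t fixnum_value(value v) { return ((intptr_t)v - 1) / 2; }

/* Pointers are stored unchanged, so they keep their full width. */
static value make_pointer(void *p) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* aligned data leaves the low bit clear */
    return bits;
}

static void *pointer_value(value v) { return (void *)v; }

/* Adding two fixnums touches no memory: (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1. */
static value fixnum_add(value a, value b) { return a + b - TAG_MASK; }
```

With this, fixnum_value(fixnum_add(make_fixnum(20), make_fixnum(22))) comes out as 42, and anything returned by malloc can go through make_pointer untouched because malloc hands back suitably aligned addresses.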
That you still can't use the full 4GB of a 32-bit address space is due to the fact that the OS needs some address space for itself. The details vary from OS to OS, and also depend on what the runtime of your language does with the addresses the OS allows it to use. So being able to use more than 2GB on a 32-bit architecture should not be taken for granted.
Not to mention that many languages written in C use unions to describe their primitive object type. The result is that the minimum number of bytes for storing an integer, for instance, is the minimum number of bytes that can store a value of the largest type.
For example, if you have a string type which keeps track of its length, then you might need 8 bytes: 4 for the pointer to the string of chars, 4 for the integer to keep count.
Here's a better example from tinyscheme:
struct cell {
    unsigned int _flag;
    union {
        struct {
            char *_svalue;
            int _length;
        } _string;
        num _number;
        port *_port;
        foreign_func _ff;
        struct {
            struct cell *_car;
            struct cell *_cdr;
        } _cons;
    } _object;
};
At a minimum, each object takes up max(sizeof(_string), sizeof(num), sizeof(port *), sizeof(foreign_func), sizeof(_cons)) bytes, plus the _flag word and any padding. And num is defined as follows:
typedef struct num {
    char is_fixnum;
    union {
        long ivalue;
        double rvalue;
    } value;
} num;
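If you want to check the sizes on your own machine, something like the following should do. It is only a sketch: the thread doesn't show tinyscheme's real port and foreign_func definitions, so the two typedefs below are stand-ins I made up, and largest_variant, cell_size and cons_payload_size are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

struct cell;
typedef struct port port;                              /* stand-in: opaque, only used via pointer */
typedef struct cell *(*foreign_func)(struct cell *);   /* stand-in signature */

typedef struct num {
    char is_fixnum;
    union { long ivalue; double rvalue; } value;
} num;

struct cell {
    unsigned int _flag;
    union {
        struct { char *_svalue; int _length; } _string;
        num _number;
        port *_port;
        foreign_func _ff;
        struct { struct cell *_car; struct cell *_cdr; } _cons;
    } _object;
};

/* Checked at compile time: a cell is never smaller than its num variant. */
static_assert(sizeof(struct cell) >= sizeof(num), "cell must hold a num");

/* Size in bytes of the widest variant stored in the union. */
static size_t largest_variant(void) {
    size_t sizes[] = {
        sizeof(((struct cell *)0)->_object._string),
        sizeof(num),
        sizeof(port *),
        sizeof(foreign_func),
        sizeof(((struct cell *)0)->_object._cons),
    };
    size_t max = 0;
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        if (sizes[i] > max)
            max = sizes[i];
    return max;
}

/* Full size of one cell, including _flag and any padding. */
static size_t cell_size(void) { return sizeof(struct cell); }

/* A cons payload is two pointers, so it doubles when pointers double. */
static size_t cons_payload_size(void) { return 2 * sizeof(struct cell *); }
```

Printing cell_size() and largest_variant() from a 32-bit and a 64-bit build of the same program shows how much of the growth comes from wider pointers and how much from padding; the exact numbers depend on the compiler and ABI, so I'm not quoting any here.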