I thought 32-bit addressing gave 4GB of addresses…is there some sort of flag tha...

pg · on March 12, 2009

There ends up being only 2GB of heap, which is where all the stories and comments live.

wheels · on March 12, 2009

Does it really make sense for them to live in the heap rather than in the disk buffers?

I presume that you're not hitting a melt-down from CPU time, so I'd assume you'd get better performance by letting the system handle paging in data from the disk and figuring out which stuff needs to be in memory (buffers) and which can live out on disk.

Also one thing to watch out for when you make the 64-bit jump is that (internally pointer-heavy) applications in dynamic languages tend to use significantly more memory on 64-bit platforms. davidw talked about this some here:

http://journal.dedasys.com/2008/11/24/slicehost-vs-linode
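
To make the pointer-width point concrete, here's a minimal sketch in C. The `pair` struct and `pairs_bytes` are made-up names for illustration, not taken from any particular runtime; the only claim is that a structure made entirely of pointers grows with the pointer size of the platform it's compiled for.

```c
#include <stddef.h>

/* Hypothetical pointer-only node, like a cons cell in a Lisp runtime. */
struct pair {
    struct pair *car;
    struct pair *cdr;
};

/* Bytes needed for n such nodes on the platform this is compiled for.
   With 4-byte pointers a node is typically 8 bytes; with 8-byte
   pointers it is typically 16, so the same data needs twice the RAM. */
size_t pairs_bytes(size_t n) {
    return n * sizeof(struct pair);
}
```

Compiling the same file as 32-bit and as 64-bit and comparing `pairs_bytes(1000000)` shows the difference directly.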

spydez · on March 12, 2009

It does give 4 GB of addressable space, but many things are memory mapped by the OS, and so on a 32-bit system you end up with anywhere from 2.8 to 3.5 GB of addressable memory.

timtrueman · on March 12, 2009

So by 2GB he meant usable, not addressable? I understand that addressable RAM != 4GB due to graphics card memory, BIOS, etc. but 2GB is way less than you could get on a 32-bit server. I just wanted to understand if there's something I'm missing.

gnaritas · on March 12, 2009

2 gig is all you're going to get for a single process anyway; that's some kind of limit if I recall correctly. This site is a single process.

protomyth · on March 12, 2009

The OS sometimes reserves quite a chunk. For example, Windows XP really only has 2.25GB available. Putting more RAM in is pretty useless unless you switch to 64-bit.

vinutheraj · on March 12, 2009

Can't this be changed, you know, by hacking the registry or something ?!

jcl · on March 12, 2009

No. The Windows and Linux kernels are hardwired to reserve a huge chunk of virtual addresses for kernel memory space. Windows can be toggled between reserving 2GB and 1GB (the latter of which Linux does by default). I assume this is so that a system memory address can be identified by testing a couple bits, and the minimum size is presumably limited by the chunk of addresses eaten up by memory-mapped devices like video cards.

Here's a page with some more details: http://news.ycombinator.com/item?id=452005
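
The "testing a couple bits" idea can be sketched roughly like this in C. This is only an illustration of the arithmetic, assuming a 32-bit virtual address space with the kernel at the top; the enum and function names are made up, and it is not how either kernel actually implements the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two splits discussed above. */
enum split { SPLIT_2G_2G, SPLIT_3G_1G };

/* First address reserved for the kernel under each split. */
static uint32_t kernel_base(enum split s) {
    return s == SPLIT_2G_2G ? 0x80000000u : 0xC0000000u;
}

/* With a 2G/2G split only the top bit matters; with a 3G/1G split
   the top two bits must both be set. */
bool is_kernel_address(uint32_t addr, enum split s) {
    if (s == SPLIT_2G_2G)
        return (addr >> 31) == 1u;
    return (addr >> 30) == 3u;
}

/* Bytes of virtual address space left over for a user process. */
uint64_t user_space_bytes(enum split s) {
    return (uint64_t)kernel_base(s);
}
```

Either way the reserved range is a fixed slice of the 4GB, which is why a single process never sees the whole thing.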

hapless · on March 12, 2009

There's a boot-time argument that will move the barrier to 3G instead of 2G.

jganetsk · on March 12, 2009

IIRC, Linux uses the high-bit to distinguish between userspace and kernel space.

apgwoz · on March 12, 2009

Dynamic languages end up using more memory, because extra information has to be stored about each item (e.g. type, tc info, etc.).

ynd · on March 12, 2009

Down-voters, he is also right.

It's common for dynamic languages to embed typing information in pointers as an optimization. For example, CLISP uses at least 2 bits to distinguish between common types. That way fixnum numbers can be recognized and added without slow memory accesses.

The result is that you get fewer bits for the address. Hence less addressable memory.

fhars · on March 12, 2009

Actually, no. These tag bits are usually stored in the lowest bits, which are zero for all pointers (you would be mad not to align your data structures to the four- or eight-byte boundaries your hardware uses for memory access). So you get the full width for pointers, but reduced width for your fixnums, because you have to set one of the least significant bits of the machine word to one to distinguish it from a pointer. You still can't use the full 4GB of a 32-bit address space because the OS needs some address space for itself; the details vary from OS to OS and with what the runtime of your language does with the addresses the OS allows it to use. So being able to use more than 2GB on a 32-bit architecture should not be taken for granted.
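
A rough sketch of that low-bit tagging scheme in C, assuming a single tag bit (1 = fixnum, 0 = pointer). All names here are invented for illustration; real implementations such as CLISP use their own layouts and more tag bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t value;        /* hypothetical tagged machine word */

#define FIXNUM_TAG ((value)1)   /* low bit 1 = fixnum, low bit 0 = pointer */

/* Shift the integer up one bit and set the tag: the fixnum loses one bit. */
value make_fixnum(intptr_t n) {
    return ((value)n << 1) | FIXNUM_TAG;
}

bool is_fixnum(value v) {
    return (v & FIXNUM_TAG) != 0;
}

/* Relies on arithmetic right shift of negative values, which is
   implementation-defined in C but is what mainstream compilers do. */
intptr_t fixnum_value(value v) {
    return (intptr_t)v >> 1;
}

/* Assumes p is at least 2-byte aligned, so its low bit is already zero
   and the pointer keeps its full width. */
value make_pointer(void *p) {
    return (value)p;
}

void *pointer_value(value v) {
    return (void *)v;
}

/* Adds two fixnums without touching memory:
   (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1 */
value fixnum_add(value a, value b) {
    return a + b - FIXNUM_TAG;
}
```

Note that the cost lands on the integers (one bit less range), not on the addresses.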

apgwoz · on March 12, 2009

While that's true for some things, that doesn't account for the memory overhead of a copying garbage collector.

apgwoz · on March 12, 2009

Not to mention that many languages written in C use unions to describe their primitive -type- [ed: object]. The result is that the minimum number of bytes for storing an integer for instance, is the minimum number of bytes that can store a value of the largest type.

For example, if you have a string type, which keeps track of its length, then you might need 8 bytes. 4 for the pointer to the string of chars, 4 for the integer to keep count.

Here's a better example from tinyscheme:

    struct cell {
      unsigned int _flag;
      union {
        struct {
          char   *_svalue;
          int   _length;
        } _string;
        num _number;
        port *_port;
        foreign_func _ff;
        struct {
          struct cell *_car;
          struct cell *_cdr;
        } _cons;
      } _object;
    };

As a minimum, each object takes up max(sizeof(_string), sizeof(num), sizeof(port *), sizeof(_cons), sizeof(foreign_func)) bytes for the union, plus the _flag field and any padding. And num is defined as follows:

    typedef struct num {
       char is_fixnum;
       union {
          long ivalue;
          double rvalue;
       } value;
    } num;
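
If you want to see the effect on your own machine, here's a cut-down sketch. It is a simplified stand-in for the tinyscheme definitions above, not the real ones: the port and foreign function members are dropped and the names are changed, so treat the exact sizes as illustrative only.

```c
#include <stddef.h>

/* Simplified stand-in for tinyscheme's num. */
struct demo_num {
    char is_fixnum;
    union {
        long ivalue;
        double rvalue;
    } value;
};

/* Simplified stand-in for tinyscheme's cell. */
struct demo_cell {
    unsigned int flag;
    union {
        struct { char *svalue; int length; } string;
        struct demo_num number;
        struct { struct demo_cell *car; struct demo_cell *cdr; } cons;
    } object;
};

/* Size of the union alone: as large as its largest member. */
size_t demo_payload_size(void) {
    return sizeof(((struct demo_cell *)0)->object);
}

/* Size of a whole cell: flag + union + padding, whatever it holds. */
size_t demo_cell_size(void) {
    return sizeof(struct demo_cell);
}
```

Printing `demo_cell_size()` next to `sizeof(long)` shows how much a boxed integer costs compared to a raw one, and building as 32-bit versus 64-bit shows how the pointer members move the total.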

apgwoz · on March 12, 2009

eeek. that should have said GC info (stupid iPod touch keyboard)