Hacker News | ato's comments

This looks like it:

https://bugzilla.redhat.com/show_bug.cgi?id=1102343

I guess they applied that change, which was obviously written against a very different init script where the variable is actually defined, got QA to test it, and immediately backed it out.


Oh, good call. Yeah, you found it.


The index for Wayback is a massive sorted text file (called a CDX) containing a line for each URL and timestamp. For very large installations this index is sharded across multiple servers and queried in parallel. The lookups are done using plain old binary search.

http://archive.org/web/researcher/cdx_file_format.php
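To make the lookup concrete, here is a minimal sketch of a binary search over a sorted text index, done directly on byte offsets in the file. It is not Wayback's actual code. The record layout ("urlkey timestamp offset filename"), the sample records, and the names `line_after` and `lookup` are all assumptions made for illustration; real CDX lines carry more fields, and the sketch assumes the file is sorted bytewise.

```python
import io
from typing import BinaryIO, Optional


def line_after(f: BinaryIO, pos: int) -> bytes:
    """Return the first full line starting after byte `pos` (the first line if pos == 0)."""
    f.seek(pos)
    if pos > 0:
        f.readline()  # discard the remainder of the line containing `pos`
    return f.readline()  # b"" at end of file


def lookup(f: BinaryIO, urlkey: bytes) -> Optional[bytes]:
    """Binary-search a bytewise-sorted index for the first record of `urlkey`."""
    key = urlkey + b" "
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Find the smallest offset whose following line is >= key (or is end of file).
    while lo < hi:
        mid = (lo + hi) // 2
        line = line_after(f, mid)
        if not line or line >= key:
            hi = mid
        else:
            lo = mid + 1
    line = line_after(f, lo)
    return line.rstrip(b"\n") if line.startswith(key) else None


# Hypothetical, simplified records: "urlkey timestamp offset filename".
RECORDS = sorted([
    b"com,example)/ 20080101000000 0 a.warc.gz\n",
    b"com,example)/ 20090101000000 512 a.warc.gz\n",
    b"com,example)/about 20080202000000 1024 b.warc.gz\n",
    b"org,archive)/ 20070101000000 2048 c.warc.gz\n",
])
index = io.BytesIO(b"".join(RECORDS))

print(lookup(index, b"com,example)/about"))
```

Each probe costs one seek and at most two line reads, so a lookup touches only a logarithmic number of lines however large the file is. A sharded setup would presumably run the same search on each shard and merge the results.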

Each CDX record maps a URL-timestamp pair to a byte offset into an ARC or WARC file. ARC and WARC files are essentially just gzipped HTTP responses concatenated together:

http://archive.org/web/researcher/ArcFileFormat.php

http://www.digitalpreservation.gov/formats/fdd/fdd000236.sht...
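As a rough illustration of why a byte offset is enough, the sketch below seeks to an offset in a file of concatenated gzip members and decompresses exactly one member. It is an assumption-laden toy, not the Wayback implementation: the function name `read_member`, the sample responses, and the in-memory "archive" are made up, and the ARC/WARC record headers that precede each response in a real file are ignored.

```python
import gzip
import io
import zlib
from typing import BinaryIO


def read_member(f: BinaryIO, offset: int, chunk_size: int = 4096) -> bytes:
    """Decompress the single gzip member that starts at `offset`."""
    f.seek(offset)
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parts = []
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise EOFError("truncated gzip member")
        parts.append(decomp.decompress(chunk))
    # Bytes belonging to later members end up in decomp.unused_data and are ignored.
    return b"".join(parts)


# Build a toy archive: each response is gzipped on its own, then concatenated.
responses = [
    b"HTTP/1.1 200 OK\r\n\r\nfirst page",
    b"HTTP/1.1 200 OK\r\n\r\nsecond page",
]
archive_bytes = b""
offsets = []
for body in responses:
    offsets.append(len(archive_bytes))  # this is what the index would store
    archive_bytes += gzip.compress(body)

archive = io.BytesIO(archive_bytes)
print(read_member(archive, offsets[1]))
```

Because every record is an independent gzip member, nothing before the offset has to be read or decompressed to get at a single capture.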

The document is retrieved and decompressed, its URLs are rewritten, the navigation banner JavaScript is injected, and the result is sent to the client.

The code is here: https://github.com/internetarchive/wayback


How do you get hold of the list of URLs?


Indeed, perhaps that would be a better way to model the problem. I considered it but perhaps rejected it incorrectly. By default Clojure uses a thread pool for agents, so it wouldn't be a new thread for each request; the overhead is just a queue push, which is not that great. It would mean serializing all the updates, but you could use fine-grained agents like I'm using fine-grained atoms.


Thanks. Yeah, fast anything code tends to be not very pretty to look at and I'm not a very experienced lisper. But fortunately in most real-world programs the part that is performance critical tends to be very small compared to the rest of the program.


Oh, I didn't mean it as a criticism -- I'm afraid it has to be that way. My Common Lisp code that has been optimized looks similar (sprinkled with type declarations in various places).


It's not really a 32-core machine. It's an 8-core UltraSPARC T1, each core supporting 4 hardware threads. Also, this is the sort of problem that C absolutely kills at. C can just mmap the file, run straight across the raw bytes, and take advantage of all kinds of cache tricks. For well-written C code the problem is basically I/O-bound, and I was surprised I could catch up to it at all with the JVM.


8 cores, with each core having 4 threads, will soon be the norm. Thinking ahead: imagine that in a phone!


"You could conceivably lose half your brain and live. Which means your brain could conceivably be split into two halves and each transplanted into different bodies."

Unfortunately this does not follow. In most cases, if you lose half your brain you would die. If by chance, after losing half your brain you continue to live, that means the physical half that is gone was the less essential part. I am not a neurologist but I'd guess it is very unlikely that if you lost the part that survived you could still live.

Regards,

Atakan Gurkan


Actually, people have had entire hemispheres of their brain removed and continued to survive, with other parts of the brain taking over for the jobs of the removed parts.

In addition, to treat epilepsy, brain surgeons sometimes cut the corpus callosum, the bundle of nerve fibers connecting the two hemispheres. Afterwards, the two halves of the body act almost like the bodies of two different people who just happen to be connected.

