'ps' can hang for all sorts of reasons. If you have a process whose executable is on a hung NFS mount, 'ps' will stat the target of the /proc/<pid>/exe link instead of the link itself and wait forever for stat to return. Solution? Read /proc/<pid>/status for the program name and lstat the exe link (there's really no reason to stat it in the first place).
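
A rough sketch of that approach in Python, for illustration only (this is not how 'ps' itself is written). The function names and the `proc_root` parameter are made up here. The idea is that reading the status file and calling lstat/readlink on the link should only touch procfs, never the NFS-backed target.

```python
import os

def proc_name(pid, proc_root="/proc"):
    """Return the program name from <proc_root>/<pid>/status, or None."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("Name:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        # Process is gone, or we are not allowed to read it.
        return None
    return None

def exe_link(pid, proc_root="/proc"):
    """Return where the exe link points, without following it."""
    path = os.path.join(proc_root, str(pid), "exe")
    try:
        os.lstat(path)            # examines the link itself, not its target
        return os.readlink(path)  # reads the link text, still no stat of the target
    except OSError:
        return None

if __name__ == "__main__":
    me = os.getpid()
    print(proc_name(me), exe_link(me))
```

Both functions return None rather than raising when the process has exited or the link is unreadable, since processes come and go while you walk /proc.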
Yes, I was a little surprised that this article is "news". I was also amused that the author references a previous article that apparently "made a big splash" about fork() failing, when... duh. Any competent C programmer or sysadmin should know both of these things.
Why is that? Doesn't it make more sense to have a deadline of, say, 10 seconds after which the stat/read/etc will simply fail? I remember trying to fix this for broken NFS mounts and not succeeding.
Basically, in order to maintain the integrity of the filesystem state, NFS (with the default 'hard' mount behavior) assumes that an unresponsive server is only temporarily unavailable, so the client generally waits forever for it to respond. If the kernel failed the operation instead, the calling program might act on that error in a way that negatively affects the state of the filesystem.
Of course there's no reason for 'ps' not to build in its own timeout for I/O. It could cause premature failures on loaded boxes, but it wouldn't hurt anything.
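
One way a tool could give itself such a timeout is to do the risky call in a throwaway child process and stop waiting after a deadline. This is a minimal Python sketch under that assumption, not anything 'ps' actually does; `run_with_timeout` is a hypothetical name and the 10-second default is just the figure floated above. The child may stay stuck in the kernel, but the parent gets to move on.

```python
import os
import select
import signal

def run_with_timeout(op, path, timeout=10.0):
    """Run op(path) in a forked child; return "ok", "err", or "timeout"."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: attempt the possibly-blocking call and report back.
        os.close(r)
        try:
            op(path)
            os.write(w, b"ok")
        except OSError:
            os.write(w, b"err")
        finally:
            os._exit(0)  # never fall back into the parent's code
    # Parent: wait for the child's answer, but only until the deadline.
    os.close(w)
    try:
        ready, _, _ = select.select([r], [], [], timeout)
        if not ready:
            # The kill may not take effect until the blocked call returns.
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, os.WNOHANG)  # best-effort reap, never blocks
            return "timeout"
        status = os.read(r, 16).decode()
        os.waitpid(pid, 0)
        return status or "err"
    finally:
        os.close(r)

if __name__ == "__main__":
    print(run_with_timeout(os.stat, "/"))
```

A fork per call is expensive, so a real tool would probably only do this for paths it has reason to distrust. The trade-off is the one already mentioned: on a loaded box a healthy call can miss the deadline and get reported as a timeout.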