
I love Brendan's article about load averages, but I feel like this one is missing the mark.

A core, or an execution unit stalling within a core, still counts as busy; the CPU can't process something else there. The utilization metric is correct. The question of whether a CPU could be used more efficiently is in the domain of optimization. It might be executing NOPs and never stalling. It could be using an O(N^3) algorithm instead of O(N).



With out-of-order execution, speculation, and SMT, it's hard to say whether a stalled instruction means the CPU can't process something else; CPUs are complex, parallel streams of processing, and trying to think of them in linear terms necessarily misses some of that complexity.


Another factor (also mentioned in the article) is dynamic frequency scaling. Is a core at 100% when it's running flat-out at its nominal frequency, or when it has boosted? The boost clock generally depends on thermals, silicon quality, and maybe time, so what do you call 100% in that case? If you take nominal as 100%, then sometimes you're going to be at, say, 120%.


The most obvious answer: 100% is what the manufacturer claims it is capable of handling continuously under the worst supported conditions.

The more difficult question is how to communicate that 'full use' value, as well as the current use (possibly greater than full), to software that calculates a usage estimate from the various interfaces that already exist. Or whether yet another interface (standard, if thinking of that XKCD comic) is needed.


I prefer the NASA approach: 100% is whatever is defined in the spec sheet, and any improvements above that are measured in percentages above 100.

As an example, the SSME was nominally operated at 104.5%, and the newer expendable RS-25Es nominally operate at 111%.[1]

So basically: 100% is the CPU running at base frequency, and anything higher (e.g. turbo/boost, overclocking) should result in percentages above 100. That would be a lot more meaningful than whatever "100% CPU load" means today.

[1]: https://en.wikipedia.org/wiki/RS-25#Engine_throttle/output
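
As a rough sketch of what such a "percent of base frequency" view could look like on Linux, the cpufreq sysfs files can be compared against a base clock. This is only an illustration under stated assumptions: base_frequency exists only with some drivers (e.g. intel_pstate), and the fallback to cpuinfo_max_freq is imperfect because that file reports the maximum boost clock on many systems.

    #!/usr/bin/env python3
    # Sketch: show each CPU's current frequency as a percentage of a base
    # frequency, so a boosted core reads as >100%. Paths are Linux cpufreq
    # sysfs; base_frequency is driver-dependent (assumption), with an
    # imperfect fallback to cpuinfo_max_freq when it's absent.
    from pathlib import Path

    def read_khz(path):
        try:
            return int(Path(path).read_text())
        except (FileNotFoundError, ValueError):
            return None

    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        cur = read_khz(cpu / "cpufreq/scaling_cur_freq")
        base = (read_khz(cpu / "cpufreq/base_frequency")
                or read_khz(cpu / "cpufreq/cpuinfo_max_freq"))
        if cur and base:
            print(f"{cpu.name}: {cur / base * 100:.0f}% of base ({cur // 1000} MHz)")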


That's mostly what I stated, though with adjustable-frequency products like CPUs and GPUs, 'what is the base frequency' is itself a question. That's why I carefully phrased it around the manufacturer's claimed continuous operation under the worst supported environmental spec. (Though implicitly with adequate cooling in place, not obviously broken cases like a CPU with its heat sink shaken off.)


If it could process something else, it would. If it doesn't, it means it can't. So, what are you trying to say?


In the case of SMT (aka hyperthreading), you'd only know whether the core can process something else by profiling a different thread running on the same core.
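
As a partial illustration on Linux, you can at least see which logical CPUs share a physical core and how busy each sibling was, even though that still says nothing about contention for execution units inside the core. A minimal sketch, assuming the usual sysfs/procfs layout (thread_siblings_list and the classic /proc/stat field order):

    #!/usr/bin/env python3
    # Sketch: group logical CPUs into physical cores via sysfs topology and
    # show how busy each SMT sibling was over a one-second window, using
    # /proc/stat tick counters. This is scheduler-level busyness only, not
    # execution-unit contention.
    import time
    from pathlib import Path

    def busy_times():
        busy = {}
        for line in Path("/proc/stat").read_text().splitlines():
            if line.startswith("cpu") and line[3].isdigit():
                name, *vals = line.split()
                vals = list(map(int, vals))
                idle = vals[3] + vals[4]          # idle + iowait ticks
                busy[name] = (sum(vals) - idle, sum(vals))
        return busy

    cores = {}
    for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        siblings = (cpu / "topology/thread_siblings_list").read_text().strip()
        cores.setdefault(siblings, []).append(cpu.name)

    t0 = busy_times(); time.sleep(1); t1 = busy_times()
    for siblings, cpus in sorted(cores.items()):
        parts = []
        for c in sorted(cpus):
            if c not in t0 or c not in t1:
                continue
            total = max(1, t1[c][1] - t0[c][1])
            parts.append(f"{c} {100 * (t1[c][0] - t0[c][0]) / total:.0f}%")
        print(f"core {{{siblings}}}: " + ", ".join(parts))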


This is the wrong approach.

What users want from these metrics is feedback about their hardware's performance. It should absolutely reflect issues related to memory latency. This is not about going faster; it's about making good use of the resource you have.

My typical use of similar metrics is iostat: a tool that shows various statistics about how the system is doing I/O to block devices. Among other things, it shows CPU utilization (its avg-cpu report breaks out %iowait, the time the CPU sat idle while waiting for outstanding I/O, alongside the usual user/system/idle figures). When looking at the output of this tool, I don't use CPU utilization to directly judge speed (it has read / write requests per second for that); this aspect tells me whether I'm utilizing the system's capacity to do I/O to its full extent (and I don't care if I may be writing improperly aligned blocks causing write amplification, or not merging smaller requests -- I will use different tools for that).
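
For what it's worth, the CPU numbers in iostat (and in top) come from the same per-state tick counters the kernel exposes in /proc/stat, so "utilization" there literally means "ticks that didn't land in the idle state". A minimal sketch of that accounting, assuming the classic Linux field order (user nice system idle iowait irq softirq steal):

    #!/usr/bin/env python3
    # Sketch: compute an iostat-style avg-cpu summary from two samples of the
    # aggregate "cpu" row in /proc/stat. Nothing here knows about pipeline
    # stalls, block alignment, or write amplification -- it is purely where
    # the CPU's time ticks were accounted.
    import time

    def sample():
        with open("/proc/stat") as f:
            return list(map(int, f.readline().split()[1:9]))

    a = sample(); time.sleep(1); b = sample()
    user, nice, system, idle, iowait, irq, softirq, steal = (y - x for x, y in zip(a, b))
    total = max(1, user + nice + system + idle + iowait + irq + softirq + steal)
    busy = total - idle - iowait
    print(f"%busy {100*busy/total:5.1f}   %iowait {100*iowait/total:5.1f}   "
          f"%idle {100*idle/total:5.1f}")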

The problem is that CPU utilization as displayed by e.g. top and our intuitive understanding of what it means to do work on a CPU are different. But tools that display utilization go for the metric that is easy to obtain rather than trying to match our intuition or be a better source of actionable information.

We want utilization to count progress along the code's instructions, because that's where we'd intuitively draw the line between hardware utilization and software issues. Instead, we get a metric that never under-estimates utilization (busy time is an upper bound on progress) but is usually wrong.


I disagree. The general user has no control over the code executing; it's an application written by someone else. When that application is utilizing a core, it's utilizing a core, and that is what this metric is (correctly) telling us. If you're in the business of writing software and trying to squeeze the most out of a core, then you use different tools.


These tools aren't for the "general user". They are for system programmers or system administrators.

> When that application is utilizing a core,

A core of what? A real CPU? A virtual CPU? Do we count Hyper-Threading(TM)?

You are just repeating a term that you didn't define -- "utilization". I did define it, in a way that seems plausible to me given how people usually understand it intuitively. You just keep throwing this word around, but you don't even care to explain what you mean.


We start with a physical core. Virtual cores have "virtual" utilization, and similarly hyperthreaded cores (hyperthreading being a bit of a marketing term that isn't always useful in the real world). Naturally, if you want to understand what a VM is doing, you also need to look at the hypervisor. If you want to dive into exactly what's going on with hyperthreaded cores, it can be harder, given that you don't have perfect visibility.

A physical core can either be idle or executing instructions. The portion of time it spends executing instructions is when it's utilized. I think this is a pretty clear and meaningful definition, and it's been used for decades.

A system admin running Outlook on a server is not going to be able to do anything about a pipeline stall in Outlook on some particular CPU/memory/motherboard. From their perspective, when utilization is 100%, Outlook is CPU-bound and can't do more work. And that's why we have this metric. A stall, an unused execution unit, an inefficient sequence of instructions, an inefficient algorithm, and many other things all cause the actual work you're getting out of the core to be less than what you could get if you rewrote the program. That is not what CPU utilization % means. If there are power-management or thermal considerations, that's another thing you need to look at to get a complete picture.

Now Outlook might be I/O-bound, which is a different problem, for which we look at different metrics. By the way, the I/O metrics reported by various tools are also imperfect: whether the I/Os are sequential or random, the block size, and the mix of reads and writes all have their own peculiar performance characteristics. Those are again of interest to people optimizing I/O, but not generally something that users of applications can do much about.

EDIT: It feels like you are looking for something that tells you, as a programmer, how much more you can squeeze out of your CPU. There's no such metric. It's up to you to use tools like profilers, your understanding of the architecture, and your imagination to figure that out. The utilization metric is super useful. I use it a lot; I've used it for years. Do I need to understand all the other factors that influence it? Sure do. Is it something I'd use instead of profiling? No.
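
To make the "busy is not the same as effective" point concrete, this is the kind of answer a profiler gives that the utilization number can't: instructions per cycle for a fully busy workload. A rough sketch wrapping perf stat (assumes Linux with perf installed, permission to read the cycles/instructions events, and a workload of your choosing; the output parsing is deliberately loose and may need adjusting for your perf version):

    #!/usr/bin/env python3
    # Sketch: run a command under `perf stat -x,` (CSV-ish output on stderr)
    # and report instructions per cycle. A core pegged at "100% utilization"
    # with a low IPC is busy but stalling; utilization alone can't show that.
    import subprocess, sys

    cmd = sys.argv[1:] or ["sleep", "1"]   # placeholder workload (assumption)
    res = subprocess.run(["perf", "stat", "-x,", "-e", "cycles,instructions"] + cmd,
                         capture_output=True, text=True)
    counts = {}
    for line in res.stderr.splitlines():
        fields = line.split(",")
        if len(fields) > 3 and fields[0].replace(".", "").isdigit():
            counts[fields[2]] = float(fields[0])
    if "cycles" in counts and "instructions" in counts:
        print(f"IPC: {counts['instructions'] / counts['cycles']:.2f}")
    else:
        print(res.stderr)   # fall back to raw perf output if parsing missed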



