Hacker News | jlgustafson's comments

Sorry to be late to this discussion. Please be aware that there are a number of variations of the original posit definition that preserve the elegant properties (2's complement negation, perfect reciprocals of the integer powers of 2, etc.) but correct the loss of accuracy at extreme magnitudes. By using 3 exponent bits instead of 2, and limiting the number of regime bits to a maximum of 6, you get a dynamic range of about 1e-15 to 1e15 that is independent of the precision, with a quire of only 256 bits. The decoding is MUCH simpler when the regime is limited... we're seeing about a 40% reduction in circuit area.

I call these "b-posits" for bounded posits. They are described in Chapters 6 and 13 of my latest book, Every Bit Counts: Posit Computing.


IEEE 754 is just the codification of the Intel 8087 coprocessor design that John Palmer and Bruce Ravenel came up with. They brought in William Kahan as a consultant, and Kahan disagreed with almost every aspect of their design (he wanted decimal representation, not binary, and 128-bit extended precision instead of 80-bit, and bitwise reproducibility, not 'better answers on Intel') but he lost every argument. Kahan's clout helped Intel's design become the IEEE Std 754, and John Palmer chortled over the fact that they'd foisted that on the world. I used to work for him, so this is first-hand info. IEEE 754 is not a mathematical design, and the exception conditions are a complete mess, which is why it takes almost 90 pages to describe the Standard. The Posit Standard (2022) is only 12 pages long.

The #1 issue in computer performance is The Memory Wall... it is orders of magnitude more expensive to move data between external DRAM and the processor than it is to do operations within the processor. The solution is to increase information-per-bit so that real numbers can be represented in 32-bit precision with sufficient accuracy. That more than doubles the performance over 64-bit floats since it allows more data to fit in cache at every level of the memory hierarchy.


The 754 spec has somewhat moved past the 8087 at this point (3 revisions later). A lot of things have been fixed, including the whole language around exceptions (which used to define "traps" - a very processor-specific idea rather than an arithmetic-centered one). I am hoping we can be free of (required) exceptions in 2028.

As I understand it, your other complaints tend to center around overflow to infinity and precise summation of vectors. For applications that really care about that precision, there are ways to do it in floating point without a quire register - sorting before summing is the naive approach, but look into ReproBLAS for some better algorithms.
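To make the "precise summation without a quire" point concrete, here is a minimal sketch of compensated summation (Neumaier's variant of Kahan's algorithm, a much simpler precursor to what ReproBLAS does), in Python:

```python
def naive_sum(xs):
    # Plain left-to-right accumulation: rounds after every addition.
    s = 0.0
    for x in xs:
        s += x
    return s

def neumaier_sum(xs):
    # Compensated summation: track the rounding error of each addition
    # in c and fold it back in at the end.
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x   # low-order bits of x were lost
        else:
            c += (x - t) + s   # low-order bits of s were lost
        s = t
    return s + c

xs = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(xs))     # 0.0 -- both 1.0s vanish into 1e100
print(neumaier_sum(xs))  # 2.0 -- matches math.fsum(xs)
```

This costs a few extra flops per element but no extra memory traffic, which is one reason float advocates argue a dedicated wide accumulator is not strictly necessary.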

Also, I can't help but wonder if the memory wall idea here is centered only around synthetic benchmarks like gigantic dot products. A lot of code leans heavily on caches these days, which make the energy cost of operations a lot lower, and pretty much everything short of massive dot products uses them. I imagine you would have to make a very nuanced argument about why a 1k fixed point sum is saving energy here. Even matmuls are pretty cache-efficient now.

Elsewhere in computing, we are actually generally moving away from tightly-packed structs in performance-sensitive code despite the memory retrieval cost, because they are just easier to deal with in both hardware and software, and locality picks up all the slack.


Lots of free information on posits at www.posithub.org.


The real hope is that 32-bit posits (with 512-bit quire for exact dot products and exact sums) can replace 64-bit floats where users hope 15-decimal accuracy in every variable means they don't have to learn numerical analysis. When you can do all your linear algebra to 8-decimal accuracy with 32-bit posits, the need for 64-bit representation starts to look expensive and unnecessary.

Also, please note that all traditional algorithms are wary of the disasters of overflow to infinity and underflow to zero, so they tend to manage the magnitudes of numbers to prevent that. Posits take advantage of that by decreasing relative error when the exponent scaling is not extreme. Standard 64-bit posits (2 exponent bits) have 60-bit significands, versus 53-bit significands for IEEE floats, for values between 1/16 and 16 in magnitude. And floats do not have anything like the quire, since an exact dot product accumulator for 64-bit IEEE floats has to be something like 4,664 bits wide (an ugly number) and has no provisions for infinities and NaN values.
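The quire's behavior (accumulate exactly, round once at the very end) can be mimicked in software with exact rational arithmetic. A hedged sketch, not a real posit/quire implementation:

```python
from fractions import Fraction

def naive_dot(xs, ys):
    # Float dot product: rounds after every multiply and every add.
    s = 0.0
    for x, y in zip(xs, ys):
        s += x * y
    return s

def quire_dot(xs, ys):
    # Emulate a quire: accumulate every product exactly,
    # then round only once when converting back.
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)
    return float(acc)

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))  # 0.0 -- the 1.0 is lost to rounding
print(quire_dot(xs, ys))  # 1.0 -- exact until the final conversion
```

In hardware the quire is a fixed-point accumulator of known width rather than arbitrary-precision rationals, but the single-rounding behavior is the same.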


Exactly. Floating-point numbers cannot represent such numbers without error, yet we still see people asserting that floats represent "the entire real number line." The first format to represent such numbers honestly was the original unum format, where the last bit of the fraction indicates if the number is exact or represents the open interval between exact numbers. Like saying pi is 3.14... means pi is between 3.14 and 3.15, a mathematically honest statement. The presence or absence of "..." as a bit in the number was the main idea behind unum arithmetic.
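The ubit idea fits in a few lines. A hypothetical decoder for illustration only, not the actual unum encoding:

```python
# Hypothetical sketch of the unum "ubit": when the last fraction bit is
# set, the value denotes the open interval up to the next exact value,
# the numeric equivalent of writing "3.14...".
def decode_ubit(exact_value, ulp, ubit):
    if ubit == 0:
        return (exact_value, exact_value)    # exact: a degenerate interval
    return (exact_value, exact_value + ulp)  # inexact: an open interval

lo, hi = decode_ubit(3.14, 0.01, 1)  # "pi is 3.14..." -> pi in (3.14, 3.15)
```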


"The End of Error: Unum Computing" is written for a popular audience, not fellow mathematicians. Only high school math is needed, and it's got plenty of humor and full-color illustrations and figures, in an attempt to make a very dry topic into something interesting.

A book on posit arithmetic is in the works.


The original Stanford talk on posits suggested that they generate an exception and not the equivalent of a NaN. A few months later, I changed my mind and the (unique) NaR bit pattern serves the same purpose as NaN does in floats. We have also learned that the best exponent size (es or eS) is 2 bits, independent of the precision of the posit. So there have been some tweaks, but the basic concept is unchanged.


I hope I'm not too late to the party to correct some things I see here. The big accomplishment of Kahan and IEEE 754 was to get companies to agree on where the sign, exponent, and fraction should go, so that data interchange finally became possible between different computer brands.

Kahan wanted decimal floats, not binary, and he wanted Extended Precision to be 128, not 80. I've had many hours of conversation with the man about how Intel railroaded that Standard to express the design decisions that had already been made for the i8087 coprocessor. John Palmer, who I also worked with for years, was proud of this, and told me "Whatever the i8087 is, THAT is the IEEE Standard."

Posits have a single exception value, Not a Real (NaR), for everything that falls through the protections of C, Java, and the other modern languages: division by zero, the square root of a negative value, and so on. Kahan wanted the quadrillions of Not a Number (NaN) patterns to be used to encode the address of the offending instruction, pinpointing where in the program it happened, but support for this in programming languages never materialized. By around 2005, vendors noticed they could trap the exceptions and spend hundreds of clock cycles handling them with microcode or software, so the FLOPS claims only applied to normal floats, not subnormals or NaNs or infinities, etc. This is true today for all x86 and ARM processors, and SPARC for that matter. Only the POWER series from IBM can still claim to support IEEE 754 in hardware; hardware support for IEEE 754 is all but extinct.

There are over a hundred published papers comparing posits and floats, both for accuracy on applications and difficulty of implementation. LLNL and Oxford have definitively shown that posits are much more accurate than floats on a range of applications, so much so that a lower (power-of-two) precision can be used: 32-bit posits instead of 64-bit floats for shock hydrodynamics, and 16-bit posits instead of 32-bit floats for climate and weather prediction. For signal processing, 16-bit posits are about 10 dB more accurate (less noise) than 16-bit floats, which means they can perform lossless Fast Fourier Transforms (FFTs) on data from 12-bit A-to-D converters.

For the same precision, posit hardware add/subtract units appear slightly more expensive than float add/subtract, and multiplier units are slightly cheaper for posits than for floats. This echoes what was found comparing the speed of the Berkeley SoftFloat emulator with that of Cerlane Leong's SoftPosit emulator. Naive studies conclude that posits are more expensive because they first decode the posit into float subfields, apply time-honored float algorithms, then re-encode the subfields into posit format. That approach does not exploit the perfect mapping of posits to 2's complement integers.

Float comparison hardware is quite complicated and expensive because there are redundant representations like −0 and +0 that have to test as equal, and redundant NaN patterns that have to test as not equal even when their bit patterns are identical. Posit comparison hardware is unnecessary because posits test exactly the same way as 2's complement integers. NaR is the 2's complement integer that has no absolute value and cannot be negated, 1000...000 in binary. It is equal to itself and less than any real-valued posit.
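That comparison rule is easy to demonstrate. A sketch using hypothetical 8-bit posit patterns (the pattern widths and examples are mine, chosen for illustration), reusing ordinary signed-integer comparison:

```python
def as_signed(bits, n=8):
    # Reinterpret an n-bit pattern as a 2's-complement integer.
    return bits - (1 << n) if bits >= (1 << (n - 1)) else bits

def posit_less(a, b, n=8):
    # Posit bit patterns compare exactly like 2's-complement integers,
    # so no dedicated comparison hardware is needed.
    return as_signed(a, n) < as_signed(b, n)

NAR = 0b1000_0000                     # the one pattern with no absolute value
assert posit_less(NAR, 0b0000_0001)   # NaR < smallest positive posit
assert posit_less(NAR, 0b1111_1111)   # NaR < any negative real posit too
assert not posit_less(NAR, NAR)       # NaR equals itself
```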

The name is NaR because IEEE 754 incorrectly states that imaginary numbers are not numbers, and sqrt(–1) returns NaN. The Posit Standard is more careful to say that it is not a _real_.

The Posit Standard is up to Version 4.13 and close to full approval by its Working Group. Don't use any Version 3 or earlier. The one on posithub.org may be out of date. In Version 4, the number of eS bits was fixed at 2, greatly simplifying conversions between different precisions. Unlike floats, posit precision can be changed simply by appending bits or rounding them off, without any need to decode the fraction and the scaling. It's like changing a 16-bit integer to a 32-bit integer; it costs next to nothing, which really helps people right-size the precision they're using.
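A hedged sketch of that precision change on raw bit patterns (the real narrowing rule is round-to-nearest-even on the posit integer; this sketch rounds ties up for brevity, and the helper names are mine):

```python
def posit_widen(bits, n_from, n_to):
    # Widening: append zero bits on the right; the value is unchanged.
    return bits << (n_to - n_from)

def posit_narrow(bits, n_from, n_to):
    # Narrowing: round the low bits off. The standard specifies
    # round-to-nearest-even; this sketch rounds ties upward.
    shift = n_from - n_to
    return (bits + (1 << (shift - 1))) >> shift

# 0b0100_0000 encodes 1.0 in an 8-bit posit; widening and narrowing
# round-trip without decoding regime, exponent, or fraction.
p16 = posit_widen(0b0100_0000, 8, 16)
assert posit_narrow(p16, 16, 8) == 0b0100_0000
```

Contrast this with floats, where changing precision means re-biasing the exponent and re-aligning the fraction field.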


At the risk of a "flame war" where there are no winners, I would like to comment on some of the statements here before they get stale. If we avoid ad hominem attacks and stick to the math, the claims, and counterexamples, this can be a useful scientific discussion, and I very much welcome all the criticism of my ideas.

The irreproducibility of IEEE 754 float calculations is well documented... on Wikipedia, by William Kahan, and in an excellent paper by David Monniaux titled "The pitfalls of verifying floating-point computations". It is amazing that this is tolerated, but IEEE 754 has done a great deal to lower the expectations of computer users regarding mathematically correct behavior.

The posit approach is not merely a format but also the Draft Standard. Whereas floats can arbitrarily use "guard bits" to covertly do calculations with greater accuracy, the posit standard rules that out. Whereas the float standard recommends that math functions like log(x), cos(x) etc. be correctly rounded, the draft posit standard mandates that they be correctly rounded (or else use a function name that makes clear they are not the correctly-rounded function). By the draft posit standard, you cannot do anything not specified in the source code (like noticing that a multiply and an add could be fused into a multiply-add with deferred rounding, and so calling fused multiply-add without telling anyone). The source code completely defines what the result will be, bitwise, or it is not posit-compliant. It cannot depend on internal processor flags, optimization levels, or special hardware with guard bits to improve accuracy; this is what corrupted the IEEE 754 Standard and made it an irreproducible environment to this day.

The claim that posits are a "drop-in" replacement for floating point needs a lot of clarification, and this is unfortunately left out of much of the coverage of the idea. Clearly, if an algorithm assigns a hexadecimal value to encode a real value, that will need work to port from IEEE floats to posits. The math libraries need to be rewritten, as do scanf and printf in C and their equivalents in other languages. However, a number of researchers have found that they can substitute a posit representation for a float representation of the same size and get more accurate results with the same number of bits. I call that "plug-and-play" replacement; yes, there are a multitude of side effects that might need to be managed, but it's nothing like the jarring change, say, of moving from serial execution to parallel execution. It's really pretty easy, and it's easy to build tools that catch the 'gotcha' cases.

Some here have suggested the use of rational number representation, or said that there are redundant binary representations of the same numerical value. Unlike floats, posits do not have redundancy. I suspect someone is confused by the Morris approach to adjusting the tradeoff between fraction bits and exponent bits, which produces many redundant ways to express the same mathematical value.

Perfect additive associativity is available, as an option, with the quire, if needed. Multiplicative associativity is available, as an option, by calling fused multiply-multiply in the draft posit standard. Because quire operations appear to be both faster (free of renormalization and rounding) and more accurate (exact until converted back to posit form), I am puzzled why anyone would want to do things more slowly and with less accuracy.

Kulisch blazed the way with his exact dot product; unfortunately, any exact dot product based on IEEE floats requires an accumulator with far too many bits (like 4,224 for IEEE double precision), and one that is just a bit larger than a power-of-two size. The "quire" of posits is always a power of two, which is much more hardware-friendly: 128 bits for 16-bit posits, and 512 bits for 32-bit posits, the width of a cache line on x86 or of an AVX-512 register.
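In the draft standard, the quire width works out to n²/2 bits for an n-bit posit, which matches both figures above and is always a power of two when n is:

```python
def quire_bits(n):
    # Draft posit standard: the quire for n-bit posits is n*n/2 bits.
    return n * n // 2

assert quire_bits(16) == 128   # quire for 16-bit posits
assert quire_bits(32) == 512   # quire for 32-bit posits: one x86 cache line
```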

"A little knowledge is a dangerous thing." In evaluating posit arithmetic, please use more than what you see in a ycombinator blog. You might discover that there are several decades of careful decision-making behind the design of posit arithmetic. And unlike Kahan, I subject my ideas to critical review by the community and learn from their input. The 1985 IEEE format is grossly overdue for a change.


I want to add a few comments, as most of the discussion here concerned the hardware implementation and only a few comments pointed to possible applications. I work on weather and climate simulations, but my opinions should apply in general to CFD or PDE-type problems.

Yes, having redundant bit patterns is not great when designing a number format. However, even for Float16 (half precision), making use of the ~3% of bit patterns that are NaNs would be wise, but it is not going to be a game changer. Some others discussed pros and cons of negative zero and negative infinity: in my view, you want a bit pattern that tells you the answer you get is not real, but whether it's +/-Inf or some NaN is pretty much irrelevant. Using these bit patterns for something else sounds like a very reasonable approach to me. Furthermore, I've never come across a good reason for -0 in our applications.

When it comes to weather and climate models in HPC, I see the following potential for posits: similar to how BFloat16 is supported on TPUs, I could see Posit16 being supported by some specialised hardware like GPUs, FPGAs etc. I'm saying that because for us it's not important to have a whole operating system running in posits (although I probably wouldn't mind), but to have them for some performance-critical algorithms. Unfortunately, weather and climate models are far more complex than some dot products; we have to deal with a whole zoo of algorithms, so weather and climate models easily cover several million lines of code.

Now let's say we know our model spends 20% of the time in algorithm A, which only requires a certain (low) precision to be stable and to yield reasonable results. Then it would indeed be a big game changer if we could run this algorithm in, say, 16 bit. In exchanging precision for speed we would probably want to push things to the edge, i.e. if we can just about do it in 16 bit, then we should.

Now there are several 16-bit formats: Float16, BFloat16, Posit16, Posit16_2 (with 2 exp bits), and technically also Int16. Let's forget about the technical details of these formats and focus on where they considerably differ: what is the dynamic range, and where on the real axis do I get how much precision to represent numbers? Yes, from a computer science perspective the technical details also matter, but from our perspective most of it is pretty irrelevant; what actually matters are these two things, dynamic range and where the precision is. Because these two really determine whether your algorithm is going to crash, and whether you can use it operationally on your desktop computer or need a big fat $$$ supercomputer.

For PDE-type problems (that includes CFD and also weather and climate models), over the last year of my research I came to the following preliminary conclusions regarding dynamic range and precision with respect to the above-mentioned formats:

- Int16: Let's forget about it.

- Float16: The precision is okay, but rarely needed towards the edges of the dynamic range. Floatmin might work; however, floatmax at 65504.0 is easily a killer. Might work with a no-overflow rounding mode and smart rewriting of algorithms to avoid large numbers.

- BFloat16: For our applications, having only 7 significant bits is not enough; I didn't come across a single sophisticated algorithm that works with BFloat16.

- Posit16 (with 1 exp bit): Great, puts a lot of precision where it's needed but also allows for a reasonable dynamic range.

- Posit16 (with 2 exp bits): Probably even better; the sacrifice of a bit of precision in the middle is fine, and the wide dynamic range gives it the potential to also work with algorithms that are hard to squeeze into a smaller dynamic range.
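Those differences can be made concrete with rough numbers. A sketch (posit maxpos from useed**(n-2); significand bits counted near 1.0 including the hidden bit; posit figures are approximate, since posit precision tapers away from 1.0):

```python
def posit_maxpos(n, es):
    # Largest posit value: useed**(n-2), with useed = 2**(2**es).
    useed = 2 ** (2 ** es)
    return float(useed ** (n - 2))

formats = {
    # name: (largest finite value, significand bits near 1.0)
    "Float16":   (65504.0, 11),
    "BFloat16":  ((2 - 2**-7) * 2.0**127, 8),  # ~3.4e38: huge range, little precision
    "Posit16_1": (posit_maxpos(16, 1), 13),    # ~2.7e8:  narrow range, most precision
    "Posit16_2": (posit_maxpos(16, 2), 12),    # ~7.2e16: the middle ground
}
```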

In short, posits actually fit the numbers our algorithms produce much better. And this can indeed be the game changer: if a GPU supports posit arithmetic and we can run algorithm A on it in 16 bit: wonderful, contract sold! But if we couldn't with BFloat16 or Float16, then there is no future for 16 bit in our field.

I explain more about this in this paper: dx.doi.org/10.1145/3316279.3316281

And there are two talks which tell a similar story: https://www.youtube.com/watch?v=XazIx0cMVyg https://www.youtube.com/watch?v=wp7AYMWlPLw

or simply drop me an email if you have questions (I'm unlikely to respond here); you can find my address on my website: milank.de


Actually, posits handle NaNs already, by interrupting the calculation and doing whatever you have set up to handle the exception. What they do NOT do is represent Not-a-Number with a number. If a programmer is about to compute, say, x/y and in running the code it sometimes hits the case y = 0, then any competent programmer can write a conditional test to guard against that happening. It is not reasonable to ask computer hardware to magically continue to work somehow when it hits a bug in a program.

Think of it this way: posits have a signaling NaN but do not have a quiet NaN.

