Questions tagged [floating-point]

Floating point numbers are approximations of real numbers that can represent larger ranges than integers using the same amount of memory, at the cost of lower precision. If your question is about small arithmetic errors (e.g. why does 0.2 + 0.1 equal 0.30000000000000004 in double precision?) or decimal conversion errors, please read the "info" page linked below before posting.

Many questions under this tag concern small inaccuracies in floating point arithmetic. To use the example from the excerpt, 0.2 + 0.1 evaluates to 0.30000000000000004 in double precision instead of the expected 0.3. Errors like these are caused by the way floating point numbers are represented in computer memory.
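As a minimal sketch of the effect described above, the following Python snippet reproduces the error. It assumes Python's float is an IEEE 754 double, which holds on mainstream platforms; the variable name total is only illustrative.

```python
import math
from decimal import Decimal

total = 0.2 + 0.1

print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: the two doubles differ in the last bit
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead

# Decimal(float) converts the stored binary value exactly, digit for digit,
# which shows that the literal 0.1 is already slightly off before any arithmetic.
print(Decimal(0.1))
```

Comparing with a tolerance, as math.isclose does, is usually a better fit than == for values that come out of floating point arithmetic.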

Integers are stored as exact values of the numbers they represent. Floating point numbers are stored as a sign together with two values: a significand and an exponent. Because both fields have a fixed number of bits, only finitely many values can be represented, so most real numbers have no exactly matching significand-exponent pair and must be rounded to a nearby representable value. As a result, some approximation and therefore inaccuracy is unavoidable.
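The stored fields can be inspected directly. The sketch below assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits), which is what Python's float uses on mainstream platforms; decompose is a hypothetical helper written for this illustration, not a standard library function.

```python
import struct

def decompose(x: float):
    """Return the raw (sign, biased exponent, fraction) fields of a binary64 value."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit exponent, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 explicitly stored significand bits
    return sign, exponent, fraction

sign, exponent, fraction = decompose(0.1)
print(sign, exponent, hex(fraction))       # 0 1019 0x999999999999a

# For normal numbers: value = (-1)**sign * (1 + fraction / 2**52) * 2**(exponent - 1023)
rebuilt = (-1) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)
print(rebuilt == 0.1)                      # True
```

The repeating 9s in the fraction reflect the fact that 0.1 has an infinite binary expansion that had to be cut off and rounded.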

Two commonly cited introductory resources on floating point math are "What Every Computer Scientist Should Know About Floating-Point Arithmetic" and The Floating-Point Guide (floating-point-gui.de).

FAQs:

Why 0.1 does not exist in floating point

Floating Point Math at https://0.30000000000000004.com/

Related tags:

ieee-754 (most used standard for floating-point computation)
- half-precision-float (16-bit float)
- single-precision (32-bit float)
- double-precision (64-bit float)
- extended-precision (usually 80-bit float)
- quadruple-precision (128-bit float)
types in c and c++
- double
- long-double
aspects of floating point numbers and computations

Programming languages where all numbers are double-precision (64-bit) floats:

javascript (see Number.MAX_SAFE_INTEGER on MDN and What is JavaScript's highest integer value that a Number can go to without losing precision?)
awk (see Expressions in awk in POSIX)
lua (up to 5.2 only; 5.3 introduced integers, see Changes in the Language in the Lua 5.3 manual)
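In languages like these, integers are only exact up to a limit set by the 53-bit significand of a double. The sketch below illustrates that limit in Python, whose float is assumed to be the same IEEE 754 double; the constant MAX_SAFE_INTEGER is defined here to mirror JavaScript's Number.MAX_SAFE_INTEGER and is not a Python built-in.

```python
# 2**53 - 1, the same value as JavaScript's Number.MAX_SAFE_INTEGER.
MAX_SAFE_INTEGER = 2**53 - 1

# Up to this limit, every integer converts to a double and back unchanged.
print(int(float(MAX_SAFE_INTEGER)) == MAX_SAFE_INTEGER)   # True

# Beyond it, neighbouring integers start to collapse onto the same double:
# 2**53 + 1 has no exact representation and rounds to 2**53.
print(float(2**53) == float(2**53 + 1))                   # True
print(int(float(2**53 + 1)))                              # 9007199254740992
```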

13427 questions

3 answers

How do I find the largest integer less than x?

If x is 2.3, then math.floor(x) returns 2.0, the largest integer smaller than or equal to x (as a float). How would I get i, the largest integer strictly smaller than x (as an integer)? The best I came up with is: i = int(math.ceil(x)-1) Is there a…

python math floating-point

asked Jan 03 '15 at 19:22

pheon

3 answers

PHP: number_format rounding

Hi, I've been having a problem with numbers being rounded to -0 instead of just 0. code: output: -0 expected output: 0 I've been looking for a solution but…

php floating-point rounding

asked Dec 24 '14 at 11:58

sa.lva.ge

4 answers

How do calculators work with precision?

I wonder how calculators work with precision. For example the value of sin(M_PI) is not exactly zero when computed in double precision: #include #include int main() { double x = sin(M_PI); printf("%.20f\n", x); //…

floating-point precision calculator

asked Apr 26 '10 at 13:12

zoul

2 answers

How does Double.isNaN() work?

The Sun JDK implementation looks like this: return v != v; Can anyone explain how that works?

java floating-point double nan

asked Apr 20 '10 at 21:59

whiskeysierra

2 answers

Maximum float value in php

Is there a way to programmatically retrieve the maximum float value in PHP, akin to FLT_MAX or std::numeric_limits< float >::max() in C / C++? I am using something like the following: $minimumCost = MAXIMUM_FLOAT_VALUE??; foreach ( $objects as…

php floating-point

asked Apr 16 '10 at 03:02

Alex Deem

5 answers

Floating point precision in Visual C++

Hi, I am trying to use the robust predicates for computational geometry from Jonathan Richard Shewchuk. I am not a programmer, so I am not even sure of what I am saying; I may be making some basic mistake. The point is the predicates should allow…

c++ visual-c++ floating-point math floating-accuracy

asked Apr 02 '10 at 08:16

user240092

5 answers

Infinity in MSVC++

I'm using MSVC++, and I want to use the special value INFINITY in my code. What's the byte pattern or constant to use in MSVC++ for infinity? Why does 1.0f/0.0f appear to have the value 0? #include #include int main() { float…

c++ visual-c++ floating-point infinity

asked Mar 29 '10 at 13:51

bobobobo

3 answers

What does strictfp do in Java?

I read the JVM specification for the strictfp modifier but still don't fully understand what it means. Can anyone enlighten me?

java floating-point

asked Mar 24 '10 at 09:00

Yuval Adam

3 answers

Are floating point operations in Delphi deterministic?

Are floating point operations in Delphi deterministic? I.e., will I get the same result from an identical floating point mathematical operation on the same executable compiled with the Delphi Win32 compiler as I would with the Win64 compiler, or the OS X…

delphi floating-point cross-platform precision deterministic

asked Jun 22 '14 at 14:08

LaKraven

3 answers

__builtin_round is not a constant expression

In G++, various builtin math functions are constexpr under certain conditions. For example, the following compiles: static constexpr double A = __builtin_sqrt(16.0); static constexpr double B = __builtin_pow(A, 2.0); They are not always constexpr…

c++ floating-point g++ constexpr built-in

asked May 03 '14 at 17:39

Ambroz Bizjak

3 answers

MySQL "greater than" condition sometimes returns row with equal value

I'm running into a baffling issue with a basic MySQL query. This is my table:

id | rating
1 | 1317.17
2 | 1280.59
3 | 995.12
4 | 973.88

Now, I'm attempting to find all rows where the rating column is larger than a certain value. If I try the…

mysql floating-point floating-accuracy floating-point-conversion

asked Apr 28 '14 at 12:55

sveti petar

6 answers

Convert float to string without sprintf()

I'm coding for a microcontroller-based application and I need to convert a float to a character string, but I do not want the heavy overhead associated with sprintf(). Is there any elegant way to do this? I don't need too much. I only need 2 digits…

c string memory floating-point printf

asked Apr 21 '14 at 05:13

audiFanatic

4 answers

Hex Representation of Floats in Haskell

I want to convert a Haskell Float to a String that contains the 32-bit hexadecimal representation of the float in standard IEEE format. I can't seem to find a package that will do this for me. Does anybody know of one? I've noticed that GHC.Float…

haskell floating-point hex ieee-754

asked Feb 15 '10 at 23:42

Jeremy

2 answers

PostgreSQL round(v numeric, s int)

Which method does Postgres round(v numeric, s int) use? Round half up, round half down, round half away from zero, round half towards zero, round half to even, or round half to odd? I'm looking for a documentation reference.

postgresql floating-point rounding

asked Mar 24 '14 at 15:29

mpapec

4 answers

Conditional tests in primality by trial division

My question is about the conditional test in trial division. There seems to be some debate on what conditional test to employ. Let's look at the code for this from RosettaCode. int is_prime(unsigned int n) { unsigned int p; if (!(n & 1)…

c math floating-point primes

asked Mar 21 '14 at 10:47

Z boson
