Is it appropriate to use off_t for non-byte offsets?

Question

Suppose I'm writing a function which takes a float array a[] and an offset into this array, and returns the element at that offset. Is it reasonable to use the signature

float foo(float* a, off_t offset);

for it? Or is off_t only relevant to offsets in bytes, rather than pointer arithmetic with arbitrary element sizes? That is, is it reasonable to write a[offset] when offset is of type off_t?

The GNU C Library Reference Manual says:

off_t
     This is a signed integer type used to represent file sizes.

but that doesn't tell me much.

My intuition is that the answer is "no", since the actual address used in a[offset] is a + sizeof(float) * offset when counted in bytes. So "sizeof(float) * offset" is the off_t-like quantity and sizeof(float) is a size_t; both carry 'dimensions' (bytes), whereas offset itself is just a count of elements.

Note: The offset might be negative.
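
To make the distinction concrete, here is a minimal sketch in C. The helper names `byte_offset_of_index` and `element_at_byte_offset` are hypothetical and not part of the question. It only illustrates that a[offset] counts in elements, while a byte offset has to be scaled by sizeof(float) by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts an element index into the byte offset
   that a[index] implies. The index may be negative. */
ptrdiff_t byte_offset_of_index(ptrdiff_t index)
{
    return index * (ptrdiff_t)sizeof(float);
}

/* Hypothetical helper: reads the float located byte_offset bytes away
   from a. memcpy is used so that alignment is never a concern. */
float element_at_byte_offset(const float *a, ptrdiff_t byte_offset)
{
    float value;
    memcpy(&value, (const unsigned char *)a + byte_offset, sizeof value);
    return value;
}
```

Under these assumptions, element_at_byte_offset(a, byte_offset_of_index(i)) reads the same element as a[i], provided the access stays inside the array.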

Pretty sure that accessing negative offsets of a pointer is considered evil. — zneak, Jul 15 '13 at 07:59
@zneak: Maybe, and I don't intend to, but `a[-1]` is perfectly valid code. — einpoklum, Jul 15 '13 at 07:59
@einpoklum Of course it compiles. That doesn't make it valid. If `a` is initialized to point to the start of a C style array, `a[-1]` is not valid, at least according to the standard. — James Kanze, Dec 04 '13 at 09:10

Basile Starynkevitch · Answer 1 · 2013-07-15T08:18:12.993

2

You could use size_t or ptrdiff_t as the type of an index (your second parameter is more an index inside a float array than an offset).

Your use is an index, not an offset. Notice that the standard offsetof macro is defined to return byte offsets!

In practice, you could even use int or unsigned, unless you believe your array could have billions of components.

You may want to #include <stdint.h> (or <cstdint> with a recent C++) and have explicitly sized types like int32_t for your indexes.

For source readability reasons, you might define

  typedef unsigned index_t;

and later use it, e.g.

  float foo(float a[], index_t i);

My opinion is that you should just use int as the type of your indexes (but handle out-of-bound indexes appropriately).
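
One possible way to handle out-of-bound indexes with the typedef above is sketched below. The function name `foo_checked`, the length parameter `n` and the output parameter `out` are assumptions added for illustration; the original signature has no length argument.

```c
#include <stdbool.h>

typedef unsigned index_t;   /* the typedef suggested above */

/* Hypothetical bounds-checked variant of foo. n is the number of
   elements in a (an added parameter, not in the original signature).
   On success, stores a[i] in *out and returns true.
   If i is out of range, leaves *out untouched and returns false. */
bool foo_checked(const float a[], index_t n, index_t i, float *out)
{
    if (i >= n)
        return false;
    *out = a[i];
    return true;
}
```

Because index_t is unsigned here, a single comparison against n is enough; with a signed index type the check would also need to reject negative values.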

edited Jul 15 '13 at 08:18

answered Jul 15 '13 at 07:50

Basile Starynkevitch

Is it common to define an `index_t` type? Also, what if I want an offset which might potentially be negative? – einpoklum Jul 15 '13 at 07:57
He can't use `size_t` because offsets can be negative. In memory, the correct type would be `ptrdiff_t`, or if the domain or validation of input values ensures that overflow can't exist, `int`. – James Kanze Jul 15 '13 at 08:01
`size_t` is the best type (if any one type can be said to be "best") for an array index that will never be negative. For indices that may be negative, you can use `ptrdiff_t` or (on POSIX at least) `ssize_t`. See http://stackoverflow.com/questions/12175358/c-size-t-and-ssize-t-negative-value and http://stackoverflow.com/questions/8649018/what-is-the-difference-between-ssize-t-and-ptrdiff-t – torek Jul 15 '13 at 08:01
@torek That is simply incorrect. Unsigned types should be avoided unless you're doing bit manipulations. – James Kanze Jul 15 '13 at 08:04
@JamesKanze where do you get that from? On the contrary, signed types should be avoided wherever you know that you don't need negative values. – Jens Gustedt Jul 15 '13 at 08:06
@JamesKanze: I'm a big fan of GF(2**k) arithmetic, I find it very well behaved. :-) – torek Jul 15 '13 at 08:07
@JensGustedt From Stroustrup and Scott Meyers. And from experience. The standard expression for the difference between two values is `abs(a-b)`. If this doesn't give the right results, you're using the wrong type. – James Kanze Jul 15 '13 at 08:10
@Torek When you want modulo arithmetic, yes. I use unsigned types when calculating a hash code, for example. When it might make sense to subtract two values, no. – James Kanze Jul 15 '13 at 08:11
@JamesKanze, appealing to some authority is not very helpful, but with that argument I would wonder why the C standard in all places (unless bound by historical interfaces) promotes `size_t` for everything that has to do with indexing. `sizeof` returns `size_t` and any indexing that has something as an upper bound that could be the result of a `sizeof` operator should use the same type. – Jens Gustedt Jul 15 '13 at 08:24
@JensGustedt You asked where I got it from, not why it was a bad idea to use unsigned types. And the use of `size_t` in the standard is historically conditioned: by 16 bit implementations where the extra bit made a difference. I suspect that if we were doing it today, everything would be `ptrdiff_t`, to avoid the problems unsigned creates. – James Kanze Jul 15 '13 at 08:40
@JamesKanze, I don't think that anybody on the C standards committee would change anything about that. And unsigned types don't "create" problems. Mixing signed and unsigned arithmetic and types does. And there, the conversion signed -> unsigned is well defined in the standard, while the inverse, unsigned -> signed, is implementation defined. One more reason to stick to unsigned types as much as you can. – Jens Gustedt Jul 15 '13 at 09:27
@JensGustedt I don't know about the C committee, but I know a number of members of the C++ committee who are unhappy with the unsignedness of `size_t`. – James Kanze Jul 15 '13 at 13:55
Well at the risk of re-inflaming the argument - I just don't feel like using 'int' for things. It's kind of a shifty type, that one :-) – einpoklum Dec 03 '13 at 19:57
@einpoklum What's "shifty" about it? It's the "default" integral type in C and C++. Historically, it was truly a default; you didn't have to declare it. And even today, all of the other standard integral types except the character types are formally modifiers: `unsigned` is actually `unsigned int`, etc. The standard itself says that plain int is the natural integral type, and that the others are "provided to meet special needs". – James Kanze Dec 04 '13 at 09:17
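
The `abs(a-b)` point debated in the comments above can be illustrated with a small sketch. The function names `distance_unsigned` and `distance_signed` are hypothetical; the example only shows how the distance between two indexes has to be written for each kind of type, without settling which is preferable.

```c
#include <stddef.h>

/* Distance between two indexes held as size_t. Computing a - b directly
   would wrap around to a huge value when a < b, so the operands are
   ordered before subtracting. */
size_t distance_unsigned(size_t a, size_t b)
{
    return a > b ? a - b : b - a;
}

/* The same distance with signed indexes: the difference may be
   negative, so its absolute value is taken afterwards. */
ptrdiff_t distance_signed(ptrdiff_t a, ptrdiff_t b)
{
    ptrdiff_t d = a - b;
    return d < 0 ? -d : d;
}
```

Both functions return 3 for the indexes 2 and 5; the difference is only in where the care has to be taken.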

score 2 · Answer 2 · answered Jul 15 '13 at 08:09

Is there any good reason why you don't just use int? It's the default type for integral values in C++, and should be used unless there is a good reason not to.

Of course, one good reason could be that it might overflow. If the context is such that you could end up with very large arrays, you might want to use ptrdiff_t, which is defined (in C and C++) as the type resulting from the subtraction of two pointers: in other words, it is guaranteed not to overflow (when used as an offset) for all types with a size greater than 1.
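
As a small illustration of where ptrdiff_t comes from, the sketch below subtracts two pointers into the same array. The function name `index_of` is an assumption made for this example.

```c
#include <stddef.h>

/* Returns how many elements p lies past base. Both pointers must point
   into (or one past the end of) the same array, otherwise the
   subtraction is undefined. The result counts elements, not bytes. */
ptrdiff_t index_of(const float *base, const float *p)
{
    return p - base;
}
```

The result is negative when p lies before base, which is why the type is signed.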

answered Jul 15 '13 at 08:09

James Kanze

Can you quote some official source suggesting that "`int` should be used unless there's a good reason not to"? I'm not trying to taunt you, I want to read the rationale. – einpoklum Dec 03 '13 at 19:58
@einpoklum The "official" source is the standard: in §3.9.1/2, plain `int` is the natural type, and the others are "provided to meet special needs". In practice, the use of any type other than `int` says that there is a special need. Perhaps additional range, or saving space. Or you need the modulo arithmetic of unsigned types, or are doing bit manipulations. (Using an unsigned type is usually a clear indication that you are _not_ dealing with arithmetic data.) – James Kanze Dec 04 '13 at 09:21

score 1 · Answer 3 · edited Jul 15 '13 at 07:56

I would say it is not appropriate, since

off_t is (intended to be) used to represent file sizes
off_t is a signed type.

I would go for size_type (usually a "typedef"ed name for size_t), which is the one used by std containers.

edited Jul 15 '13 at 07:56

einpoklum

answered Jul 15 '13 at 07:51

Stefano Falasca

`std::vector::size_type` is a typedef of `size_t` (which exists in C). `size_type` does not exist as a standalone type. – zneak Jul 15 '13 at 07:53
But how can off_t mean a file size? An offset is not a size... and an offset can be negative while a size cannot. – einpoklum Jul 15 '13 at 07:54
@einpoklum, it's used with file manipulations, like `fseek`/`lseek`. `size_t`, on the other hand, is defined to be able to hold the maximum size of any array on a platform. Sounds more like what you're looking for. – zneak Jul 15 '13 at 07:54
@einpoklum as far as I know negative values (-1) are used for error reporting (see: http://linux.die.net/man/2/lseek for example) – Stefano Falasca Jul 15 '13 at 07:55
So, in that case, `off_t` is completely off the table... (excuse the pun). `size_t` is still not semantically appropriate though, I think. – einpoklum Jul 15 '13 at 07:56
That's, I think, why the standard uses size_type (which is defined to be a size_t, but has a different name). And that's why I suggested the use of size_type instead of size_t – Stefano Falasca Jul 15 '13 at 07:57
@zneak: Because an offset might be negative; and because the size of something is not the same as the offset of something. It's not the same meaning (even if offsets are assumed to be non-negative). – einpoklum Jul 15 '13 at 07:59
@einpoklum , thanks for the edits. Can you teach me to "highlight" text, as you did, for example, for off_t? – Stefano Falasca Jul 15 '13 at 07:59
Whatever he should use, it shouldn't be `size_t`, since an offset can be negative. In an `std::vector`, an "offset" would be calculated by subtracting two iterators. Which gives a type of `ptrdiff_t`, not `size_t`. – James Kanze Jul 15 '13 at 08:00
@JamesKanze I might have misunderstood the question, but it seems to me that he only wants to be able to use offsets referred to the beginning of the array (the first parameter of the function) – Stefano Falasca Jul 15 '13 at 08:01
@StefanoFalasca It doesn't really matter. The problem is that offset is an arithmetic value, which means that you should avoid the unsigned types. – James Kanze Jul 15 '13 at 08:02
@StefanoFalasca: Use back-quotes (`) around code. You can read more on markdown [here](http://en.wikipedia.org/wiki/Markdown) – einpoklum Jul 15 '13 at 09:08

score 0 · Accepted Answer · answered Jul 15 '13 at 08:02

Perhaps the answer is to use ptrdiff_t? It...

can be negative;
alludes to the difference not being in bytes, but in units of arbitrary size depending on the element type.

What do you think?
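
A minimal sketch of what that could look like, assuming the pointer is const-qualified (a change from the signature in the question) and that the caller keeps a + offset inside the underlying array:

```c
#include <stddef.h>

/* The function from the question with ptrdiff_t as the offset type.
   Nothing is checked here: the caller must guarantee that a + offset
   stays inside the array that a points into. */
float foo(const float *a, ptrdiff_t offset)
{
    return a[offset];
}
```

For example, if mid points at element 2 of a five-element array, foo(mid, -2) reads the first element. A negative offset is only valid in a case like this, where a does not point at the start of the array.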

answered Jul 15 '13 at 08:02

einpoklum

While others suggested `ptrdiff_t` after me, all of them made statements I don't really agree with. Sorry, guys. – einpoklum Dec 03 '13 at 19:59
