
I am working with Mac OS X, programming in C, and using bash in the terminal.

I am currently trying to make a lookup table for the gamma function. I have been told that calling gsl_sf_gamma is pretty expensive and that a lookup table would be far faster. I did not wish to lose too much accuracy, so I wanted a fairly large lookup table. Initializing a huge array would not be ideal, since that defeats the purpose.

My thought was to make a large text file with the values of the gamma function pre-evaluated over the range of interest. A major problem with this is that I don't know how to read a specific line of a text file in C.

Thanks for any insight and help you guys can offer.

Warning: I know very little about strings and text files, so I might just not know a simple function that already does this.

Novice C
  • You'll have to explain this gamma function, and what it actually does. – Magn3s1um May 31 '13 at 20:43
  • Why a text lookup file? If you're going to precompute the values, why not compute them in the format you want to use them? – Carl Norum May 31 '13 at 20:44
  • The gamma function is the standard mathematical [gamma function](http://en.wikipedia.org/wiki/Gamma_function). The precomputed values are just digits; I want a lookup table of values of the function, somewhere around 100,000 values. Initializing an array with 100,000 unique entries wouldn't be efficient(?). If I had a large text dump with all the values sorted by input, and could just read a specific line in the file, I think this would be the fastest. – Novice C May 31 '13 at 20:53
  • @NoviceC: You can use a file to specify your lookup values, but why not just store the table in memory? – jxh May 31 '13 at 20:58
  • @jxh, because I don't know how to do that. – Novice C May 31 '13 at 21:08

2 Answers


Gamma is basically factorial except in continuous form. You want to perform a lookup rather than a computation for the gamma function. You want to use a text file to represent these results. Each line of the file represents the input value multiplied by 1000. I guess for a high enough input value, the file scan could outperform doing the compute.

However, I think you will at minimum want to compute an index into your file. The file can still be arranged as a text file, but you have another step that scans the file, and notes the byte offset for each result line. These offsets get recorded into a binary file, which will serve as your index.

When your program starts, load the index file into an array, where the array index is the floor of the gamma input multiplied by 1000, and the value at that index is the offset recorded in the index file. When you want to compute gamma for a particular number, multiply the input by 1000 and truncate the result to obtain your array index. Look up the offset at that index, and use the next entry's offset to compute the length of the record. Then open your gamma text file as a binary file, seek to the offset, and read that many bytes to get your digits. You will need to read the next entry too to perform your interpolation.

jxh
  • I have already created a text file with the gamma values. There are two columns: the first is the index, which is directly related to the input of the gamma function, so gamma(2.4) corresponds to entry 24; the second is the value of gamma(2.4), which is 1.24216934450431. This all exists outside of my code and program. I wanted to know if I could simply input a value to my program, have it converted to an entry (say 2.4 is converted to 24 by multiplying by 10), then fopen the text file and read out line 24. I really thought this would be extremely fast. – Novice C May 31 '13 at 21:36
  • Is the 1/10 granularity constant in the whole file? – jxh May 31 '13 at 21:39
  • Yes, I'm actually using 1/1000. And if input is not input at a nice 1/1000 value, I have already written code to just linearly approximate what the value should be given the two nearest 1/1000 values and how close it is to the nearest 1/1000 mark. – Novice C May 31 '13 at 21:41
  • So in the case of 1/10, say you enter 2.43: I will take the values of gamma(2.4) and gamma(2.5), find the slope by linear approximation, then multiply the slope by .03 and add it to the value of gamma(2.4). – Novice C May 31 '13 at 21:42
  • @NoviceC: The general outline of my approach is still what you want, I believe. It beats a line by line walk of the text file for sure. – jxh May 31 '13 at 21:45

Yes, calculating gamma is slow (I think GSL uses the Lanczos formula, which sums a series). If the number of values for which you need to calculate it is limited (say, you're only doing integers), then certainly a lookup table might help. But if the table is too big for memory, it won't help--it will be even slower than the calculation.

If the table will fit into memory, there's nothing wrong with storing it in a file until you need it and then loading the whole thing into memory at once.

Lee Daniel Crocker
  • I think it depends on the performance of the filesystem and the magnitude of the input value. Although the sum operation seems linear, it is actually quadratic in the number of digits being manipulated. The disk-based approach I outlined would have linear performance, with a high constant. – jxh May 31 '13 at 23:02
  • I would find it hard to believe any disk-based approach would be faster than just doing the calculation until I actually saw numbers. That's always the first rule of optimization: don't guess, measure. If your code really is so slow that disk I/O would improve, then you should certainly use a binary file and direct seeks rather than text. – Lee Daniel Crocker May 31 '13 at 23:13
  • That is the essence of my solution. On an SSD the seek time is constant. – jxh Jun 01 '13 at 00:14
  • I have failed to mention that the program is called 1000+ times and each call has four gamma evaluations. Additionally, I only need a small range of values, from .1 to 2.5, with probably 2400 entries. I could lower the number of entries at the cost of accuracy, which isn't necessarily much. Does this change the attitude everyone has towards making a lookup table? My problem is I am very inexperienced at programming; I started maybe two-ish weeks ago. I don't really know what you mean by loading into memory. Does that mean assigning each line to an index of an array with 2400 entries? – Novice C Jun 01 '13 at 00:59
  • 1
    If you're not an experienced programmer, the first and most important thing to learn about optimization is this: don't do it, YET. In other words, get everything working exactly the way it should, make sure you're using the best algorithm, etc. Then, and only then, profile your code to see where the bottlenecks really are rather than trying to guess. Don't get me wrong, I love lookup tables ([ojcardlib](http://github.com/lcrocker/ojcardlib) is full of them). But I had code complete and working without them before putting them in where they were needed. – Lee Daniel Crocker Jun 01 '13 at 01:15
  • Another thing to consider: do the values passed to the function repeat themselves often during the run? You might consider a hashtable-based cache instead of a lookup table. – Lee Daniel Crocker Jun 01 '13 at 01:17
  • Finally, the fact that you're only calling the function 4000 times makes it even more unlikely that optimization will help you. I write poker and blackjack simulations that play *billions* of hands in minutes, so lookup tables are my friend, but things that only happen a few thousand times I don't worry about. – Lee Daniel Crocker Jun 01 '13 at 01:20
  • I concur with @LeeDanielCrocker on getting it to work first before optimizing. Also, your lookup table doesn't seem large enough to warrant a database on disk anyway. – jxh Jun 01 '13 at 05:04