A long time ago I did some hobby programming in Z80 and 68000 assembly, so I understand the basics, but I am a novice in x86/x64 assembly. I am trying to find the **fastest** code to do the following (the critical part needing optimal speed is steps 2-4 only).
I do not need help with steps 1 or 5 at all; they are shown for context only. I am not asking anyone to write the complete code, but I would appreciate any hints on optimal instructions and algorithms for this platform. There are many ways to write a routine like this, but often the obvious approach is not the optimal one. I would be fine if someone said something like "try using the XYZ instruction". Also, as I mention below, using an array in assembler might not be the fastest way to go, so any suggestions on how to structure the data for speed are also part of the answer I am looking for. (Can x64 assembler even handle a 4GB array with an index?)
Step 1. Read a "string" of longword elements from a file. Each element contains externally supplied data that can be treated as a signed 31-bit number.
The file is usually pretty small (less than 100K) but may at times be as large as 1G elements (4GB). It is not necessary to read all elements into memory at the same time, as long as the individual elements can be accessed and modified directly. Instinctively, a 4GB array sounds like the fastest option, but I am a novice with x64 assembler and not sure whether the overhead of an array would help or hurt the speed.
Step 2. Increment the element by 1.
Step 3. Check the sign flag (see if the increment set the high bit).
- If set, branch to a routine that will modify the element, then continue on to Step 4.
- If not set then jump to Step 5 (exit).
The time spent in the subroutine is outside the scope of this question; you can just use an immediate return instruction for now. The subroutine will, however, need to know the index of the element.
Step 4. Move on to the next element and repeat step 2.
Step 5. Close the file, saving any modified data.
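For reference, steps 2-4 could be sketched in C roughly as follows. This is only a minimal illustration of the intended behavior, not the optimized assembly being asked about. The names `increment_until_positive` and `handle_negative` are hypothetical, the handler is an empty stub standing in for the out-of-scope subroutine, and the elements are treated as `uint32_t` so that the increment wraps without undefined behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the out-of-scope subroutine.
   It receives the array and the element index; here it does nothing. */
static void handle_negative(uint32_t *elems, size_t index)
{
    (void)elems;
    (void)index;
}

/* Steps 2-4: increment each element in turn; while the result has its
   high (sign) bit set, call the handler and move on to the next element.
   Stops at the first element whose sign bit is clear after the increment.
   Returns the number of elements for which the handler was called. */
size_t increment_until_positive(uint32_t *elems, size_t count)
{
    size_t i = 0;
    for (; i < count; ++i) {
        uint32_t v = elems[i] + 1u;      /* Step 2: increment by 1 */
        elems[i] = v;
        if ((v & 0x80000000u) == 0)      /* Step 3: sign bit clear, exit */
            break;
        handle_negative(elems, i);       /* Step 3: sign bit set */
    }                                    /* Step 4: next element */
    return i;
}
```

The index here is a `size_t`, which is 64 bits wide on typical x86-64 targets, so at the C level a 1G-element array can be indexed directly.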
Also, two related questions:
Would the code run faster on a 32-bit system since the elements are 32 bits?
How would the code be different if Step 2 was an increment other than 1?
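To make the second question concrete, this is what I would expect the variant to look like at the C level: the constant 1 is replaced by a caller-supplied step. It is a sketch under the same assumptions as the steps above; `add_until_positive`, `handle_negative_step`, and the `step` parameter are hypothetical names, and the handler is again an empty stub.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stub for the out-of-scope subroutine. */
static void handle_negative_step(uint32_t *elems, size_t index)
{
    (void)elems;
    (void)index;
}

/* Same loop as steps 2-4, but adding an arbitrary step instead of 1.
   Unsigned arithmetic is used so that wraparound is well defined.
   Returns the number of elements for which the handler was called. */
size_t add_until_positive(uint32_t *elems, size_t count, uint32_t step)
{
    size_t i = 0;
    for (; i < count; ++i) {
        uint32_t v = elems[i] + step;    /* Step 2 with a general increment */
        elems[i] = v;
        if ((v & 0x80000000u) == 0)      /* sign bit clear: stop */
            break;
        handle_negative_step(elems, i);  /* sign bit set: handle, continue */
    }
    return i;
}
```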
RESPONSE TO THE "TOO BROAD" CLOSURE FLAG:
How is this question "Too Broad" when it precisely fits ALL FOUR of the on-topic descriptions at the top of the community guidelines on the SO Help Page:
- a specific programming problem -- (how to optimize a special kind of array processing)
- a software algorithm -- (the aforementioned array algorithm, specifically Steps 2-4 above)
- software tools commonly used by programmers -- (x64 assembler is very commonly used by programmers)
- a practical, answerable problem that is unique to software development -- (since several answers/suggestions were provided by @Jester, @PeterCordes, and @Ped7g I would say it is self-evident that this is the case)