Questions tagged [external-sorting]

External sorting describes a class of sorting algorithms that can handle massive amounts of data. It is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and must instead reside in slower external memory (usually a hard drive).

External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.
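The two phases described above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes the input is a text file with one integer per line, and the names `external_sort` and `chunk_lines` are hypothetical, chosen only for this example. The merge phase uses the standard library's `heapq.merge` to do a single multi-way merge of all runs.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, chunk_lines=100_000):
    """Sort a text file of one integer per line using bounded memory.

    chunk_lines is an assumed tuning knob: the number of lines that
    comfortably fits in main memory at once.
    """
    run_paths = []
    try:
        # Sorting phase: read a chunk that fits in memory, sort it,
        # and write it out as a sorted temporary file (a "run").
        with open(input_path) as src:
            while True:
                chunk = [int(line) for line in islice(src, chunk_lines)]
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as run:
                    run.writelines(f"{value}\n" for value in chunk)
                run_paths.append(path)

        # Merge phase: stream all runs through one k-way merge, so only
        # one pending value per run is held in memory at a time.
        runs = [open(path) for path in run_paths]
        try:
            streams = [(int(line) for line in run) for run in runs]
            with open(output_path, "w") as out:
                out.writelines(f"{value}\n" for value in heapq.merge(*streams))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary runs whether or not the sort succeeded.
        for path in run_paths:
            os.remove(path)
```

With a very large number of runs, the number of simultaneously open files can become the limiting factor; in that case the merge is usually done in several passes over groups of runs.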

72 questions
45
votes
5 answers

How external merge sort algorithm works?

I'm trying to understand how the external merge sort algorithm works (I saw some answers to the same question, but didn't find what I need). I'm reading the book "Analysis Of Algorithms" by Jeffrey McConnell and I'm trying to implement the algorithm…
KolKir
12
votes
5 answers

Sort 1TB file on machine with 1GB RAM

This question seems easy, but I am not able to understand the real work behind it. I know people will say: break it down into 512 MB chunks and sort them, for example with merge sort using MapReduce. So here is the actual question I have: Suppose I break…
bruceparker
12
votes
4 answers

Efficiently reading a very large text file in C++

I have a very large text file (45GB). Each line of the text file contains two space-separated 64-bit unsigned integers as shown below. 4624996948753406865 10214715013130414417 4305027007407867230 4569406367070518418 10817905656952544704…
Pattu
10
votes
1 answer

multi-way merge vs 2-way merge

When we externally merge sort a large file, we split it into small ones, sort those, and then merge them back into a large sorted file. When merging, we can either do many 2-way merge passes, or one multi-way merge. I am wondering which approach is…
KFL
5
votes
2 answers

What is efficient and stable external sort algorithm implementation (written in c)?

What is an efficient and stable external sort algorithm implementation (written in C)?
Mickey Shine
5
votes
1 answer

Is the formula 2b * (1 + ⌈log_dM(nR)⌉) for the total of I/O access in merge-sort correct?

I am studying databases from the book Fundamentals of Database Systems, 5th edition, by Elmasri and Navathe, and they briefly explain external sorting using merge sort near the beginning of chapter 15. They divide the algorithm in two…
5
votes
2 answers

Sorting a huge file in Java

I have a file which consists of one row: 1 , 1 2 , 1 3 6 , 4 ,... In this representation, spaces separate the integers and commas. This string is so huge that I can't read it with RandomAccessFile.readLine() (almost 4 Gb needed). So that I…
Dmitry
4
votes
1 answer

Merging k sorted arrays - Priority Queue vs Traditional Merge-sort merge, when to use which?

Assuming we are given k sorted arrays (each of size n), in which case is using a priority heap better than a traditional merge (similar to the one used in merge-sort) and vice-versa? Priority Queue Approach: In this approach, we have a min heap of…
4
votes
2 answers

merging N sorted files using K way merge

There is decent literature about merging sorted files or say merging K sorted files. They all work on the theory that first element of each file is put in a Heap, then until the heap is empty poll that element, get another from the file from where…
4
votes
9 answers

Is there an easy way to sort an array of char*'s ? C++

I've got an array of char* in a file. The company I work for stores data in flat files. Sometimes the data is sorted, but sometimes it's not. I'd like to sort the data in the files. Now I could write the code to do this, from scratch. Is there…
baash05
4
votes
3 answers

Why do we need external sort?

The main reason for external sort is that the data may be larger than the main memory we have. However, we are using virtual memory now, and the virtual memory will take care of swapping between main memory and disk. Why do we need to have external…
silverwen
3
votes
1 answer

How to sort LevelDB by value

I'm using LevelDB to store records (key-value), where the key is a 64-bit hash and the value is a double. To make an analogy: think of the 64-bit hash as a unique ID of a customer and the double as an account balance (i.e. how much money they have…
Kiril
3
votes
0 answers

Correct use of ForkJoinPool submit and join in Java

I've recently worked on an implementation of an external merge sort algorithm (External Sorting) and my implementation needed to use a multi-threaded approach. I tried to use ForkJoinPool instead of using the older implementations in Java such as…
Assafs
3
votes
2 answers

External Sort between two files

I'm trying to get my head around an external sort for a requirement I have - and I can't. The requirement is to externally sort a file of an arbitrary size but using just the original file and one other (call them fileA and fileB) - two files…
keldar
3
votes
1 answer

stxxl sorting of very large file (ubuntu)

I am trying to sort a large file with about a billion records (each containing four integers). The size of the file would shoot up beyond 50GB. I am testing my code with 400 million records (about 6 GB file). My disk configuration looks like this:…
Chirag Jain