Questions tagged [data-partitioning]

Data partitioning is the division of a collection of data into smaller collections, typically for faster processing, easier statistics gathering, and a smaller memory or persistence footprint.

277 questions
67 votes, 14 answers

python equivalent of filter() getting two output lists (i.e. partition of a list)

Let's say I have a list, and a filtering function. Using something like
>>> filter(lambda x: x > 10, [1,4,12,7,42])
[12, 42]
I can get the elements matching the criterion. Is there a function I could use that would output two lists, one of elements…
F'x
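A minimal Python sketch of the two-list partition the question asks about. `partition` here is a hypothetical helper, not a built-in: it makes one pass over the input and appends each element to the list the predicate selects, so the predicate runs once per element instead of twice as with two `filter()` calls. The recipes section of the `itertools` documentation also includes a lazy, iterator-based variant.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def partition(pred: Callable[[T], bool], items: Iterable[T]) -> Tuple[List[T], List[T]]:
    """Split items into (matching, non_matching) in one pass, preserving order."""
    matching: List[T] = []
    non_matching: List[T] = []
    for item in items:
        # Append to whichever list the predicate selects.
        (matching if pred(item) else non_matching).append(item)
    return matching, non_matching

big, small = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
# big == [12, 42], small == [1, 4, 7]
```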
57 votes, 3 answers

Difference between df.repartition and DataFrameWriter partitionBy?

What is the difference between the DataFrame repartition() and DataFrameWriter partitionBy() methods? I assume both are used to "partition data based on a dataframe column". Or is there a difference?
Shankar
42 votes, 11 answers

C# - elegant way of partitioning a list?

I'd like to partition a list into a list of lists, by specifying the number of elements in each partition. For instance, suppose I have the list {1, 2, ... 11}, and would like to partition it such that each set has 4 elements, with the last set…
David Hodgson
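The question is about C#, but the slicing idea behind it is easy to show in Python. This sketch assumes the last partition simply holds the leftover elements (the question's exact requirement for the last set is cut off). `chunk` is a hypothetical name; on Python 3.12+ `itertools.batched` does something similar but yields tuples.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def chunk(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive lists of `size` elements; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the start index of each chunk and slice from there.
    return [list(items[i:i + size]) for i in range(0, len(items), size)]

print(chunk(list(range(1, 12)), 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
```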
35 votes, 6 answers

What is the best way to divide a collection into 2 different collections?

I have a Set of numbers: Set mySet = [1,2,3,4,5,6,7,8,9]. I want to divide it into 2 sets of odds and evens. My way was to use filter twice: Set set1 = mySet.stream().filter(y -> y % 2 ==…
user1386966
16 votes, 7 answers

QuickSort and Hoare Partition

I have a hard time translating QuickSort with Hoare partitioning into C code, and can't find out why. The code I'm using is shown below: void QuickSort(int a[],int start,int end) { int q=HoarePartition(a,start,end); if (end<=start) return; …
Ofek Ron
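Two things are worth checking in code like the excerpt's. First, `HoarePartition` is called before the `end<=start` base-case check, so it also runs on empty or one-element ranges. Second, with Hoare's scheme the returned index is not necessarily the pivot's final position, so the recursion has to cover `[start, q]` and `[q+1, end]` rather than skipping `q`. The excerpt is truncated, so this may not be the whole problem. Below is a Python sketch of the classic variant that uses the first element as the pivot; with that choice the returned index is always below `hi`, so each recursive call works on a strictly smaller range.

```python
from typing import List

def hoare_partition(a: List[int], lo: int, hi: int) -> int:
    """Partition a[lo..hi] (inclusive) around pivot a[lo].

    Returns j with lo <= j < hi such that every element of a[lo..j]
    is <= every element of a[j+1..hi].
    """
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:  # scan right for an element >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:  # scan left for an element <= pivot
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]

def quicksort(a: List[int], lo: int, hi: int) -> None:
    # Check the base case before partitioning: 0 or 1 elements are already sorted.
    if lo >= hi:
        return
    q = hoare_partition(a, lo, hi)
    # The pivot is not necessarily at q, so q stays in the left half.
    quicksort(a, lo, q)
    quicksort(a, q + 1, hi)

data = [5, -1, 3, 3, 0, 9, 2]
quicksort(data, 0, len(data) - 1)
print(data)  # [-1, 0, 2, 3, 3, 5, 9]
```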
16 votes, 5 answers

Create grouping variable for consecutive sequences and split vector

I have a vector, such as c(1, 3, 4, 5, 9, 10, 17, 29, 30) and I would like to group together the 'neighboring' elements that form a regular, consecutive sequence, i.e. an increase by 1, in a ragged vector resulting in: L1: 1 L2: 3,4,5 L3: 9,10 L4:…
letsrock
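The question is about R, but the underlying rule is language-neutral. A new group starts whenever an element is not exactly one more than its predecessor. In R the grouping variable is commonly written as `cumsum(c(1, diff(x) != 1))`. A Python sketch of the same idea follows, where `group_ids` builds the grouping variable and `split_consecutive` builds the groups. Both names are hypothetical, and both assume the input is already in the order the runs should be read (sorted, as in the question's example).

```python
from typing import List

def group_ids(values: List[int]) -> List[int]:
    """Return a run number for each element; runs are maximal stretches increasing by 1."""
    ids: List[int] = []
    run = 0
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1] + 1:
            run += 1  # any step other than +1 starts a new run
        ids.append(run)
    return ids

def split_consecutive(values: List[int]) -> List[List[int]]:
    """Split values into the runs identified by group_ids."""
    runs: List[List[int]] = []
    for v, run in zip(values, group_ids(values)):
        if len(runs) < run:
            runs.append([])  # first element of a new run
        runs[-1].append(v)
    return runs

x = [1, 3, 4, 5, 9, 10, 17, 29, 30]
print(group_ids(x))          # [1, 2, 2, 2, 3, 3, 4, 5, 5]
print(split_consecutive(x))  # [[1], [3, 4, 5], [9, 10], [17], [29, 30]]
```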
15 votes, 2 answers

Querying Windows Azure Table Storage with multiple query criteria

I'm trying to query a table in Windows Azure storage and was initially using the TableQuery.CombineFilters in the TableQuery().Where function as follows: TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("PartitionKey",…
Captain John
13 votes, 5 answers

How to sort an integer array into negative, zero, positive part without changing relative position?

Give an O(n) algorithm which takes as input an array S, then divides S into three sets: negatives, zeros, and positives. Show how to implement this in place, that is, without allocating new memory. And you have to keep the number's relative…
Gin
11 votes, 4 answers

How to write SQL query that selects distinct pair values for specific criteria?

I'm having trouble formulating a query for the following problem: For pair values that have a certain score, how do you group them in a way that will only return distinct pair values with the best respective scores? For example, let's say I have a…
10 votes, 1 answer

Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?

I have a large JSON file with, I'm guessing, 4 million objects. Each top-level object has a few levels nested inside. I want to split that into multiple files of 10000 top-level objects each (retaining the structure inside each). jq should be able to do…
Chaz
10 votes, 2 answers

partitioning a float array into similar segments (clustering)

I have an array of floats like this: [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200] Now, I want to partition the array like this: [[1.91, 2.87, 3.61] , [10.91, 11.91, 12.82] , [100.73, 100.71, 101.89] , [200]] // [200] will…
alessandro
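One simple heuristic that reproduces the segmentation in the question is to split wherever the gap between neighbouring values exceeds a threshold. The threshold (`max_gap=5.0` below) is an assumption chosen to fit this example and is not specified in the question. For data without an obvious gap size, a proper 1-D clustering method such as k-means or Jenks natural breaks is the more general tool. The sketch compares values in input order, which matches the question's list. Unordered input should be sorted first. `split_on_gaps` is a hypothetical name.

```python
from typing import List

def split_on_gaps(values: List[float], max_gap: float) -> List[List[float]]:
    """Start a new segment whenever consecutive values differ by more than max_gap."""
    segments: List[List[float]] = []
    for v in values:
        if segments and abs(v - segments[-1][-1]) <= max_gap:
            segments[-1].append(v)  # close to the previous value: same segment
        else:
            segments.append([v])    # first value, or a jump larger than max_gap
    return segments

data = [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
print(split_on_gaps(data, max_gap=5.0))
# [[1.91, 2.87, 3.61], [10.91, 11.91, 12.82], [100.73, 100.71, 101.89], [200]]
```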
9 votes, 4 answers

python: Generating integer partitions

I need to generate all the partitions of a given integer. I found this algorithm by Jerome Kelleher, which is stated to be the most efficient one: def accelAsc(n): a = [0 for i in range(n + 1)] k = 1 a[0] = 0 y = n - 1 …
etuardu
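For comparison, here is a short recursive generator that produces the same set of partitions, each as a non-decreasing list. This is a different, simpler technique, not Kelleher's accelAsc, and it makes no claim to match accelAsc's speed. Its recursion depth also grows with n. It can serve as a reference for checking an accelAsc implementation's output. `integer_partitions` is a hypothetical name.

```python
from typing import Iterator, List

def integer_partitions(n: int, smallest: int = 1) -> Iterator[List[int]]:
    """Yield each partition of n as a non-decreasing list whose parts are >= smallest."""
    if n == 0:
        yield []  # exactly one way to partition 0: the empty partition
        return
    for first in range(smallest, n + 1):
        # Remaining parts must be >= first, which keeps each list non-decreasing
        # and ensures each partition is produced exactly once.
        for rest in integer_partitions(n - first, first):
            yield [first] + rest

print(list(integer_partitions(4)))
# [[1, 1, 1, 1], [1, 1, 2], [1, 3], [2, 2], [4]]
```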
8 votes, 2 answers

Hashing VS Indexing

Both hashing and indexing are used to partition data based on some predefined formula, but I am unable to understand the key difference between the two. As in hashing we are dividing the data on the basis of some key-value pair, similarly in indexing…
coolDude
7 votes, 1 answer

Understanding a median selection algorithm?

I'm currently learning algorithms in my spare time and have the following question while studying the select() algorithms in chapter 3. I understand that I can use the select() algorithm to find the median number (n/2 th smallest number) if I was using a…
anon
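Below is a Python sketch of the general select() idea. It partitions around a pivot into smaller, equal, and larger elements, then recurses only into the part that contains the k-th smallest element. This version picks a random pivot (randomized quickselect), which gives expected linear time. The deterministic median-of-medians variant, which guarantees worst-case linear time, chooses its pivot differently. The sketch also builds new lists instead of partitioning in place. The chapter the question refers to may present a different variant, and the names `quickselect` and `median` are illustrative.

```python
import random
from typing import List

def quickselect(values: List[int], k: int) -> int:
    """Return the k-th smallest element of values (k is 0-based)."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    pivot = random.choice(values)
    lows = [v for v in values if v < pivot]
    pivots = [v for v in values if v == pivot]
    highs = [v for v in values if v > pivot]
    if k < len(lows):
        return quickselect(lows, k)  # answer lies among the smaller elements
    if k < len(lows) + len(pivots):
        return pivot                 # answer equals the pivot
    return quickselect(highs, k - len(lows) - len(pivots))

def median(values: List[int]) -> int:
    # Index n // 2: the middle element for odd n, the upper median for even n.
    return quickselect(values, len(values) // 2)

print(median([7, 1, 3]))  # 3
```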
7 votes, 2 answers

Enumerate all k-partitions of 1d array with N elements?

This seems like a simple request, but Google is not my friend because "partition" scores a bunch of hits in database and filesystem space. I need to enumerate all partitions of an array of N values (N is constant) into k sub-arrays. The sub-arrays…
Austin Hastings
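The question's exact definition is cut off. This sketch assumes the sub-arrays are non-empty and contiguous. Under that assumption, a k-partition is simply a choice of k - 1 cut points among the N - 1 gaps between neighbouring elements. That gives C(N-1, k-1) partitions, which `itertools.combinations` enumerates directly. `k_partitions` is a hypothetical name.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def k_partitions(arr: Sequence[T], k: int) -> Iterator[List[List[T]]]:
    """Yield every split of arr into k non-empty contiguous sub-arrays, in order."""
    n = len(arr)
    if not 1 <= k <= n:
        return  # no valid split: yields nothing
    # Choose k - 1 cut positions from the n - 1 gaps between neighbouring elements.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield [list(arr[bounds[i]:bounds[i + 1]]) for i in range(k)]

for p in k_partitions([1, 2, 3, 4], 2):
    print(p)
# [[1], [2, 3, 4]]
# [[1, 2], [3, 4]]
# [[1, 2, 3], [4]]
```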