
I am using the elasticsearch-dsl Python library to connect to Elasticsearch and do aggregations.

I am using the following code:

search.aggs.bucket('per_date', 'terms', field='date')\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()

This works fine, but it returns only 10 results in response.aggregations.per_ts.buckets.

I want all the results.

I have tried one solution with size=0, as mentioned in this question:

search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()

But this results in an error:

TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
hard coder

3 Answers


I had the same issue. I finally found this solution:

s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()

After 2.x, size=0 for all bucket results no longer works; please refer to this thread. Here in my example I just set the size to 999999. You can pick a large number according to your case.

It is recommended to explicitly set size to a reasonable value, a number between 1 and 2147483647.
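
To read the buckets back out of the response, something along these lines should work (a minimal sketch reusing the names from the snippet above; client, the jokes index, and keywords are placeholders):

from elasticsearch_dsl import Search, A

s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
s.aggs.bucket('by_title', A('terms', field='jks_title.keyword', size=999999))
response = s.execute()

# each bucket exposes the term as .key and its document count as .doc_count
for bucket in response.aggregations.by_title.buckets:
    print(bucket.key, bucket.doc_count)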

Hope this helps.

Soony

This is a bit older, but I ran into the same issue. What I wanted was basically an iterator that I could use to go through all the aggregation buckets I got back (I also have a lot of unique results).

The best thing I found is to create a Python generator like this:

def scan_aggregation_results():
    i = 0
    partitions = 20
    while i < partitions:
        s = Search(using=elastic, index='my_index').extra(size=0)
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()

        # yield the term of each bucket in this partition
        for item in result.aggregations.my_agg.buckets:
            yield item.key
        i += 1

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it

The magic here is that Elasticsearch will automatically partition the results into 20 disjoint subsets, i.e. the number of partitions I defined. I just have to set the size to something large enough to hold a single partition; in this case the result can be up to 20 million items large (or 20 * 999999). If you have far fewer items to return, like me (around 20,000), then you will just have about 1,000 results per query in your bucket, regardless of the much larger size you defined.

Using the generator construct outlined above, you can create your own scanner, so to speak, iterating over all results individually, which is just what I wanted.
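
Applied to the aggregation from the question, a sketch along the same lines could look like this (the index name my_index, the partition count, and the per-partition size are assumptions; per_ts, total_time and percentiles come from the question):

from elasticsearch_dsl import Search

def scan_ts_percentiles(elastic, percentiles, partitions=20):
    # walk the 'ts' terms one partition at a time so that no bucket is missed
    for i in range(partitions):
        s = Search(using=elastic, index='my_index').extra(size=0)
        s.aggs.bucket('per_ts', 'terms', field='ts', size=999999,
                      include={"partition": i, "num_partitions": partitions})\
              .bucket('response_time_percentile', 'percentiles', field='total_time',
                      percents=percentiles,
                      hdr={"number_of_significant_value_digits": 1})
        result = s.execute()
        for bucket in result.aggregations.per_ts.buckets:
            # the percentiles sub-aggregation is available on each bucket
            yield bucket.key, bucket.response_time_percentile['values']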

Peter Kunszt

You should read the documentation.

So in your case, it should look like this:

search.aggs.bucket('per_date', 'terms', field='date')\
            .bucket('response_time_percentile', 'percentiles', field='total_time',
                    percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:50]
response = search.execute()
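
For reference, in elasticsearch-dsl the slice syntax is applied to the Search object itself and sets from/size for the hits the query returns, not the number of aggregation buckets; a minimal sketch of that usage, assuming a client and index like in the question:

from elasticsearch_dsl import Search

search = Search(using=client, index='my_index')  # 'client' and the index name are placeholders
search = search[0:50]  # sets from=0, size=50 on the returned hits, not on the terms buckets
search.aggs.bucket('per_date', 'terms', field='date')
response = search.execute()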
fmdaboville
  • I have tried this but it doesn't work. Error: percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:20] TypeError: 'Percentiles' object has no attribute '__getitem__' – hard coder Nov 10 '17 at 10:59
  • Do you init your search like this: `search = Search()`? – fmdaboville Nov 10 '17 at 11:47