Questions tagged [top-n]

298 questions
196 votes, 4 answers

Pandas get topmost n records within each group

Suppose I have a pandas DataFrame like this:

    >>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
    >>> df
       id  value
    0   1      1
    1   1      2
    2   1      3
    3   2      1
    4   2      2
    5   2      3
    6   2      4
    7   3      1
    8 …
Roman Pekar
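A minimal pandas sketch of the usual approach, using the frame from the excerpt. `groupby(...).head(n)` keeps the first n rows of each group in their current order, so sorting first turns it into "n largest per group". The helper names `top_n_per_group` and `largest_n_per_group` are illustrative only, not part of pandas.

```python
import pandas as pd

# Frame from the question excerpt.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})


def top_n_per_group(frame, key, n):
    """First n rows of each group, keeping the frame's existing row order."""
    return frame.groupby(key).head(n)


def largest_n_per_group(frame, key, col, n):
    """The n rows with the largest `col` in each group, largest first."""
    return (frame.sort_values([key, col], ascending=[True, False])
                 .groupby(key)
                 .head(n))


print(top_n_per_group(df, 'id', 2))
print(largest_n_per_group(df, 'id', 'value', 2))
```

When only the value column is needed, `frame.groupby(key)[col].nlargest(n)` is an alternative; it returns a Series indexed by the group key and the original row label.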
159 votes, 6 answers

Oracle SELECT TOP 10 records

I have a big problem with an SQL statement in Oracle. I want to select the top 10 records, ordered by STORAGE_DB, that aren't in a list returned by another SELECT statement. This one works fine for all records: SELECT DISTINCT APP_ID, NAME, …
opHASnoNAME
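A sketch of the filter, sort, then limit pattern, run against SQLite through Python's `sqlite3` so it is self-contained. The `apps` and `excluded` tables and their rows are hypothetical; only the column names come from the excerpt. SQLite's `LIMIT` stands in for Oracle syntax: on Oracle 12c and later the equivalent is `FETCH FIRST 10 ROWS ONLY`, and on older versions `ROWNUM` has to be applied to an already-ordered subquery, because `ROWNUM` is assigned before `ORDER BY` runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: only APP_ID, NAME and STORAGE_DB come from the question.
conn.executescript("""
CREATE TABLE apps (APP_ID INTEGER, NAME TEXT, STORAGE_DB INTEGER);
CREATE TABLE excluded (APP_ID INTEGER);
""")
conn.executemany("INSERT INTO apps VALUES (?, ?, ?)",
                 [(i, f"app{i}", i * 10) for i in range(1, 16)])
conn.executemany("INSERT INTO excluded VALUES (?)", [(14,), (15,)])

# Filter, then order, then cut: LIMIT is applied after ORDER BY.
top_ten = conn.execute("""
    SELECT DISTINCT APP_ID, NAME, STORAGE_DB
    FROM apps
    WHERE APP_ID NOT IN (SELECT APP_ID FROM excluded)
    ORDER BY STORAGE_DB DESC
    LIMIT 10
""").fetchall()

for row in top_ten:
    print(row)
```

One caveat of the `NOT IN` form: if the subquery can return a NULL, the predicate is never true and no rows come back, so `NOT EXISTS` is the safer choice when the excluded column is nullable.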
59 votes, 2 answers

Evaluating and Calculating Top-N Accuracy: Top-1 and Top-5

I have come across a few journal papers (on a machine-learning classification problem) that evaluate accuracy with a Top-N approach. The data show that Top-1 accuracy = 42.5% and Top-5 accuracy = 72.5% under the same training and testing conditions. I…
D_9268
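Top-1 accuracy is ordinary accuracy. Top-5 accuracy counts a prediction as correct when the true class is anywhere among the five highest-scoring classes, which is why it can never be lower than top-1. A small NumPy sketch, assuming the model outputs one score per class; the function name `top_k_accuracy` and the toy scores are made up for illustration.

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes)
    labels: integer class indices of shape (n_samples,)
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the k largest scores in each row (unordered within the k).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy scores for 4 samples and 3 classes (invented for illustration).
scores = [[0.1, 0.6, 0.3],
          [0.5, 0.2, 0.3],
          [0.2, 0.3, 0.5],
          [0.4, 0.35, 0.25]]
labels = [1, 2, 0, 2]

for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, labels, k))
```

With k=1 only the first sample is a hit; k=2 adds the second; k=3 covers every class, so the score reaches 1.0.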
41 votes, 1 answer

How to see top n entries of term-document matrix after tfidf in scikit-learn

I am new to scikit-learn, and I was using TfidfVectorizer to find the tf-idf values of terms in a set of documents. I used the following code to obtain them:

    vectorizer = TfidfVectorizer(stop_words=u'english',ngram_range=(1,5),lowercase=True)
    X =…
Amrith Krishna
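Once the tf-idf matrix is available, picking the top n terms per document is a per-row sort. A NumPy sketch, assuming `X` has been converted to a dense array (for example with `X.toarray()`, reasonable only for small corpora) and that the vocabulary is listed in column order, as `vectorizer.get_feature_names_out()` returns it in scikit-learn 1.0 and later. The helper name and the toy weights are invented.

```python
import numpy as np


def top_terms_per_document(tfidf, terms, n):
    """For each document (row), return the n terms with the highest tf-idf weight."""
    tfidf = np.asarray(tfidf)
    terms = np.asarray(terms)
    # Sort each row ascending, reverse to descending, keep the first n column indices.
    order = np.argsort(tfidf, axis=1)[:, ::-1][:, :n]
    return [terms[row].tolist() for row in order]


# Invented 2-document, 4-term matrix standing in for X.toarray().
weights = [[0.1, 0.7, 0.0, 0.3],
           [0.5, 0.0, 0.2, 0.9]]
vocab = ["apple", "banana", "cherry", "date"]
print(top_terms_per_document(weights, vocab, 2))
```

Ties are broken arbitrarily, since `np.argsort` uses an unstable sort by default.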
21 votes, 2 answers

Oracle SQL query: Retrieve latest values per group based on time

I have the following table in an Oracle DB:

    id  date              quantity
     1  2010-01-04 11:00  152
     2  2010-01-04 11:00  210
     1  2010-01-04 10:45  132
     2  2010-01-04 10:45  318
     4  2010-01-04 10:45  122
     1  2010-01-04 10:30  …
Tom
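A sketch of "latest row per group" with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)`, a window function that Oracle also supports. It runs on SQLite through Python's `sqlite3` and uses only the fully shown rows of the excerpt. The table name `readings` is hypothetical, and the `date` column is renamed `ts` for the sketch; the timestamps are ISO-formatted text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the complete ones from the question.
conn.execute("CREATE TABLE readings (id INTEGER, ts TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    (1, "2010-01-04 11:00", 152),
    (2, "2010-01-04 11:00", 210),
    (1, "2010-01-04 10:45", 132),
    (2, "2010-01-04 10:45", 318),
    (4, "2010-01-04 10:45", 122),
])

# Number the rows within each id, newest first, then keep rank 1.
latest = conn.execute("""
    SELECT id, ts, quantity
    FROM (
        SELECT id, ts, quantity,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
        FROM readings
    ) ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(latest)
```

Changing `rn = 1` to `rn <= N` returns the N latest rows per id instead.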
16 votes, 5 answers

Oracle SQL - How to Retrieve highest 5 values of a column

How do you write a query that returns only a set number of rows, namely those with either the highest or the lowest values in a column? For example, a report of the 5 highest-salaried employees?
Trevor
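A sketch of "n highest or n lowest" as ORDER BY plus a row limit, run on SQLite through Python's `sqlite3`. The `employees` table, its rows and the helper `salary_extremes` are hypothetical. A secondary sort on `name` makes the cut deterministic when salaries tie. On Oracle 12c and later the limit is written `FETCH FIRST 5 ROWS ONLY`, or `FETCH FIRST 5 ROWS WITH TIES` to keep every row tied with the fifth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data for illustration.
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [
    ("ann", 90), ("bob", 120), ("cy", 75), ("dee", 120),
    ("eve", 60), ("fay", 110), ("gus", 95),
])


def salary_extremes(n, highest=True):
    # Only these two fixed literals are ever interpolated; n is bound as a parameter.
    direction = "DESC" if highest else "ASC"
    sql = f"SELECT name, salary FROM employees ORDER BY salary {direction}, name LIMIT ?"
    return conn.execute(sql, (n,)).fetchall()


print(salary_extremes(5))
print(salary_extremes(2, highest=False))
```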
13 votes, 2 answers

In MariaDB how do I select the top 10 rows from a table?

I just read online that MariaDB (which SQLZoo uses) is based on MySQL, so I thought that I could use the ROW_NUMBER() function. However, when I try this function in SQLZoo: SELECT * FROM ( SELECT * FROM route ) TEST7 WHERE ROW_NUMBER() < 10 then I…
Caffeinated
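`ROW_NUMBER()` is a window function: it needs an `OVER` clause, and window functions cannot be referenced in `WHERE`, so the filter has to move to an outer query. In MariaDB and MySQL the simpler route is `LIMIT`; MariaDB has supported window functions since version 10.2. Both forms are sketched below against SQLite through Python's `sqlite3`. The `route` table's columns here are hypothetical stand-ins, not SQLZoo's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the "route" table; the columns are assumed.
conn.execute("CREATE TABLE route (num INTEGER, stop TEXT)")
conn.executemany("INSERT INTO route VALUES (?, ?)",
                 [(i, f"stop{i}") for i in range(1, 21)])

# Option 1: LIMIT, supported directly by MariaDB, MySQL and SQLite.
first_ten = conn.execute("SELECT * FROM route ORDER BY num LIMIT 10").fetchall()

# Option 2: compute ROW_NUMBER() OVER (...) in a subquery, filter outside it.
numbered = conn.execute("""
    SELECT num, stop FROM (
        SELECT num, stop, ROW_NUMBER() OVER (ORDER BY num) AS rn
        FROM route
    ) t
    WHERE rn <= 10
    ORDER BY num
""").fetchall()

print(first_ten == numbered)
```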
12 votes, 2 answers

Find names of top-n highest-value columns in each pandas dataframe row

I have the following dataframe:

    id  p1  p2  p3  p4
     1   0   9   1   4
     2   0   2   3   4
     3   1   3  10   7
     4   1   5   3   1
     5   2   3   7  10

I need to reshape the data frame so that for each id it will have the top 3 columns with…
chessosapiens
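A pandas/NumPy sketch using the table from the question: sort each row's column positions by value (descending), keep the first three, and map them back to column names. The helper name `top_n_column_names` and the output column labels `top1`, `top2`, ... are illustrative choices.

```python
import numpy as np
import pandas as pd

# Data from the question, indexed by id.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "p1": [0, 0, 1, 1, 2],
                   "p2": [9, 2, 3, 5, 3],
                   "p3": [1, 3, 10, 3, 7],
                   "p4": [4, 4, 7, 1, 10]}).set_index("id")


def top_n_column_names(frame, n):
    values = frame.to_numpy()
    # Column positions ordered by value, highest first; a stable sort keeps
    # tied columns in their original left-to-right order.
    order = np.argsort(-values, axis=1, kind="stable")[:, :n]
    names = frame.columns.to_numpy()[order]
    return pd.DataFrame(names, index=frame.index,
                        columns=[f"top{i + 1}" for i in range(n)])


print(top_n_column_names(df, 3))
```

For id 4, p1 and p4 are both 1, so the stable sort lists p1 third.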
11 votes, 6 answers

Tidyverse: filtering n largest groups in grouped dataframe

I want to filter the n largest groups based on count, and then do some calculations on the filtered dataframe. Here is some data: Brand <- c("A","B","C","A","A","B","A","A","B","C") Category <- c(1,2,1,1,2,1,2,1,2,1) Clicks <-…
Shinobi_Atobe
11 votes, 2 answers

SUM of only TOP 10 rows

I have a query where I am only selecting the TOP 10 rows, but I have a SUM function in there that still takes the sum of all the rows (disregarding the TOP 10). How do I get the total of only the top 10 rows? Here is my SUM function: SUM(…
Cfw412
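`LIMIT` (like `TOP` in SQL Server) restricts the rows a query returns, after aggregation has already run, so an aggregate in the same query still sees every row. Selecting the top 10 rows in a subquery and summing in the outer query fixes that. A SQLite sketch through Python's `sqlite3`; the `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 20 rows with amounts 1..20.
conn.execute("CREATE TABLE sales (item TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(f"item{i}", i) for i in range(1, 21)])

# The limit only trims the single result row; the sum still covers all 20 rows.
total_all = conn.execute("SELECT SUM(amount) FROM sales LIMIT 10").fetchone()[0]

# Pick the 10 largest rows first, then aggregate over just those.
total_top10 = conn.execute("""
    SELECT SUM(amount) FROM (
        SELECT amount FROM sales ORDER BY amount DESC LIMIT 10
    ) t
""").fetchone()[0]

print(total_all, total_top10)
```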
10 votes, 2 answers

How to find column-index of top-n values within each row of huge dataframe

I have a dataframe in this format (example data):

        Metric1  Metric2  Metric3  Metric4  Metric5
    ID
    1       0.5      0.3      0.2      0.8      0.7
    2       0.1      0.8      0.5      0.2      0.4
    3       0.3      0.1      0.7      0.4      0.2
    …
tfcoe
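For a large frame, a full sort of every row is wasted work when only n positions are needed. `np.argpartition` finds the n largest entries per row without fully sorting the row, and only those n candidates are then sorted. A sketch using the rows shown in the excerpt; the helper name `top_n_indices` is illustrative.

```python
import numpy as np
import pandas as pd

# Rows from the question excerpt.
df = pd.DataFrame(
    [[0.5, 0.3, 0.2, 0.8, 0.7],
     [0.1, 0.8, 0.5, 0.2, 0.4],
     [0.3, 0.1, 0.7, 0.4, 0.2]],
    index=pd.Index([1, 2, 3], name="ID"),
    columns=[f"Metric{i}" for i in range(1, 6)],
)


def top_n_indices(frame, n):
    """Column positions of the n largest values in each row, largest first."""
    values = frame.to_numpy()
    # Partial selection: the last n positions hold the n largest, unordered.
    part = np.argpartition(values, -n, axis=1)[:, -n:]
    rows = np.arange(values.shape[0])[:, None]
    # Sort just those n candidates by value, descending.
    order = np.argsort(-values[rows, part], axis=1)
    return part[rows, order]


idx = top_n_indices(df, 2)
print(idx)
print(df.columns.to_numpy()[idx])  # the same positions as column names
```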
10 votes, 4 answers

How to get top n companies from a data frame in decreasing order

I am trying to get the top 'n' companies from a data frame. Here is my code:

    data("Forbes2000", package = "HSAUR")
    sort(Forbes2000$profits,decreasing=TRUE)

Now I would like to get the top 50 observations from this sorted vector.
Teja
8 votes, 5 answers

Finding top N columns for each row in data frame

Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save them as rows in a new dataframe. For example, consider the following data frame: df =…
Diego
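When the result should be rows rather than extra columns, one pandas approach is to reshape to long format with `melt`, sort by value within each descriptive key, and keep the first N rows per group. The excerpt's frame is cut off, so the data below is hypothetical, as is the helper name `top_n_long`.

```python
import pandas as pd

# Hypothetical example: one descriptive column plus numeric columns.
df = pd.DataFrame({"name": ["a", "b"],
                   "x": [5, 1],
                   "y": [3, 9],
                   "z": [8, 4]})


def top_n_long(frame, id_col, n):
    # One row per (descriptive value, column) pair.
    long = frame.melt(id_vars=id_col, var_name="column", value_name="value")
    # Highest values first within each group, then keep n rows per group.
    return (long.sort_values([id_col, "value"], ascending=[True, False])
                .groupby(id_col)
                .head(n)
                .reset_index(drop=True))


print(top_n_long(df, "name", 2))
```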
8 votes, 4 answers

Oracle Analytic function for min value in grouping

I'm new to working with analytic functions.

    DEPT  EMP    SALARY
    ----  -----  ------
      10  MARY   100000
      10  JOHN   200000
      10  SCOTT  300000
      20  BOB    100000
      20  BETTY  200000
      30  ALAN   100000
      30  TOM    200000
      30  JEFF   300000

I want the department…
Travis Heseman
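The excerpt is cut off, so this sketch assumes the goal is the lowest-paid employee in each department. An analytic `MIN(salary) OVER (PARTITION BY dept)` attaches the department minimum to every row without collapsing the rows, and an outer filter keeps the matches. It runs on SQLite through Python's `sqlite3` with the table from the question; the table name `staff` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are from the question.
conn.execute("CREATE TABLE staff (dept INTEGER, emp TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    (10, "MARY", 100000), (10, "JOHN", 200000), (10, "SCOTT", 300000),
    (20, "BOB", 100000), (20, "BETTY", 200000),
    (30, "ALAN", 100000), (30, "TOM", 200000), (30, "JEFF", 300000),
])

# The analytic MIN keeps every row and adds the department minimum alongside it.
lowest = conn.execute("""
    SELECT dept, emp, salary
    FROM (
        SELECT dept, emp, salary,
               MIN(salary) OVER (PARTITION BY dept) AS dept_min
        FROM staff
    ) t
    WHERE salary = dept_min
    ORDER BY dept
""").fetchall()

print(lowest)
```

If two employees share a department's minimum salary, both are returned.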
7 votes, 1 answer

Is there a way to get the nlargest items per group in dask?

I have the following dataset:

    location  category  percent
    A         5         100.0
    B         3         100.0
    C         2          50.0
              4          13.0
    D         2          75.0
              3          59.0
              4 …
whisperstream