3

I am writing a simple data frame that can read and write CSV and includes a function to sort by column. How can I sort by the correct column by passing in the column header, while excluding the header row from the sort?

This is a sample data of the CSV file:

Name,Age,Salary
Lim,20,2000
Tan,20,3000
Mah,19,2500
Roger,10,4000

I have declared my 2D list; the data looks like this:

List<List<String>> COLUMNDATA = new ArrayList<>();
COLUMNDATA = [[Name, Age, Salary], [Lim, 20, 2000], [Tan, 20, 3000], [Mah, 19, 2500], [Roger, 10, 4000]]

I want to sort by the correct column by passing in the column header, with the header row excluded from the sort, e.g.:

COLUMNDATA.sort("Age")

So that it will become this:

Name,Age,Salary
Roger,10,4000
Mah,19,2500
Lim,20,2000
Tan,20,3000

I have used Comparator and Collections.sort, and I'm stuck now. How can I achieve the function I want?

final Comparator<List<String>> comparator = new Comparator<List<String>>() {
    @Override
    public int compare(List<String> object1, List<String> object2) {
        return object1.get(1).compareTo(object2.get(1));
    }
};

Collections.sort(COLUMNDATA, comparator);
for (List<String> list : COLUMNDATA) {
    System.out.println(list);
}
Vaporeon

5 Answers

2

Here is how to do it as you required. Once the comparator is defined, just sort on the sublist starting with list 1, skipping over the headings. Since it is a view of the original list it still sorts the required items.

First, build a map from each column name to its column index. You can make the lookup case-insensitive if you want; in this example, case matters.

static Map<String, Integer> sortingFields = new HashMap<>();
static {
    List<String> columns = List.of("Name", "Age", "Salary");
    for (int i = 0; i < columns.size(); i++) {
        sortingFields.put(columns.get(i), i);
    }
}
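If you do want the case-insensitive lookup mentioned above, one possible sketch is a TreeMap ordered by String.CASE_INSENSITIVE_ORDER. The class name ColumnIndex and the method build are made up for illustration; they are not part of the code in this answer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class ColumnIndex {

    // Builds a header-name -> column-index map whose keys are compared ignoring case.
    static Map<String, Integer> build(List<String> header) {
        Map<String, Integer> index = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i < header.size(); i++) {
            index.put(header.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = build(List.of("Name", "Age", "Salary"));
        System.out.println(index.get("age"));    // 1
        System.out.println(index.get("SALARY")); // 2
        System.out.println(index.get("Height")); // null, because there is no such column
    }
}
```

A lookup for an unknown column returns null, so the caller should check for that before using the index.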

Create the list of lists.

List<List<String>> data = new ArrayList<>();
data.add(new ArrayList<>(List.of("Name" ,"Age", "Salary")));
data.add(new ArrayList<>(List.of("Lim", "20", "4000")));
data.add(new ArrayList<>(List.of("Tan",   "20", "3000")));
data.add(new ArrayList<>(List.of("Mah",   "19", "2500")));
data.add(new ArrayList<>(List.of("Roger", "10", "3500")));

Now invoke the sort and print the result.

sort("Age", data);
data.forEach(System.out::println);

Prints

[Name, Age, Salary]
[Roger, 10, 3500]
[Mah, 19, 2500]
[Lim, 20, 4000]
[Tan, 20, 3000]

Here is the sort method.

public static void sort(String column, List<List<String>> data) {
    // Use the column name to look up the index of the column to sort on.
    int index = sortingFields.get(column);
    Comparator<List<String>> comp =
            (a, b) -> a.get(index).compareTo(b.get(index));

    // Sort a view that skips the header row; the changes write through to data.
    data.subList(1, data.size()).sort(comp);
}


And here is how I would recommend you organize your data and do the sorting.

First create a class as shown. Then populate the list with instances of the class using the data. Then simply specify the getter to sort on. You can add as many additional fields and their getters as required.

The reason is that a class allows mixed types to be stored in the same object and still be sorted. If you sort on a number stored as a String, it will sort lexically rather than numerically. This will be a problem unless you convert to an integer (to see this, change 4000 to 400 and sort on salary above). But if you want to sort on the name, you would need a different comparator, since converting a non-numeric String to an int will throw an exception. This could all be mitigated to some degree, but it isn't as straightforward as creating a class.
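To make the lexical versus numeric difference concrete, here is a minimal sketch that sorts the same salary strings both ways. The class and method names (LexicalVsNumeric, sortedAsText, sortedAsNumbers) are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class LexicalVsNumeric {

    // Sorts the values as plain text, character by character.
    static List<String> sortedAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts the values by their parsed integer value.
    // Assumes every value is a valid int; otherwise parseInt throws NumberFormatException.
    static List<String> sortedAsNumbers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> salaries = List.of("400", "3000", "2500", "3500");
        System.out.println(sortedAsText(salaries));    // [2500, 3000, 3500, 400]
        System.out.println(sortedAsNumbers(salaries)); // [400, 2500, 3000, 3500]
    }
}
```

As text, "400" lands last because the character '4' compares greater than '2' and '3'.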

By simply changing the method reference to the desired getter you can sort the List on any field. If no getter is present, and the field is public (not recommended) you can use a lambda.

public class SortingByColumn {
    
    public static void main(String[] args) {
        
        List<Person> data = new ArrayList<>();
        data.add(new Person("Lim", 20, 2000));
        data.add(new Person("Tan", 20, 3000));
        data.add(new Person("Mah", 19, 2500));
        data.add(new Person("Roger", 10, 4000));
        
        List<Person> sorted = data.stream()
                .sorted(Comparator.comparing(Person::getAge))
                .collect(Collectors.toList());
        System.out.printf("%10s  %10s  %10s%n", "Name","Age","Salary");
        sorted.forEach(System.out::println);
    }
    
    static class Person {
        private String name;
        private int age;
        private int salary;
        
        public Person(String name, int age, int salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }
        
        public String getName() {
            return name;
        }
        
        public int getAge() {
            return age;
        }
        
        public int getSalary() {
            return salary;
        }
        
        @Override
        public String toString() {
            return String.format("%10s  %10s  %10s", name, age,
                    salary);
        }
    }
}

Prints

      Name         Age      Salary
     Roger          10        4000
       Mah          19        2500
       Lim          20        2000
       Tan          20        3000
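If you still want to pick the sort field from a column header string, as in the question, one option is a map from header name to comparator. This is only a sketch: the names PersonSortDemo, BY_COLUMN, sample and namesSortedBy are made up here, and the Person class is a trimmed copy of the one above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class PersonSortDemo {

    static class Person {
        private final String name;
        private final int age;
        private final int salary;

        Person(String name, int age, int salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        String getName() { return name; }
        int getAge() { return age; }
        int getSalary() { return salary; }
    }

    // Maps each column header to the comparator for that field.
    static final Map<String, Comparator<Person>> BY_COLUMN = new HashMap<>();
    static {
        BY_COLUMN.put("Name", Comparator.comparing(Person::getName));
        BY_COLUMN.put("Age", Comparator.comparingInt(Person::getAge));
        BY_COLUMN.put("Salary", Comparator.comparingInt(Person::getSalary));
    }

    // Same sample rows as in the question.
    static List<Person> sample() {
        List<Person> data = new ArrayList<>();
        data.add(new Person("Lim", 20, 2000));
        data.add(new Person("Tan", 20, 3000));
        data.add(new Person("Mah", 19, 2500));
        data.add(new Person("Roger", 10, 4000));
        return data;
    }

    // Returns the names in the order produced by sorting on the given column.
    static List<String> namesSortedBy(String column, List<Person> people) {
        Comparator<Person> comparator = BY_COLUMN.get(column);
        if (comparator == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return people.stream()
                .sorted(comparator)
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesSortedBy("Age", sample()));    // [Roger, Mah, Lim, Tan]
        System.out.println(namesSortedBy("Salary", sample())); // [Lim, Mah, Tan, Roger]
    }
}
```

Because each field keeps its own type, Age and Salary compare numerically and Name compares as text, with no parsing at sort time.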
WJS
0

You've done everything right (besides the variable name which shouldn't be all uppercase).

Before sorting just delete the first element. Then sort, and add the header back to the list:

List<String> header = columnData.get(0);
columnData.remove(0);
columnData.sort(getComparator("Age", header));
columnData.add(0, header);

How to pass column number to the comparator:

private Comparator<List<String>> getComparator(String column,
                                               List<String> header) {
    int index = header.indexOf(column);
    return new Comparator<List<String>>() {
        @Override
        public int compare(List<String> object1, List<String> object2) {
            return object1.get(index).compareTo(object2.get(index));
        }
    };
}
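One thing to watch for: header.indexOf(column) returns -1 when the column name is not in the header, and get(-1) then fails inside the comparator. A possible variant, sketched below, checks the index up front and sorts a subList view so the header never has to be removed and re-added. The names HeaderAwareSort and sortByColumn are invented for this sketch, and it assumes the outer list allows its elements to be replaced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class HeaderAwareSort {

    // Sorts every row except the first (the header) by the named column, in place.
    // Values are compared as strings, exactly like the comparator above.
    static void sortByColumn(List<List<String>> table, String column) {
        if (table.isEmpty()) {
            return; // nothing to sort, not even a header
        }
        int index = table.get(0).indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        // subList is a view: sorting it reorders the rows of the original list.
        table.subList(1, table.size())
                .sort(Comparator.comparing(row -> row.get(index)));
    }

    public static void main(String[] args) {
        List<List<String>> columnData = new ArrayList<>(List.of(
                List.of("Name", "Age", "Salary"),
                List.of("Lim", "20", "2000"),
                List.of("Tan", "20", "3000"),
                List.of("Mah", "19", "2500"),
                List.of("Roger", "10", "4000")));
        sortByColumn(columnData, "Age");
        columnData.forEach(System.out::println);
    }
}
```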
TimonNetherlands
0

You can make a part of this list sorted starting from the second row, and then collect a new list from it as follows:

public static void main(String[] args) {
    List<List<String>> columnData = List.of(
            List.of("Name", "Age", "Salary"),
            List.of("Lim", "20", "2000"),
            List.of("Tan", "20", "3000"),
            List.of("Mah", "19", "2500"),
            List.of("Roger", "10", "4000"));

    List<List<String>> sortedData1 = sortByColumn(columnData, "Age");
    List<List<String>> sortedData2 = sortByColumn(columnData, 2);
}
public static List<List<String>> sortByColumn(List<List<String>> list,
                                              String name) {
    // finding index of column by name
    int index = IntStream.range(0, list.get(0).size())
            .filter(i -> list.get(0).get(i).equals(name))
            .findFirst()
            .getAsInt();
    // sorting by index
    return sortByColumn(list, index);
}
public static List<List<String>> sortByColumn(List<List<String>> list,
                                              int index) {
    // preparing a new sorted list
    List<List<String>> sorted = new ArrayList<>(list.size());
    // header row
    sorted.add(list.get(0));
    // other rows, sorting by a specific column
    sorted.addAll(list.stream().skip(1)
            .sorted(Comparator.comparing(row -> row.get(index)))
            .collect(Collectors.toList()));
    return sorted;
}
sortedData1:

[Name, Age, Salary]
[Roger, 10, 4000]
[Mah, 19, 2500]
[Lim, 20, 2000]
[Tan, 20, 3000]

sortedData2:

[Name, Age, Salary]
[Lim, 20, 2000]
[Mah, 19, 2500]
[Tan, 20, 3000]
[Roger, 10, 4000]

In this case, it is more useful to have a 2D-array rather than a 2D-list, so that you can sort a specific range from index to index using the Arrays.sort(T[],int,int,Comparator) method:

List<List<String>> columnData = List.of(
        List.of("Name", "Age", "Salary"),
        List.of("Lim", "20", "2000"),
        List.of("Tan", "20", "3000"),
        List.of("Mah", "19", "2500"),
        List.of("Roger", "10", "4000"));

String[][] arr = columnData.stream()
        .map(list -> list.toArray(String[]::new))
        .toArray(String[][]::new);

Arrays.sort(arr, 1, arr.length, Comparator.comparing(row -> row[1]));

Original list:

[Name, Age, Salary]
[Lim, 20, 2000]
[Tan, 20, 3000]
[Mah, 19, 2500]
[Roger, 10, 4000]

Sorted array:

[Name, Age, Salary]
[Roger, 10, 4000]
[Mah, 19, 2500]
[Lim, 20, 2000]
[Tan, 20, 3000]
0

I suggest not using a List of Lists here; I think a class with meaningfully named fields is much clearer. In this class you can define the required comparators.

public class Foo {
    public static void main(String... args) throws IOException {
        List<DataLine> data =
                readFile(Path.of("e:/data.csv"), StandardCharsets.UTF_8);
        List<DataLine> sortedByName = DataLine.Field.NAME.sort(data);
        List<DataLine> sortedByAge = DataLine.Field.AGE.sort(data);
        List<DataLine> sortedBySalary = DataLine.Field.SALARY.sort(data);
    }

    public static List<DataLine> readFile(Path path, Charset charset)
            throws IOException {
        try (Scanner scan = new Scanner(path, charset)) {
            scan.useDelimiter("[,\n]");
            scan.nextLine();    // skip header

            List<DataLine> data = new ArrayList<>();

            while (scan.hasNext()) {
                String name = scan.next();
                int age = scan.nextInt();
                int salary = scan.nextInt();
                data.add(new DataLine(name, age, salary));
            }

            return data;
        }
    }

    public static final class DataLine {

        enum Field {
            NAME(Comparator.comparing(one -> one.name)),
            AGE(Comparator.comparingInt(one -> one.age)),
            SALARY(Comparator.comparingInt(one -> one.salary));

            private final Comparator<DataLine> comparator;

            Field(Comparator<DataLine> comparator) {
                this.comparator = comparator;
            }

            public final List<DataLine> sort(List<DataLine> data) {
                return data.stream()
                        .sorted(comparator)
                        .collect(Collectors.toList());
            }
        }

        private final String name;
        private final int age;
        private final int salary;

        public DataLine(String name, int age, int salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }
    }
}
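To go from a column header string such as "Age" to one of these enum constants, one possible approach is Enum.valueOf on the upper-cased header. The sketch below uses a trimmed copy of the class above; the names FieldLookupDemo, fromHeader, sample and namesSortedBy are made up for illustration, and it assumes each header matches a constant name apart from case.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class FieldLookupDemo {

    static final class DataLine {
        final String name;
        final int age;
        final int salary;

        DataLine(String name, int age, int salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }
    }

    enum Field {
        NAME(Comparator.comparing((DataLine d) -> d.name)),
        AGE(Comparator.comparingInt((DataLine d) -> d.age)),
        SALARY(Comparator.comparingInt((DataLine d) -> d.salary));

        final Comparator<DataLine> comparator;

        Field(Comparator<DataLine> comparator) {
            this.comparator = comparator;
        }

        // Maps a CSV header such as "Age" to the constant AGE.
        // valueOf throws IllegalArgumentException if no constant matches.
        static Field fromHeader(String header) {
            return valueOf(header.trim().toUpperCase(Locale.ROOT));
        }
    }

    // Same sample rows as in the question.
    static List<DataLine> sample() {
        return List.of(
                new DataLine("Lim", 20, 2000),
                new DataLine("Tan", 20, 3000),
                new DataLine("Mah", 19, 2500),
                new DataLine("Roger", 10, 4000));
    }

    // Returns the names in the order produced by sorting on the given header.
    static List<String> namesSortedBy(String header, List<DataLine> data) {
        return data.stream()
                .sorted(Field.fromHeader(header).comparator)
                .map(d -> d.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesSortedBy("Age", sample()));    // [Roger, Mah, Lim, Tan]
        System.out.println(namesSortedBy("Salary", sample())); // [Lim, Mah, Tan, Roger]
    }
}
```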
oleg.cherednik
0

You can use the List.subList(int,int) method to get a view of the portion of the list between the specified indices (the view is backed by the original list), and then sort that view with the Collections.sort(List,Comparator) method. This code should work on Java 7:

List<List<String>> columnData = Arrays.asList(
        Arrays.asList("Name", "Age", "Salary"),
        Arrays.asList("Lim", "20", "2000"),
        Arrays.asList("Tan", "20", "3000"),
        Arrays.asList("Mah", "19", "2500"),
        Arrays.asList("Roger", "10", "4000"));
Collections.sort(columnData.subList(1, columnData.size()),
        new Comparator<List<String>>() {
            @Override
            public int compare(List<String> o1, List<String> o2) {
                return o1.get(1).compareTo(o2.get(1));
            }
        });

Before sorting:

[Name, Age, Salary]
[Lim, 20, 2000]
[Tan, 20, 3000]
[Mah, 19, 2500]
[Roger, 10, 4000]

After sorting:

[Name, Age, Salary]
[Roger, 10, 4000]
[Mah, 19, 2500]
[Lim, 20, 2000]
[Tan, 20, 3000]

See also:
Sort List<Map<String,Object>> based on value
How do I rotate a matrix 90 degrees counterclockwise in java?