I want to implement a program on a Dataset consisting of few columns like the following:
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12345 |Pen |Cello |Ball Pen Soft Nib... |
|12346 |Pencil |Nataraja |Pencil HB Extra D... |
|42345 |Ruler |Nataraja |Scale No.1103 15c... |
|12677 |Sharpener |Nataraja |Pencil Shraperner... |
|12987 |Pen |Reynolds |Dot Pen Extra Gr... |
|44326 |Pen |Reynolds |Gel Pen German T... |
|13456 |Pen |Cello |Dot Pen 0.5mm Nib... |
|19876 |Eraser |Cello |Dust free Eraser ... |
|43246 |Ink Pen |Hero |Ink Pen Smooth Ha... |
+-----------+---------------+-------------------+-----------------------+
and I want to group the Dataset based on the Manufacturer_Name
like shown below
Manufacturer = Cello
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12345 |Pen |Cello |Ball Pen Soft Nib... |
|13456 |Pen |Cello |Dot Pen 0.5mm Nib... |
|19876 |Eraser |Cello |Dust free Eraser ... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Nataraja
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12346 |Pencil |Nataraja |Pencil HB Extra D... |
|42345 |Ruler |Nataraja |Scale No.1103 15c... |
|12677 |Sharpener |Nataraja |Pencil Shraperner... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Reynolds
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12987 |Pen |Reynolds |Dot Pen Extra Gr... |
|44326 |Pen |Reynolds |Gel Pen German T... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Hero
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|43246 |Ink Pen |Hero |Ink Pen Smooth Ha... |
+-----------+---------------+-------------------+-----------------------+
I tried using the following code which is not yielding good results. Help me improve this program. Here is the code I have used:
Dataset<Row> countsBy = src.select("Manufacturer_Name").distinct();
List<Row> lsts = countsBy.collectAsList();
for (Row lst : lsts) {
String man = lst.toString();
System.out.println("Records of " + man + " only");
Dataset<Row> mandataset = src.filter("Manufacturer_Name='" + man + "'");
mandataset.show();
}