I want to write a program that works on a Dataset with a few columns, like the following:

+-----------+---------------+-------------------+-----------------------+
|Item_ID    |Product_Name   |Manufacturer_Name  |Product_Description    |
+-----------+---------------+-------------------+-----------------------+
|12345      |Pen            |Cello              |Ball Pen Soft Nib...   |
|12346      |Pencil         |Nataraja           |Pencil HB Extra D...   |
|42345      |Ruler          |Nataraja           |Scale No.1103 15c...   |
|12677      |Sharpener      |Nataraja           |Pencil Shraperner...   |
|12987      |Pen            |Reynolds           |Dot Pen Extra Gr...    |
|44326      |Pen            |Reynolds           |Gel Pen German T...    |
|13456      |Pen            |Cello              |Dot Pen 0.5mm Nib...   |
|19876      |Eraser         |Cello              |Dust free Eraser ...   |
|43246      |Ink Pen        |Hero               |Ink Pen Smooth Ha...   |
+-----------+---------------+-------------------+-----------------------+

I want to group the Dataset by Manufacturer_Name, as shown below:

Manufacturer = Cello
+-----------+---------------+-------------------+-----------------------+
|Item_ID    |Product_Name   |Manufacturer_Name  |Product_Description    |
+-----------+---------------+-------------------+-----------------------+
|12345      |Pen            |Cello              |Ball Pen Soft Nib...   |
|13456      |Pen            |Cello              |Dot Pen 0.5mm Nib...   |
|19876      |Eraser         |Cello              |Dust free Eraser ...   |
+-----------+---------------+-------------------+-----------------------+

Manufacturer = Nataraja
+-----------+---------------+-------------------+-----------------------+
|Item_ID    |Product_Name   |Manufacturer_Name  |Product_Description    |
+-----------+---------------+-------------------+-----------------------+
|12346      |Pencil         |Nataraja           |Pencil HB Extra D...   |
|42345      |Ruler          |Nataraja           |Scale No.1103 15c...   |
|12677      |Sharpener      |Nataraja           |Pencil Shraperner...   |
+-----------+---------------+-------------------+-----------------------+

Manufacturer = Reynolds
+-----------+---------------+-------------------+-----------------------+
|Item_ID    |Product_Name   |Manufacturer_Name  |Product_Description    |
+-----------+---------------+-------------------+-----------------------+
|12987      |Pen            |Reynolds           |Dot Pen Extra Gr...    |
|44326      |Pen            |Reynolds           |Gel Pen German T...    |
+-----------+---------------+-------------------+-----------------------+

Manufacturer = Hero
+-----------+---------------+-------------------+-----------------------+
|Item_ID    |Product_Name   |Manufacturer_Name  |Product_Description    |
+-----------+---------------+-------------------+-----------------------+
|43246      |Ink Pen        |Hero               |Ink Pen Smooth Ha...   |
+-----------+---------------+-------------------+-----------------------+

I tried the following code, but it does not give good results. Please help me improve this program. Here is the code I used:

Dataset<Row> countsBy = src.select("Manufacturer_Name").distinct();
List<Row> lsts = countsBy.collectAsList();
for (Row lst : lsts) {
    String man = lst.toString();
    System.out.println("Records of " + man + " only");
    Dataset<Row> mandataset = src.filter("Manufacturer_Name='" + man + "'");
    mandataset.show();
}
khelwood
Abhishek Vk
  • Can you be more specific about the bad results? Was it slowness or mistakes? – Augustin Bocken Mar 21 '17 at 15:37
  • I want the subsets of the dataset to be usable outside the loop. Because the variable is declared locally and overwritten on every iteration, I can only use the subset generated during the last iteration, not all of them. @AugustinBocken – Abhishek Vk Mar 21 '17 at 18:12

1 Answer

Maybe you could build a map whose key is a String (the Manufacturer_Name). On each iteration, you read the Manufacturer_Name, check whether it is already in the map (creating an entry if needed), and finally add your row to the matching entry.

You'll have something like this:

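// Needs java.util.ArrayList, java.util.HashMap and java.util.Map.
// ShopItem is your own class holding one row of the table.
Map<String, ArrayList<ShopItem>> dic = new HashMap<String, ArrayList<ShopItem>>();
for (/*...*/)
{
  String Manufacturer_Name = /* you get the name */;
  // Create the list the first time this manufacturer is seen.
  if (!dic.containsKey(Manufacturer_Name))
  {
    dic.put(Manufacturer_Name, new ArrayList<ShopItem>());
  }
  dic.get(Manufacturer_Name).add(/* what you want to add */);
}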

You then need a second loop, but only for printing the data.
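To make the idea concrete, here is a minimal, self-contained sketch in plain Java. It is only an illustration under some assumptions: `ShopItem`, `GroupByManufacturer`, `group` and `sample` are hypothetical names, the sample rows are copied from the table in the question, and the description column is left out because it is truncated there. It uses `computeIfAbsent` as a shorter way to write the "create the list if needed" check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByManufacturer {

    // Hypothetical value type standing in for one row of the table.
    static final class ShopItem {
        final int itemId;
        final String productName;
        final String manufacturerName;

        ShopItem(int itemId, String productName, String manufacturerName) {
            this.itemId = itemId;
            this.productName = productName;
            this.manufacturerName = manufacturerName;
        }
    }

    // First loop: put every item into the list of its manufacturer.
    // LinkedHashMap keeps the manufacturers in first-seen order.
    static Map<String, List<ShopItem>> group(List<ShopItem> items) {
        Map<String, List<ShopItem>> dic = new LinkedHashMap<>();
        for (ShopItem item : items) {
            // Creates the list the first time a manufacturer is seen.
            dic.computeIfAbsent(item.manufacturerName, k -> new ArrayList<>()).add(item);
        }
        return dic;
    }

    // Sample rows taken from the table in the question.
    static List<ShopItem> sample() {
        return Arrays.asList(
            new ShopItem(12345, "Pen", "Cello"),
            new ShopItem(12346, "Pencil", "Nataraja"),
            new ShopItem(42345, "Ruler", "Nataraja"),
            new ShopItem(12677, "Sharpener", "Nataraja"),
            new ShopItem(12987, "Pen", "Reynolds"),
            new ShopItem(44326, "Pen", "Reynolds"),
            new ShopItem(13456, "Pen", "Cello"),
            new ShopItem(19876, "Eraser", "Cello"),
            new ShopItem(43246, "Ink Pen", "Hero"));
    }

    public static void main(String[] args) {
        Map<String, List<ShopItem>> dic = group(sample());
        // Second loop: only prints; every group stays available in dic.
        for (Map.Entry<String, List<ShopItem>> entry : dic.entrySet()) {
            System.out.println("Manufacturer = " + entry.getKey());
            for (ShopItem item : entry.getValue()) {
                System.out.println("  " + item.itemId + " " + item.productName);
            }
        }
    }
}
```

Because the map is built once and kept, each manufacturer's subset is still reachable after the loop with `dic.get("Cello")` and so on, instead of being overwritten on every iteration.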

I hope this solves your problem!

EDIT: replaced Dictionary with Map (sorry) and added a link

How do you create a dictionary in Java?

EDIT: changed the code to match the new idea

Augustin Bocken
  • Just to clarify a few things before implementation: can a Dictionary be instantiated, and are the methods add and AddValue available in Java? – Abhishek Vk Mar 23 '17 at 10:41
  • Maybe only in a library... You're right, I need to check more, or you could implement your own dictionary! It's a list of Entry objects, where each entry has a Key object and a Value object; that should be enough for your use... But I'll have a look and edit my answer – Augustin Bocken Mar 23 '17 at 10:45
  • Here, I corrected the answer so it's more correct :D – Augustin Bocken Mar 23 '17 at 10:50
  • `Map<String, Dataset<Row>> dic = new HashMap<String, Dataset<Row>>(); for(Row row : srcrows) { String Manufacturer_Name = row.getString(3); if(!dic.values().equals(Manufacturer_Name)) { dic.put(Manufacturer_Name,new Dataset()); } dic.get(Manufacturer_Name).Add(row); }` Is this correct? – Abhishek Vk Mar 23 '17 at 11:32
  • **Error one: `dic.put(Manufacturer_Name,new Dataset());` It says "The constructor Dataset() is undefined". Error two: `dic.get(Manufacturer_Name).Add(row);` It says "The method Add(Row) is undefined for the type Dataset".** I am getting a few errors. Would you please help me solve them? – Abhishek Vk Mar 23 '17 at 11:38
  • Maybe use your own object: an `Item` that contains 3 properties (*Item_ID*, *Product_Name* and *Product_Description*). Then you'll have a `HashMap<String, ArrayList<Item>>`, and you'll be able to access all the Items from a manufacturer with `map.get(Manufacturer_Name)` and add new ones with `map.get(Manufacturer_Name).add(itemToAdd)` – Augustin Bocken Mar 23 '17 at 12:23
  • Thank you @AugustinBocken. I will implement your ideas and get back to you. If I run into any difficulty, I will ask for your help. – Abhishek Vk Mar 23 '17 at 17:32
  • So, how did your implementation go? Do you need any more help? – Augustin Bocken Mar 28 '17 at 13:15
  • Thanks @Augustin, it helped me a lot. – Abhishek Vk Apr 04 '17 at 16:01