I wrote an application which gets all emails from an inbox, filters the emails which contain a specific string and then puts those emails in an ArrayList.
After the emails are put in the List, I am doing some stuff with the subject and content of said emails. This works all fine for e-mails without an attachment. But when I started to use e-mails with attachments it all didn't work as expected anymore.
This is my code:
public void getInhoud(Message msg) throws IOException {
try {
cont = msg.getContent();
} catch (MessagingException ex) {
Logger.getLogger(ReadMailNew.class.getName()).log(Level.SEVERE, null, ex);
}
if (cont instanceof String) {
String body = (String) cont;
} else if (cont instanceof Multipart) {
try {
Multipart mp = (Multipart) msg.getContent();
int mp_count = mp.getCount();
for (int b = 0; b < 1; b++) {
dumpPart(mp.getBodyPart(b));
}
} catch (Exception ex) {
System.out.println("Exception arise at get Content");
ex.printStackTrace();
}
}
}
public void dumpPart(Part p) throws Exception {
email = null;
String contentType = p.getContentType();
System.out.println("dumpPart" + contentType);
InputStream is = p.getInputStream();
if (!(is instanceof BufferedInputStream)) {
is = new BufferedInputStream(is);
}
int c;
final StringWriter sw = new StringWriter();
while ((c = is.read()) != -1) {
sw.write(c);
}
if (!sw.toString().contains("<div>")) {
mpMessage = sw.toString();
getReferentie(mpMessage);
}
}
The content from the e-mail is stored in a String.
This code works all fine when I try to read mails without attachment. But if I use an e-mail with attachment the String also contains HTML code and even the attachment coding. Eventually I want to store the attachment and the content of an e-mail, but my first priority is to get just the text without any HTML or attachment coding.
Now I tried an different approach to handle the different parts:
public void getInhoud(Message msg) throws IOException {
try {
Object contt = msg.getContent();
if (contt instanceof Multipart) {
System.out.println("Met attachment");
handleMultipart((Multipart) contt);
} else {
handlePart(msg);
System.out.println("Zonder attachment");
}
} catch (MessagingException ex) {
ex.printStackTrace();
}
}
public static void handleMultipart(Multipart multipart)
throws MessagingException, IOException {
for (int i = 0, n = multipart.getCount(); i < n; i++) {
handlePart(multipart.getBodyPart(i));
System.out.println("Count "+n);
}
}
public static void handlePart(Part part)
throws MessagingException, IOException {
String disposition = part.getDisposition();
String contentType = part.getContentType();
if (disposition == null) { // When just body
System.out.println("Null: " + contentType);
// Check if plain
if ((contentType.length() >= 10)
&& (contentType.toLowerCase().substring(
0, 10).equals("text/plain"))) {
part.writeTo(System.out);
} else if ((contentType.length() >= 9)
&& (contentType.toLowerCase().substring(
0, 9).equals("text/html"))) {
part.writeTo(System.out);
} else if ((contentType.length() >= 9)
&& (contentType.toLowerCase().substring(
0, 9).equals("text/html"))) {
System.out.println("Ook html gevonden");
part.writeTo(System.out);
}else{
System.out.println("Other body: " + contentType);
part.writeTo(System.out);
}
} else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
System.out.println("Attachment: " + part.getFileName()
+ " : " + contentType);
} else if (disposition.equalsIgnoreCase(Part.INLINE)) {
System.out.println("Inline: "
+ part.getFileName()
+ " : " + contentType);
} else {
System.out.println("Other: " + disposition);
}
}
This is what is returned from the System.out.printlns
Null: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Other body: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Content-Type: multipart/alternative; boundary="047d7b6220720b499504ce3786d7"
--047d7b6220720b499504ce3786d7
Content-Type: text/plain; charset="ISO-8859-1"
'Text of the message here in normal text'
--047d7b6220720b499504ce3786d7
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
'HTML code of the message'
This approach returns the normal text of the e-mail but also the HTML coding of the mail. I really don't understand why this happens, I've googled it but it seems like there is no one else with this problem.
Any help is appreciated,
Thanks!