
I'm trying to build a MapReduce job.

It runs to completion but presents weird data at the end.

When I try to debug it using System.out.println("debug data"), nothing shows on screen.

Using the Java logging API to write to an external log file, printing to the screen with log.severe("log data"), or calling the log4j logger method log.info("log data") doesn't work either.

Nothing works. The only time I see my debug messages is when there is an exception in the MapReduce job.

How can this be fixed so I can see my debug messages, either in a file or on the screen?

Jim Garrison
Gabriel H
  • Using println is not debugging. That's tracing. – maba Oct 02 '12 at 18:23
  • Maybe you don't have logging turned up enough (http://stackoverflow.com/questions/4821134/hadoop-enable-logging)? Even System.out.println() can be redirected – Jason Sperske Oct 02 '12 at 18:24
  • Does this answer your question? [Where does hadoop mapreduce framework send my System.out.print() statements ? (stdout)](https://stackoverflow.com/questions/3207238/where-does-hadoop-mapreduce-framework-send-my-system-out-print-statements-s) – Vega Jul 04 '20 at 09:50
  • [This answer](https://stackoverflow.com/a/5785472/16959) may help. Hadoop captures System.out for its own job tracking logging system. – Jason Sperske Oct 02 '12 at 18:27
  • Thanks, that seems to be adequate. Hopefully now I'll manage to figure the problem out. – Gabriel H Oct 02 '12 at 18:39

2 Answers

Since you are processing big data, the volume of your tracing messages can be huge, and that can become a problem in itself. It's useful to consider alternatives to System.out.println-style logging, such as Counters and MultipleOutputs.

The best thing about Counters and MultipleOutputs is that you can access them programmatically. In the case of MultipleOutputs, you can even run a map/reduce job over the output to extract statistics from the logs.
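To make the Counters idea concrete, here is a minimal sketch of the counting pattern in plain Java, with no Hadoop dependency, so it can be run anywhere. The class `RecordCounters`, the `Debug` enum, and the tab-separated record format are all assumptions made up for illustration; they are not from the question. In a real job the `increment` calls would go to the framework's counters, as noted in the comments.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's counters, used only to illustrate the pattern.
class RecordCounters {

    // Counter names. In a real Mapper an enum like this can be passed to
    // context.getCounter(Debug.EMPTY).increment(1) instead of the local map below.
    enum Debug { TOTAL, EMPTY, MALFORMED }

    private final Map<Debug, Long> counts = new EnumMap<>(Debug.class);

    void increment(Debug counter) {
        counts.merge(counter, 1L, Long::sum);
    }

    long get(Debug counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // The per-record check a map() method might run instead of calling println.
    // Assumes (for illustration only) that a valid record is "key<TAB>value".
    void inspect(String line) {
        increment(Debug.TOTAL);
        if (line == null || line.trim().isEmpty()) {
            increment(Debug.EMPTY);
            return;
        }
        if (line.split("\t").length != 2) {
            increment(Debug.MALFORMED);
        }
    }

    public static void main(String[] args) {
        RecordCounters counters = new RecordCounters();
        for (String line : new String[] {"a\t1", "", "broken", "b\t2"}) {
            counters.inspect(line);
        }
        for (Debug counter : Debug.values()) {
            System.out.println(counter + " = " + counters.get(counter));
        }
    }
}
```

The point of the pattern is that a handful of aggregated numbers replaces one trace line per record, which stays readable even on very large inputs.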

Another alternative to debugging in a production environment is unit testing: MiniMRCluster will help you test your map-reduce jobs during unit testing.
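A lighter-weight variation on the same idea (this is not MiniMRCluster, just plain Java) is to pull the per-record logic out of the mapper into a static method and check it directly. The sketch below assumes a word-count style job, which is hypothetical; `WordCountLogic` and `mapLine` are made-up names for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical example: the logic of a word-count style map() step,
// kept free of Hadoop types so it can be checked without a cluster.
class WordCountLogic {

    // Returns the (word, 1) pairs that a map() call would emit for one input line.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Quick manual check; a test framework would assert on the list instead.
        System.out.println(mapLine("Hello hadoop hello"));
    }
}
```

The mapper itself then only forwards each pair to the framework, so most bugs that produce "weird data" can be reproduced with a single input line on a local machine.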

Nick ODell
rystsov

I develop my map/reduce code in Eclipse, using Maven to build the runtime jar and manage dependencies. Once I have Hadoop installed and running on my machine to support HDFS, I can run and debug my code in Eclipse. That means using breakpoints and everything else in the Eclipse debug perspective.

Chris Gerken