I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a dataframe that I can print like this:

dfLemma.select("text", "lemma").show(20,false)

and the output looks like:

+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|text                                                                                                                       |lemma                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|RT @Dope_Promo: When you and your crew beat your high scores on FUGLY FROG  https://time.com/Sxp3Onz1w8                    |[rt, @dope_promo, :, when, you, and, you, crew, beat, you, high, score, on, FUGLY, FROG, https://time.com/sxp3onz1w8]                                                      |
|RT @axolROSE: Did yall just call Kermit the frog a lizard?  https://time.com/wDAEAEr1Ay                                        |[rt, @axolrose, :, do, yall, just, call, Kermit, the, frog, a, lizard, ?, https://time.com/wdaeaer1ay]                                                                     |

I am trying to make the output nicer in Zeppelin, by:

val printcols= dfLemma.select("text", "lemma")
println("%table " + printcols)

which gives this output:

printcols: org.apache.spark.sql.DataFrame = [text: string, lemma: array<string>]

and a new blank Zeppelin paragraph headed

[text: string, lemma: array]

Is there a way of getting the dataframe to show as a nicely formatted table? TIA!

schoon

2 Answers

In Zeppelin you can use z.show(df) to show a pretty table. Here's an example:

val df = Seq(
  (1,1,1), (2,2,2), (3,3,3)
).toDF("first_column", "second_column", "third_column")

z.show(df)
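
For comparison, the %table approach from the question fails because it concatenates the DataFrame's toString (its schema) rather than its rows. Zeppelin's %table display expects tab-separated columns and newline-separated rows, with the first row as the header. A minimal sketch of building that text by hand is below; toZeppelinTable is a hypothetical helper name, not part of Zeppelin, and z.show(df) is still the simpler route.

```scala
// Hypothetical helper (not a Zeppelin API): renders a header and rows in the
// tab-separated, newline-delimited text that Zeppelin's %table display reads.
def toZeppelinTable(header: Seq[String], rows: Seq[Seq[Any]]): String = {
  // Tabs and newlines act as delimiters in %table output,
  // so replace any that appear inside a cell with a space.
  def clean(cell: Any): String = s"$cell".replaceAll("[\\t\\n\\r]", " ")
  val lines = header.map(clean).mkString("\t") +: rows.map(_.map(clean).mkString("\t"))
  "%table " + lines.mkString("\n")
}

// Plain Scala values stand in for DataFrame rows here.
val table = toZeppelinTable(
  Seq("first_column", "second_column", "third_column"),
  Seq(Seq(1, 1, 1), Seq(2, 2, 2), Seq(3, 3, 3))
)
println(table)
```

With a real DataFrame, something like println(toZeppelinTable(df.columns, df.take(20).map(_.toSeq))) should work, assuming the rows fit comfortably on the driver. Array columns such as lemma would print with their default toString unless they are joined into a string first.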

Daniel de Paula
  • Nice. Being unaware of this, I had written my own pretty print function (leveraging ``%table``) for pyspark. I cannot find this anywhere in the documentation, however... – akoeltringer Jul 06 '17 at 11:43
  • @TwUxTLi51Nus It's true the docs are not very good for this part. You can find some info about the ZeppelinContext [here](https://zeppelin.apache.org/docs/latest/interpreter/spark.html#zeppelincontext) and in the code ([here](https://github.com/apache/zeppelin/blob/branch-0.7/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java)) you can see all available functions. Also, in the notebook you can check using ctrl+space on the z variable. – Daniel de Paula Jul 06 '17 at 11:55
  • ctrl + space does not work for me, however (in python) ``dir(z)`` does. – akoeltringer Jul 06 '17 at 12:11
  • @TwUxTLi51Nus Nice. About the hotkey, maybe "ctrl + ." works? – Daniel de Paula Jul 06 '17 at 12:14
  • Works great, thanks! Can the number of rows be limited? – schoon Jul 06 '17 at 13:01
  • @schoon you're welcome! You can limit the number of rows with a second parameter: `z.show(df, 10)` – Daniel de Paula Jul 06 '17 at 13:02
  • I always find a more complex way. I did this: z.show(dfLemma.select("racist", "lemma").limit(20)). Will try yours. – schoon Jul 06 '17 at 13:36
  • Asked a bit more [here](https://stackoverflow.com/questions/45026919/how-can-i-pretty-print-a-wrappedarray-in-zeppelin-spark-scala). – schoon Jul 11 '17 at 06:30

I know this is an old thread, but just in case it helps...

The below was the only way I could show a portion of the df. Adding a second parameter to z.show(), as suggested in the comments, throws an error for me.

z.show(df.limit(10))