How can I pretty print a data frame in Zeppelin/Spark/Scala?

Question

I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a dataframe that I can print like this:

dfLemma.select("text", "lemma").show(20,false)

and the output looks like:

+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|text                                                                                                                       |lemma                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|RT @Dope_Promo: When you and your crew beat your high scores on FUGLY FROG  https://time.com/Sxp3Onz1w8                    |[rt, @dope_promo, :, when, you, and, you, crew, beat, you, high, score, on, FUGLY, FROG, https://time.com/sxp3onz1w8]                                                      |
|RT @axolROSE: Did yall just call Kermit the frog a lizard?  https://time.com/wDAEAEr1Ay                                        |[rt, @axolrose, :, do, yall, just, call, Kermit, the, frog, a, lizard, ?, https://time.com/wdaeaer1ay]                                                                     |

I am trying to make the output nicer in Zeppelin with:

val printcols = dfLemma.select("text", "lemma")
println("%table " + printcols)

which gives this output:

printcols: org.apache.spark.sql.DataFrame = [text: string, lemma: array<string>]

and a new blank Zeppelin paragraph headed

[text: string, lemma: array]

Is there a way of getting the dataframe to show as a nicely formatted table? TIA!

score 71 · Accepted Answer · answered Jul 06 '17 at 11:02

In Zeppelin you can use z.show(df) to show a pretty table. Here's an example:

val df = Seq(
  (1,1,1), (2,2,2), (3,3,3)
).toDF("first_column", "second_column", "third_column")

z.show(df)
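
For context, the `println("%table " + printcols)` attempt in the question prints only the schema, because string concatenation calls the DataFrame's `toString`, which returns the schema summary rather than the rows. Zeppelin's `%table` display expects the data itself: tab-separated cells, newline-separated rows, with the first row used as the header. A minimal sketch of building that payload by hand follows; `toZeppelinTable` is a hypothetical helper name, not part of Zeppelin or Spark, and `z.show(df)` remains the simpler option.

```scala
// Hypothetical helper (not a Zeppelin or Spark API): build a %table payload by hand.
// Cells are separated by tabs and rows by newlines; the first row is the header.
def toZeppelinTable(header: Seq[String], rows: Seq[Seq[Any]]): String = {
  // Tabs and newlines inside a cell would break the layout, so replace them with spaces.
  def clean(cell: Any): String = s"$cell".replaceAll("[\\t\\n\\r]", " ")
  val lines = header.map(clean).mkString("\t") +: rows.map(_.map(clean).mkString("\t"))
  "%table " + lines.mkString("\n")
}

// Assumed usage in a Zeppelin Spark paragraph, with df defined as above:
// println(toZeppelinTable(df.columns, df.take(20).map(_.toSeq)))
```

Note that `take(20)` collects those rows to the driver, so this only makes sense for a small preview.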

answered Jul 06 '17 at 11:02

Daniel de Paula


Nice. Being unaware of this, I had written my own pretty print function (leveraging ``%table``) for pyspark. I cannot find this anywhere in the documentation, however... – akoeltringer Jul 06 '17 at 11:43
@TwUxTLi51Nus It's true the docs are not very good for this part. You can find some info about the ZeppelinContext [here](https://zeppelin.apache.org/docs/latest/interpreter/spark.html#zeppelincontext) and in the code ([here](https://github.com/apache/zeppelin/blob/branch-0.7/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java)) you can see all available functions. Also, in the notebook you can check using ctrl+space on the z variable. – Daniel de Paula Jul 06 '17 at 11:55
ctrl + space does not work for me, however (in python) ``dir(z)`` does. – akoeltringer Jul 06 '17 at 12:11
@TwUxTLi51Nus Nice. About the hotkey, maybe "ctrl + ." works? – Daniel de Paula Jul 06 '17 at 12:14
Works great, thanks! Can the number of rows be limited? – schoon Jul 06 '17 at 13:01
@schoon you're welcome! You can limit the number of rows with a second parameter: `z.show(df, 10)` – Daniel de Paula Jul 06 '17 at 13:02
I always find a more complex way. I did this: `z.show(dfLemma.select("racist", "lemma").limit(20))`. Will try yours. – schoon Jul 06 '17 at 13:36
Asked a bit more [here](https://stackoverflow.com/questions/45026919/how-can-i-pretty-print-a-wrappedarray-in-zeppelin-spark-scala). – schoon Jul 11 '17 at 06:30

score 1 · Answer 2 · answered May 18 '21 at 18:26

1

I know this is an old thread, but just in case it helps...

The below was the only way that I could show a portion of the df. Trying to add a second parameter to .show(), as suggested in the comments, throws an error.

z.show(df.limit(10))

answered May 18 '21 at 18:26

Kilian O Carroll

