
I am a bit confused about the number of partitions in Spark and their size. Some tutorials say the number of partitions equals the number of blocks in HDFS (64 MB or 128 MB), while others say it equals the number of cores in the cluster. So if my data is 1 GB, stored in HDFS with a 128 MB block size, and the cluster has, say, 10 cores, what will the number of partitions be in this case? Is it 8 or 10? Thanks in advance.
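To see what Spark actually does, this is a minimal spark-shell sketch I would try; the HDFS path is just a hypothetical placeholder, not a real file:

    // Minimal sketch in spark-shell; the HDFS path is a hypothetical placeholder
    val rdd = sc.textFile("hdfs:///data/file_1gb.txt")

    // How many partitions Spark actually created for this file
    println(rdd.getNumPartitions)

    // Default parallelism, typically the total number of cores available to the job
    println(sc.defaultParallelism)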

A B
  • When you read from HDFS, it is the number of blocks in HDFS. – Indrajit Swain Feb 02 '18 at 06:50
  • Referring to https://techmagie.wordpress.com/2015/12/19/understanding-spark-partitioning/, which mentions that for sc.textFile() the number of partitions is sc.defaultParallelism or the number of file blocks, whichever is greater. So for my scenario, is the answer 10? (See the sketch below.) – A B Feb 02 '18 at 09:28
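A minimal sketch of how the minPartitions argument of sc.textFile() interacts with the block count, again using the same hypothetical file path:

    // Minimal sketch, same hypothetical file: explicitly asking for at least 10 partitions
    val rdd10 = sc.textFile("hdfs:///data/file_1gb.txt", minPartitions = 10)

    // The input format will not go below the number of blocks, and it can split
    // blocks further to honor a larger minPartitions, so this typically prints >= 10
    println(rdd10.getNumPartitions)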

0 Answers