
I am designing an application that takes files, splits or merges them according to their content, and pushes the result to another system. Once a file has been processed it is no longer needed. So I am planning to store the files on the local HDD of the machine where the application is deployed, instead of on a distributed/network file system.

The reason I haven't chosen a network file system is that I need to process huge files (around 1 GB), and I am using JSON streams to process them. Sometimes I also need to use the RandomAccessFile mechanism to split the content. If that happens over a network file system, the processing time could be high.
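
To make the random-access part concrete, here is a minimal sketch of splitting a local file with `RandomAccessFile`. It is only an illustration under assumptions: the class name `LocalFileSplitter`, the fixed `partSize`, and the `.partN` naming are hypothetical, and the real application would derive the split offsets from the content rather than from a fixed byte count.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a local file into byte ranges.
final class LocalFileSplitter {

    /** Copies up to `length` bytes starting at `offset` from source into target. */
    static void copyRange(Path source, long offset, long length, Path target) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r");
             RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
            out.setLength(0);            // truncate any previous content
            in.seek(offset);             // jump straight to the start of the range
            byte[] buffer = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    break;               // reached end of file early
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
    }

    /** Splits source into parts of at most partSize bytes (assumed fixed size, for illustration). */
    static List<Path> split(Path source, long partSize, Path outputDir) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        Files.createDirectories(outputDir);
        long total = Files.size(source);
        List<Path> parts = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < total; offset += partSize) {
            Path part = outputDir.resolve(source.getFileName() + ".part" + index++);
            copyRange(source, offset, Math.min(partSize, total - offset), part);
            parts.add(part);
        }
        return parts;
    }
}
```

Each `seek` is a cheap operation on a local disk, whereas on a network file system every seek and read may add a network round trip, which is the cost I am trying to avoid.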

I have also thought about scaling. With a local file system the application can still be scaled without trouble, because the target system expects the processed data to come from the same system the files were pushed to.

What are your thoughts on this? I want to check that I'm on the right path.

Pokuri

1 Answer


I'll provide some downsides of this approach:

  • A local HDD typically has no redundancy (e.g. RAID 1/5) and is therefore more likely to lose data on failure (that depends, of course, on your cloud/hardware provider).
  • A local HDD is often based on inferior hardware (compared to SAN/NAS) and can be slower.
    • The main difference between a fast and a slow HDD is typically in random access. For sequential access (you mentioned mostly working with large files?) the effect can be far smaller.
  • On cloud providers, local HDD data is often deleted if the instance fails. So, again, there is a risk of losing data.
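
If you want to check the sequential versus random access point on your own hardware, a rough probe along these lines may help. It is only a sketch: the class name `DiskAccessProbe`, the 4 KiB block size, and the seed parameter are assumptions for illustration, not a proper benchmark.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical probe comparing sequential and random reads on one file.
final class DiskAccessProbe {
    static final int BLOCK = 4096; // assumed block size for the experiment

    /** Reads the file front to back in BLOCK-sized chunks; returns bytes read. */
    static long readSequential(Path file) throws IOException {
        byte[] buf = new byte[BLOCK];
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int n;
            while ((n = raf.read(buf)) > 0) {
                total += n;
            }
        }
        return total;
    }

    /** Reads `reads` blocks at pseudo-random block-aligned offsets; returns bytes read. */
    static long readRandom(Path file, int reads, long seed) throws IOException {
        byte[] buf = new byte[BLOCK];
        Random rnd = new Random(seed);
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long blocks = Math.max(1, raf.length() / BLOCK);
            for (int i = 0; i < reads; i++) {
                long block = (long) (rnd.nextDouble() * blocks);
                raf.seek(block * BLOCK); // jump to a random block
                int n = raf.read(buf);
                if (n > 0) {
                    total += n;
                }
            }
        }
        return total;
    }
}
```

Wrap each call in `System.nanoTime()` measurements and divide the bytes read by the elapsed time. Keep in mind that the OS page cache can hide the disk entirely, so use a file that is larger than the available memory, or one that has not been read recently.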

To recap: If your resiliency and performance requirements are met - I don't see an issue with this approach.

Lior Bar-On
  • Okay. What do you suggest if I still need to do everything I can do with a local file system? Are there any services that support this kind of stream processing and random file access in Java? – Pokuri Aug 13 '17 at 12:49
  • Are you running on the cloud? If yes, which vendor? I would consider only a local service, otherwise the latency would "kill" the performance. – Lior Bar-On Aug 13 '17 at 13:01
  • I haven't decided on the deployment environment yet. What if it's AWS? What if it's some other cloud environment? – Pokuri Aug 13 '17 at 15:34
  • On AWS, EBS is probably the most appropriate for this use case ([here is a comparison](https://stackoverflow.com/questions/29575877/aws-efs-vs-ebs-vs-s3-differences-when-to-use)), but then this question starts a different discussion... – Lior Bar-On Aug 13 '17 at 18:41