Questions tagged [large-data-volumes]
290 questions
6
votes
11 answers
Advice on handling large data volumes
So I have a "large" number of "very large" ASCII files of numerical data (gigabytes altogether), and my program will need to process the entirety of it sequentially at least once.
Any advice on storing/loading the data? I've thought of converting…
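One common answer to this question is a one-time conversion from ASCII to a fixed-width binary format, so later sequential passes read packed records instead of re-parsing text. A minimal sketch of that idea (file names and stdlib-only `struct` packing are illustrative assumptions, not the asker's setup):

```python
import struct

def ascii_to_binary(src_path, dst_path):
    # One-time conversion: whitespace-separated ASCII floats are
    # re-written as packed little-endian float64 values.
    with open(src_path) as src, open(dst_path, "wb") as dst:
        for line in src:
            for token in line.split():
                dst.write(struct.pack("<d", float(token)))

def read_binary(path, chunk_values=65536):
    # Stream the converted file back in fixed-size chunks, so memory
    # use stays bounded regardless of total file size.
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(chunk_values * 8)
            if not buf:
                break
            yield struct.unpack(f"<{len(buf) // 8}d", buf)
```

The payoff comes on the second and later passes: parsing text is the dominant cost, and binary reads skip it entirely.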
Jake
- 14,329
- 20
- 64
- 85
6
votes
5 answers
How can I determine the difference between two large datasets?
I have large datasets with millions of records in XML format. These datasets are full data dumps of a database up to a certain point in time.
Between two dumps new entries might have been added and existing ones might have been modified or deleted.…
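The usual shape of an answer here is a three-way set comparison over record keys. A toy sketch, assuming each dump has been reduced to a mapping from primary key to record (for dumps too big for memory, the same logic runs as a merge over two exports sorted by key):

```python
def diff_dumps(old, new):
    # Compare two snapshots given as {record_id: record} mappings and
    # return (added_ids, deleted_ids, modified_ids).
    old_keys, new_keys = set(old), set(new)
    added = new_keys - old_keys
    deleted = old_keys - new_keys
    modified = {k for k in old_keys & new_keys if old[k] != new[k]}
    return added, deleted, modified
```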
NullUserException
- 77,975
- 25
- 199
- 226
6
votes
5 answers
MySQL table structure - one very large table or separate tables?
I'm working on a project which is similar in nature to website visitor analysis.
It will be used by 100s of websites with an average of 10,000s to 100,000s of page views a day each, so the data volume will be very large.
Should I use a single table with…
Nir
- 22,471
- 25
- 78
- 114
6
votes
5 answers
Practical size limitations for RDBMS
I am working on a project that must store very large datasets and associated reference data. I have never come across a project that required tables quite this large. I have proved that at least one development environment cannot cope at the…
grenade
- 28,964
- 22
- 90
- 125
6
votes
5 answers
Processing apache logs quickly
I'm currently running an awk script to process a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14 MB of the (1000 ± 500) MB I expect it to write, and I wonder if I can process it much faster somehow.
Here…
konr
- 1,153
- 12
- 25
5
votes
5 answers
"Simulating" a 64-bit integer with two 32-bit integers
I'm writing a very computationally intense procedure for a mobile device and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). Floating point operations are just too slow, so I've…
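The technique being asked about is the classic add/add-with-carry split: hold the 64-bit value as two 32-bit halves and propagate the carry manually. A sketch of 64-bit addition in that style (Python used only to model the 32-bit arithmetic; the masks stand in for natural register wraparound on a 32-bit CPU):

```python
MASK32 = 0xFFFFFFFF

def add64(a_hi, a_lo, b_hi, b_lo):
    # Add two 64-bit values held as (hi, lo) 32-bit halves, discarding
    # overflow past 64 bits -- the same effect as an ADD followed by an
    # ADD-with-carry on a 32-bit CPU.
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # unsigned wraparound means carry-out
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo

def to64(hi, lo):
    # Recombine the halves into one integer, for checking results.
    return (hi << 32) | lo
```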
Phonon
- 12,013
- 12
- 57
- 111
5
votes
4 answers
NTFS directory has 100K entries. How much performance boost if spread over 100 subdirectories?
Context
We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (e.g. up to 100,000). The problem: we store all fs entries in one "cache directory". Very…
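The standard fix answers to this question converge on is sharding: hash each cache key into one of N subdirectories so no single directory holds all 100K entries. A minimal sketch (function and path layout are hypothetical, not the asker's library):

```python
import hashlib

def shard_path(cache_root, key, shards=100):
    # Spread cache entries over `shards` subdirectories by hashing the
    # key; the hash keeps the distribution even and the mapping stable.
    digest = hashlib.md5(key.encode()).hexdigest()
    bucket = int(digest[:8], 16) % shards
    return f"{cache_root}/{bucket:02d}/{key}"
```

The mapping is deterministic, so lookups need no index: recompute the bucket from the key.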
user331465
- 2,864
- 11
- 42
- 67
5
votes
7 answers
Large primary key: 1+ billion rows MySQL + InnoDB?
I was wondering if InnoDB would be the best way to format the table? The table contains one field, primary key, and the table will get 816k rows a day (est.). This will get very large very quick! I'm working on a file storage way (would this be…
James Hartig
- 997
- 1
- 9
- 20
5
votes
1 answer
How to pick a chunksize for python multiprocessing with large datasets
I am attempting to use Python to gain some performance on a task that can be highly parallelized using http://docs.python.org/library/multiprocessing.
When looking at their library they say to use chunk size for very long iterables. Now, my…
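For reference, the heuristic `Pool.map` itself applies when `chunksize` is omitted (as of CPython's implementation) is roughly four chunks per worker; making it explicit is a reasonable starting point before tuning. A sketch:

```python
import multiprocessing

def default_chunksize(n_items, n_workers):
    # Mirrors Pool.map's own fallback: aim for about four chunks per
    # worker, rounding up so nothing is left over.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return max(chunksize, 1)

def square(n):
    return n * n

if __name__ == "__main__":
    data = list(range(1_000_000))
    workers = multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(square, data,
                           chunksize=default_chunksize(len(data), workers))
```

Larger chunks cut inter-process messaging overhead; smaller chunks balance load better when item costs vary.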
Sandro
- 2,091
- 4
- 24
- 41
5
votes
1 answer
MySql: Operate on Many Rows Using Long List of Composite PKs
What's a good way to work with many rows in MySql, given that I have a long list of keys in a client application that is connecting with ODBC?
Note: my experience is largely SQL Server, so I know a bit, just not MySQL specifically.
The task is to…
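A pattern frequently suggested for this task is to bulk-insert the key list into a temporary table and join against it, rather than building one enormous `WHERE (a=? AND b=?) OR …` clause. A sketch using sqlite3 as a stand-in (table and column names are illustrative; the same shape works over ODBC against MySQL with `CREATE TEMPORARY TABLE`):

```python
def fetch_by_composite_keys(conn, keys):
    # Load many rows by (order_id, line_no) composite key: stage the
    # keys in a temp table, then let the engine do an indexed join.
    conn.execute("CREATE TEMP TABLE wanted (order_id INT, line_no INT)")
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", keys)
    return conn.execute(
        "SELECT t.* FROM order_lines t "
        "JOIN wanted w ON t.order_id = w.order_id "
        "AND t.line_no = w.line_no"
    ).fetchall()
```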
ErikE
- 43,574
- 19
- 137
- 181
5
votes
4 answers
How to design a Real Time Alerting System?
I have a requirement to send alerts when a record in the db is not updated/changed within a specified interval. For example, if a received purchase order isn't processed within one hour, a reminder should be sent to the delivery…
Sivasubramaniam Arunachalam
- 6,984
- 15
- 71
- 126
4
votes
3 answers
Optimizing MySQL Aggregation Query
I've got a very large table (~100 million records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file.
I need to write a query that will count the number of files that fit into specified…
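The query shape in question is a single-pass `GROUP BY` over a date expression; with an index on the modified-date column, large tables can satisfy it from the index alone. A toy illustration using sqlite3 (schema and data are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, modified DATE)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("a", "2024-01-03"), ("b", "2024-01-03"), ("c", "2024-02-10")],
)
# Bucket every file by month in one scan of the table.
rows = conn.execute(
    "SELECT strftime('%Y-%m', modified) AS month, COUNT(*) "
    "FROM files GROUP BY month ORDER BY month"
).fetchall()
```

(MySQL spells the bucketing expression differently, e.g. `DATE_FORMAT(modified, '%Y-%m')`, but the plan is the same.)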
Zenshai
- 9,197
- 2
- 17
- 18
4
votes
2 answers
Trivial task - complex solution?
There is a trivial problem:
- assign a uniqueidentifier to any externalId
- do not overwrite the uniqueidentifier once it is assigned - just return the existing uniqueidentifier
Imagine a table
ExternalId | Guid
--------------------------------
…
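The get-or-create pattern being asked about can be made race-safe by leaning on the unique key: attempt an insert that silently loses to an existing row, then read back whatever won. A sketch with sqlite3's `INSERT OR IGNORE` (the SQL Server analogue would be `MERGE` or `INSERT … WHERE NOT EXISTS`; the `id_map` schema is hypothetical):

```python
import sqlite3
import uuid

def get_or_create_guid(conn, external_id):
    # Idempotent under the PRIMARY KEY on external_id: the INSERT is a
    # no-op if a mapping already exists, so the GUID is never replaced.
    conn.execute(
        "INSERT OR IGNORE INTO id_map (external_id, guid) VALUES (?, ?)",
        (external_id, str(uuid.uuid4())),
    )
    row = conn.execute(
        "SELECT guid FROM id_map WHERE external_id = ?", (external_id,)
    ).fetchone()
    return row[0]

# Assumed schema:
# CREATE TABLE id_map (external_id TEXT PRIMARY KEY, guid TEXT NOT NULL)
```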
Piotr
- 767
- 6
- 21
4
votes
1 answer
Python - Search for items in hundreds of large, gzipped files
Unfortunately, I'm working with an extremely large corpus which is spread into hundreds of .gz files -- 24 gigabytes (packed) worth, in fact. Python is really my native language (hah) but I was wondering if I haven't run up against a problem that…
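In Python this kind of corpus can be searched without unpacking anything to disk: the stdlib `gzip` module decompresses as a stream, so memory use stays flat per file. A sketch (corpus layout and pattern are placeholders):

```python
import gzip

def grep_gz(pattern, paths):
    # Stream-search gzipped text files line by line; nothing is ever
    # fully decompressed into memory or onto disk.
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8",
                       errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if pattern in line:
                    yield path, lineno, line.rstrip("\n")

# Usage (hypothetical corpus layout):
# import glob
# for path, lineno, line in grep_gz("needle", glob.glob("corpus/*.gz")):
#     print(f"{path}:{lineno}: {line}")
```

For 24 GB packed, the bottleneck is decompression itself; running one process per file (e.g. via `multiprocessing`) is the usual next step.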
Georgina
- 291
- 4
- 11
4
votes
1 answer
Storing Large Number of Graph Data Structures in a Database
This question asks about storing a single graph in a relational database. The solution is clear in that case: one table for nodes, one table for edges.
I have a graph data structure that evolves over time, so I would like to store "snapshots" of…
Alan Turing
- 11,403
- 14
- 66
- 114