Questions tagged [large-data-volumes]

290 questions
4
votes
4 answers

Processing large amounts of data using multithreading

I need to write a C# service (could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database. Processing each record is also a fairly complex operation. I need to perform a lot of…
Sennin
  • 911
  • 1
  • 9
  • 16
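The standard shape for this kind of job is a bounded producer-consumer pipeline: one reader streams record IDs from the database while a fixed pool of workers runs the expensive per-record operation. The question targets C#, but here is a minimal sketch of the pattern in Python; fetch_ids and process are hypothetical placeholders for the asker's query and per-record logic.

```python
import queue
import threading

WORKERS = 8
q = queue.Queue(maxsize=1000)  # bounded, so the reader cannot outrun the workers

def producer(fetch_ids):
    for record_id in fetch_ids():      # stream IDs; never load all rows at once
        q.put(record_id)
    for _ in range(WORKERS):
        q.put(None)                    # one shutdown sentinel per worker

def worker(process):
    while True:
        record_id = q.get()
        if record_id is None:
            break
        process(record_id)             # the per-record "complex operation"

def run(fetch_ids, process):
    threads = [threading.Thread(target=worker, args=(process,)) for _ in range(WORKERS)]
    for t in threads:
        t.start()
    producer(fetch_ids)
    for t in threads:
        t.join()
```

The bounded queue is the important part: it caps memory use no matter how far the reader gets ahead of the workers.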
4
votes
2 answers

Ideas: how to interactively render large image series using GPU-based direct volume rendering

I'm looking for ideas on how to convert a 30+ GB series of 2000+ color TIFF images into a dataset that can be visualized at interactive frame rates using GPU-based volume rendering (OpenCL / OpenGL / GLSL). I want to use a direct volume…
bastijn
  • 5,461
  • 4
  • 25
  • 41
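A common first step for volumes this size is to split the stack into fixed-size bricks and precompute downsampled level-of-detail copies, so the renderer streams only the bricks the current view needs. A rough sketch of that preprocessing, assuming the numpy and tifffile packages; the 64-voxel brick size and the loading line are arbitrary assumptions, not the asker's pipeline:

```python
# Split a large TIFF stack into bricks and build one 2x-downsampled LOD level.
import numpy as np
import tifffile

BRICK = 64  # edge length of one brick, in voxels

def bricks(volume):
    """Yield (z, y, x) brick origins and the corresponding brick data."""
    zs, ys, xs = volume.shape[:3]
    for z in range(0, zs, BRICK):
        for y in range(0, ys, BRICK):
            for x in range(0, xs, BRICK):
                yield (z, y, x), volume[z:z+BRICK, y:y+BRICK, x:x+BRICK]

def downsample(volume):
    """Cheap 2x LOD: average pairs of diagonal voxels (works for color stacks too)."""
    z, y, x = (s // 2 * 2 for s in volume.shape[:3])
    v = volume[:z, :y, :x].astype(np.float32)
    return (v[0::2, 0::2, 0::2] + v[1::2, 1::2, 1::2]) / 2

# volume = tifffile.imread(list_of_slice_files)  # one slice per file, stacked to 3D
```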
4
votes
6 answers

Computing token counters on huge dataset

I need to go over a huge amount of text (> 2 TB, a full Wikipedia dump) and keep two counters for each token seen (each counter is incremented depending on the current event). The only operation I will need for these counters is increment. On a…
smola
  • 823
  • 7
  • 14
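Since the counters are increment-only and the token set is enormous, one standard fixed-memory answer is a count-min sketch, which trades a small, bounded overestimate for constant space. A minimal sketch of the idea, offered as one plausible approach rather than what the answers chose; two independent instances would give the question's two per-token counters:

```python
import array
import hashlib

class CountMin:
    """Fixed-memory, increment-only counter with a small bounded overestimate."""

    def __init__(self, width=1 << 20, depth=4):
        self.width, self.depth = width, depth
        self.rows = [array.array("q", [0]) * width for _ in range(depth)]

    def _idx(self, token, row):
        # A different salt per row gives `depth` independent hash functions.
        h = hashlib.blake2b(token.encode(), digest_size=8, salt=bytes([row]))
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, token, n=1):
        for row in range(self.depth):
            self.rows[row][self._idx(token, row)] += n

    def count(self, token):
        # Never underestimates; the overestimate shrinks with width and depth.
        return min(self.rows[row][self._idx(token, row)] for row in range(self.depth))
```

With these defaults each instance uses about 32 MB (4 rows of 2^20 64-bit cells), regardless of how many distinct tokens the dump contains.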
4
votes
1 answer

Fetch only N rows at a time (MySQL)

I'm looking for a way to fetch all data from a huge table in smaller chunks. Please advise.
akosch
  • 4,118
  • 7
  • 56
  • 80
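The usual answer is keyset pagination: instead of OFFSET, which rescans every skipped row, each query seeks past the last primary-key value already seen. A sketch assuming an auto-increment id column and a DB-API connection; the table and column names are placeholders:

```python
# Keyset pagination: each query resumes after the last id seen, so every
# chunk is an index range scan rather than a scan-and-discard with OFFSET.
CHUNK = 1000

def fetch_in_chunks(conn):
    last_id = 0
    while True:
        cur = conn.cursor()
        cur.execute(
            "SELECT id, payload FROM big_table "
            "WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, CHUNK),
        )
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]   # resume after the last row of this chunk
```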
4
votes
4 answers

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function: (defun filter (a b) "Filters out all items in a from b" (if (= 0 (length a)) b (filter (remove (first a) a) (remove (first a)…
schellsan
  • 2,094
  • 1
  • 22
  • 31
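The main downsides of the quoted function are that Common Lisp does not guarantee tail-call elimination (so a very large 'a can exhaust the stack) and that every call re-walks its inputs: (length a) traverses all of a, and each remove traverses a whole list, giving roughly O(|a| x (|a| + |b|)) work. A hash-set difference does the same job in one pass over each list; shown in Python purely for illustration:

```python
# Linear-time filter: build a set from `a` once, then scan `b` once.
def filter_out(a, b):
    exclude = set(a)                            # O(len(a)) build
    return [x for x in b if x not in exclude]   # O(len(b)) scan

print(filter_out([1, 2], [1, 2, 3, 4]))         # [3, 4]
```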
4
votes
3 answers

Creating a large sitemap on Google App Engine?

I have a site with around 100,000 unique pages. (1) How do I create a sitemap for all these links? Should I just list them flat in one large sitemap-protocol-compatible file? (2) I need to implement this on Google App Engine, where there is a 1000 item…
demos
  • 2,504
  • 8
  • 33
  • 50
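The sitemap protocol already covers this case: publish many small child sitemaps plus one sitemap index file that lists them. A sketch that chunks the URLs and emits both; the 1,000-URL chunk matches the App Engine limit the question mentions (the protocol itself allows up to 50,000 URLs per file), and the file names are assumptions:

```python
# Emit child sitemaps of CHUNK URLs each, plus a sitemap index listing them,
# following the sitemaps.org protocol.
CHUNK = 1000

def sitemap(urls):
    body = "".join(f"<url><loc>{u}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{body}</urlset>")

def sitemap_index(child_urls):
    body = "".join(f"<sitemap><loc>{u}</loc></sitemap>" for u in child_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{body}</sitemapindex>")

def build(urls, base="http://example.com/sitemaps/"):
    chunks = [urls[i:i + CHUNK] for i in range(0, len(urls), CHUNK)]
    files = {f"{base}sitemap-{n}.xml": sitemap(c) for n, c in enumerate(chunks)}
    files[f"{base}sitemap-index.xml"] = sitemap_index(list(files))
    return files   # mapping of file URL -> XML content to serve
```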
4
votes
1 answer

Count the occurrence of each element in large data stream

I have a simulation with N particles running over T timesteps. At each timestep, each particle calculates some data about itself and the other particles nearby (within a radius), which is bit-packed into a C string 4-22 bytes long (depending on…
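When the distinct keys cannot all fit in RAM, a standard trick is to shard records by hash into temporary files and then count one shard at a time, since every occurrence of a key lands in the same shard. A sketch, where records is a hypothetical iterator over the packed byte strings:

```python
# Shard-then-count: spill keys to SHARDS files by hash, then count each
# shard independently so no single Counter holds every distinct key.
from collections import Counter
import hashlib
import os

SHARDS = 64

def count_stream(records, tmpdir="shards"):
    os.makedirs(tmpdir, exist_ok=True)
    files = [open(f"{tmpdir}/{i:02d}.bin", "wb") for i in range(SHARDS)]
    for key in records:                  # key: one 4-22 byte packed record
        shard = hashlib.blake2b(key, digest_size=1).digest()[0] % SHARDS
        files[shard].write(len(key).to_bytes(1, "big") + key)  # 1-byte length prefix
    for f in files:
        f.close()
    for i in range(SHARDS):              # each shard's key set fits in memory
        counts = Counter()
        with open(f"{tmpdir}/{i:02d}.bin", "rb") as f:
            while (n := f.read(1)):
                counts[f.read(n[0])] += 1
        yield counts
```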
4
votes
1 answer

Mayavi visualizing huge 3D arrays

I have a 3D dataset with around 6 million points. Is there any way to plot it using contour3d? Every time I try, Mayavi runs out of memory. Alternatively, is there a way to increase the number of colors in the volume() CTF to more than 256? I have…
pitc
  • 71
  • 4
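Short of adding RAM, the usual workaround is to decimate the array before handing it to mlab.contour3d, since a strided slice is a view that costs no extra copy. A sketch; the stride of 4 and the random stand-in volume are assumptions:

```python
# Decimate the volume before contouring: a stride of 4 cuts ~6M points
# down by a factor of ~64.
import numpy as np
from mayavi import mlab

data = np.random.rand(180, 180, 180)      # stand-in for the real 3D dataset
stride = 4

sub = data[::stride, ::stride, ::stride]  # a view, so no extra copy of `data`
mlab.contour3d(sub, contours=8, opacity=0.4)
mlab.show()
```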
4
votes
4 answers

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table filtered by category. Now, as you would imagine, doing random sorting on a table this…
Brett
  • 16,869
  • 50
  • 138
  • 258
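The classic way to dodge ORDER BY RAND() on 7 million rows is to draw random ids in the application and fetch only those rows via the index. A sketch assuming a reasonably dense auto-increment id; the items table and category column are placeholders:

```python
# Pick random ids, then seek to the next existing row at or above each one,
# so the database never sorts millions of rows.
import random

def random_rows(conn, n=4, category=None):
    cur = conn.cursor()
    cur.execute("SELECT MIN(id), MAX(id) FROM items")
    lo, hi = cur.fetchone()
    rows = []
    while len(rows) < n:
        pick = random.randint(lo, hi)
        cur.execute(
            "SELECT * FROM items WHERE id >= %s"
            + (" AND category = %s" if category else "")
            + " ORDER BY id LIMIT 1",
            (pick, category) if category else (pick,),
        )
        row = cur.fetchone()
        if row and row not in rows:   # skip duplicates across picks
            rows.append(row)
    return rows
```

Note that the id >= pick trick slightly favors rows that sit just after gaps in the id sequence, which is usually acceptable for "show me 4 random items".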
4
votes
1 answer

How do I update the database in the most efficient manner?

I am building a price comparison site that holds about 300,000 products and several hundred clients. The site needs daily updates of prices and vendor stock availability. When a vendor needs updating, I was thinking about deleting all the…
user937635
  • 59
  • 5
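Rather than deleting and reinserting a vendor's rows, a batch upsert updates in place and leaves unchanged rows as cheap index lookups. A MySQL-flavored sketch; the prices table and its columns are assumptions:

```python
# Batch upsert: one statement per batch of rows, keyed on a unique
# (product_id, vendor_id) index, instead of a delete-and-reload cycle.
UPSERT = (
    "INSERT INTO prices (product_id, vendor_id, price, in_stock) "
    "VALUES (%s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE price = VALUES(price), in_stock = VALUES(in_stock)"
)

def update_vendor(conn, vendor_id, feed, batch=1000):
    cur = conn.cursor()
    rows = [(p, vendor_id, price, stock) for p, price, stock in feed]
    for i in range(0, len(rows), batch):
        cur.executemany(UPSERT, rows[i:i + batch])
    conn.commit()
```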
3
votes
4 answers

I have 100 trillion elements, each with a size from 1 byte to 1 trillion bytes (0.909 TiB). How do I store and access them very efficiently?

This is an interview question. Suppose I have 100 trillion elements, each with a size from 1 byte to 1 trillion bytes (0.909 TiB). How do I store and access them very efficiently? My ideas: they want to test the knowledge…
user1002288
  • 4,502
  • 9
  • 45
  • 76
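At that scale the elements must be spread over many machines, and the textbook way to map keys to nodes without reshuffling everything when a node joins or leaves is consistent hashing. A minimal ring, offered as one plausible building block of an answer rather than the expected interview answer:

```python
# Minimal consistent-hash ring: a key maps to the first node clockwise of
# its hash, so adding or removing a node moves only ~1/N of the keys.
import bisect
import hashlib

def _h(s):
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

class Ring:
    def __init__(self, nodes, vnodes=100):
        # vnodes virtual points per node smooth out the key distribution.
        self._points = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._keys = [p for p, _ in self._points]

    def node_for(self, key):
        i = bisect.bisect(self._keys, _h(key)) % len(self._keys)
        return self._points[i][1]

ring = Ring(["store-a", "store-b", "store-c"])
print(ring.node_for("element-12345"))   # e.g. 'store-b'
```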
3
votes
2 answers

SQL Database design for huge datasets

I have a customer with the following data structure... for each patient there may be multiple samples, and each sample may, after processing, have 4 million data objects. The maximum number of samples per patient is 20. So a single patient may…
Nicros
  • 4,611
  • 12
  • 52
  • 98
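One common design for this shape of data is a narrow fact table keyed by (sample_id, object_id) and partitioned per sample, so loading or dropping one sample never touches the other patients' 4-million-row blocks. A PostgreSQL-flavored sketch; every name in it is an assumption:

```python
# Narrow fact table plus per-sample list partitions: each sample's data
# objects load, scan, and drop as a unit.
DDL = """
CREATE TABLE data_object (
    sample_id  bigint  NOT NULL,
    object_id  bigint  NOT NULL,
    value      double precision,
    PRIMARY KEY (sample_id, object_id)
) PARTITION BY LIST (sample_id);
"""

def partition_for(sample_id):
    # One partition per processed sample, created at load time.
    return (
        f"CREATE TABLE data_object_s{sample_id} "
        f"PARTITION OF data_object FOR VALUES IN ({sample_id});"
    )
```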
3
votes
6 answers

How do I count the number of rows in a large CSV file with Perl?

I have to use Perl in a Windows environment at work, and I need to find out the number of rows that a large CSV file contains (about 1.4 GB). Any idea how to do this with minimum waste of resources? Thanks! PS: This must be done within the…
Alex Wong
  • 691
  • 3
  • 8
  • 14
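The question is Perl-specific, but the resource-frugal idea is language-independent: read fixed-size binary chunks and count newline bytes, never materializing lines. Shown here in Python for illustration; note that a strict CSV row count would need a real CSV parser, since quoted fields may contain embedded newlines:

```python
# Constant-memory line count: stream the file in 1 MB binary chunks and
# count b"\n" occurrences, regardless of file size.
def count_lines(path, chunk=1 << 20):
    lines = 0
    with open(path, "rb") as f:
        while (buf := f.read(chunk)):
            lines += buf.count(b"\n")
    return lines
```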
3
votes
5 answers

Appropriate data structure for a fast retrieval process (data size: around 200,000 values, all strings)

I have a large data set of around 200,000 values, all of them strings. Which data structure should I use so that searching and retrieval are fast? Insertion happens only once, so even slow insertion wouldn't matter much. Hash…
Elvis
  • 115
  • 9
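For 200,000 strings built once, the two usual candidates are a hash set (O(1) average exact lookups) and a sorted array with binary search (slower lookups, but it also supports prefix and range queries). A small sketch, with words standing in for the real data:

```python
# Build-once, query-many: hash set for exact membership, sorted list +
# bisect when ordered queries matter.
import bisect

words = ["apple", "banana", "cherry"]     # stand-in for the 200,000 strings

lookup = set(words)                        # exact membership: O(1) average
print("banana" in lookup)                  # True

ordered = sorted(words)                    # range/prefix-friendly alternative
i = bisect.bisect_left(ordered, "banana")  # O(log n) search
print(i < len(ordered) and ordered[i] == "banana")   # True
```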
3
votes
3 answers

SELECT a varchar field for one entry in an 8.2-million-entry table - performance help

I have a table with 8.2 million entries in a SQL Server 2005 database. This table stores basic details about past customers (referrer, IP, whether they entered via an advertisement, etc.) for every customer who has come to the site. Unfortunately,…
Cody Mays
  • 31
  • 1
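If the column in the WHERE clause is not indexed, every such SELECT scans all 8.2 million rows. On SQL Server 2005, a nonclustered index on the lookup column, with the varchar as an included column, turns the query into an index seek on a covering index. A sketch via pyodbc; the DSN, table, and column names are all assumptions:

```python
# Covering nonclustered index: seek on the lookup column, with the varchar
# included so the query never touches the base table.
import pyodbc  # assumes a SQL Server ODBC driver and DSN are configured

DDL = (
    "CREATE NONCLUSTERED INDEX IX_customers_email "
    "ON dbo.customers (email) INCLUDE (referrer)"
)
QUERY = "SELECT referrer FROM dbo.customers WHERE email = ?"

conn = pyodbc.connect("DSN=warehouse", autocommit=True)  # hypothetical DSN
conn.cursor().execute(DDL)           # one-time: build the covering index
print(conn.cursor().execute(QUERY, "user@example.com").fetchone())
```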