Questions tagged [bigdata]

Big data is a concept that deals with data sets of extreme volumes. Questions may tend to be related to infrastructure, algorithms, statistics, and data structures.

Several features set big data apart as a distinct concept:

Data

The data is so large that it cannot be processed on a single computer
Relationships between data elements are extremely complex

Algorithms

Local algorithms that take longer than O(N) to compute are likely to take many years to finish
Fast distributed algorithms are used instead
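
As a rough illustration of the distributed style mentioned above, here is a minimal sketch that computes a mean by summarizing chunks independently and then merging the partial results. It runs on one machine and only simulates the pattern; the function names (`partial_stats`, `merge_stats`, `chunked`, `distributed_mean`) and the chunk size are hypothetical, not taken from any particular framework.

```python
from functools import reduce


def partial_stats(chunk):
    """Map step: summarize one chunk as (count, sum)."""
    return (len(chunk), sum(chunk))


def merge_stats(a, b):
    """Reduce step: combine two partial summaries."""
    return (a[0] + b[0], a[1] + b[1])


def chunked(values, size):
    """Split a list into consecutive chunks (stand-ins for data partitions)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def distributed_mean(values, chunk_size):
    """Compute the mean from per-chunk summaries instead of one global pass."""
    # In a real cluster each chunk would be summarized on a different node.
    partials = [partial_stats(c) for c in chunked(values, chunk_size)]
    count, total = reduce(merge_stats, partials, (0, 0))
    return total / count if count else 0.0


if __name__ == "__main__":
    print(distributed_mean(list(range(10)), 3))
```

Because each chunk is reduced to a small summary and the merge step is associative, the summaries can be computed in any order and on separate machines, so the work stays roughly linear in the data size.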

Storage

The underlying data storage must be fault-tolerant and keep data in a consistent state regardless of device failures
A single storage device cannot hold the entire data set

Ecosystem

Big data is also synonymous with the set of tools used to process huge amounts of data, known as the big data ecosystem. Popular tools include HDFS, Spark, and MapReduce.

7093 questions

votes

1 answer

Elastic search one index or multiple index for same data

I'm building an application which could greatly benefit from ElasticSearch. In my current version I'm using 1 single index: "messages" with just 1 type: "message". Messages are composed of the following format (averaging 10kb): messages - id -…

asked Nov 05 '14 at 19:13

Floris

votes

1 answer

The best way to filter large data sets

I have a query about how to filter relevant records from a large data set of financial transactions. We use Oracle 11g database and one of the requirements is to produce various end-of-day reports with all sorts of criteria. The relevant tables…

hadoop hbase bigdata

asked Nov 04 '14 at 12:44

JVC

votes

2 answers

What kind of NoSQL storage should we use?

We are an IoT company that provides services for transportation and logistics companies. As an infrastructure service provider we offer GPS tracking devices to our clients. Although the format of GPS tracking data is very neat (gpsId, longitude,…

gps iot bigdata nosql

asked Oct 28 '14 at 12:02

Wuaner

votes

2 answers

Filtering in pig by concatenating two column

I have two tables in the following format Table 1: com_Data #cc bb mm# 41 22 2563 42 24 3562 Table 2: #name cid# sasi 41-22-2563 soman 42-47-2562 I want to combine the three columns cc bb mm from table 1 and need to filter out all the column…

hadoop apache-pig bigdata

asked Oct 10 '14 at 06:52

Anas A

votes

1 answer

Grails App with Huge Tables

I'm trying to create a database from existing csv files that are about 20,000 columns wide and 700 rows deep. In grails I would like the 20,000 column domain to belongTo another simpler domain (about 200 columns). But upon compilation I get:…

csv grails bigdata

asked Oct 09 '14 at 16:36

janDro

votes

2 answers

Transform data in Google bigquery - extract text, split it into multiple columns and pivoting the data

I have some weblog data in big query which I need to transform to make it easier to use and query. The data looks like: I want to extract and transform the data within the curled brackets after Results{…..} (colored blue). The data is of the form…

sql regex bigdata google-bigquery google-cloud-storage

asked Oct 09 '14 at 15:59

ravishchhabra

votes

2 answers

Subtract all pairs of values from two arrays

I have two vectors, v1 and v2. I'd like to subtract each value of v2 from each value of v1 and store the results in another vector. I also would like to work with very large vectors (e.g. 1e6 size), so I think I should be using numpy for…

python arrays numpy bigdata

asked Sep 27 '14 at 16:10

jpcgandre

votes

1 answer

Neo4j Relationship Index - Search on relationship property

I've got a neo4j graph with the following structure. (Account) ---[Transaction]--- (Account) Transaction is a neo4j relationship and Account is a node. There are set various properties on each transaction, such as the transaction ID, amount, date,…

java neo4j bigdata

asked Aug 27 '14 at 22:22

Imme22009

votes

2 answers

Function awfully slow

I was looking for historical data from our Brazilian stock market and found it at Bovespa's website. The problem is the format the data is in is terrible, it is mingled with all sorts of other information about any particular stock! So far so good!…

python performance parsing web-scraping bigdata

asked Aug 26 '14 at 01:33

Luis Dos Reis

votes

3 answers

Hadoop use-case scenario

I would like to have some expert views on the use of a Big Data platform like Hadoop in one of my project scenarios. I am a complete novice in this technology although I understand databases like MySQL well. We are creating a product which would be…

hadoop bigdata hadoop2

asked Aug 09 '14 at 08:01

user1826116

votes

1 answer

How to append the output of Parallel Grep to a file?

I have a file of 500 MB and a pattern file of 20 MB. Since it was taking too much time to grep the 1.2 million patterns from the file with 5 million lines, I split the pattern file into 100 parts. I tried to run grep in parallel with the multiple…

linux bash parallel-processing grep bigdata

asked Aug 08 '14 at 13:22

Rohit

votes

1 answer

Please help what is the necessity of Shuffle and Sorting in Hadoop?

In a normal wordcount program in mapreduce, do we need to set any method for shuffle and sort, or the framework will take care of this?

hadoop mapreduce bigdata

asked Aug 08 '14 at 06:47

shakti

votes

1 answer

How to train neural networks on big sample sets in Matlab?

I am trying to train a neural network on a big training set. The inputs consist of approx. 4 million columns and 128 rows, and the targets consist of 62 rows. hiddenLayerSize is 128. The script is as follows: net =…

matlab neural-network bigdata

asked Aug 06 '14 at 11:18

Suzan Cioc

votes

1 answer

How to handle large amount of documents stored in a database?

I am working on an application where users can scan/upload documents. The application processes those documents and stores them in the database. We are using a MySQL database. Right now we have more than 200,000 documents in the database. So we are…

mysql database bigdata

asked Aug 01 '14 at 10:52

pan1490

votes

0 answers

Same data in two different databases to improve query performance

I'm storing social data in Elasticsearch, but it's so difficult to query it without any kind of joins. So, I'm thinking a possible way: All docs in elasticsearch. Complete docs with all infos. All relations in neo4j. Only queriable data (dates,…

database elasticsearch neo4j bigdata

asked Jul 29 '14 at 18:42

user3175226
