Questions tagged [data-compression]

156 questions
4 votes, 3 answers

What is the best way to compress a list of similar but not identical strings?

Say I have a number of strings that are quite similar but not absolutely identical. They can differ more or less, but the similarity can be seen with the naked eye. All lengths are equal: each is 256 bytes. The total number of strings is less than…
lithuak
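One simple approach for fixed-length strings that differ in only a few positions works in two steps. First, XOR every string against a reference string so that matching bytes become zero. Second, pass the result to a general-purpose compressor. The sketch below makes these assumptions:

- The strings are equal-length `bytes` (256 in the question).
- The first string serves as the reference.
- `pack_similar` and `unpack_similar` are hypothetical names.

XOR only helps when the differences are in-place substitutions. If strings differ by insertions or shifts, plain zlib (or a preset dictionary) is likely the better fit.

```python
import zlib


def pack_similar(strings: list[bytes]) -> bytes:
    """XOR each string against the first one, then deflate the concatenation."""
    ref = strings[0]
    if any(len(s) != len(ref) for s in strings):
        raise ValueError("all strings must have the same length")
    buf = bytearray(ref)  # the reference is stored verbatim
    for s in strings[1:]:
        # Identical positions become 0x00, giving long zero runs for zlib.
        buf += bytes(a ^ b for a, b in zip(s, ref))
    return zlib.compress(bytes(buf), 9)


def unpack_similar(blob: bytes, length: int = 256) -> list[bytes]:
    """Invert pack_similar; `length` must match the original string length."""
    raw = zlib.decompress(blob)
    ref = raw[:length]
    out = [ref]
    for off in range(length, len(raw), length):
        chunk = raw[off:off + length]
        out.append(bytes(a ^ b for a, b in zip(chunk, ref)))
    return out
```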
4 votes, 2 answers

How can one compute the optimal parameters for a start-step-stop coding scheme?

A start-step-stop code is a data compression technique used to compress numbers that are relatively small. The code works as follows: it has three parameters, start, step and stop. Start determines the number of bits used to compute the first…
fluffels
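In the usual description of a start-step-stop code, values fall into groups whose payload widths are start, start+step, …, stop bits. Each codeword has two parts:

- A unary group index: i ones followed by a zero, with the zero dropped for the last group.
- The offset of the value within its group.

The parameter space is tiny, so one straightforward way to find optimal parameters for a given sample is to try every triple and keep the one with the smallest total length. The sketch below does exactly that by brute force. `max_bits` is an assumed search bound, and bit strings are used for readability rather than speed.

```python
def widths(start: int, step: int, stop: int) -> list[int]:
    """Payload widths of each group: start, start+step, ..., stop."""
    if start < 0 or step <= 0 or stop < start or (stop - start) % step:
        raise ValueError("need step > 0 and stop = start + k*step")
    return list(range(start, stop + 1, step))


def encode(n: int, start: int, step: int, stop: int) -> str:
    """Encode one non-negative integer as a bit string."""
    ws = widths(start, step, stop)
    base = 0  # smallest value covered by the current group
    for i, w in enumerate(ws):
        if n < base + (1 << w):
            last = i == len(ws) - 1
            prefix = "1" * i + ("" if last else "0")  # unary group index
            payload = format(n - base, "b").zfill(w) if w else ""
            return prefix + payload
        base += 1 << w
    raise ValueError(f"{n} is too large for ({start}, {step}, {stop})")


def decode(bits: str, start: int, step: int, stop: int) -> list[int]:
    """Decode a concatenation of codewords back into integers."""
    ws = widths(start, step, stop)
    if ws == [0]:
        raise ValueError("a single zero-width group encodes nothing to decode")
    values, pos = [], 0
    while pos < len(bits):
        i = 0
        while i < len(ws) - 1 and bits[pos] == "1":  # read unary group index
            i += 1
            pos += 1
        if i < len(ws) - 1:
            pos += 1  # skip the terminating zero
        w = ws[i]
        base = sum(1 << x for x in ws[:i])
        values.append(base + (int(bits[pos:pos + w], 2) if w else 0))
        pos += w
    return values


def best_params(values: list[int], max_bits: int = 16):
    """Brute-force (total_bits, start, step, stop) minimising the encoded size."""
    best = None
    for start in range(max_bits + 1):
        for step in range(1, max_bits + 1):
            for stop in range(start, max_bits + 1, step):
                try:
                    total = sum(len(encode(v, start, step, stop)) for v in values)
                except ValueError:
                    continue  # this triple cannot represent the largest value
                if best is None or total < best[0]:
                    best = (total, start, step, stop)
    return best
```

For example, with (start, step, stop) = (0, 1, 2) the value 4 encodes as `1101`: the prefix `11` selects the last group and `01` is the offset within it.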
4 votes, 2 answers

Efficient way to store list of URLs

I need to store trillions of lists of URLs, where each list will contain ~50 URLs. What would be the most space-efficient way to compress them for on-disk storage? I was thinking of first removing useless information like "http://" and then building a…
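Removing repeated prefixes such as "http://" is essentially what front coding does. You sort the list, then store each URL as the length of the prefix it shares with the previous URL plus the remaining suffix. A general-purpose compressor can then squeeze the suffixes.

The sketch below makes these assumptions:

- Each ~50-URL list is packed independently.
- Sorting within a list is acceptable; store a permutation as well if the original order matters.
- URLs contain no raw tab or newline characters.
- `pack_urls` and `unpack_urls` are hypothetical names.

```python
import os
import zlib


def front_encode(urls: list[str]) -> list[tuple[int, str]]:
    """Sort, then store each URL as (shared-prefix length, remaining suffix)."""
    pairs, prev = [], ""
    for url in sorted(urls):
        k = len(os.path.commonprefix([prev, url]))  # character-wise prefix
        pairs.append((k, url[k:]))
        prev = url
    return pairs


def front_decode(pairs: list[tuple[int, str]]) -> list[str]:
    urls, prev = [], ""
    for k, suffix in pairs:
        prev = prev[:k] + suffix
        urls.append(prev)
    return urls


def pack_urls(urls: list[str]) -> bytes:
    # One "k<TAB>suffix" line per URL, then deflate the whole list.
    text = "\n".join(f"{k}\t{s}" for k, s in front_encode(urls))
    return zlib.compress(text.encode("utf-8"), 9)


def unpack_urls(blob: bytes) -> list[str]:
    text = zlib.decompress(blob).decode("utf-8")
    if not text:
        return []
    pairs = []
    for line in text.split("\n"):
        k, s = line.split("\t", 1)
        pairs.append((int(k), s))
    return front_decode(pairs)
```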
4 votes, 2 answers

What is the best compression algorithm for integers?

I want the best compression algorithm for a list of random numbers. List example: 224.19 225.57 226.09 222.74 222.20 222.11 223.14 540.56 538.96 540.14 540.44 336.45 338.47 340.78 156.73 160.02 158.56 156.23 55.08 56.33 54.88 53.45 I can skip the…
Waqas
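The sample values have two decimals and tend to cluster. A common pipeline for data like this has four steps:

1. Scale to integers (fixed point).
2. Store each value as the difference from the previous one (delta).
3. Map signed deltas to small unsigned numbers (zigzag).
4. Write them as variable-length integers (7 bits per byte).

This is a minimal sketch, assuming at most two decimal places (`scale=100`, a hypothetical parameter). The output could additionally be fed to zlib or similar.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    return z >> 1 if z & 1 == 0 else -((z + 1) >> 1)


def pack_values(values: list[float], scale: int = 100) -> bytes:
    """Fixed point + delta + zigzag + LEB128-style varint."""
    out, prev = bytearray(), 0
    for v in values:
        x = round(v * scale)  # assumes at most two decimals when scale=100
        z = zigzag(x - prev)
        prev = x
        while True:  # 7 payload bits per byte; high bit means "more follows"
            byte = z & 0x7F
            z >>= 7
            if z:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)


def unpack_values(data: bytes, scale: int = 100) -> list[float]:
    values, prev, pos = [], 0, 0
    while pos < len(data):
        z = shift = 0
        while True:
            b = data[pos]
            pos += 1
            z |= (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                break
        prev += unzigzag(z)
        values.append(prev / scale)
    return values
```

With this scheme, 224.19 followed by 225.57 takes 5 bytes. That is 3 bytes for the first value and 2 for the delta of 138, compared with 16 bytes for two doubles.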
4 votes, 2 answers

Discovering Consecutive Repetitive Patterns in a String

I am trying to search for the maximal number of substring repetitions inside a string. Here are a few examples: "AQMQMB" => QM (2x) "AQMPQMB" => "AACABABCABCABCP" => A (2x), AB (2x), ABC (3x) As you can see I am searching for…
Y.H.
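The excerpt is truncated, so the exact requirement is an assumption here: report every non-repeating unit together with the longest run of back-to-back copies, keeping only runs of length two or more. The sketch takes a simple quadratic approach. For each period length it scans for maximal stretches where the string matches itself shifted by that period, and reports the leftmost unit of each stretch. This matches the three examples in the question. `tandem_repeats` and `is_primitive` are hypothetical names.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repetition of a shorter string."""
    return (unit + unit).find(unit, 1) == len(unit)


def tandem_repeats(s: str) -> dict[str, int]:
    """Map each primitive unit to its longest run of consecutive copies (>= 2)."""
    found: dict[str, int] = {}
    n = len(s)
    for period in range(1, n // 2 + 1):
        j = 0
        while j < n - period:
            if s[j] != s[j + period]:
                j += 1
                continue
            start = j
            # Extend while the string keeps matching itself shifted by `period`.
            while j < n - period and s[j] == s[j + period]:
                j += 1
            reps = (j - start + period) // period
            unit = s[start:start + period]
            if reps >= 2 and is_primitive(unit):
                found[unit] = max(found.get(unit, 0), reps)
    return found
```

The result is `{"QM": 2}` for "AQMQMB", an empty dict for "AQMPQMB", and `{"A": 2, "AB": 2, "ABC": 3}` for "AACABABCABCABCP".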
3 votes, 1 answer

Which data compression algorithm is used in WinRAR?

Which compression algorithm is used in WinRAR? I am working on file compression techniques, so which algorithm would be best for compressing audio/video files?
Sachin Mhetre
3 votes, 2 answers

Are there any current C/C++ libraries filled with well-known compression algorithms?

I am looking for a C or C++ library that includes several well-known compression algorithms (particularly lossless ones), for the purpose of developing a custom compression scheme and comparing it to generic solutions. I have found one, called Basic…
Rubix
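This is not the C/C++ library the question asks for. Still, a quick way to compare a custom scheme against generic solutions is a small harness over Python's standard `zlib`, `bz2` and `lzma` modules, which wrap the zlib, libbzip2 and liblzma C libraries. The same loop structure carries over to C. In the sketch below, `CODECS` and `compare` are hypothetical names, and a custom codec would be added as one more `(compress, decompress)` entry.

```python
import bz2
import lzma
import time
import zlib
from typing import Callable

Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

CODECS: dict[str, Codec] = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
}


def compare(data: bytes, codecs: dict[str, Codec] = CODECS) -> dict[str, tuple[int, float]]:
    """Return {name: (compressed_size, seconds)} after checking a lossless round trip."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        blob = compress(data)
        elapsed = time.perf_counter() - t0
        if decompress(blob) != data:
            raise RuntimeError(f"{name} did not round-trip")
        results[name] = (len(blob), elapsed)
    return results
```

A custom scheme would be registered as, for example, `CODECS["mine"] = (my_compress, my_decompress)`, where both functions are your own (hypothetical here).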
3 votes, 2 answers

Design question on storing meteorological data on SQL Server 2008

We're using SQL Server 2008 R2 Enterprise Edition. We are measuring meteorological data from what we call MetMasts. Basically this is a mast with lots of equipment: anemometers (for wind speed) at different positions on the mast, thermometers, and…
3 votes, 1 answer

Run-Length Encoding assumptions

When implementing run-length encoding (RLE), can I assume that each run's length will fit in one byte, so that there will never be a run like this: WWWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB... where there…
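In general a run can be arbitrarily long, so a one-byte counter cannot be assumed to suffice. A common remedy is to cap the count at 255 and split longer runs into several chunks. The sketch below assumes a hypothetical output format of `(count, byte)` pairs, each count from 1 to 255.

```python
def rle_encode(data: bytes) -> bytes:
    """Emit (count, byte) pairs; runs longer than 255 are split into chunks."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        out += bytes((run, b))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, b = encoded[k], encoded[k + 1]
        out += bytes((b,)) * count
    return bytes(out)
```

For "WWW" followed by 600 B's, the encoder emits four pairs: (3, W), (255, B), (255, B), (90, B).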
3 votes, 1 answer

Minimum File Size for Compression Algorithms

I know that for small files the compressed output can sometimes be larger than the original file. Are minimum file sizes known for popular compression libraries such as gzip and lz4? I am dealing with files that are ~384 bytes.
bzak
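There is no single break-even size, because it depends on the data. However, each container adds a fixed overhead on top of the raw DEFLATE stream:

- zlib adds 6 bytes: a 2-byte header and a 4-byte Adler-32 checksum.
- gzip adds at least 18 bytes: a 10-byte header and an 8-byte CRC-32/length trailer.

LZ4 is not in the Python standard library and is not covered here. The sketch below measures the sizes for a given payload. It also shows one common workaround, a one-byte flag that stores the data uncompressed when compression does not pay. `container_sizes`, `pack` and `unpack` are hypothetical names.

```python
import gzip
import zlib


def container_sizes(payload: bytes) -> dict[str, int]:
    """Compressed size of the same payload in three DEFLATE-based formats."""
    raw = zlib.compressobj(level=9, wbits=-15)  # negative wbits: no header/trailer
    deflate = raw.compress(payload) + raw.flush()
    return {
        "original": len(payload),
        "raw_deflate": len(deflate),
        "zlib": len(zlib.compress(payload, 9)),  # + 2-byte header, 4-byte Adler-32
        "gzip": len(gzip.compress(payload, compresslevel=9)),  # + >= 10-byte header, 8-byte trailer
    }


def pack(payload: bytes) -> bytes:
    """Prefix a 1-byte flag; fall back to storing raw when zlib doesn't shrink it."""
    compressed = zlib.compress(payload, 9)
    if len(compressed) < len(payload):
        return b"\x01" + compressed
    return b"\x00" + payload


def unpack(blob: bytes) -> bytes:
    return zlib.decompress(blob[1:]) if blob[:1] == b"\x01" else blob[1:]
```

With the flag, the worst case for a ~384-byte file is one extra byte.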
3 votes, 2 answers

Universal archive unpacker library

A lot of antivirus programs can unpack most archives found on users' hard drives. They dissect .zip, .rar, .chm, .exe, .msi (and other installers) and a lot more. They can also unpack an executable (get resources from it, unpack a packed executable and…
osgx
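A universal unpacker typically starts by identifying the format from its leading "magic" bytes rather than from the file extension, and then hands the file to a format-specific extractor. The sketch below covers only that first dispatch step. The signature table is a small, non-exhaustive selection, and `sniff`/`sniff_file` are hypothetical names. An "MZ" executable or an OLE compound file (the container used by .msi) may embed further archives, so in practice detection would continue inside them.

```python
from typing import Optional

SIGNATURES: list[tuple[bytes, str]] = [
    (b"PK\x03\x04", "zip"),
    (b"Rar!\x1a\x07", "rar"),  # shared prefix of RAR 1.5-4.x and RAR5 markers
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"MSCF", "cab"),
    (b"ITSF", "chm"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole-compound (e.g. msi)"),
    (b"MZ", "dos/pe executable"),
]


def sniff(header: bytes) -> Optional[str]:
    """Guess the container type from the first bytes of a file."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    # POSIX (ustar) tar stores "ustar" at offset 257 of the first header block.
    if header[257:262] == b"ustar":
        return "tar"
    return None


def sniff_file(path: str) -> Optional[str]:
    with open(path, "rb") as f:
        return sniff(f.read(512))
```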
3 votes, 5 answers

Data Aggregation in SQL Server 2005

I need a query for SQL Server 2005 (SQL Server Management Studio Express). I have data stored in a 1-minute time frame (one row per minute); each table's columns are ID, Symbol, DateTime, Open, High, Low, Close, Volume. I need to convert (compress)…
Alberto acepsut
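The conversion described is the usual OHLCV roll-up. Rows are grouped by symbol and time bucket, and each output bar takes:

- Open from the first row,
- High as the maximum,
- Low as the minimum,
- Close from the last row,
- Volume as the sum.

The sketch below shows that logic in Python rather than T-SQL. It assumes rows arrive sorted by symbol and time, and that the bucket width divides 60 minutes. `aggregate_bars` is a hypothetical name.

In SQL Server, the bucket is commonly computed as `DATEADD(minute, DATEDIFF(minute, 0, [DateTime]) / N * N, 0)`. MIN, MAX and SUM map directly to aggregates, while the first Open and last Close need an extra step such as ROW_NUMBER() or a join back to the bucket's first and last rows.

```python
from collections.abc import Iterable
from datetime import datetime

# (symbol, timestamp, open, high, low, close, volume)
Row = tuple[str, datetime, float, float, float, float, float]


def aggregate_bars(rows: Iterable[Row], minutes: int) -> dict[tuple[str, datetime], list[float]]:
    """Roll 1-minute OHLCV rows (sorted by symbol, then time) into wider bars."""
    if 60 % minutes:
        raise ValueError("minutes must divide 60 in this sketch")
    bars: dict[tuple[str, datetime], list[float]] = {}
    for symbol, dt, o, h, l, c, v in rows:
        bucket = dt.replace(minute=dt.minute - dt.minute % minutes, second=0, microsecond=0)
        bar = bars.get((symbol, bucket))
        if bar is None:
            bars[(symbol, bucket)] = [o, h, l, c, v]  # first row supplies Open
        else:
            bar[1] = max(bar[1], h)
            bar[2] = min(bar[2], l)
            bar[3] = c  # latest row so far supplies Close
            bar[4] += v
    return bars
```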
3 votes, 2 answers

Any possible pitfalls of employing string compression to decrease database size?

Background: One of our SQL Server 2012 databases is getting a bit large, at least compared to our other databases. I was running some queries and noticed that we are currently storing large amounts of XML/HTML data in one of the columns. This is…
Narnian
3 votes, 1 answer

Need MySQL-compatible compress()/decompress() for Java

I'm thinking of applying the MySQL compress() function to a varchar field that tends to run from a few thousand characters to more than a million per column. The text is almost normal English, so I get 8-to-1 or even better compression.…
fishtoprecords
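As we understand the MySQL documentation, COMPRESS() returns a non-empty string in this layout:

- a 4-byte uncompressed length, low byte first,
- followed by a zlib-format stream,
- with an extra `.` appended when the result would otherwise end in a space.

An empty input stays empty. The sketch below reproduces that layout in Python so the format is easy to inspect. In Java, `java.util.zip.Deflater` and `Inflater` (in their default, non-`nowrap` mode) produce and consume the same zlib format, and a little-endian `ByteBuffer` handles the length prefix. `mysql_compress` and `mysql_uncompress` are hypothetical names. Verify the result against a value actually produced by your server.

```python
import struct
import zlib


def mysql_compress(data: bytes) -> bytes:
    """Mimic MySQL COMPRESS(): 4-byte little-endian length + zlib stream."""
    if not data:
        return b""  # an empty string stays empty
    body = zlib.compress(data)
    if body.endswith(b" "):
        body += b"."  # guard against trailing-space trimming
    return struct.pack("<I", len(data)) + body


def mysql_uncompress(blob: bytes) -> bytes:
    """Inverse of mysql_compress; tolerates the optional trailing '.'."""
    if not blob:
        return b""
    (expected,) = struct.unpack("<I", blob[:4])
    d = zlib.decompressobj()
    data = d.decompress(blob[4:]) + d.flush()  # a trailing '.' lands in d.unused_data
    if len(data) != expected:
        raise ValueError("length prefix does not match decompressed size")
    return data
```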
2 votes, 2 answers

GZIPOutputStream not properly compressing a String for HTTP Response

I'm writing a simple Java http server that responds with JSON data. I'm trying to GZip the data before sending it, but it usually sends back gzipped data that produces an error in the browser. For example, in Firefox it says: Content Encoding…
DFx
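The excerpt is cut off, so the actual bug cannot be confirmed. Two frequent causes with `GZIPOutputStream` are:

- sending the bytes before calling `finish()` or `close()`, which leaves the gzip trailer missing;
- sending a `Content-Length` header that carries the uncompressed size.

The Python sketch below shows the same ordering. It encodes the JSON to bytes, compresses the complete body, and only then sets `Content-Encoding` and a `Content-Length` equal to the compressed size. `encode_body` and `JsonHandler` are hypothetical names. The `Accept-Encoding` check is deliberately simplistic and ignores quality values.

```python
import gzip
import json
from http.server import BaseHTTPRequestHandler


def encode_body(payload, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Return (body, headers); gzip only when the client advertises support."""
    raw = json.dumps(payload).encode("utf-8")  # encode to bytes before compressing
    headers = {"Content-Type": "application/json; charset=utf-8"}
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(raw)  # complete stream, trailer included
        headers["Content-Encoding"] = "gzip"
    else:
        body = raw
    headers["Content-Length"] = str(len(body))  # size of the bytes actually sent
    return body, headers


class JsonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body, headers = encode_body({"status": "ok"}, self.headers.get("Accept-Encoding", ""))
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, start a server with `HTTPServer(("127.0.0.1", 8000), JsonHandler).serve_forever()` (`HTTPServer` also comes from `http.server`).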