Technology

New Record Set In Data Compression

A developer from New York has set a new record in data compression. In doing so, he won the “Hutter Prize for lossless compression of human knowledge,” which involves shrinking an excerpt from Wikipedia down as far as possible.

Shrinking Wikipedia

The new award winner is Saurabh Kumar, as announced by the prize’s founder. Kumar is a developer who, in his day job, works for high-frequency trading and financial services funds. He is therefore well accustomed to handling large amounts of data, which gave him an advantage when working on better compression algorithms.

The Hutter Prize sets a standardized task: compress a 1 billion byte (1 gigabyte) excerpt from the online encyclopedia Wikipedia as much as possible, in a form from which the original can be restored without any loss. Kumar beat the previous record holder with a result that was 1.04 percent smaller.
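To make the requirement concrete: an entry only counts if decompression reproduces the original bytes exactly. The minimal sketch below illustrates that round-trip check with Python’s general-purpose zlib module on toy data; it only demonstrates the lossless criterion and has nothing to do with the record-setting algorithm itself.

```python
import zlib

def lossless_roundtrip(data: bytes) -> float:
    """Compress, decompress, verify bit-exact recovery, and return the size ratio."""
    compressed = zlib.compress(data, 9)          # strongest zlib setting
    restored = zlib.decompress(compressed)
    assert restored == data, "compression was not lossless"
    return len(compressed) / len(data)

sample = ("Wikipedia is a free online encyclopedia. " * 1000).encode("utf-8")
print(f"compressed to {lossless_roundtrip(sample):.2%} of the original size")
```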

11.41 percent!

Specifically, the developer’s compression algorithm shrank the original data set to just 114,156,155 bytes, i.e. around 114 megabytes, leaving only 11.41 percent of the original size. Since the payout depends on the degree of compression achieved, the amount is not a round figure: the developer is now entitled to 5,187 euros in prize money.
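For orientation, these figures line up with the Hutter Prize’s payout rule of roughly 5,000 euros per percent of improvement over the previous record (500,000 euros times the relative size reduction). The sketch below redoes that arithmetic; the previous record size of 115,352,938 bytes is back-calculated from the reported 1.04 percent improvement and should be read as an assumption.

```python
# Back-of-the-envelope check of the figures reported in the article.
ORIGINAL_SIZE   = 1_000_000_000   # the 1 billion byte Wikipedia excerpt
NEW_RECORD      = 114_156_155     # Kumar's compressed size
PREVIOUS_RECORD = 115_352_938     # assumed prior record, back-calculated from the 1.04 % claim

remaining   = NEW_RECORD / ORIGINAL_SIZE
improvement = (PREVIOUS_RECORD - NEW_RECORD) / PREVIOUS_RECORD

# Payout rule: 500,000 euros times the relative improvement,
# i.e. roughly 5,000 euros per percent gained over the previous record.
prize_eur = 500_000 * improvement

print(f"share of original size: {remaining:.4%}")              # about 11.4 %
print(f"improvement over previous record: {improvement:.2%}")  # about 1.04 %
print(f"prize money: {prize_eur:,.0f} euros")                  # about 5,187 euros
```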

In an era of enormous storage capacity and fast transmission speeds, data compression may seem less useful than it was a few years ago. That impression is misleading, however. The point is not only the direct use cases in everyday life: research into such algorithms also offers deeper insights into working with data structures, insights that are in greater demand today than ever for advancing AI development.