File compression performance and efficiency tips and tricks

How to optimize maximum compression and fast compression


File archival and compression consolidates multiple input files into a single output archive, removing data redundancies, so the output is both smaller (saving disk space and upload/download bandwidth) and easier to handle than the separate input files.
A common concern when compressing data - either for backup or for file distribution - is balancing a worthwhile compression ratio with reasonably fast operation, so that, for example, end users can unpack the data in a timely fashion, or a backup process completes within a fixed maximum amount of time.
As goals and constraints vary between scenarios, the factors affecting compression efficiency must be weighted carefully, keeping the intended use of the data in mind first of all.
The following sections cover the factors that most influence compression efficiency and deserve the closest attention, along with the options for obtaining the best results.

Lossless compression uses statistical models to map the input to a smaller output by eliminating redundancy in the data.
In this way the output carries exactly all the information featured by the input, in fewer bytes, and can be expanded when needed into a 1:1 copy of the original data - a fundamental property for storing some types of data, e.g. software or a database.

For this reason, lossless compression algorithms are used for the archive file formats handled by general purpose archive manager utilities, like 7Z, RAR, and ZIP, where an exact and reversible image of the original data must be saved.
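
As an illustration of reversibility, here is a minimal sketch using Python's standard zlib module (the sample data is arbitrary): the decompressed output is byte-for-byte identical to the input.

```python
import zlib

# Redundant input compresses well; any byte string works here.
data = b"example example example - redundant data compresses well " * 100

packed = zlib.compress(data, 9)       # 9 = highest zlib compression level
restored = zlib.decompress(packed)

assert restored == data               # lossless: exact 1:1 reversal
print(f"{len(data)} bytes -> {len(packed)} bytes")
```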

Lossy compression, instead, works by identifying unnecessary or less relevant information (not just redundant data) and removing it.
This improves compression, but makes the process non-reversible, as part of the information is permanently lost.
Lossy compression is consequently not suitable for general purpose file archiving (losing even a single byte of an executable file, for example, would render it non-functional), but it works very well when the loss of less relevant information is acceptable, as in multimedia compression: MP3 discards audio information below the audibility threshold, JPEG discards details not visible in the image, and compressed video formats do both.
So, the information loss destroys the ability to reverse the algorithm 1:1 (the information is permanently gone), but it does not prevent end users from receiving meaningful information - intelligible audio, a clear picture or video.
The most common lossy compression algorithms are consequently fine-tuned for the specific patterns of a given multimedia data type.
Due to the lossy nature of these compression schemes, professional editing work is usually performed on uncompressed data (e.g. WAV audio, or TIFF images) or on losslessly compressed data (e.g. FLAC audio, or PNG images) whenever feasible, so that saving the work in progress multiple times does not shed bits of information each time, with progressive degradation of quality; lossy compression is usually reserved for the final step, to create a reasonably sized output to distribute for media consumption.
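
As a hedged illustration of generation loss, a minimal sketch assuming the third-party Pillow library and a hypothetical photo.png source image: each lossy re-save discards a little more information.

```python
from PIL import Image  # assumes Pillow is installed: pip install Pillow

# Hypothetical losslessly-stored source; convert drops alpha, which JPEG lacks.
img = Image.open("photo.png").convert("RGB")
img.save("generation_1.jpg", quality=75)

# Re-opening and re-saving a lossy file compounds the loss at every pass.
for i in range(2, 6):
    Image.open(f"generation_{i - 1}.jpg").save(f"generation_{i}.jpg", quality=75)
```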

General-purpose good practices for improving data compression efficiency

You usually don't need to archive duplicate files. Identifying and removing duplicates before archiving decreases the input size, improving both operation time and final size, and at the same time makes it easier for the end user to navigate and search a tidier archive. Do not remove duplicate files if they are required in their original paths, e.g. by a software package or an automated procedure.
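
A minimal sketch of hash-based duplicate detection in Python (the folder name is hypothetical): files sharing the same SHA-256 digest carry identical content and only need to be archived once.

```python
import hashlib
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under root by SHA-256 digest; groups of 2+ are duplicates."""
    groups: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups.setdefault(digest, []).append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

# Hypothetical folder; review the report before deleting anything.
for digest, paths in find_duplicates("documents/").items():
    print(digest[:12], [str(p) for p in paths])
```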

Identify poorly compressible files and evaluate whether to spend time compressing them or simply store them "as is". Multimedia files (MP3, JPG, MPEG, AVI, DIVX...) tend to be poorly compressible, as those formats feature lossy compression, and, especially videos, are usually very large compared to other file types (documents, applications), so it should be evaluated carefully whether they should be compressed at all - using the "Store" compression level provided by most file archivers, meaning compression is disabled - or simply copied "as is".
To reduce the disk usage of graphic files (JPEG, PNG, TIFF, BMP) see pictures compression and optimization tips.
Some document formats (PDF, Open Office, and Office 2007 and later file formats), and some databases, are already compressed (usually with fast Deflate-based lossless compression), so they generally do not compress well.
Encrypted data is not compressible at all: being pseudo-random, there is no "shorter way" to represent the information it carries in encrypted form.
Separating poorly compressible data from the rest is a good starting point when defining a compression policy, to decide the best strategy for both types of data.
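
One simple way to separate poorly compressible data is to test-compress a small sample of each file, as in this sketch using Python's standard zlib module (the 0.95 threshold and the file names are arbitrary assumptions):

```python
import zlib
from pathlib import Path

def is_poorly_compressible(path: str, sample_size: int = 1 << 20) -> bool:
    """Deflate the first MiB of a file; little shrinkage suggests 'Store'."""
    sample = Path(path).read_bytes()[:sample_size]
    if not sample:
        return False
    ratio = len(zlib.compress(sample, 6)) / len(sample)
    return ratio > 0.95  # arbitrary cutoff: almost no size reduction

print(is_poorly_compressible("movie.avi"))   # hypothetical file: likely True
print(is_poorly_compressible("report.txt"))  # hypothetical file: likely False
```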

The highest compression ratio is usually attained with slower and more computing-intensive algorithms: i.e. RAR is a slower and more powerful compressor than ZIP, and 7Z is slower and more powerful than RAR, see file format compression comparison and benchmarks.
Different data types may lead to different results with different compression algorithms: for example, the weaker RAR and ZIPX compression can close the gap with the stronger 7Z compression when multimedia files are involved, thanks to efficiently optimized multimedia filters that RAR and ZIPX apply when suitable data structures are detected - though lossy-compressed multimedia files remain poorly compressible in any case.
Switching to a more powerful algorithm is usually more effective at improving compression ratio than using the highest compression settings of a weaker algorithm.
It should be evaluated carefully whether better compression is really needed (after deduplication and evaluation of poorly compressible files), or whether the archive is mainly being made for reasons other than saving file size, i.e. applying encryption, handling the content as a single file, etc.
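
The trade-off can be observed with Python's standard library, which ships a Deflate-family codec (zlib, as used in ZIP), bzip2, and LZMA (the algorithm family behind 7Z): on redundant input, stronger algorithms typically produce smaller output at a higher CPU cost, although exact results depend on the data.

```python
import bz2
import lzma
import time
import zlib

data = b"the quick brown fox jumps over the lazy dog - " * 20000

for name, codec in (("deflate/zip", zlib), ("bzip2", bz2), ("lzma/7z", lzma)):
    start = time.perf_counter()
    packed = codec.compress(data)           # each module's default settings
    elapsed = time.perf_counter() - start
    print(f"{name:12} {len(packed):8} bytes  {elapsed:.3f} s")
```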

Solid compression is an option meant to improve compression ratio by providing a wider context to the compression algorithm while compressing multiple files.
The ideas behind solid compression are simple and effective:
  • when multiple files are processed as a single block (especially similar files, e.g. of the same type, or even revisions of the same file), redundant data can be found across the files of the group, giving a more efficient compressed representation than treating each file separately
  • when many small files are processed as a single block, overhead content (markers of file begin/end, checksums, table of contents) is written only once rather than once per file, saving extra bytes for each input object.
Solid compression is used in compressed TAR files (TAR.GZ, TAR.BZ2, TGZ, TXZ...), and is available as an option for some archival formats, like 7Z and RAR.

The main drawbacks of solid compression are:
  • the context information is also needed during compression / extraction to preserve the advantage of solid compression, so partial extraction (a single file or group of files rather than the whole archive) from a solid archive, or adding or deleting files in an existing solid archive, takes more time, because all the relevant context data (usually called the "solid block") must be parsed, making the process significantly slower than adding / extracting data in a non-solid archive
  • for the very same reason, damage in any part of the archive may make all data after that point unusable, for lack of the context information needed for extraction, while data corruption in a non-solid archive usually harms only a single file's data.
To mitigate those disadvantages, the 7Z format allows choosing the block size used for solid mode operation (the "window" of data context parsed by the compression/extraction algorithm), minimizing overhead during extraction and the possible impact of data corruption - but, for the very same reason, reducing the solid block size potentially reduces the compression ratio gains.
Solid blocks can be defined by size, by number of files per block, and optionally separated by file extension.
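
A minimal sketch of setting these options from a script, assuming the 7z command-line tool is installed and on the PATH; the -ms switch controls solid mode (on/off, a block size such as 64m, e to group files by extension, or a files-per-block count such as 100f), while the archive name and input folder are hypothetical.

```python
import subprocess

# Create a 7Z archive with 64 MB solid blocks: a compromise between
# compression ratio (larger blocks) and partial-extraction speed plus
# damage resilience (smaller blocks).
subprocess.run(
    ["7z", "a", "-t7z", "-ms=64m", "backup.7z", "documents/"],
    check=True,  # raise an exception if 7z reports an error
)
```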

Choose carefully whether the intended use of the compressed data warrants high compression or solid compression: the more often the data will need to be extracted, the more times the computational overhead will apply, for each end user.
For example, software distribution greatly benefits from maximum compression, as saving bandwidth is critical and each end user usually extracts the data only once, while the overhead may not be acceptable if the data needs to be accessed often and the fastest extraction time becomes the decisive efficiency advantage.

Fitting size constraints (i.e. mail attachment limits, physical media size) is usually possible from most archival utilities by splitting the output file into volumes of the desired size (volume spanning, or file split), progressively numbered .001, .002, ... .nnn, so the receiver can extract the whole archive, usually by saving all the files in the same path and starting extraction from the .001 file.
This is the simplest and most reliable way to fit within a mandatory output size, rather than trying to improve compression ratio with slower/heavier algorithms and settings in the hope of hitting the desired target size.
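
A minimal sketch of volume splitting in plain Python (archive managers such as PeaZip provide this natively; the file name and the 25 MB limit are hypothetical):

```python
from pathlib import Path

def split_file(path: str, volume_size: int) -> None:
    """Split a file into .001, .002, ... volumes of at most volume_size bytes."""
    src = Path(path)
    with src.open("rb") as f:
        index = 1
        while chunk := f.read(volume_size):
            Path(f"{src}.{index:03d}").write_bytes(chunk)
            index += 1

# Hypothetical: split an archive into volumes fitting a 25 MB attachment limit.
split_file("backup.7z", 25 * 1024 * 1024)
```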

Quite obviously, the best data compression practices mean nothing if the file cannot be delivered to the intended end user. If the archive needs to be shared, the first concern is which archive file types the end user is able to read - which archive formats are supported, or can be supported, on the end user's computing platform (Microsoft Windows, Google Android/ChromeOS, iOS, Apple OSX, Linux, BSD...) - and whether the user is willing and authorized to install the needed software.
So most of the time the best choice in this case is staying with the most common format (ZIP), while RAR is quite popular on MS Windows platforms, TAR is ubiquitously supported on Unix-derived systems, and 7Z is becoming increasingly popular on all systems.

Some file sharing platforms, cloud services, and e-mail providers may block some file types on the grounds that they are commonly abused (spam, viruses, illicit content), preventing them from reaching the intended end user(s), so it is critical to read the terms of service to avoid this issue.
Usually changing the file extension is not a solution: each archive file has a well-defined internal structure (which is needed for the file to function properly, so it can hardly be cloaked), and file format recognition is seldom based on simply parsing the file extension.
In some other cases, all encrypted files, or all files in unknown/unsupported formats that the service provider is unable to inspect or scan for viruses, are blocked.

Self-extracting archives are useful to provide the end user with the appropriate extraction routines without the need to install any software, but since the extraction module is embedded in the archive it represents an overhead of some tens or hundreds of KB, which is a noticeable disadvantage only for very small archives (e.g. approximately less than 1 MB) - a range that is, however, typical of an archive of a few textual documents. Moreover, since a self-extracting archive is an executable file, some file sharing platforms, cloud providers, and e-mail servers may block it, preventing it from reaching the intended receiver(s).
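
With the 7z command-line tool, a self-extracting archive can be produced with the -sfx switch, which prepends the extraction module to the archive data; the file names below are hypothetical.

```python
import subprocess

# -sfx embeds the default self-extractor module, producing an executable
# that unpacks itself without any archiver installed on the receiving end.
subprocess.run(
    ["7z", "a", "-sfx", "installer.exe", "release_files/"],
    check=True,
)
```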

The Zero delete function (File tools submenu) is intended for overwriting file data or free partition space with an all-zero stream, in order to fill the corresponding physical disk area with homogeneous, highly compressible data.
This saves space when compressing disk images - either low-level physical disk snapshots taken for backup purposes, or Virtual Machine guest virtual disks - as the 1:1 exact copy of the disk content is not burdened with leftover data in the free space area. Some disk imaging utilities and Virtual Machine players/managers have built-in compression routines; zeroing the free space beforehand is strongly recommended to improve their compression ratio.
Zero deletion also offers a basic degree of security improvement over PeaZip's "Quick delete" function, which simply removes the file from the filesystem, making it not recoverable from the system's recycle bin but still susceptible to recovery with undelete utilities. Zero deletion, however, is not meant for advanced security, and PeaZip's Secure delete should be used instead when a file must be securely and permanently erased, or free space on a volume sanitized, for privacy reasons.
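
A minimal sketch of the zero-fill idea behind free space zeroing (PeaZip implements this natively; the mount point and filler file name are hypothetical, and note the volume is temporarily filled to capacity while the function runs):

```python
import os

def zero_fill_free_space(mount_point: str, chunk_mb: int = 64) -> None:
    """Overwrite a volume's free space with zeros, then remove the filler."""
    filler = os.path.join(mount_point, "filler.zero")
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    try:
        with open(filler, "wb") as f:
            while True:
                f.write(chunk)   # keep writing zeros until the disk is full
    except OSError:
        pass                     # disk full: free space now holds only zeros
    finally:
        os.remove(filler)

zero_fill_free_space("/mnt/backup")  # hypothetical mount point
```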

Related articles:  Add content to already existing archive, Convert existing archive files, Create 7Z files, RAR files, Create ZIP files, Encrypted files, Find duplicate files, Comparison of archive file formats, File compression benchmarks
