Internet, IT and Technology

Compress and decompress files and entire directories on Linux


Not everything comes with a manual, and documentation can be hard to find, especially on an operating system that is new to us, as is often the case with Linux. The terminal is a very powerful tool for any user who knows how to use it, and you don't need to be an expert. We only need to know a few basic commands, such as those for compressing and decompressing entire directories, including all the files and folders inside them.

This applies to any Linux distribution: Debian, Ubuntu, CentOS, and so on.


Compress an entire directory or file

The most basic way to compress is by using the command:

tar -zcvf file-2080-04-01.tar.gz /home/user/folder

We can also use the following command, which is equivalent to the previous one but uses the long option names:

tar --gzip --create --verbose --file=file-2080-04-01.tar.gz /home/user/folder

The resulting compressed file is created in the current directory. For example, if we log in as root from the terminal and immediately run the previous command, the file file-2080-04-01.tar.gz is created in /root, which is the root user's home directory and the directory we start in.

The parameters of the command used are:

  • z, --gzip: filter the archive through gzip to compress it.
  • c, --create: create a new archive, in this case the tar container, which by itself has no compression.
  • v, --verbose: list each file as it is processed.
  • f, --file=: indicates that the next argument is the archive file, in this case the .tar.gz output file.
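To see these options working together, the following is a minimal sketch that builds a small throwaway directory, archives it, and then lists what ended up inside. The directory layout and file names are hypothetical, and the t (--list) option is used only to inspect the result.

```shell
set -eu

# Scratch directory (hypothetical layout) so no real data is touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p folder/sub
echo "hello" > folder/a.txt
echo "world" > folder/sub/b.txt

# z = gzip, c = create, f = archive name.
# f goes last in the group because it takes the next argument.
tar -zcf file.tar.gz folder

# t (--list) prints the archive's contents without extracting them.
listing=$(tar -ztf file.tar.gz)
echo "$listing"
```

Listing an archive before extracting it is a cheap way to confirm that the paths stored inside are the ones we expect.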

In order to compress a directory or file, and choose the location, we have two options:

1. The first is to use the cd command to move to the directory where we want the file to be created.

cd /backups/
tar -zcvf file-2080-04-01.tar.gz /home/user/folder

2. The second option is to give the full path of the output file, using absolute paths for both the source and the destination:

tar -zcvf /home/user/file-2080-04-01.tar.gz /home/user/file
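The second option lends itself to small backup scripts. As a sketch under assumed names (the source and backups directories below are hypothetical stand-ins created in a temporary location), the archive name can carry the current date in the same style as the examples above.

```shell
set -eu

# Hypothetical source and destination directories inside a scratch area.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/source" "$workdir/backups"
echo "data" > "$workdir/source/notes.txt"

# date +%F prints the current date as YYYY-MM-DD.
archive="$workdir/backups/file-$(date +%F).tar.gz"

# Both paths are absolute, so the current directory does not matter.
# GNU tar warns that it removes the leading "/" from member names;
# that message is informational and the command still succeeds.
tar -zcf "$archive" "$workdir/source"

ls -l "$archive"
```

Because the leading slash is removed, extracting such an archive recreates the path relative to wherever it is extracted, not at its original absolute location.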

Choose compression software

The above was a very basic usage with the gzip compression algorithm, but the tar command supports multiple algorithms. Each algorithm offers a different trade-off: each one produces a different final file size and runs at its own speed.

If we run tar --help, the available options are displayed; the ones we will use are listed in the Compression options section.

Option                          Description
-j, --bzip2                     bzip2
-J, --xz                        xz
--lzip                          lzip
--lzma                          lzma
--lzop                          lzop
--no-auto-compress              Do not choose the algorithm from the file extension.
--zstd                          zstd
-z, --gzip                      gzip
-Z, --compress                  compress (LZW)
-a, --auto-compress             Choose the algorithm automatically based on the file extension.
-I, --use-compress-program=     Use an external program installed on our system.

The compression algorithms may vary depending on the tar version and the utilities installed on our system.
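Since tar calls the compressor as a separate program, an option from the table only works if that program is installed. A quick way to check is sketched below; check_compressors is a hypothetical helper name, and the list of tools simply mirrors the table.

```shell
# Report which of the compressors named in the table are on the PATH.
check_compressors() {
    for tool in gzip bzip2 xz lzip lzma lzop zstd compress; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: installed"
        else
            echo "$tool: not found"
        fi
    done
}

check_compressors
```

Anything reported as not found has to be installed with the distribution's package manager before the matching tar option can be used.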

To use a compression algorithm other than gzip, we follow the steps described at the beginning, but replace the -z or --gzip option with the option for the algorithm we want.

For example, we are going to create a compressed file from a directory with the zstd algorithm.

tar --zstd --create --file=/home/compressed-file.tar.zst /home/user/directory/

# or
tar --zstd -cf /home/compressed-file.tar.zst /home/user/directory/

To compress with a program that is not in the list but is installed on our system, we run the following.

tar --use-compress-program=brotli -cf /home/compressed-file.tar.br /home/user/directory/

For the above command to work correctly, the external program must accept the -d option, which tar passes to it when decompressing.

We can also let tar choose the compressor from the file extension. Here we just add the -a option and leave out any algorithm option, so that the algorithm is detected from the archive name.

tar -acf /home/compressed-file.tar.gz /home/user/directory/
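It is worth seeing what -a actually changes. The sketch below, using hypothetical file names in a scratch directory, creates one archive with -a and one without any compression option, then uses gzip -t (an integrity test) to check which of the two really contains gzip data.

```shell
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir directory
echo "sample" > directory/file.txt

# With -a, the .tar.gz suffix selects gzip; no -z is needed.
tar -acf by-suffix.tar.gz directory
gzip -t by-suffix.tar.gz && echo "by-suffix.tar.gz: gzip data"

# Without -a or -z, the same suffix is only a name:
# the file is a plain, uncompressed tar archive.
tar -cf misleading.tar.gz directory
if gzip -t misleading.tar.gz 2>/dev/null; then
    echo "misleading.tar.gz: gzip data"
else
    echo "misleading.tar.gz: NOT gzip data"
fi
```

In other words, the extension alone never compresses anything; either -a or an explicit algorithm option has to be present when creating the archive.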

Why is gzip used by default?

gzip is widely used not because it is the best, but because it is very fast and offers an acceptable compression ratio.

What is the best compressor overall?

Various benchmarks show that zstd offers a good balance between compression time and compression ratio. The name is short for Zstandard, and it was created at Facebook.

Choose compression level

By default, tar does not set a compression level, which means that each compressor runs at its own default level.

To choose a custom level, we use tar only to archive the files and directories, and then compress the resulting archive with the compressor itself. We can do this in separate steps or with a pipe.

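# --file=- writes the archive to standard output so it can be piped.
# --ultra only unlocks levels 20 to 22, so a level must follow it.
tar --create --file=- example.txt | zstd --ultra -22 > example.tar.zst

# or
tar -cf - example.txt | zstd -19 > example.tar.zst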

Keep in mind that each program has its own compression levels and options. Decompression works the same way as for an archive created without a custom level.

Why use tar instead of using a compressor directly?

The tar utility was originally designed to package files and directories into a single archive while preserving their structure. This makes them easier to transfer, compress and store. tar itself does not compress anything; it only acts as a container, which is why it is used together with its compression options.

Another reason to use tar is that most compressors work on individual files, so to compress several files at once we would have to walk through the directories file by file. For example, the help output of the brotli command shows that it takes files as input, not directories.

brotli --help

Usage: brotli [OPTION]... [FILE]...

On the other hand, if we use zstd directly, it accepts many input files and can even descend into directories with its -r option, but it still compresses each file separately instead of producing a single archive.

By using tar, we avoid these limitations and can compress any type of content without problems.
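As an illustration of the structure being preserved, the sketch below archives a small nested tree (hypothetical names), extracts it elsewhere, and compares the two copies with diff -r.

```shell
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A small nested tree with files at different depths.
mkdir -p project/src/lib project/docs
echo "echo util" > project/src/lib/util.sh
echo "readme"    > project/docs/README.txt

# One archive holds the whole tree.
tar -zcf project.tar.gz project

# Extract into a separate directory; -C selects where to extract.
mkdir restored
tar -zxf project.tar.gz -C restored

# diff -r compares both trees recursively and is silent when they match.
if diff -r project restored/project; then
    echo "directory tree restored intact"
fi
```

A compressor used on its own would have produced one compressed file per input file; here a single file carries the whole hierarchy.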

Decompress a .tar.gz file

In the current directory.

We use the following command from the directory where the .tar.gz file is located:

tar -zxvf file-2080-04-01.tar.gz

Alternative with named parameters:

tar --gunzip --extract --verbose --file=file-2080-04-01.tar.gz

In the long form, --gunzip filters the archive through gzip to decompress it, --extract extracts the files from the archive given in --file=, and --verbose prints each file as it is processed.

In a custom path.

To choose a custom path where the files will be extracted, we add the -C option:

tar -zxvf file-2080-04-01.tar.gz -C /tmp/folder

We can also navigate to the destination directory and run, for example:

tar -zxvf /home/user/file-2080-04-01.tar.gz

With this, all the contents of the archive will be decompressed in the current directory, regardless of whether the .tar.gz file is elsewhere.
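One detail to keep in mind with -C is that tar does not create the destination directory; if it does not exist, the command fails. A minimal sketch, with hypothetical paths inside a scratch directory, creates it first with mkdir -p.

```shell
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir folder
echo "hello" > folder/a.txt
tar -zcf file.tar.gz folder

# Hypothetical destination; mkdir -p creates it (and any parents) if missing.
dest="$workdir/tmp/folder"
mkdir -p "$dest"

# Extract into the destination instead of the current directory.
tar -zxf file.tar.gz -C "$dest"

ls -R "$dest"
```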

Decompress file with custom compressor

It is quite easy to decompress files created with a custom compression program. We just have to tell tar which program the file was compressed with, using the same option as when we compressed it.

For example, in the case of a file compressed with Brotli.

tar --use-compress-program=brotli -xf example.tar.br

# or
tar -I brotli -xf example.tar.br

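GNU tar detects the formats it recognizes itself, such as gzip, bzip2 or xz, when reading an archive, so for those -x works without a compression option. Brotli is not one of them, so if we use -x without specifying the program, it gives us an error.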

tar -xf example.tar.br

tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

Finally, if we try the gzip option -z on the same file, we get the following error instead. So the only option that works for this file is the first one, specifying the program the file was compressed with.

tar -zxf example.tar.br

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
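The two cases can be combined in a small wrapper. The sketch below is only an illustration: extract_archive is a hypothetical helper, not part of tar, and it relies on GNU tar detecting formats it recognizes (such as gzip) from the file contents when reading, while naming brotli explicitly for .tar.br files. The demonstration at the end uses a gzip archive so that it runs even where brotli is not installed.

```shell
set -eu

# extract_archive ARCHIVE DEST  (hypothetical helper)
extract_archive() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    case "$archive" in
        # tar does not detect Brotli, so the program is named explicitly.
        *.tar.br) tar --use-compress-program=brotli -xf "$archive" -C "$dest" ;;
        # Formats tar recognizes are detected when reading the archive.
        *)        tar -xf "$archive" -C "$dest" ;;
    esac
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir folder
echo "hello" > folder/a.txt
tar -zcf example.tar.gz folder

# No -z here: the gzip format is detected automatically.
extract_archive example.tar.gz out
cat out/folder/a.txt
```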
