
15 most useful Linux commands for file system maintenance


One of the most common and tedious tasks of a sysadmin is preventing file systems from becoming completely full, because when a server runs out of space the consequences are unpredictable. Depending on how the root file system is structured and whether it is divided into different partitions or volumes, those consequences will be more or less severe, but in any case undesirable.

It is always better to be safe than sorry, so use tools that perform automatic log rotation, such as logrotate, together with custom scripts that monitor disk usage and periodically empty files, to keep file systems from filling up. However, even with these prevention methods in place, there will still be many occasions when you have to act manually to troubleshoot problems.
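For reference, a minimal logrotate snippet could look like the following. This is only a sketch: the file /etc/logrotate.d/myapp and the log location are hypothetical examples, so adjust them to your system.

# /etc/logrotate.d/myapp (hypothetical example)
# Rotate weekly, keep four compressed copies,
# and tolerate missing or empty logs.
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}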

Below I have gathered the collection of Linux commands that I find most useful in my day-to-day work for freeing up disk space and keeping my file systems in optimal health from the command line.

1. Check free space available

To find the free space available on all the file systems of a computer, execute the following command:

~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb5       3.9G  842M  3.0G  22% /
udev            2.0G  4.0K  2.0G   1% /dev
tmpfs           791M  956K  790M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            2.0G   72M  1.9G   4% /run/shm
/dev/sdb1       118M   91M   28M  77% /boot
/dev/sdb10       31G  6.8G   24G  23% /home
/dev/sdb6        23G   12G   12G  52% /usr
/dev/sdb7       9.6G  1.5G  8.1G  16% /var
/dev/sdb8       2.9G   62M  2.8G   3% /tmp
/dev/sdb11      102G   11G   91G  11% /vmware
/dev/sda1       230G   93G  125G  43% /var/datos

For a specific directory:

~$ df -h /home
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb10       31G  6.8G   24G  23% /home

To display the file systems in order of occupancy, so you can tell which ones are fullest:

~$ df -h | awk '{print $5 " " $6}' | sort -n | tail -5
22% /
23% /home
43% /var/datos
52% /usr
77% /boot

2. Calculate directory size

The -s parameter prints a single summarized total for the directory, and the -h parameter shows that size in a friendly, human-readable way, in kilobytes, megabytes, gigabytes or terabytes as appropriate.

~# du -h -s /var/log
9.6M	/var/log
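If instead of a single total you want a breakdown one level deep, to see which subdirectory weighs the most, you can combine du's --max-depth option with sort -h. This is a sketch assuming a reasonably recent GNU coreutils, where both options are available:

~# du -h --max-depth=1 /var | sort -h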

3. Remove vs empty files

We usually use the rm command to remove files in order to free up space. However, it is very common that we cannot delete a file because it is being used at that moment by an application, which happens most often with log files on production systems that cannot be stopped. Removing such a file directly can have harmful effects, from hanging the application to a milder but also undesirable situation: the application keeps its file descriptor open and continues dumping data to the now-deleted file, so the data is lost to us and the disk space is not actually freed until the application is restarted.

To avoid altering application behavior while still achieving our goal of freeing up disk space, we will empty the files instead of deleting them:

~# >/var/log/syslog

After this the file will be 0 bytes in size.
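If you prefer an explicit command over shell redirection, for example because sudo >/var/log/syslog would perform the redirection as your unprivileged user rather than as root, the truncate utility from GNU coreutils achieves the same result:

~# truncate -s 0 /var/log/syslog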

If you need to empty multiple files at once with a single command:

~# for f in /var/log/*.log; do > "$f"; done
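As a commenter points out below, you can obtain the same effect without a loop by using tee, which truncates every file it is given before writing the (empty) standard input to it:

~# tee /var/log/*.log < /dev/null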

4. Count the number of files in a directory

~# ls -l /var/log | wc -l
80
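Keep in mind that ls -l prints a leading "total" line, so the real count is one less than shown, and directories are counted too. For scripts, a find-based count (a variant of one suggested in the comments below, assuming GNU find's -printf) is more robust because it is immune to odd characters in file names:

~# find /var/log -mindepth 1 -maxdepth 1 -printf . | wc -c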

5. Get the biggest files in a file system

This command is also useful when you want to free up space, as it shows the largest subdirectories hanging from a directory, including the directory itself (add the -a parameter if you also want individual files to be listed).

~# du -k /var/log | sort -n | tail -5
516	/var/log/apache2
5256	/var/log/exim4
12884	/var/log/installer/cdebconf
13504	/var/log/installer
21456	/var/log

The file sizes are displayed in kilobytes (-k parameter). You must use this parameter instead of -h, otherwise sort -n will not order the list the way you expect (with a recent GNU coreutils you can instead pair du -h with sort -h).

It’s important to limit the number of lines displayed with tail -X, where X is that number: if the directory in question contains hundreds or thousands of entries, the output may add too much input/output overhead to your terminal and slow the response down considerably, especially if you are connected remotely via telnet or ssh over a slow link.
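If what you want are the largest individual files in the tree rather than directory totals, GNU find can print each file's size directly. A sketch, assuming GNU find's -printf is available:

~# find /var/log -type f -printf '%s\t%p\n' | sort -n | tail -5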

6. List the largest files in a directory

Similar to the above, but in this case subdirectories are not included.

~# ls -lSr | tail -5
-rw-r----- 1 syslog      adm   118616 Sep 29 22:05 auth.log
-rw-r--r-- 1 root        root  149012 Sep  9 17:12 udev
-rw-r--r-- 1 root        root  160128 Aug  4 19:27 faillog
-rw-r----- 1 syslog      adm   499400 Sep 28 06:25 auth.log.1
-rw-rw-r-- 1 root        utmp 1461168 Sep 29 21:54 lastlog

If the -r parameter is removed, the smallest files will be listed instead of the largest ones.

7. Calculate the size of only certain files

For example, if you want to get the total size of only the .log files in a directory, use the following command:

~# du -ch /var/log/*.log | grep total
468K	total
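The glob above only covers files in the top level of the directory. To include .log files in subdirectories as well, you can feed du a NUL-separated list produced by find. This sketch assumes GNU find and du, which support -print0 and --files0-from:

~# find /var/log -name '*.log' -print0 | du -ch --files0-from=- | tail -1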

8. Find large files by setting size boundaries

For example, those greater than 100 MB in size, or those between 100 MB and 1 GB:

~$ find . -type f -size +100M -ls
~$ find . -type f -size +100M -size -1G -ls

9. List the most recently modified files

~# ls -larth /var/log | tail -5
drwxr-xr-x 20 root              root     4.2K Sep 30 12:27 .
-rw-rw-r--  1 root              utmp      11K Sep 30 13:03 wtmp
-rw-r-----  1 syslog            adm       13K Sep 30 13:03 syslog
-rw-r-----  1 syslog            adm       13K Sep 30 13:03 kern.log
-rw-r-----  1 syslog            adm      1000 Sep 30 13:03 auth.log

The -a parameter makes hidden files also be displayed; -t sorts by modification time, -r reverses the order so that the most recent files appear last, and -h prints sizes in human-readable units.

10. Find old files (I)

Many times we need to find the files modified within a given time interval. In the following example, files not modified in the last 90 days are located, in order to identify old files that are no longer in use and can safely be removed to free up space.

~# find /var/log -mtime +90 -ls
~# find /var/log -mtime +90 -ls -exec rm {} \;

The first command only locates the files; the second one also removes them.
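With GNU find the same removal can be done with the built-in -delete action; adding -type f is a sensible precaution so directories are never matched. Review the listing produced by the first command before deleting anything:

~# find /var/log -type f -mtime +90 -delete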

11. Find old files (II)

Same as above, but in this case the last access time (-atime) is considered instead of the last modification time, so it finds files that have not even been read within the specified period. Bear in mind that many systems mount file systems with the relatime or noatime options, which can make access times less reliable.

~# find /var/log -atime +90 -ls

12. Find empty files

The following command allows you to find files in the current directory with a size of 0 bytes, i.e. empty. This is useful in the anomalous situations in which such files are generated, for example after a file system reached 100% occupancy and applications tried unsuccessfully to write to disk, or through abnormal application behavior. Cleaning up is necessary in these scenarios because, although empty files take up no disk space, when they are created massively they can consume all the available file system inodes, which in turn prevents any new files from being created.

~$ find . -type f -size 0b -ls

Or:

~$ find . -type f -empty -ls
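Once you have reviewed the list and confirmed the files are expendable, GNU find's -delete action can remove them in a single pass:

~$ find . -type f -empty -delete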

To know the number of free inodes available in a file system use the df -i command.
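For example, to check the inode usage of the file system /home resides on:

~$ df -i /home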

13. Package and compress directory content

Sometimes it’s useful to package all log files in a directory into a single compressed tar file to preserve the state of that directory at a given point in time and then safely remove or empty all those files in order to free up space.

~# tar -zcvf var_log.`date +%Y%m%d`.tar.gz /var/log/*.log

The last command compresses all the log files into a single file with a .tar.gz extension and today's date in its name, to make it easier to locate in the future. Let's see how much space this saves; in this example we go from 468 MB to 35 MB:

~# du -ch /var/log/*.log | grep total
468M	total
~# ls -lh var_log.20140930.tar.gz 
-rw-r--r-- 1 root root 35M Sep 30 13:36 var_log.20140930.tar.gz

After that we can proceed and empty all log files as in section #3.
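Before emptying the originals, it is worth listing the archive's contents to confirm everything was captured correctly:

~# tar -ztvf var_log.20140930.tar.gz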

14. Find files in Recycle Bin

Normally when we send a file to the recycle bin it is simply moved to a hidden folder inside your home directory, such as ~/.local/share/Trash in Ubuntu. However, some applications use their own trash directories, named with some combination of the word trash, in upper or lower case, and a sequence of numbers: .Trash001, .trash-002, .Trash_0003, etc.

Also, when file systems from external hard drives or SD cards are mounted, the recycle bin's name can differ from one operating system to another, which can cause it not to be recognized; as a result, even after the bin is emptied, the device continues to show a large amount of space in use for no apparent reason.

Therefore, the solution is to search for all the trash subdirectories on your system, ignoring upper/lower case differences, and analyze their contents to see what can be discarded (not everything found is necessarily garbage).

The following is the required command. It can take a long time to run, so you may want to restrict it to a specific file system or directory:

~$ find / -iname "*trash*" -ls
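To see which of the matches actually hold a significant amount of data, you can restrict the search to directories and measure each one, discarding errors from unreadable paths:

~$ find / -iname "*trash*" -type d -exec du -sh {} + 2>/dev/null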

15. Find duplicate files

Finally, here’s a huge command that will allow you to find and delete duplicate files under a directory, avoiding unnecessary redundancies that can be very costly in terms of disk space.

~$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n.' '\t.' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -pe 's/([ (){}-])/\\$1/g' | perl -pe 's/'\''/\\'\''/g' | xargs -pr rm -v
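If installing an extra package is an option, the fdupes tool produces the same kind of report in a much more readable way. This is an alternative to the one-liner above, not part of it:

~$ fdupes -r .

Adding the -d parameter makes fdupes prompt you interactively about which copy of each duplicate set to keep and which to delete.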

And what about you? Do you have other useful commands that you usually use to keep your file systems as empty as you can?


About the author

Daniel López Azaña

Tech entrepreneur and cloud architect with over 20 years of experience transforming infrastructures and automating processes.

Specialist in AI/LLM integration, Rust and Python development, and AWS & GCP architecture. Restless mind, idea generator, and passionate about technological innovation and AI.


Comments

Andrea Urrego February 25, 2015
It was helpful!! tks
Michael A Smith August 31, 2016
Great write up. A few of these commands use the antipattern of parsing the output of `ls`. `ls` is not a good tool to parse. In the example of truncating multiple files, consider `for file in /var/log/*.log; do > "$file"; done` or even just `tee /var/log/*.log < /dev/null`. For counting the number of files in a directory `ls | wc -l` is fine for humans. But if you need to script it I'd recommend `find . -mindepth 1 -maxdepth 1 -printf . | wc -c`
Daniel January 9, 2017
Thank you for your interesting contribution!
srinivas April 24, 2018
This was a nice page, helpful for me.
hurd September 7, 2018
Best and very useful for me, thanks very much.
Alex October 9, 2020
Mr. Daniel! Thanks for the nice hints. But the command for duplicates - find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n.' '\t.' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -i -pe 's/([ (){}-])/\\$1/g' | perl -i -pe 's/'\''/\\'\''/g' | xargs -pr rm -v - throws this (in this form): -i used with no filenames on the command line, reading from STDIN. -i used with no filenames on the command line, reading from STDIN. What is happening? Thanks in advance!
Daniel February 27, 2021
Just remove the -i parameter from perl commands, it's not actually necessary.
