Sometimes you get an incident for high usage on a filesystem. You check and yes, ‘df’ shows the filesystem is nearly full, but ‘du’ (disk usage) reports something very different. Why?
Some explanations say the two tools don’t use the same methods or metrics to calculate what they report.
Yes, that’s true, but under normal conditions their outputs should be roughly the same.
If you ask me, my answer is easy and simple: HUMAN ERROR! If there is human intervention in the system, always assume someone did something wrong.
In this case we have a 1.1T filesystem:
[root@prod_dbnode ~]# df -h /u02
Filesystem Size Used Avail Use% Mounted on
/dev/xvdh 1.1T 943G 85G 92% /u02
[root@prod_dbnode ~]#
[root@prod_dbnode ~]# du -sh /u02
148G /u02
[root@prod_dbnode ~]#
Hey! Did you see that? There is a difference of roughly 800G.
When you see a huge difference like this, in almost all cases it is because someone deleted a huge file while the OS process that opened it is still running (and therefore still holding the space). ‘du’ walks the directory tree, so it no longer counts the deleted file, but ‘df’ asks the filesystem for its allocated blocks, and those are not released until the last open file descriptor is closed.
This is a real case where someone saw a huge trace file and decided to delete it, but the space never got released and the file kept growing and growing:
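To see the effect on a small scale, here is a minimal sketch that reproduces it in a scratch directory. It assumes Linux with /proc and GNU coreutils; the directory, the file name big.trc and fd number 9 are made up for the demo.

```shell
# Reproduce "deleted but still open" with a 1 MiB file (demo names are made up).
workdir=$(mktemp -d)             # scratch directory for the demo
exec 9> "$workdir/big.trc"       # open fd 9 for writing, like a process writing a trace
head -c 1048576 /dev/zero >&9    # write 1 MiB through the descriptor
rm "$workdir/big.trc"            # remove the name; fd 9 still holds the inode

visible=$(find "$workdir" -type f | wc -l)   # what a tree walk like du can reach
held=$(stat -L -c %s /proc/$$/fd/9)          # what the kernel still stores for fd 9
echo "files visible to du: $visible, bytes still held: $held"
# prints: files visible to du: 0, bytes still held: 1048576

exec 9>&-                        # closing the descriptor is what finally frees the space
rmdir "$workdir"
```

The directory looks empty, yet the megabyte is still allocated until the descriptor is closed. That is the same gap as between ‘du’ and ‘df’ on /u02, just 800,000 times smaller.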
[root@prod_dbnode ~]# lsof | grep -i deleted | grep /u02 | sort -nk7 | tail -4
oracle_19 199546 oracle 21w REG 202,112 1492 56080968 /u02/app/oracle/diag/rdbms/primaryDB/instance1/trace/instance1_ora_199546.trm (deleted)
oracle_20 203218 oracle 20w REG 202,112 3689 55984621 /u02/app/oracle/diag/rdbms/primaryDB/instance1/trace/instance1_ora_203218.trc (deleted)
oracle_19 199546 oracle 20w REG 202,112 4881 56080967 /u02/app/oracle/diag/rdbms/primaryDB/instance1/trace/instance1_ora_199546.trc (deleted)
ora_p007_ 116364 oracle 47w REG 202,112 810482773836 56039058 /u02/app/oracle/diag/rdbms/primaryDB/instance1/trace/instance1_p007_116364.trc (deleted) <<---- 810G trace
[root@prod_dbnode ~]#
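If lsof is not installed on the box, the same information can be read straight from /proc. The helper below is only a sketch: the name list_deleted_open is made up, and it assumes Linux and GNU stat. Run it as root, otherwise you only see your own processes.

```shell
# list_deleted_open PREFIX
# Print "bytes pid fd path" for every open-but-deleted file whose path starts
# with PREFIX, sorted by size so the biggest offender comes last.
# Hypothetical helper, not a standard tool.
list_deleted_open() {
    prefix=$1
    for link in /proc/[0-9]*/fd/*; do
        # fds can vanish while we scan, so skip anything we cannot read
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$prefix"*" (deleted)")
                size=$(stat -L -c %s "$link" 2>/dev/null) || continue
                pid=${link#/proc/}; pid=${pid%%/*}
                echo "$size $pid ${link##*/} $target"
                ;;
        esac
    done | sort -n
}

list_deleted_open /u02 | tail -4
```

As far as I know, lsof can also filter unlinked files by itself with its +L1 option (for example `lsof +aL1 /u02`), which avoids the grep for "deleted". Check the man page of your lsof version before relying on it.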
What do you do to fix this? Well, the easy fix is to stop the OS process, which releases the space right away.
But what if the running process is critical and you cannot stop it until you get a maintenance window? The only option is to truncate the deleted file through the file descriptor that still points to it:
* We need to check the fd's (file descriptors) of OS process 116364, in this case fd 47:
[root@ryderprod-ajr2k2 ~]# ls -tlr /proc/116364/fd | grep deleted
l-wx------ 1 oracle asmadmin 64 Jun 14 23:29 47 -> /u02/app/oracle/diag/rdbms/primaryDB/instance1/trace/instance1_p007_116364.trc (deleted) (deleted)
[root@ryderprod-ajr2k2 ~]#
* Just null (truncate) the file and voilà! Space released:
[root@prod_dbnode ~]# cd /proc/116364/fd
[root@prod_dbnode fd]# > 47
[root@prod_dbnode fd]#
[root@prod_dbnode fd]# df -h /u02
Filesystem Size Used Avail Use% Mounted on
/dev/xvdh 1.1T 139G 890G 14% /u02
[root@prod_dbnode fd]#
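Typing "> 47" in the wrong directory or against the wrong fd would wipe a file you still need, so a small guard does not hurt. This is a sketch only: the function name truncate_deleted_fd is made up, and it assumes Linux /proc and that you are root or the owner of the process.

```shell
# truncate_deleted_fd PID FD
# Empty an open-but-deleted file through /proc/PID/fd/FD.
# Hypothetical helper: refuses to touch anything not marked "(deleted)".
truncate_deleted_fd() {
    link=/proc/$1/fd/$2
    case $(readlink "$link" 2>/dev/null) in
        *" (deleted)")
            : > "$link"      # same effect as "> 47" inside /proc/116364/fd
            ;;
        *)
            echo "refusing: $link is not an open deleted file" >&2
            return 1
            ;;
    esac
}

# Usage for the case in this post:
#   truncate_deleted_fd 116364 47
```

Keep in mind the process is still alive and still writing, so the file will start growing again. If the process did not open the file in append mode, its next write lands at the old offset and the file size looks huge again, but the file is then sparse and the freed blocks stay free. Either way, the real cleanup happens when the process is restarted in the maintenance window.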