SRE/DevOps Interview Questions — Linux Troubleshooting

Satyajit Roy on 2022-04-07

I have been on both sides of the table, as interviewer and as interviewee, for DevOps and SRE roles. In this blog I share some of the questions I have been asked or have asked.

Note: This is just to share knowledge, experience and some fun questions.

Linux Troubleshooting

Any DevOps or SRE interview commonly starts with some troubleshooting questions, where the interviewer tries to probe your knowledge of Linux internals and some basic core concepts. Here are some that come to mind.

1. What happens when a Linux system boots, until you get a login prompt?

This type of question usually comes from companies that still run bare-metal servers and don't use any public cloud. So let's see what happens.

Detailed Answer is available here

2. What happens when you type ls in a terminal?

This type of question is used to gauge the interviewee's attention to detail and depth of knowledge of Linux internals. Basically, the interviewer wants to know whether you understand the fork() and exec() system calls.

The shell reads what you typed (a simple shell might use the getline() function for this) and splits the line into tokens (for example with strtok()). The shell then checks whether the first token, ls, is a shell alias, and then whether it is a shell built-in. If it is neither, the shell searches the directories listed in the PATH environment variable, in order, for an executable file named ls (for example /usr/bin/ls). Once it finds the binary, the shell makes the fork() system call. This creates a child process, a copy of the shell, and the shell becomes its parent. fork() returns 0 in the child, so it knows it has to act as the child, and returns the child's PID to the parent (i.e. the shell).

Next, the child process calls the execve() system call, which replaces its address space with a brand-new one loaded from the ls binary. Now ls can start running its program. To list a directory, ls reads the directory entries (via opendir()/readdir(), which on Linux use the getdents64() system call) and, for options such as -l, calls stat() on each entry to read the metadata stored in its inode. Meanwhile, the shell waits for the child with wait()/waitpid().

Once ls has finished, it exits with status 0 to denote a normal execution (on Linux this ends in the exit_group() system call). The kernel frees its resources, and the shell collects the exit status and prints a new prompt.
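The fork()/execve()/wait sequence described above can be sketched in Python, whose os module exposes thin wrappers over these calls (os.waitstatus_to_exitcode needs Python 3.9+). This is a simplified model, not how bash is actually implemented: it assumes no aliases, built-ins, pipes or redirection, uses shutil.which() to stand in for the shell's PATH search, and run_command is a hypothetical helper name.

```python
import os
import shutil
import sys


def run_command(argv):
    """Run argv roughly the way a shell does: PATH lookup, fork(), execv(), waitpid()."""
    # Search the directories listed in $PATH, like the shell does for "ls".
    path = shutil.which(argv[0])
    if path is None:
        print(f"{argv[0]}: command not found", file=sys.stderr)
        return 127  # conventional shell exit status for "command not found"

    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process image with the program.
        try:
            os.execv(path, argv)
        finally:
            os._exit(127)  # only reached if execv() failed

    # Parent (the "shell"): fork() returned the child's PID; wait for it to exit.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = run_command(["ls", "-l"])
    print("ls exited with status", code)
```

Running the script under strace -f should show a clone()-family call (glibc builds fork() on it), followed by execve() in the child and wait4() in the parent.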

Note: You can run strace ls to see every system call ls makes and dig deeper.

3. Explain Linux Inodes

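An inode number identifies an inode. An inode is a data structure that stores the metadata of a file or directory: its type, permissions, owner, size, timestamps, link count and pointers to its data blocks. It does not store the file name; names live in directory entries, which map a name to an inode number.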

Detailed Answer is available here
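To see that a file name is just a directory entry pointing at an inode, here is a small Python sketch (the file names and the inode_demo helper are made up for the demo). It creates a hard link and then renames the original; os.stat() should report the same st_ino for every name and a link count (st_nlink) of 2. On the command line, ls -i and stat show the same information.

```python
import os
import tempfile


def inode_demo():
    """Show that names are directory entries pointing at a single inode."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hardlink = os.path.join(d, "hardlink.txt")
        renamed = os.path.join(d, "renamed.txt")

        with open(original, "w") as f:
            f.write("hello inode\n")
        os.link(original, hardlink)  # a second directory entry for the same inode

        st_orig = os.stat(original)
        st_link = os.stat(hardlink)
        os.rename(original, renamed)  # rename only rewrites the directory entry
        st_renamed = os.stat(renamed)

        return st_orig.st_ino, st_link.st_ino, st_renamed.st_ino, st_link.st_nlink


if __name__ == "__main__":
    ino_a, ino_b, ino_c, links = inode_demo()
    print(f"inodes: {ino_a} {ino_b} {ino_c}, link count: {links}")
```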

4. Crash vs Panic

A crash usually happens when a trap occurs because the application tried to access memory incorrectly. A panic is usually when the application kills or shuts itself down abruptly. The main difference between a crash and a panic is that a crash is initiated by the hardware or the OS, while a panic is usually initiated by the application itself, for example by calling the abort() function. Some applications install a special function called a signal handler to record information about the trap; others rely on gdb to collect the same information.

The most common signals caused by programming errors are SIGSEGV, SIGBUS and SIGILL, usually caused by bad memory management: a bad pointer, uninitialized values or memory corruption.
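One way to see the difference from the outside is to check which signal terminated a process. A minimal Python sketch, assuming a Linux/POSIX system: it runs two throwaway child interpreters, one that reads address 0 through ctypes (an invalid memory access that the hardware traps, so the kernel sends SIGSEGV) and one that calls abort() on itself (SIGABRT), then reports the terminating signal. The CHILDREN table and how_did_it_die helper are hypothetical names; depending on ulimit -c the children may also leave core dumps you could open in gdb.

```python
import signal
import subprocess
import sys

# Throwaway child programs (hypothetical examples, each run in its own interpreter).
CHILDREN = {
    # Invalid memory access: reading address 0 traps in hardware, kernel sends SIGSEGV.
    "crash": "import ctypes; ctypes.string_at(0)",
    # Deliberate self-termination: abort() raises SIGABRT.
    "panic": "import os; os.abort()",
}


def how_did_it_die(code):
    """Run code in a child Python process and report how it terminated."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if proc.returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
        return signal.Signals(-proc.returncode).name
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    for label, code in CHILDREN.items():
        print(f"{label}: {how_did_it_die(code)}")
```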

5. Explain the /proc filesystem

/proc is special in that it is a virtual filesystem. It's sometimes referred to as a process information pseudo-filesystem. It doesn't contain 'real' files but runtime system information. Many system utilities, such as ps and top, simply read files in this directory.

/proc contains a directory named after the PID of every running process, and /proc/self points to the directory of the process that accesses it. If you do cd /proc/self, you will see a lot of files whose size is 0; however, they do contain information when you read them.

/proc/[pid]/maps provides information about the memory address space (the mapped regions) of the process

/proc/[pid]/cmdline contains the command-line arguments of the process

/proc/[pid]/environ contains the initial environment that was set when the process's current program was started

/proc/[pid]/fd contains a symbolic link for each file descriptor the process currently has open, pointing to the underlying file

/proc/locks shows all the file locks that currently exist in the system

/proc/sys/fs contains some useful information, such as file-nr, which shows the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles on the system

/proc/sys/vm holds files and information to tune virtual memory
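A few of the files listed above can be read directly. This Python sketch assumes a Linux host with /proc mounted; the helper names are hypothetical. It also shows the size-0 behaviour mentioned earlier: os.stat() reports 0 bytes for /proc/self/cmdline even though reading it returns data.

```python
import os


def read_cmdline(pid="self"):
    """Arguments in /proc/[pid]/cmdline are separated by NUL bytes."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return [arg.decode() for arg in f.read().split(b"\0") if arg]


def open_fds(pid="self"):
    """Map each open file descriptor to the target of its /proc/[pid]/fd symlink."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            pass  # e.g. the descriptor os.listdir() itself used, already closed again
    return result


def count_mappings(pid="self"):
    """Each line of /proc/[pid]/maps describes one mapped region of the address space."""
    with open(f"/proc/{pid}/maps") as f:
        return sum(1 for _ in f)


def file_handles():
    """/proc/sys/fs/file-nr: allocated handles, allocated-but-unused handles, maximum."""
    with open("/proc/sys/fs/file-nr") as f:
        allocated, unused, maximum = (int(x) for x in f.read().split())
    return allocated, unused, maximum


if __name__ == "__main__":
    print("size of /proc/self/cmdline:", os.stat("/proc/self/cmdline").st_size)
    print("cmdline:", read_cmdline())
    print("open fds:", open_fds())
    print("memory mappings:", count_mappings())
    print("file handles (allocated, unused, max):", file_handles())
```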

6. I get a “filesystem is full” error, but “df” shows there is free space. What do you check?

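First, check whether the filesystem has run out of inodes: look for an IFree of 0 in df -i. Every file needs an inode, so a filesystem holding many small files can run out of inodes while data blocks are still free. If that is not the case, use lsof to see whether deleted files are still held open by running processes (this is also the classic cause when df reports the disk as full but du cannot account for the usage), and restart those processes so the kernel can release the space.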
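Both checks can be approximated in Python by reading the same kernel interfaces the tools use: os.statvfs() returns the inode counts that df -i prints, and scanning /proc/[pid]/fd for link targets ending in " (deleted)" finds roughly what lsof +L1 reports. This is a sketch with hypothetical helper names; without root it only sees your own processes. Some filesystems, such as Btrfs, allocate inodes dynamically and report 0 for both inode counts.

```python
import os


def inode_usage(path="/"):
    """Total and free inodes for the filesystem holding path (the df -i numbers)."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


def deleted_but_open():
    """Find (pid, fd, target) for files that were unlinked but are still held open.

    Only processes we are allowed to inspect are scanned; run as root to see all.
    """
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process already exited
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # The kernel appends " (deleted)" to the link target of unlinked files.
            if target.endswith(" (deleted)"):
                hits.append((int(pid), int(fd), target))
    return hits


if __name__ == "__main__":
    total, free = inode_usage("/")
    print(f"inodes on /: {free} free of {total}")
    for pid, fd, target in deleted_but_open():
        print(f"pid {pid} fd {fd}: {target}")
```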

7. What performance tools would you use on a Linux machine?

  1. uptime
  2. dmesg | tail
  3. vmstat 1
  4. mpstat -P ALL 1
  5. pidstat 1
  6. iostat -xz 1
  7. free -m
  8. sar -n DEV 1
  9. sar -n TCP,ETCP 1
  10. top

Detailed Answer is available here

8. Explain Linux filesystems

The interviewer wants to know how much you understand about Linux filesystems. A filesystem is a specific data storage format, such as ext3, ext4, Btrfs, XFS and so on. Linux supports almost 100 types of filesystems.

Detailed Answer is available here
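One way to see which filesystem types the running kernel currently supports, and which type each mount uses, is to read /proc/filesystems and /proc/mounts. A minimal Python sketch with hypothetical helper names: /proc/filesystems only lists types that are built in or whose modules are loaded, so it is usually shorter than the full list Linux can support. A nodev marker means the type does not need a block device (for example proc or tmpfs).

```python
def supported_filesystems():
    """Filesystem types currently registered with the running kernel."""
    types = []
    with open("/proc/filesystems") as f:
        for line in f:
            # Lines look like "nodev\tproc" or "\text4"; the type is the last field.
            types.append(line.split()[-1])
    return types


def mounted_filesystems():
    """(mount point, type, source) for every current mount."""
    mounts = []
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint, fstype = line.split()[:3]
            mounts.append((mountpoint, fstype, device))
    return mounts


if __name__ == "__main__":
    print(len(supported_filesystems()), "types registered:", supported_filesystems())
    for mountpoint, fstype, device in mounted_filesystems():
        print(f"{mountpoint:30} {fstype:10} {device}")
```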

9. Explain Kernel Space and User Space

This can be a rabbit-hole question; the interviewer can go as deep as they like to find your limits. It is also one of the most interesting topics in Linux: how control flows from user space to kernel space, and why that is important. Why can't we access kernel space directly? What do libraries like libc do, and why do we need system calls?

Detailed Answer is available here
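The user-space/kernel-space split can be observed from an ordinary program. In this Python sketch (helper names are hypothetical), ctypes calls libc's getpid() wrapper, which enters the kernel through a system call, and resource.getrusage() splits CPU time into ru_utime (time spent in user space) and ru_stime (time the kernel spent on the process's behalf, such as serving system calls). The exact numbers vary by machine; the point is that the stat() loop adds system time, while the arithmetic loop accumulates almost only user time.

```python
import ctypes
import os
import resource

# dlopen(NULL): the symbols already loaded into this process, which include libc's.
libc = ctypes.CDLL(None)


def getpid_via_libc():
    """Call libc's getpid() wrapper, which performs the getpid system call."""
    return libc.getpid()


def user_vs_kernel_time(iterations=200_000):
    """Return (user seconds, system seconds) consumed by two kinds of work."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    total = 0
    for i in range(iterations):
        total += i * i  # pure computation: stays in user space
    for _ in range(iterations // 10):
        os.stat("/")  # each call traps into the kernel via a stat-family system call
    after = resource.getrusage(resource.RUSAGE_SELF)
    return after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


if __name__ == "__main__":
    print("pid via libc:", getpid_via_libc(), "pid via os:", os.getpid())
    user, system = user_vs_kernel_time()
    print(f"user time: {user:.3f}s, system time: {system:.3f}s")
```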

10. How would you troubleshoot a high I/O issue?

Detailed Answer is available here

11. What are processes and threads?

A process is a program in execution: it is dispatched from the ready state and scheduled on the CPU for execution. The kernel tracks each process in its PCB (Process Control Block). A process can create other processes, which are known as child processes. Processes are isolated, meaning a process does not share its memory with any other process, and creating or terminating a process takes more time than creating or terminating a thread. Threads, by contrast, run inside a process and share its address space.

Detailed Answer is available here
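The memory-isolation point above is easy to demonstrate. In this Python sketch (hypothetical function names), a thread increments a global counter and the change is visible afterwards, because threads share their process's address space; a child created with os.fork() increments its own copy-on-write copy of the counter, so the parent still sees 0.

```python
import os
import threading

counter = 0  # lives in this process's address space


def bump():
    global counter
    counter += 1


def changed_by_thread():
    """A thread shares our memory, so its update is visible to us."""
    global counter
    counter = 0
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter


def changed_by_child_process():
    """A forked child gets its own copy of memory, so its update is not visible."""
    global counter
    counter = 0
    pid = os.fork()
    if pid == 0:
        bump()  # modifies the child's private copy only
        os._exit(0)
    os.waitpid(pid, 0)
    return counter


if __name__ == "__main__":
    print("after thread:", changed_by_thread())
    print("after child process:", changed_by_child_process())
```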

12. Explain Kernel Memory Management

This is not a trivial question; it is very deep and convoluted. So I would expect the interviewer to only check whether you understand the basics of kernel memory management.

Detailed Answer is available here

13. Explain processes and threads

Detailed Answer is available here

14. Explain the different task states

Detailed Answer is available here

15. Explain Linux concurrency and race conditions

Detailed Answer is available here

16. Explain the stack and the heap in an operating system

Detailed Answer is available here

17. Explain memory leaks

Naive definition: failure to release unreachable memory, which can then no longer be allocated again by any process while the allocating process is running. This can mostly be cured with garbage collection (GC) techniques or detected by automated tools.

Subtle definition: failure to release reachable memory that is no longer needed for your program to function correctly. This is nearly impossible to detect with automated tools, or by programmers who are not familiar with the code. While technically it is not a leak, it has the same implications as the naive one. This is not just my own idea: you can find projects written in garbage-collected languages that still mention fixing memory leaks in their changelogs.

18. How does Linux handle interrupts?

Detailed Answer is available here

19. Explain load average

The best definition and explanation of load average internals is here. I would encourage everybody to go through this website for a deeper understanding of Linux internals.
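On Linux, the numbers that uptime prints can be read from /proc/loadavg. Linux's load average counts tasks that are runnable plus tasks in uninterruptible sleep (D state, often waiting on disk I/O), averaged with exponential decay over 1, 5 and 15 minutes. A rough rule of thumb, not a hard threshold, is to compare the load with the number of CPUs. A minimal Python sketch with hypothetical helper names:

```python
import os


def read_loadavg(path="/proc/loadavg"):
    """Parse a line such as '0.20 0.18 0.12 1/80 11206'."""
    with open(path) as f:
        one, five, fifteen, tasks, last_pid = f.read().split()
    runnable, total = (int(x) for x in tasks.split("/"))
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "runnable": runnable,      # currently runnable scheduling entities
        "total_tasks": total,      # scheduling entities that exist on the system
        "last_pid": int(last_pid), # PID of the most recently created process
    }


def load_per_cpu():
    """1, 5 and 15 minute load averages divided by the number of CPUs."""
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    return one / cpus, five / cpus, fifteen / cpus


if __name__ == "__main__":
    print(read_loadavg())
    print("load per CPU (1, 5, 15 min): %.2f %.2f %.2f" % load_per_cpu())
```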

20. What happens when you curl a website?

This is a very famous question that comes up every now and then. I think we should all be aware of the internal process flow when you run curl. One of the best detailed explanations I found is here. One can certainly argue that this is way too much detail, but there is no harm in knowing things completely: you may not say all of it when asked, but you should certainly know it.
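The first steps curl performs for a plain http:// URL can be sketched with Python's socket module: resolve the name with getaddrinfo(), open a TCP connection (the kernel performs the three-way handshake inside connect()), send an HTTP/1.1 GET request, and read the response until the server closes the connection. This is a simplified sketch assuming plain HTTP only; real curl also handles TLS for https://, redirects, proxies, chunked encoding and much more. tiny_curl is a hypothetical name.

```python
import socket


def tiny_curl(host, port=80, path="/"):
    """Fetch path from host over plain HTTP; return (status line, raw response)."""
    # 1. DNS: resolve the host name to an address (the system resolver).
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        # 2. TCP: connect() triggers the SYN / SYN-ACK / ACK handshake.
        sock.connect(addr)
        # 3. HTTP: send a minimal request; "Connection: close" asks the server to hang up after replying.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. Read until the server closes the connection.
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    response = b"".join(chunks)
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, response


if __name__ == "__main__":
    status, raw = tiny_curl("example.com")
    print(status)
    print(len(raw), "bytes received")
```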

Other awesome resources available out there for interview preparation

  1. Facebook Production Engineer Interview
  2. Facebook Production Engineer Interview
  3. Site Reliability Interview
  4. Engineering Manager Interview
  5. Google SWE Interview
  6. Amazon SWE Interview
  7. Good Troubleshooting Tips and Tricks
  8. Good Refs What is boiling
  9. Linux Perf Analysis
  10. Scalability, Reliability and Performance for Large Systems

I just made this effort to put all of these together in one place. I will keep tracking them and add them here in parts… so stay tuned!!

Happy Troubleshooting and Best of luck!!