1 files changed, 248 insertions, 0 deletions
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
new file mode 100644
index 000000000000..a8ef794aadae
--- /dev/null
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -0,0 +1,248 @@
+Bug hunting
++++++++++++
+
+Last updated: 20 December 2005
+
+Introduction
+============
+
+Always try the latest kernel from kernel.org and build from source. If you are
+not confident in doing that please report the bug to your distribution vendor
+instead of to a kernel developer.
+
+Finding bugs is not always easy. Have a go though. If you can't find it don't
+give up. Report as much as you have found to the relevant maintainer. See
+MAINTAINERS for who that is for the subsystem you have worked on.
+
+Before you submit a bug report read
+:ref:`Documentation/REPORTING-BUGS <reportingbugs>`.
+
+Devices not appearing
+=====================
+
+Often this is caused by udev. Check that first before blaming it on the
+kernel.
+
+Finding patch that caused a bug
+===============================
+
+
+
+Finding using ``git-bisect``
+----------------------------
+
+Using the provided tools with ``git`` makes finding bugs easy provided the bug
+is reproducible.
+
+Steps to do it:
+
+- start using git for the kernel source
+- read the man page for ``git-bisect``
+- have fun
+
+Finding it the old way
+----------------------
+
+[Sat Mar  2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
+
+This is how to track down a bug if you know nothing about kernel hacking.
+It's a brute force approach but it works pretty well.
+
+You need:
+
+        - A reproducible bug - it has to happen predictably (sorry)
+        - All the kernel tar files from a revision that worked to the
+          revision that doesn't
+
+You will then do:
+
+        - Rebuild a revision that you believe works, install, and verify that.
+        - Do a binary search over the kernels to figure out which one
+          introduced the bug.  I.e., suppose 1.3.28 didn't have the bug, but
+          you know that 1.3.69 does.  Pick a kernel in the middle and build
+          that, like 1.3.50.  Build & test; if it works, pick the mid point
+          between .50 and .69, else the mid point between .28 and .50.
+        - You'll narrow it down to the kernel that introduced the bug.  You
+          can probably do better than this but it gets tricky.
+
+        - Narrow it down to a subdirectory
+
+          - Copy kernel that works into "test".  Let's say that 3.62 works,
+            but 3.63 doesn't.  So you diff -r those two kernels and come
+            up with a list of directories that changed.  For each of those
+            directories:
+
+                Copy the non-working directory next to the working directory
+                as "dir.63".
+                One directory at time, try moving the working directory to
+                "dir.62" and mv dir.63 dir"time, try::
+
+                        mv dir dir.62
+                        mv dir.63 dir
+                        find dir -name '*.[oa]' -print | xargs rm -f
+
+                And then rebuild and retest.  Assuming that all related
+                changes were contained in the sub directory, this should
+                isolate the change to a directory.
+
+                Problems: changes in header files may have occurred; I've
+                found in my case that they were self explanatory - you may
+                or may not want to give up when that happens.
+
+        - Narrow it down to a file
+
+          - You can apply the same technique to each file in the directory,
+            hoping that the changes in that file are self contained.
+
+        - Narrow it down to a routine
+
+          - You can take the old file and the new file and manually create
+            a merged file that has::
+
+                #ifdef VER62
+                routine()
+                {
+                        ...
+                }
+                #else
+                routine()
+                {
+                        ...
+                }
+                #endif
+
+            And then walk through that file, one routine at a time and
+            prefix it with::
+
+                #define VER62
+                /* both routines here */
+                #undef VER62
+
+            Then recompile, retest, move the ifdefs until you find the one
+            that makes the difference.
+
+Finally, you take all the info that you have, kernel revisions, bug
+description, the extent to which you have narrowed it down, and pass
+that off to whomever you believe is the maintainer of that section.
+A post to linux.dev.kernel isn't such a bad idea if you've done some
+work to narrow it down.
+
+If you get it down to a routine, you'll probably get a fix in 24 hours.
+
+My apologies to Linus and the other kernel hackers for describing this
+brute force approach, it's hardly what a kernel hacker would do.  However,
+it does work and it lets non-hackers help fix bugs.  And it is cool
+because Linux snapshots will let you do this - something that you can't
+do with vendor supplied releases.
+
+Fixing the bug
+==============
+
+Nobody is going to tell you how to fix bugs. Seriously. You need to work it
+out. But below are some hints on how to use the tools.
+
+To debug a kernel, use objdump and look for the hex offset from the crash
+output to find the valid line of code/assembler. Without debug symbols, you
+will see the assembler code for the routine shown, but if your kernel has
+debug symbols the C code will also be available. (Debug symbols can be enabled
+in the kernel hacking menu of the menu configuration.) For example::
+
+    objdump -r -S -l --disassemble net/dccp/ipv4.o
+
+.. note::
+
+   You need to be at the top level of the kernel tree for this to pick up
+   your C files.
+
+If you don't have access to the code you can also debug on some crash dumps
+e.g. crash dump output as shown by Dave Miller::
+
+     EIP is at ip_queue_xmit+0x14/0x4c0
+      ...
+     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
+     00 00 55 57  56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
+     <8b> 83 3c 01 00 00 89 44  24 14 8b 45 28 85 c0 89 44 24 18 0f 85
+
+     Put the bytes into a "foo.s" file like this:
+
+            .text
+            .globl foo
+     foo:
+            .byte  .... /* bytes from Code: part of OOPS dump */
+
+     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
+     "objdump --disassemble foo.o".
+
+     Output:
+
+     ip_queue_xmit:
+         push       %ebp
+         push       %edi
+         push       %esi
+         push       %ebx
+         sub        $0xbc, %esp
+         mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
+         mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
+         mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt
+
+In addition, you can use GDB to figure out the exact file and line
+number of the OOPS from the ``vmlinux`` file. If you have
+``CONFIG_DEBUG_INFO`` enabled, you can simply copy the EIP value from the
+OOPS::
+
+ EIP:    0060:[<c021e50e>]    Not tainted VLI
+
+And use GDB to translate that to human-readable form::
+
+  gdb vmlinux
+  (gdb) l *0xc021e50e
+
+If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
+offset from the OOPS::
+
+ EIP is at vt_ioctl+0xda8/0x1482
+
+And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
+
+  make vmlinux
+  gdb vmlinux
+  (gdb) p vt_ioctl
+  (gdb) l *(0x<address of vt_ioctl> + 0xda8)
+
+or, as one command::
+
+  (gdb) l *(vt_ioctl + 0xda8)
+
+If you have a call trace, such as::
+
+     Call Trace:
+      [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
+      [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
+      [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
+      ...
+
+this shows the problem in the :jbd: module. You can load that module in gdb
+and list the relevant code::
+
+  gdb fs/jbd/jbd.ko
+  (gdb) p log_wait_commit
+  (gdb) l *(0x<address> + 0xa3)
+
+or::
+
+  (gdb) l *(log_wait_commit + 0xa3)
+
+
+Another very useful option of the Kernel Hacking section in menuconfig is
+Debug memory allocations. This will help you see whether data has been
+initialised and not set before use etc. To see the values that get assigned
+with this look at ``mm/slab.c`` and search for ``POISON_INUSE``. When using
+this an Oops will often show the poisoned data instead of zero which is the
+default.
+
+Once you have worked out a fix please submit it upstream. After all open
+source is about sharing what you do and don't you want to be recognised for
+your genius?
+
+Please do read :ref:`Documentation/SubmittingPatches <submittingpatches>`
+though to help your code get accepted.