Wednesday, November 19, 2008

kernel upgrade to 2.6.27 for mips/arm platform

I ran into a few more interesting issues while upgrading the kernel from 2.6.23 to 2.6.27 for our MIPS32 platform.

1. The timer. As of 2.6.24, the default MIPS timer has been separated from the old timer code. Because our architecture uses the default CP0 compare-register based timer, we need to enable the r4k timer to get the timer working. I spent a couple of days understanding this change. The start_kernel function was able to proceed once the proper timer was enabled. However, we also have a timer block in our chip, which can be set to a given frequency and used as an independent timer source. Later in my debugging, I enabled the timer block and used our own timer source. That works too.

2. The cache is disabled when the kernel starts and has to be re-enabled later on. In 2.6.23, the cache was enabled by default. In 2.6.27 (or some version after 2.6.23), however, the cache policy became configurable through the kernel option "cca=", and the cache is disabled by default. This change really hurt me. With so many changes between the 23 and 27 kernels, it was almost impossible to notice this one up front. What I observed first was that the system was slow: BogoMIPS dropped from about 273 to 3, which is unbelievable. At first I doubted the correctness of the timer code. I scrutinized the code and studied the new timer implementation thoroughly. I even implemented our own timer using the timer block in our chip. None of that helped. The system could boot to busybox, but it was really slow. Almost by accident, I tried our performance counter program to measure the performance. It reported a cache hit count of 0, which means that no cache was enabled. I checked our private I/D cache registers and they seemed to be enabled. However, I forgot to check the MIPS CP0 register settings, where there is another setting that enables or disables the cache policy. I used a very crude, old-fashioned method to pinpoint the problem: I added a NOP test to both the 23 and 27 kernels. In the 23 kernel, the NOP test gives a much lower CPI (cycles per instruction) once the cache is enabled, and a high CPI before that. In the 27 kernel, the CPI never changes. I located the exact point where the CPI drops in the 23 kernel and checked the corresponding code in the 27 kernel. I finally found that the default cache policy was disabled in the 27 kernel, while it is enabled in the 23 kernel. After adding "cca=3" to the kernel command line, everything is back to normal: BogoMIPS recovers and the kernel boots properly.


3. EXPORT_SYMBOL and EXPORT_SYMBOL_GPL. If your driver or kernel module uses symbols exported with the latter, the module itself must be GPL-licensed. This can cause problems for us, as we don't want to open source all of our kernel modules, especially the WLAN driver. We deliver our WLAN drivers as binary kernel modules.
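
For the cache policy in item 2 above, here is a minimal sketch of what the "cca" value refers to. It assumes the MIPS32 layout, in which the K0 field (bits 2..0 of the CP0 Config register) holds the cache coherency attribute for kseg0, with 2 meaning uncached and 3 meaning cacheable noncoherent. The function names are made up for illustration and are not kernel APIs. The code works on a plain integer so it can be tried in user space; in the kernel the value would come from the CP0 Config register itself.

```c
#include <stdint.h>

/* K0 field of the MIPS32 CP0 Config register: bits 2..0. */
#define CONFIG_K0_MASK            0x7u
#define CCA_UNCACHED              2u  /* kseg0 accesses bypass the cache */
#define CCA_CACHEABLE_NONCOHERENT 3u  /* the value selected by "cca=3" */

/* Extract the cache coherency attribute from a Config register value. */
uint32_t config_get_cca(uint32_t config)
{
    return config & CONFIG_K0_MASK;
}

/* Return a Config value with the K0 field replaced by the given attribute. */
uint32_t config_set_cca(uint32_t config, uint32_t cca)
{
    return (config & ~CONFIG_K0_MASK) | (cca & CONFIG_K0_MASK);
}

/* Non-zero if kseg0 is configured as uncached. */
int config_is_uncached(uint32_t config)
{
    return config_get_cca(config) == CCA_UNCACHED;
}
```

Dumping this field early in boot on both kernels would have shown the difference directly, without the NOP test.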

Tuesday, November 18, 2008

kernel upgrade to 2.6.27

I have just fixed a network driver bug that showed up when I upgraded the kernel from 2.6.23 to 2.6.27 for our MIPS platform. It took a couple of weeks. The original problem appeared once NAPI was used in the network driver and I changed the net_poll function accordingly. After that, I got a kernel panic with a memory access failure. After a long debugging session, I found that something goes wrong when the driver tries to derive the address of the skb from skb->data. This is weird, because the same code was used in both the 2.6.23 and the 2.6.27 kernel. The original author of the network driver gave me a hint: he had experienced a similar problem when he was creating a network driver for our next generation chip, based on the old driver. He mentioned that the original driver confused physical and virtual addresses when accessing the DMA'ed memory. This was quite helpful. I spent a whole day digging into this issue and studying the new driver for the next generation chip. After replacing the memory allocation function for the skb buffer, I finally had a proper way to access the memory. I've learned about physical, virtual and bus addresses when accessing memory in the kernel. The principle is simple, as stated by Linus: "use virtual address when accessing memory in kernel, and use bus address when the memory was given to device". On some architectures, the bus address is identical to the physical address. Never use a physical address directly. The functions phys_to_virt, virt_to_bus, bus_to_virt and virt_to_phys are all helper functions for these conversions.
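
To make the virtual/bus distinction concrete, here is a small user-space model. It is not kernel code. It assumes the MIPS32 unmapped segments (kseg0 at 0x80000000, cached; kseg1 at 0xa0000000, uncached), that the buffer lives in the first 512 MB of physical memory, and that the bus address equals the physical address. The descriptor layout and all model_* names are hypothetical. A real driver would normally go through the kernel's DMA mapping API rather than do this arithmetic by hand.

```c
#include <stdint.h>

#define KSEG0_BASE     0x80000000u  /* unmapped, cached */
#define KSEG1_BASE     0xa0000000u  /* unmapped, uncached */
#define KSEG_PHYS_MASK 0x1fffffffu  /* low 29 bits: physical address */

/* kseg0/kseg1 virtual address -> physical address. */
uint32_t model_virt_to_phys(uint32_t vaddr)
{
    return vaddr & KSEG_PHYS_MASK;
}

/* Physical address -> cached kernel virtual address. */
uint32_t model_phys_to_kseg0(uint32_t paddr)
{
    return paddr | KSEG0_BASE;
}

/* Physical address -> uncached kernel virtual address. */
uint32_t model_phys_to_kseg1(uint32_t paddr)
{
    return paddr | KSEG1_BASE;
}

/* Hypothetical receive descriptor as the device sees it. */
struct model_rx_desc {
    uint32_t buf_bus_addr;  /* what the device is given */
    uint32_t len;
};

/* The device gets the bus address, never the CPU's virtual address. */
void model_fill_desc(struct model_rx_desc *d, uint32_t buf_vaddr, uint32_t len)
{
    d->buf_bus_addr = model_virt_to_phys(buf_vaddr); /* assumes bus == phys */
    d->len = len;
}

/* The CPU converts back to a virtual address before touching the data. */
uint32_t model_desc_to_cpu_addr(const struct model_rx_desc *d)
{
    return model_phys_to_kseg0(d->buf_bus_addr);
}
```

The bug described above corresponds to skipping the last step and dereferencing the value stored in the descriptor as if it were a kernel pointer.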

It seems that we still have a lot of bugs in our network driver. Apparently, our engineers don't yet have enough experience writing drivers for Linux. Most of their experience is with VxWorks, with its flat memory model.

Thursday, October 02, 2008

kernel upgrade

I have recently been working on a kernel upgrade from 2.6.23.17 to 2.6.27-rc5 for our VoIP SoCs.

The following are the lessons I learned this time.

* Use git to help migrate the patches. We manage patches in our own system rather than in git, but git can still do the migration work:
  * get a kernel git tree (git fetch)
  * check out a branch for your current kernel version (git branch, git checkout)
  * apply your patches to the git tree (patch, git commit)
  * rebase the git tree onto the newer kernel (git rebase)
  * resolve the conflicts; for each resolved conflict, let git remember your resolution (git rerere)
  * once all conflicts are resolved, generate new patches for the new kernel (git format-patch)


* Get the new kernel to build for your platform first:
  * use make oldconfig to migrate the kernel configuration
  * keep as much of the old configuration as possible

* Turn on most of the kernel debug options, such as early_printk, printk_with_time, spinlock debugging, etc.

* If it doesn't boot, there can be many reasons:
  * check the kernel dump, especially the dumped code, and disassemble the whole kernel to figure out where the dump happened
  * there are usually big changes in each major kernel version, such as the timer change in the MIPS architecture, the path and include changes in the ARM architecture, and the SD/MMC infrastructure changes, so be very careful.

Sunday, September 21, 2008

vmlinux.lds

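vmlinux.lds is the linker script for the Linux kernel. For the MIPS architecture, it is generated and stored in the arch/mips/boot/compressed directory. For the x86 architecture, it is stored in the arch/x86/boot/compressed directory. The script for the uncompressed kernel image itself is generated from vmlinux.lds.S under arch/mips/kernel.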

Wednesday, April 16, 2008

check the version of gcc in your program

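Use the predefined macros __GNUC__ (major version) and __GNUC_MINOR__ (minor version). Testing them in two separate #if lines gives the wrong answer for a version such as 3.4, so combine them in one condition. For example, to detect gcc older than 4.3:

#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 3)
/* code for gcc older than 4.3 */
#endif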
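
A version check like this is easier to reuse when wrapped in a macro. The sketch below is one way to do it; the names GCC_AT_LEAST, version_at_least and compiler_note are my own and not something gcc provides. The plain function performs the same comparison on ordinary integers, which makes the logic easy to check without changing compilers.

```c
#include <stddef.h>

/* True when compiling with gcc (or a compiler that defines the same
 * macros) at version major.minor or newer; 0 for other compilers. */
#if defined(__GNUC__)
#define GCC_AT_LEAST(major, minor) \
    (__GNUC__ > (major) || \
     (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
#define GCC_AT_LEAST(major, minor) 0
#endif

/* Same comparison on ordinary integers. */
int version_at_least(int have_major, int have_minor,
                     int want_major, int want_minor)
{
    return have_major > want_major ||
           (have_major == want_major && have_minor >= want_minor);
}

/* Example of selecting code at compile time. */
const char *compiler_note(void)
{
#if GCC_AT_LEAST(4, 3)
    return "gcc 4.3 or newer";
#else
    return "older gcc or another compiler";
#endif
}
```

Note that some other compilers also define __GNUC__ for compatibility, so the macro reports the gcc version they claim to emulate.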

Tuesday, March 04, 2008

Linker script .lds

A linker script, xxx.lds, tells the linker how to generate and link the final ELF target. I recently solved a bootloader problem by moving a part of the output (reginfo) into the data section. A bug in binutils was overwriting part of my target ELF file with the reginfo contents, which shouldn't be loaded at all.

Some links about the linker.

http://www.gnu.org/software/binutils/manual/ld-2.9.1/html_mono/ld.html
http://www.embedded.com/2000/0002/0002feat2.htm

Tuesday, January 29, 2008

CGI internal server error

I haven't worked on CGI/Perl for a while, but recently I have been reworking my Perl scripts for the star5.ca website. The following items should be checked when you get an "Internal Server Error" from a Perl script. GoDaddy has no server log enabled by default, so you have to be very careful when doing CGI/Perl programming.

1. File permission (your Perl program should have mode 755)
2. DOS line endings (your Perl program should use Unix line endings instead of DOS line endings. Many systems will complain "file not found" because of the line ending issue)
3. Path to perl (usually #!/usr/bin/perl)
4. Library path (use lib "path to your local library"), to support local Perl modules.

Wednesday, January 16, 2008

gcc

I have been working on a toolchain upgrade recently, from gcc 3.4.6 to gcc 4.1.1.

Things I learned during the upgrade.

1. libgcc contains compiler-specific support routines, often used for floating point computation. For example, if 64-bit floating point is not supported natively on the target, gcc emits calls into libgcc, which carries out the computation with instructions the target does have.

You can use "gcc -v" to find the default library paths ("-L xxx") used by gcc, in addition to the paths passed by your Makefile. "gcc -v" also shows the exact commands that gcc runs, as gcc itself is an umbrella program that calls "cc1", "collect2", etc.

2. -ffreestanding. This flag may be needed for kernel compilation. It tells the compiler that the standard library may not exist and that program startup is not necessarily at "main". To compile Linux kernel 2.6.17 with gcc 4.1.1, we needed to add -ffreestanding to our make rules.

3. Optimization. Each gcc version optimizes the code differently. Some functions were optimized away in the kernel but were still called from the bootloader. I had to create a dummy function in the bootloader to avoid the link error.

4. -std=gnu99. I hit several preprocessing errors when I was building gcc in buildroot. cpp was complaining about unknown labels in assembly code. It turns out that cpp using the GNU standard is not 100% compatible with the ISO C standard. After removing the -std=gnu99 flag, I could get gcc compiled.

5. --sysroot. This option lets you use a different set of "/include" and "/lib" directories under a different root. I think it may be useful in a cross-compiling environment.
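
Items 1 and 2 above can both be observed from a small C file. This is only a sketch, and the function names are my own. On a 32-bit target without a 64-bit divide instruction, gcc compiles the division below into a call to a libgcc helper (__udivdi3 for the unsigned case), which is why a link without libgcc reports it as an undefined symbol. The second function reports whether the file was built as a hosted program; with -ffreestanding gcc defines __STDC_HOSTED__ as 0.

```c
#include <stdint.h>

/* 64-bit unsigned division. On a 32-bit target gcc typically emits a
 * call to the libgcc routine __udivdi3 instead of inline instructions. */
uint64_t div_u64(uint64_t n, uint64_t d)
{
    return n / d;
}

/* 1 for a normal hosted build, 0 when built with -ffreestanding. */
int built_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;
#else
    return 0;
#endif
}
```

Compiling this with the cross compiler and looking at the undefined symbols of the object file (for example with nm) shows whether a libgcc helper was pulled in.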