2007年6月30日星期六

关于 GSM 加密的一些资料

Kxn's eXercise Notes » 关于 GSM 加密的一些资料

关于 GSM 加密的一些资料

最近想弄双卡通,但是我的 K700 插卡的位置非常薄,那种剪卡的一卡通不见得好用。于是看了一些关于拷卡的内容,遗憾的是,似乎 05 下半年和以后的移动卡都是 V2 加密的了,用现有的拷卡软件没戏拷出来。虽然一堆厂商都声称在开发 V2 的破解软件,不过我觉得就是扯淡。

一篇 blog 不能写太短,那就简单讲讲刚看的关于 GSM 加密的内容吧:

GSM 的加密系统里面大致涉及三种算法,A3,A5,A8,这些并不特定指代什么算法,只是给出算法的输入和输出规范,以及对算法的要求,GSM 对于每种算法各有一个范例实现,理论上并没有限制大家使用哪种算法。但是世界上的设备商和运营商都是很懒得沟通的,看到既然有了范例实现,就都拿来用了,于是全世界的 SIM 卡被破解了都一样拷法。

说到这里就不能不简单介绍一下 SIM 卡, SIM 卡是一种智能卡片,里面有个非常简单的 CPU 和一点 NVRAM,可以存储和读出数据,还可以进行一些运算。卡里面有很多内容,不过我只介绍和加密相关的。每张 SIM 卡里面一般都存着一个全球唯一的标志号,叫做 IMSI,这个是用来唯一标识你 SIM 卡的,手机在开机时候会从卡里面读出这个号发给移动网络,移动那里有一个很大的数据库,描述了 IMSI 和手机号的对应关系,于是网络就知道你的手机号是多少了(如果你手机卡丢了去补,新补来的卡 IMSI 和原有的不同,而移动数据库那里将你原来的手机号指向新的 IMSI,旧的卡就再也不能用了)除了 IMSI ,还有 16 个字节的密钥数据,这个数据是无法通过 SIM 卡的接口读出的,通常称为 Ki, Ki 在移动网络那边也保存了一份。

在手机登录移动网络的时候,移动网络会产生一个 16 字节的随机数据(通常称为 RAND)发给手机,手机将这个数据发给 SIM 卡, SIM 卡用自己的密钥 Ki 和 RAND 做运算以后,生成一个 4 字节的应答(SRES)发回给手机,并转发给移动网络,与此同时,移动网络也进行了相同算法的运算,移动网络会比较一下这两个结果是否相同,相同就表明这个卡是我发出来的,允许其登录。这个验证算法在 GSM 规范里面叫做 A3,m = 128 bit, k = 128 bit, c=32 bit,很显然,这个算法要求已知 m 和 k 可以很简单的算出 c ,但是已知 m 和 c 却很难算出 k 。A3 算法是做在 SIM 卡里面的,因此如果运营商想更换加密算法,他只要发行自己的 SIM 卡,让自己的基站和 SIM 卡都使用相同的算法就可以了,手机完全不用换。

在移动网络发送 RAND 过来的时候,手机还会让 SIM 卡对 RAND 和 Ki 计算出另一个密钥以供全程通信加密使用,这个密钥的长度是 64 bits, 通常叫做 Kc, 生成 Kc 的算法是 A8 ,因为 A3 和 A8 接受的输入完全相同,所以实现者偷了个懒,用一个算法同时生成 SRES 和 Kc 。

在通信过程中的加密就是用 Kc 了,这个算法叫做 A5 ,因为 A5 的加密量很巨大,而且 SIM 卡的速度很慢,因此所有通信过程中的加密都是在手机上面完成的,这样一来,除非天下所有 GSM 手机都至少支持一种相同的 A5 算法,否则就没法漫游了,这时候运营商和设备商的懒惰又体现出来了,全世界目前只有一种通用的 A5 算法,没有其他的,这个算法就是和 Kc 的 8 字节序列进行简单的循环 XOR,再和报文序号做个减法。

上面只是简单的介绍 GSM 的加密通信过程,实际上 GSM 的操作比这个还要复杂一些,比如除了第一次登录时候用真正的 IMSI ,之后都是用商定的临时标识 TMSI ,不过这个不是今天讨论的重点。

下面就来说说为啥手机卡可以被复制。

从前面的介绍里面我们知道,要完成一次登录过程,IMSI 和 Ki 是必不可少的,A3 算法也需要知道,这其中, IMSI 是直接可读的,但是 A3 算法和存在你的卡里面的数据,都是不知道的,手机只是简单的把 RAND 给 SIM 卡, SIM 卡把算好的数据返回。实际设备中使用的 A3 算法被作为高级商业机密保护起来。但是世界上没有不透风的墙,在 1998 还是 1999 年的时候,有人从哪里偷到了几页纸的相关文档,然后把这文档输入了电脑。后来这个文档落到了加州伯克力几个教授手里面。这个文档里面缺少一些东西,而且还有写错的地方,这几个牛教授们拿一个 SIM 卡比对了一阵子,把缺的补上了,错的也给修正了,于是这个算法就成为了世人皆知的秘密。这个算法又被叫做 Comp128 ,他同时生成 SRES 和 Kc ,代码在这个文件里面。
http://blog.kangkang.org/wordpress/wp-content/uploads/2006/06/a3a8.txt

光有了算法还是不能够得到在 SIM 卡里面保存的 Ki, 理论上面是可以把 SIM 卡拆了,然后把芯片接到特殊设备上面来读出 Ki ,但是这个听起来就像用小刀在硬盘上面刻操作系统一样不靠谱。于是很多有志之士就开始了对 Comp128 算法的攻击,在一开始大家想到的肯定是穷举,不过这个 GSM 的设计者也想到了,SIM 卡里面有个逻辑是一共只能查询 2^16 次左右,之后卡会自杀,让破解者啥都得不到。因此研究者们试图在可以接受的次数之内通过构造特定明文和分析输出秘文来分析出 Ki 的值,结果还真被大家发现出来了一些。在下面这个 pdf 里面有一些相关的内容介绍,IBM 的一个小组甚至用 6 次查询就可以彻底解出 Ki,当然现在外面卖的那种拷卡器肯定没有这么牛,但是看表现似乎都可以在几分钟之内破解 。
http://blog.kangkang.org/wordpress/wp-content/uploads/2006/06/S5.Brumley-comp128.pdf

随着时间的推移,针对 Comp128 的破解算法越来越成熟,商用的卡复制设备也越来越多,运营商们终于坐不住了。很多运营商都开始发行 Comp128 v2 加密算法的卡了。这其中就包括中国移动,我看了一下论坛上面的帖子,大部分都是在反映 05 年的新卡基本都没法用 simscan 之类软件读出 Ki 。Comp128 v2 算法是 GSM 协会在 v1 被攻破以后,迅速在 v1 上面修改得来的结果,据说比较好的解决了 v1 算法中的弱点,当然,这个算法像 v1 一样,还是不公布于众。。而且到现在也没有人公布出来。这样一来,基本就没法解了。现在网上面很多拷卡设备厂商说的正在研发 v2 解码,我觉得基本是扯淡,这个既要有足够内线,能从设备商那里盗窃到 v2 的算法库或者从其他位置盗窃到文档,还要有足够数学实力,能够找出算法漏洞。就凭这些作坊企业那还差得远点。

另外关于 GSM 加密,还有两个文档可以参考一下:
http://blog.kangkang.org/wordpress/wp-content/uploads/2006/06/gsmsty.pdf

http://blog.kangkang.org/wordpress/wp-content/uploads/2006/06/miei_lec_3.ppt

最后,有用过剪卡双卡通的出来说一声,那东西有多厚啊?K700C 能不能用。。或者谁知道联通今年的 UP 新势力是 v1 的还是 v2 的?我的全球通卡是 03 年的,应该是 v1 的没问题,但是另一张今年的 UP 新势力就不好说了。换卡也没有太大希望,我弄这个是为了给联通的人发短信省钱,老的如意通就算找到,也没有价值用的说。。

2007年6月26日星期二

Handy one-liners for SED



-------------------------------------------------------------------------
HANDY ONE-LINERS FOR SED (Unix stream editor) Apr. 26, 2004
compiled by Eric Pement - pemente[at]northpark[dot]edu version 5.4
Latest version of this file is usually at:
http://sed.sourceforge.net/sed1line.txt http://www.student.northpark.edu/pemente/sed/sed1line.txt This file is also available in Portuguese at: http://www.lrv.ufsc.br/wmaker/sed_ptBR.html FILE SPACING: # double space a file sed G # double space a file which already has blank lines in it. Output file # should contain no more than one blank line between lines of text. sed '/^$/d;G' # triple space a file sed 'G;G' # undo double-spacing (assumes even-numbered lines are always blank) sed 'n;d' # insert a blank line above every line which matches "regex" sed '/regex/{x;p;x;}' # insert a blank line below every line which matches "regex" sed '/regex/G' # insert a blank line above and below every line which matches "regex" sed '/regex/{x;p;x;G;}' NUMBERING: # number each line of a file (simple left alignment). Using a tab (see # note on '\t' at end of file) instead of space will preserve margins. sed = filename | sed 'N;s/\n/\t/' # number each line of a file (number on left, right-aligned) sed = filename | sed 'N; s/^/ /; s/ *\(.\{6,\}\)\n/\1 /' # number each line of file, but only print numbers if line is not blank sed '/./=' filename | sed '/./N; s/\n/ /' # count lines (emulates "wc -l") sed -n '$=' TEXT CONVERSION AND SUBSTITUTION: # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format sed 's/.$//' # assumes that all lines end with CR/LF sed 's/^M$//' # in bash/tcsh, press Ctrl-V then Ctrl-M sed 's/\x0D$//' # gsed 3.02.80, but top script is easier # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format sed "s/$/`echo -e \\\r`/" # command line under ksh sed 's/$'"/`echo \\\r`/" # command line under bash sed "s/$/`echo \\\r`/" # command line under zsh sed 's/$/\r/' # gsed 3.02.80 # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format sed "s/$//" # method 1 sed -n p # method 2 # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format # Can only be done with UnxUtils sed, version 4.0.7 or higher. # Cannot be done with other DOS versions of sed. Use "tr" instead. sed "s/\r//" infile >outfile # UnxUtils sed v4.0.7 or higher tr -d \r <infile >outfile # GNU tr version 1.22 or higher # delete leading whitespace (spaces, tabs) from front of each line # aligns all text flush left sed 's/^[ \t]*//' # see note on '\t' at end of file # delete trailing whitespace (spaces, tabs) from end of each line sed 's/[ \t]*$//' # see note on '\t' at end of file # delete BOTH leading and trailing whitespace from each line sed 's/^[ \t]*//;s/[ \t]*$//' # insert 5 blank spaces at beginning of each line (make page offset) sed 's/^/ /' # align all text flush right on a 79-column width sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # set at 78 plus 1 space # center all text in the middle of 79-column width. In method 1, # spaces at the beginning of the line are significant, and trailing # spaces are appended at the end of the line. In method 2, spaces at # the beginning of the line are discarded in centering the line, and # no trailing spaces appear at the end of lines. sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # method 1 sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # method 2 # substitute (find and replace) "foo" with "bar" on each line sed 's/foo/bar/' # replaces only 1st instance in a line sed 's/foo/bar/4' # replaces only 4th instance in a line sed 's/foo/bar/g' # replaces ALL instances in a line sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replace the next-to-last case sed 's/\(.*\)foo/\1bar/' # replace only the last case # substitute "foo" with "bar" ONLY for lines which contain "baz" sed '/baz/s/foo/bar/g' # substitute "foo" with "bar" EXCEPT for lines which contain "baz" sed '/baz/!s/foo/bar/g' # change "scarlet" or "ruby" or "puce" to "red" sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # most seds
gsed 's/scarlet\|ruby\|puce/red/g' # GNU sed only

# reverse order of lines (emulates "tac")
# bug/feature in HHsed v1.5 causes blank lines to be deleted
sed '1!G;h;$!d' # method 1
sed -n '1!G;h;$p' # method 2

# reverse each character on the line (emulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

# join pairs of lines side-by-side (like "paste")
sed '$!N;s/\n/ /'

# if a line ends with a backslash, append the next line to it
sed -e :a -e '/\\$/N; s/\\\n//; ta'

# if a line begins with an equal sign, append it to the previous line
# and replace the "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

# add commas to numeric strings, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # other seds

# add commas to numbers with decimal points and minus signs (GNU sed)
gsed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'

# add a blank line every 5 lines (after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # GNU sed only
sed 'n;n;n;n;G;' # other seds

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
sed 10q

# print first line of file (emulates "head -1")
sed q

# print the last 10 lines of a file (emulates "tail")
sed -e :a -e '$q;N;11,$D;ba'

# print the last 2 lines of a file (emulates "tail -2")
sed '$!N;$!D'

# print the last line of a file (emulates "tail -1")
sed '$!d' # method 1
sed -n '$p' # method 2

# print only lines which match regular expression (emulates "grep")
sed -n '/regexp/p' # method 1
sed '/regexp/!d' # method 2

# print only lines which do NOT match regexp (emulates "grep -v")
sed -n '/regexp/!p' # method 1, corresponds to above
sed '/regexp/d' # method 2, simpler syntax

# print the line immediately before a regexp, but not the line
# containing the regexp
sed -n '/regexp/{g;1!p;};h'

# print the line immediately after a regexp, but not the line
# containing the regexp
sed -n '/regexp/{n;p;}'

# print 1 line of context before and after regexp, with line number
# indicating where the regexp occurred (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

# grep for AAA and BBB and CCC (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d'

# grep for AAA and BBB and CCC (in that order)
sed '/AAA.*BBB.*CCC/!d'

# grep for AAA or BBB or CCC (emulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # most seds
gsed '/AAA\|BBB\|CCC/!d' # GNU sed only

# print paragraph if it contains AAA (blank lines separate paragraphs)
# HHsed v1.5 must insert a 'G;' after 'x;' in the next 3 scripts below
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

# print paragraph if it contains AAA and BBB and CCC (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

# print paragraph if it contains AAA or BBB or CCC
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # GNU sed only

# print only lines of 65 characters or longer
sed -n '/^.\{65\}/p'

# print only lines of less than 65 characters
sed -n '/^.\{65\}/!p' # method 1, corresponds to above
sed '/^.\{65\}/d' # method 2, simpler syntax

# print section of file from regular expression to end of file
sed -n '/regexp/,$p'

# print section of file based on line numbers (lines 8-12, inclusive)
sed -n '8,12p' # method 1
sed '8,12!d' # method 2

# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files

# beginning at line 3, print every 7th line
gsed -n '3~7p' # GNU sed only
sed -n '3,${p;n;n;n;n;n;n;}' # other seds

# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

# print all of file EXCEPT section between 2 regular expressions
sed '/Iowa/,/Montana/d'

# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'

# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

# delete all lines except duplicate lines (emulates "uniq -d").
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'

# delete the first 10 lines of a file
sed '1,10d'

# delete the last line of a file
sed '$d'

# delete the last 2 lines of a file
sed 'N;$!P;$!D;$d'

# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2

# delete every 8th line
gsed '0~8d' # GNU sed only
sed 'n;n;n;n;n;n;n;d;' # other seds

# delete ALL blank lines from a file (same as "grep '.' ")
sed '/^$/d' # method 1
sed '/./!d' # method 2

# delete all CONSECUTIVE blank lines from file except the first; also
# deletes all blank lines from top and end of file (emulates "cat -s")
sed '/./,/^$/!d' # method 1, allows 0 blanks at top, 1 at EOF
sed '/^$/N;/\n$/D' # method 2, allows 1 blank at top, 0 at EOF

# delete all CONSECUTIVE blank lines from file except the first 2:
sed '/^$/N;/\n$/N;//D'

# delete all leading blank lines at top of file
sed '/./,$!d'

# delete all trailing blank lines at end of file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # works on all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # ditto, except for gsed 3.02*

# delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'

SPECIAL APPLICATIONS:

# remove nroff overstrikes (char, backspace) from man pages. The 'echo'
# command may need an -e switch if you use Unix System V or bash shell.
sed "s/.`echo \\\b`//g" # double quotes required for Unix environment
sed 's/.^H//g' # in bash/tcsh, press Ctrl-V and then Ctrl-H
sed 's/.\x08//g' # hex expression for sed v1.5

# get Usenet/e-mail message header
sed '/^$/q' # deletes everything after first blank line

# get Usenet/e-mail message body
sed '1,/^$/d' # deletes everything up to first blank line

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'

# add a leading angle bracket and space to each line (quote a message)
sed 's/^/> /'

# delete leading angle bracket & space from each line (unquote a message)
sed 's/^> //'

# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

# extract multi-part uuencoded binaries, removing extraneous header
# info, so that only the uuencoded portion remains. Files passed to
# sed must be passed in the proper order. Version 1 can be entered
# from the command line; version 2 can be made into an executable
# Unix shell script. (Modified from a script by Rahul Dhesi.)
sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode # vers. 1
sed '/^end/,/^begin/d' "$@" | uudecode # vers. 2

# zip up each .TXT file individually, deleting the source file and
# setting the name of each .ZIP file to the basename of the .TXT file
# (under DOS: the "dir /b" switch returns bare filenames in all caps).
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat

TYPICAL USE: Sed takes one or more editing commands and applies all of
them, in sequence, to each line of input. After all the commands have
been applied to the first input line, that line is output and a second
input line is taken for processing, and the cycle repeats. The
preceding examples assume that input comes from the standard input
device (i.e, the console, normally this will be piped input). One or
more filenames can be appended to the command line if the input does
not come from stdin. Output is sent to stdout (the screen). Thus:

cat filename | sed '10q' # uses piped input
sed '10q' filename # same effect, avoids a useless "cat"
sed '10q' filename > newfile # redirects output to disk

For additional syntax instructions, including the way to apply editing
commands from a disk file instead of the command line, consult "sed &
awk, 2nd Edition," by Dale Dougherty and Arnold Robbins (O'Reilly,
1997; http://www.ora.com), "UNIX Text Processing," by Dale Dougherty
and Tim O'Reilly (Hayden Books, 1987) or the tutorials by Mike Arst
distributed in U-SEDIT2.ZIP (many sites). To fully exploit the power
of sed, one must understand "regular expressions." For this, see
"Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual ("man") pages on Unix systems may be helpful (try "man
sed", "man regexp", or the subsection on regular expressions in "man
ed"), but man pages are notoriously difficult. They are not written to
teach sed use or regexps to first-time users, but as a reference text
for those already acquainted with these tools.

QUOTING SYNTAX: The preceding examples use single quotes ('...')
instead of double quotes ("...") to enclose editing commands, since
sed is typically used on a Unix platform. Single quotes prevent the
Unix shell from intrepreting the dollar sign ($) and backquotes
(`...`), which are expanded by the shell if they are enclosed in
double quotes. Users of the "csh" shell and derivatives will also need
to quote the exclamation mark (!) with the backslash (i.e., \!) to
properly run the examples listed above, even within single quotes.
Versions of sed written for DOS invariably require double quotes
("...") instead of single quotes to enclose editing commands.

USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used
the expression '\t' to indicate a tab character (0x09) in the scripts.
However, most versions of sed do not recognize the '\t' abbreviation,
so when typing these scripts from the command line, you should press
the TAB key instead. '\t' is supported as a regular expression
metacharacter in awk, perl, and HHsed, sedmod, and GNU sed v3.02.80.

VERSIONS OF SED: Versions of sed do differ, and some slight syntax
variation is to be expected. In particular, most do not support the
use of labels (:name) or branch instructions (b,t) within editing
commands, except at the end of those commands. We have used the syntax
which will be portable to most users of sed, even though the popular
GNU versions of sed allow a more succinct syntax. When the reader sees
a fairly long command such as this:

sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

it is heartening to know that GNU sed will let you reduce it to:

sed '/AAA/b;/BBB/b;/CCC/b;d' # or even
sed '/AAA\|BBB\|CCC/b;d'

In addition, remember that while many versions of sed accept a command
like "/one/ s/RE1/RE2/", some do NOT allow "/one/! s/RE1/RE2/", which
contains space before the 's'. Omit the space when typing the command.

OPTIMIZING FOR SPEED: If execution speed needs to be increased (due to
large input files or slow processors or hard disks), substitution will
be executed more quickly if the "find" expression is specified before
giving the "s/.../.../" instruction. Thus:

sed 's/foo/bar/g' filename # standard replace command
sed '/foo/ s/foo/bar/g' filename # executes more quickly
sed '/foo/ s//bar/g' filename # shorthand sed syntax

On line selection or deletion in which you only need to output lines
from the first part of the file, a "quit" command (q) in the script
will drastically reduce processing time for large files. Thus:

sed -n '45,50p' filename # print line nos. 45-50 of a file
sed -n '51q;45,50p' filename # same, but executes much faster

If you have any additional scripts to contribute or if you find errors
in this document, please send e-mail to the compiler. Indicate the
version of sed you used, the operating system it was compiled for, and
the nature of the problem. Various scripts in this file were written
or contributed by:

Al Aab <af137@freenet.toronto.on.ca> # "seders" list moderator
Edgar Allen < era@sky.net> # various
Yiorgos Adamopoulos < adamo@softlab.ece.ntua.gr>
Dale Dougherty <dale@songline.com> # author of "sed & awk"
Carlos Duarte <cdua@algos.inesc.pt> # author of "do it with sed"
Eric Pement < pemente@northpark.edu> # author of this document
Ken Pizzini < ken@halcyon.com> # author of GNU sed v3.02
S.G. Ravenhall <stew.ravenhall@totalise.co.uk > # great de-html script
Greg Ubben <gsu@romulus.ncsc.mil> # many contributions & much help
-------------------------------------------------------------------------

2007年6月23日星期六

The Sieve of Eratosthene

求质数的筛法
The Sieve of Eratosthene

Some history

One of the oldest and simplest yet interesting algorithm that involves arbitrary large data is the sieve of Eratosthene.

Eratosthene (Cyrene circa 284 BC - Alexandria circa 192 BC) was a great greek mathematician. He was the very first to accurately compute the circumference of Earth, and he invented this clever algorithm to find prime numbers.

In the computing world, this algorithm was once used as a speed "benchmark" for languages and compilers (which is stupid both in the first case, because by inlining arbitrary many steps of the sieve, you could speed the whole thing arbitrarily anyway, and in the second case, as the sieve would eventually be specifically recognized by compilers to speed up benchmarks).

Here we use it not to benchmark speed, but to illustrate language expressiveness.

The Mathematical Idea

Some definitions:

  • A natural number m is a multiple of some natural number n if and only if there is a natural number k such that m=kn. We then say that n divides m , or that m is divisible by n, or that n is a divisor of m. We also call k the multiplicity factor.
  • A natural number is said to be prime if and only if it has precisely two distinct divisors, namely 1 and itself (which must then be different). Thus a natural number m is not a prime if and only if it is 1, or it is divisible by a natural number strictly greater than 1 and strictly lesser than m.

Then we have these preliminary lemmas:

  • The multiple of a multiple of a number is a multiple of this number, and conversely, the divisor of a divisor of a number is a divisor of this same number; a number is a multiple and divisor of itself, and is the only number so: divisibility defines a partial order on natural numbers.
  • All the multiples of a natural number, but itself and zero, are strictly greater than it: divisibility defines among non-null natural integers a partial order included in the natural order, while zero is greater than any other number with respect to divisibility.
  • Primes are thus minimal elements with respect to this order.
  • Any number has a prime divisor: consider all numbers from 2 to that number, and the least to divide it will be a prime (possibly itself).
  • Unless the number is prime, the square of its least prime divisor is not greater than the number itself: when you divide the number by the prime divisor, you obtain a new number stricly lesser than the original whose divisors will also divide the original one; hence the least one will be greater than or equal to the least prime divisor of the original number.

Thus here is the algorithm invented by Eratosthene:

  • To find all primes lesser than some large (well, not too large either) number N, just consider every number between 2 and N included, and eliminate all the multiples of each considered number. Those that remain will be the primes.
  • Now, if we consider numbers in their natural order, instead than some arbitrary other one, we can simplify things greatly: anytime we reach a number, either it is already marked as a multiple of some lesser number, thus is not a prime, and all its multiples are already marked as multiples of that lesser number; or it isn't marked yet, and cannot be marked later, because all further numbers will be strictly greater than it and thus not possibly divisors of it, thus is prime, and we shall mark all its multiples.
  • We also see that when a prime p is reached, all its multiples below its square are already marked, as the multiplicity factork being lesser than p, it will already be marked as a multiple of k's least prime divisor.
  • Thus, we can also stop marking when the square of the considered number is greater than N.
  • If we have some clever way to mark integers without any limit N, by remembering the list of found primes in their natural order instead of marking numbers from 2 to N, we can yield the indefinite list of primes (that we know as being infinite from Euclide).

Programming the sieve

The idea of the sieve is relatively simple; but even then, you can write infinitely many programs based on this idea. Existing computer languages would require you to rewrite each from scratch. But in the same way as a human doesn't re-learn the sieve from scratch each time he sees it, we can and should have computers understand it once and for all. Here is how we could do:

  • Take the program's most generic form, that should work in any context.
  • Tell the computer how to specialize this form into efficient ones according to the context.

The sieve program in various programming languages and paradigms

Lame C-language Version

char * sieve (int max)
{
int n, n2, m;
char * sieved ;

/* malloc'ing */
sieved = malloc(max) ;
assert(sieved) ;
memset(sieved,0,max) ;

/* loop */
for (n = 2, n2 = 4; n2<=max ; n2+=(n++<<1)+1 ) {
if (!sieved[n]) {
out(n) ;
for (m=n2; m <= max ;m+=n) sieved[m] = 1 ;
}
}
return sieved ;
}

Here are many reasons why C is a disadvantage, that we can see in this program:

  • In this example, the only parameter is the upper bound of a range to sieve to find which integers inside are primes. Instead, we could have used the very same algorithm to compute the indefinite list of prime numbers, a function that associates its lowest prime factor to each integer in a range, or an indefinite sieving function. But in C, these mean complete rewrite of the function.
  • The caller must know the low-level implementation of the result structure to use it.
  • Every type or function definition being global in C, object-oriented style is made difficult, as the programmer must manually resolve name clash, and/or add indirection levels; this wastes the programmer's time and leads to inefficient code. In C++, the problem is only partially solved, as classes are still global, statically bound, and never value-dependent.
  • The preprocessor could be used to increase the genericity of the function, but its use is heavy, very unpowerful, static, and especially horrible if multiple instances of the preprocessed function are to be used.
  • If you want time or space efficiency, you must explicitly inline the sieving function for a correctly-small quantity of small prime numbers needed to achieve efficiency, and pack/unpack data from the array. Furthermore, this will depend on the particular compilation target.
  • Allocating memory is painful; either the caller or the callee must do it. If the caller does it, it can sometimes be optimized, but is painful everytime. If the callee does it, it's painful only once (unless there's an exception mechanism, but this would be extremely painful), but it can't be optimized, and the caller must know the internals anyway, to have a consistent deallocation policy.
  • You must explicitly give the low-level implementation of arrays, lists, inlined code, allocation procedure. You can't let the system optimize them according to the way the function will actually be used.
  • In this example, we assumed that out() was a function to call whenever a prime number is detected. Using data-driven style in the program that uses this information would require either complete rewrite of the program, or explicit forking (or use of threads).

To conclude, it's impossible to express efficiently the generic algorithm in a one C program.

Side-Effecting Message-Passing Version

var n : int = 2 ;
sieved : int->bool = function x|->false ;
fun step () =
if not sieved (n) then
begin
out(n) ;
sieved := function x|-> (old sieved x) or (n divides x)
end ;
loop
begin
step () ;
increment(n)
end ;

Pure lazy functional version

let Eratosthene's_Sieve =
let rec step n sieved =
let next_num = n+1 in
if not (sieved num)
then
let new_sieved m = (sieved m) or (n divides m) in
cons_stream n (step next_num new_sieved)
else step next_num sieved in
prime_stream = step 2 (fun x->false)

"Natural language" version

Consider knowing a natural number n to not be a prime as being false.
Consider number two.
A sieve step is
if the considered natural number is not known to not be a prime,
then
say that the considered natural number is prime, and
consider knowing a natural number not to be a prime as
previously knowing that and the considered natural number
not dividing this one.
To sieve is to loop on the sieve step, each time considering the next number.
Recursively export the verb to sieve.

Less capable natural language interpreters may require a more thorough quoting, parenthesing, annotating and explicit typing than otherwise.

Variations on the sieve

Here are things we could tell the system so it can enhance the implementation of the sieve:

Efficiency annotations:

  • Eliminate multiples only from the square of the prime on, thus stopping when the square exceeds the numbers to sieve.
  • Use a well-known egregious identity to compute the square of a number together with it by doing only additions, not multiplications, and hardware where multiplication is much more expensive than additions.
  • Statically or dynamically limit (and change) the precision of integer computations as needed.
  • Use an array to implement the variable sieving function if a low enough bound is (locally) known to the sieve.
  • Otherwise, use the list of previously-known primes as a parameter for the sieving function instead of dynamically producing (pseudo- or real) code to test each new prime.
  • Arbitrarily-dynamic combinations of the former methods.
  • Expand the first steps of the sieve (i.e. inline primes up to 2,3,5, or 7, etc) to speed up the whole thing: typical programs only inline the first step to gain a factor two on space (when using arrays) and time; another factor two may be gained by inlining the four first steps instead; for growing zones, this factor can be dynamically increased, etc.
  • Fit the loops and buffers in instruction and data caches.

Input/Output annotations

  • The out() (or "say") input procedure defines what to do with a number known to be prime.
  • The function may be executed step-by-step, or prime-by-prime, or continuously as needed.
  • The sieve results may be buffered or not; the buffer may be shared or not between smaller or larger sets of programs and running instantiations of a same persistent interrupted program. If needed results are not available, a sieve would of course be re-run.
  • A proof that the sieve actually yields the prime numbers shall be included, exported, and this fact pointed to in the standard list of algorithms, so that this algorithm will be automatically used by whoever wants prime numbers without explicitly programming.


    10 Linux Shell Tricks You Don't Already Know. Really, we swear.

    10 Linux Shell Tricks You Don't Already Know. Really, we swear.

    Shell

    Yeah, I've read them too. Lists of shell tricks you already know - pstree (wow!) bc (bash already has built-in math), and a dozen commands you see in every Linux site, book, and training course.

    Here's a list of damn useful commands you haven't heard before.

    1. A Simple way to Send Output and Errors
    Want to send output and errors to the same file?

    command &> file

    Maybe you're troubleshooting a problematic app with strace, and want to see the system calls at the same time as the apps errors?

    strace badapp &> errors_and_output

    Benefit: Easy to remember, and simpler than 'send errors to output, and then send that to a file'
    Works in: any currently supported Linux

    2. Parallelize Your Loops
    Almost every Linux network administrator knows the power of the for loop: the way to do something for one, one hundred or one thousand users, files, machines, processes, or whatever else. Most people set their loops in sequence – so that each jobs is finished before moving onto the next.
    But the jobs command can be used to background each loop, so you don't have to wait for it to complete before continuing with the next.

    Here's an example running an apt-get update:

    for HOST in $(< ListOfHosts); do ssh $HOST 'sudo apt-get update' & done

    Maybe you need a bunch of SSH tunnels running simultaneously:

    for HOST in $(< ListOfHosts); do ssh -C -N -R 80:localhost:80 $HOST & done

    Sometimes you probably don't want to see all the output as it happens – in that case, save a file on each machine and use another loop to collect it later.

    Benefit: Saving a metric shitload (2/3rd of an imperial shitload) of time waiting on stuff to finish
    Works in: any currently supported Linux
    Drawbacks: Bash probably has some limits of the amount of concurrent jobs. But I've yet to run into them.

    3. Catch Memory Leaks By Using Top via Cron
    Memory apps are rare in Linux, but they do happen, particularly when using beta distros or home grown software. A lot of time the identitty of the app with a memory leak isn't that apparent. Linux has an Out-Of-Memory app built in to identify and kill these apps, but by the time it eventually kicks on your system may have been unusually slow for a while – and that's if your patience hasn't already worn thin and you've rebooted.

    The normal way to find an apps memory consumption is by running top (or one of it's graphical equivalents, like System Monitor) and check the Resident Set Size (called Res or RSS) of the processes you care about (you can ignore figures for how much memory the app has allocated – memory leaks come from usage, not allocation, and apps can assign bucketloads of memory they don't use without hurting your system). Most people aren't aware top can be run non-interactively, which means you can use cron and top to generate a simple report of an apps usage over time.

    • Run top.
    • Use the < and > keys until processes are sorted by RES (resident memory usage).
    • Hit W to write your config out to a file
    • Add a cron job:

      crontab - <<< '*/15 * * * * top -n 1 -b'

    You'll now get an email every 15 minutes with the top output.

    Benefit: way less complicated than adding software like SAR
    Works in: any currently supported Linux
    Drawbacks: Has some limitations of the amount of concurrent jobs.

    4. Standard in directly from the command line

    Wondering what the hell the <<< above was? Bash allows you to send programs stdin directly from the command line.

    Benefit: Let's you write your command on the goddamned commandline, even for weird creepy programs that want you to do everything via standard in. Shakes fist at MySQL.
    Works in: Bash 3 and newer.
    Drawbacks: Still quite a few Bash 2.x systems out there.

    5. Set a Random Initial Password, That Must be Changed
    There's a lot of organizations who have nice, secure policies for passwords. Passwords stored on Windows machines. Linux is either not covered by the policy or the policy is routinely violated – people have idea about Linux authentication (most people don't quite understand PAM, and Linux admins don't often realize Linux can quite happily authenticate to Active Directory) and once upon a time, the OpenSSH developers didn't like PAM (that's since changed).

    To set password that must be changed upon first login.

    umask u=rw,go=
    openssl rand -base64 6 | tee -a PasswordFile | passwd –stdin joe
    chage -d 0 joe

    The password is saved to PasswordFile , which only your own account can read. Then contact via some medium you consider relatively secure – like a phone call or encrypted email and them tell their initial password.

    Benefit: Ensures users aren't using your default password forever
    Works in: any currently supported Linux where OpenSSH has been updated (if your users use SSH to do their first login). Red Hat still say this doesn't work in the RHEL 3 / 4 documentation, but with their own updates applied, it's AOK.
    Drawbacks: None

    6. Add Your Public Key to Remote Machines the Easy Way
    In order to perform key based logins to a new machine, you need to get a copy of a public key to the remote machine yourself. Sure, you could do this manually – which gets a bit boring after a while (why doesn't SSH have an authorized_keys.d anyway?), but why waste time when SSH comes with it the tool to do it?
    Just run:

    ssh-copy-id -i .ssh/id_rsa.pub hostname

    After being prompted to enter your password for the last time, SSH will say:
    Now try logging into the machine, with "ssh 'hostname'", and check in:

    .ssh/authorized_keys

    to make sure we haven't added extra keys that you weren't expecting.

    Try it. No more passwords!

    7. Extract an RPM without any additional software
    This one isn't necessary on Debian based distros, as .deb are merely ar archives. Every Red Hat guide ever written mentions using rpm2cpio (which comes as part of the default rpm package), but frankly I can never be bothered remembering the weird syntax to cpio, the ancient archive format used by …uh, pretty much just rpm.

    The following command installs a package to a temporary directory, and but doesn't modify your RPM database (just one in the temporary directory, whose contents you can delete afterward). Since the temp directory doesn't have any othr software in it, we also disable dependencies and scripts.

    mkdir /tmp/deleteme

    rpm -ivh –root /tmp/deleteme –nodeps –noscripts package.rpm

    8. See How a File Has Changed from Factory Defaults
    This is a simple troubleshooting tool when you're not sure how a file has changed from its defaults. First identify the package that owns the file:

    dpkg -S /etc/foo/foo.conf

    rpm -qf /etc/foo/foo.conf

    Then extract the original package either with tar (DPkg) or the rpm trick above (RPM) and run:

    diff /etc/foo/foo.conf /tmp/deleteme/etc/foo/foo.conf

    And see the differences.
    Benefit: Faster troubleshooting of bad config files (note strace is also handy in these cases)
    Works in: any currently supported Linux
    Drawbacks: You have more time free at work, to spend reading Digg.

    9. Undo Your Network Screwups After You've Lost the Connection
    Messing with a firewall or a network over a remote connection? Nerve wracking isn't it? Change the wrong setting and you'll be locked out, unable to fix it.

    So why not undo your mistake? Schedule a job to run at a later time that undoes what you're about to do.

    at now + 5 minutes <<< 'cp /etc/ssh/sshd_config.old /etc/ssh/sshd_config; service sshd restart'

    If you screw up, the job will run and restore things to the way they were.

    If you your change works, then just run atq to check the queue of upcoming at jobs, and atrm jobNumber to remove it.

    Benefit: Gets you back in after you lock yourself out.
    Works in: any Linux provided atd is enabled - which is usually the case.
    Drawbacks: Remembering to do it before you make the risk change.

    10. Check a Port is Open

    Want to check whether a network service is running before you use it? Netcat can be used to easily connect to ports, and has a rather handy wait -w option to tell it how long to wait for.

    nc -w 3 server ssh <<< ' '

    Would connect to the ssh port on a machine called server, and wait for up to 3 seconds before sending it, er, nothing, and closing the connection. Whether the port was open will be reflected by nc exit status.

    if nc -w 3 localhost 22 <<< " &> /dev/null
    then
    echo 'Port is open'
    else
    echo 'Port is closed'
    fi

    Here's a few bonus tricks, which you may already know…

    Bonus Trick 1: The Easy Way to Extract Tar Archives

    New to Linux? Don't feel like using File Roller to extract archives? Despite everything you rad on the internet, tar (the tape archiver - that why you have to tell it you want to use a file with -f) doesn't need you to specify archive file formats any more. To extract a file, simply:

    tar -xf archive.tgz

    But keep reading…

    Benefits: No stupid messages from a computer asking for information the computer can determine on its own.
    Works in: Recent distros (last year) only
    Drawbacks: It's still to early to use this in a lot of distros.

    Bonus Trick 2: Use Math Shells

    Most of you already know this, but since digg.com keeps linking to articles about 'bc', it's worth pointing out that Bash comes with it's own math shells. These can be invoked like an ordinary subshell, but by using two round brackets. Say you have a n script that needs to figure out disk space.

    SIZE_IN_KB=204535848
    SIZE_IN_GB=$(( $SIZE_IN_KB / 1024 / 1024 ))
    echo $SIZE_IN_GB

    Benefit: No need for an extra process like bc every time you need to work with number
    Works in: any currently supported Linux.
    Drawbacks: If you want to do floating point or other advanced math, you'll probably want to bust out python.

    Bonus Trick 3: Never reboot a system for NFS failures

    OK, this isn't a shell trick. It's more of a general thing for NFS that not enough people know. We've included it here because VentureCake loves you.

    At some point every Linux admin has had a problem with a computer using a hard-mounted NFS export, where the connecton to the server has been lost - perhaps the network had a problem or the server went down. Any processes which check the status of filesystems - df, rpm, etc. - will hang, waiting on the storage to respond. Next time, you'll want to mount using the intr option (not soft - see the Linux NFS FAQ). This time, run:

    killall -KILL rpciod

    rpciod (the kernel process that handles NFS IO), will instantly respawn, sending errors to processes waiting for NFS IO, causing them to respond. If you're mounting exports from multiple NFS servers and only wish to time out a single connection, you can do so with:

    iptables -A OUTPUT -d nfsserver -j REJECT

    Within about a minute, the NFS client will decide the server is unreachable. Again, the processes start responding.

    You can now unmount the NFS server. No need to reboot.

    Benefit: No need to reboot when an NFS mount fails.
    Works in: any Linux.
    Drawbacks: You can't disable an individual NFS export, just all the exports from A particular NFS server. Still beats rebooting though.

    Bonus Trick 4: Encourage Others to Use & Contribute to Your Scripts

    If you want to improve your scripting skills, it pays to be kind to your peers.

    • Use self-explanatory uppercase names for your variables. In particular, this means not using 'i' as a variable name, so when your fellow scripter is eighty lines down your twelfth, nested, for loop, they don't have to scroll up and work out what the hell 'i' means now, when it's much easier to come to your house and kill you with an axe.
    • Keep your loops and conditionals indented
    • Putting your functions at the top of the script, and check input.

    Bonus Trick 5: Shell Sites That Don't Suck

    There's a lot of sites on Linux shell commands. Very few are Bash specific, so you'll be missing out on a lot of the good stuff. They also tend to be non-task oriented - if they need to show you grep, they'll show it using some weird thing about animals, rather than, say how to strip comments and blank lines from a file.

    grep -vE '^$|^#' /etc/foo.conf

    …by the way. Anyway, here’s a few of our personal favorites:

    Tips from an RHCE - Part of Red Hat Magazine, but useful even if you’re not into Red Hat.
    SHELLdorado - Not Linux specific, and a little out of date, but very practically oriented- lots of sample scripts you can pillage and plunder.
    Handy Sed One Liners - Another ancient document, but the examples covered in it show 99% of what you want to use sed for.



    2007年6月22日星期五

    X11record - an X11 capture program which creates an animated gif



    #!/bin/sh

    # x11record - an X11 capture program which creates an animated gif
    #
    # USAGE:
    #
    # % x11record foo.gif # creates animation GIF
    #
    # REQUIREMENTS:
    #
    # * ImageMagick - to create animation GIF movies
    # * xwd - to capture window's image
    # * xwininfo - to get window's id
    # * ungifsicle - to create GIFs
    # * awk - for arithmetic
    # * mktemp - for secure temporary files
    #
    # CREDIT: heavily based on the GPL-licensed 'x11rec'. I reimplemented this
    # in /bin/sh so it wouldn't have a ruby dependency.
    #
    #
    # Copyright (C) 2005 Benjamin Rutt < brutt@bloomington.in.us>
    # All rights reserved.
    # This is free software with ABSOLUTELY NO WARRANTY.
    #
    # You can redistribute it and/or modify it under the terms of
    # the GNU General Public License version 2.

    if [ $# -ne 1 ]; then
    echo "usage: `basename $0` <filename.gif>"
    exit 1
    fi

    requireutil () {
    while [ -n "$1" ]; do
    type $1 >/dev/null 2>/dev/null
    if [ $? -ne 0 ]; then
    echo Missing utility "$1". Please install it. >/dev/stderr
    exit 1
    fi
    shift
    done
    }

    requireutil grep mktemp xwininfo date awk convert xwd gifsicle

    GIFOUT=$1
    set -e
    TD=`mktemp -d /tmp/x11record.XXXXXX`
    set +e
    echo "** Select an X window to record with a mouse. **"
    ID=`xwininfo | grep "^xwininfo.*Window id:" | awk '{print $4}'`
    echo "Window ID is $ID"
    TSTART=`date +%s`
    echo "** Type Ctrl+C to finish the recording. **"
    FRAMES=0
    function x11recfinish {
    trap 2
    TSTOP=`date +%s`
    elapsed=`awk "BEGIN { print $TSTOP - $TSTART}"`
    echo
    echo "elapsed: $elapsed seconds"
    if [ $elapsed -eq 0 ]; then
    echo "ERROR: movie too short!"
    exit 1
    fi
    if [ $FRAMES -eq 0 ]; then
    echo "ERROR: no frames!"
    exit 1
    fi

    N=0
    for f in $TD/*.xwd; do
    convert $f $f.gif
    rm -f $f
    N=`expr $N + 1`
    awk "BEGIN { printf \"\rconverted ${N}/${FRAMES}\" }"
    done
    echo

    DELAY=`awk "BEGIN { printf \"%d\", ($elapsed / $FRAMES) * 100}"`
    gifsicle --colors 256 -O2 --delay $DELAY $TD/*.gif > $GIFOUT
    rm -fr $TD
    if [ $? -eq 0 ]; then
    echo $GIFOUT created
    exit 0
    else
    exit 1
    fi
    }

    trap "x11recfinish" 2
    echo
    while true; do
    xwd -silent -id $ID > $TD/$FRAMES
    mv $TD/$FRAMES `awk "BEGIN { printf \"$TD/%08d.xwd\", $FRAMES}"` >/dev/null 2>&1
    FRAMES=`expr $FRAMES + 1`
    awk "BEGIN { printf \"\r${FRAMES} frames captured\" }"
    done

    msf-abbrev.el

    msf-abbrev.el

    msf-abbrev.el

    Intro/Demo

    A package to manage many mode-specific abbrevs. At least, it started out that way. Now it seems to have evolved into a way to make certain programming tasks as easy as filling out forms. The best way to see how it works is watch the demo. (Feel free to use this demo wherever, however, you like, for any personal or professional purposes; consider it in the public domain). In the demo, I use the TAB key to navigate between fields, and the space bar to trigger expansion of the abbrevs after typing the abbrev name. For more info on abbrevs, look up the relevant Emacs documentation (look for abbrevs, not dynamic abbrevs). To those who are curious, I used the x11record script to create this demo.

    NOTE: I am aware of packages such as ELSE, tempo, skeleton, some of which do similar things. However, I believe this package does some things better than these other packages, but you'll have to decide for yourself. I tend to name my abbrev's ending in "x", as you can tell if you watched the demo; this is merely my convention. Each abbrev is put in a file, in a special directory, which names the mode. The parent directory of the directories containing a bunch of directories naming each mode is loaded using the command (e.g.):

    (msf-abbrev-load-tree "~/emacs/mode-abbrevs")

    where my ~/emacs/mode-abbrevs dir looks like:
     drwx------   2 brutt brutt 4096 May  9 15:00 LaTeX-mode
    drwx------ 2 brutt brutt 4096 May 9 15:00 TeX-mode
    drwx------ 2 brutt brutt 4096 Jun 5 12:42 c++-mode
    drwx------ 2 brutt brutt 4096 Jun 5 12:19 c-mode
    drwx------ 2 brutt brutt 4096 Sep 16 2004 cperl-mode
    drwx------ 2 brutt brutt 4096 Oct 6 2004 global
    drwx------ 2 brutt brutt 4096 Oct 6 2004 java-mode
    drwx------ 2 brutt brutt 4096 Oct 6 2004 jde-mode
    drwx------ 2 brutt brutt 4096 Sep 16 2004 perl-mode
    drwx------ 2 brutt brutt 4096 Jun 3 17:25 python-mode
    drwx------ 2 brutt brutt 4096 Jun 3 17:25 sh-mode
    drwx------ 2 brutt brutt 4096 Jun 3 17:25 shell-mode
    and my ~/emacs/mode-abbrevs/c++-mode dir looks like:
     /home/brutt/emacs/mode-abbrevs/c++-mode:
    total used in directory 424 available 4092320
    drwx------ 2 brutt brutt 4096 Jun 5 12:42 .
    drwx------ 13 brutt brutt 4096 Mar 30 14:04 ..
    -rw------- 1 brutt brutt 207 Jun 5 12:19 fclosex
    -rw------- 1 brutt brutt 244 Jun 5 12:19 fopenx
    -rw------- 1 brutt brutt 259 Jun 5 12:19 fwritex
    ...
    This scheme makes it really easy to add new abbrevs that are active in certain modes, or modify them without writing elisp to do so. I think this makes abbrevs more useful. Each abbrev file expands with a special syntax, enabling tasks such as user input via the minibuffer, filling out "forms" (navigating using the TAB key) and leaving the cursor at a specific place. Somewhat of a cross between a form widget toolkit and using skeletons.

    File Syntax

    By default, the file contents will be inserted. However, the following special forms are recognized:
        <cursor> leaves the cursor there at the end

    <varlookup "user-mail-address"> inserts the elisp value of the expression in quotes

    <elisp "(insert (current-time-string))"> executes a snippet of elisp at that point in the file

    <query "what color? "> asks the user "what color?" and replaces with the response. Any question can be
    asked here and will be prompted in the minibuffer. Multiple <query "what color? ">
    terms will result in a single question but multiple identical replacements.

    <field "foo"> <-- creates a "form" field called foo, which can be
    TAB-navigated and shift-TAB-navigated

    <choose><choice "foo"><choice "bar"></choose> creates a combo box, press enter and then
    arrow keys to make a choice

    <endpoint> marks the end of a form (i.e. the last point to TAB to) of a set of "field" or "choose" tags

    <comment "blah blah"> will be removed altogether
    As an example, here is my abbrev file fopenx for the 'fopen' C library command:
        if ((f = fopen(<field "char * filename">, <choose><choice ""r""><choice ""w""><choice ""a""><choice ""r+""></choose>)) == NULL)
    {
    std::cerr << "ERROR: opening file"
    << " at " << __FILE__ << ":" << __LINE__
    << std::endl << std::flush;
    exit(1);
    }
    <endpoint>
    Also, if you name your file 'fopenx.el' instead of 'fopenx', then the file will be assumed to be an elisp file which will be evaluated to do arbitrary things (like inserting current date, etc.), instead of the contents being inserted directly.

    Finally, if you (e.g.) have a file 'fwritex_' in addition to 'fwrite', the underscore-trailing file can be used to transform the expansion depending on current in-buffer context (e.g. as provided by semantic/cedet). Look at my 'fwritex_' for an example.

    Symlinks

    Symlinks can be used to link together mode directories (i.e. c-mode, c++-mode, etc.) if that is your liking. However, they don't work on Windows. So the following solution is now implemented. (Feel free to keep using symlinks however if you're only on unix platforms). Create empty ".aliases." files as follows instead of symlinks, to indicate what aliases what. i.e. if mode B should use mode A's abbrevs, create empty file "B-mode.aliases.A-mode" in the mode-abbrevs directory. More use of the filesystem-as-data-structure. :)
          $ ls -al ~/emacs/mode-abbrevs
    total 72
    drwx------ 2 rutt rutt 4096 Jun 15 09:56 LaTeX-mode
    -rw-rw---- 1 rutt rutt 0 Aug 9 09:12 TeX-mode.aliases.LaTeX-mode
    -rw-rw---- 1 rutt rutt 0 Aug 9 08:45 c++- mode.aliases.c-mode
    drwx------ 2 rutt rutt 28672 Jul 3 13:45 c-mode
    -rw-rw---- 1 rutt rutt 0 Aug 9 09:12 cperl-mode.aliases.perl-mode
    drwxrwx--- 2 rutt rutt 4096 May 17 15:10 emacs-lisp-mode
    drwxrwx--- 2 rutt rutt 4096 Jul 13 2005 fortran-mode
    drwxrwx--- 2 rutt rutt 4096 Aug 9 08:24 global
    drwx------ 2 rutt rutt 4096 Jul 26 07:52 java-mode
    -rw-rw---- 1 rutt rutt 0 Aug 9 09:13 jde-mode.aliases.java-mode
    drwxrwx--- 2 rutt rutt 4096 Sep 18 2005 lisp-interaction-mode
    drwxrwx--- 2 rutt rutt 4096 Jun 20 08:45 message-mode
    drwx------ 2 rutt rutt 4096 Jul 13 2005 perl-mode
    drwx------ 2 rutt rutt 4096 Aug 8 14:46 python-mode
    drwx------ 2 rutt rutt 4096 May 24 16:49 sh-mode
    drwx------ 2 rutt rutt 4096 Jul 13 2005 shell-mode

    Requirements

    GNU Emacs 22 (or a emacs from CVS from 2005 or later) is required. Emacs 21.3 is known not to work.
    Concerning XEmacs, although it earlier was reported that it works, I find that it does not; the low-level details of text properties are too different between the two versions that I cannot support XEmacs at this time.

    Downloads

    msf-abbrev.el 1.0beta3

    My own abbrev definitions version 1.0beta3, for example purposes (updated on Wed Aug 9 2006)

    Usage

    Initialization is somewhat of a problem, as all modes needed in the abbrevs you define must be loaded before msf-abbrev loading is done. I hope to fix this somehow. But for now, I initialize like the following.
    First, place msf-abbrev.el in your load-path. e.g. in your ~/.emacs:
    (add-to-list 'load-path "~/directory/where/you/downloaded/msf-abbrev.el/file")
    Then, I do the following in my ~/.emacs:
    ;; ensure abbrev mode is always on
    (setq-default abbrev-mode t)

    ;; do not bug me about saving my abbreviations
    (setq save-abbrevs nil)

    ;; load up modes I use
    (require 'cc-mode)
    (require 'perl-mode)
    (require 'cperl-mode)
    (require 'sh-script)
    (require 'shell)
    (require 'tex-site) ;; I use AUCTeX
    (require 'latex) ;; needed to define LaTeX-mode-hook under AUCTeX
    (require 'tex) ;; needed to define TeX-mode-hook under AUCTeX
    ;; (require 'python) ;; I use python.el from Emacs CVS, uncomment if you do also

    ;; load up abbrevs for these modes
    (require 'msf-abbrev)
    (setq msf-abbrev-verbose t) ;; optional
    (setq msf-abbrev-root "~/emacs/mode-abbrevs")
    (global-set-key (kbd "C-c l") 'msf-abbrev-goto-root)
    (global-set-key (kbd "C-c a") 'msf-abbrev-define-new-abbrev-this-mode)
    (msf-abbrev-load)
    Whenever I define a new abbrev, it is loaded automatically when I save the file, thanks to an after-save-hook. To define a new abbrev in a mode, I just do C-c a and it will prompt me for the abbrev name (as seen in the demo). The whole process is now streamlined (compared to previous versions). (NOTE: for modes which do not have an abbreviation table (such as emacs-lisp-mode or AUCTeX), after defining a new abbrev, you still need to restart Emacs or re-visit any of the open files in that mode to cause the changes to take effect).

    User-Contributed abbrevs

    In case you come up with some abbreviations you think are useful, send them to me and I will post them here.

    Nickolay Savchenko has contributed some c-mode abbrevs by parsing glibc function index ('info glibc'). This should contain most of the C library functions. The actual abbrevs (with c-mode directory) are available here and the index of functions he used to build these abbrevs is available here. Thanks Nickolay! I have integrated some of his functions into my own abbrev files (I eliminated those files containing '_' as abbrevs in emacs cannot contain a '_').

    Known bugs

    • some people have problems with large unrelated-to-msf-abbrev yanks (since this package defadvices yank). if you can reproduce this, let me know. I believe it's now fixed in version 1.0beta1, but would like to hear counterexamples.
    • Sometimes, the logic to ressurect a deleted form field breaks when the whole form is deleted, and then new text is inserted in its place; the symptom of this is that you get highlighted text where you did not expect it, while inputting new text. If this happens, you should be able to eliminate the form entirely with the key binding M-RET while on a highlighted field. This key binding is active normally anyway, but usually only needed when this particular bug shows up.
    • One Mac user had success using this package with Carbon Emacs but under Aquamacs (a different Mac OS X port) the LaTeX-mode (AUCTeX) abbrevs didn't load while the other modes did. The user reported the following solution:
      The problem is associated with HFS+ (the Mac OS X filesystem).
      It doesn't handle case (!!!). All the unix tools on Mac OS X
      do so you don't really notice (tab-completion with bash is
      case-sensitive but trying to create two files whose names only
      differ in case fails).

      So now I renamed all the mode-dirs to the proper lowercase version
      and now it works. Weird, though, because opening "LaTeX-mode" should
      find "latex-mode" as well if the file system is not case sensitive.

    Bug Reports/Comments

    Send these to rutt.4 AT osu.edu (include "msf-abbrev" in the subject). NOTE: my old email address was deactivated sometime in early 2007, so I'm sorry if you sent mail to me which bounced, please try again now with the above.

    2007年6月21日星期四

    Benjamin Rutt's Emacs C development tips

    Benjamin Rutt's Emacs C development tips

    Benjamin Rutt's Emacs C development tips

    Welcome

    So you are developing in C (or C++, or java) in emacs. This web page will hopefully make you more productive at doing just that. It is based on my experience of using GNU Emacs as a development environment for C and syntactically similar languages since 1997. I'm assuming you use a relatively recent GNU Emacs (preferably version 21.x) on a UNIX-like machine. Most of my tips will probably also work in XEmacs in some other development environment, but your mileage may vary.

    Preliminaries

    You should add any customization code to your ~/.emacs. If you're doing any C customization at all, add a
    (require 'cc-mode)
    to your ~/.emacs. If you'd like syntax colorization, add a
    (global-font-lock-mode 1)
    to your ~/.emacs.

    Compiling/linking your project

    Emacs makes it easy to get your project to build, by providing a means to move the cursor (known as the "point" in emacs) to the locus of compilation errors. Simply run M-x compile, and then type in the name of your compilation command (this is probably "make" or something similar). If there are any errors, you can press the key sequence C-x ` to visit the first error found. You can then press C-x ` again to visit the second error found, etc. To re-start the sequence of visiting errors (i.e. to visit the first one again), use the key sequence C-u C-x ` which will take you back to the first error. At one point, I got tired of typing M-x compile, so I bound it to a key, in particular F9 (Borland JBuilder trained me that this key is for building and also keys F5-F9 are user-assignable in emacs). Here's how I did that:
    (global-set-key [(f9)] 'compile)
    Now, pressing F9 ENTER starts a build. Or, you can adjust the build command in between pressing F9 and ENTER.

    Better management of the compilation window

    I don't like it that the compilation window takes up 1/2 of the frame by default. So, I use the following, which works well most of the time:
    (setq compilation-window-height 8)
    I also don't like that the compilation window sticks around after a successful compile. After all, most of the time, all I care about is that the compile completed cleanly. Here's how I make the compilation window go away, only if there was no compilation errors:
    (setq compilation-finish-function
    (lambda (buf str)

    (if (string-match "exited abnormally" str)

    ;;there were errors
    (message "compilation errors, press C-x ` to visit")

    ;;no errors, make the compilation window go away in 0.5 seconds
    (run-at-time 0.5 nil 'delete-windows-on buf)
    (message "NO COMPILATION ERRORS!"))))

    Enabling hungry delete

    Simply add the line (c-toggle-hungry-state 1) to your ~/.emacs, and you will enable "hungry" delete, which means that all whitespace around the cursor will be consumed when you press Backspace or C-d. Try it out, it will probably save you some keystrokes.

    Tabs vs. spaces for indentation

    It's easy to indent or re-indent current lines when doing C development, just press the TAB key. The problem comes when you want to share your code with others. By default, emacs uses a mixture of tabs and spaces for indentation, since tabs save bytes (by representing multiple units of indentation with a single byte), and spaces are important for alignment. Not all editors use this strategy however, and may barf on code that is indented using this strategy. I personally like to use all-spaces for indentation and no TABs, but sometimes it's not your call how to indent your files. Here's how to play nicely with e.g coworkers who use different editors.

    If you need to use only TABs for indentation

    You can achieve this in C/C++/java modes with the following in your ~/.emacs:
    (require 'cc-mode)
    (defun my-build-tab-stop-list (width)
    (let ((num-tab-stops (/ 80 width))
    (counter 1)
    (ls nil))
    (while (<= counter num-tab-stops)
    (setq ls (cons (* width counter) ls))
    (setq counter (1+ counter)))
    (set (make-local-variable 'tab-stop-list) (nreverse ls))))
    (defun my-c-mode-common-hook ()
    (setq tab-width 5) ;; change this to taste, this is what K&R uses :)
    (my-build-tab-stop-list tab-width)
    (setq c-basic-offset tab-width))
    (add-hook 'c-mode-common-hook 'my-c-mode-common-hook)
    By redefining the tab-stop-list in each C (or similar) buffer inside a hook which will be executed as each C buffer is opened, indentations should arrive at tab stops, obviating the need for space-padding of any kind. This will not work correctly, however, if you are trying to lineup arglists, etc. (e.g. (c-set-offset 'arglist-cont-nonempty 'c-lineup-arglist)). That's a fundamental problem...how do you align code with TABS to arbitrary columns, without using spaces? Therefore, I recommend using only spaces for indentation instead...

    If you need to use only spaces for indentation

    You can achieve this in C/C++/java modes with the following in your ~/.emacs:
    (require 'cc-mode)
    (defun my-build-tab-stop-list (width)
    (let ((num-tab-stops (/ 80 width))
    (counter 1)
    (ls nil))
    (while (<= counter num-tab-stops)
    (setq ls (cons (* width counter) ls))
    (setq counter (1+ counter)))
    (set (make-local-variable 'tab-stop-list) (nreverse ls))))
    (defun my-c-mode-common-hook ()
    (setq tab-width 5) ;; change this to taste, this is what K&R uses :)
    (my-build-tab-stop-list tab-width)
    (setq c-basic-offset tab-width)
    (setq indent-tabs-mode nil)) ;; force only spaces for indentation
    (add-hook 'c-mode-common-hook 'my-c-mode-common-hook)
    By adding the line with the comment "force only spaces for indentation" to the earlier example, all indentations of new C code will use spaces for indentation. This will make life easier, as your code will look the same in all editors and you can explore using spaces for lining up argument lists, etc.

    Customizing indentation

    First of all, put (require 'cc-mode) atop the C customization section of your ~/.emacs, so the methods below will be defined. Then, while visiting a C source file, if you don't like how a particular line is indenting, press C-c C-o near the expression you want to change, and figure out the symbol to change (pressing C-c C-o gives you a chance to set these variables interactively). Then, do a (c-set-offset 'symbol-to-change X) in your ~/.emacs where X is typically a numeric value or the symbol '+ or '-. The following is an example of some of my customizations, which is simply my personal preference:
    ;; personal preferences
    (c-set-offset 'substatement-open 0)
    (c-set-offset 'case-label '+)
    (c-set-offset 'arglist-cont-nonempty '+)
    (c-set-offset 'arglist-intro '+)
    (c-set-offset 'topmost-intro-cont '+)

    Debugging in emacs

    M-x gdb (or M-x jdb) is very useful to debug inside emacs. Try it, it will split your frame into two windows, with the gdb interaction buffer in the top window and the current source line (if the program is being executed) displayed in the bottom window. I typically use the following sequence:
    1. type M-x gdb. Enter the executable name to run, e.g. ./foobar
    2. in the gdb interaction buffer, type "break main" and "run"
    3. to use the debugger to execute the current statement and go to the next one, use C-c C-n
    4. to use the debugger to step into the current statement, use C-c C-s
    You can still use all the power of gdb that you could from any shell prompt, but the arrow following the current source line is value added by emacs, providing debugger state without requiring a lot of gdb list statements.

    Indexing your source code for easy navigation

    Emacs comes with an etags executable (and also probably its cousin ctags), which can index source code, allowing you to jump around your source code quickly by simply giving a method name to jump to, etc. Here are some simple steps to get you started:
    1. from a shell, run etags *.[ch] in your source directory, which will index or re-index the .c and .h files in your source directory, and will create a file named TAGS.
    2. inside emacs, run M-x visit-tags-table RET and then enter the name of the TAGS file created by the previous step.
    3. for example, to visit a function named "sortRecords", type M-. and then enter the name "sortRecords". Or, better yet, enter just "sortRec", as substrings will work as well, as long as they are unique.
    4. did your press of M-. not put you in the right function/variable? Then try pressing C-u M-. to visit the next function/variable with a similar name.
    5. do you want to go back to where you were before the last M-. or C-u M-.? Then simply to M-*, each press of which will pop a stack and return you to where you came from.

    2007年6月16日星期六

    让gcc检查变参函数的参数类型

    Kxn's eXercise Notes » 让gcc检查变参函数的参数类型

    gcc 可以检查 printf 的参数,对于非基本类型的参数,会给个warning, 但是自己实现的变参函数gcc不做检查,这个就需要利用gcc的扩展,加个attribute, 类似下面的

    void xlog(LFD *const lfd, const char *fmt, …)
    #ifdef __GNUC__
    __attribute__ ((format (printf, 2, 3)))

    Five great games in Fedora

    Red Hat Magazine | Five great games in Fedora

    Fedora里面的游戏
    1.wormux
    2.Ballz
    3.Alex the Alligator 4
    4.Frozen Bubble
    5.Blob Wars

    都没玩过

    从gcc静态链接开始的讨论

    Kxn’s eXercise Notes » 从gcc静态链接开始的讨论,学了好几手

    2007年6月8日星期五

    C++ template specialisation help

     
    We have a template:

    template<class Type>
    class t_type : public BaseNode.

    In this template, we've defined generic behaviours for amany functions..


    Now, we're defined a specialisation, so we can control how it behaves with these types:


    template<>
    class t_type<mpz_class> : public BaseNode

    We also have:


    template<>
    class t_type<mpf_class> : public BaseNode


    Note: mpf and mpz are classes from GNU MP. (Multi Precision library, allows arbituary sized #'s.


    Now, I have a function in the mpz specialisation, that must return a new instance of the mpf specialisation. Note that the mpf specialisation is declared below the mpz's , and NOR does a forward declaration help.


    When i go to compile said function, i get:

    my_type.h:2009: error: specialization of
    #t_type<__gmp_expr<__gmpf_value, __gmpf_value> ># after instantiation
    my_type.h:2009: error: redefinition of #class
    t_type<__gmp_expr<__gmpf_value, __gmpf_value> >#
    my_type.h:43: error: previous definition of #class
    t_type<__gmp_expr<__gmpf_value, __gmpf_value> >#


    what the heck? The line it is pointing to, for the redefinition, is the 'return new' statement.

    Obviously: its because I am trying to instanciate something i have specialised on below it.
    So, move it up?

    Cant: I need to be able to create new instances of all specialised classes.

    Possible solution:
    forward declaration: still yeilds same result.

    Note: the generic class should in itself, act as the 'header' for the function, and indeed it does, as it the 'redefinition complaint' shows.

    How can i get it to allow me to instantiate it?


    Rationale for creating the new instances: As you can see, the classes all inherit from the class BaseNode, BaseNode is used to attach these to a tree. We need to be able to attach different classes to a tree, each with the same functions, and have them behave differently. (Each specialisation has functions that will tell us about it, and also, convert between different types).

    -----
    Remember kids, they aren't H games, they're "Bishoujo Games."

     

    2007年6月7日星期四

    我只想安安静静的学点东西,做点东西,不要在我耳边喋喋不休,我不想听。
    我也不想热闹。有个我想聊的人安安静静的陪我聊天就足够了。
    我现在别无它求。
    我有很多事要做,我需要时间。

    picture in picture and osd

    Picture-in-Picture

    The Picture-in-Picture display (PIP) is an option available on high-end TVs. By pressing a button on the remote control, a second picture, (coming from another connected source of video images,) will be displayed as a small inset in the corner of the main TV picture. The source could be a video recorder, a laserdisc player or a security camera. Because a video recorder has a built-in TV tuner, it is possible to display a second channel as a PIP-image. This enables you to watch one program, while keeping an eye on another. A PIP-image can be changed in size and moved about the screen. The inset picture can also be made the main picture and vice versa.


    On-Screen Display (OSD)

    By pressing the menu key on the remote control, a menu card appears on the screen. It enables you to make choices by simply entering a number, using the arrows or the +/- key. These choices cover TV installation, pre-defined channels, image and sound settings, and various special applications. This information can be retrieved on screen when watching a program. This digital feature is called OSD, On-Screen Display.



    Connections

    Home video equipment can be connected in various ways. TV sets can be connected to VCRs, CD-i players, audio systems, cable antennae, etc. But some connections perform better than others. Much depends on at which level the signal is transferred.

    Level of Signal Processing:
    HF
    CVBS
    Y/C
    RGB

    Connectors:
    antenna socket- HF
    Cinch - CVBS
    S-VHS - Y/C
    SCART - general
    http://repairfaq.cis.upenn.edu/sam/icets/connect.htm

    2007年6月6日星期三

    od-显示文件的实际内容

    od  -c file
    以字符方式显示file文件的内容。
    详细内容见man

    2007年6月5日星期二

    etags tips

    emacs中与TAGS文件有关的两个命令。
    • Type M-x locate RET TAGS RET and Emacs will list for you the full path names of all your TAGS files.
    • f the tags table you want has been created, you can use the M-x visit-tags-table command to specify it. Otherwise, you will need to create the tag table yourself and then use M-x visit-tags-table.

    basename -display file name component of path name

    basename name [suffix]

    从路径名中提取出文件名。
    例如:basename /usr/bin/sort
    Output "sort".

    basename include/stdio.h .h
    Output "stdio".

    basename strips off the leading part of a path name, leaving only the
    final component of the name, which is assumed to be the file name. To
    accomplish this, basename first checks to see if name consists of
    nothing but slash (/) or backslash (\) characters. If so, basename
    replaces name with a single slash and the process is complete. If not,
    basename removes any trailing slashes. If slashes still remain,
    basename strips off all leading characters up to and including the
    final slash. Finally, if you specify suffix and the remaining portion
    of name contains a suffix which matches suffix, basename removes that
    suffix.

    basename /usr/bin/sort
    Output "sort".
    basename include/stdio.h .h
    Output "stdio".

    Possible exit status values are:
    0
    Successful completion.
    1
    Failure due to any of the following:
    ― unknown command line option
    ― incorrect number of arguments

    2007年6月3日星期日

    6月3号

    为什么找我说呢?
    我还是不知道的好。
    我闲的吗,真是的。

    2007年6月2日星期六

    POJ各题算法分类和题目推荐

    POJ各题算法分类和题目推荐 - 路雪军 Carl - CSDNBlog
    POJ上一些算法题的分类。
    想对算法研究下,但心有余而力不足。
    google baidu的面试都很重算法。

    6月2号

    在前面一篇日记中忘记了说一件事情,在水木上frostyblade的回帖。
    大意就是写程序要落笔就成,没有错误,不需要调试就可以运行,不要依赖各种IDE来查错,这样在面试中会有很大优势。
    而这个问题主要是在写程序前缺乏对整个程序整体的认识,只是修修改改,然后再试,这是一个错误的编程思想。
    我很同意这个观点,但还没达到这种境界,对于编程,我只是初涉,对于很多东西都还是不懂的。要学习很多。

    今天上午本科答辩,马马虎虎,讲的马马虎虎,问的问题答的马马虎虎,希望老师手下留情,不要给分太低啊。

    在google 黑板报上有篇文章,"对学生朋友的建议",google中国工程研究院的 工程师 方坤写的,说的是在招聘过程中学生的问题,
    实站,缺乏实站,是现今学生的通病,我也一样,所以读个研究生老缓冲一下,补充实践经验。

    "于一个未来的软件工程师来说,实际编程经验是相当重要的。我们会要求应聘者在紧凑的时间内编写大量的代码,从中考察应聘者的分析能力,编码速度,代码可读 性,对所用编程语言的掌握程度,对边界条件与异常状况的处理,数据结构与函数接口的定义,程序运行的效率和应聘者查错纠错的能力等等"

    "我记得开复对于青年学子们有一个建议,大学四年,至少要编写 10 万行代码。不是每一个人都期望进入谷歌这样的顶级技术公司,但即使对开复的建议打个三折,也还有 3 万行呢,不努力,能行嘛。前来应聘的学生们在编写代码时暴露出这样那样的问题,大都可以归结到同一个原因:锻炼太少。比如"for (int i = 0; i < strlen(s); ++i)",没有实战经验的人,怎么可能意识到暗藏其间的效率陷阱。再比如内存泄漏,就和初恋一样,没有亲身经历过的人,不会有刻骨铭心的感受,而一旦经 历,终身难忘,根本用不着你有多聪明。遗憾的是,我看到许多相当聪颖的学生写出来的代码只能用惨不忍睹来形容,真让我怀疑这是不是就是他们的第一次。大家 不要怪谷歌要求高,恕我直言,如果不能持之以恒,下点儿苦功夫,不光谷歌一家,其它公司恐怕也是进不去的。"

    for (int i = 0; i < strlen(s); ++i),这句话什么问题呢,初一看貌似可以,我第一眼看,感觉这样没错啊,可以跑啊。但是忘记了一点,效率,在每次循环我们都要调用strlen,多么耗时啊,完全可以将其先行求出,再循环。这就是经验的问题了。

    内存泄漏的问题,我感觉在我们项目中存在很多,编程经验啊。才写过几行程序啊。10万行代码,我一年能写这么多吗,一年365天,每天要写300多行啊。

    "前不久我在西安遇到一个学生,他半年来一直坚持在北京大学的 ACM 网站上参赛、做题,我看他写出来的程序就确实比他大多数同学都要好一些。我相信,只要他能够持之以恒,还会取得更大的进步。我听说浙江大学的 ACM 网站、TopCoder 网站也都是不错的教育资源,感兴趣的同学不妨一看;虽说做竞赛题距离真正的软件开发还有着显著差别,但也不失为一个不错的出发点。"

    我要向这位同学学习,ACM,TopCoder,都要做。

    "到了这个阶段,我推荐大家读一些经典的进阶书籍,例如《Effective C++》、《Effective Java》等,即使地处偏远地区,也可通过网上书店买到。(我建议编码量太少的同学就先不要读了,会走火入魔的。我见过有学生连引用和指针都没搞清楚,就 在那里重载操作符的。)交流也很重要,如果能够与网上网下志同道合的朋友互相帮助,共同进步,当收事半功倍之效。"

    我算编码量少的吗,会走火入魔吗?

    "一家外包公司或许会满足于雇佣仅能从事简单的、机械性的重复劳动的软件蓝领,而让谷歌苦苦寻觅的,乃是最优秀的软件工程师兼计算机科学家――是的,在谷歌,研究与开发融为一体,软件工程师与计算机科学家当然也合二为一。"

    我不是CS科班出身,怎么办呢

    "遍历一个数组或链表的时间复杂度是多少?对于这样一个不是问题的问题,竟然各地都有相当数量的学生回答说是 O(logN)!有一次我实在忍不住了,提示应聘的学生说:"你是如何理解'遍历'一词的涵义呢?"他立刻做恍然大悟状,回答说:"哦,对,应该是 O(NlogN)"。"

    这个问题怎么回事?遍历?不就是走一遍吗,O(N)不可以吗,O(logN),O(NlogN)怎么回事呢?

    诚信为本

    "试中偶尔也能遇到诚信堪忧的学生。有一次我出了一道题,前来应聘的学生明明以前见过这道题,却告诉我说没见过,自以为得计,可他那一纵即逝的狡黠一笑哪里 逃得过我洞若观火的锐利眼神。大哥,你又不是专业演员,为什么要玩这样的花招?我每年面试的求职者在百人以上,捣鬼是过不了我这一关的。两点之间直线最 短,说真话最简单。"