Retroelements

Posted on 2016-01-03 | In molecular biology

Introduction

Retroelements are mobile genetic elements (MGEs) that retrotranspose via an RNA intermediate, which is reverse-transcribed into DNA by the element-encoded reverse transcriptase and integrated at a new location in the host genome by an integrase enzyme. They have been found in organisms ranging from bacteria to humans and often make up a significant part of the genome, particularly in higher plants and fungi. Retroelements with different gene organizations and replicative mechanisms have arisen in the course of evolution. Although in most cases they have no effect on the host organism, there are many examples of retroelement-induced mutations that result in disease. However, co-adaptation has in some cases led to the use of retroelements for essential and beneficial host functions.

Retroelements (retrotransposons and retroviruses) can currently be divided into four systems or groups commonly known as the long terminal repeat (LTR) retroelements, the non-LTR retroelements, the tyrosine recombinase (YR) retroelements and the Penelope retrotransposons (Eickbush and Jamburuthugoda 2008).

LTR retroelements

These include the broad range of LTR retrotransposons and retroviruses circulating in plants, fungi and animals. A full-length consensus LTR retroelement genome is characterized by an internal coding region (the gag and pol genes, plus the env gene when present) flanked by long terminal repeats (LTRs). Based on sequence similarity and other features, they can be classified into four major groups or families: Ty1/Copia, Ty3/Gypsy, Bel/Pao, and Retroviridae.

Supplementary material

Gag is a polyprotein; the name is an acronym for Group Antigens (ag).
Pol encodes the reverse transcriptase.
Env is the envelope protein.
The group antigens form the viral core structure and the RNA genome-binding proteins, and are the major proteins of the nucleoprotein core particle. Reverse transcriptase is the essential enzyme that carries out reverse transcription, converting the RNA genome into a double-stranded DNA preintegration form. The reverse transcriptase gene also encodes an integrase activity and an RNase H activity that functions during reverse transcription of the genome.

Non-LTR retroelements

These constitute a system of retrotransposons widely distributed in eukaryotes that lack LTRs or other terminal repeats; non-LTR retrotransposons most frequently end with a poly(A) tail at their 3′ end, while their 5′ end often carries variable deletions (5′ truncations). Depending on whether or not they can transpose on their own, these elements are classified as autonomous or non-autonomous retroelements.

Autonomous non-LTR retroelements. On the basis of their molecular structures the autonomous non-LTR retroelements have been grouped in two major classes:

R2 elements: these constitute one of the most studied families of non-LTR retroelements (Eickbush 2002). They encode a single ORF with a central RT domain and a conserved endonuclease (EN) domain at the C-terminus.

Long INterspersed repetitive Elements (LINEs): these are a family of 6-8 kb long elements encoding two ORFs. The first ORF shows similarity to retroviral gag genes. The second ORF encodes a pol polyprotein with RT and apurinic-apyrimidinic endonuclease (APE) domains. However, some lineages additionally include a protein domain of unknown function at the C-terminus, and others carry an RNase H (RH) domain downstream of the RT domain, as is usual in LTR retrotransposons (Eickbush and Jamburuthugoda 2008). LINEs are widespread in mammals (Moran and Gilbert 2002).

Non-autonomous non-LTR retroelements. These are DNA sequences of 80 to 630 bp known as Short INterspersed repetitive Elements (SINEs). They represent reverse-transcribed RNA molecules originally transcribed by RNA polymerase III into tRNA, rRNA, and other small nuclear RNAs (Malik and Eickbush 1998). Like LINEs, SINEs end in a poly(A) tail or in A- or T-rich sequences; their 5′ ends resemble tRNA genes (or, as shown for some animal SINEs, the 7SL RNA gene) and their 3′ ends resemble the 3′ end of LINEs (Oshima et al. 1996). Two conserved sequence motifs in the tRNA-like part of SINEs, called box A and box B, show homology to RNA polymerase III promoters. SINEs do not encode their own reverse transcriptase and are therefore unable to transpose autonomously. For this reason it has been proposed that SINEs use the enzymatic machinery of LINEs for their retrotransposition (Luan et al. 1993; Wallace et al. 2008; Kroutter et al. 2009). LINEs and SINEs have been described not only in animal genomes but also in many species across the kingdom Plantae (Schmidt 1999).

Tyrosine recombinase (YR) retroelements

YR retroelements have been found in plants, protists and fungi, as well as in a variety of animals including vertebrates, echinoderms and nematodes. The "gag-RT-RNaseH" genome organization of YR retroelements is similar to that of LTR retroelements, but YR retroelements lack the PR domain and usually carry a tyrosine recombinase (instead of INT), an enzyme typically involved in site-specific recombination between similar or identical DNA sequences. YR retroelements can be divided into three families: DIRS, Ngaro and VIPER (Goodwin et al. 2004; Goodwin and Poulter 2004; Vazquez et al. 2000; Lorenzi et al. 2006). The typical DIRS element contains inverted terminal repeats (ITRs), internal ORFs and an internal complementary region (ICR) derived from the duplication of flanking ITR sequences. DIRS elements differ from the other two YR families in having a conserved methyltransferase (MT) domain C-terminal to RT/RH that is similar to the MTs encoded by various bacteriophages (Goodwin and Poulter 2004). Ngaro elements show an ORF organization similar to that of DIRS elements but differ in the orientation of their flanking repeats: while DIRS elements are delimited by inverted repeats, Ngaro elements are flanked by direct repeats (referred to as A1, A2, B1 and B2). Ngaro elements have been found in organisms such as the zebrafish Danio rerio (DrNgaro1), as well as in fungi and echinoderms (Goodwin and Poulter 2004). VIPER (Vestigial interposed retroelement) elements, in turn, have been described in the genomes of trypanosome protozoan parasites. As an example, the figure below also shows the genomic organization of tcVIPER, an element described in the genome of T. cruzi. This element contains an internal coding region flanked by a lineage of SINEs (called SIRE) also found in the T. cruzi genome (Lorenzi et al. 2006; Vazquez et al. 2000).

Penelope retrotransposons (PLEs)

This is a family of retrotransposons described in many animal genomes (more than 80 species belonging to at least 10 animal phyla), as well as in protists, fungi, and plants (Arkhipova 2006). Their genome structure contains apparent LTRs, in either direct or inverted orientation, flanking a coding region with RT and EN domains (i.e. a pol polyprotein domain). Phylogenetic reconstructions indicate that PLE-like RTs are closer to telomerase RTs (TERTs) than to any other characterized RTs (Arkhipova et al. 2003). PLE-like ENs, in turn, are related to the intron-encoded endonucleases and the bacterial repair endonuclease UvrC, both of which belong to the Uri family of ENs (Pyatkov et al. 2004). Studies in the fly Drosophila virilis and in bdelloid rotifers revealed that the majority of PLEs in these species contain spliceosomal introns (Arkhipova et al. 2003). The peculiar structural organization of PLEs, their ability to retain introns during transposition, and their distinct placement in the phylogeny of retroelements suggest that PLEs constitute an ancient class of retroelements (Arkhipova 2006; Schostak et al. 2008).

Read more: http://gydb.org/index.php/Retroelements

A collection of classic English sentences for SCI paper writing

Posted on 2015-12-28 | In Paper

It is time to ‘upgrade’ cancer epigenetics research and put together an ambitious plan to tackle the many unanswered questions in this field using epigenomics approaches.

Histones are no longer considered to be simple ‘DNA-packaging’ proteins; they are recognized as being dynamic regulators of gene activity that undergo many post-translational chemical modifications, including acetylation, methylation, phosphorylation, ubiquitylation and sumoylation.

In addition to their influence on gene expression, emerging evidence indicates that specific histone modifications interface with other nuclear processes.

Histone modifications, together with DNA methylation, also have a vital role in organizing nuclear architecture, which, in turn, is involved in regulating transcription and other nuclear processes.

However, what distinguishes metabolomics from clinical chemistry is the fact that in metabolomics one is not attempting to characterize a few compounds at a time, but literally dozens or even hundreds of compounds at a time.

Blood is a special biofluid, as it potentially reflects all processes going on in all organs. This can be both a blessing and a curse, as metabolite perturbations in the blood, while easily detectable, cannot be easily traced to a specific organ or a specific cause.

On a cellular level, organisms face two main challenges: to maintain genome integrity in the face of mutagens and mobile genetic elements, and to express a specific repertoire of genes at the proper level and with the appropriate timing.

In sharp contrast to the low within-species genetic variation, differences between species-specific haplotypes were high.

Given the staggering crop losses that result from insect herbivory and the environmental problems associated with insecticide use, genome-enabled research on the natural mechanisms that plants use to defend themselves against insects will undoubtedly have not only ecological, but also agricultural relevance.

A summary of classic awk examples

Posted on 2015-12-27 | In Linux

Delete a specific line

[zpxu@node102 ~]$ cat fkjsaf 
        GO_ids
Gh_A01G0005     GO:0016021
Gh_A01G0006     GO:0006629
Gh_A01G0007
Gh_A01G0008
Gh_A01G0009
Gh_A01G0010     GO:0008121,GO:0006122
Gh_A01G0011
Gh_A01G0012
Gh_A01G0013     GO:0003677,GO:0006355
Gh_A01G0014
Gh_A01G0015     GO:0004713,GO:0005524,GO:0004674,GO:0004672,GO:0006468
Gh_A01G0016     GO:0006886,GO:0005643,GO:0008536,GO:0005515,GO:0008565
Gh_A01G0017     GO:0003676
Gh_A01G0018
Gh_A01G0019     GO:0016020,GO:0006810,GO:0005215
[zpxu@node102 ~]$ awk '{if(NR==1){next} print $0}' fkjsaf 
Gh_A01G0005     GO:0016021
Gh_A01G0006     GO:0006629
Gh_A01G0007
Gh_A01G0008
Gh_A01G0009
Gh_A01G0010     GO:0008121,GO:0006122
Gh_A01G0011
Gh_A01G0012
Gh_A01G0013     GO:0003677,GO:0006355
Gh_A01G0014
Gh_A01G0015     GO:0004713,GO:0005524,GO:0004674,GO:0004672,GO:0006468
Gh_A01G0016     GO:0006886,GO:0005643,GO:0008536,GO:0005515,GO:0008565
Gh_A01G0017     GO:0003676
Gh_A01G0018
Gh_A01G0019     GO:0016020,GO:0006810,GO:0005215

Delete lines with fewer than N columns (here, lines with only one column)

[zpxu@node102 ~]$ awk '{if(NF==1){next} print $0}' fkjsaf 
Gh_A01G0005     GO:0016021
Gh_A01G0006     GO:0006629
Gh_A01G0010     GO:0008121,GO:0006122
Gh_A01G0013     GO:0003677,GO:0006355
Gh_A01G0015     GO:0004713,GO:0005524,GO:0004674,GO:0004672,GO:0006468
Gh_A01G0016     GO:0006886,GO:0005643,GO:0008536,GO:0005515,GO:0008565
Gh_A01G0017     GO:0003676
Gh_A01G0019     GO:0016020,GO:0006810,GO:0005215

Delete blank lines

[zpxu@node102 ~]$ cat text
111
222

222
333
[zpxu@node102 ~]$ awk NF text
111
222
222
333

Omit the last two columns

cat file 
a b c d e f
1 2 3 4
awk 'NF-=2' file
a b c d
1 2

Omit the first two columns

awk '{for(i=3;i<NF;i++)printf("%s ",$i);print $NF}' file
c d e f
3 4

文件1和文件2交集部分合并输出

cat a.txt    //a.txt  
111   aaa  
222   bbb  
333   cccc  
444   ddd  
cat b.txt    //b.txt  
111  123  456  
2    abc  cbd  
444  rts  786  
# Required output:
111,aaa,123,456
444,ddd,rts,786
# Method 1
awk 'NR==FNR{a[$1]=$2;}NR!=FNR && a[$1]{print $1","a[$1]","$2","$3}' a.txt b.txt  
111,aaa,123,456  
444,ddd,rts,786

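Explanation
When NR equals FNR, awk is processing the first file. a[$1]=$2 builds an array that uses the first field as the index and the second field as the value. When NR!=FNR, awk is processing the second file. Note that $1 here is not the same as the earlier $1: the earlier one is the first field of a.txt, while this one is the first field of b.txt. a[$1] is the value stored under the first field of b.txt; if it holds a non-empty, non-zero value, the key also exists in a.txt, so the combined record is printed.

# Method 2
awk -v OFS="," 'NR==FNR{a[$1]=$2;} NR!=FNR && $1 in a { print $1,a[$1],$2,$3}' a.txt b.txt  
111,aaa,123,456  
444,ddd,rts,786

Explanation
-v OFS="," sets the output field separator. $1 in a tests whether the first column of b.txt is one of the keys of array a. Programmers will find this familiar, since most languages offer an equivalent operator or function.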
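For readers more at home in Python, the two-pass join described above can be sketched as follows. This is only an illustration under stated assumptions: the sample rows are copied from a.txt and b.txt into the hypothetical constants A_TXT and B_TXT, and the function name join_on_first_field is made up for this sketch.

```python
# Hypothetical in-memory copies of a.txt and b.txt from the example above.
A_TXT = """111   aaa
222   bbb
333   cccc
444   ddd
"""
B_TXT = """111  123  456
2    abc  cbd
444  rts  786
"""


def join_on_first_field(a_text, b_text):
    """Return comma-joined rows for keys that occur in both inputs."""
    # First pass: map key -> second field (the NR==FNR branch in awk).
    lookup = {}
    for line in a_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            lookup[fields[0]] = fields[1]

    # Second pass: keep only rows whose key was seen (the `$1 in a` test).
    joined = []
    for line in b_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] in lookup:
            joined.append(",".join([fields[0], lookup[fields[0]], fields[1], fields[2]]))
    return joined


if __name__ == "__main__":
    for row in join_on_first_field(A_TXT, B_TXT):
        print(row)
```

Testing membership with `in` mirrors method 2; it avoids the pitfall of method 1, where a stored value of 0 or an empty string would be treated as "not present".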

Special substitutions in sed

Posted on 2015-12-26 | In Linux

Change the Nth match

cat text1
2
2
3
22
2
sed ':a;N;$!ba;s/2/--/5' text1   # replace the 5th match
2
2
3
22
--
sed ':a;N;$!ba;s/\(.*\)2/\1--/' text1  # replace the last match
cat text2
2 52 8 2
sed 's/2/--/4' text2
2 52 8 --

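Awk system variables and built-in functions

Posted on 2015-12-26 | In Linux

awk has two types of system variables. The first type has default values that can be changed, such as the field and record separators. The second type holds values that can be used in reports or data processing, such as the number of fields or the number of records.

Built-in variable table

Examples

Processing multi-line records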

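$ cat text.txt
John Robinson
Koren Inc.
978 Commonwealth Ave.
Boston
MA 01760
696-0987

Note: the record has 6 fields, and records are separated by blank lines.
To process records that span several lines like this, define the field separator as a newline and the record separator as the empty string, which stands for a blank line.

BEGIN { FS = "\n"; RS = "" }

The following script then prints the first and last fields:

$ awk 'BEGIN{ FS = "\n"; RS = "" } {print $1, $NF}' text.txt
John Robinson 696-0987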

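Setting the numeric output format (using OFMT)

$ awk 'BEGIN{OFMT="%.3f";print 2/3,123.11111111;}' /etc/passwd   
0.667 123.111
# The default OFMT is %.6g, which keeps six significant digits; changing OFMT changes the default output format for numbers.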
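The difference between %.6g and %.3f is easy to check outside awk, because Python's % operator follows the same printf conventions. A minimal sketch (the helper name fmt is hypothetical, and this illustrates the format specifiers only, not awk's own number-to-string rules):

```python
def fmt(spec, value):
    """Format a number with a printf-style spec such as "%.6g"."""
    return spec % value


print(fmt("%.6g", 123.11111111))   # 123.111  (six significant digits)
print(fmt("%.3f", 123.11111111))   # 123.111  (three decimal places)
print(fmt("%.6g", 2 / 3))          # 0.666667
print(fmt("%.3f", 2 / 3))          # 0.667
print(fmt("%.6g", 1234567.891))    # 1.23457e+06
```

The last line shows why "six significant digits" and "six decimal places" are not the same thing: %g switches to exponent notation once the integer part no longer fits.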

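Splitting fields by width (using FIELDWIDTHS)

$ echo 20100117054932 | awk 'BEGIN{FIELDWIDTHS="4 2 2 2 2 2"}{print $1"-"$2"-"$3,$4":"$5":"$6}'
2010-01-17 05:49:32
# FIELDWIDTHS is a space-separated list of numbers that splits each record into fixed-width fields: FIELDWIDTHS="4 2 2 2 2 2" means $1 is 4 characters wide, $2 is 2, $3 is 2, and so on. The FS separator is ignored while it is in effect. FIELDWIDTHS is a gawk extension.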
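The same fixed-width split can be sketched in Python with plain slicing. The function name split_fixed_width is hypothetical; the widths are the ones used for the timestamp above.

```python
def split_fixed_width(record, widths):
    """Cut a record into consecutive fields of the given widths."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


parts = split_fixed_width("20100117054932", [4, 2, 2, 2, 2, 2])
print("-".join(parts[:3]), ":".join(parts[3:]))  # 2010-01-17 05:49:32
```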

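Built-in functions

awk's built-in functions fall into four main categories: arithmetic functions, string functions, other general functions, and time functions.

String functions

split

awk's built-in split function breaks a string into pieces and stores them in an array. You can supply your own separator or use the current value of FS (the field separator).
Syntax:
split (string, array, field separator)
split (string, array) -> if the third argument is omitted, awk uses the current value of FS.
split takes three arguments: the string to split, the array that receives the pieces, and the separator.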

$ awk 'BEGIN{info = "this is a test";slen=split(info,ta," ");for (i=1;i<=slen;i++) {print i,ta[i];}}'
1 this
2 is
3 a
4 test

Reference:
http://www.cnblogs.com/chengmo/archive/2010/10/06/1844818.html

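Advanced sed: n, N, d, D, p, P, b, T, t, h, H, g, G, x, y

Posted on 2015-12-25 | In Linux

The advanced commands fall into three groups:

Working with a multi-line pattern space (N, D, P).
Using the hold space to save the contents of the pattern space and make them available to later commands (H, h, G, g, x).
Writing scripts that use branching and conditional instructions to change the flow of control (:, b, T/t).
Multi-line pattern space
Pattern matching in awk, sed and grep is line-oriented: a pattern is matched against a single input line. A phrase that starts at the end of one line and finishes at the start of the next can only be matched if the pattern is allowed to span several lines.
sed can look at more than one line in the pattern space, which lets a pattern extend across lines.
The three multi-line commands (N, D, P) correspond to the lowercase basic commands (n, d, p).
Command reference
D/d: d deletes the contents of the pattern space; D deletes only the first line of the pattern space.
P/p: p prints the current pattern space in addition to the default output; P (uppercase) prints the pattern space from its start up to the first \n, ahead of the default output.
N/n: Next (N) creates a multi-line pattern space by reading a new input line and appending it to the existing contents of the pattern space, with a newline separating the original contents from the new line. In a multi-line pattern space, "^" matches the first character of the pattern space, not the character after an embedded newline, and "$" matches only the final newline of the pattern space, not any embedded one. next (n) outputs the contents of the pattern space and then reads a new input line.
Examples
n
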
cat aaa
This is 1
This is 2
This is 3
This is 4
This is 5

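sed -n 'n;p' aaa   # -n suppresses the default output
This is 2
This is 4

Note: sed reads This is 1 and runs n, so the pattern space becomes This is 2; p prints it. sed then reads This is 3 and runs n, so the pattern space becomes This is 4; p prints it. Finally sed reads This is 5 and runs n, but there is no further input, so sed exits without running p.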

N

sed -n '$!N;P' aaa            
This is 1   
This is 3   
This is 5
sed -n 'N;P' aaa 
This is 1    
This is 3

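In the notes, 1 stands for This is 1, 2 for This is 2, and so on.
Note: sed reads 1; the $! condition holds (not the last line), so N runs and yields 1\n2, and P prints 1. sed reads 3; $! holds again, so N yields 3\n4, and P prints 3. sed reads 5; $! no longer holds, so N is skipped and P prints 5.
$!N: prevents the Next command from running on the last line ($). http://blog.chinaunix.net/uid-10540984-id-1759548.html

cat text 
Owner and Operator
Guide
sed '/Operator$/ {N;s/Owner and Operator\nGuide/Installation Guide/ }' text 
Installation Guide

For more detail on the sed commands n and N, see ww.cbcb.umd.edu/software/PBcR/MHAP/asm/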

d

sed 'n;d' aaa           
This is 1   
This is 3   
This is 5

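Note: sed reads 1 and runs n, which brings in 2; d deletes 2 and leaves the pattern space empty. Likewise, sed reads 3, n brings in 4, and d deletes it. When sed reads 5, n cannot fetch another line, so d is not run. Since -n is not given, the output is 1\n3\n5.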

D

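sed 'N;D' aaa
This is 5

Note: sed reads 1 and runs N, giving 1\n2; D leaves 2. N then gives 2\n3 and D leaves 3, and so on until only 5 is left. N then fails because there is no more input and sed exits; since -n is not given, 5 is printed.

Input/output loop

P (uppercase) usually appears after N and before D. Together, N, P and D form an input/output loop that maintains a two-line pattern space but outputs only one line at a time. The purpose of the loop is to output only the first line of the pattern space and then return to the top of the script, so that all commands are applied to what was the second line.
Case study: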

$ cat text.txt
I want to see @f1(what will happen) if we put the 
font change commands @f1(on a set of lines). If I understand
things (correctly), the @f1(third) line causes problems.(No?).
Is this really the case, or is it (maybe) just something else?

Let's test having two on a line @f1(here) and @f1(there) as
well as one that begins on one line and ends @f1(somewhere
on another line). What if @f1(it is here) on the line?
Another @f1(one).
$ # Replace @f1(anything) with \fBanything\fR
$ sed 's/@f1(\(.*\))/\\fB\1\\fR/g'   # match @f1(.*), save whatever is inside the parentheses with "\(" and "\)", and recall it in the replacement with "\1"
$ sed -f sed.len test
I want to see \fBwhat will happen\fR if we put the
font change commands \fBon a set of lines\fR. If I understand
things (correctly), the \fBthird) line causes problems. (No?\fR.
Is this really the case, or is it (maybe) just something else?

Let's test having two on a line \fBhere) and @f1(there\fR as
well as one that begins on one line and ends @f1(somewhere
on another line). What if \fBit is here\fR on the line?
Another \fBone\fR.
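$ # The substitution fails on the third line and on the first line of the second paragraph: regular expressions are greedy and always take the longest possible match, so ".*" matches everything from "@f1(" up to the last closing parenthesis on the line.
$ sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g'  # zero or more occurrences of any character other than ")"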
I want to see \fBwhat will happen\fR if we put the
font change commands \fBon a set of lines\fR. If I understand
things (correctly), the \fBthird\fR line causes problems.(No?).
Is this really the case, or is it (maybe) just something else?

Let's test having two on a line \fBhere\fR and \fBthere\fR as
well as one that begins on one line and ends @f1(somewhere
on another line). What if \fBit is here\fR on the line?
Another \fBone\fR.
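$ # The macro that spans two lines is still not replaced. This is where the multi-line pattern space helps: if "@f1(" matches and no closing parenthesis is found, read another line into the buffer (N) and try the same kind of match again.
$ cat sednew
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/ {
          N
		  s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
}
$ # The address /@f1(.*/ restricts the procedure to lines that match /@f1(.*/ and runs the commands inside {} on them.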
$ sed -f sednew test
I want to see \fBwhat will happen\fR if we put the
font change commands \fBon a set of lines\fR. If I understand
things (correctly), the \fBthird\fR line causes problems.(No?).
Is this really the case, or is it (maybe) just something else?

Let's test having two on a line \fBhere\fR and \fBthere\fR as
well as one that begins on one line and ends \fBsomewhere
on another line\fR. What if @f1(it is here) on the line?
Another \fBone\fR.
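$ # The second-to-last macro is not replaced. Why? The address /@f1(.*/ finds "@f1(somewhere" and N appends the next line, so the pattern space becomes "well as one that begins on one line and ends @f1(somewhere\non another line). What if @f1(it is here) on the line?". The second substitution, "s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g", turns it into "well as one that begins on one line and ends \fBsomewhere\non another line\fR. What if @f1(it is here) on the line?", and both lines are output together. sed then reads the last line, "Another @f1(one).", and runs the script from the top. The script appears to have "forgotten" @f1(it is here): because sed outputs the whole pattern space by default, that text never gets a chance to be processed by the first substitution at the top of the script.
$ # If, after the two-line substitution, we output only the first line (P) and then delete it (D), the remaining "What if @f1(it is here) on the line?" becomes the first line of the pattern space and control returns to the top of the script. The script then checks the line for any other "@f1(", and every command is applied to it from top to bottom.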
$ cat sednew2
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/ {
          N
		  s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
		  P
		  D
}
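
The N/P/D loop above exists because sed works line by line. As a point of comparison, a language that holds the whole text in memory can let the character class [^)] run across newlines. A minimal Python sketch, assuming the sample text below (copied from the last three lines of the test file) and a hypothetical helper named convert:

```python
import re

# Last three lines of the sample input, including the macro that spans two lines.
TEXT = (
    "well as one that begins on one line and ends @f1(somewhere\n"
    "on another line). What if @f1(it is here) on the line?\n"
    "Another @f1(one).\n"
)

# [^)]* also matches newline characters, so no explicit multi-line handling is needed.
PATTERN = re.compile(r"@f1\(([^)]*)\)")


def convert(text):
    """Rewrite every @f1(...) macro as \\fB...\\fR."""
    # A function is used as the replacement so the backslashes are taken literally.
    return PATTERN.sub(lambda m: "\\fB" + m.group(1) + "\\fR", text)


print(convert(TEXT), end="")
```

This is not a replacement for the sed script, only a way to see that the difficulty comes from the line-oriented pattern space rather than from the regular expression itself.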

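Pattern spaces of more than two lines

The Next (N) command only reads one more line on top of the first, so the pattern space holds two lines at once. What if you need to match across three or more lines?
This is where the advanced flow-control commands come in.
The commands that control which part of a script runs, and when, are branch (b) and test (T/t). They transfer control to the line carrying a given label; if no label is given, control passes to the end of the script. Branch transfers control unconditionally; test transfers it conditionally.
A label is any sequence of up to seven characters. It sits on a line of its own and begins with a colon:

:mylabel

Note: no space is allowed between the colon and the label, and spaces at the end of the line are treated as part of the label. When a label is given to a branch or test command, a space is allowed between the command and the label:

b mylabel

A match over a pattern space of more than two lines can therefore be done as follows: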

:begin
/@f1(\([^)]*\))/{
                 s//\\fB\1\\fR/g
                 b begin
}
/@f1(.*/{
         N
         s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
         t again
         b begin
}
:again
P
D

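Hold space

The pattern space is the buffer that holds the current input. There is also a set-aside buffer called the hold space, and the contents of the two can be exchanged. The hold space is used for temporary storage; individual commands cannot address the hold space or change its contents.
The most common use of the hold space is to keep a copy of the current input line while the original contents of the pattern space are being changed.

y

The y command transliterates characters.
Convert the contents of the file aaa to uppercase: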

sed 'y/his/HIS/' aaa  
THIS IS 1  
THIS IS 2  
THIS IS 3  
THIS IS 4  
THIS IS 5
# or: echo "axxbxxcxx" | sed 'y/abc/123/'
1xx2xx3xx
# replaces characters that are not adjacent in the string

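The h, H, g and G commands

h overwrites the hold space with the contents of the pattern space; H appends the contents of the pattern space to the hold space.

g overwrites the pattern space with the contents of the hold space; G appends the contents of the hold space to the pattern space.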

cat ddd   
This is a and a is 1   
This is b and b is 2   
This is c and c is 3   
This is d and d is 4   
This is e and e is 5  
# Swap the number and the letter on each line of ddd, and uppercase the letter
cat ddd.sed
h  
{  
s/.*is \(.*\) and .*/\1/  
y/abcde/ABCDE/
G  
s/\(.*\)\n\(.*is \).*\(and \).*\(is \)\(.*\)/\2\5 \3\5 \4\1/  
}  
                                           
sed -f ddd.sed ddd  
This is 1 and 1 is A  
This is 2 and 2 is B  
This is 3 and 3 is C  
This is 4 and 4 is D  
This is 5 and 5 is E

x

The x command exchanges the contents of the hold space and the pattern space.

perl-oneline

Posted on 2015-12-08 | In Perl

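Perl has many command-line options. They often let us write simpler programs. This article covers some of the commonly used ones.

Part 1: Safety Net Options

When you try out clever (or stupid) ideas in Perl, mistakes are bound to happen. Experienced Perl programmers often use three options to catch errors early.
1: -c
This option compiles the Perl program without actually running it, which checks for syntax errors. I run it right after every change to a Perl program to catch any syntax errors.
$ perl -c program.pl

2: -w
It warns about potential problems. Since Perl 5.6.0, use warnings; is available as a replacement for -w. You should prefer use warnings because it is more flexible than -w.

3: -T
It puts Perl into taint mode. In this mode Perl distrusts any data that comes from outside the program, for example data read from the command line, from external files, or passed in by a CGI program.
All such data is marked as tainted in -T mode.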
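Part 2: Command-line options that let short Perl programs run on the command line

1: -e
Lets a Perl program be run directly on the command line.
For example, we can run a "Hello World" program on the command line without writing it to a file first.
$ perl -e 'print "Hello World\n"'

Several -e options can be used together; they run in the order in which they appear.
$ perl -e 'print "Hello ";' -e 'print "World\n"'
As in any Perl program, only the last statement can omit the trailing ;.

2: -M
Loads a module, as use would in a script.
$ perl -MLWP::Simple -e 'getstore("http://www.163.com/","163.html")'   ## download the whole page
-M followed by a module name is the same as use followed by the module name.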
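Part 3: Implicit loops
3: -n
Adds a loop so that you can process a file line by line.
$ perl -n -e 'print;' 1.txt   ##### same as: perl -ne 'print;' 1.txt
This is equivalent to the following program.
LINE:
while (<>) {
print;
}

<> opens the files named on the command line and reads them line by line. Each line is stored in $_ by default.
$ perl -n -e 'print "$. - $_"' file

The line above can be written as
LINE:
while (<>) {
print "$. - $_"
}
It prints the current line number $. and the current line $_.

4: -p is the same as -n, but it also prints the contents of $_.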

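If you want to do some processing before or after the loop, use a BEGIN or END block. The following line counts the words in a file.

$ perl -ne 'END { print $t } @w = /(\w+)/g; $t += @w' file.txt

All the words matched on each line go into the array @w, and the number of elements in @w is added to $t. The print in the END block finally outputs the total word count of the file.
Two more options make this program even simpler.

5: -a
Turns on autosplit mode. Whitespace is the default separator. Each input line is split on the separator and stored in the default array @F.

With -a, the command above can be written as:
$ perl -ane 'END {print $x} $x += @F' file.txt   ## uses -a

6: -F
Changes the default separator to whatever you want. For example, with the separator set to non-word characters, the command above becomes:
$ perl -F'\W' -ane 'END {print $x} $x += @F' file.txt

Here is a more complex example based on the Unix password file. It is a text file in which each line is one user record,
with fields separated by colons. The seventh field is the path of the user's login shell. We can work out how many users use each shell:

$ perl -F':' -ane '$s{$F[6]}++;' \
> -e 'END { print "$_ : $s{$_}" for keys %s }' /etc/passwd

It is no longer a single line, but you can see what kinds of problems the options can solve.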

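Part 4: Record Separators

$/ and $\ are the input and output record separators.
$/ separates the records read from a filehandle. Its default value is \n, so a filehandle is read one line at a time.
$\ is empty by default and is appended automatically to whatever is printed. That is why print so often needs a \n at the end.
$/ and $\ can be used together with -n and -p. Their command-line counterparts are -0 (zero) and -l (the letter L).
-0 can be followed by a hexadecimal or octal number, which is assigned to $/.
-00 turns on paragraph mode and -0777 turns on slurp mode (the whole file is read in one go); these have the same effect as setting $/ to the empty string and to undef respectively.

Using -l on its own has two effects:
First, it automatically chomps the input record separator.
Second, it assigns the value of $/ to $\ (so that print automatically appends \n).

1: The -l option adds \n to every output. For example:
$ perl -le 'print "Hello World"'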

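Part 5: In-place editing

With the options covered so far we can write quite effective command-line programs. A common pattern uses Unix I/O redirection:
$ perl -pe 'some code' < input.txt > output.txt

This program reads data from input.txt, processes it, and writes the result to output.txt. Often you would rather write the result back to the same file.

The -i option makes that simpler.

2: -i
Renames the source file and reads from the renamed file, then writes the processed data to a file with the original name.
If -i is followed by a string, that string is combined with the source file name to form a new file name.
The original file is kept under that name so that it is not overwritten by -i.

This example replaces every occurrence of PHP with Perl:
$ perl -i -pe 's/\bPHP\b/Perl/g' file.txt
The program reads each line of the file, performs the replacement, and writes the processed data back to (that is, overwrites) the source file.

If you do not want to overwrite the source file, use
$ perl -i.bak -pe 's/\bPHP\b/Perl/g' file.txt

Here the processed data is written to file.txt, and file.txt.bak is a backup of the original file.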

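A classic Perl example

Question:
I ran into this problem:
aaa@domain.com 2
aaa@domain.com 111
bbb@home.com 2222
bbb@home.com 1

Given output like this, I want to turn it into the following form:

aaa@domain.com 113
bbb@home.com 2223
That is, add up the numbers that follow the same email address. Could anyone suggest how to do this in Perl?

Answer:
perl -anle '$cnt{$F[0]}+=$F[1];END{print "$_\t$cnt{$_}" for keys %cnt}' urfile

Once you are familiar with the command-line options above, this command is easy to follow:
Each line of urfile is read in turn. Because -a is used, autosplit mode is on; whitespace is the default separator, and each line is split and stored in the default array @F.
Taking the first line of the file as an example, $F[0] is aaa@domain.com and $F[1] is 2.

$cnt{$F[0]} += $F[1] uses a hash with $F[0] as the key and adds $F[1] to its value, so the numbers for identical keys accumulate. Every line of the file is processed this way.
END{} runs after the loop has finished. It prints the %cnt hash, whose keys are the email addresses and whose values are the accumulated totals.

Here is the one-liner written out as a script:

#!/usr/bin/perl

use strict;
use warnings;

my %hash;
while (<>){
chomp;
my @array=split;
$hash{$array[0]} +=$array[1];
}

END{
foreach (keys %hash){
print "$_\t$hash{$_}\n";
}
}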
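
For comparison, the same accumulate-by-key idea can be sketched in Python. This is an illustration only: the input lines are the four sample records from the question, and the names LINES and sum_by_key are hypothetical.

```python
from collections import defaultdict

# Sample records from the question above.
LINES = [
    "aaa@domain.com 2",
    "aaa@domain.com 111",
    "bbb@home.com 2222",
    "bbb@home.com 1",
]


def sum_by_key(lines):
    """Add up the second column for each distinct value of the first column."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()        # whitespace split, like perl -a
        if len(fields) < 2:
            continue                 # skip blank or malformed lines
        totals[fields[0]] += int(fields[1])
    return dict(totals)


for address, total in sum_by_key(LINES).items():
    print(f"{address}\t{total}")
```

As with the Perl hash, a dictionary keyed on the address does all the grouping; no sorting of the input is needed.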

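Perl command-line options relevant to one-liners:

-0<number>

(given in octal) sets the record separator (the $/ variable); the default is a newline

-00: paragraph mode, where consecutive newlines act as the separator

-0777: disables the separator, so the whole file is treated as one record

-a: autosplit mode; splits $_ on whitespace and stores the result in @F, equivalent to @F=split. The separator can be set with -F

-F: sets the separator for -a; a regular expression may be used

-e: runs the given script.

-i<extension>: edits the file in place and backs up the old file with the given extension. If no extension is given, no backup is made.

-l: automatically chomps input and automatically appends a newline to output

-n: automatic loop, equivalent to while(<>){script;}

-p: automatic loop plus automatic output, equivalent to while(<>){script;print;}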

http://blog.sina.com.cn/s/blog_4af3f0d20100g9oz.html

How to use subscripts in ggplot2 legends -[expression()]

Posted on 2015-12-05 | In R

If you want to incorporate Greek symbols etc. into the major tick labels, use an unevaluated expression.

library(ggplot2)
data <- data.frame(names=tolower(LETTERS[1:4]),mean_p=runif(4))

p <- ggplot(data,aes(x=names,y=mean_p))
p <- p + geom_bar(stat="identity",colour="black",fill="white")  # stat="identity" plots the y values as given
p <- p + xlab("expressions") + scale_y_continuous(expression(paste("Wacky Data")))
p <- p + scale_x_discrete(labels=c(a=expression(paste(Delta^2)),
                               b=expression(paste(q^n)),
                               c=expression(log(z)),
                               d=expression(paste(omega / (x + 13)^2))))
p

Contribution from:
http://stackoverflow.com/questions/6202667/how-to-use-subscripts-in-ggplot2-legends-r

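hexo syntax: Markdown basics

Posted on 2015-12-04 | In Hexo

Markdown is a lightweight markup language whose goal is to be easy to read and easy to write. I use it mainly because of GitHub, so it is well worth knowing some of the basic Markdown syntax.

Common Markdown syntax

Headings

Just put # in front of the text. One to six # characters are supported. It is best to add a space after the #, which is the standard Markdown style.

Lists

There are two main kinds of list, unordered and ordered. For an unordered list, put - or * in front of the text; for an ordered list, use 1., 2., 3. as markers.

PDF

I am a PDF

Quotes

To quote a passage, put the > marker (the greater-than sign) in front of the text.

This is a quote: hope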

<blockquote><p>This is a quote: hope</p></blockquote>

Images and links

Images:

![](){ImgCap}{/ImgCap}

Or:

<img src="https://raw.githubusercontent.com/wiki/tiramisutes/blog_image/pythonlogo.jpg" width="600" height="300">

Links:

[label](link)

http://tiramisutes.github.io/

Download

Download Now

Bold and italic

Bold and italic are also simple: text wrapped in two * or _ characters on each side is bold, and text wrapped in one * or _ on each side is italic.
bold italic

Tables

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| zebra stripes | are neat      |    $1 |

Rendered result
| Tables | Are | Cool |
| ------ |:---:| ----:|
| col 3 is | right-aligned | $1600 |
| col 2 is | centered | $12 |
| zebra stripes | are neat | $1 |

To center a column, use :---: in the separator row; to right-align it, use ---:.

Code blocks

cord if for while

Horizontal rules

A horizontal rule only needs three * characters.

I am a horizontal rule

Video

<iframe width="420" height="315" src="http://www.youtube.com/" frameborder="0" allowfullscreen></iframe>

hexo server

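Starting the hexo server with hexo s fails with the following error:

FATAL Port 4000 has been used. Try other port instead.

This means hexo's default port 4000 is already in use.
Solution:

On Windows, check which process is using the port and kill it

netstat -ano | findstr 4000 (the last column is the PID)
tasklist | findstr pid
taskkill -PID pid -F

Or switch to another port

hexo server --port=4001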

Contribution from:
http://www.jianshu.com/p/1e402922ee32
http://daringfireball.net/projects/markdown/basics
https://guides.github.com/features/mastering-markdown/
http://blog.csdn.net/microcosmv/article/details/51868284

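Python basics tutorial: a summary

Posted on 2015-12-03 | In Python

Introduction

Python is a high-level scripting language that combines interpreted, compiled, interactive and object-oriented features.
Python is designed to be highly readable. It uses English keywords where other languages use punctuation, and its syntax is more distinctive than that of other languages.
Python is an interpreted language: there is no separate compilation step during development, as with PHP and Perl.
Python is an interactive language: you can write and run your program directly at a Python prompt.
Python is an object-oriented language: it supports an object-oriented style, a programming technique in which code is encapsulated in objects.
Python is a beginner's language: it is a great language for novice programmers and supports a wide range of application development, from simple text processing to WWW browsers to games.

Python features

1. Easy to learn: Python has relatively few keywords, a simple structure and a clearly defined syntax, which makes it easier to learn.
2. Easy to read: Python code is clearly laid out.
3. Easy to maintain: part of Python's success is that its source code is fairly easy to maintain.
4. A broad standard library: one of Python's greatest strengths is its rich, cross-platform library, which works well on UNIX, Windows and Macintosh.
5. Interactive mode: you can type code at a terminal and get results immediately, which allows interactive testing and debugging of code snippets.
6. Portable: Python runs on many hardware platforms and has the same interface on all of them.
7. Extensible: you can add low-level modules to the Python interpreter. These modules let programmers add to or customize their tools to be more efficient.
8. Databases: Python provides interfaces to all major commercial databases.
9. GUI programming: Python supports GUI applications that can be created and ported to many systems.
10. Scalable: compared with shell scripts, Python provides better structure and supports large programs.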

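Setting up the Python environment

Type the "python" command in a terminal window to check whether Python is already installed locally and which version it is.

Downloading Python

The latest Python source code, binaries, documentation and news are available on the official Python website:
Official Python website: http://www.python.org/
The Python documentation can be downloaded from the following link in HTML, PDF and PostScript formats.
Python documentation download: www.python.org/doc/

Installing Python

Installing Python on Unix & Linux:

Download and unpack the archive.

If you need to customize some options, edit Modules/Setup.

Run the ./configure script

make

make install

After these steps, Python is installed in /usr/local/bin and the Python libraries in /usr/local/lib/pythonXX, where XX is the version number of the Python you are using.

Installing Python on Windows:

After downloading, double-click the package to start the Python installer. Installation is simple: keep the default settings and click "Next" until it finishes.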

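Configuring environment variables

Setting environment variables on Unix/Linux

export PATH="$PATH:/usr/local/bin/python"   ## /usr/local/bin/python is the Python installation directory

Setting environment variables on Windows:

## In the command prompt (cmd), type:
path %path%;C:\Python 
## C:\Python is the Python installation directory

Important Python environment variables:

Errors when printing Chinese text in Python

The fix is to add # -*- coding: UTF-8 -*- or #coding=utf-8 at the top of the file.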

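Basic Python syntax

Lines and indentation

The biggest difference between Python and other languages is that Python does not use braces ({}) to delimit blocks for classes, functions and other control logic. Python's most distinctive feature is that blocks are written with indentation.
The amount of indentation is up to you, but every statement in a block must be indented by the same amount, and this is strictly enforced.
The error IndentationError: unexpected indent is the Python interpreter telling you "Hi, buddy, the formatting of your file is wrong; tabs and spaces may be misaligned", because Python is very strict about formatting.
The error IndentationError: unindent does not match any outer indentation level means that your indentation is inconsistent, with some lines indented by tabs and others by spaces; make them consistent.
Therefore, every line in a Python code block must use the same amount of leading indentation.
Use a single tab, two spaces or four spaces for each indentation level, and never mix them.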

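Multi-line statements

A Python statement normally ends at the end of the line, but a backslash (\) lets you split one statement over several lines. Statements that contain [], {} or () brackets do not need the line-continuation character.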
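
A short sketch of both forms of line continuation, written for Python 3 (the variable names total and days are only illustrative):

```python
# Explicit continuation: the backslash must be the last character on the line.
total = 1 + \
        2 + \
        3

# Implicit continuation: anything inside [], {} or () may span several lines.
days = [
    "Monday", "Tuesday",
    "Wednesday",
]

print(total, len(days))  # 6 3
```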

Python comments

Single-line comments in Python start with #; multi-line comments use three single quotes (''') or three double quotes (""").

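Python variables

Python has five standard data types:

Numbers

String

List

Tuple

Dictionary

Python numbers (Number)

Python supports four different numeric types:

int (signed integers)

long (long integers, which can also be written in octal and hexadecimal)

float (floating-point numbers)

complex (complex numbers)

Python math functions:

Python strings (String)

A string is a sequence of characters made up of digits, letters and underscores.
Strings are delimited by ''.

Python strings and lists can be indexed in two directions:

From left to right, indexes start at 0 and go up to the length of the string minus 1

From right to left, indexes start at -1 and run back to the start of the string

To get a substring, use variable[start:end]. Indexes count from 0 and may be positive or negative; an omitted index means "to the start" or "to the end".
The plus sign (+) is the string concatenation operator and the asterisk (*) is the repetition operator.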
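
A minimal sketch of indexing, slicing, concatenation and repetition, written for Python 3 (the sample string is arbitrary):

```python
s = "Hello World"

print(s[0])      # H      first character, counting from the left
print(s[-1])     # d      first character, counting from the right
print(s[2:5])    # llo    indexes 2, 3 and 4; the end index is excluded
print(s[:5])     # Hello  omitted start means "from the beginning"
print(s[6:])     # World  omitted end means "to the end"
print(s[-5:-2])  # Wor    negative indexes work in slices too
print(s[:5] + "!" * 3)   # Hello!!!  + concatenates, * repeats
```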
Python string operators:

Python built-in string methods:

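No.	Method	Description
1	string.capitalize()	Capitalizes the first character of the string
2	string.center(width)	Returns a new string of length width with the original string centered and padded with spaces
3	string.count(str, beg=0, end=len(string))	Returns the number of times str occurs in string; if beg or end is given, counts only within that range
4	string.decode(encoding='UTF-8', errors='strict')	Decodes string using the codec named by encoding; on error raises ValueError by default, unless errors is 'ignore' or 'replace'
5	string.encode(encoding='UTF-8', errors='strict')	Encodes string using the codec named by encoding; on error raises ValueError by default, unless errors is 'ignore' or 'replace'
6	string.endswith(obj, beg=0, end=len(string))	Checks whether the string ends with obj; if beg or end is given, checks only within that range. Returns True if so, otherwise False.
7	string.expandtabs(tabsize=8)	Converts tab characters in string to spaces; the default tabsize is 8.
8	string.join(seq)	Joins all elements of seq (which must be strings) into a new string, using string as the separator
9	string.ljust(width)	Returns a new string of length width with the original string left-aligned and padded with spaces
10	string.lower()	Converts all uppercase characters in string to lowercase
11	string.lstrip()	Strips whitespace from the left of string
12	string.maketrans(intab, outtab)	Creates a character translation table. In the simplest two-argument form, the first argument is a string of the characters to convert and the second is a string of the corresponding target characters.
13	string.replace(str1, str2, num=string.count(str1))	Replaces str1 with str2 in string; if num is given, replaces at most num occurrences
14	string.rstrip()	Strips whitespace from the end of string
15	string.split(str="", num=string.count(str))	Splits string using str as the separator; if num is given, performs at most num splits
16	string.strip([obj])	Runs lstrip() and rstrip() on string
17	string.title()	Returns a "titlecased" version of string, in which every word starts with an uppercase letter and the remaining letters are lowercase (see istitle())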

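Python lists (List)

Lists are delimited by [ ].
Lists can also be sliced with variable[start:end] to extract a sub-list. From left to right indexes start at 0, from right to left they start at -1, and an omitted index means "to the start" or "to the end".
The plus sign (+) is the list concatenation operator and the asterisk (*) is the repetition operator.

Accessing values in a list

Use an index to access a value in a list; you can also slice a list with square brackets.

Updating a list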

>>> list = ['physics', 'chemistry', 1997, 2000]
>>> print list[2]
1997
>>> list[2] = 2001
>>> print list[2]
2001
>>> print list
['physics', 'chemistry', 2001, 2000]

Deleting list elements

Use the del statement to delete an element of a list.

>>> del list[2]
>>> print list
['physics', 'chemistry', 2000]

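Python list functions

No.	Function	Purpose
1	cmp(list1, list2)	Compares the elements of two lists
2	len(list)	Returns the number of elements in the list
3	max(list)	Returns the largest element of the list
4	min(list)	Returns the smallest element of the list
5	sum(list)	Returns the sum of the list elements
6	list(seq)	Converts a tuple to a list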
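
A short sketch of these functions on made-up data, written in Python 3 syntax. cmp() is left out because it exists only in Python 2.

```python
numbers = [3, 1, 4, 1, 5]          # made-up sample data

count = len(numbers)               # number of elements
largest = max(numbers)             # largest element
smallest = min(numbers)            # smallest element
total = sum(numbers)               # sum of all elements
as_list = list((1, 2, 3))          # tuple -> list

print(count, largest, smallest, total, as_list)
```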

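Python list methods

No.	Method	Purpose
1	list.append(obj)	Appends a new object to the end of the list
2	list.count(obj)	Counts how many times an element occurs in the list
3	list.extend(seq)	Appends all values from another sequence to the end of the list in one step (extends the original list with the new one)
4	list.index(obj)	Returns the index of the first item in the list that matches a value
5	list.insert(index, obj)	Inserts obj into the list at position index
6	list.pop([index])	Removes one element from the list (by default the last one) and returns its value
7	list.remove(obj)	Removes the first item in the list that matches a value
8	list.reverse()	Reverses the elements of the list in place
9	list.sort([func])	Sorts the list in place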
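
The methods above all change the list in place, which the following sketch traces step by step (Python 3 syntax, made-up sample data):

```python
items = ["b", "a"]                 # made-up sample list

items.append("c")                  # ["b", "a", "c"]
items.extend(["a", "d"])           # ["b", "a", "c", "a", "d"]
items.insert(0, "z")               # ["z", "b", "a", "c", "a", "d"]
n_a = items.count("a")             # how many times "a" occurs
pos_a = items.index("a")           # index of the first "a"
popped = items.pop()               # removes and returns the last element
items.remove("a")                  # removes only the first "a"
items.reverse()                    # reverses in place
items.sort()                       # sorts in place

print(n_a, pos_a, popped, items)
```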

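Python tuples

A tuple is another data type, similar to a List.
Tuples are delimited with "()".
The elements are separated by commas. However, an element cannot be reassigned, so a tuple behaves like a read-only list.

Built-in tuple functions

tuple(seq): converts a list to a tuple.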
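
The read-only behaviour can be checked with a small experiment. This is a sketch in Python 3 syntax with made-up values:

```python
point = (1, 2, 3)                  # made-up sample tuple

try:
    point[0] = 10                  # tuples do not support item assignment
    outcome = "assigned"
except TypeError:
    outcome = "TypeError"

from_list = tuple(["x", "y"])      # list -> tuple
print(outcome, point, from_list)
```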

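Python dictionaries

Apart from the list, the dictionary is the most flexible built-in data structure in Python. A list is an ordered collection of objects, while a dictionary is an unordered collection of objects.
The difference between the two is that the elements of a dictionary are accessed by key rather than by offset.
Dictionaries are delimited with "{ }". A dictionary consists of keys and their corresponding values.
adict[key] returns the value that belongs to key; if key is not in the dictionary, a KeyError is raised.

Dictionary usage example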

>>> code = {"GLY" : "G", "ALA" : "A", "LEU" : "L", "ILE" : "I",
... "ARG" : "R", "LYS" : "K", "MET" : "M", "CYS" : "C"}
>>> code['GLY']
'G'
>>> code.keys()
>>> code.values()
>>> code.items()
>>> del code['CYS']
>>> code.update({'CYS':'C', 'MET':'M'})
>>> one2three = {}
>>> for key,val in code.items():
...     one2three[val] = key

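Built-in dictionary functions and methods

No.	Method	Purpose
1	radiansdict.clear()	Removes all elements from the dictionary
2	radiansdict.copy()	Returns a shallow copy of the dictionary
3	radiansdict.fromkeys(seq[, val])	Creates a new dictionary whose keys are the elements of the sequence seq, with val as the initial value for every key
4	radiansdict.get(key, default=None)	Returns the value for the given key, or default if the key is not in the dictionary
5	radiansdict.items()	Returns a list of (key, value) tuples that can be iterated over
6	radiansdict.keys()	Returns a list of all keys in the dictionary
7	radiansdict.update(dict2)	Adds the key/value pairs of dict2 to radiansdict
8	radiansdict.values()	Returns a list of all values in the dictionary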
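
A minimal sketch of these methods, reusing the amino acid code idea from the example above (Python 3 syntax, where items() returns a view rather than a list, so it is sorted here to get a stable list):

```python
code = {"GLY": "G", "ALA": "A"}            # three-letter -> one-letter codes

missing = code.get("VAL", "?")             # key absent, so the default is returned
snapshot = code.copy()                     # shallow copy, unaffected by later updates
code.update({"VAL": "V"})                  # add or overwrite key/value pairs
blank = dict.fromkeys(["A", "G"], 0)       # new dict with every key set to 0
pairs = sorted(code.items())               # (key, value) tuples, sorted for stable output

print(missing, snapshot, blank, pairs)
code.clear()                               # remove every entry
```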

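Python data type conversion

Sometimes a value has to be converted from one built-in type to another. To convert it, simply use the name of the target type as a function.
The following built-in functions convert between data types. Each returns a new object that represents the converted value.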
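
For instance, the following sketch (Python 3 syntax, made-up values) uses type names as conversion functions:

```python
as_int = int("42")                 # str -> int
as_float = float("3.5")            # str -> float
as_str = str(2016)                 # int -> str
as_list = list("abc")              # str -> list of characters
as_tuple = tuple([1, 2])           # list -> tuple
as_dict = dict([("a", 1)])         # sequence of (key, value) pairs -> dict
from_hex = int("ff", 16)           # hexadecimal text -> int

print(as_int, as_float, as_str, as_list, as_tuple, as_dict, from_hex)
```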

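Python operators

Python supports the following types of operators:

Arithmetic operators: +, -, *, /, %, ** (power: x**y returns x raised to the power y), // (floor division: returns the integer part of the quotient)

Comparison (relational) operators: ==, !=, <> (not equal), >, <, >=, <=

Assignment operators: =, -= (subtract and assign), += (add and assign), *=, /=, %=, **=, //=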
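
A few of these operators in action, as a sketch in Python 3 syntax. The / operator is avoided here because its result for integers differs between Python 2 and Python 3, and <> is not used because Python 3 no longer accepts it.

```python
power = 2 ** 10            # 2 to the power 10
floor_div = 7 // 2         # integer part of the quotient
remainder = 7 % 2          # remainder of the division
not_equal = (3 != 4)       # comparison yields a boolean

x = 10
x += 5                     # 15
x -= 3                     # 12
x *= 2                     # 24
x //= 5                    # 4
x **= 2                    # 16
x %= 7                     # 2

print(power, floor_div, remainder, not_equal, x)
```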

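Python conditional statements

if condition_1:
    statements_1 ...
elif condition_2:
    statements_2 ...
elif condition_3:
    statements_3 ...
else:
    statements_4 ...

When a "condition" holds (is non-zero), the statements that follow it are executed. The body may span several lines; indentation marks which lines belong to the same block.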
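
A concrete instance of the template above might look like this. It is only a sketch in Python 3 syntax; the variable names and thresholds are made up.

```python
length = 500                       # made-up sequence length

if length < 100:
    label = "short"
elif length < 1000:
    label = "medium"               # first condition that holds wins
elif length < 10000:
    label = "long"
else:
    label = "very long"

print(length, "is", label)
```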

Python loop statements

Python provides for loops and while loops.

Python while loop

#!/usr/bin/python

count = 0
while (count < 9):
   print 'The count is:', count
   count = count + 1

print "Good bye!"
Output:
The count is: 0
The count is: 1
The count is: 2
The count is: 3
The count is: 4
The count is: 5
The count is: 6
The count is: 7
The count is: 8
Good bye!

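Python for loop

#!/usr/bin/python
# -*- coding: UTF-8 -*-

for num in range(10,20):  # iterate over the numbers from 10 to 19
   for i in range(2,num): # try each candidate factor
      if num%i == 0:      # found the first factor
         j=num/i          # compute the second factor
         print '%d equals %d * %d' % (num,i,j)
         break            # leave the inner loop
   else:                  # else clause of the loop: runs when no break occurred
      print num, 'is a prime number'
Output of the example above:

10 equals 2 * 5
11 is a prime number
12 equals 2 * 6
13 is a prime number
14 equals 2 * 7
15 equals 3 * 5
16 equals 2 * 8
17 is a prime number
18 equals 2 * 9
19 is a prime number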

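Python functions

Defining a function

You can define functions that do whatever you need. The basic rules are:

A function block begins with the keyword def, followed by the function name and parentheses ().

Any input parameters and arguments must be placed inside the parentheses. Parameters are defined between the parentheses.

The first statement of a function can optionally be a documentation string (docstring) that describes the function.

The function body starts after a colon and is indented.

return [expression] ends the function and optionally passes a value back to the caller. A return without an expression is the same as returning None.

Syntax

def functionname( parameters ):
   "function_docstring"
   function_suite
   return [expression]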

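Example

#!/usr/bin/python
# -*- coding: UTF-8 -*-

total = 0 # this is a global variable
# function definition
def sum( arg1, arg2 ):
   "Return the sum of the two arguments."
   total = arg1 + arg2 # here total is a local variable
   print "Inside the function, local total : ", total
   return total  # return [expression] exits the function and optionally passes a value back to the caller; a bare return returns None

# call the sum function
sum( 10, 20 )
print "Outside the function, global total : ", total
Output of the example above:

Inside the function, local total :  30
Outside the function, global total :  0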

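Python modules

Simply put, a module is a file that contains Python code. A module can define functions, classes, and variables, and it can also contain executable code. For installing modules, see "Installing Python modules without root privileges (easy_install and pip)".

The import statement

To use a Python source file, simply execute an import statement in another source file.

The from...import statement

Python's from statement lets you import a specific part of a module into the current namespace.

The from...import * statement

It is also possible to import everything in a module into the current namespace.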
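
The import forms described above can be compared side by side. The sketch below uses the standard math module as a stand-in for your own module and is written in Python 3 syntax.

```python
import math                        # bind the module name; members are reached as math.<name>
from math import sqrt, pi          # bring selected names into the current namespace
# from math import *  would import every public name at once; it is left as a
# comment here because star imports make it hard to see where a name comes from.

root_a = math.sqrt(16)             # via the module name
root_b = sqrt(16)                  # via the directly imported name
area = pi * 2 ** 2                 # pi was imported by name

print(root_a, root_b, area)
```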

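Python packages

A package is a hierarchical directory structure that defines a Python application environment made up of modules, subpackages, sub-subpackages, and so on.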

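Python file I/O

Reading keyboard input

Python provides two built-in functions that read a line of text from standard input, which by default is the keyboard:

raw_input: raw_input([prompt]) reads one line from standard input and returns it as a string (with the trailing newline removed).

input: input([prompt]) is largely interchangeable with raw_input([prompt]), but input assumes that what you type is a valid Python expression and returns the result of evaluating it.

The open() function

You must first open a file with Python's built-in open() function, which creates a file object; the object's methods can then be used to read from and write to it.

Syntax:


file object = open(file_name [, access_mode][, buffering])

The parameters are as follows:

file_name: a string containing the name of the file you want to access.

access_mode: determines the mode in which the file is opened: read-only, write, append, and so on. This parameter is optional; the default file access mode is read-only (r).

buffering: if buffering is set to 0, no buffering takes place. If it is 1, the file is line-buffered. If it is an integer greater than 1, that value is used as the buffer size. If it is negative, the system default buffer size is used.

The close() method

Syntax:


fileObject.close()

The write() method

The write() method does not append a newline ('\n') to the end of the string.
Note: the data passed to write() must be a string.

Syntax:


fileObject.write(string)

The read() method

The read() method reads a string from an open file.


fileObject.read([count])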
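
Putting open(), write(), read(), and close() together gives a round trip like the one below. It is a minimal sketch in Python 3 syntax; the file name retro_demo.txt is hypothetical and the file is created in a temporary directory so nothing else is touched.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "retro_demo.txt")

fo = open(path, "w")               # "w": open for writing
fo.write("first line\n")           # write() adds no newline, so include it yourself
fo.write("second line\n")
fo.close()                         # flush buffered data and release the file

fo = open(path, "r")               # "r": read-only, the default mode
head = fo.read(5)                  # read at most 5 characters
rest = fo.read()                   # read everything that is left
fo.close()

os.remove(path)                    # clean up the scratch file
os.rmdir(workdir)                  # and its directory
print(repr(head), repr(rest))
```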

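Python regular expressions

Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives Python full regular expression functionality.
The compile function builds a regular expression object from a pattern string and optional flags. The object has a set of methods for regular expression matching and substitution.
The re module also provides functions that do exactly the same as these methods; they take the pattern string as their first argument.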
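
The relationship between a compiled pattern object and the module-level functions can be sketched as follows (Python 3 syntax; the pattern and the sample text are made up for illustration):

```python
import re

pattern = re.compile(r"(\d+)\s*bp", re.IGNORECASE)   # pattern string + optional flag
text = "LTR: 350 bp, gag: 1500 BP"                   # made-up sample text

first = pattern.search(text)                 # first match anywhere in the string
all_sizes = pattern.findall(text)            # captured group of every match
renamed = pattern.sub(r"\1 nt", text)        # replace each match, reusing group 1
same = re.findall(r"(\d+)\s*bp", text, re.IGNORECASE)  # module-level equivalent

print(first.group(1), all_sizes, renamed, same)
```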
For details, see

Notes from my own use

Two effective ways to replace text in a Python string:

Using the string's own method
a = 'hello word'
a.replace('word','python')
Using a regular expression
import re
strinfo = re.compile('word')
b = strinfo.sub('python',a)
print b

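Errors and exceptions

TypeError: 'str' object does not support item assignment
AttributeError: 'str' object has no attribute 'append'
Cause: a list operation was applied to a str
Fix: convert the data type
Converting between list and str: str.split()
This built-in method converts a str into a list; its str="" argument is the separator.
join can be regarded as the inverse of split

>>> name=['Albert', 'Ainstain']
>>> "".join(name)
'AlbertAinstain'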
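
The error and its fix can be reproduced in a few lines. This is a sketch in Python 3 syntax with made-up strings: the str is converted to a list, edited, and joined back.

```python
word = "hello"

try:
    word[0] = "J"                   # str is immutable, so this raises TypeError
    error = None
except TypeError as exc:
    error = type(exc).__name__

chars = list(word)                  # str -> list of characters
chars[0] = "J"                      # lists support item assignment
chars.append("!")                   # and append
fixed = "".join(chars)              # list -> str

parts = "Albert Einstein".split(" ")   # str -> list, splitting on a space
back = " ".join(parts)                 # join reverses the split

print(error, fixed, parts, back)
```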

Contribution from:
http://m.runoob.com/python/
http://www.ynpxrz.com/n781659c2025.aspx

tiramisutes

hope bioinformatics blog
