Wednesday, November 9, 2011

Wednesday, November 2, 2011

Subsetting lists in Python and R

Here are two ways to keep only the elements of a list that start with "1/". I'm sure these are not the only two approaches:


Python:

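a = ["1/fish","1/cat","1/dog","2/mouse"]
#method 1: compare the first two characters
list(filter(lambda w: w[0:2] == '1/', a))  # list() is needed on Python 3, where filter() is lazy
#method 2: regular expression anchored at the start of the string
import re
list(filter(lambda w: re.search('^1/', w), a))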
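
A third option, offered only as a sketch, is a list comprehension with str.startswith. It behaves the same on Python 2 and Python 3 and avoids hard-coding the slice length. The name `ones` is just an illustrative choice:

```python
a = ["1/fish", "1/cat", "1/dog", "2/mouse"]

# method 3: keep the elements whose text begins with "1/"
ones = [w for w in a if w.startswith("1/")]
print(ones)
```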

R:

a = c("1/fish","1/cat","1/dog","2/mouse")
#method 1 (substr is already vectorized, so no sapply is needed)
a[substr(a, 1, 2) == "1/"]
#method 2
grep("^1/", a, value=TRUE)

Tuesday, November 1, 2011

Nifty R trick..

Suppose you have something like this:

[1] "contig00001_1_1 # 686 # 1906 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp"
[2] "contig00001_1_2 # 1921 # 3102 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"
[3] "contig00001_1_3 # 3260 # 4159 # 1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"
[4] "contig00001_1_4 # 4594 # 5715 # 1 # ID=1_4;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp"
[5] "contig00001_1_5 # 5731 # 6051 # 1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp"
[6] "contig00001_1_6 # 6051 # 7484 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"

You want to split it on the "#" character, and then pull out the 1st, 2nd, and 3rd columns. In R, '[' is the ordinary indexing operator, and it can be called like any other function, so you can hand it to sapply and do this in one line:

details = t(sapply(sapply(names(seqs), strsplit, "#"), '[', c(1,2,3)))

(Thanks to Matt for the '[' trick!)
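
For comparison, here is a rough Python sketch of the same idea: split each header on "#" and keep the first three fields. The helper name `first_fields` and the two-line sample list are assumptions made for illustration. Unlike the R one-liner, this version also strips the spaces around each field.

```python
# Two of the header lines shown above, used as sample input.
headers = [
    "contig00001_1_1 # 686 # 1906 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp",
    "contig00001_1_2 # 1921 # 3102 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp",
]


def first_fields(header, n=3):
    """Split on '#' and return the first n fields with whitespace stripped."""
    return [field.strip() for field in header.split("#")[:n]]


# One row per header: [name, start, end], all still strings.
details = [first_fields(h) for h in headers]
for row in details:
    print(row)
```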

Thursday, October 20, 2011

Bacterial vs Eukaryotic genomes

Bacterial: work hard for a day --> get exciting results
Eukaryotic: work hard for a day --> wait 3 days to see if it works