Next Gen Sequence Analysis: Nifty R trick..

Suppose you have a character vector of sequence names like this:

[1] "contig00001_1_1 # 686 # 1906 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp"
[2] "contig00001_1_2 # 1921 # 3102 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"
[3] "contig00001_1_3 # 3260 # 4159 # 1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"
[4] "contig00001_1_4 # 4594 # 5715 # 1 # ID=1_4;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp"
[5] "contig00001_1_5 # 5731 # 6051 # 1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp"
[6] "contig00001_1_6 # 6051 # 7484 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp"

You want to split each string on the "#" character, and then pull out the 1st, 2nd, and 3rd fields. The subsetting operator '[' is itself a function in R, so you can hand it to sapply and do this in one line:

details = t(sapply(sapply(names(seqs), strsplit, "#"), '[', c(1,2,3)))

(Thanks to Matt for the '[' trick!)
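To see what the one-liner is doing, here is a small step-by-step sketch. The `headers` vector is a made-up stand-in for names(seqs), shortened from the lines above, and the whitespace trimming at the end is my own addition rather than part of the original trick.

```r
# Hypothetical stand-in for names(seqs): three shortened headers.
headers <- c(
  "contig00001_1_1 # 686 # 1906 # -1 # ID=1_1;partial=00",
  "contig00001_1_2 # 1921 # 3102 # -1 # ID=1_2;partial=00",
  "contig00001_1_3 # 3260 # 4159 # 1 # ID=1_3;partial=00"
)

# strsplit() is vectorised, so it returns a list holding one
# character vector of fields per header.
fields <- strsplit(headers, "#", fixed = TRUE)

# `[` is an ordinary function: sapply() calls `[`(x, 1:3) on each element.
# sapply() builds one column per header, so t() turns that into one row each.
details <- t(sapply(fields, `[`, 1:3))

# Splitting on "#" alone leaves the surrounding spaces in the fields.
# Assigning into details[] trims them while keeping the matrix shape.
details[] <- trimws(details)

print(details)
```

Each row of `details` then holds the contig name, start, and end as strings; as.integer() on the 2nd and 3rd columns would give numeric coordinates if you need them.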

Tuesday, November 1, 2011