Part 3: Building the morloc environment

by Zebulun Arendsee

The morloc alternative

In the morloc library, all file support will be removed. The idea is that the tRNA prediction library should only predict tRNA. It should not have to support FASTA or GenBank parsing. Functions in general-purpose libraries can read from and write to storage formats, if necessary.

What do we want to return?

predictTrna :: Str -> ???
record TRNA = TRNA
  { anticodon :: Int   -- the offset of the start of the anticodon
  , energy    :: Real  -- the thermodynamic energy of the tRNA structure as
                       -- calculated by ARAGORN
  -----
  , astem1    :: Int 
  , spacer1   :: Int 
  , dstem     :: Int 
  , dloop     :: Int 
  , spacer2   :: Int 
  , cstem     :: Int 
  , cloop     :: Int 
  , intron    :: Int  -- position of the intron start
  , nintron   :: Int  -- length of the intron
  , var       :: Int 
  , tstem     :: Int 
  , tloop     :: Int 
  , astem2    :: Int 
  }

The TRNA type is “affine”. That is, it describes the pattern of a tRNA independently of its origin. It contains no information about the species of origin, so it stores only the location of the anticodon and not the amino acid it codes for, since that may vary with the genetic code. It contains no genomic start coordinate and no strand information.
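To illustrate why the amino acid is left out, here is a minimal sketch in C of how a caller could recover it later from the anticodon offset plus a genetic code of their choosing. The function name `amino_acid_for` is hypothetical, and the sketch assumes the offset is 0-based into the same sequence string that was searched. It takes the plain Watson-Crick reverse complement of the anticodon as the codon, ignoring wobble pairing and base modifications. The two tables are NCBI translation tables 1 and 2 in TCAG order.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial),
 * indexed by codon in TCAG order: index = 16*b1 + 4*b2 + b3. */
static const char STANDARD_CODE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
static const char VERT_MITO_CODE[] =
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG";

/* Map a base to its TCAG index, or -1 for anything ambiguous. */
static int base_index(char b) {
    switch (tolower((unsigned char)b)) {
        case 't': case 'u': return 0;
        case 'c': return 1;
        case 'a': return 2;
        case 'g': return 3;
        default:  return -1;
    }
}

/* Hypothetical helper: return the amino acid for the codon that is the
 * reverse complement of the 3-base anticodon starting at `offset`
 * (assumed 0-based) in `seq`, under the 64-letter table `code`.
 * Returns '?' if the offset is out of range or a base is ambiguous. */
char amino_acid_for(const char *seq, size_t offset, const char *code) {
    static const int comp[4] = {2, 3, 0, 1}; /* t<->a, c<->g in TCAG indices */
    size_t n = strlen(seq);
    if (n < 3 || offset > n - 3) return '?';
    int idx = 0;
    /* Walk the anticodon backwards so the codon is read 5' to 3'. */
    for (int i = 2; i >= 0; i--) {
        int b = base_index(seq[offset + i]);
        if (b < 0) return '?';
        idx = idx * 4 + comp[b];
    }
    return code[idx];
}
```

The same anticodon gives different answers under different codes: an anticodon `tat` pairs with codon `ata`, which is isoleucine in the standard code but methionine in the vertebrate mitochondrial code. Keeping the amino acid out of TRNA leaves that choice to the caller.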

The TRNA type is also minimal. It contains no information that can be derived without rerunning the ARAGORN algorithm. Conspicuously missing is the actual tRNA sequence. This could be included without loss of the affine property. However, this would require performing work, at a computational and memory expense, that the user may not want and can easily do later.

The TRNA type is also fixed in size. Every field in the structure is of fixed width (ints and doubles, in C). Thus the record has a fixed size and can be stored very efficiently without having to look up variable-length values like strings.
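As a rough sketch of what the fixed size buys, here is one way the record might be mirrored in C. The struct layout and the helpers `trna_pack` and `trna_at` are assumptions for illustration, not ARAGORN's actual internal structure. Because the struct holds no pointers, an array of records can be copied into a flat buffer in one step, and record i sits at byte offset i * sizeof(TRNA).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C mirror of the TRNA record: 14 ints and one double,
 * with no pointers or variable-length members. */
typedef struct {
    int    anticodon;  /* offset of the start of the anticodon */
    double energy;     /* energy as calculated by ARAGORN */
    int    astem1, spacer1, dstem, dloop, spacer2, cstem, cloop;
    int    intron;     /* position of the intron start */
    int    nintron;    /* length of the intron */
    int    var, tstem, tloop, astem2;
} TRNA;

/* The exact size depends on the platform's padding rules, but it is
 * known at compile time and is the same for every record. */
_Static_assert(sizeof(TRNA) >= 14 * sizeof(int) + sizeof(double),
               "TRNA must hold all of its fields");

/* Copy n records into a flat byte buffer of at least n * sizeof(TRNA)
 * bytes. Returns the number of bytes written. */
size_t trna_pack(const TRNA *recs, size_t n, unsigned char *buf) {
    memcpy(buf, recs, n * sizeof(TRNA));
    return n * sizeof(TRNA);
}

/* Read record i back out of the buffer. No index or length table is
 * needed, since the record's position is simply i * sizeof(TRNA). */
TRNA trna_at(const unsigned char *buf, size_t i) {
    TRNA t;
    memcpy(&t, buf + i * sizeof(TRNA), sizeof(TRNA));
    return t;
}
```

A record containing a string, such as the tRNA sequence, would lose this property: each entry would need either a pointer or a length prefix, and random access would require an index.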

The goal is to develop a minimal type that ARAGORN can emit that is independent of anything else in the wider ecosystem. I do not, for example, want to import some object-oriented bio library and make TRNA a subclass of some generic Feature object. ARAGORN needs to be timeless and independent.

                            cca   3' amino-acyl acceptor
                    5'    a       NCCA is the canonical motif
                       g-c
                       c-g
                       g-c  A-stem
                       g-c
                       a-t
                       g-c
             spacer1   t-a     ta
                      t   cgccc  a
              tga    a    !!!!!  a   T-loop
             c   tttg     gcggg  c
     D-loop  t   :+!!    c     tt
             g   tgac     c
              gta    g     g
                      t-aag   var
            spacer2   c-g
                      a-t
                      g-c
                      c-g
                     t   a
                     t   a   C-loop (anticodon stem-loop)
                      ccc
                      \ /
                       anticodon

morloc implementation

However, it segfaults when the flanks around the tRNA are too short.

morloc ecosystem

Making a primordial rna package.

  • searching both strands
  • circular search
  • codon translation
  • mapping
  • filtering
  • parallelism
  • visualization
built on 2024-08-12 11:47:46.22895832 UTC from file 2024-08-31-aragorn-3