Transcriptome Reconstitution Practical
Now that you understand how transcript information is generated for each locus, this practical shows you one way of visualising transcript data using the Acedb system. Background information about Acedb can be found here, but for the purposes of this practical you need to know only the following:
- Acedb is an object-based database system that can be used to store and display mapping and sequence information for a given genome. A version for Windows is installed on this machine (WinAce).
- Data to be read into the database must be in ‘ace’ format (see below).
- Objects are displayed using a method, described in a methods.ace file.
- The database has a defined directory structure: a single parent directory (usually with the same name as the database), and two subdirectories named ‘database’ (containing a binary form of the data) and ‘wspec’ (containing configuration files). For convenience, ace-formatted files can be stored in a subdirectory named ‘rawdata’.
The goals of this practical are:
- to write Perl scripts to convert the supplied transcript data for HTR004283 to ace format. Three types of data are supplied: genomic sequence containing the locus; alignment maps for transcripts from the locus (experimental and Tromer-defined); and 3’Tags that map to the locus.
- to create an Acedb database of transcript information for HTR004283
- to display this locus using the graphics features of Acedb
A SOLUTION is here!
Guidance (feel free to ask us too)
Colour code: Blue = do this in UNIX; Green = do this in Windows
All required files can be found at:
/net/ludwig-sun1/export/mirror/sib-isrec-www/DEA/module8/B_Stevenson/Practicals/transcriptome_recon/.
Conversion of sim4 alignment data to ace format:
The alignment for NM_018845 looks like this in ace format. Blank lines separate individual blocks. There are three blocks in this example: one for the alignment, defining the exon-intron structure of the transcript; one for the LongText containing (part of) the original data; and one for the placement of the alignment on the genomic sequence. Together, these three blocks fully describe the relationship between the transcript and the genomic sequence. The general layout is ‘keyword tab data data …’. Most keywords are self-explanatory, but some deserve special mention. ‘Sim4’ provides the link between the alignment and LongText blocks. ‘Brief_identification’ provides a label for the transcript in the Acedb graphical display. ‘CDS’ is required for the “exon-intron”-type graphical display. ‘Method’ tells Acedb which method to use to display the transcript. In this practical there are five types of transcript, each with its own method, so take this into account in your Perl script.
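The three-block layout described above can be sketched roughly as follows. This is only an illustration of the logic, written in Python rather than the Perl the practical asks for, and it makes several assumptions that you should check against the supplied example and the models in wspec/: the class name ‘Sequence’, the tags ‘Source_Exons’ and ‘Subsequence’, exon coordinates given relative to the start of the transcript, and a forward-strand alignment. The function name, the method name ‘RefSeq’, the genomic object name and the coordinates are all hypothetical.

```python
def transcript_to_ace(name, genomic, exons, method, label, raw_alignment):
    """Return three blank-line-separated ace blocks for one aligned transcript.

    exons is a list of (start, end) positions on the genomic sequence
    (1-based, forward strand only in this sketch).
    """
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]

    # Block 1: the alignment itself (exon-intron structure).
    # Assumption: exon coordinates are relative to the transcript start,
    # so the first exon always begins at 1.
    alignment = [f'Sequence : "{name}"']
    alignment += [f"Source_Exons\t{s - start + 1} {e - start + 1}" for s, e in exons]
    alignment += [
        "CDS",                                # needed for the exon-intron display
        f'Method\t"{method}"',                # one method per transcript type
        f'Brief_identification\t"{label}"',   # label shown in the display
        f'Sim4\t"{name}"',                    # link to the LongText block
    ]

    # Block 2: (part of) the original sim4 output, stored as LongText.
    long_text = [
        f'LongText : "{name}"',
        raw_alignment.rstrip("\n"),
        "***LongTextEnd***",
    ]

    # Block 3: placement of the transcript on the genomic sequence.
    placement = [
        f'Sequence : "{genomic}"',
        f'Subsequence\t"{name}" {start} {end}',
    ]

    blocks = (alignment, long_text, placement)
    return "\n\n".join("\n".join(block) for block in blocks) + "\n"


# Hypothetical coordinates, not taken from the real NM_018845 alignment.
ace = transcript_to_ace(
    name="NM_018845",
    genomic="HTR004283_genomic",
    exons=[(2001, 2100), (3001, 3150)],
    method="RefSeq",
    label="NM_018845",
    raw_alignment="1-100  (2001-2100)   100% ->\n101-250  (3001-3150)   100%",
)
print(ace)
```

Choosing the method per transcript type then comes down to passing a different `method` value for each of the five types.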
Conversion of the genomic sequence to ace format:
An example of genomic sequence in ace format is here. No more to say…
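As a rough sketch of this conversion (again in Python for illustration, not the Perl script itself), the code below assumes the supplied genomic sequence is a single FASTA record and that the database models use a ‘Sequence’ object pointing to a ‘DNA’ object of the same name. Both are assumptions to check against the example; the function and object names are hypothetical.

```python
def fasta_to_ace(fasta_text, name):
    """Turn one FASTA record into a Sequence block plus a DNA block."""
    # Drop the header line(s) and join the remaining sequence lines.
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )
    # Re-wrap at 60 bases per line for readability.
    wrapped = "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60))
    return (
        f'Sequence : "{name}"\n'
        f'DNA\t"{name}"\n'
        "\n"
        f'DNA : "{name}"\n'
        f"{wrapped}\n"
    )


# Toy input; the real file is much longer.
example = fasta_to_ace(">demo\nACGTAC\nGTTA\n", "demo_genomic")
print(example)
```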
Conversion of 3’Tag data to ace format:
3’Tags can be thought of as single-exon transcripts, with their own method.
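Treating a 3’Tag as a single-exon transcript might look like the sketch below (Python for illustration only). It assumes the same ‘Sequence’, ‘Source_Exons’ and ‘Subsequence’ conventions as a transcript block, with one exon spanning the whole tag and no LongText block. The method name ‘ThreePrimeTag’, the tag name and the coordinates are hypothetical, so use whatever the supplied methods.ace defines.

```python
def tag_to_ace(tag, genomic, start, end, method="ThreePrimeTag"):
    """Return two ace blocks describing a 3'Tag placed on the genomic sequence."""
    length = end - start + 1

    # The tag itself: a single exon covering its full length.
    own = [
        f'Sequence : "{tag}"',
        f"Source_Exons\t1 {length}",
        "CDS",
        f'Method\t"{method}"',
    ]

    # Where the tag sits on the genomic sequence.
    placement = [
        f'Sequence : "{genomic}"',
        f'Subsequence\t"{tag}" {start} {end}',
    ]

    return "\n".join(own) + "\n\n" + "\n".join(placement) + "\n"


# Hypothetical tag name and coordinates.
tag_ace = tag_to_ace("Tag_1", "HTR004283_genomic", 5000, 5016)
print(tag_ace)
```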
Create Acedb directory structure:
Create a parent directory and a ‘database’ subdirectory. Copy wspec/ as a second subdirectory, and put your ace files in a ‘rawdata’ subdirectory. Note: you need write access to the new database, so replace “Your Windows login name” in wspec/passwd.wrm with your login name on the local Windows machine.
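These steps can equally be done by hand at the UNIX prompt. As one possible way of scripting them, here is a minimal Python sketch. The function name and its arguments are hypothetical, and it assumes the placeholder text appears in passwd.wrm exactly as quoted above.

```python
import shutil
from pathlib import Path

# Placeholder text to replace, as quoted in the practical.
PLACEHOLDER = "Your Windows login name"


def build_database_tree(parent, wspec_source, ace_files, login):
    """Create parent/, database/, wspec/ and rawdata/ and set the login name."""
    parent = Path(parent)

    # Fails if the directory already exists, to avoid overwriting a database.
    (parent / "database").mkdir(parents=True)

    # Configuration files, copied as supplied.
    shutil.copytree(wspec_source, parent / "wspec")

    # Keep the ace-formatted files alongside the database.
    rawdata = parent / "rawdata"
    rawdata.mkdir()
    for ace_file in ace_files:
        shutil.copy(ace_file, rawdata)

    # Grant yourself write access by editing wspec/passwd.wrm.
    passwd = parent / "wspec" / "passwd.wrm"
    passwd.write_text(passwd.read_text().replace(PLACEHOLDER, login))
    return parent
```

A call might look like `build_database_tree("HTR004283", "wspec", ["transcripts.ace"], "your_login")`, where the file name and login are placeholders for your own.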
Create a gzip/tar archive of the directory structure and transfer it by ftp to your local Windows hard disk.
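The archive is normally made with tar and gzip at the UNIX prompt. For completeness, the same thing can be sketched with Python's standard tarfile module; the function name is hypothetical, and the ftp transfer itself is not covered here.

```python
import tarfile
from pathlib import Path


def archive_database(parent):
    """Write <parent>.tar.gz next to the database directory and return its path."""
    parent = Path(parent)
    archive = parent.parent / (parent.name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the parent
        # directory, so it unpacks as HTR004283/database, HTR004283/wspec, ...
        tar.add(parent, arcname=parent.name)
    return archive
```

Writing the archive beside the directory, rather than inside it, avoids the archive trying to include itself.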
Import of ace format data into Acedb:
Gunzip and untar the archive (use WinZip). Launch WinAce and navigate to the location of the new database (select the folder and click OK). Select ‘Read .ace file’ from the Edit menu and follow the steps shown.
Alternatively, view the database in UNIX using xace; use Help or ask us.