Transcriptome Reconstitution Practical

 

Now that you understand how transcript information is generated for each locus, this practical will show you one way of visualising transcript data using the Acedb system. Background information about Acedb can be found here, but for the purposes of this practical you need to know only the following:

 

- Acedb is an object-based database system that can be used to store and display mapping and sequence information for a given genome. A version for Windows is installed on this machine (WinAce).

- Data to be read into the database must be in ‘ace’ format (see below).

- Objects are displayed using a method, described in a methods.ace file.

- The database has a defined directory structure: a single parent directory (usually with the same name as the database), and two subdirectories named ‘database’ (containing a binary form of the data) and ‘wspec’ (containing configuration files). For convenience, ace-formatted files can be stored in a subdirectory named ‘rawdata’.

 

The goals of this practical are:

 

- to write Perl scripts to convert the supplied transcript data for HTR004283 to ace format. Three types of data are supplied: genomic sequence containing the locus; alignment maps for transcripts from the locus (experimental and Tromer-defined); and 3’Tags that map to the locus.

- to create an Acedb database of transcript information for HTR004283

- to display this locus using the graphics features of Acedb

 

A SOLUTION is here!

 

Guidance (feel free to ask us too)

 

Colour code: Blue = do this in UNIX; Green = do this in Windows

 

All required files can be found at:

/net/ludwig-sun1/export/mirror/sib-isrec-www/DEA/module8/B_Stevenson/Practicals/transcriptome_recon/.

 

Conversion of sim4 alignment data to ace format:

The alignment for NM_018845 looks like this in ace format. Blank lines separate individual blocks. There are three blocks in this example:

- one for the alignment, defining the exon-intron structure of the transcript;

- one for the LongText, containing (part of) the original data;

- one for the placement of the alignment on the genomic sequence.

Together, these three blocks fully describe the relationship between the transcript and the genomic sequence. The general layout is ‘keyword tab data data …’. Most keywords are self-explanatory, but some deserve special mention:

- ‘Sim4’ provides the link between the alignment and LongText blocks.

- ‘Brief_identification’ provides a label for the transcript in the Acedb graphical display.

- ‘CDS’ is absolutely required for the “exon-intron”-type graphical display.

- ‘Method’ tells Acedb which method to use to display the transcript.

In this practical there are five types of transcript, each with its own method, so take this into account in your Perl script.
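
As a rough illustration of the ‘keyword tab data data …’ layout, the sketch below assembles the three blocks for one transcript. It is written in Python only to show the shape of the output; the practical itself asks for Perl, and the same structure carries over. Several details are assumptions and should be checked against the supplied example and the models in wspec/: the object header lines (`Sequence : "name"`, `LongText : "name"`), the `Source_Exons` and `Subsequence` tags, the `***LongTextEnd***` terminator, and exon coordinates counted from the start of the transcript. The function names, method name and coordinates are hypothetical.

```python
def ace_block(class_name, object_name, rows):
    """Return one ace block: a header line, then 'keyword<TAB>data<TAB>data' lines."""
    lines = [f'{class_name} : "{object_name}"']  # header syntax is an assumption
    for keyword, *data in rows:
        lines.append("\t".join([keyword, *map(str, data)]))
    return "\n".join(lines)


def transcript_to_ace(name, genomic, start, end, exons, method, label, sim4_text):
    """Build the alignment, LongText and placement blocks for one transcript.

    exons: (start, end) pairs relative to the transcript start (assumed convention).
    start, end: position of the transcript on the genomic sequence.
    """
    alignment = ace_block("Sequence", name, [
        ("Sim4", f'"{name}"'),                    # links to the LongText block
        ("Brief_identification", f'"{label}"'),   # label in the graphical display
        ("CDS",),                                 # needed for the exon-intron display
        ("Method", f'"{method}"'),                # one method per transcript type
        *[("Source_Exons", s, e) for s, e in exons],  # tag name is an assumption
    ])
    # LongText holds (part of) the original sim4 output; terminator is an assumption.
    long_text = f'LongText : "{name}"\n{sim4_text}\n***LongTextEnd***'
    placement = ace_block("Sequence", genomic, [
        ("Subsequence", f'"{name}"', start, end),  # tag name is an assumption
    ])
    # Blank lines separate the individual blocks.
    return "\n\n".join([alignment, long_text, placement]) + "\n"
```

A call such as `transcript_to_ace("NM_018845", "genomic", 1001, 1500, [(1, 100), (401, 500)], "RefSeq", "example label", "raw sim4 output")` returns the three blocks as one string ready to be written to a file in ‘rawdata’. Selecting the method name from the transcript type is the part your own script has to add.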

 

Conversion of the genomic sequence to ace format:

An example of genomic sequence in ace format is here. No more to say…
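
For orientation, a minimal sketch of this conversion follows, assuming the genomic sequence is supplied as FASTA. It is in Python for illustration only (the practical asks for Perl). The output layout, a `Sequence` object pointing to a `DNA` object of the same name, is an assumption based on common Acedb usage and should be compared with the supplied example; the function name and line width are hypothetical.

```python
def fasta_to_ace(fasta_text, object_name, width=60):
    """Convert a single-record FASTA string into Sequence and DNA ace blocks."""
    # Drop the '>' header line and join the remaining sequence lines.
    residues = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )
    # Re-wrap the sequence so the ace file stays readable.
    wrapped = "\n".join(
        residues[i:i + width] for i in range(0, len(residues), width)
    )
    return (
        f'Sequence : "{object_name}"\n'
        f'DNA\t"{object_name}"\n'      # assumed link from Sequence to DNA object
        "\n"                           # blank line separates the two blocks
        f'DNA : "{object_name}"\n'
        f"{wrapped}\n"
    )
```

The object name passed here must match the name used for the genomic sequence in the placement blocks of the transcripts, otherwise the transcripts will not be attached to it.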

 

Conversion of 3’Tag data to ace format:

3’Tags can be thought of as single exon transcripts, with their own method.
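
Treating a tag as a single-exon transcript could look like the sketch below (Python for illustration; the practical asks for Perl). The single exon spans the whole tag, so it runs from 1 to the tag length. The tag names `Source_Exons` and `Subsequence`, the header syntax, the default method name `Tag3prime`, and the inclusion of `CDS` for tags are all assumptions to be checked against the supplied examples and methods.ace.

```python
def tag_to_ace(tag_name, genomic_name, start, end, method="Tag3prime"):
    """Return ace blocks for a 3' tag treated as a single-exon transcript.

    start, end: position of the tag on the genomic sequence; start > end is
    taken to mean the reverse strand (assumed convention).
    """
    length = abs(end - start) + 1
    tag_block = "\n".join([
        f'Sequence : "{tag_name}"',
        f"Source_Exons\t1\t{length}",   # one exon covering the whole tag
        "CDS",                          # may be unnecessary for tags (assumption)
        f'Method\t"{method}"',          # hypothetical method name
    ])
    placement = "\n".join([
        f'Sequence : "{genomic_name}"',
        f'Subsequence\t"{tag_name}"\t{start}\t{end}',
    ])
    return tag_block + "\n\n" + placement + "\n"
```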

 

Create Acedb directory structure:

Create a parent directory with a ‘database’ subdirectory. Copy wspec/ into the parent directory as a second subdirectory, and put your ace files in a ‘rawdata’ subdirectory. Note: you need write access to the new database, so replace “Your Windows login name” in wspec/passwd.wrm with your login name on the local Windows machine.
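
These steps can also be scripted. The sketch below is one way to do it in Python (a few shell commands or a Perl script would do equally well); it assumes Python 3.8 or later, and the function name and arguments are hypothetical. Only the directory names, the passwd.wrm file and the placeholder text come from the description above.

```python
import shutil
from pathlib import Path

PLACEHOLDER = "Your Windows login name"  # text to replace in wspec/passwd.wrm


def create_acedb_tree(parent, wspec_source, ace_files, login_name):
    """Create parent/database, parent/wspec and parent/rawdata.

    wspec_source: path to the supplied wspec/ directory.
    ace_files: paths of the ace-formatted files to store in rawdata/.
    login_name: your login name on the local Windows machine.
    """
    parent = Path(parent)
    (parent / "database").mkdir(parents=True, exist_ok=True)  # left empty here
    rawdata = parent / "rawdata"
    rawdata.mkdir(exist_ok=True)

    # Copy the configuration files as the second subdirectory.
    shutil.copytree(wspec_source, parent / "wspec", dirs_exist_ok=True)

    for ace_file in ace_files:
        shutil.copy(ace_file, rawdata)

    # Grant yourself write access by inserting your login name.
    passwd = parent / "wspec" / "passwd.wrm"
    passwd.write_text(passwd.read_text().replace(PLACEHOLDER, login_name))
    return parent
```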

 

Create a gzip-compressed tar archive of the directory structure and transfer it by FTP to your local Windows hard disk.
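
The archive step can be sketched with Python's standard `tarfile` module, as an alternative to running tar and gzip by hand. The function name and the default archive name are hypothetical. The archive is written next to the parent directory rather than inside it, so that it does not end up containing itself.

```python
import tarfile
from pathlib import Path


def archive_database(parent, archive_path=None):
    """Write parent/ and everything below it to a .tar.gz archive."""
    parent = Path(parent)
    if archive_path is None:
        # Default: <parent name>.tar.gz alongside the parent directory.
        archive_path = parent.parent / (parent.name + ".tar.gz")
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the parent directory.
        tar.add(parent, arcname=parent.name)
    return archive_path
```

When transferring the archive by FTP, use binary mode so that the compressed file is not altered in transit.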

 

Import of ace format data into Acedb:

Gunzip and untar the archive (use WinZip). Launch WinAce, and navigate to the location of the new database (select the folder, and click OK). Select ‘Read .ace file’ from the Edit menu and follow the logical steps.


Alternatively, view the database in UNIX using xace (use Help or ask us).