NAME

parasight (version 7.4)


SYNOPSIS

 parasight -align alignment.table

This simply loads the file alignment.table containing either a table of tab-delimited alignments or miropeats standard output (see below).

 parasight -align AC002038.blast.parse -showseqqueryonly

This simply loads the file AC002038.blast.parse containing parsed BLAST data and displays the hits relative to the query sequence. This is the number one use for parasight in our lab.

 parasight -align AC002038.blast.parse -showseq AC002038.1: 
     -extra repeats -template bacblastview.pst 
     -options 'seq_color=>red, canvas_width=>1000'

This draws the blast output from a search with AC002038.1 formatted with the options contained in the template file bacblastview.pst. It uses -options to modify the screen width and sequence color.

 parasight -in saved.parasight -showseq AC002304:AC002035: 
     -arrangeseq sameline -template template.file 
     -options 'seq_color=>red,extra_arrow_on=>1'

This loads a previously saved parasight view (the files saved.parasight.psa, saved.parasight.pse and saved.parasight.pso), shows only 2 of the sequences (AC002034 and AC002035), arranges these two sequences on the same line, loads a template file of options to reformat the view, and modifies two options directly (it sets the sequence color and turns on arrows for the extra annotation).


DESCRIPTION

Parasight is a generalized pairwise alignment viewer originally developed for analyzing segmental duplications (or paralogy) within the human genome. It is designed to display the positions and relationships of pairwise alignments within sequence(s). It provides both interactive analysis and publication-quality PostScript output. Parasight can arrange and color alignments on the basis of any other included data such as size, percent similarity or even species designation. It can also display the position of any type of simple sequence annotation such as repeats and exons. Finally, it can graph numerical data in relation to the sequence, such as windows of percentage GC content. Parasight has been used to analyze output from programs such as BLAST, miropeats, and pip maker, from the scale of whole genomes (such as segmental duplications in the human genome) to the analysis of a single protein searched against a database of interest. If it is pairwise data, parasight can display the output.

Parasight functions on both Unix and Windows platforms. The program is written in Perl using the graphical Perl/Tk module. It was designed to be extremely flexible, and thus a price is paid in terms of speed as well as complexity (i.e. the large number of options). However, the numerous options make it more likely that parasight can do what you want it to do. Although not necessary for basic interactive use, an understanding of regular expressions and a familiarity with Perl are helpful in order to utilize the program more fully. Parasight and its options are fully accessible through the GUI, the command line, or loadable templates, making analyses flexible and automatable. Most users of parasight load their data into the program and then format the view interactively using the extensive options menu (and now templates). Programmers will be the most likely to use command line manipulation of internal options. Parasight has been used and tested extensively on both Linux and MS Windows. The Unix version is the most extensively tested and all options should be available. Windows lacks some of the more advanced options due to incompatibilities/inflexibilities of Bill Gates' operating system.

To extol parasight's strengths:

  1. Flexible: The RAM is the limit when it comes to loading data. Other than the most basic description of a pairwise alignment or extra sequence feature, parasight makes absolutely no assumptions about your data, allowing users to analyze what they are interested in analyzing. Technically parasight can't tell a bp apart from an inch or DNA from protein.

  2. Formattable: Parasight has a plethora of options; if I have ever needed it, parasight has it. Every parasight option is available from the command line, the GUI, or a saved template file of options. Thus, the basic user and the programmer can completely tailor their parasight views to their exact needs.

  3. Interactive: Parasight can interact with the user and with other programs. The user can interactively format the parasight image via the GUI option menu as well as edit the data. Parasight allows the user to print screen shots or dump a PostScript file of the entire image. Popup windows over alignments and extra sequence features display the object's data. In addition to scaling with the options menu, users can zoom in and out to gain an appreciation of the detail. Parasight has the ability to link to (or execute) other programs, allowing the viewing of web pages or associated sequence alignments at the bp level.

  4. Programmable: Parasight has the ability to accept additional Perl code from the command line or via a file, which allows for more complex formatting or for the execution of commands such as searching or printing. Combined with the -die option this allows for powerful batch processes such as generating PostScript images of 30,000 BACs (if you are so inclined).


UNDERSTANDING DATA CATEGORIZATION/CLASSIFICATION

A basic understanding of the logic behind parasight is useful in understanding data input and manipulation. Data falls into three categories: pairwise alignments, extra sequence annotation and graph data. The graphical option menu is organized on the basis of which data is being manipulated. Alignments, the core of parasight, have two forms of display: pairs and subs. Pairs are representations of the pairwise alignments that are normally drawn atop the sequence. For each alignment the pairs representing it can be connected by lines to show their relationship. Thus, for pairs, relationships are only visible if both sequences containing the pairwise are drawn. Subs are representations of pairwise alignments in relation to only one of the sequences, mimicking BLAST-type output. Sub is for sub-sequence (as they are drawn below the sequence) or subjects (if you are examining BLAST results). Extras are simple sequence annotations that have one beginning and one end, such as introns, LINEs, SINEs, motifs, etc. In the case of a gene the intron/exon structure cannot be drawn as one object, but only as individual exons and individual introns. (A -gene data structure is planned for the distant future.) The last data type is graph data. Graph data is a plot of sequence positions (x-axis) versus a numerical value (y-axis).

Below is a crude schematic (the best I could do in POD) of a typical display with Sequence(-), Pairs(P), Subs(S), Extras(E), and Graph(G) data.

                                                                 G
                   G                   G         G                   G 
          G               G     G                               G           
               G                                          G                 
               
           EEE   EEEE          EEEE         EEEE      EEE     EEEEEEE
    S0001--------PPPPPPPPPPPPP-----------PPPPPPPPPPPPPP--PPPPPPPP-----
    SEQ04        SSSSSSSSSSSSS                           SSSSSSSS
    SEQ02        SSSSSS   SSSS           SSSSSSSSS
    SEQ03           SSSSSSSSSSS              SSSSSSSSSSS  SSSSS


COMMAND LINE OPTIONS

The main command line arguments available when parasight is executed can be divided into three main headings: data input, reloading a saved parasight view, and changing view options.

DATA INPUT

While many types of data can be loaded and displayed, the absolute minimum input is simply the length of a sequence to be drawn. The length of a sequence can be provided in a -showseq file (see below). Usually the lengths are supplied as part of the pairwise alignment file. If no alignments are being drawn then the user must supply the lengths with the -showseq option. Of course examining a line representing a sequence is pretty boring--even if it is decorated with tick marks--so parasight has a few other data input options:

Option: -align

-align [filepath1:filepath2:filepath3:etc] loads files containing pairwise alignments. The files must be either the saved standard output for miropeats (Jeremy Parsons) or a tab-delimited format akin to miropeats standard output. The tab-delimited format is simply a file where the first 8 columns contain the pairwise coordinates and lengths of the two similar sequences. The align file is assumed to have a descriptive header in the first row. Hence, the first alignment will be lost (and loaded as the header) if no header actually is present.

Miropeats standard output

An example of Jeremy Parsons' Miropeats standard output is:

 ## Minimum repeat length set to 300.
 
 
         ICAass  Version 2.1
         =======
 
 
 Indexing all the sequences now. This may take a few minutes.
 
 Total of 1 sequences indexed
 The sorted index is being saved to the file cluster.index.7507 
 .AC002038 118 731 161973 AC002038 44681 44068 161973
 AC002038 1299 1788 161973 AC002038 47175 47664 161973
 AC002038 22870 23591 161973 AC002038 39920 40641 161973
 AC002038 46067 46524 161973 AC002038 26363 26820 161973
 AC002038 46067 47435 161973 AC002038 26363 27731 161973
 AC002038 46699 47435 161973 AC002038 26995 27731 161973
 AC002038 47175 47664 161973 AC002038 1299 1788 161973
 Graphic ready for printing - type the command shown below to print:
 lp threshold300
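
If you want to turn saved miropeats output into the tab-delimited format described below, the alignment lines can be picked out by their shape: eight whitespace-separated fields with integers in positions 2-4 and 6-8. The following Python sketch is not part of parasight; the function names are hypothetical, and treating a leading '.' on the first name as a miropeats progress mark (and stripping it) is an assumption.

```python
def parse_miropeats(text):
    """Return the 8-field alignment rows found in miropeats standard output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # banner, progress and printing messages
        numbers = fields[1:4] + fields[5:8]
        if not all(n.isdigit() for n in numbers):
            continue  # eight words of ordinary text, not an alignment
        # Assumption: a leading '.' is a progress mark, not part of the name.
        name1 = fields[0].lstrip(".")
        name2 = fields[4]
        b1, e1, l1, b2, e2, l2 = (int(n) for n in numbers)
        rows.append((name1, b1, e1, l1, name2, b2, e2, l2))
    return rows


def to_table(rows):
    """Format rows as a tab-delimited table with a header in the first row."""
    header = ["name1", "begin1", "end1", "len1",
              "name2", "begin2", "end2", "len2"]
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


SAMPLE = """## Minimum repeat length set to 300.
Total of 1 sequences indexed
.AC002038 118 731 161973 AC002038 44681 44068 161973
AC002038 1299 1788 161973 AC002038 47175 47664 161973
lp threshold300
"""

rows = parse_miropeats(SAMPLE)
print(to_table(rows), end="")
```

Reverse-orientation hits (begin greater than end, as in the first row) are passed through unchanged.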

Example tab-delimited alignment file

An example of a tab-delimited align file consisting of parsed BLAST output with additional data columns is shown below:

 name1 begin1 end1  len1  name2 begin2 end2  len2  similarity  transversions
 S001  1322   20001 20001 S002  1      18064 18064 0.945632    125
 S001  1322   20001 20001 S003  1      21010 21010 0.980581    143
 S002  1      18064 18064 S003  100    21010 21010 0.999587    7
 S002  1      18064 18064 S004  1      19041 19041 0.989587    43
 S002  1      12191 18064 S005  1      12141 18073 0.997548    17
 S002  12799  18064 18064 S005  12809  18073 18073 0.998548    3

The format consists of 2 sequence names, the coordinates of the pairwise similarity, and the overall lengths of the named sequences (name1 begin1 end1 len1 name2 begin2 end2 len2), which must always be in this given order. The first row of the alignment file contains column header names. For the first 8 columns the header names are ignored, and thus it is necessary to place these eight columns in the exact order given above. The only data from these first 8 columns that may be omitted are the overall lengths of the sequences. However, these columns must still be present and contain no value (an empty cell in Excel), and the lengths of the sequences must then be provided via -showseq. Any additional columns (such as the similarity and transversions columns above) are kept within the internal alignment data table. This additional data can be used to format and filter the parasight views generated. Additional columns that are created if not present in the alignment file are color, width, offset, sline, scolor, and hide. These are case sensitive and all lower case. color contains the color of the pairwise. width is the width or thickness of the bar representing the pairwise. offset is the offset of the subject object. scolor is the color of a subject object. hide suppresses display of a pairwise if it is equal to 1. If the values for these columns are not supplied or are blank, then the default values for the options are used. (NOTE: It is usually simpler to modify these formatting columns in saved parasight tables using programs such as Excel.)
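
As an illustration of the layout rules above, here is a minimal Python sketch (not part of parasight) that writes such a table from a list of records. The function name and the dictionary-based record layout are hypothetical. It keeps the eight mandatory columns in fixed order, leaves len1 and len2 as empty cells when a length is unknown, and appends any extra data columns after them.

```python
import csv
import io

# The eight mandatory columns, in the only order parasight accepts.
REQUIRED = ["name1", "begin1", "end1", "len1",
            "name2", "begin2", "end2", "len2"]
# Lengths may be left empty (they must then be supplied via -showseq).
OPTIONAL_VALUES = {"len1", "len2"}


def format_align_table(records, extra_columns=()):
    """Return a tab-delimited alignment table as a string."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(REQUIRED + list(extra_columns))  # header row is expected
    for rec in records:
        row = []
        for col in REQUIRED + list(extra_columns):
            value = rec.get(col)
            if value is None:
                if col in REQUIRED and col not in OPTIONAL_VALUES:
                    raise ValueError(f"missing required value: {col}")
                value = ""  # column stays present but empty
            row.append(value)
        writer.writerow(row)
    return out.getvalue()


records = [
    {"name1": "S001", "begin1": 1322, "end1": 20001, "len1": 20001,
     "name2": "S002", "begin2": 1, "end2": 18064, "len2": 18064,
     "similarity": 0.945632},
    {"name1": "S002", "begin1": 1, "end1": 18064,
     "name2": "S003", "begin2": 100, "end2": 21010,
     "similarity": 0.999587},
]
table = format_align_table(records, extra_columns=["similarity"])
print(table, end="")
```

The second record omits both lengths, so its len1 and len2 cells are written empty rather than dropped.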

Option: -showseq

-showseq [filepath | seqname1[,length,begin,end]:seqname2[,length,begin,end]:etc] displays only the designated sequences. With no colon a filename is assumed and the program attempts to load it. If a colon is found in the option then it is assumed that the input is a colon separated group of sequence names. Optional length and begin and end positions can be designated as well. This information may be given on the command line using commas after the sequence name or be contained within a file (tab-delimited). For analysis such as BLAST searches where you just want to display the query sequence it is easier to use the short cut option -showseqqueryonly in combination with -showseq ALL.

Example showseq file

The format of the tab-delimited show file is shown below:

 seqname    length   begin   end
 S001       10000    50      1000
 S002       15432    1000    15432
 
An example of the data above as a command line entry is:

 -showseq S001,10000,50,1000:S002,15432,1000,15432:

Or with just the lengths to display the entire sequence:

 -showseq S001,10000:S002,15432:
 
Or if the lengths are in the alignment file, just the begin and end positions can be designated (note the double comma to skip the sequence length):

 -showseq S001,,50,1000:S002,,1000,15432

The only required column is the sequence name. The names must be exactly the same as in the alignment file and extra file. The lengths, begins, and ends are optional for the sequences to be drawn. However, if length, begin or end are used, they must appear in the proper columns. Begin and end must always be found in the 3rd and 4th columns, and thus a blank sequence length must be provided for column 2 when sequence length is not designated but begin and end are. If lengths are not supplied in the alignment files or the show file then errors will occur. Lengths designated by -showseq always supersede lengths found in alignments.
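
When the sequence list comes from a script, the command line form can be assembled mechanically. The Python sketch below is a hypothetical helper, not part of parasight: it follows the column rules above (an empty length field when only begin and end are given) and always appends a trailing colon so that a single sequence name is not mistaken for a file path.

```python
def build_showseq(seqs):
    """Build a -showseq argument from (name, length, begin, end) tuples.

    Use None for any value that should be omitted.
    """
    parts = []
    for name, length, begin, end in seqs:
        fields = [name]
        if begin is not None and end is not None:
            # begin/end must sit in columns 3 and 4, so keep an
            # (optionally empty) length field in column 2.
            fields += ["" if length is None else str(length),
                       str(begin), str(end)]
        elif length is not None:
            fields.append(str(length))
        parts.append(",".join(fields))
    # The trailing colon marks the value as a name list, not a file.
    return ":".join(parts) + ":"


print(build_showseq([("S001", 10000, 50, 1000), ("S002", 15432, 1000, 15432)]))
print(build_showseq([("S001", None, 50, 1000), ("S002", None, 1000, 15432)]))
```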

Option: -extra

-extra [filepath1:filepath2:etc] loads any 'extra' sequence annotation/feature that can be expressed as a continuous block of the displayed sequence (i.e. a simple begin and end position). This can include features such as high copy repeats, introns, exons, and genes (if you don’t care about intron/exon structure). The simplest extra file contains 3 columns in the given order: seqname begin end.

Example of a tab-delimited extra file

An example of a tab-delimited extra file is shown below:

 seqname   begin   end    name   color  offset
 S001      50      1000   exon1  blue   -10
 S001      5000    5500   exon2  blue   -10
 S002      5000    9000   LINE1  red    -20

Columns added if not present are color, offset, width, and orient. (Again, it is usually simpler to modify formatting data after the fact unless it is generated beforehand, e.g. orient.) In terms of formatting, simple color names should work: black, red, green. Orientation should be either 'F' or 'R' (capitalized). Plus and Minus will NOT work. However, during the initial loading of the extras they, as well as other common designations for orientation, are automatically changed to 'F' and 'R'. Other columns may be added that give additional information such as names and descriptions.

Option: -graph1 or -graph2

-graph1 file(:s) and -graph2 file(:s) load simple graphing data in the form of seqname position value. The position within the sequence is given in bases and the value must be numerical (floating point). Parasight plots the points and/or a connecting line. The graph shows sequence position on the x-axis versus the numerical values on the y-axis. -graph1 is used to generate one plot. -graph2 is used to plot another line or set of data points on the same plot. The left and right axes can be scaled for different ranges. Thus, Alu content and GC content can be graphed at the same time. The left axis shows the scale for -graph1 and the right axis the scale for -graph2. No header is required for the input file. An unknown value can be designated with an empty value position. An empty value position causes a discontinuous line to be drawn. Only the first 3 columns are loaded; all additional columns are ignored. Graphing is built for speed, not flexibility. The only flexibility is in scaling and formatting the axes.

Example of a graph file

 seqname  position  value
 chr1      5000      0.43
 chr1     10000      0.65
 chr1     15000      0.73
 chr1     20000      0.65
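
A graph file like the one above can be produced by any script that emits three tab-separated columns. As a rough Python sketch (not part of parasight), the hypothetical function below computes the GC fraction in consecutive windows of a sequence. Reporting each window at its end position and rounding to two decimals are assumptions made for the example; a window with no A, C, G or T bases is written with an empty value so that the plotted line is broken there.

```python
def gc_graph_rows(seqname, sequence, window=5000):
    """Return 'seqname<TAB>position<TAB>value' rows of windowed GC content."""
    rows = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        known = sum(chunk.count(base) for base in "ACGT")
        # Assumption: each window is reported at its end coordinate.
        position = start + len(chunk)
        if known == 0:
            value = ""  # unknown value: empty field, discontinuous line
        else:
            gc = chunk.count("G") + chunk.count("C")
            value = f"{gc / known:.2f}"
        rows.append(f"{seqname}\t{position}\t{value}")
    return rows


# Tiny demonstration with a window of 4 bases.
for row in gc_graph_rows("chr1", "GGCC" + "AATT" + "NNNN", window=4):
    print(row)
```

Writing the rows to a file, one per line, gives something that could be passed to -graph1 or -graph2.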

RELOADING A SAVED PARASIGHT VIEW

Data can be saved directly as parasight formatted files.

Option: -in

-in [base filepath] loads a previously saved parasight dataset. Data is saved in 4 separate files (basefilename.psa, basefilename.pse, basefilename.pso, and basefilename.psg). Each file is editable text. The .psa, .pse, and .pso files are required even if there are no alignments and/or extras. These extensions to the basefilename are automatically searched for in the given path.

The .psa and .pse are tab-delimited tables containing the alignment and extra data, respectively. These tables are easily edited with any text editor; spreadsheets such as Excel are particularly useful in modifying these tables since the data is separated into columns and calculations can easily be done to modify the data as necessary.

The .pso file contains all of the current option information. It is saved as text which can be modified by the end user. It has a similar format to the template files.

The .psg file contains all of the current data to graph. The -graph1 data is stored in the first 3 columns and the -graph2 data is stored in the next 3 columns (4 to 6). The graph file, unlike the alignment and extra files, is only created if graph data has been loaded. Thus, a missing *.psg will not generate an error. For each set of 3 columns, column 1 is the sequence, column 2 is the position on the sequence, and column 3 is the value to plot on the y-axis.

IMPORTANT: All data necessary for a parasight view is contained in these files EXCEPT for any -showseq files or -arrangeseq files. These files must still be accessible in the same relative path positions in order for the saved file to be loaded properly. In other words, only the file names of the show and arrange files are saved, and their data must be reloaded. If the files get moved then the link will be broken and their paths will need to be altered.

CHANGING DISPLAY OPTIONS

Option arguments modify and format the parasight view. All of these options may be changed from the command line, from a template file, or interactively within the program's OPTION menu. The interactive menu is the easiest way to learn, and template files are the easiest way to apply a set of options again and again. Changing options at the command line follows a set order of precedence--whereby old options loaded from a previous parasight view (-in) are overridden by an option template file (-template), both of which are overridden by any options specified with -options. All of these are overridden by direct command line options such as -arrangeseq, -colorsub, and -showsub.

PRECEDENCE: internal default ---> -in ---> -template ---> -options ---> commandline
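
The precedence chain behaves like a stack of layers in which each later layer overwrites any option it shares with the earlier ones. The Python sketch below only models that idea and is not how parasight itself is implemented; the function names and the default values shown are hypothetical, and the simple comma split assumes option values do not themselves contain commas.

```python
def parse_options(text):
    """Parse an -options style string 'opt1=>value1,opt2=>value2'."""
    options = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in {item!r}")
        options[name.strip()] = value.strip()  # values stay strings
    return options


def resolve(defaults, saved=None, template=None, options=None, direct=None):
    """Merge layers in order: default, -in, -template, -options, direct."""
    merged = {}
    for layer in (defaults, saved, template, options, direct):
        if layer:
            merged.update(layer)  # later layers win
    return merged


defaults = {"seq_color": "black", "canvas_width": "800"}  # made-up defaults
final = resolve(
    defaults,
    saved={"seq_color": "blue"},
    template={"seq_color": "green", "canvas_width": "900"},
    options=parse_options("seq_color=>red"),
)
print(final)
```

Here the template sets both options, but the -options layer is applied afterwards, so only seq_color ends up changed by it.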

Option: -template

-template [filepath] loads an option template file. This allows a user to quickly format future parasight views so that they are just like the saved one. It is created using the save option template command in the File menu. When loading a template, if the file is not found in the current or specified path, hard-wired default template directories are searched. For our lab, one directory contains templates shared among multiple users and another is a user-specific directory for an individual's PARASIGHT files. The $template_path variable contains the paths. To modify them you must modify the code. The current setting is '~/.PARASIGHT:/people/PARASIGHT'. The search runs left to right, and the first template found is the one used. The template directory as currently set does not work on Windows: the tilde must be removed, as it only works on Unix, where the HOME directory (~) is designated by an environment variable.

The template is a standard text file, so a user can modify the values easily. 1 and 0 are used for on and off, as well as for yes and no values. An empty string is simply a line return right after the (=>). A line beginning with ### is ignored and is used to give descriptions of the values. Be careful about adding blank space; it is a good idea to edit with normally unseen characters such as spaces and line breaks made visible.

Option: -options

-options ['opt1=>value1,opt2=>value2'] is a list of options to modify. All of the underlying options are available; however, there are probably many that you will never have reason to modify, but they are all listed in appendix A for completeness.

Option: -showsub

-showsub [filepath | seqname1:seqname2:etc] This option shows only the designated subs to be drawn under sequences. Multiple sequence names can be directly entered with colon delimitation. If no colon is present then the input will be treated as a file containing a list of subs and will be loaded. Default is ALL, which displays all possible subs.

Option: -arrangeseq

-arrangeseq [oneperline | sameline | file:filename] This option arranges the sequences in a specified manner.

oneperline draws each sequence on a separate line that may wrap if needed.

sameline draws all of the sequences on the same line with a given amount of spacing between them.

file:filename uses the data in the file to arrange the sequences in a user-defined pattern. The file consists of two columns: seqname and position in the current line. To start a new line, NEWLINE is typed alone. The example below places the chromosomes on 3 lines.

Example Arrange File

   acc      start
   chr1     400000000
   chr6     1668388704
   chr7     1870803946
   chr8     2057427852
   chr9     2230204273
   NEWLINE
   chr22    1
   NEWLINE
   chr10    400000000
   chr11    565589288
   chr12    736372841
   chr13    900655330
   chr14    1040400228
   chr15    1167353549
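
For whole-genome layouts it is usually easier to generate the arrange file than to type it. A minimal Python sketch, assuming the file is tab-delimited and starts with a header row as in the example above (the function name and both assumptions are mine, not taken from the parasight code):

```python
def format_arrange_file(display_lines):
    """Return arrange-file text for a list of display lines.

    Each display line is a list of (seqname, start_position) pairs.
    """
    out = ["acc\tstart"]  # header row, mirroring the example file
    for index, line in enumerate(display_lines):
        if index > 0:
            out.append("NEWLINE")  # typed alone to begin the next line
        for name, start in line:
            out.append(f"{name}\t{start}")
    return "\n".join(out) + "\n"


layout = [
    [("chr1", 400000000), ("chr6", 1668388704)],
    [("chr22", 1)],
    [("chr10", 400000000), ("chr11", 565589288)],
]
print(format_arrange_file(layout), end="")
```

The resulting text would be saved to a file and passed as -arrangeseq file:filename.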

-arrangesub [oneperline|stagger|subscale|cscale] This option arranges subs below the drawn sequences. The name came from blast subjects, but you can also think of them in terms of sub (beneath) the sequence.

oneperline = each sub sequence is placed on its own line underneath the drawn sequence. The ordering of sequences can be altered by choosing a column to sort on (arrangesub_col)

stagger = multiple subjects are placed on the same line only when non-overlapping. The spacing required between the end of one sub and the beginning of the next can be varied. This spacing gives room for labels. The ordering starts with the other sequences whose hits are closest to the beginning of the sequence of interest under which the subs are being drawn.

subscaleN = subjects are placed on a numerical scale based on given column values. Tricky, so avoid setting up from the command line--use the GUI and then save a template from that.

subscaleC = subjects are placed on categorical scale based on column values. Tricky so avoid setting up from command line. Use a template or the GUI.

Note: the best way to figure the scales out is to experiment with them interactively in the options menu. There are specific modifications of subscaleN and subscaleC that are included as choices. They are denoted by a preceding asterisk and were developed to display breakdowns of percent similarity and chromosome position (for mostly outdated draft versions of the genome). However, they may be instructive to the new user. New views are now simply done via a template rather than by adding even more choices.

-color ***not implemented*** When implemented it will color the pairwise sequences and connecting lines. Currently, coloring is based only on inter- and intrachromosomal designation. (As of yet the need hasn't really arisen.) For consistency this should be called colorseq.

-colorsub [NONE|RESET|seqrandom|hitrandom|hitconditional] This option provides color schemes for the subs drawn below the sequence.

NONE does not change the color and leaves hit colors intact. Hit colors are stored within each pairwise in the table. Subject colors are stored transiently. Hit colors over-ride subject colors. To remove hit colors use RESET.

RESET removes hit (individual pairwise) colors, which override any assigned subject colors. For example, if you use hitrandom and then try to switch to seqrandom, nothing will change. This is because hitrandom colors are still stored in the internal alignment table and they take precedence over the subject color scheme. Thus, this intermediate RESET step is required to clear the hit colors. CAUTION: if you use RESET all of your manual coloring will be wiped out. (NOTE: This is because hit colors reside in the same column, scolor, as manually modified sub colors. The column color defines the pairwise color--overriding inter and intra colors.) Sorry, this is part of the program that could be simplified if I ever have a chance to gut it.

seqrandom randomly assigns colors to the various sequences that are displayed as subs. (There is a random set of 20 odd colors that are cycled through.)

hitrandom randomly assigns colors to each individual hit or pairwise alignment. (There is a random set of 20 odd colors that are cycled through.)

hitconditional allows for each pairwise to be assigned a color based on pseudo-Perl code by using a series of conditional statements that test a single alignment column. Basic syntax is [color] [test] [value];, where color= color to set, test is =, >, or <, and value is some numerical value.

-minload is a switch to load only the alignments and extras for the sequences that will be drawn as designated by -showseq. It is very useful for increasing the speed of the program when there are a large number of alignments that will not be drawn in the current view. Why load the genome if you only want to look at chromosome 22?

-precode ['Perl code'] This code is executed after the initial drawing of objects. It allows automation for batch processes when combined with the -die option. (See Advanced option section below for details.)

-die parasight quits after executing the precode option. (See Advanced option section below for details.)


INTERACTIVE MENUS

RESHOW, REARRANGE, REDRAW

This part explains why there are both a blue and a white button for updating the drawing. For beginners, I simply suggest using the blue R,R&R (Reshow, Rearrange, and Redraw) button. For extremely large data sets, however, the Reshow and Rearrange calculations can take a significant amount of time. Thus, if you are just changing the spacing of tick marks it is handy to skip the sequence and arrangement calculations. However, for simple views of BAC BLAST output stick with the blue button.

OPTION MENU

The option menu has popup help (over yellow text) and most options are self-explanatory. If in doubt try changing an option and see what happens. I have tried to adhere to a semi-logical naming convention whenever possible. Blue color coding shows whether a variable will require reshow and rearrangement before taking effect. The menu is subdivided into 7 main parts: MAIN, SEQ/PAIRS, SUBS, EXTRA, GRAPH, FILTER, and MISC. The organization tries to follow the organization of the data in parasight.

The MAIN menu allows access to important command line options like -showseq and -showsub. Also, basic screen properties such as size of the window and the number of bases for the width of the screen.

The SEQ/PAIRS portion allows manipulation of the sequence and associated tick marks. Pairs and their designation as inter- and intrachromosomal, as well as connecting lines, are controlled from this part of the menu as well.

The SUBS portion of course is all about the manipulation of subs. This is some of the more complex data manipulation.

The EXTRA portion is about the options relating to the extra data.

The GRAPH portion is for the graph data. Try turning everything on when you first test out this feature.

The FILTER portion allows for the filtering/removal of pairwise and extras based on data in a given column of numerical data.

The MISC portion allows for the setting of options controlling printing, the display of alignments, the extraction of sequence, and the execution of other programs.

FILE DROP-DOWN MENU

This is the only place where the save parasight command is found. All data and options are saved. A few files are not saved--see information about -in. Loading must be done at the command line. Additionally, template files (*.pst) may be saved and loaded through this menu. After loading a template file the screen must be updated with R,R&R.

PRINT DROP-DOWN MENU

The print menu allows for the generation of a PostScript file and its subsequent transmission to a printer if the option print_command is properly set. The PostScript file can consist of the visible screen (screen) or the entire parasight drawing (all). If the all option is chosen then the number of pages (vertically and horizontally) across which to print the image is set with the options print_multipages_wide and print_multipages_high. The PostScript files are encapsulated and can be easily turned into PDF files with software such as Adobe Distiller or imported into Adobe Illustrator. Also, Word has a special EPS import option, which was handy when writing my dissertation.

ORDER DROP-DOWN MENU

The order menu on the main drop down menu bar allows the order or level of objects in the display to be changed. You can either send objects all the way to the background or the foreground.

MISC DROP-DOWN MENU

Currently it contains the ability to transfer colors for alignments in order to allow syncing of colors between the pairs and the subs. It requires a redraw to see the effect after choosing one of these options. This is really the only way to currently go outside of the inter vs intra coloring schemes for pairs.


SCREEN MANIPULATION

In addition to gazing lovingly at the pretty images after formatting them using the option menu, direct manipulation of the display once drawn can be accomplished with various commands.

MOUSE BUTTON FUNCTIONS

(see APPENDIX B: for table of mouse functions)

First, when you mouse over an object it will shimmer with a number of bright colors. The shimmering object represents the object you will select if you click on it. Most of the mouse commands work on sequence, pairwise, extra, and subjects. Tick Marks and Labels are immune except for the ALT buttons. The middle mouse button is not used since some systems like my home PC lack them (and I don’t have the dexterity to precisely click both Left and Right at the exact same time which is the usual substitute).

DATA POPUP WINDOW (Left-Click) This pops up a simple window displaying all data for an alignment or an extra object. Use Shift-Drag to move the popup window if it is obscured or obscuring data. Formatting options for this popup window are found under MISC tab of the OPTIONS menu.

OPTIONS POPUP (Right-Click) Brings up a popup menu of options, which includes a variety of commands such as choosing colors and editing the underlying data. If the actual alignments are present in the alignment table, the alignments can be viewed. If the underlying sequence files are available, subsequences representing objects can be extracted.

ZOOM IN AND OUT (Control-Left-Click and Control-Right-Click) Zooming can be accomplished with Control held down at the same time as a mouse click. The left mouse clicked in conjunction with the control key will zoom in two fold centered at the point of the click. The right mouse has the opposite effect and zooms out. The DeZoom button on the main window returns the scaling to normal.

MOVE OBJECT TO FOREGROUND OR BACKGROUND (Alt-Left-Click and Alt-Right-Click) This causes the object clicked on to move all the way to the foreground or the background. The left mouse button moves it to the foreground. The right mouse button moves the object to the background.

MOVE (nonpermanent) ANY OBJECT (Shift-Left Drag) Allows for the movement of object in the drawing--even tick marks and sequence lines. It is non-permanent but it is useful for removing tick marks or names before you print or create a PostScript file.

QUICK COLOR (Shift-Right-Click to color and Shift-Right-Double-Click to uncolor) Allows for rapid coloring of objects. Shift-Right-Click changes the object's color to that of the Quick Color Button on the Main Window. Shift-Right-Double-Click attempts to remove the color and restore the default color. In the case of Pairs, black is assigned to the object, as inter- and intrachromosomal colors cannot be reassigned until a Redraw. Coloring of all other objects (i.e. not extras and not alignments) is not saved or stored and consequently reverts to normal as soon as the image is redrawn.

HIDE SEQUENCE OR EXTRA (Alt-Right-Double-Click) This hides the clicked sequence or extra from view. To unhide sequences you must use the pre-filter in the filter options. (For which I should add a command line option!)


APPENDIX A: LIST OF VALID OPTIONS WITH INTERNAL DEFAULTS


APPENDIX B: QUICK REFERENCE

COMMAND LINE SUMMARY

-align [filepath1:filepath2:etc] load pairwise alignment table(s) (table must be miropeats format)

-arrangeseq [oneperline/sameline/file] (default is oneperline)

 *oneperline = each sequence is placed on a separate wrapping line
 *sameline = the sequences are placed in alphabetical 
    order on the same line
 *file:filepath = arrange file that allows specification
    of line/paragraph and position

-arrangesub [oneperline/stagger/subscale/cscale] (default stagger) Arrange subs below the sequence.

 *oneperline = each subject is placed on its own line
    underneath the sequence
 *stagger = multiple subjects are placed on same line 
    only when non-overlapping
 *subscaleN =  pairwise hits are placed on a numerical scale
    based on values in chosen column(s)
 *subscaleC =  pairwise hits are placed on categorical 
    scale based on hash(s)

-color [scheme] ***not implemented yet, no demand yet*** Use other options for determining inter- vs. intrachromosomal colors.

-colorsub [NONE/RESET/seqrandom/hitrandom/hitconditional]

 *NONE = does not add a colorsub and does not remove colors 
    for pairwise hits
 *RESET = removes colors for pairwise hits 
    colors for pairwise hits override colors for sequence hits
 *seqrandom = color all pairwise comparisons for a subject the same
 *hitrandom = randomly independently color each pairwise comparison
 *hitconditional = allows coloring based on a conditional statement

-extra [filepath1:filepath2:etc] loads extra sequence feature table(s). Sequence features are annotations that have single begin and end points (e.g. exons, introns, and repeats). The rows must consist of seqname[tab]begin[tab]end. Further columns may contain optional data. Columns named offset, width, and color provide extra formatting information.
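
As a rough illustration of the row layout described for -extra, the following sketch writes a small table and checks that every row has a seqname plus numeric begin and end. The file name repeats.extra, the coordinates, and the repeat names are all made up. This section does not say whether a header line is expected, so the headerless layout here is an assumption to check against the file format description.

```shell
# Two hypothetical repeat features on AC002038 (coordinates and names are made up).
# Columns: seqname, begin, end, then one optional data column.
printf 'AC002038\t1200\t1500\tAluSx\n'   > repeats.extra
printf 'AC002038\t8300\t14250\tL1PA3\n' >> repeats.extra

# Count rows that lack the three required columns or have non-numeric coordinates.
bad_rows=$(awk -F'\t' 'NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { n++ } END { print n + 0 }' repeats.extra)
echo "rows failing the seqname/begin/end check: $bad_rows"
```

The resulting file could then be passed along with an alignment table:

 parasight -align AC002038.blast.parse -extra repeats.extra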

-graph1 [filepath1:filepath2:etc] Graphs a data set of values, such as %GC, above the sequence line. The data scale is found on the left. The data row format is simply seqname[TAB]begin[TAB]value. No more, no less. For regions without a value, a blank will cause the graph line to be disrupted.
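
One way to produce such a file is to compute %GC in fixed windows. The sketch below is only an illustration: the toy sequence, the window size of 5, and the output name gc.graph are invented, and it assumes that begin is the 1-based start of each window, which this section does not spell out.

```shell
# Hypothetical inputs: a sequence name, a short toy sequence, and a window size.
seqname="AC002038"
sequence="GGCCATATGCGCATATATGC"
window=5

# Emit one row per window: seqname<TAB>begin<TAB>percent GC.
printf '%s\n' "$sequence" | awk -v name="$seqname" -v w="$window" '
{
    for (i = 1; i <= length($0); i += w) {
        chunk = substr($0, i, w)
        n = length(chunk)                 # the last window may be shorter than w
        gc = gsub(/[GCgc]/, "", chunk)    # gsub returns the number of G/C found
        printf "%s\t%d\t%.2f\n", name, i, 100 * gc / n
    }
}' > gc.graph

cat gc.graph
```

The file can then be drawn against the left-hand scale:

 parasight -align AC002038.blast.parse -graph1 gc.graph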

-graph2 [filepath1:filepath2:etc] Creates another graph using the scale on the right axis. Same parameters as -graph1

-in [filepath] load a previously saved parasight view. Three files required are *.psa, *.pse and *.psm (*.psg needed only if a graph has been used)

-options ['opt1=>value1,opt2=>value2'] *** Allows all of the parasight options to be changed directly ***. One and zero are used for on/off, yes/no and true/false. Complete access for the programmer using parasight as a displayer (e.g. 'canvas_width=>500,seq_tick_on=>1,graph_scale_on=>1')

-showseq [a file or seqname(s):] names of sequences to display

   *ALL = show all sequences (default) 
   *no colon = load as file of names 
     format each line ( seqname[TAB]length[TAB]begin[TAB]end )
     only sequence name is required other info optional
   *colon(:) = parse as list of colon-delimited seq names
     format: (seqname,length,begin,end:seqname2,length2,begin2,end2)
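
The two -showseq forms carry the same fields, so one can be derived from the other. This sketch writes a show file and builds the matching colon-delimited list from it. The lengths and the begin/end values are invented for illustration, and the trailing colon simply follows the style of the SYNOPSIS examples.

```shell
# Hypothetical show file: seqname<TAB>length<TAB>begin<TAB>end (values are made up).
printf 'AC002038\t150000\t1\t150000\n'      > show.file
printf 'AC002304\t120000\t20000\t80000\n' >> show.file

# Build the equivalent colon-delimited list for use directly on the command line.
showlist=$(awk -F'\t' '{ printf "%s,%s,%s,%s:", $1, $2, $3, $4 }' show.file)
echo "$showlist"
```

Either form could then be given to -showseq:

 parasight -align alignment.table -showseq show.file
 parasight -align alignment.table -showseq AC002038,150000,1,150000:AC002304,120000,20000,80000: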

-showseqqueryonly This toggles the display of only the first sequence in a given row. This is the usual position for a blast query (hence the name of the option).

-showsub [file | seqnames: | ALL] names of subjects to display

   *ALL: displays all subject sequences (default)
   *no colon = load file containing names (one seqname per line)
   *colon(:) = parse input as list of colon-delimited sequence names

-template [filepath] loads a saved option template file. Template files can be stored in default directories for easy loading.

ADVANCED OPTIONS

-minload *loads only the relevant pairwise that will be displayed (quicker when just certain sequences are needed from large files)

-precode 'perl code commands to execute after first screen draw' *an advanced option useful for automating initial tasks

-die parasight ends after executing precode *an advanced option useful in automating tasks

OPTION PRECEDENCE

internal default ---> -in ---> -template ---> -options ---> commandline
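
Reading the arrows as "applied in this order, with later sources overriding earlier ones" (an interpretation, not something checked against the parasight code), the effect can be mimicked with a few layers written in the 'opt=>value' style. The layer contents below are invented, and the awk merge is only a stand-in for whatever parasight does internally.

```shell
# One layer per line, earliest first. Values are hypothetical:
#   line 1 = internal defaults, line 2 = a template, line 3 = an -options string.
layers='seq_color=>black,window_width=>800,extra_color=>purple
extra_color=>green,window_width=>1000
seq_color=>red'

# Later layers overwrite earlier ones; keys keep their first-seen order.
merged=$(printf '%s\n' "$layers" | awk -F',' '
{
    for (i = 1; i <= NF; i++) {
        split($i, kv, "=>")
        if (!(kv[1] in val)) order[++n] = kv[1]
        val[kv[1]] = kv[2]
    }
}
END { for (i = 1; i <= n; i++) printf "%s=>%s\n", order[i], val[order[i]] }')

printf '%s\n' "$merged"
```

Here seq_color ends up red because the last layer sets it, while window_width and extra_color take the values from the middle layer.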

MOUSE FUNCTIONS

 [DBL]=double click  [DRAG]=button hold down and move mouse 
 EXECUTE # = Execute Command (User Defined under MISC options)
 
 KEY            LEFT-BUTTON       MIDDLE-BUTTON  RIGHT-BUTTON
 ---------      -----------       -------------  --------------------
 NONE           Popup Desc                       Menu
 CONTROL        Zoom in                          Zoom out
 SHIFT          Move Object[DRAG]                Quick color; Uncolor[DBL]
 ALTERNATE      Del  Object [DBL]                Lower Object; Raise Object[DBL]
 CONTROL-SHIFT  Execute 1         Execute 2      Execute 3

COMPACT ALPHABETICAL LIST OF -OPTIONS WITH DEFAULTS

alignment_col=>0   alignment_col2=>0   alignment_wrap=>50   arrangeseq=>oneperline   arrangesub=>stagger   arrangesub_stagger_spacing=>40000   canvas_bpwidth=>250000   canvas_indent_left=>60   canvas_indent_right=>30   canvas_indent_top=>40   color=> None   colorsub=> None   colorsub_hitcond_col=>34   colorsub_hitcond_tests=>red if <2; orange if <0.99; yellow if <0.98; green if <0.97; blue if <0.96; purple if <0.95; brown if <0.94; grey if <0.93; black if <0.92; pink if <0.91   execute=>   execute2=>   execute2_array=>m   execute2_desc=>   execute3=>   execute3_array=>m   execute3_desc=>widget   execute4=>   execute4_array=>m   execute4_desc=>   execute_array=>e   execute_desc=>   extra_arrow_diag=>5   extra_arrow_on=>1   extra_arrow_para=>5   extra_arrow_perp=>4   extra_color=>purple   extra_label_col=>10   extra_label_col_pattern=>   extra_label_color=>purple   extra_label_fontsize=>6   extra_label_offset=>2   extra_label_on=>1   extra_label_test_col=>   extra_label_test_pattern=>   extra_offset=>-4   extra_on=>1   extra_width=>6   fasta_blastdb=>htg:nt   fasta_directory=>.:fastax   fasta_fragsize=>400000   fasta_on=>1   fasta_wrap=>50   filename_color=>grey   filename_offset=>-10   filename_offset_h=>0   filename_on=>1   filename_pattern=>   filename_size=>10   filter1_col=>   filter1_max=>   filter1_min=>   filter2_col=>   filter2_max=>   filter2_min=>   filterextra1_col=>   filterextra1_max=>   filterextra1_min=>   filterextra2_col=>   filterextra2_max=>   filterextra2_min=>   filterpre1_col=>   filterpre1_max=>   filterpre1_min=>   filterpre2_col=>   filterpre2_max=>   filterpre2_min=>   gif_anchor=>center   gif_on=>0   gif_path=>   gif_x=> int($opt{window_width}/2)   gif_y=>0   graph1_label_color=>blue   graph1_label_decimal=>2   graph1_label_fontsize=>10   graph1_label_multiplier=>1   graph1_label_offset=>1   graph1_label_on=>1   graph1_line_color=>blue   graph1_line_on=>1   graph1_line_smooth=>0   graph1_line_width=>1   graph1_max=>100   graph1_min=>-5   
graph1_on=>0   graph1_point_fill_color=>blue   graph1_point_on=>1   graph1_point_outline_color=>blue   graph1_point_outline_width=>1   graph1_point_size=>2   graph1_tick_color=>black   graph1_tick_length=>6   graph1_tick_offset=>1   graph1_tick_on=>1   graph1_tick_width=>3   graph1_vline_color=>black   graph1_vline_on=>1   graph1_vline_width=>2   graph2_label_color=>red   graph2_label_decimal=>2   graph2_label_fontsize=>10   graph2_label_multiplier=>1   graph2_label_offset=>8   graph2_label_on=>1   graph2_line_color=>red   graph2_line_on=>1   graph2_line_smooth=>0   graph2_line_width=>1   graph2_max=>1000   graph2_min=>-1000   graph2_on=>0   graph2_point_fill_color=>red   graph2_point_on=>1   graph2_point_outline_color=>red   graph2_point_outline_width=>1   graph2_point_size=>2   graph2_tick_color=>black   graph2_tick_length=>6   graph2_tick_offset=>5   graph2_tick_on=>1   graph2_tick_width=>3   graph2_vline_color=>black   graph2_vline_on=>1   graph2_vline_width=>2   graph_scale_height=>80   graph_scale_hline_color=>black   graph_scale_hline_on=>1   graph_scale_hline_width=>1   graph_scale_indent=>-20   graph_scale_interval=>4   graph_scale_on=>0   help_on=>1   help_wrap=>50   mark_advanced=>   mark_array=>m   mark_col=>   mark_col2=>   mark_color=>red   mark_pairs=>0   mark_pattern=>AC002038   mark_permanent=>0   mark_subs=>1   pair_inter_color=>red   pair_inter_line_on=>0   pair_inter_offset=>0   pair_inter_on=>1   pair_inter_width=>13   pair_intra_color=>blue   pair_intra_line_on=>0   pair_intra_offset=>0   pair_intra_on=>1   pair_intra_width=>9   pair_level=>NONE   pair_type_col=>   pair_type_col2=>   pair_type_col2_pattern=>   pair_type_col_pattern=>   popup_format=>text   popup_max_len=>300   print_command=>lpr -P Rainbow {}   print_multipages_high=>1   print_multipages_wide=>1   printer_page_length=>11i   printer_page_orientation=>1   printer_page_width=>8i   quick_color=>purple   seq_color=>black   seq_label_color=>black   seq_label_fontsize=>12   
seq_label_offset=>-4   seq_label_offset_h=>0   seq_label_on=>1   seq_label_pattern=>   seq_line_spacing_btwn=>250   seq_line_spacing_wrap=>200   seq_spacing_btwn_sequences=>10000   seq_tick_b_color=>black   seq_tick_b_label_anchor=>ne   seq_tick_b_label_color=>black   seq_tick_b_label_fontsize=>9   seq_tick_b_label_multiplier=>0.001   seq_tick_b_label_offset=>2   seq_tick_b_label_offset_h=>0   seq_tick_b_label_on=>1   seq_tick_b_length=>10   seq_tick_b_offset=>0   seq_tick_b_on=>1   seq_tick_b_width=>2   seq_tick_bp=>20000   seq_tick_color=>black   seq_tick_e_color=>black   seq_tick_e_label_anchor=>nw   seq_tick_e_label_color=>black   seq_tick_e_label_fontsize=>9   seq_tick_e_label_multiplier=>0.001   seq_tick_e_label_offset=>2   seq_tick_e_label_offset_h=>0   seq_tick_e_label_on=>1   seq_tick_e_length=>10   seq_tick_e_offset=>0   seq_tick_e_on=>1   seq_tick_e_width=>2   seq_tick_label_anchor=>n   seq_tick_label_color=>black   seq_tick_label_fontsize=>9   seq_tick_label_multiplier=>0.001   seq_tick_label_offset=>2   seq_tick_label_on=>1   seq_tick_length=>10   seq_tick_offset=>0   seq_tick_on=>1   seq_tick_whole=>0   seq_tick_width=>2   seq_width=>3   showqueryonly=>0   sub_arrow_diag=>5   sub_arrow_on=>0   sub_arrow_paral=>5   sub_arrow_perp=>4   sub_color=>lightgreen   sub_initoffset=>30   sub_labelhit_col=>13   sub_labelhit_color=>black   sub_labelhit_offset=>0   sub_labelhit_on=>0   sub_labelhit_pattern=>0?([0-9.]{4})   sub_labelhit_size=>9   sub_labelseq_col=>0   sub_labelseq_col2=>4   sub_labelseq_col2_pattern=>   sub_labelseq_col_pattern=>   sub_labelseq_color=>black   sub_labelseq_offset=>0   sub_labelseq_on=>1   sub_labelseq_size=>6   sub_labelseqe_col=>4   sub_labelseqe_col2=>0   sub_labelseqe_col2_pattern=>   sub_labelseqe_col_pattern=>   sub_labelseqe_color=>black   sub_labelseqe_offset=>0   sub_labelseqe_on=>0   sub_labelseqe_size=>6   sub_line_spacing=>9   sub_on=>1   sub_scale_categoric_string=>   sub_scale_col=>   sub_scale_col2=>   
sub_scale_col2_pattern=>   sub_scale_col_pattern=>   sub_scale_hline_color=>grey   sub_scale_hline_on=>1   sub_scale_hline_width=>1   sub_scale_label_color=>black   sub_scale_label_fontsize=>12   sub_scale_label_multiplier=>100   sub_scale_label_offset=>1   sub_scale_label_on=>1   sub_scale_label_pattern=>   sub_scale_lines=>10   sub_scale_max=>1.00   sub_scale_min=>0.80   sub_scale_on=>0   sub_scale_step=>0.01   sub_scale_tick_color=>black   sub_scale_tick_length=>9   sub_scale_tick_offset=>4   sub_scale_tick_on=>1   sub_scale_tick_width=>3   sub_scale_vline_color=>black   sub_scale_vline_offset=>-5   sub_scale_vline_on=>1   sub_scale_vline_width=>2   sub_width=>8   template_desc_on=>1   text2_anchor=>nw   text2_color=>red   text2_offset=>0   text2_offset_h=>0   text2_on=>1   text2_size=>20   text2_text=>   text_anchor=>nw   text_color=>red   text_fontsize=>20   text_offset=>0   text_offset_h=>0   text_on=>1   text_text=>   window_font_size=>9   window_height=>550   window_width=>800


APPENDIX C: INSTALLATION (WINDOWS OR UNIX)

Parasight has been tested extensively on Solaris, Linux, and MS Windows. Perl is available from www.perl.org. ActiveState (www.activestate.com) has binary versions available for many platforms, which are particularly useful for Windows installs. Follow the instructions on the chosen site for installing Perl. Unix installs should be easier simply because you probably have more experience with Perl or you have a network administrator. Windows installs are quite easy, just like installing any other program. Once the install is done, put the parasight program in the Perl bin directory (usually C:\Perl\bin). If you need to install any Perl modules such as Tk, consult the documentation for your OS. For the Windows ActiveState binary, PPM provides easy searches and installations of modules. UNIX environments can utilize the CPAN module.

If there is a strong need, standalone versions of the program, packaged together with all needed Perl functions, could be generated using ActiveState's PerlApp program. All needed components are contained within the ``packed up'' executable for Linux, Solaris, and Windows. No installation of Perl is needed. Note this is not a compiled version, so the run speed will be the same as the non-PerlApp-packaged program. It is actually just an executable that has collected all of the Perl components required for Parasight to run.


APPENDIX D: PRECODE HINTS

Precode affords the ability to add additional code to further manipulate parasight. Extensive use of precode is found in the parasight.examples file. The best way to figure out how to manipulate parasight is to study all of the parasight code. Of course even I am trying to forget most of the code so the following are useful subroutines to abuse:

First, the hash variable containing all of the command line options is %opt. So, if you want to change arrangesub, you have to use the code $opt{'arrangesub'};

Useful commands to use when scripting:

 $opt{'x'}

Any normal option can be accessed within the hash %opt.

 &reshowNredraw; &update;
 
These two subroutines will cause any changes in options to be redrawn and updated on the screen.  While update is not normally used in the internal code (as it is called automatically whenever control is returned to the GUI), it is necessary when a script has control of parasight.

 &print_screen(0, "fileoutpath");

This will print a postscript of the visible screen to the designated file. If 1 is used for the initial print variable then the postscript will be sent to the printer. If zero is used only the file is created.

 &print_all (1, "fileoutpath");

This will print a postscript of the entire parasight area to the designated file. If 1 is used for the initial print variable then the postscript will be sent to the printer. If zero is used only the file is created. Depending upon the multipage options, multiple files may be created.

 &save_parasight_table("basefileoutpath");

This saves parasight-formatted files, which can be reloaded with -in ``basefileoutpath''.

 &fitlongestline;

This will force the length of the screen in bases to the length of the longest sequence. This is most useful for BLAST views.

 $opt{'die'}=0;

This is useful to turn off the die option if you are subsequently saving the parasight files. Otherwise when you load the saved parasight it will ``die'' before you get to see it.

 &reshowNredraw; &update; print "PAUSED\n"; my $pause=<STDIN>; 

A useful sequence of commands if you want to pause for the user.

 $opt{"text_text"}="This is displayed text."; $opt{"text_fontsize"}=16; $opt{"text_offset_h"}=10;

Allows for a line of text to be printed within the image. text2_text allows for a second line.
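
Putting several of these hints together, a batch run might look like the sketch below. It only assembles and prints the command (a dry run) rather than launching parasight. The subroutines and %opt keys are the ones listed above, but the file names, the label text, and the assumption that a redraw is wanted after &fitlongestline are illustrative guesses.

```shell
# Precode to run after the first screen draw: fit the view, add a label,
# redraw, then write the whole drawing to a PostScript file (0 = file only).
precode='&fitlongestline; $opt{"text_text"}="AC002038 blast hits"; &reshowNredraw; &update; &print_all(0, "AC002038.ps");'

# Dry run: build the full command line and show it instead of executing it.
cmd="parasight -align AC002038.blast.parse -showseqqueryonly -precode '$precode' -die"
printf '%s\n' "$cmd"
```

Because -die ends parasight once the precode has run, this pattern suits unattended jobs. If the precode also saves parasight files, remember the $opt{'die'}=0 hint above.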


APPENDIX E: ADDITIONAL EXAMPLES

 parasight -showseq show.file -extra repeat.file:exon.file

This draws the sequences specified in show.file decorated with the repeats and exons specified in repeat.file and exon.file. Note: this example does not contain any alignments, so show.file is required in order to specify the lengths of the sequences to be displayed.

 parasight -in saved  -extra exons:introns 
     -arrangeseq oneperline

This loads a saved parasight view and adds extra annotation from the files exons and introns. It places each sequence on a separate line.


AUTHOR

Jeff Bailey (jab@cwru.edu)


ACKNOWLEDGEMENTS

This software was developed in the laboratory of Evan Eichler, Department of Genetics, Case Western Reserve University and University Hospitals, Cleveland.


COPYRIGHT

Copyright (C) 2001-3 Jeff Bailey. Distribute and modify freely as defined by the GNU General Public License.


DISCLAIMER

This software is provided ``as is'' without warranty of any kind.