Looking for ways to:
You have come to the right place...
| Step | Command-line | Web-based | Results in... | Results name |
| Project Creation | stack_ProjectManager | WebPipe | project created in database with project name, owner and description | N/A |
| Data Import | stack_ImportFasta or stack_ImportGenbank | WebPipe | sequences with original accession numbers stored in database | sequences' original accession numbers |
| Masking | stack_Mask | WebPipe | original sequence overwritten with masked version of sequence; what is masked is dependent on the makeup of each site's masking file. We recommend that each site create a masking file appropriate for their use. | sequences' original accession numbers |
| Clustering | stack_Cluster | WebPipe | Large, loose groupings of sequences brought together because they share similar words. These are grouped using the d2_cluster algorithm. | cl#: group of members; no consensus or alignment is generated |
| Cluster Assembly | stack_Assemble | WebPipe | One or more contigs created by attempting to assemble a loose cluster using PHRAP. | ct#: PHRAP assembly; PHRAP consensus (also known as contig consensus) |
| Assembly Analysis | stack_Analysis | WebPipe | Each contig is analyzed using CRAW as well as stack_Analysis to identify possible subassemblies (potential alternate forms). Final consensus sequences are created, one for each subassembly. The primary consensus is chosen from all possible subassembly consensus sequences. | cn#: final consensus sequence(s) for the contig; Alignment Analysis View (CRAW output); CRAW alignment (final processed alignment resulting in final consensus) |
| Linking | stack_Link | WebPipe | Non-overlapping clusters are grouped based on sharing clone IDs (e.g., 3' and 5' clusters that do not overlap but come from the same clone). A consensus sequence is generated by concatenating the primary consensus sequences from each cluster in the linked group. These are separated by a series of 10 N's. | ln#: linked consensus sequence |
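The Linking step joins the primary consensus sequences of a linked group with a run of 10 N's. The following is a minimal Python sketch of that concatenation only. It is not the stack_Link implementation; the function name and the example sequences are made up for illustration.

```python
# Ten N's separate the consensus sequences, as described for the Linking step.
LINK_SPACER = "N" * 10


def linked_consensus(primary_consensuses):
    """Concatenate primary consensus sequences, separated by ten N's.

    primary_consensuses: list of sequence strings, one per cluster in the
    linked group (hypothetical input format for this sketch).
    """
    return LINK_SPACER.join(primary_consensuses)


# Hypothetical 5' and 3' cluster consensus sequences from the same clone.
print(linked_consensus(["ACGTAC", "TTGACA"]))
# prints ACGTACNNNNNNNNNNTTGACA
```

A group containing a single cluster yields that cluster's consensus unchanged, because no spacer is needed.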
The stackPACK system and STACKdb database of clustered human ESTs and mRNAs are described in more detail in two publications:
The /etc/stackpack configuration file contains three sections:
A full description of each parameter in the /etc/stackpack file is provided below.
Note that the .stackpackrc file takes effect only when stackPACK is run from the command line. Any web-based use of stackPACK is governed by the system-wide /etc/stackpack file.
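The rule above can be pictured as a choice of which files a run consults. The Python sketch below is illustrative only and rests on assumptions not stated in this manual: that .stackpackrc lives in the user's home directory, and that a command-line run reads it after /etc/stackpack so that user values take precedence. The mode labels "web" and "cli" and the function name are hypothetical.

```python
import os
import posixpath

SYSTEM_CONFIG = "/etc/stackpack"
USER_CONFIG_NAME = ".stackpackrc"


def config_files(mode, home=None):
    """Return the configuration files consulted for a run, in reading order.

    mode: "web" or "cli" (hypothetical labels used only in this sketch).
    home: the user's home directory; defaults to the current user's.
    """
    if mode == "web":
        # Web-based runs are governed by the system-wide file only.
        return [SYSTEM_CONFIG]
    if mode == "cli":
        home = home or os.path.expanduser("~")
        # Assumption: the user file is read last so its values win.
        return [SYSTEM_CONFIG, posixpath.join(home, USER_CONFIG_NAME)]
    raise ValueError("mode must be 'web' or 'cli'")


print(config_files("web"))
print(config_files("cli", home="/home/alice"))
```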
The .stackpackrc file contains two sections:
To construct a .stackpackrc file, do the following:
| Parameter | Settings | Effects |
| System Configuration | ||
| STACKPACK_BASE | Directory location where stackPACK installation sits | change if stackPACK installation is moved to a different location |
| STACKPACK_BIN | Directory location where stackPACK executables are stored | |
| STACKPACK_LIB | Directory location where stackPACK library files are stored | |
| STACKPACK_LIB_EXTERNAL | Directory location where third-party library files are stored | |
| STACKPACK_TMP | Location of temporary directory used to store intermediate results while stackPACK processing occurs. | If this directory runs out of space, stackPACK will not run. Ensure the temp directory is in a location with plenty of space. More users and larger projects require more temp space. |
| STACKPACK_LIB_PYTHON | Directory location where stackPACK's python libraries are stored | |
| STACKPACK_SUPPORTING | Directory location where supporting files are stored. E.g., this is the default location for the repeats file distributed with stackPACK and the script used to upgrade projects from 2.0 to 2.1 format. | |
| STACKPACK_LOG | Location of stackCORBAd log file. | |
| [DATABASE] | ||
| ODBCSYSINI | location of odbc files that identify location and type of database used behind stackPACK | |
| DSN_NAME | table containing list of stackPACK projects | |
| DSN_LOGIN | login used by stackPACK when accessing the backend database | |
| DSN_PASSWORD | password used by stackPACK when accessing the backend database | |
| [WEBPROBE] | ||
| HTTP_SERVER | web server used for WebProbe viewing software | |
| HTTPD_LOCATION | location of cgi and html directories where WebProbe is stored | |
| HTML_LOCATION | subdirectory where WebProbe html files are stored | |
| CGI_LOCATION | subdirectory where WebProbe cgi files are stored | |
| [WEBPIPE] | ||
| SMTP_SERVER | mail server used to send confirmation and conclusion messages at the beginning and end of each web-based stackPACK run | |
| HTTP_SERVER | web server used by WebPipe clustering submission software | |
| HTTPD_LOCATION | location of cgi and html directories where WebPipe is stored | |
| HTML_LOCATION | subdirectory containing WebPipe html files | |
| CGI_LOCATION | Full path of directory containing WebPipe cgi files | |
| INSTALLED | Yes/No: whether or not WebPipe is installed. This is set to "no" for sites that have purchased STACKdb and not stackPACK. | If "no", WebPipe will not appear in the menus for web-based stackPACK. |
| [ILU] | ||
| ILU_LIB | location of ilu libraries | |
| ILU_BINDING | location of ilubinding directory | |
| ILU_SERVERNAME | name of the ilu CORBA server | |
| ILU_MAXREQUESTS | number of client requests the server will handle before shutting itself down. | Increase if you experience crashes under heavy usage, decrease if stackCORBAd uses too much memory. |
| ILU_GARBAGESIZE | Number of client connections that can be maintained simultaneously. | Increase if you experience crashes due to heavy use. |
| User Editable Parameters | ||
| [stack_Mask] | ||
| program | either cross_match or RepeatMasker | |
| mask_file | full path for location of repeats file to be used for masking | |
| num_cpus | number of CPUs on which the masking should be run | Increasing the number of CPUs can speed processing, but will also use more memory. These two factors must be balanced. |
| batch_size | for cross_match, number of sequences processed in each 'batch' | A higher number speeds cross_match, but uses more memory. If you are running out of memory, reduce this number. |
| [stack_Cluster] | ||
| num_cpus | number of CPUs on which the d2_cluster algorithm should be run | more CPUs = faster speed |
| [stack_Assemble] | ||
| num_cpus | number of CPUs on which the assembly should be run | Increasing the number of CPUs can speed processing, but will also use more memory. These two factors must be balanced. |
| [stack_Analysis] | ||
| [stack_Link] | ||
| redundancy | number of independent cloneIDs that must match before two clusters are considered "linked" | If this number is low (e.g., 1) and you are using public data, you may experience spurious linking due to errors in the public data annotations. |
| max_seq_per_clone | Maximum number of sequences that may have the same clone ID | This is used as a check-and-balance to ensure that erroneous cloneID information doesn't slip through and cause, e.g., all data to link together |
| External Programs | ||
| [cross_match] | ||
| executable | full path to program executable | |
| flags | you can set all flags (parameters) for cross_match by entering them here | See cross_match documentation for details about the effects of changing cross_match parameters. |
| [RepeatMasker] | ||
| executable | full path to program executable | |
| flags | you can set all flags (parameters) for RepeatMasker by entering them here | See RepeatMasker documentation for details about the effects of changing RepeatMasker parameters. |
| [enc_db] | ||
| executable | full path to program executable | |
| [d2_cluster] | ||
| executable | full path to program executable | See d2_cluster paper for more details on parameters and parameterization. Remember, d2_cluster is NOT an alignment-based similarity algorithm, so results and parameterization are different from alignment-based methods. |
| word_size | size of words used to calculate comparison | Increased word size increases selectivity. Maximum word size is 9. Default word size has been optimized; change this parameter with care. |
| similarity_cutoff | percent similarity within window required for positive match | Lower similarity can create looser clusters and increase cluster membership. If similarity is too low, the assembly stage may have difficulty aligning the sequences. |
| minimum_sequence_size | sequences below this length (prior to masking) are not processed through d2_cluster | |
| window_size | the window within which comparisons are made | Window size can be increased if your sequences are all longer (e.g., clustering only mRNAs). Clustering with longer window sizes, however, runs the risk of missing regions of variation that are shorter than the window. |
| reverse_comparisons | should reverse complement sequences also be compared | If value = 1, both the sequence and its reverse complement are used in the d2_cluster comparison. If value = 0, only the sequence in its original orientation is used. |
| [phrap] | ||
| executable | full path to program executable | |
| old_ace | controls phrap output format | For older versions of PHRAP, set to 0. For current version of PHRAP, set to 1. |
| vector_bound | Number of potential vector bases at beginning of each read. Matches that lie entirely within this region are assumed to represent vector matches and are ignored. | Currently set to 0 so no bases are ignored. Increasing this number will decrease the number of bases used in calculating the phrap alignment. |
| trim_score | Minimum score for identifying degenerate sequence at beginning & end of read. | |
| forcelevel | Relaxes stringency to varying degree during final contig merge pass. Allowed values are integers from 0 (most stringent) to 10 (least stringent), inclusive. | Increasing this will bring more sequences into the contig. |
| penalty | Mismatch (substitution) penalty for SWAT comparisons. | Increasing this will make alignment comparisons more stringent (i.e., fewer mismatches allowed). |
| gap_init | Gap initiation penalty for SWAT comparisons. | Increasing penalty will decrease the number of gaps allowed. |
| gap_ext | Gap extension penalty for SWAT comparisons. | Increasing penalty will decrease the size/length of gaps allowed. |
| ins_gap_ext | Insertion gap extension penalty for SWAT comparisons (insertion in subject relative to query). | |
| del_gap_ext | Deletion gap extension penalty for SWAT comparisons (deletion in subject relative to query). | |
| maxgap | Maximum permitted size of an unmatched region in merging contigs, during first (most stringent) merging pass. | Increased size will allow large unmatching regions and may allow alternate forms to be aligned more readily. |
| flags | you can set all remaining/other flags (parameters) for phrap by entering them here | See PHRAP documentation for more details on parameters for PHRAP. |
| [ace2gde] | ||
| executable | full path to program executable | |
| [craw] | ||
| executable | full path to program executable | |
| sig | ||
| window_size | Window used for CRAW calculation | |
| ignore_first | Number of bases at beginning of sequence that are ignored | |
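Several parameters in the table have documented limits: program is either cross_match or RepeatMasker, word_size has a maximum of 9, reverse_comparisons and old_ace take 0 or 1, and forcelevel is an integer from 0 to 10. The Python sketch below checks a configuration against those limits before a run. It assumes the file uses an INI-style layout (the bracketed section names suggest this, but the exact syntax is not confirmed here), and every value in the sample text is invented for illustration rather than taken from a stackPACK default.

```python
import configparser

# Hypothetical configuration fragment; the values are illustrative only.
SAMPLE = """
[stack_Mask]
program = cross_match
num_cpus = 2

[d2_cluster]
word_size = 6
reverse_comparisons = 1

[phrap]
old_ace = 1
forcelevel = 0
"""


def int_in_range(low, high):
    """Build a predicate: value is a plain integer with low <= value <= high."""
    return lambda v: v.isdigit() and low <= int(v) <= high


def check_config(text):
    """Return a list of problems found against the limits documented above."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    problems = []

    def check(section, key, ok, message):
        # Only parameters that are actually present are checked.
        if cfg.has_option(section, key) and not ok(cfg.get(section, key)):
            problems.append(f"[{section}] {key}: {message}")

    check("stack_Mask", "program",
          lambda v: v in ("cross_match", "RepeatMasker"),
          "must be cross_match or RepeatMasker")
    check("d2_cluster", "word_size", int_in_range(1, 9),
          "maximum word size is 9")
    check("d2_cluster", "reverse_comparisons", lambda v: v in ("0", "1"),
          "must be 0 or 1")
    check("phrap", "old_ace", lambda v: v in ("0", "1"), "must be 0 or 1")
    check("phrap", "forcelevel", int_in_range(0, 10),
          "must be an integer from 0 to 10")
    return problems


print(check_config(SAMPLE))
```

Reporting every problem in one list, rather than stopping at the first, lets a site administrator correct the whole file in one pass.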