Code

The code can be cloned from GitHub and compiled on macOS, Ubuntu, and other Unix-based systems.

Go to GitHub: https://github.com/bailey-lab/SeekDeep

Full Pipeline Tutorials

Full pipeline tutorials that use all of SeekDeep are available.

Illumina Paired End
Illumina Paired End Multiplexed

Individual Programs

Each part of SeekDeep can be used by itself.

extractor
qluster
processClusters

SeekDeep

Targeted Amplicon Analysis

The SeekDeep pipeline is intended for targeted amplicon sequencing data, where the goal is haplotype frequency estimation for multiple samples from a population. The analysis is broken down into three steps, plus a viewer for the final results.

extractor - Demultiplexing and read filtering
qluster - Clustering and haplotype prediction/frequency estimation per demultiplexed data subset
processClusters - Optional comparison of multiple PCR replicates and comparison of haplotypes across samples
popClusteringViewer - An HTML server to view final results; example: http://seekdeep.brown.edu/SeekDeepExample

Hathaway, Nicholas J, Christian M Parobek, Jonathan J Juliano, and Jeffrey A Bailey. 2017. “SeekDeep: Single-Base Resolution de Novo Clustering for Amplicon Deep Sequencing.” Nucleic Acids Res. 46 (4): e21.

extractor

extractor can demultiplex sequence data from 454, Ion Torrent, and Illumina on sample MIDs and/or on primer pairs. It can also apply several filters, including read length and quality scores.

View Usage »

qluster

qluster takes a single-sample sequence file and clusters the reads based on common errors seen in sequencing and PCR.

View Usage »

processClusters

processClusters optionally compares PCR replicates, performs some final data processing, and can do a simple population comparison across samples.

View Usage »

Plasmodium Population

The SeekDeep pipeline grew out of work by the Bailey Lab and collaborators on the Plasmodium parasite (malaria) and is therefore shaped by that work.

Multiplexed Patient Data

Full Tutorial Found Here

The SeekDeep pipeline was developed for targeted amplicon sequencing of patient/individual samples with dual PCR replicates, and it is therefore best used with this setup, though it can also be used with other setups (multiple targets, single-replicate data, etc.). Below is a schematic of a simplified example of this typical setup.

Samples are taken from several patients/individuals, and each sample has two PCR replicates done with different barcodes. The barcoded data is pooled and sequenced, which normally requires several sequencing runs (so the same barcodes can be reused in different pools). Each sequencing run is demultiplexed by SeekDeep extractor to recover the replicate organization.
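The barcode bookkeeping described above can be illustrated with a minimal Python sketch. This is not how SeekDeep extractor is implemented; the barcode sequences, run names, sample names, and the fixed barcode length are all hypothetical. It only shows why the sequencing run has to be part of the lookup key when the same barcodes are reused across pools.

```python
from collections import defaultdict

# Hypothetical layout: (sequencing run, barcode) -> (sample, replicate).
# The same barcode may appear in different runs, so the run is part of the key.
BARCODE_LAYOUT = {
    ("run1", "ACGT"): ("patientA", "rep1"),
    ("run1", "TGCA"): ("patientA", "rep2"),
    ("run2", "ACGT"): ("patientB", "rep1"),
    ("run2", "TGCA"): ("patientB", "rep2"),
}
BARCODE_LENGTH = 4  # assumed fixed barcode length for this toy example


def demultiplex(run, reads):
    """Group reads from one run by (sample, replicate), trimming the barcode."""
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        target = BARCODE_LAYOUT.get((run, barcode))
        if target is None:
            continue  # unknown barcode: the read is discarded
        bins[target].append(insert)
    return dict(bins)


run1 = demultiplex("run1", ["ACGTGGGAAA", "TGCAGGGAAA", "NNNNGGGAAA"])
run2 = demultiplex("run2", ["ACGTGGGTTT"])
```

Here the barcode `ACGT` resolves to a different patient in each run, which is the replicate organization the demultiplexing step has to recover.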

Each demultiplexed file is then clustered by SeekDeep qluster to estimate haplotype diversity and frequency in each replicate.
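To give a feel for what "clustering to estimate haplotype frequency" means, here is a toy Python sketch. It is a deliberately simplified stand-in and not qluster's algorithm: it collapses identical reads, folds rare reads that differ by one base into a more abundant sequence, and reports each cluster's share of the reads. The function names, the `min_abundant` threshold, and the example reads are assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_cluster(reads, min_abundant=2):
    """Collapse identical reads, then fold rare one-mismatch variants into
    the most abundant close sequence. Returns haplotype -> read count."""
    counts = Counter(reads)
    # Visit sequences from most to least abundant (ties broken alphabetically).
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = {}
    for seq in ordered:
        parent = None
        if counts[seq] < min_abundant:
            # dicts keep insertion order, so abundant clusters are checked first
            for rep in clusters:
                if len(rep) == len(seq) and hamming(rep, seq) == 1:
                    parent = rep
                    break
        if parent is None:
            clusters[seq] = counts[seq]
        else:
            clusters[parent] += counts[seq]
    return clusters


def frequencies(clusters):
    """Convert cluster read counts to relative frequencies."""
    total = sum(clusters.values())
    return {hap: n / total for hap, n in clusters.items()}


reads = ["ACGT"] * 6 + ["ACGA"] * 3 + ["ACGC"]  # the single ACGC read is treated as an error
clusters = toy_cluster(reads)
freqs = frequencies(clusters)
```

In this example the lone `ACGC` read is absorbed into `ACGT`, leaving two haplotypes at 70% and 30% of the reads.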

Output data from each qluster run is then organized into an output directory (SeekDeep makeSampleDirectories helps with this step). This results directory is then analyzed by SeekDeep processClusters, which compares replicates, keeps only the haplotypes found in both replicates, and does some final data filtering. The final data from the replicate comparisons is then analyzed together as a population analysis, giving information on the haplotypes that appear across the whole population.
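The replicate comparison and population step can be sketched in Python as follows. This is an illustration under stated assumptions, not processClusters itself: the haplotype names and frequencies are made up, and averaging the two replicate frequencies and then renormalizing is one plausible way to combine replicates, not necessarily the one SeekDeep uses.

```python
def compare_replicates(rep1, rep2):
    """Keep haplotypes present in both replicates and average their
    frequencies, renormalizing so the kept haplotypes sum to 1."""
    shared = sorted(set(rep1) & set(rep2))
    averaged = {hap: (rep1[hap] + rep2[hap]) / 2 for hap in shared}
    total = sum(averaged.values())
    if total == 0:
        return {}
    return {hap: freq / total for hap, freq in averaged.items()}


def population_summary(samples):
    """Count in how many samples each haplotype appears."""
    seen = {}
    for haplotypes in samples.values():
        for hap in haplotypes:
            seen[hap] = seen.get(hap, 0) + 1
    return seen


# Hypothetical per-replicate haplotype frequencies for two patients.
samples = {
    "patientA": compare_replicates(
        {"H1": 0.7, "H2": 0.3},
        {"H1": 0.6, "H2": 0.3, "H3": 0.1},
    ),
    "patientB": compare_replicates(
        {"H1": 1.0},
        {"H1": 0.9, "H4": 0.1},
    ),
}
summary = population_summary(samples)
```

Haplotypes `H3` and `H4` each appear in only one replicate, so they are dropped before the population summary is built.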

Contact

Any questions or concerns should be directed to Nick Hathaway
nicholas.hathaway@umassmed.edu
nickjhathaway@gmail.com
nickjhathaway@github
© 2015- Bailey Lab
