    ---
    nocite: |
      @Hathaway2017-ib
    ---
    
    
    <!-- Carousel
    ================================================== -->
    <div id="myCarousel" class="carousel slide" data-bs-ride="carousel">
    <!-- Carousel indicators -->
    <ol class="carousel-indicators">
    <li data-bs-target="#myCarousel" data-bs-slide-to="0" class="active"></li>
    <li data-bs-target="#myCarousel" data-bs-slide-to="1"></li>
    <li data-bs-target="#myCarousel" data-bs-slide-to="2"></li>
    </ol>
    
    <!-- Wrapper for carousel items -->
    <div class="carousel-inner">
    <div class="carousel-item active">
    <img src="images/background_images/background_images.001.jpeg" class="d-block w-100" alt="Slide 1">
    <div class="carousel-caption d-none d-md-block">
    <h1>Code</h1>
    <p>Code can be cloned from GitHub and compiled on Mac, Ubuntu, and other Unix-based systems</p>
    <p><a class="btn btn-lg btn-primary" href="https://github.com/bailey-lab/SeekDeep" role="button">Go to GitHub</a></p>
    </div>
    </div>
    <div class="carousel-item">
    <img src="images/background_images/background_images.002.jpeg" class="d-block w-100" alt="Slide 2">
    <div class="carousel-caption d-none d-md-block">
    <h1>Full Pipeline Tutorials</h1>
    <p>Full pipeline tutorials utilizing all of SeekDeep are available</p>
    <p><a class="btn btn-lg btn-primary" role="button" href="tutorial_PairedEnd_noMIDs.html">Illumina Paired End</a>
    <a class="btn btn-lg btn-primary" role="button" href="tutorial_PairedEnd_withMIDs.html">Illumina Paired End Multiplexed</a></p>
    </div>
    </div>
    <div class="carousel-item">
    <img src="images/background_images/background_images.003.jpeg" class="d-block w-100" alt="Slide 3">
    <div class="carousel-caption d-none d-md-block">
    <h1>Individual Programs</h1>
    <p>Each part of SeekDeep can be used by itself</p>
    <p><a class="btn btn-lg btn-primary" href="extractor_usage.html" role="button">extractor</a> 
    <a class="btn btn-lg btn-primary" href="qluster_usage.html" role="button">qluster</a> 
    <a class="btn btn-lg btn-primary" href="processClusters_usage.html" role="button">processClusters</a></p>
    </div>
    </div>
    </div>
    
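    <!-- Carousel controls -->
    <a class="carousel-control-prev" href="#myCarousel" data-bs-slide="prev">
    <span class="carousel-control-prev-icon"></span>
    </a>
    <a class="carousel-control-next" href="#myCarousel" data-bs-slide="next">
    <span class="carousel-control-next-icon"></span>
    </a>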
    </div>
    
    
    
    ### SeekDeep  
    ### <span class="text-muted">Targeted Amplicon Analysis</span>  
    
    
    ::: {.grid}
    
    ::: {.g-col-7}
    The SeekDeep pipeline is intended for targeted amplicon sequencing data, estimating haplotype frequencies for multiple samples from a population.  It is broken down into four steps.  
    
    1.  **extractor** - Demultiplexing and read filtering  
    1.  **qluster** - Clustering and haplotype prediction/frequency estimation for each demultiplexed data subset  
    1.  **processClusters** - Optional comparison of multiple PCR replicates, and comparison of haplotypes across samples  
    1.  **popClusteringViewer** - An HTML server for viewing the final results (example: <http://seekdeep.brown.edu/SeekDeepExample>)  
    :::
    
    ::: {.g-col-5}
    ![](<images/full pipeline.jpg>)
    :::
    
    :::
    
    ::: {#refs}
    
    :::
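Step 1 of the pipeline above, demultiplexing and read filtering, can be illustrated with a toy Python sketch. It is not SeekDeep's extractor: the MID and primer sequences (`MIDS`, `PRIMERS`), the rule that both must match exactly at the 5' end, and the `min_len` and `min_mean_qual` thresholds are all hypothetical simplifications chosen only to show the idea of binning reads by sample barcode and target before filtering them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical barcode (MID) and primer tables; real runs supply their own.
MIDS = {"MID01": "ACGAGTGCGT", "MID02": "ACGCTCGACA"}
PRIMERS = {"targetA": "GGATCCTTAG", "targetB": "TTGCAAGCTC"}


def demultiplex(reads, mids, primers, min_len=50, min_mean_qual=25):
    """Bin (name, sequence, qualities) reads by (MID, target) and filter them.

    Simplifications: MIDs and primers must match exactly at the 5' end, and
    quality is judged by the mean score of the read after the MID.
    """
    bins = defaultdict(list)
    failed = []
    for name, seq, quals in reads:
        # Identify the sample barcode at the start of the read.
        mid = next((m for m, bc in mids.items() if seq.startswith(bc)), None)
        if mid is None:
            failed.append((name, "no MID"))
            continue
        # Strip the MID, then identify the target by its forward primer.
        seq, quals = seq[len(mids[mid]):], quals[len(mids[mid]):]
        target = next((t for t, p in primers.items() if seq.startswith(p)), None)
        if target is None:
            failed.append((name, "no primer"))
            continue
        # Apply the read length and quality filters.
        if len(seq) < min_len:
            failed.append((name, "too short"))
        elif mean(quals) < min_mean_qual:
            failed.append((name, "low quality"))
        else:
            bins[(mid, target)].append((name, seq))
    return dict(bins), failed


reads = [
    ("r1", MIDS["MID01"] + PRIMERS["targetA"] + "A" * 40, [30] * 60),
    ("r2", MIDS["MID02"] + PRIMERS["targetB"] + "C" * 40, [10] * 60),
    ("r3", "TTTTTTTTTT" + PRIMERS["targetA"] + "G" * 40, [30] * 60),
]
bins, failed = demultiplex(reads, MIDS, PRIMERS)
print(list(bins))  # [('MID01', 'targetA')]
print(failed)      # [('r2', 'low quality'), ('r3', 'no MID')]
```

Each resulting bin corresponds to one demultiplexed data subset, which is the unit that the clustering step then works on.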
    
    
    ------
    
    ::: {.grid}
    
    ::: {.g-col-4 .text-center}
    ![](images/extractor.gif){.rounded-circle}  
    
    ### extractor  
    extractor can demultiplex 454, Ion Torrent, and Illumina sequence data by sample MIDs and/or primer pairs. It can also apply several read filters, including read length and quality score thresholds.   
    <p><a class="btn btn-default" href="extractor_usage.html" role="button">View Usage »</a></p> 
    
    :::
    
    ::: {.g-col-4 .text-center}
    ![](images/clustering.gif){.rounded-circle}  
    
    ### qluster  
    qluster takes the sequence file for a single sample and clusters its reads, accounting for the errors commonly introduced during sequencing and PCR.   
    <p><a class="btn btn-default" href="qluster_usage.html" role="button">View Usage »</a></p> 
    
    :::
    
    ::: {.g-col-4 .text-center}
    ![](images/processClusters.gif){.rounded-circle}
    
    ### processClusters  
    processClusters optionally compares PCR replicates, performs final data processing, and can run a simple population comparison across samples.     
    <p><a class="btn btn-default" href="processClusters_usage.html" role="button">View Usage »</a></p> 
    
    :::
    
    :::
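The general idea behind qluster, grouping reads that differ only by likely sequencing or PCR errors and reporting haplotype frequencies, can be sketched as a simple abundance-guided greedy clustering. This is a swapped-in illustrative technique, not qluster's actual algorithm: the mismatch-only distance and the `max_mismatches` and `min_ratio` parameters are hypothetical choices made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads, max_mismatches=1, min_ratio=10.0):
    """Toy abundance-guided clustering returning haplotype -> frequency.

    Unique sequences are visited from most to least abundant. A sequence is
    absorbed into an existing cluster when it has the same length, differs
    from the cluster's representative by at most max_mismatches positions,
    and the representative is at least min_ratio times more abundant (rare
    near-copies of an abundant sequence are treated as likely errors).
    Otherwise it founds a new cluster.
    """
    counts = Counter(reads)
    clusters = {}  # representative sequence -> total reads assigned to it
    for seq, n in counts.most_common():
        parent = next(
            (rep for rep in clusters
             if len(rep) == len(seq)
             and hamming(rep, seq) <= max_mismatches
             and counts[rep] >= min_ratio * n),
            None,
        )
        if parent is None:
            clusters[seq] = n
        else:
            clusters[parent] += n
    total = sum(clusters.values())
    return {rep: n / total for rep, n in clusters.items()}


reads = ["ACGTACGTAC"] * 57 + ["ACGAACCTAC"] * 40 + ["ACGTACGTAA"] * 3
print(greedy_cluster(reads))  # {'ACGTACGTAC': 0.6, 'ACGAACCTAC': 0.4}
```

Here the three reads of `ACGTACGTAA` are one mismatch away from a sequence nineteen times more abundant, so they are folded into it as probable errors, while `ACGAACCTAC` differs at two positions and is kept as a separate haplotype.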
    
    ------
    
    
    ### Plasmodium Population  
    The SeekDeep pipeline grew out of work by the <a href= "http://www.baileylab.org/">Bailey Lab</a> and <a href = "https://www.med.unc.edu/infdis/ideel">collaborators</a> studying the malaria parasite *Plasmodium*, and its design is shaped by this work.  
    
    ![](images/malaraia_design.001.jpg)
    
    
    ### Multiplexed Patient Data  
    <span class="text-muted">Full Tutorial Found <a href="multiplexTutorial_cmds.html">Here</a></span>  
    
    The SeekDeep pipeline was developed for targeted amplicon sequencing of patient/individual samples with two PCR replicates each, so it works best with this setup, although it can also be used with other setups (multiple targets, single-replicate data, etc.). The schematic below shows a simplified example of this typical setup.

    Samples are taken from several patients/individuals, and each sample undergoes two PCR replicates tagged with different barcodes. Barcoded data are pooled and sequenced, typically across several sequencing runs (which means the same barcodes can be reused in different pools). Each sequencing run is demultiplexed by <a href = "extractor_usage.html">SeekDeep extractor</a> to recover the replicate organization.

    Each demultiplexed file is then clustered by <a href = "qluster_usage.html">SeekDeep qluster</a> to estimate haplotype diversity and frequency in each replicate.

    Output data from each <strong>qluster</strong> run are then organized into a results directory (<a href = "makeSampleDirectories_usage.html">SeekDeep makeSampleDirectories</a> helps with this step). This results directory is then analyzed by <a href = "processClusters_usage.html">SeekDeep processClusters</a>, which compares replicates, keeping only haplotypes found in both, and applies some final data filtering. The data that pass the replicate comparison are then analyzed together in a population analysis that describes the haplotypes appearing across the whole population.
    
    ![](images/SeekDeep_Full_Pipeline.jpg)
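The replicate workflow described above can be sketched in a few lines of Python, assuming per-replicate haplotype frequencies are already available. Everything here is hypothetical: the `SAMPLE_SHEET` and `CLUSTER_OUTPUT` data, the choice to average and renormalize shared haplotypes, and the prevalence summary are illustrative stand-ins, not the exact rules processClusters applies. The sample sheet is keyed by (sequencing run, barcode) because the same barcodes are reused across pools.

```python
from collections import Counter, defaultdict

# Hypothetical sample sheet. Barcodes are reused across pools, so a replicate
# is identified by (sequencing run, barcode) rather than by barcode alone.
SAMPLE_SHEET = {
    ("run1", "MID01"): ("P1", "rep1"),
    ("run1", "MID02"): ("P1", "rep2"),
    ("run2", "MID01"): ("P2", "rep1"),
    ("run2", "MID02"): ("P2", "rep2"),
}

# Hypothetical per-replicate haplotype frequencies from the clustering step.
CLUSTER_OUTPUT = {
    ("run1", "MID01"): {"hapA": 0.7, "hapB": 0.3},
    ("run1", "MID02"): {"hapA": 0.8, "hapB": 0.15, "hapC": 0.05},
    ("run2", "MID01"): {"hapB": 1.0},
    ("run2", "MID02"): {"hapB": 0.9, "hapD": 0.1},
}


def group_by_sample(sheet, outputs):
    """Organize replicate results as sample -> replicate -> frequencies."""
    samples = defaultdict(dict)
    for key, freqs in outputs.items():
        sample, replicate = sheet[key]
        samples[sample][replicate] = freqs
    return dict(samples)


def compare_replicates(rep1, rep2):
    """Keep only haplotypes seen in both replicates, then average and renormalize."""
    shared = rep1.keys() & rep2.keys()
    avg = {h: (rep1[h] + rep2[h]) / 2 for h in shared}
    total = sum(avg.values())
    return {h: f / total for h, f in avg.items()} if total else {}


def population_summary(filtered):
    """Map each haplotype to (number of samples carrying it, fraction of samples)."""
    carriers = Counter(h for haps in filtered.values() for h in haps)
    n_samples = len(filtered)
    return {h: (c, c / n_samples) for h, c in sorted(carriers.items())}


samples = group_by_sample(SAMPLE_SHEET, CLUSTER_OUTPUT)
filtered = {s: compare_replicates(r["rep1"], r["rep2"]) for s, r in samples.items()}
print(population_summary(filtered))  # {'hapA': (1, 0.5), 'hapB': (2, 1.0)}
```

In this toy data, `hapC` and `hapD` each appear in only one replicate and are dropped, which mirrors the idea of keeping only haplotypes supported by both PCR replicates before the population-level comparison.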
    
    ------ 
    
    ### Contact  
    Any questions or concerns should be directed to Nick Hathaway  
    nicholas.hathaway@umassmed.edu  
    nickjhathaway@gmail.com  
    <a href= "https://github.com/nickjhathaway">nickjhathaway@github</a>  
    <a href="http://www.baileylab.org">© 2015- Bailey Lab</a>