Sentieon | Application Tutorial: The CCDG Functional Equivalent Pipeline with Sentieon®
# I. Introduction

This tutorial describes how to use the Sentieon® tools to implement the "functional equivalent pipeline", also known as the CCDG pipeline standard. The standard is described at https://github.com/CCDG/Pipeline-Standardization/blob/master/PipelineStandard.md and was published at https://www.nature.com/articles/s41467-018-06159-4. To comply with the version requirements of the pipeline, use Sentieon® release 201704 or later.

Starting with release 201911, Sentieon® BWA is updated to version 0.7.17. BWA 0.7.17 writes MC (mate CIGAR) tags in its output, and `samblaster --addMateTags` does not remove an existing MC tag before adding its own, so the BAM file ends up with duplicate MC tags.

---

# II. Command-Line Equivalence

## 1. Alignment

The alignment stage of the CCDG functional equivalent pipeline uses BWA-MEM 0.7.15:

```
FASTA=Homo_sapiens_assembly38.fasta
NT=$(nproc)
bwa mem -R "@RG\tID:$RGID\tSM:$SM\tPL:$PL" -t $NT -K 100000000 -Y $FASTA $FASTQ1 $FASTQ2 | \
  samblaster --addMateTags -a | \
  samtools view -Sbhu - | \
  sambamba sort -n -t $NT --tmpdir tmp -o sorted.bam /dev/stdin
```

To run the equivalent command with Sentieon®:

```
sentieon bwa mem -R "@RG\tID:$RGID\tSM:$SM\tPL:$PL" -t $NT -K 100000000 -Y $FASTA $FASTQ1 $FASTQ2 | \
  samblaster --addMateTags -a | \
  sentieon util sort --sam2bam -i - -r $FASTA -t $NT -o sorted.bam
```

To run the equivalent command with Sentieon® release 201911 or later, strip the MC tag that BWA 0.7.17 writes before samblaster adds its own:

```
sentieon bwa mem -R "@RG\tID:$RGID\tSM:$SM\tPL:$PL" -t $NT -K 100000000 -Y $FASTA $FASTQ1 $FASTQ2 | \
  sed 's|MC:Z:[^\t]*\t||' | \
  samblaster --addMateTags -a | \
  sentieon util sort --sam2bam -i - -r $FASTA -t $NT -o sorted.bam
```

## 2. Duplicate Marking

The duplicate-marking stage of the CCDG functional equivalent pipeline uses Picard 2.4 or later:

```
java -Xmx48g -jar $picard MarkDuplicates I=sorted.bam METRICS_FILE=markdup_metrics.txt \
  ASSUME_SORT_ORDER=queryname QUIET=true COMPRESSION_LEVEL=0 O=/dev/stdout | \
  sambamba sort -t $NT --tmpdir tmp -o markduped.bam /dev/stdin
```

To run the equivalent command with Sentieon®:

```
# Pass 1: collect duplicate score information
sentieon driver -t $NT -r $FASTA -i sorted.bam --algo LocusCollector --fun score_info tmp_score.gz && \
# Pass 2: write the names of duplicate reads and the dedup metrics
sentieon driver -t $NT -r $FASTA -i sorted.bam --algo Dedup --score_info tmp_score.gz \
  --output_dup_read_name --metrics dedup_metrics.txt tmp_dup_qname.txt.gz && \
# Pass 3: mark duplicates by read name
sentieon driver -t $NT -r $FASTA -i sorted.bam --algo Dedup --dup_read_name tmp_dup_qname.txt.gz markduped.bam
```

The Sentieon® commands use a special three-pass deduplication process to mark reads with unique or multiple alignment positions.

---

## 3. Base Quality Score Recalibration with a Binning Scheme

The BQSR stage of the CCDG functional equivalent pipeline uses GATK3 or GATK4:

```
INTERVAL_ARG="-L chr1 -L chr2 -L chr3 -L chr4 -L chr5 -L chr6 -L chr7 -L chr8 -L chr9 -L chr10 \
-L chr11 -L chr12 -L chr13 -L chr14 -L chr15 -L chr16 -L chr17 -L chr18 -L chr19 -L chr20 -L chr21 -L chr22"
DOWNSAMPLE_ARG="--downsample_to_fraction .1"
KNOWN_MILLS_INDELS="Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"
KNOWN_1000G_INDELS="Homo_sapiens_assembly38.known_indels.vcf.gz"
KNOWN_DBSNP="Homo_sapiens_assembly38.dbsnp138.vcf"
java -Xmx48g -jar $GATK_37 -T BaseRecalibrator -R $FASTA -I markduped.bam $DOWNSAMPLE_ARG $INTERVAL_ARG \
  -knownSites $KNOWN_MILLS_INDELS -knownSites $KNOWN_1000G_INDELS -knownSites $KNOWN_DBSNP \
  -o recal_data_37.table && \
java -Xmx15g -jar $GATK_37 -T PrintReads -R $FASTA -I markduped.bam --BQSR recal_data_37.table -o recaled_37.bam \
  --globalQScorePrior -1.0 --preserve_qscores_less_than 6 --static_quantized_quals 10 \
  --static_quantized_quals 20 --static_quantized_quals 30 --disable_indel_quals && \
samtools view -C -T $FASTA -@ 2 -o recaled_37.cram recaled_37.bam && \
samtools index -c recaled_37.cram recaled_37.cram.index
```

To run the equivalent command with Sentieon®:

```
# INTERVAL_ARG already contains the --interval option, so it is passed to the driver as-is
INTERVAL_ARG="--interval chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,\
chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22"
sentieon driver -t $NT -r $FASTA $INTERVAL_ARG -i markduped.bam --algo QualCal -k $KNOWN_MILLS_INDELS \
  -k $KNOWN_1000G_INDELS -k $KNOWN_DBSNP recal_data_Sentieon.table && \
sentieon driver -t $NT -r $FASTA -i markduped.bam \
  --read_filter QualCalFilter,table=recal_data_Sentieon.table,prior=-1.0,indel=false,levels=10/20/30,min_qual=6 \
  --algo ReadWriter recaled_RW.cram
```

Keep in mind that Sentieon® does not downsample, because the Sentieon® tools are efficient enough to process the full sequencing depth. Also, this step differs from the usual best-practice pipeline because it implements the special quality-score binning that the CCDG functional equivalent pipeline requires.

---

# III. Pipeline Script Using Sentieon®

The following script runs the CCDG functional equivalent pipeline on input FASTQ files using Sentieon®:

```
#!/bin/bash
# *********************************************************************************
# Script to perform DNA seq variant calling using Sentieon following
# the functional equivalent pipeline described in
# https://github.com/CCDG/Pipeline-Standardization/blob/master/PipelineStandard.md
# *********************************************************************************

# Update with the fullpath location of your sample fastq
SM="sample" #sample name
RGID="rg_$SM" #read group ID
PL="ILLUMINA" #or other sequencing platform
FASTQ_1="${SM}_r1.fastq.gz"
FASTQ_2="${SM}_r2.fastq.gz" #if using 2 FASTQ inputs

# Update with the location of the reference data files
FASTA_DIR="/home/regression/references/hg38bundle"
FASTA="$FASTA_DIR/Homo_sapiens_assembly38.fasta"
KNOWN_DBSNP="$FASTA_DIR/Homo_sapiens_assembly38.dbsnp138.vcf.gz"
KNOWN_INDELS="$FASTA_DIR/Homo_sapiens_assembly38.known_indels.vcf.gz"
KNOWN_MILLS="$FASTA_DIR/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"

# Update with the location of the Sentieon software package and license file
# Replace |release_version| with the release you have installed
SENTIEON_INSTALL_DIR="/home/release/sentieon-genomics-|release_version|"
export SENTIEON_LICENSE=/home/Licenses/Sentieon.lic #or using licsrvr: c1n11.sentieon.com:5443

# Other settings
NT=$(nproc) #number of threads to use in computation
SAMBLASTER=/home/release/other_tools/samblaster-0.1.23/samblaster
START_DIR=$PWD

# ******************************************
# 0. Setup
# ******************************************
workdir="$START_DIR/${SM}" #Determines where the output files will be stored
mkdir -p $workdir
logfile=$workdir/run.log
exec >>$logfile 2>&1
cd $workdir

# ******************************************
# main pipeline with Sentieon
# ******************************************
# 1. Mapping BWA-MEM 0.7.15 util sort
SENTIEON_VERSION=$($SENTIEON_INSTALL_DIR/bin/sentieon driver --version)
if (( $(echo "${SENTIEON_VERSION##*-} < 201911" |bc -l) )); then
  $SENTIEON_INSTALL_DIR/bin/sentieon bwa mem -R "@RG\tID:$RGID\tSM:$SM\tPL:$PL" -t $NT \
    -K 100000000 -Y $FASTA $FASTQ_1 $FASTQ_2 | \
    $SAMBLASTER --addMateTags -a | \
    $SENTIEON_INSTALL_DIR/bin/sentieon util sort -r $FASTA -o sorted.bam -t $NT --sam2bam -i -
else
  #Sentieon 201911 and higher use BWA 0.7.17, which already produces MC tags in the output
  $SENTIEON_INSTALL_DIR/bin/sentieon bwa mem -R "@RG\tID:$RGID\tSM:$SM\tPL:$PL" -t $NT \
    -K 100000000 -Y $FASTA $FASTQ_1 $FASTQ_2 | \
    $SENTIEON_INSTALL_DIR/bin/sentieon util sort -r $FASTA -o sorted.bam -t $NT --sam2bam -i -
fi

# 2. Mark Duplicates with Sentieon
$SENTIEON_INSTALL_DIR/bin/sentieon driver -t $NT -i sorted.bam --algo LocusCollector --fun score_info score.txt
$SENTIEON_INSTALL_DIR/bin/sentieon driver -t $NT -i sorted.bam --algo Dedup --score_info score.txt \
  --metrics mark_dup_metrics.txt --output_dup_read_name tmp_dup_qname.txt
$SENTIEON_INSTALL_DIR/bin/sentieon driver -t $NT -i sorted.bam --algo Dedup \
  --dup_read_name tmp_dup_qname.txt markduped.bam

# 3. Base Quality Score Recalibration with Sentieon
interval_arg="--interval chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,\
chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22"
$SENTIEON_INSTALL_DIR/bin/sentieon driver $interval_arg -r $FASTA -t $NT -i markduped.bam \
  --algo QualCal -k $KNOWN_MILLS -k $KNOWN_INDELS -k $KNOWN_DBSNP recal_data.table
$SENTIEON_INSTALL_DIR/bin/sentieon driver -r $FASTA -t $NT -i markduped.bam \
  --read_filter QualCalFilter,table=recal_data.table,prior=-1.0,indel=false,levels=10/20/30,min_qual=6 \
  --algo ReadWriter recaled_RW.cram

# 4. Haplotyper with Sentieon
$SENTIEON_INSTALL_DIR/bin/sentieon driver -r $FASTA -t $NT -i recaled_RW.cram --algo H
```

[**For more Sentieon application tutorials, click here**](https://doc.insvast.com/doc/10/)
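The duplicate MC tag issue noted in the introduction can be checked on a handful of SAM records before committing to a full run. The sketch below is only an illustration under stated assumptions: `strip_mc_tag` and `count_duplicate_mc` are hypothetical helper names, not part of Sentieon®, BWA, or samblaster, and the awk logic assumes tab-separated SAM text whose optional fields start at column 12.

```shell
#!/bin/bash
# Hypothetical helper (not a Sentieon, BWA, or samblaster command):
# remove every MC:Z: optional field from SAM records read on stdin.
strip_mc_tag() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }            # header lines pass through unchanged
         {
             out = $1
             for (i = 2; i <= NF; i++)
                 # columns 1-11 are mandatory; only optional fields are filtered
                 if (i < 12 || $i !~ /^MC:Z:/) out = out OFS $i
             print out
         }'
}

# Hypothetical helper: print how many alignment records on stdin
# carry more than one MC:Z: tag.
count_duplicate_mc() {
    awk 'BEGIN { FS = "\t" }
         !/^@/ {
             n = 0
             for (i = 12; i <= NF; i++) if ($i ~ /^MC:Z:/) n++
             if (n > 1) dup++
         }
         END { print dup + 0 }'
}

# Made-up record with two MC tags, mimicking the duplicate-tag situation.
record=$(printf 'r1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNM:i:0\tMC:Z:5M\tAS:i:5\tMC:Z:5M')
printf '%s\n' "$record" | count_duplicate_mc
printf '%s\n' "$record" | strip_mc_tag | count_duplicate_mc
```

On real data, something like `samtools view -h sorted.bam | head -n 10000 | count_duplicate_mc` gives a quick spot check. `strip_mc_tag` plays the same role as the `sed` expression in the alignment command; because it filters whole fields, it also covers an MC tag that happens to be the last field on a line, whereas the `sed` pattern expects a tab after the tag.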
chsnp
November 26, 2025, 17:30