
Linux command line: downloading NCBI data with Entrez Direct

Date: 2017-12-22 21:45:43


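I. Installing the software

1. Download the software:

curl ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip -O    (for how to download files with curl, see http://www.cnblogs.com/duhuo/p/5695256.html)

2. Unzip it:

unzip edirect.zip

3. Add it to the PATH environment variable:

echo 'export PATH=/home/lmt/desktop/edirect/:$PATH' >> ~/.zshrc    (pick the config file for your own shell; it may be ~/.bashrc)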
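Running the echo line more than once leaves duplicate entries in the config file. The following is a minimal sketch of a guard against that, assuming a POSIX shell; the function name `add_edirect_to_path` and its arguments are illustrative and not part of Entrez Direct.

```shell
#!/bin/sh
# Hypothetical helper (not part of EDirect): append an export line for
# the given directory to a shell config file, but only once.
add_edirect_to_path() {
    dir=$1        # directory holding the unpacked edirect scripts
    rcfile=$2     # e.g. ~/.zshrc or ~/.bashrc
    line="export PATH=$dir:\$PATH"
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Example call, using the directory from the text above:
# add_edirect_to_path /home/lmt/desktop/edirect "$HOME/.zshrc"
```

Open a new terminal (or source the config file) afterwards so the updated PATH takes effect.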

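II. What Entrez Direct does

1. esearch   searches a database using the given indexed fields

2. efilter   filters the results of a previous search

3. efetch   downloads the requested data in the specified format

...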
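These commands are meant to be chained with pipes, each one working on what the previous one found. The following is a minimal sketch of such a chain, assuming the EDirect tools are already on PATH; the function name `fetch_filtered_fasta` and the example filter term are illustrative and do not come from the original post.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of EDirect): search the nucleotide
# database, narrow the hits with a second query, and print FASTA.
fetch_filtered_fasta() {
    query=$1     # search term, e.g. an isolate name
    filter=$2    # additional Entrez query applied to the hits
    esearch -db nucleotide -query "$query" |
        efilter -query "$filter" |
        efetch -format fasta
}

# Example call (needs network access; the filter term is an assumption):
# fetch_filtered_fasta 'CHN-JS-2014' 'complete genome[Title]' > filtered.fasta
```

Wrapping the pipeline in a function keeps the search term and the filter as the only moving parts, which makes it easier to reuse for other isolates.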

III. Usage examples

Downloading nucleotide or protein sequences (FASTA format)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta > 11.fasta    # downloads the complete genome nucleotide sequence

>KP757892.1 Porcine deltacoronavirus isolate CHN-JS-2014, complete genome
ACATGGGGACTAAAGATAAAAATTATAGCATTAGTCTATAATTTTATCTCCCTAGCTTCGCTAGTTCTCT
ACCGACACCAATCCAGGTGCGTCTGCCACCAAGTTGGCTACCCTTTCTAGGGGCGCTTTCGCGCTTGCTC
ACCATTAGATTACCTGGAAACCAGCCATTCAGGTTGGAGTTTCCCCAGGCTCTTTTGTGTGGGCATTAGC
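After a download it can help to check that the file is not empty and holds the expected number of records. The following is a minimal sketch, assuming only standard grep; the function name `count_fasta_records` is illustrative. Every FASTA record starts with a `>` header line, so counting those lines counts the records.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, which is not treated as an error here
    grep -c '^>' "$1" || true
}

# Example call on the file written above:
# count_fasta_records 11.fasta
```

A count of 0 usually means the query matched nothing or the request failed, so it is worth checking before passing the file to other tools.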

 

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gene_fasta > 22.fasta    # downloads the nucleotide sequence of each gene (S, E, M, and so on) as separate records

>lcl|KP757892.1_gene_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [location=22797..23048]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_gene_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [location=23041..23694]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
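Because each header carries a `[gene=...]` tag, a single gene can be pulled out of the multi-record file. The following is a minimal sketch using awk; the function name `extract_gene` is illustrative, and it assumes the header layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print only the records whose header carries
# the tag [gene=NAME], e.g. [gene=E].
extract_gene() {
    gene=$1
    file=$2
    awk -v tag="[gene=$gene]" '
        /^>/ { keep = (index($0, tag) > 0) }   # header: decide for this record
        keep                                   # print lines of kept records
    ' "$file"
}

# Example call on the file written above:
# extract_gene M 22.fasta > M.fasta
```

Using `index` for a literal substring match avoids having to escape the square brackets, which would otherwise be read as a regular-expression character class.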

 

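esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_aa > 33.fasta    # downloads the protein sequence of each gene as separate records (the search runs against the nucleotide database; trying the protein database instead gave an error)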

>lcl|KP757892.1_prot_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEF
VKVHQFPRNTHYV
>lcl|KP757892.1_prot_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
MSDAEEWQIIVFIAIIWALGVILQGGYATRNRVIYVIKLILLWLLQPFTLVVTIWTAVDRSSKKDAVFIV
SIIFAVLTFISWAKYWYDSIRLLMKTRSAWALSPESRLLAGIMDPMGTWRCIPIDHMAPILTPVVKHGKL
KLHGQELANGISVRNPPQDMVIVSPSDTFHYTFKKPVESNNDPEFAVLIYQGDRASNAGLHTITTSKAGD
ARLYKYM
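A quick sanity check on the protein file is to list each protein ID with its length in residues. The following is a minimal sketch using awk; the function name `protein_lengths` is illustrative, and it assumes headers carry a `[protein_id=...]` tag as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print "protein_id<TAB>length" for each record
# of a fasta_cds_aa file, summing residues across wrapped lines.
protein_lengths() {
    awk '
        function emit() { if (id != "") printf "%s\t%d\n", id, len }
        /^>/ {
            emit()                 # finish the previous record, if any
            id = $0; len = 0
            # pull the value out of the [protein_id=...] tag if present
            if (match($0, /\[protein_id=[^]]*\]/))
                id = substr($0, RSTART + 12, RLENGTH - 13)
            next
        }
        { len += length($0) }      # sequence line: add its residues
        END { emit() }
    ' "$1"
}

# Example call on the file written above:
# protein_lengths 33.fasta
```

If a header has no `[protein_id=...]` tag, the sketch falls back to printing the whole header line as the identifier.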

 

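esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_na > 44.fasta    # downloads the nucleotide sequence of each gene (S, E, M, and so on) as separate records; the sequences match 22.fasta, but the headers carry more annotation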

>lcl|KP757892.1_cds_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_cds_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC

Downloading sequences (non-FASTA format)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gb > 55.fasta    # the downloaded record has the same format as the one shown on the NCBI web page

LOCUS       KP757892               25420 bp ss-RNA     linear   VRL 17-DEC-2015
DEFINITION  Porcine deltacoronavirus isolate CHN-JS-2014, complete genome.
ACCESSION   KP757892
VERSION     KP757892.1
KEYWORDS    .
SOURCE      Porcine deltacoronavirus
  ORGANISM  Porcine deltacoronavirus
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA
            stage; Nidovirales; Coronaviridae; Coronavirinae.
REFERENCE   1  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q., Chen,H. and Xiao,S.
  TITLE     Porcine Deltacoronavirus in Mainland China
  JOURNAL   Emerging Infect. Dis. 21 (12), 2254-2255 (2015)
   PUBMED   26584185
REFERENCE   2  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q. and Xiao,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-FEB-2015) State Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, 1 Shizishan Street,
            Wuhan, Hubei 430070, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
...
     gene            22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
     CDS             22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
                     /codon_start=1
                     /product="envelope protein"
                     /protein_id="AKC54443.1"
                     /translation="MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKC
                     YLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV"
     gene            23041..23694
                     /gene="M"
...
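The GenBank flat file is plain text with a keyword at the start of each header line, so basic facts can be pulled out with standard tools. The following is a minimal sketch using awk and sed; the function names `genbank_summary` and `genbank_genes` are illustrative, and the sketch assumes the record layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print accession-with-version and sequence length
# for each record in a GenBank flat file.
genbank_summary() {
    awk '
        $1 == "LOCUS"   { len = $3 }                  # third field is the length
        $1 == "VERSION" { print $2 "\t" len " bp" }
    ' "$1"
}

# Hypothetical helper: list the distinct gene names in the feature table.
genbank_genes() {
    # /gene="X" appears on both gene and CDS features, so deduplicate
    sed -n 's/^ *\/gene="\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Example calls on the file written above:
# genbank_summary 55.fasta
# genbank_genes 55.fasta
```

For the record shown above, `genbank_summary` would report KP757892.1 with a length of 25420 bp, taken from the VERSION and LOCUS lines.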

 


Original article: http://www.cnblogs.com/lmt921108/p/8087474.html
