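I. Installing the software
1. Download the software:
curl ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip -O  (-O saves the file under its remote name, edirect.zip; for more on downloading files with curl, see http://www.cnblogs.com/duhuo/p/5695256.html)
2. Unpack the archive: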
unzip edirect.zip
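3. Add EDirect to PATH:
echo 'export PATH=/home/lmt/desktop/edirect/:$PATH' >> ~/.zshrc
(Replace the path with the directory you unzipped EDirect into, and use your own shell's startup file, e.g. ~/.bashrc for bash. Then open a new terminal or run source ~/.zshrc so the change takes effect.)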
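Running the echo ... >> line more than once appends the same export line again each time. A minimal sketch of an idempotent alternative, assuming a POSIX shell; add_path_once is a hypothetical helper name, not part of EDirect:

```shell
#!/bin/sh
# Hypothetical helper (not part of EDirect): append an
# "export PATH=DIR:$PATH" line to a shell startup file only if that
# exact line is not already present.
# Usage: add_path_once DIR RCFILE
add_path_once() {
    dir=$1
    rc=$2
    # \$PATH keeps a literal $PATH in the written line
    line="export PATH=$dir:\$PATH"
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

For example, add_path_once /home/lmt/desktop/edirect/ ~/.zshrc writes the same line as the echo command above, and calling it a second time leaves the file unchanged.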
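II. What Entrez Direct does
1. esearch: searches an Entrez database for the given query, matched against its indexed fields
2. efilter: filters the results of a previous search
3. efetch: downloads the matching records in the specified format
...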
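The tools are chained with pipes: esearch prints a short XML message that refers to the result set kept on the NCBI server, and efetch reads that message from stdin and downloads the records. A minimal sketch wrapping the esearch | efetch pattern used in section III, assuming a POSIX shell; fetch_as is a hypothetical name, and the empty-output check is an added safeguard, not EDirect behaviour:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of EDirect) around the pipeline
#   esearch -db DB -query QUERY | efetch -format FORMAT > OUTFILE
# Usage: fetch_as DB QUERY FORMAT OUTFILE
fetch_as() {
    db=$1 query=$2 format=$3 out=$4
    # esearch runs the query; efetch reads its result message on stdin
    esearch -db "$db" -query "$query" | efetch -format "$format" > "$out"
    # Treat an empty output file as a failure (e.g. no hits, network error)
    if [ ! -s "$out" ]; then
        echo "fetch_as: no data written to $out" >&2
        return 1
    fi
}
```

For example, fetch_as nucleotide 'CHN-JS-2014' fasta 11.fasta is equivalent to the first command in section III.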
III. Usage examples
Downloading nucleotide or protein sequences (FASTA format)
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta > 11.fasta  # downloads the complete genome sequence
>KP757892.1 Porcine deltacoronavirus isolate CHN-JS-2014, complete genome
ACATGGGGACTAAAGATAAAAATTATAGCATTAGTCTATAATTTTATCTCCCTAGCTTCGCTAGTTCTCT
ACCGACACCAATCCAGGTGCGTCTGCCACCAAGTTGGCTACCCTTTCTAGGGGCGCTTTCGCGCTTGCTC
ACCATTAGATTACCTGGAAACCAGCCATTCAGGTTGGAGTTTCCCCAGGCTCTTTTGTGTGGGCATTAGC
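NCBI wraps FASTA sequences over several lines (70 bases per line above), so a record's length has to be summed across lines. A minimal sketch using POSIX awk to print each record's ID and length; fasta_lengths is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "id<TAB>length" for every record in a
# FASTA file, or stdin if no file is given. Sequence lines are summed
# because FASTA wraps long sequences over several lines.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # first word of the header, minus ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$@"
}
```

Running fasta_lengths 11.fasta should print a single record, KP757892.1, whose length ought to agree with the 25420 bp on the LOCUS line of the GenBank record for the same accession.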
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gene_fasta > 22.fasta  # downloads the nucleotide sequence of each gene (S, E, M, etc.) as a separate record
>lcl|KP757892.1_gene_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [location=22797..23048]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_gene_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [location=23041..23694]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
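Each header in this output carries bracketed [key=value] tags. A minimal sketch, assuming POSIX awk and that tag values for gene and location contain no spaces (true in the headers shown), that turns them into a gene/location table; gene_table is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "gene<TAB>location" for each FASTA header,
# read from the [gene=...] and [location=...] tags that efetch writes
# for gene_fasta (and fasta_cds_na / fasta_cds_aa) output.
# Assumes these two tag values contain no spaces.
gene_table() {
    awk '/^>/ {
        gene = ""; loc = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^\[gene=/) {
                gene = $i; sub(/^\[gene=/, "", gene); sub(/]$/, "", gene)
            }
            if ($i ~ /^\[location=/) {
                loc = $i; sub(/^\[location=/, "", loc); sub(/]$/, "", loc)
            }
        }
        print gene "\t" loc
    }' "$@"
}
```

On 22.fasta this would list, among others, E with 22797..23048 and M with 23041..23694.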
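esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_aa > 33.fasta  # downloads the protein sequence of each CDS as a separate record (this searches the nucleotide database; running it against the protein database gave an error)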
>lcl|KP757892.1_prot_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEF
VKVHQFPRNTHYV
>lcl|KP757892.1_prot_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
MSDAEEWQIIVFIAIIWALGVILQGGYATRNRVIYVIKLILLWLLQPFTLVVTIWTAVDRSSKKDAVFIV
SIIFAVLTFISWAKYWYDSIRLLMKTRSAWALSPESRLLAGIMDPMGTWRCIPIDHMAPILTPVVKHGKL
KLHGQELANGISVRNPPQDMVIVSPSDTFHYTFKKPVESNNDPEFAVLIYQGDRASNAGLHTITTSKAGD
ARLYKYM
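Since every protein comes back as its own record in one file, it is sometimes handy to split them into one file per gene. A minimal sketch, assuming POSIX awk, that gene names are safe as file names, and that records without a [gene=...] tag can be lumped into unknown.faa; split_by_gene and the .faa naming are hypothetical choices:

```shell
#!/bin/sh
# Hypothetical helper: split a multi-record FASTA file into OUTDIR/<gene>.faa,
# using the [gene=...] tag of each header. Records that share a gene name,
# or have no gene tag ("unknown"), go into the same file.
# Usage: split_by_gene INPUT OUTDIR
split_by_gene() {
    mkdir -p "$2" || return 1
    awk -v outdir="$2" '
        /^>/ {
            gene = "unknown"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^\[gene=/) {
                    gene = $i
                    sub(/^\[gene=/, "", gene); sub(/]$/, "", gene)
                }
            out = outdir "/" gene ".faa"
        }
        # Files stay open, so ">" truncates once per run and then appends
        out != "" { print > out }
    ' "$1"
}
```

For example, split_by_gene 33.fasta proteins would be expected to produce proteins/E.faa, proteins/M.faa, and so on.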
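esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_na > 44.fasta  # downloads the nucleotide sequence of each CDS (S, E, M, etc.) as a separate record; same sequences as 22.fasta, but with more annotation in the headers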
>lcl|KP757892.1_cds_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_cds_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
Downloading records in other (non-FASTA) formats
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gb > 55.gb  # downloads the GenBank flat file, i.e. the same view of the record as on the NCBI website
LOCUS       KP757892               25420 bp ss-RNA     linear   VRL 17-DEC-2015
DEFINITION  Porcine deltacoronavirus isolate CHN-JS-2014, complete genome.
ACCESSION   KP757892
VERSION     KP757892.1
KEYWORDS    .
SOURCE      Porcine deltacoronavirus
  ORGANISM  Porcine deltacoronavirus
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA stage;
            Nidovirales; Coronaviridae; Coronavirinae.
REFERENCE   1  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q., Chen,H. and Xiao,S.
  TITLE     Porcine Deltacoronavirus in Mainland China
  JOURNAL   Emerging Infect. Dis. 21 (12), 2254-2255 (2015)
   PUBMED   26584185
REFERENCE   2  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q. and Xiao,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-FEB-2015) State Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, 1 Shizishan
            Street, Wuhan, Hubei 430070, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
...
     gene            22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
     CDS             22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
                     /codon_start=1
                     /product="envelope protein"
                     /protein_id="AKC54443.1"
                     /translation="MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKC
                     YLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV"
     gene            23041..23694
                     /gene="M"
...
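The FEATURES table of a GenBank flat file has a fixed layout: feature keys (gene, CDS, ...) start in column 6 and qualifiers are indented lines beginning with "/". A minimal sketch, assuming that layout, POSIX awk, and that each /product value fits on one line, which lists every CDS as protein_id and product; cds_table is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "protein_id<TAB>product" for each CDS feature
# in a GenBank flat file. Assumes feature keys begin in column 6 (five
# leading spaces) and that /product values are not wrapped onto a new line.
cds_table() {
    awk '
        # A new feature key (gene, CDS, ...) starts in column 6
        /^     [A-Za-z]/ {
            flush()
            in_cds = ($1 == "CDS")
            next
        }
        in_cds && /\/product="/    { product = value($0) }
        in_cds && /\/protein_id="/ { pid = value($0) }
        # ORIGIN marks the end of the FEATURES table
        /^ORIGIN/ { flush() }
        END { flush() }
        # Print the current CDS (if any) and reset the state
        function flush() {
            if (in_cds && pid != "") print pid "\t" product
            in_cds = 0; pid = ""; product = ""
        }
        # Return the text between the first pair of double quotes
        function value(s) {
            sub(/^[^"]*"/, "", s); sub(/".*$/, "", s); return s
        }
    ' "$@"
}
```

Run on the file written by the efetch -format gb command above, it should list AKC54443.1 (envelope protein) and AKC54444.1 (membrane protein) among the CDS entries.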