I am trying to get information about some genes, namely their DNA and mRNA lengths. I do this by retrieving about 500 gene IDs, sending a request to NCBI for each gene's DNA length, and then sending another request for information about its mRNA. The code does this correctly. The problem is that after about 30 requests, each subsequent request takes a very long time, almost 20 minutes per request. Their documentation doesn't mention any limit other than a rate limit of 3 requests per second. Is there a way to get this information for about 1000 genes at the same speed as the first 30?
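The approach described above sends one or two requests per gene, so about 1000 genes means roughly 2000 requests. That is at least about 11 minutes at 3 requests per second, or about 3.3 minutes at the 10 per second NCBI allows with an API key. Biopython's Entrez module already spaces out its calls to respect those limits. EFetch also accepts a comma-separated list of UIDs, and recent Biopython versions join a Python list passed as `id` into that form, so one request can return many GenBank records. The sketch below assumes a batch size of 200 (an assumed value, not an NCBI rule); `chunked` and `fetch_genbank_batches` are hypothetical helper names. With 1000 IDs that is 5 EFetch requests instead of 1000. Parsed records carry the accession.version as `record.id` rather than the numeric UID, so results should be matched on that rather than on position.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_genbank_batches(ids: Sequence[str], batch_size: int = 200):
    """Yield SeqRecord objects, sending one EFetch request per batch of IDs.

    Hypothetical helper; batch_size=200 is an assumed value, not an NCBI rule.
    """
    # Biopython is only needed for the network call itself.
    from Bio import Entrez, SeqIO

    for batch in chunked(ids, batch_size):
        # A list passed as `id` is sent to EFetch as comma-separated UIDs.
        handle = Entrez.efetch(db="nucleotide", id=batch, rettype="gb", retmode="text")
        try:
            # The response holds many records, so parse() instead of read().
            for record in SeqIO.parse(handle, "genbank"):
                yield record
        finally:
            handle.close()


# Usage (needs network access and Entrez.email set as in the question):
# for rec in fetch_genbank_batches(nucleotide_ids):
#     print(rec.id, len(rec.seq))
```

Batching reduces the number of round trips but not the amount of data downloaded, so it helps most when the individual records are small.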
Here is my code:
from Bio import Entrez, SeqIO

Entrez.email = "MY_EMAIl"
Entrez.api_key = "My_API"

# Search the nucleotide database for Homo sapiens records
# (esearch only returns UIDs, so no GenBank rettype is needed here)
handle = Entrez.esearch(db="nucleotide", term='Homo sapiens[Organism]', retmax=100)
record = Entrez.read(handle)
handle.close()

nucleotide_ids = record["IdList"]  # List of Nucleotide IDs
print(f"Found {len(nucleotide_ids)} IDs for term 'Homo sapiens[Organism]'")

# Loop through the list of nucleotide IDs and fetch information
for n in nucleotide_ids:
    # Fetch the full GenBank record (sequence included) for each ID
    handle = Entrez.efetch(db="nucleotide", id=n, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()

    # Get the DNA length directly from the sequence
    dna_length = len(record.seq)
    print(n)
    print(f"DNA Length: {dna_length} bp")

    # Extract chromosome number from the source feature (if available)
    chromosome = "Not available"
    for feature in record.features:
        if feature.type == "source" and "chromosome" in feature.qualifiers:
            chromosome = feature.qualifiers["chromosome"][0]
            break  # Exit after finding the chromosome
    print(f"Chromosome: {chromosome}")

    # Initialize variable for mRNA transcript ID
    mrna_transcript_id = None
    # Loop through features to find the first mRNA feature with a transcript ID
    for feature in record.features:
        if feature.type == "mRNA" and "transcript_id" in feature.qualifiers:
            # Extract the transcript ID
            mrna_transcript_id = feature.qualifiers["transcript_id"][0]
            print("mRNA transcript ID:", mrna_transcript_id)
            break  # Stop after finding the first mRNA transcript ID

    # Check if mRNA transcript ID was found
    if mrna_transcript_id:
        # Keep only the base part of the transcript ID, removing any version number
        transcript_id = mrna_transcript_id.split(".")[0]
        # Fetch the mRNA sequence data using the transcript ID to get its length
        handle = Entrez.efetch(db="nucleotide", id=transcript_id, rettype="gb", retmode="text")
        mrna_record = SeqIO.read(handle, "genbank")
        handle.close()
        # Calculate and print the mRNA length
        mrna_length = len(mrna_record.seq)
        print(f"mRNA Length: {mrna_length} bp")
    else:
        print("No mRNA transcript ID found in the record.")
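The loop above downloads the complete GenBank flat file, sequence included, only to call `len(record.seq)`, and records in `nucleotide` can be very large (the output below includes contigs of several million bp). If only the length is needed, the ESummary document summary for the nucleotide database includes a `Length` item, so a sketch like the following could get lengths for a whole batch of UIDs in one request without transferring any sequence. The helper names `lengths_from_summaries` and `fetch_lengths` are hypothetical, and the batch size of 200 is an assumption. This only replaces the DNA-length step; the chromosome and mRNA lookups still rely on the GenBank features.

```python
from typing import Dict, Iterable, Mapping, Sequence


def lengths_from_summaries(summaries: Iterable[Mapping]) -> Dict[str, int]:
    """Map each document summary's UID ("Id") to its sequence length ("Length")."""
    return {str(doc["Id"]): int(doc["Length"]) for doc in summaries}


def fetch_lengths(uids: Sequence[str], batch_size: int = 200) -> Dict[str, int]:
    """Hypothetical helper: one ESummary request per batch of nucleotide UIDs."""
    # Biopython is only needed for the network call itself.
    from Bio import Entrez

    lengths: Dict[str, int] = {}
    for start in range(0, len(uids), batch_size):
        batch = list(uids[start:start + batch_size])
        handle = Entrez.esummary(db="nucleotide", id=",".join(batch))
        try:
            # Entrez.read returns a list of dict-like summaries, one per UID.
            summaries = Entrez.read(handle)
        finally:
            handle.close()
        lengths.update(lengths_from_summaries(summaries))
    return lengths


# Usage (needs network access and Entrez.email set):
# dna_lengths = fetch_lengths(nucleotide_ids)
```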
And here is the output, which always takes a very long time to produce:
> Found 100 IDs for term 'Homo sapiens[Organism]'
2844834341
DNA Length: 3046 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831162
DNA Length: 317 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831161
DNA Length: 314 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831160
DNA Length: 678 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831159
DNA Length: 680 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831158
DNA Length: 493 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831157
DNA Length: 492 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831156
DNA Length: 331 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831155
DNA Length: 329 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831154
DNA Length: 777 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831153
DNA Length: 777 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831152
DNA Length: 420 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831151
DNA Length: 441 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831150
DNA Length: 633 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844831149
DNA Length: 634 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2844741236
DNA Length: 800 bp
Chromosome: Not available
No mRNA transcript ID found in the record.
2302519926
DNA Length: 31849 bp
Chromosome: 8
mRNA transcript ID: NM_078480.3
mRNA Length: 1868 bp
2302519820
DNA Length: 10320 bp
Chromosome: 17
mRNA transcript ID: NM_172089.4
mRNA Length: 1857 bp
1546674230
DNA Length: 297579 bp
Chromosome: 21
mRNA transcript ID: NM_000484.4
mRNA Length: 3583 bp
1519312921
DNA Length: 4473 bp
Chromosome: 7
No mRNA transcript ID found in the record.
1243938659
DNA Length: 14518 bp
Chromosome: 8
mRNA transcript ID: NM_002467.6
mRNA Length: 3721 bp
1190332594
DNA Length: 60422 bp
Chromosome: 15
mRNA transcript ID: NM_001080541.3
mRNA Length: 11419 bp
1122782504
DNA Length: 67646 bp
Chromosome: 3
mRNA transcript ID: NM_001904.4
mRNA Length: 3661 bp
1050193339
DNA Length: 64762 bp
Chromosome: 2
mRNA transcript ID: NM_021027.3
mRNA Length: 2371 bp
432134280
DNA Length: 13042 bp
Chromosome: 2
mRNA transcript ID: NM_022152.6
mRNA Length: 2292 bp
347800663
DNA Length: 32232 bp
Chromosome: 11
mRNA transcript ID: NM_001369449.1
mRNA Length: 1806 bp
345525407
DNA Length: 27064 bp
Chromosome: 15
mRNA transcript ID: NM_014994.3
mRNA Length: 7182 bp
224809401
DNA Length: 30237 bp
Chromosome: 12
mRNA transcript ID: NM_001982.3
mRNA Length: 5615 bp
2850708328
DNA Length: 2457175 bp
Chromosome: contig-37
No mRNA transcript ID found in the record.
2850708327
DNA Length: 2784344 bp
Chromosome: contig-36
No mRNA transcript ID found in the record.
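In this output, the last two records printed before the long wait are contigs of 2457175 bp and 2784344 bp, far larger than the earlier hits. One way to check whether the delay tracks download size rather than the number of requests is to time each fetch and log the duration next to the record length. The sketch below takes the fetch function as a parameter (a design choice made here so the timing logic does not depend on Biopython); `time_each` and `genbank_length` are hypothetical names, and `genbank_length` simply repeats the single-record EFetch from the question.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_each(
    ids: Iterable[str],
    fetch_length: Callable[[str], int],
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, int, float]]:
    """Call fetch_length for each ID and collect (id, length, seconds taken)."""
    results = []
    for uid in ids:
        start = clock()
        length = fetch_length(uid)
        elapsed = clock() - start
        results.append((uid, length, elapsed))
        # Flush so progress is visible even while a slow request is pending.
        print(f"{uid}: {length} bp in {elapsed:.2f} s", flush=True)
    return results


def genbank_length(uid: str) -> int:
    """Fetch one GenBank record, as the question's loop does, and return its length."""
    from Bio import Entrez, SeqIO

    handle = Entrez.efetch(db="nucleotide", id=uid, rettype="gb", retmode="text")
    try:
        return len(SeqIO.read(handle, "genbank").seq)
    finally:
        handle.close()


# Usage (needs network access and Entrez.email set):
# time_each(nucleotide_ids, genbank_length)
```

If the slow requests turn out to be the multi-megabase records, a length-only lookup avoids downloading them at all; if every request slows down regardless of size, the cause is more likely on the server or network side.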
More details here: https://stackoverflow.com/questions/79178451/aafter-sending-around-30-requests-to-ncbi-the-subsequent-requests-take-a-very-lo
After sending around 30 requests to NCBI, subsequent requests take a very long time
1731611723
Anonymous