SEQUENCE PROCESSING SYSTEM AND METHOD THEREOF

Bibliographic Details
Main Authors	KWAK JAE HYUCK, SONG SEOK IL, LEE HYEON BYEONG, BYUN EUN KYU
Format	Patent
Language	English, Korean
Published	28.04.2021
Subjects	INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS; PHYSICS

More Information
Summary:	Provided are a system and method for efficient distributed parallel processing of massive amounts of nucleotide sequence read data. According to one embodiment of the present invention, the parallel processing method for nucleotide sequence reads may include the steps of: allocating, by one of a plurality of computing nodes, partitions of nucleotide sequence data composed of nucleotide sequence reads to two or more working nodes among the plurality of computing nodes; mapping, by each working node, each nucleotide sequence read in its allocated partition to a corresponding position in a reference sequence, and generating a duplicate identification key for the read from that reference position; and removing, by each working node, duplicate reads in its allocated partition using a duplicate read list, which is a list of identifiers of the nucleotide sequence reads that share the same duplicate identification key, and sorting the nucleotide sequence reads in the allocated partition by their corresponding positions in the reference sequence.
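The three steps of the summary can be illustrated with a small single-machine sketch in Python. It is not the patented implementation, and every name in it (`MappedRead`, `allocate_partitions`, `map_partition`, `duplicate_key`, `build_duplicate_read_list`, `dedup_and_sort`, `process`) is hypothetical. It rests on several assumptions not stated in the abstract: threads stand in for working nodes, exact substring search stands in for a real read aligner, the duplicate identification key is simply the mapped start position, the duplicate read list is gathered from all workers after mapping, the read with the smallest identifier in each list is the one kept, and unmapped reads are never treated as duplicates and are sorted last.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

UNMAPPED = -1  # str.find returns -1 when a read has no match in the reference


@dataclass(frozen=True)
class MappedRead:
    read_id: str
    sequence: str
    position: int  # 0-based start in the reference sequence, or UNMAPPED


def allocate_partitions(reads, num_workers):
    """Coordinator step: assign (read_id, sequence) pairs to workers round-robin."""
    partitions = [[] for _ in range(num_workers)]
    for i, read in enumerate(reads):
        partitions[i % num_workers].append(read)
    return partitions


def map_partition(partition, reference):
    """Worker step 1: find each read's position in the reference.

    Assumption: exact substring search replaces a real aligner.
    """
    return [MappedRead(rid, seq, reference.find(seq)) for rid, seq in partition]


def duplicate_key(read):
    """Assumption: the duplicate identification key is the mapped start position."""
    return read.position


def build_duplicate_read_list(mapped_partitions):
    """Gather, across all workers, the IDs of reads that share each key."""
    dup_list = defaultdict(list)
    for partition in mapped_partitions:
        for read in partition:
            if read.position != UNMAPPED:
                dup_list[duplicate_key(read)].append(read.read_id)
    return dup_list


def dedup_and_sort(partition, keep):
    """Worker step 2: drop duplicates, then sort by reference position.

    `keep` maps each key to the single read ID retained for it.
    """
    kept = [
        r for r in partition
        if r.position == UNMAPPED or keep[duplicate_key(r)] == r.read_id
    ]
    # Mapped reads in position order first, unmapped reads last.
    return sorted(kept, key=lambda r: (r.position == UNMAPPED, r.position, r.read_id))


def process(reads, reference, num_workers=2):
    partitions = allocate_partitions(reads, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_partition, partitions, [reference] * num_workers))
        dup_list = build_duplicate_read_list(mapped)
        # Assumption: keep the smallest read ID in each duplicate read list.
        keep = {key: min(ids) for key, ids in dup_list.items()}
        return list(pool.map(dedup_and_sort, mapped, [keep] * num_workers))


if __name__ == "__main__":
    reference = "ACGTACGGTTCAGT"
    reads = [("r1", "ACGG"), ("r2", "ACGG"), ("r3", "TTCA"), ("r4", "GGGG"), ("r5", "ACGT")]
    for partition in process(reads, reference):
        print([(r.read_id, r.position) for r in partition])
    # [('r5', 0), ('r1', 4), ('r3', 8)]
    # [('r4', -1)]
```

In this example, r2 maps to the same position as r1 but sits on a different worker, so it is removed only because the duplicate read list is shared across partitions. Sorting, as in the summary, happens within each partition; combining the per-partition outputs is outside this sketch.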
Bibliography:	Application Number: KR20190129507