POJ 1007 DNA Sorting

Source: 岁月联盟  Editor: exp  Date: 2012-10-24

I. Problem Information
DNA Sorting
Time Limit: 1000MS   Memory Limit: 10000K
Total Submissions: 68122   Accepted: 27076
Description

One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).

You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.
Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.
Output

Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the original order.
Sample Input

10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
Sample Output

CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA
II. Algorithm Analysis
1. Brute-force enumeration
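    This problem boils down to computing the inversion count of a sequence (see any standard reference for the definition). The most direct approach is, for each element, to count the elements that come after it and are smaller than it; call this number the element's inversion count. The sum of the inversion counts of all elements is the inversion count of the sequence.
    Counting the inversions of one sequence by enumeration takes O(n^2) time. If the counts are recomputed inside the sort comparator, as in the reference code, sorting the m sequences brings the total to O(n^2 * m log m); computing each count once beforehand would lower this to O(m * n^2 + m log m).
    Also note the requirement "Since two strings can be equally sorted, then output them according to the original order.": two sequences with the same inversion count must be output in the order they were read. The sort must therefore be stable, and stable_sort from the C++ STL is sufficient.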
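    The complexity and stability remarks above can be illustrated with a minimal sketch. It assumes each string's inversion count is cached before sorting, so the comparator only compares stored keys. The names countInversions, lessByCount and sortBySortedness are hypothetical and are not part of the original solution.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Brute-force inversion count: pairs (i, j) with i < j and s[j] < s[i].
std::size_t countInversions(const std::string &s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t j = i + 1; j < s.size(); ++j)
            if (s[j] < s[i])
                ++count;
    return count;
}

// Compares the cached counts only; the strings themselves are never compared.
bool lessByCount(const std::pair<std::size_t, std::string> &a,
                 const std::pair<std::size_t, std::string> &b)
{
    return a.first < b.first;
}

// Hypothetical helper: returns the strings ordered from most to least sorted.
std::vector<std::string> sortBySortedness(const std::vector<std::string> &dnas)
{
    // Compute every count exactly once: O(m * n^2) in total.
    std::vector<std::pair<std::size_t, std::string> > keyed;
    keyed.reserve(dnas.size());
    for (std::size_t i = 0; i < dnas.size(); ++i)
        keyed.push_back(std::make_pair(countInversions(dnas[i]), dnas[i]));

    // stable_sort keeps the input order of strings with equal counts.
    std::stable_sort(keyed.begin(), keyed.end(), lessByCount);

    std::vector<std::string> result;
    result.reserve(keyed.size());
    for (std::size_t i = 0; i < keyed.size(); ++i)
        result.push_back(keyed[i].second);
    return result;
}
```

    Calling sortBySortedness on the six sample strings should reproduce the sample output, since their inversion counts are all distinct (9, 10, 11, 17, 36 and 37 in output order). The stability requirement only matters when two counts tie.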
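2. Merge sort
    For computing the inversion count alone, merge sort is a better method: during the merge step, whenever an element from the right half is placed ahead of the elements still remaining in the left half, add the number of those remaining elements to a counter. This brings the per-sequence cost down to O(n log n), and the total, with the count again recomputed in the comparator, down to O(n log n * m log m).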
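    As a rough illustration of the merge-sort idea, here is a minimal sketch that counts inversions while sorting a copy of the string. It is not the author's code; mergeCount, sortCount and countInversionsMerge are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// Merge the sorted halves [lo, mid) and [mid, hi) of `a`, using `buf` as
// scratch space, and return the number of inversions that cross the halves.
std::size_t mergeCount(std::string &a, std::string &buf,
                       std::size_t lo, std::size_t mid, std::size_t hi)
{
    std::size_t i = lo, j = mid, k = lo, inversions = 0;
    while (i < mid && j < hi) {
        if (a[j] < a[i]) {
            // a[j] is smaller than every element still left in [i, mid).
            inversions += mid - i;
            buf[k++] = a[j++];
        } else {
            // Equal elements take the left one first: no inversion counted.
            buf[k++] = a[i++];
        }
    }
    while (i < mid) buf[k++] = a[i++];
    while (j < hi)  buf[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = buf[k];
    return inversions;
}

// Sort a[lo, hi) recursively and return its inversion count.
std::size_t sortCount(std::string &a, std::string &buf,
                      std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2) return 0;   // zero or one element: nothing to count
    std::size_t mid = lo + (hi - lo) / 2;
    std::size_t left = sortCount(a, buf, lo, mid);
    std::size_t right = sortCount(a, buf, mid, hi);
    return left + right + mergeCount(a, buf, lo, mid, hi);
}

// Hypothetical entry point: takes s by value so the caller's string is untouched.
std::size_t countInversionsMerge(std::string s)
{
    std::string buf(s.size(), '\0');
    return sortCount(s, buf, 0, s.size());
}
```

    With n at most 50 the brute-force count is already fast enough for this problem, so the merge-sort version is mainly of interest for much longer sequences.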

III. Reference Code
1. Brute-force enumeration in C++
[cpp] 
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
using namespace std;

// Count inversions: pairs (i, j) with i < j and s[j] < s[i].
size_t countUnsortedness(const string &s)
{
    size_t sum = 0;
    for (size_t i = 0; i < s.size(); i++)
        for (size_t j = i + 1; j < s.size(); j++)
            if (s[j] < s[i])
                sum++;
    return sum;
}

// Order strings by inversion count; ties are left to stable_sort.
bool comp(const string &s1, const string &s2)
{
    return countUnsortedness(s1) < countUnsortedness(s2);
}

int main()
{
    vector<string> DNAs;
    int n, m;
    cin >> n >> m;
    // Read the m strings; n is given in the input but not otherwise needed.
    for (string s; m-- > 0 && cin >> s; )
        DNAs.push_back(s);
    // stable_sort keeps the input order of strings with equal counts.
    stable_sort(DNAs.begin(), DNAs.end(), comp);
    for (size_t i = 0; i < DNAs.size(); i++)
        cout << DNAs[i] << '\n';
    return 0;
}
2. Merge sort in C++