emu in blogjava

  BlogJava :: 首页 :: 新随笔 :: 联系 :: 聚合  :: 管理 ::
  171 随笔 :: 103 文章 :: 1052 评论 :: 2 Trackbacks

Problem Statement
????
A cancer screening protocol is given data about a cell and gives the cell a score. The protocol also specifies a threshold value. The protocol classifies cells with scores less than the threshold as non-cancerous, and cells with scores greater than or equal to the threshold as cancerous.
The effectiveness of a protocol is calculated by applying it to data about known cells from a test bank. The "squared_error" for a cell is 0 if the protocol successfully classifies it. If a cell is misclassified, its squared_error is calculated as the square of the difference between the threshold and the cell's score. The "mean_squared_error" for the protocol is the average of the squared_errors from all the cells.
We want to choose a threshold that will minimize the mean_squared_error for our protocol. Create a class MSQErr that contains a method minErr that is given a int[] scores giving the scores calculated by the protocol for the test cells, and a string cancerous that indicates for each test cell whether it is cancerous ('C') or non-cancerous ('N'). The method chooses an optimal threshold and returns the resulting minimum mean_squared_error.
The i-th character of cancerous corresponds to the i-th element of scores.
Definition
????
Class:
MSQErr
Method:
minErr
Parameters:
int[], string
Returns:
double
Method signature:
double minErr(int[] score, string cancerous)
(be sure your method is public)
????

Notes
-
The returned value must be accurate to within a relative or absolute value of 1E-9
Constraints
-
cancerous will contain between 2 and 50 characters inclusive.
-
The number of elements in scores will equal the number of characters in cancerous.
-
Each character in cancerous will be 'N' or 'C'.
-
Each value in scores will be between -1,000 and 1,000 inclusive.
Examples
0)

????
{3,3,1,8}
"NNNC"
Returns: 0.0
By choosing a threshold of 5, this protocol classifies all of the test cells correctly.
1)

????
{5,2,3,6}
"CCNC"
Returns: 0.125
By choosing the threshold at 2.5, the cells with scores of 2 and 3 will both be misclassified. This will result in a mean_squared_error of (0 + (2.5-2)^2 + (3-2.5)^2 + 0)/4 = 0.125. This is the only threshold that will give such a low mean_squared_error.
2)

????
{5,2,3,6,2}
"CCNCN"
Returns: 0.1
The same threshold is best, but now the sum of the squared errors is divided by 5 instead of 4.
3)

????
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}
"NNNCNNNCNNNCNCCCCCCC"
Returns: 2.34

 
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2003, TopCoder, Inc. All rights reserved.

posted on 2005-08-16 09:28 emu 阅读(1253) 评论(3)  编辑  收藏 所属分类: google编程大赛模拟题及入围赛真题

评论

# emu的解法 2005-08-16 09:39 emu
我估计均方差是一条类似抛物线的曲线(有一个顶点或者一个平的顶部),用二分法收敛到曲线的顶点上。但是由于计算机二进制计算精度限制的原因,实际收敛到的地方可能和顶点有微小的偏差(比如0.1,0.2,0.3,0.4用二进制表示都是循环小数,因此用二分法如果不做特别处理就收敛不到这些点上)。


public class MSQErr {
public static void main(String[] args) {
MSQErr err = new MSQErr();
assertEquals(err.minErr(new int[] {3, 3, 1, 8}, "NNNC"), 0);
assertEquals(err.minErr(new int[] {5, 2, 3, 6}, "CCNC"), 0.125);
assertEquals(err.minErr(new int[] {5, 2, 3, 6, 2}, "CCNCN"), 0.1);
assertEquals(err.minErr(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20},
"NNNCNNNCNNNCNCCCCCCC"), 2.34);
System.out.println("all pass");
}

private static void assertEquals(double a, double b) {
if (a != b)
throw new RuntimeException("assert failed!! Expected " + b +
" but got " + a);
}

public double minErr(int[] score, String cancerous) {
double max = 0, min = 1000;
for (int i = 0; i < score.length; i++) {
int s = score[ i ];
if (s > max)
max = s;
if (s < min)
min = s;
}
double mid = (max + min) / 2;
double d1 = 1, d2 = 2;
while (d1 != d2) {
d1 = getErr(score, cancerous, mid);
d2 = getErr(score, cancerous, mid + (max - min) / 1000000000);
if (d1 > d2)
min = mid;
else
max = mid;
mid = (max + min) / 2;
}
return Math.floor(d1 * 1000000000) / 1000000000;
}

public double getErr(int[] score, String cancerous, double threshold) {
double result = 0;
for (int i = 0; i < score.length; i++) {
double d = score[ i ] - threshold;
char c = cancerous.charAt(i);
if (d > 0 && c == 'N' || d < 0 && c == 'C') {
result += d * d;
}
}
return result / score.length;
}
}



对它的正确性没有充分验证过,因为
d2 = getErr(score, cancerous, mid + (max - min) / 1000000000);
我只是想知道向x轴正方向走一小步看看抛物线的走向是向上还是向下。如果这一小步恰好跨过了抛物线顶点的话对方向的判断有可能就是错的,不过这样的事件发生的概率实在是太小了。
参数验证没有做,如果给错误参数的话,有可能抛异常(cancerous串太短的话),也有可能忽略错误的数据(cancerous太长或者不完全由N和C构成)。题目中没有明确的提出应该怎么处理这样的数据,由它去吧。
  回复  更多评论
  

# 秋水无恨的javascript解法 2005-08-16 09:42 emu
用了一些高深的数学原理,不过对“th+=e/n;”能否把闸值修正到曲线线的定点,我有一点怀疑。

<SCRIPT LANGUAGE="JavaScript">
<!--
function minErr(s,c)
{
for(var th=2;th<50;++th)
{
var e = 0,n=0;
for(var i=0;i<s.length;++i)
{
if(c.charAt(i )!=(s[i]>=th?"C":"N"))
{
++n;
e+=(s[i]-th);
}
}
if(e<=0)
{
th+=e/n;
break;
}
}
e = 0;
for(var i=0;i<s.length;++i)
{
if(c.charAt(i )!=(s[i]>=th?"C":"N"))
{
e+=(s[i]-th)*(s[i]-th);
}
}
return e/s.length;
}

//-->
</SCRIPT>
  回复  更多评论
  

# re: MSQErr(赛前模拟题) 2006-07-23 19:10 清风剑
用求导的方法取得抛物线顶点可行吗?  回复  更多评论
  


只有注册用户登录后才能发表评论。


网站导航:
博客园   IT新闻   Chat2DB   C++博客   博问