Problem Statement
????
A cancer screening protocol is given data about a cell and gives the cell a score. The protocol also specifies a threshold value. The protocol classifies cells with scores less than the threshold as non-cancerous, and cells with scores greater than or equal to the threshold as cancerous.
The effectiveness of a protocol is calculated by applying it to data about known cells from a test bank. The "squared_error" for a cell is 0 if the protocol successfully classifies it. If a cell is misclassified, its squared_error is calculated as the square of the difference between the threshold and the cell's score. The "mean_squared_error" for the protocol is the average of the squared_errors from all the cells.
We want to choose a threshold that will minimize the mean_squared_error for our protocol. Create a class MSQErr that contains a method minErr that is given a int[] scores giving the scores calculated by the protocol for the test cells, and a string cancerous that indicates for each test cell whether it is cancerous ('C') or non-cancerous ('N'). The method chooses an optimal threshold and returns the resulting minimum mean_squared_error.
The i-th character of cancerous corresponds to the i-th element of scores.
Definition
????
Class:
MSQErr
Method:
minErr
Parameters:
int[], string
Returns:
double
Method signature:
double minErr(int[] score, string cancerous)
(be sure your method is public)
????
Notes
-
The returned value must be accurate to within a relative or absolute value of 1E-9
Constraints
-
cancerous will contain between 2 and 50 characters inclusive.
-
The number of elements in scores will equal the number of characters in cancerous.
-
Each character in cancerous will be 'N' or 'C'.
-
Each value in scores will be between -1,000 and 1,000 inclusive.
Examples
0)
????
{3,3,1,8}
"NNNC"
Returns: 0.0
By choosing a threshold of 5, this protocol classifies all of the test cells correctly.
1)
????
{5,2,3,6}
"CCNC"
Returns: 0.125
By choosing the threshold at 2.5, the cells with scores of 2 and 3 will both be misclassified. This will result in a mean_squared_error of (0 + (2.5-2)^2 + (3-2.5)^2 + 0)/4 = 0.125. This is the only threshold that will give such a low mean_squared_error.
2)
????
{5,2,3,6,2}
"CCNCN"
Returns: 0.1
The same threshold is best, but now the sum of the squared errors is divided by 5 instead of 4.
3)
????
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}
"NNNCNNNCNNNCNCCCCCCC"
Returns: 2.34
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2003, TopCoder, Inc. All rights reserved.