This research proposes an efficient model for predicting the survival rate of patients with lung cancer. The researchers collected data on cancer patients from four feature categories (population, recognition, treatment, and result), chosen for their importance to the survival of patients with lung cancer. Analyses of the predicted survival rates indicate that, among the classification algorithms compared, Decision Tree C5.0 yields the highest accuracy. The models were built for five levels of death risk, defined by survival period: six months, nine months, one year, two years, and five years. In this paper, we propose a mechanism for feature selection that combines the results of several feature selection algorithms. The results show that our mechanism outperforms the other feature selection algorithms. After the proposed feature selection mechanism was applied, the accuracy of the C5.0 algorithm reached 97.93%.
This study proposes an ensemble feature selection algorithm for predicting the survival of patients with lung cancer.
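The idea of combining the outputs of several feature selection algorithms can be illustrated with a minimal sketch. The abstract does not state how the results are combined, so the majority-vote rule below is an assumption made for illustration, not the authors' mechanism. The function `ensemble_select`, the selector names, and the feature names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def ensemble_select(selections: Dict[str, Sequence[str]], min_votes: int) -> List[str]:
    """Keep features chosen by at least `min_votes` of the base selectors.

    Assumption: a simple voting rule; the source does not specify the rule.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each selector votes at most once per feature
    # Order by vote count (descending), then by name for a stable result.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical outputs of three base selectors (illustrative feature names).
selections = {
    "selector_a": ["age", "stage", "tumor_size", "surgery"],
    "selector_b": ["stage", "tumor_size", "radiation"],
    "selector_c": ["age", "stage", "surgery", "grade"],
}
selected = ensemble_select(selections, min_votes=2)
print(selected)
```

In such a scheme, the reduced feature set would then be passed to the classifier. Raising `min_votes` gives a smaller, more conservative subset; lowering it keeps features that only one selector found useful.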