Sep 19, 2018. Download a sample data mining thesis 1: applying data mining to predict the classification of the number of...
IRWANTO, NIM. 12650064 (2016) PENERAPAN DATA MINING UNTUK MENGETAHUI POLA PEMILIHAN PROGRAM STUDI MAHASISWA BARU UIN SUNAN KALIJAGA MENGGUNAKAN ALGORITMA K-MEANS CLUSTERING. Skripsi thesis, UIN SUNAN KALIJAGA YOGYAKARTA.
Abstract
The admission process for new students at the State Islamic University (UIN) Sunan Kalijaga produces abundant data, including students' personal data. This happens every year, so the data stored in the database keeps growing. It would be unfortunate if this data were not put to good use for the benefit of the university. This study applies data mining with the k-means clustering method to identify the pattern of study program selection among new students at UIN Sunan Kalijaga. The raw data first undergoes pre-processing, which includes data cleansing, data integration, data selection, and data transformation. After these stages, the k-means clustering algorithm is applied, grouping records with similar characteristics into the same cluster. The attributes used are the study program, the student's major at school, and the school of origin. The data mining process forms three clusters, and each cluster shows a pattern in how students select study programs. The tendency is clearest in the first cluster, which contains the study programs most in demand among students. Of the 5705 students in the data, 2299 fall into the first cluster, 2101 into the second, and 1305 into the third. The first cluster is therefore the largest, so the tendency of students in choosing study programs at UIN Sunan Kalijaga can be determined by looking at the data in the first cluster, followed by the second and third.
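The clustering step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's actual implementation: the one-hot encoding of the three categorical attributes, the toy records, the field names (`program`, `major`, `school`), and the helper names `one_hot` and `kmeans` are all assumptions made here for the example.

```python
import random
from collections import Counter

def one_hot(records, keys):
    """Encode categorical records as 0/1 vectors, one slot per (attribute, value)."""
    vocab = sorted({(k, r[k]) for r in records for k in keys})
    index = {pair: i for i, pair in enumerate(vocab)}
    vectors = []
    for r in records:
        v = [0.0] * len(vocab)
        for k in keys:
            v[index[(k, r[k])]] = 1.0
        vectors.append(v)
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy records; the real study used 5705 admission records.
records = [
    {"program": "Informatics", "major": "Science", "school": "SMA"},
    {"program": "Informatics", "major": "Science", "school": "SMA"},
    {"program": "Informatics", "major": "Science", "school": "MA"},
    {"program": "Islamic Law", "major": "Religion", "school": "MA"},
    {"program": "Islamic Law", "major": "Religion", "school": "MA"},
    {"program": "Islamic Law", "major": "Social", "school": "MA"},
    {"program": "Economics", "major": "Social", "school": "SMA"},
    {"program": "Economics", "major": "Social", "school": "SMK"},
]

points = one_hot(records, ["program", "major", "school"])
labels, centroids = kmeans(points, k=3)
sizes = Counter(labels)
# Rank clusters by size, mirroring how the abstract reads the largest cluster first.
for cluster, size in sizes.most_common():
    print(f"cluster {cluster}: {size} students")
```

Ranking clusters by member count is one way to read the result the way the abstract does, with the largest cluster taken as the strongest selection tendency.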
Pendekatan Level Data untuk Menangani Ketidakseimbangan Kelas pada Prediksi Cacat Software (A Data-Level Approach to Handling Class Imbalance in Software Defect Prediction)
Abstract
Software metrics datasets are generally imbalanced, which can degrade the performance of software defect prediction models because they tend to predict the majority class. In general, class imbalance can be handled with two approaches: the data level and the algorithm level. The data-level approach aims to improve class balance, whereas the algorithm-level approach aims to improve the algorithm or to combine (ensemble) classifiers so that they are more conducive to the minority class. This study proposes a data-level approach using resampling, namely random oversampling (ROS) and random undersampling (RUS), and synthesis using the FSMOTE algorithm. The classifier used is Naïve Bayes. The results show that the FSMOTE+NB model is the best data-level model for software defect prediction, because the sensitivity and G-Mean of the FSMOTE+NB model increased significantly, whereas those of the ROS+NB and RUS+NB models did not increase significantly.
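The two random resampling strategies and the two reported metrics can be illustrated with a short sketch. It assumes a binary problem, balances the classes to a 1:1 ratio, and takes G-Mean to be the common definition, the square root of sensitivity times specificity; the abstract does not state these details. The function names and the toy data are hypothetical, and neither FSMOTE nor the Naïve Bayes classifier is reproduced here.

```python
import math
import random

def random_oversample(X, y, minority, seed=0):
    """ROS: duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(max(deficit, 0))]
    return list(X) + extra, list(y) + [minority] * len(extra)

def random_undersample(X, y, minority, seed=0):
    """RUS: keep every minority row plus an equally sized random subset of the majority."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    majority = [(x, label) for x, label in zip(X, y) if label != minority]
    kept = rng.sample(majority, min(len(minority_rows), len(majority)))
    rows = [(x, minority) for x in minority_rows] + kept
    return [x for x, _ in rows], [label for _, label in rows]

def sensitivity_and_gmean(y_true, y_pred, positive):
    """Sensitivity (recall on the positive class) and G-Mean = sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, math.sqrt(sensitivity * specificity)

# Hypothetical toy data: 8 clean modules and 2 defective ones, one metric per module.
X = [[i] for i in range(10)]
y = ["clean"] * 8 + ["defective"] * 2

X_ros, y_ros = random_oversample(X, y, minority="defective")
X_rus, y_rus = random_undersample(X, y, minority="defective")
print(len(y_ros), y_ros.count("defective"))
print(len(y_rus), y_rus.count("defective"))

# Metrics for a hypothetical set of predictions on four test modules.
sens, gmean = sensitivity_and_gmean(
    ["defective", "defective", "clean", "clean"],
    ["defective", "clean", "clean", "clean"],
    positive="defective",
)
print(sens, gmean)
```

Resampling would be applied to the training split only, so that sensitivity and G-Mean are measured on data with the original class distribution.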