Detecting web communities in attributed internet networks using a mathematical programming approach

Alinezhad, Esmaeil; Teimourpour, Babak


Article type: Research paper

Authors

Esmaeil Alinezhad ¹

Babak Teimourpour ²

¹ Ph.D. student in Industrial Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University

² Assistant Professor, Department of Industrial Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University

Abstract

Community detection (the discovery of communities) is an emerging and popular branch of data mining and social network analysis, with many applications in discovering and analyzing communities in websites, biological networks, scientific and research networks, and so on. Community detection on web pages can in particular help website administrators allocate optimal bandwidth to the network of web pages they oversee. Most existing community detection methods use only the network topology (connections, edges) to group nodes (web pages), whereas recent research has shown that such methods should be modified so that, in addition to topology, the intrinsic attributes of the nodes are taken into account. Therefore, in this paper, for the first time, a mathematical model for discovering communities in internet networks is developed that simultaneously considers the intrinsic attributes of web pages and the connections among them. In the proposed method, the similarity of each pair of linked web pages is first computed from their attribute vectors with a similarity measure (such as Jaccard or the matching coefficient) and is added as a weight to the existing edge between them. In effect, this converts an attributed internet network with unweighted edges into a non-attributed network with weighted edges. The communities in this weighted network are then discovered with a mathematical model designed for networks with weighted edges. For validation, it is hypothesized, in the form of statistical hypothesis tests, that the communities discovered by the proposed approach (which considers web page attributes) are of statistically better quality than those found by previous mathematical models (which ignore attributes). Statistical tests on a real internet network show that the proposed model, when it uses the Jaccard measure to compute web page similarity, discovers significantly better communities (P-value = 0.01) than previous mathematical models. The other statistical tests also show that choosing a similarity measure suited to the nature of the network has a considerable effect on the quality of the proposed approach.
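The edge-weighting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary attribute vectors, reads "matching coefficient" as the simple matching coefficient, and uses hypothetical names (`jaccard`, `simple_matching`, `weight_edges`) and made-up toy pages.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Vector = Sequence[int]  # assumed binary attribute vector (1 = page has the attribute)


def jaccard(a: Vector, b: Vector) -> float:
    """Jaccard index: shared attributes / attributes present in either page."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def simple_matching(a: Vector, b: Vector) -> float:
    """Simple matching coefficient: positions that agree / all positions."""
    if not a:
        return 0.0
    agree = sum(1 for x, y in zip(a, b) if bool(x) == bool(y))
    return agree / len(a)


def weight_edges(
    edges: Iterable[Tuple[str, str]],
    attributes: Dict[str, Vector],
    similarity: Callable[[Vector, Vector], float] = jaccard,
) -> Dict[Tuple[str, str], float]:
    """Attach a similarity weight to every existing edge (no new edges are added)."""
    return {(u, v): similarity(attributes[u], attributes[v]) for u, v in edges}


# Hypothetical toy network: three pages, four binary attributes each.
pages = {
    "home": [1, 1, 0, 0],
    "scores": [0, 1, 1, 0],
    "recipes": [0, 0, 0, 1],
}
links = [("home", "scores"), ("home", "recipes")]

jaccard_weights = weight_edges(links, pages)
matching_weights = weight_edges(links, pages, simple_matching)
```

The two measures can rank the same edges quite differently, because the simple matching coefficient also counts attributes that both pages lack. That is one way to see why the choice of measure matters for the detected communities.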

Keywords

Community detection

Modularity optimization

Network topology

Internet network

Web pages

Mathematical model

Node attributes

Title (English)

Detecting web communities in attributed internet networks using a mathematical programming approach

Authors (English)

Esmaeil Alinezhad ¹

Babak Teimourpour ²

¹ Ph.D. Student in Industrial Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University

² Assistant Prof., Faculty of Industrial and Systems Engineering, Tarbiat Modares University

Abstract (English)

Community detection is an emerging and well-known topic in data mining and social network analysis, with a wide variety of applications in discovering communities in real-world networks such as biological networks, internet weblogs, and scientific and research websites. Web community detection can especially help administrators assign the optimal bandwidth to the websites in their own networks. Most web community detection approaches use only the network topology to discover web communities. However, recent research shows that traditional community detection methods have to be substantially modified to consider web attributes as well as network topology. Therefore, in this paper, a mathematical programming approach is developed for community detection in attributed internet networks that simultaneously considers both network topology and node attributes. In this approach, the similarities of web pages are first calculated from the node attributes with a chosen similarity measure and are used as the weights of the corresponding edges. The communities of the resulting weighted network are then detected by the proposed mathematical model. To validate the approach and demonstrate its efficiency, it is hypothesized that the communities it detects are of better quality than those detected by previous models. Experimental results demonstrate that the proposed approach can significantly improve the quality of the detected web communities when the model uses the Jaccard index. The other hypothesis tests, however, indicate that the choice of similarity measure has a significant impact on the quality of the detected communities.
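The abstract does not state the objective function of the mathematical model. Since the keywords list modularity optimization, one plausible reading, given here only as an assumption, is that the model maximizes the standard weighted form of Newman-Girvan modularity. The symbols below ($w_{ij}$, $s_i$, $W$, $c_i$) are notation introduced for this sketch and do not come from the abstract.

```latex
% Assumed objective: weighted modularity of a partition c.
% w_{ij}: similarity weight on edge (i,j), zero if pages i and j are not linked
% s_i   : strength of node i (sum of the weights of its incident edges)
% W     : total edge weight in the network
% c_i   : community assigned to node i
\[
  Q \;=\; \frac{1}{2W} \sum_{i,j}
      \left( w_{ij} - \frac{s_i\, s_j}{2W} \right) \delta(c_i, c_j),
  \qquad
  s_i = \sum_{j} w_{ij},
  \qquad
  W = \frac{1}{2} \sum_{i,j} w_{ij}.
\]
```

Under this reading, setting every $w_{ij}$ on an existing edge to 1 recovers the unweighted modularity used by topology-only models, so the attribute information enters the optimization only through the edge weights.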

Keywords (English)

Community detection

Internet network

Mathematical model

Modularity optimization

Network topology

Node attributes

Web pages
