Patel, Shivam Dimplekumar (2021) Augmenting Network Performance Datasets with Weather, Sports, and Social Media Data for Improved Predictions. Masters thesis, Concordia University.
Abstract
Understanding network performance enables network providers to manage their network
better. Network performance degradation can lead to network service issues causing monetary
loss and customer churn for the network providers. Accurate network performance
prediction potentially enables proactive resource allocation to attenuate the anticipated network
performance degradation and associated service issues. Previous literature attempted
to predict network performance using historical network data. However, real-world network
performance is impacted by various external factors. Existing literature fails to consider
external factors that could improve the understanding and prediction of network
performance. This thesis examines whether external factors can improve the understanding
and prediction of network performance. To this end, we inspect the correlation of
network performance data with various external data sources such as weather parameters,
sports events, and social media posts. Then, we perform network performance data augmentation
using the contextual information in such external data. We investigate the network
performance prediction improvements using a Recurrent Neural Network (RNN) with
Long Short-Term Memory (LSTM) units after data augmentation. Predictive experiments
with data augmentation using NFL sports events show a 23% improvement in network
performance predictions. Data augmentation using the other external sources considered
does not improve the network performance predictions.
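
The augmentation step summarized in the abstract can be pictured with a minimal sketch. This is an illustration only and not the thesis's implementation: the function name `augment_with_events`, the hourly sampling, the packet-loss values, and the game window are all hypothetical. It assumes each network measurement is a (timestamp, value) pair and attaches a 0/1 flag marking whether a sports event was in progress at that time, so that the flag can be fed to a predictor alongside the historical values.

```python
from datetime import datetime, timedelta


def augment_with_events(samples, events):
    """Append a 0/1 event flag to each (timestamp, value) sample.

    samples: list of (datetime, float) network measurements
    events:  list of (start, end) datetime pairs; end is exclusive
    """
    augmented = []
    for ts, value in samples:
        # Flag is 1 when the timestamp falls inside any event window.
        flag = int(any(start <= ts < end for start, end in events))
        augmented.append((ts, value, flag))
    return augmented


# Made-up hourly packet-loss values, for illustration only.
first_hour = datetime(2020, 9, 13, 12)
losses = [0.1, 0.1, 0.4, 0.5, 0.4, 0.1]
samples = [(first_hour + timedelta(hours=h), loss) for h, loss in enumerate(losses)]

# One hypothetical game window from 14:00 to 17:00.
events = [(datetime(2020, 9, 13, 14), datetime(2020, 9, 13, 17))]

rows = augment_with_events(samples, events)
flags = [flag for _, _, flag in rows]
```

Each augmented row keeps the original measurement and adds one contextual feature; weather parameters or social media post counts could be attached in the same way, keyed on the timestamp.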
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Patel, Shivam Dimplekumar |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | 4 June 2021 |
Thesis Supervisor(s): | Glatard, Tristan and Jaumard, Brigitte |
Keywords: | external factors affecting network performance, telecom networks, data augmentation with NFL events, Predictive analysis, Long Short Term Memory models, improved packet loss prediction |
ID Code: | 988497 |
Deposited By: | Shivam Dimplekumar Patel |
Deposited On: | 29 Nov 2021 17:09 |
Last Modified: | 29 Nov 2021 17:09 |