A signal-free area in a wireless network is called a coverage hole (CH): a location where the signal is either nonexistent or too weak to be detected or monitored. Coverage gaps and places with poor radio frequency (RF) performance can arise when wireless infrastructure components fail to adapt to changing RF dynamics and provide adequate coverage. Because of the importance of network intuition, finding coverage gaps and RF problem spots calls for a client-side approach rather than the traditional infrastructure-driven solution. This article aims to locate coverage gaps or weak-signal places across a variety of scenarios, using 5G KPIs and QoS parameters (QCI, the quality of service class identifier). The primary objective is to apply classification techniques to determine which use cases or network slices are affected by reduced signal strength. The training and test datasets for the supervised machine learning techniques are pre-collected measurement-report data from a live 5G network monitoring counter and data system. Since most KPIs are numerical, the study uses the classification methods ANN, RF, NB, and LR. This differs sharply from traditional methods of gathering coverage-hole data, such as drive tests. Orange Canvas and Microsoft Excel are the data mining tools used for both detection and prediction.


Introduction

In older cellular networks, drive testing, customer complaints, and software/hardware alerts have been used to detect CHs [1]. These approaches are costly, labor-intensive, and unreliable. In an effort to solve some of these problems, the 3GPP standardized the minimization of drive test (MDT) approach [2]. The measurement data from user equipment (UEs), which include RSS and geographic location, may be used by the serving base station (BS) to enhance the CH collection process by creating MDT coverage maps. MDT may, however, still run into problems with UE reporting and positioning/quantization. If location services are not accessible, localization approaches such as those described by [3] must be applied, which may lead to erroneous coverage maps.

The detection of network coverage gaps has been the subject of several studies to date, using a range of methodologies, including data-mining tools. The first, based on analysis of the extended Radio Link Failure (RLF) triggering report (an event-triggered report), requires no field measurements [4]. This method, however, is limited to identifying the worst-case scenarios resulting from connection failures; it cannot recognize other scenarios, such as signal deterioration before a failure, or other data found in regular reports.

Evaluation of ML Algorithms

Among other things, preprocessing the data leads to the production of a class and a model. The performance of the model is then evaluated on test datasets. The model is validated, and the efficacy of the induced model is assessed for potential future application using a variety of machine learning (ML) metrics and measurements [5]. Well-known assessment metrics for classifier performance include the confusion matrix, accuracy, precision, and recall, in addition to the Receiver Operating Characteristics (ROC) curve [6]; the workflow is shown in Fig. 1.

Fig. 1. Machine learning algorithms in orange canvas interface.

Confusion Matrix

The confusion matrix contains both the actual data and the model’s predicted classes. The raw data generated by the classification scheme during testing includes the number of accurate and inaccurate classifications for each class. The basic performance of the classifier may be assessed by comparing the predicted and actual labels. The confusion matrix’s diagonal cells display the proportion of samples that are correctly categorized, i.e., the true results. A false result from a test indicates that the samples were mislabeled and miscategorized. Certain terms, both positive and negative, are essential to determining the classification model’s performance on a batch of double-class labeled data. In [6], the following are defined:

  • Positive occurrences that the model classifier accurately classified as positive are known as True Positives (TP),
  • The negative occurrences that the algorithm correctly identified as negatives are known as True Negatives (TN),
  • Negative occurrences that the classifier mistakenly reported as positive are known as False Positives (FP),
  • Positive occurrences that the classifier inadvertently misclassified as negative are known as false negatives (FN).

A confusion matrix contains the data needed to investigate the proper operation of a classification model. While a confusion matrix holds the data needed to compare several models, it is preferable to condense this data into a single figure that summarizes each model’s relative performance. This may be accomplished by using evaluation parameters, calculated as follows: accuracy, precision, recall, and F-measure.
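The four counts defined above can be tallied directly from actual and predicted labels. The sketch below is a minimal illustration; the labels "hole" and "covered" are illustrative stand-ins for the paper's coverage classes, not names taken from the dataset.

```python
def confusion_counts(actual, predicted, positive="hole"):
    """Count TP, TN, FP, FN for a binary classification run."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif a != positive and p == positive:
            fp += 1
        else:  # a == positive, p != positive
            fn += 1
    return tp, tn, fp, fn

actual    = ["hole", "hole", "covered", "covered", "hole", "covered"]
predicted = ["hole", "covered", "covered", "hole", "hole", "covered"]
print(confusion_counts(actual, predicted))  # (2, 2, 1, 1)
```

The diagonal of the resulting matrix corresponds to the TP and TN counts; everything off-diagonal is an error.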

Accuracy

It is computed as the ratio of all correctly classified instances to the total number of instances, as stated in (1). It is denoted here as AC:

(1) AC = (TP + TN) / (TP + TN + FP + FN)

Precision (Positive Predictive Value)

It is the ratio of correctly classified positive instances to all instances classified as positive. The positive predictive value, denoted P, may be derived using (2):

(2) P = TP / (TP + FP)

Recall (Positive Sensitivity Value)

It is the proportion of correctly classified positive instances among all positive instances, and is often called the positive sensitivity value. It may be calculated using (3):

(3) R = TP / (TP + FN)

F-Measure

It is a model measure that may be used when a balance between recall and precision is necessary:

(4) F-measure = 2 / (1/Precision + 1/Recall)
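Equations (1) through (4) translate directly into code. The counts below are made-up illustration values, not results from the paper's dataset:

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure per (1)-(4)."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 / (1 / precision + 1 / recall)  # harmonic mean
    return accuracy, precision, recall, f_measure

acc, p, r, f = metrics(tp=80, tn=60, fp=10, fn=50)
print(round(acc, 3), round(p, 3), round(r, 3), round(f, 3))
# 0.7 0.889 0.615 0.727
```

Note how the F-measure penalizes the low recall here even though precision is high, which is exactly why it is preferred when a balance is required.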

Receiver Operating Characteristics (ROC)

This classifier model statistic provides a visual representation of the trade-off between true positive and false positive rates. The True Positive Rate (TPR) is the dependent variable and the False Positive Rate (FPR) is the independent variable in the two-dimensional graph. In the section that follows, the classifier (model) that predicts the target classes will be evaluated using the metrics defined above.
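An ROC curve is traced by sweeping a decision threshold over the classifier's predicted scores and recording (FPR, TPR) at each threshold. The scores and labels below are made-up illustration data, not measurements from the paper:

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs for each decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # x = FPR, y = TPR
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0]))
```

A threshold of 0 accepts everything, giving the (1, 1) corner; a threshold above every score rejects everything, giving (0, 0). The curve between these corners shows the trade-off the text describes.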

An analytical method for evaluating the relationship between predictions and observations over many percentiles, often deciles, of the predicted values is the calibration plot. The calibration plot function generates calibration graphs from the observation and prediction columns of a given dataset.

The secondary data was collected from the mobile site, which provides monthly, quarterly, and yearly 5G KPI data together with a Google map displaying the selected geographic areas with the number of Node-Bs and their distribution. Latitude and longitude are retained as location data for the designated area. As of September 2022, the totals gathered comprise 389.89 million mobile subscribers, 251.04 million 5G packet users, 178.68 million wireline internet users, and 105.50 million active access lines.
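A decile calibration check can be sketched as follows: sort the predictions, cut them into equal-sized bins, and compare each bin's mean prediction with its observed rate. This is a minimal sketch with toy data, not the paper's calibration procedure or dataset:

```python
def calibration_table(predictions, observations, bins=10):
    """Per-bin (mean prediction, observed rate) pairs, binned by prediction."""
    pairs = sorted(zip(predictions, observations))
    size = max(1, len(pairs) // bins)
    table = []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate  = sum(o for _, o in chunk) / len(chunk)
        table.append((round(mean_pred, 2), round(obs_rate, 2)))
    return table

preds = [i / 20 for i in range(20)]           # toy scores 0.0 .. 0.95
obs   = [1 if p > 0.5 else 0 for p in preds]  # perfectly separable toy labels
print(calibration_table(preds, obs, bins=4))
```

For a well-calibrated model the two numbers in each pair track one another; large gaps in a bin indicate over- or under-confident predictions in that score range.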

Definition of Threshold Values

When the radio signal strength of a mobile service exceeds the lowest levels permitted for its operation, the service is considered active. But since different operators, suppliers, and technology services have different demands, the requirements differ [7]. For instance, [8] classifies WCDMA coverage regions as deprived when signal strength falls below −95 dBm or signal quality falls below −15 dB. In addition, [9] sets out several criteria by which 19 European countries may decide whether outdoor coverage is provided (covered/not covered).

KPI Definitions

Every KPI includes a definition, a summary of the legacy network, and both a qualitative and quantitative justification.

The whole user data volume moved to and from end-user equipment during a predefined time period, divided by the area covered by the RAN’s radio nodes, yields the traffic volume density. In multi-hop systems, user data is counted only once. The goal of METIS (Mobile and Wireless Communications Enablers for the Twenty-Twenty Information Society) is to boost traffic volume density on current networks by a factor of ten thousand. There is a tight relationship between this KPI and this goal.

Experienced user throughput is the average data throughput generated over a certain period of time by an end-user device at the MAC layer (user plane only). A user’s quality of experience (QoE) with the service they are using might be inferred from this figure. However, because of overheads from protocols and/or higher-layer traffic control (PDCP and RLC in LTE, IP, TCP/UDP/SCTP, etc.), the data rate of the service application is not as high as the experienced user throughput. User performance in real-world circumstances is influenced by test case arrangement, user volume, and data production. In a radio network, these factors also influence neighboring cell interference and cell load.

Latency is the amount of time it takes for a data packet to be sent and reach its destination. Round-trip time (RTT) latency, which measures the time it takes for a transmitting entity to receive confirmation from a receiving entity (such as an internet server or other device), is an alternative way of measuring delay. The MAC layer serves as the measuring reference in each case. This does not cover higher-level processing times, such as those at the application layer for encoding and decoding video and audio. Although the details depend on the test situation, latency is generally affected by the whole network: radio, core, and backhaul/aggregation. The user data plane is the only consideration in the assessment.

One of the assessment criteria used to characterize a radio link connection’s quality in order to meet a certain service level is reliability.

The coverage area is defined as the proportion of locations where availability, an evaluation criterion, satisfies the user’s desired quality of experience. This is computed by dividing the uptime by the total minutes in the specified time frame.

Every METIS innovation, including the system’s overall architecture, is evaluated for energy efficiency using the energy consumption meter.

Unless otherwise indicated in the test scenarios, cost refers to any extra funds needed for the new METIS solution. Consequently, regardless of how much of the present infrastructure is expanded or utilized by the legacy network, its cost is not included in the METIS solution.

Coverage Scenario (Target Class) Definition

The percentage of packets lost over a certain period of time is known as the packet data loss rate, also a measure of reliability. Packet loss instantly affects audio quality, ranging from a single minor loss with no effect to burst losses that cut the audio out entirely. The target should be less than 1% over every 15-second period.

Latency: 5G technology minimizes the time lag between transmitting and receiving data, cutting it from 200 ms down to 1 ms. According to the test case scenarios, a 1521-byte payload with 99.999% reliability would need less than 8 ms.

System Process

After the thresholds and coverage scenarios have been established, the computation starts with gathering and pre-processing the measurement report (MR) data provided during communication by the terminal and base station (BS) equipment. Next, the data is split into testing and training sets. The model framework includes testing, evaluations, data application, and model learning. The model then groups the necessary target classes or coverage possibilities, as shown below. Now, let us look at the coverage scenarios that illustrate the different coverage classes in practical settings.

Classification and Model Evaluation

After preprocessing, the data was categorized using the ANN, RF, LR, and NB algorithms. The dimensions of the classification dataset are documented. In these datasets, the training set consists of exactly 70% of the randomly selected data values, while the testing set consists of the remaining 30%. Here, after the trained model’s assessment on a test data set, the maximum depth and maximum leaf node settings for the optimization are adjusted. We compare the original labels, reported data, and predicted outcomes of ANN classification with RF, NB, and LR in order to verify the classifier’s output.
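The 70/30 split and the four classifiers can be sketched as follows, using scikit-learn as a stand-in for the Orange Canvas learners used in the paper; the synthetic dataset here merely mimics numeric KPI features and is not the paper's measurement data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic numeric dataset standing in for the KPI measurement reports.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# 70% training / 30% testing, randomly selected, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

models = {
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
    "RF":  RandomForestClassifier(random_state=0),
    "LR":  LogisticRegression(max_iter=1000),
    "NB":  GaussianNB(),
}
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
print(scores)
```

Comparing the four accuracy scores on the held-out 30% mirrors the cross-model verification step described above.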

First coverage scenario (Class-1)

As seen in Table I, this condition displays the region where network coverage (PDLR) and signal quality (latency) are below the threshold. This indicates that the latency is below 8 ms and the PDLR is less than 100%. It shows that in these areas there are no problems with network coverage gaps or signal quality.

Parameter        Technology supported   LTE-5GUE   QCI       Packet loss rate   Packet delay budget (latency)
Type             Numeric                Numeric    Numeric   Numeric            Numeric
Unique values    299                    4207       13        3                  3
Has duplicates   True                   True       True      True               True
Sorted           False                  False      False     False              False
Missing count    299                    4186       0         0                  0
Minimum          5                      0          1         0.000001           10
Maximum          5                      20         70        0.01               300
Mean             5                      10.45      24.2      0.003              80.2
Median           5                      1          7         0.001              50
Mode             5                      10         1         0.010              10
Standard dev.    0                      5.796      28.949    0.004              108
Table I. Nature of Numeric Dataset Prior to Pre-Processing

Second coverage scenario (Class-2)

As shown in Table I, this scenario shows the coordinates where the minimum requirements for network coverage (PDLR) and signal quality (latency) are not met. This demonstrates that the latency and PDLR are above 8 ms and above 100%, respectively. It suggests that signal quality and network coverage gaps are problems at these sites.
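The two coverage classes amount to a threshold test on each measurement row. The sketch below uses the 8 ms latency target and the 1% PDLR target from the Coverage Scenario section; the field names and sample rows are illustrative assumptions, not the dataset's real columns:

```python
LATENCY_MS_MAX = 8.0   # 5G latency target from the text
PDLR_MAX = 0.01        # 1% packet data loss rate target (assumed threshold)

def coverage_class(latency_ms, pdlr):
    """Class-1: both KPIs within threshold (no hole); Class-2: coverage hole."""
    ok = latency_ms < LATENCY_MS_MAX and pdlr < PDLR_MAX
    return "Class-1" if ok else "Class-2"

# (latency in ms, packet data loss rate) for four hypothetical locations
rows = [(5.0, 0.001), (12.0, 0.001), (5.0, 0.02), (50.0, 0.05)]
print([coverage_class(lat, loss) for lat, loss in rows])
# ['Class-1', 'Class-2', 'Class-2', 'Class-2']
```

Labeling the measurement reports this way produces the target class column that the classifiers are then trained to predict.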

Prediction of Coverage Hole using ML Algorithms

Quantitative Result

The quantitative result shows that the coverage hole was found using a quantitative target variable, the packet data loss rate (reliability).

Fig. 2 forecasts the coverage hole based on delay and packet loss rate for different slice types. The coverage hole lies close to the red and green plotted points. Green, red, and blue stand for the worst-case scenario, the coverage hole, and the signal’s compliance with the required threshold, respectively.

Fig. 2. Coverage hole prediction based on packet loss rate (reliability).

A feature of the 5G network architecture called Ultra-Reliable Low-Latency Communication (URLLC) allows for more effective scheduling of data delivery. This includes the ability to schedule overlapping transmissions and shorter transfers with larger subcarrier spacing. Under “High Packet Loss and Low Latency,” some packets are lost in transit but eventually reach their destination in a reasonable length of time. Most of the time, packet loss is caused by network congestion. mMTC refers to Massive Machine-Type Communication, while eMBB refers to Enhanced Mobile Broadband. Fig. 2 also shows the influence of slice users’ average arrival rate and of eMBB data packet size on the overall performance of the network.

The QoS Class Identifier (QCI) mechanism is used by 3GPP Long Term Evolution (LTE) networks to ensure that carrier traffic is provided with the appropriate Quality of Service (QoS). The QCI value varies according to the QoS requirements of the different kinds of carrier traffic. Fig. 3 shows the relationship between the QCI and the packet loss rate. Non-privileged subscribers may be assigned UEs/PDNs with a QCI value of 9 as their default bearers. QCIs are specifications for service quality, covering resource type, priority, latency, and packet loss rate; different QCIs can represent different QoS needs. When QCIs are transferred between network components, explicit passing and negotiation of individual QoS values are avoided. In compliance with QCIs, the network components may therefore control the resource type, priority, latency, and packet loss rate requirements of services.

Fig. 3. Quality of service class identifier (QCI) versus packet loss rate.

To access several components of Quality of Service (QoS), including packet error rate, packet latency, and priority level, one can use the 5QI (5G QoS Identifier). QoS characteristics may be divided into two categories: standardized and non-standardized.

Qualitative Result

Based on goal criteria such as latency with a threshold of less than 8 ms, the qualitative result shows that a coverage hole was found. The image shows how the features may be used to efficiently classify the target classes by modifying the software’s output (y-axis) settings to meet the specifications. This facilitates the process of finding the root cause and beginning to forecast the future.

Based on the red and green hues, the scatter plot in Fig. 4 shows larger delays, below 50 ms and below 300 ms. This range does not meet the required delay of less than 8 ms. The test scenarios for gaming, industry, IoT, and transportation contain the latency values that are over the threshold.

Fig. 4. Scatter plot for coverage hole detection based on latency.

The packet loss rate is used in Fig. 5 to discover coverage holes, and the places that exceed the limit are shown by the red and green hues.

Fig. 5. Scatter plot for coverage hole detection based on packet loss rate.

Fig. 6 shows how the packet loss rate is affected by eMBB for smartphones, mMTC for Internet of Things devices and smart cities, and URLLC for the health care test case. Use cases related to transportation and industry also rely on mMTC. By adopting larger subcarrier spacing or by overlapping the transmissions in the neighborhood of the healthcare user, the URLLC service in 5G assures short transmissions. When there are many IoT devices, mMTC displays visible congestion in the cellular random-access channel. Even when smartphone network traffic is heavy, eMBB allows connections to several linked devices with sufficient bandwidth.

Fig. 6. A plot of packet loss rate versus user case type (test case for 5G).

Here, the dataset shows that, throughout the data period, low network coverage predominated over strong network coverage, as shown by the blue and red color sections highlighted in the graph’s legend (Fig. 7). The data is, however, displayed by day of the week rather than by date.

Fig. 7. The 5G user case type versus packet data loss rate visualization.

All user instances had poor signal during the data collection period, characterized as delay >8 ms, as seen in Fig. 8. The 5G standard stipulates that the latency must be less than 8 ms. This result clearly does not meet the condition, so latency appears to be a problem.

Fig. 8. The graph of latency versus user case type.

Figs. 9 and 10 show slice type versus various use case scenarios and packet loss rate scenarios, respectively. The dataset exhibits a stronger URLLC presence than the other two, eMBB and mMTC, for all user instances, as the blue color region and the legend show. The blue URLLC color occupies a larger portion of the plot.

Fig. 9. The plot of slice type versus user case visualization.

Fig. 10. A graph of slice type versus packet loss rate.

The blue-colored zone suggests that the slice type most dependent on URLLC has more good coverage than bad coverage in this dataset.

Conclusion

In this case, secondary 5G network data was examined and evaluated using ML classifiers in order to reduce drive-test inefficiency via knowledge-mining techniques. A model was developed using this data to classify various coverage difficulties, such as the “packet data loss rate” and “latency” issues associated with 5G KPIs.

The primary goal of introducing secondary data-based coverage scenarios classification was to save the operator money and effort. This was accomplished by making it possible to collect data speedily, minimize the requirement for human interaction, and perform network performance evaluation more affordably and easily. Complete coverage data, including traffic produced by user equipment within buildings, may be recorded from any point in space. Additionally, it offers fundamental data assistance for determining the root cause of problems, allowing the optimization team to begin optimization as soon as possible.

References

  1. Akbari I, Onireti O, Imran A, Imran MA, Tafazolli R. How reliable is MDT-based autonomous coverage estimation in the presence of user and BS positioning error? IEEE Wirel Commun Lett. 2016;5(2):196–9.
  2. Hapsari WA, Umesh A, Iwamura M, Tomala M, Gyula B, Sebire B. Minimization of drive tests solution in 3GPP. IEEE Commun Mag. 2012;50(6):28–36.
  3. Ruble M, Güvenç İ. Wireless localization for mmWave networks in urban environments. EURASIP J Adv Signal Process. 2018;2018(1):35. doi: 10.1186/s13634-018-0556-6.
  4. Puttonen J, Turkka J, Alanen O, Kurjenniemi J. Coverage optimization for minimization of drive tests in LTE with extended RLF reporting. 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1764–8. 2010.
  5. Tharwat A. Classification assessment methods. Appl Comput Informatics. 2021;17(1):168–92. doi: 10.1016/j.aci.2018.08.003.
  6. Al-Naymat G, Al-Kasassbeh M, Abu-Samhadanh N, Sakr S. Classification of VoIP and non-VoIP traffic using machine learning approaches. J Theor Appl Inf Technol. 2016;92(2):403–14.
  7. Services M, Coverage L. Assessment of quality of service. 2017.
  8. Kasegenya A, Anael S. Analysis of quality of service for WCDMA network in Mwanza, Tanzania. J Inf Eng Appl. 2015;5(9):18–27.
  9. BEREC. Common position on information to consumers on mobile coverage. 2018. Available from: https://www.berec.europa.eu/sites/default/files/files/document_register_store/2018/12/BoR%2818%29237_Common_position_mobile_coverage.pdf.