Introduction

Every sport has a way to compare athletes’ skills. Runners measure their time over a lap or a marathon, and cyclists use social apps such as Strava. Climbers, in turn, have introduced their own unique grading systems.

 

0-6 grading system

Older scales date from a time when people climbed only in the mountains and climbing served exploration rather than athletic sport.

 

Modern grading system

 

Despite the chart above, it is still unclear whether one route rated 7b is as hard as another 7b. No objective value, such as time or distance, expresses a route’s grade; it rests on the climber’s subjective experience in comparison with other routes. That experience depends on many factors, such as height or weight. That’s why a soft grade, meaning a route that should have been graded lower than it actually is, is a hot issue in the climbing community.

Therefore we came up with the idea of exploring a dataset, available thanks to Kaggle, to look for routes that are rated, say, 7b but should rather be treated as 7c+ or 7a.

 

The Dataset

Kaggle hosts data from a web service containing 4.1 million ascent logs posted by climbers. It provides many fields, but our analysis needs only a few of them.

 

 

 

 

name                      crag_id  grade_id  user_id  fra_routes
Maßarbeit                 16600    55        1476     7b+
Zugabe                    16600    55        1476     7b+
Fingerbeißer              16600    53        1476     7b
Ich Habs Wollen Wis       16600    55        1476     7b+
Halucinacna               0        55        1476     7b+
Nove koreni               27784    55        1476     7b+
Spanelske lzi             27784    53        1476     7b
Deprese                   29360    55        1476     7b+
Gottes Vergessene Kinder  21971    49        1476     7a

Data needed

name - the name the route’s author gave it after the first ascent

crag_id - id of the place where the route is located

user_id - id of the person who climbed the route

A major problem with the dataset is the lack of a route_id column. As a result, the pair (name, crag_id) is the best available approximation of a candidate key for a route. It is not perfect: a crag may contain two different routes with the same name. What’s more, people write the name of the same route in different conventions. Some add the name of a variation or simply make a typo, and the system treats the result as a different route. We nevertheless decided to use the pair (name, crag_id), as it works for the majority of the examples.
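A minimal sketch of how such a composite key could be built with pandas is shown below. The column names follow the sample table; the route_key column, the add_route_key helper, the extra user ids and the whitespace and case normalisation are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy rows shaped like the sample table; real data would come from the Kaggle export.
ascents = pd.DataFrame({
    "name": ["Zugabe", "zugabe ", "Deprese", "Zugabe"],
    "crag_id": [16600, 16600, 29360, 27784],
    "grade_id": [55, 55, 55, 53],
    "user_id": [1476, 2001, 1476, 2002],  # 2001 and 2002 are made-up ids
})

def add_route_key(df):
    """Return a copy with a hypothetical 'route_key' column built from (name, crag_id)."""
    out = df.copy()
    # Light normalisation (an assumption): trim whitespace and ignore case,
    # so trivial spelling variants of one route collapse into one key.
    clean_name = out["name"].str.strip().str.casefold()
    out["route_key"] = list(zip(clean_name, out["crag_id"]))
    return out

keyed = add_route_key(ascents)
n_routes = keyed["route_key"].nunique()
print(n_routes)
```

The two spellings of Zugabe at crag 16600 share one key, while the route of the same name at another crag stays separate. Real typos would need fuzzier matching than this.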

 

 

Seeking for a soft grade

We propose two different ways to find a route that has a high grade but turns out to be easy to climb.

 

User’s local maximum

We explore a user’s climbing progression over time and look for peaks in the chart. We focus on local maxima because such a peak could be a route, for instance an 8c, that the climber did at a time when they were otherwise only able to do significantly easier routes.

 

The chart contains the logs of one of the strongest climbers.

Route mean and dominant grade comparison

 

The second way to look for soft grades is to analyse all the grades people have proposed for a route. Climbers tend to propose the same difficulty as the first ascensionist gave, or simply the one printed in a guidebook. But when a route is much harder or easier than its grade suggests, they break away from it and propose a more accurate grade. Comparing the dominant of a route’s grades (the most frequently proposed grade, i.e. the mode) with their mean reveals this scenario.

 

Preprocessing

 

The dataset needs to be cleaned before the analysis can give useful results. We take the subset of routes whose name occurs more than n times. This is crucial because the central_tendency algorithm calculates the mean and the dominant, which are meaningless on too small a sample. Then, for the local_maximum function, we drop all users who have posted fewer than m ascents. This is essential for properly analysing the peaks in a climber’s progression. Experimentally, we chose a minimum of 500 logs per route and a minimum of 30 routes logged per user. Lastly, we delete the many non-existent routes with names such as “don’t know the name” or “unnamed”.
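The filtering steps above can be sketched roughly as follows. The function name preprocess, the order of the steps and the exact placeholder strings are assumptions; only the two default minimums (500 and 30) come from the text.

```python
import pandas as pd

def preprocess(df, min_route_logs=500, min_user_logs=30,
               placeholder_names=("don't know the name", "unnamed")):
    """Filter ascent logs; a sketch, not the original implementation."""
    # Drop logs whose name is a placeholder rather than a real route.
    is_placeholder = df["name"].str.strip().str.lower().isin(list(placeholder_names))
    df = df[~is_placeholder]
    # Keep routes, identified by (name, crag_id), with enough logs.
    route_logs = df.groupby(["name", "crag_id"])["user_id"].transform("size")
    df = df[route_logs >= min_route_logs]
    # Keep users who still have enough logged ascents.
    user_logs = df.groupby("user_id")["name"].transform("size")
    return df[user_logs >= min_user_logs]

# Tiny made-up example with lowered limits so the effect is visible.
logs = pd.DataFrame({
    "name": ["A", "A", "A", "B", "unnamed", "unnamed", "unnamed"],
    "crag_id": [1, 1, 1, 1, 2, 2, 2],
    "user_id": [10, 10, 11, 10, 10, 10, 10],
})
cleaned = preprocess(logs, min_route_logs=3, min_user_logs=2)
print(cleaned)
```

In this toy run the placeholder rows go first, route B is dropped for having a single log, and user 11 is dropped for having only one remaining ascent.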

Analysis

After filtering the data we were left with 10415 posts covering 832 routes in a data frame. To find local maxima we use the find_peaks function from scipy. It takes the arguments height, threshold, distance, prominence, width, wlen, rel_height and plateau_size. They change the sensitivity of the algorithm in finding peaks. We don’t want noise to be reported as a peak, but on the other hand no significant peak should be omitted.

The most significant arguments are threshold, the vertical distance from a peak to its neighbouring samples, and prominence, best described by its similarity to the topographic prominence of a mountain peak.
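As a minimal sketch of how these two arguments act, the snippet below runs find_peaks on a made-up sequence of grade_id values for one climber. The numbers are illustrative and not taken from the dataset.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical chronological grade_id values logged by one climber.
grades = np.array([40, 42, 41, 43, 55, 44, 45, 44, 46, 47, 46], dtype=float)

# threshold: minimum vertical distance from the peak to both direct neighbours.
# prominence: how far the peak rises above the higher of its two surrounding bases.
peaks, properties = find_peaks(grades, threshold=6, prominence=6)

print(peaks)                      # indices of the accepted peaks
print(properties["prominences"])  # prominence of each accepted peak
```

Here only the jump to 55 at index 4 is reported; the smaller bumps rise just a step or two above their neighbours and are filtered out as noise.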

 

 

Central tendency has one argument, difference: the required gap between the mean value and the dominant. Climbers tend not to change the original grade even if the route is slightly easier than graded. That’s why the difference argument is set very low.
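One possible reading of this check, sketched with pandas: a route is flagged when the mean of its proposed grade_id values lies below the dominant (mode) by at least difference. The function body, the direction of the comparison and the value 0.5 are assumptions for illustration; the text does not state the value actually used.

```python
import pandas as pd

def central_tendency(df, difference=0.5):
    """Flag routes whose mean grade sits below the dominant grade (hypothetical sketch)."""
    grouped = df.groupby(["name", "crag_id"])["grade_id"]
    stats = grouped.agg(
        mean_grade="mean",
        # mode() may return several values on ties; take the smallest one.
        dominant_grade=lambda g: g.mode().iloc[0],
    )
    stats["soft_grade"] = (stats["dominant_grade"] - stats["mean_grade"]) >= difference
    return stats

# Made-up logs: "Soft" has a minority proposing a lower grade, "Fair" has none.
logs = pd.DataFrame({
    "name": ["Soft"] * 5 + ["Fair"] * 5,
    "crag_id": [1] * 10,
    "grade_id": [55, 55, 55, 51, 51, 53, 53, 53, 53, 53],
})
result = central_tendency(logs, difference=0.5)
print(result)
```

For the route named Soft the dominant stays at 55 while the mean drops to 53.4, so the gap exceeds the limit; for Fair both values coincide.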

 

The local maximum argument values are set with a grid search so that the confusion matrix of predicted (local maximum) against actual (central tendency) values fits as accurately as possible.
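A rough sketch of such a grid search is given below. It assumes that a route counts as predicted soft when it is a peak in at least one climber’s log, and that candidates are ranked by accuracy first and precision second; both rules, the helper names and the toy data are assumptions. The candidate values 6 and 9 are the ones mentioned in the text.

```python
from itertools import product

import numpy as np
from scipy.signal import find_peaks

def predict_soft(user_logs, threshold, prominence):
    """Routes that appear as a peak in at least one climber's log (assumption)."""
    flagged = set()
    for routes, grades in user_logs:
        peaks, _ = find_peaks(np.asarray(grades, dtype=float),
                              threshold=threshold, prominence=prominence)
        flagged.update(routes[i] for i in peaks)
    return flagged

def grid_search(user_logs, actual_soft, all_routes, values=(6, 9)):
    """Try every (threshold, prominence) pair; keep the best (accuracy, precision)."""
    best = None
    for threshold, prominence in product(values, repeat=2):
        predicted = predict_soft(user_logs, threshold, prominence)
        tp = len(predicted & actual_soft)
        fp = len(predicted - actual_soft)
        fn = len(actual_soft - predicted)
        tn = len(all_routes) - tp - fp - fn
        accuracy = (tp + tn) / len(all_routes)
        precision = tp / (tp + fp) if tp + fp else 0.0
        score = (accuracy, precision)
        if best is None or score > best[0]:
            best = (score, threshold, prominence)
    return best

# Made-up logs: (route ids in chronological order, grade_id values).
user_logs = [
    (["r1", "r2", "r3", "r4", "r5"], [40, 41, 48, 41, 42]),
    (["r6", "r2", "r7", "r8", "r4"], [40, 42, 41, 55, 42]),
]
all_routes = {f"r{i}" for i in range(1, 9)}
actual_soft = {"r3"}  # stand-in for the central tendency labels

best = grid_search(user_logs, actual_soft, all_routes)
print(best)
```

Comparing the scores as tuples means precision only breaks ties in accuracy; a weighted combination of the two would be an equally reasonable choice.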

 

In our model, the results of the central tendency algorithm are labeled as Actual values and those of local maximum as Predicted.

 

After the experiments we chose the local_maximum configuration with the greatest accuracy and precision. Why are these two metrics the most significant? Accuracy describes how often the prediction is correct. We focused on precision because we want to minimize the number of routes that are predicted as soft_grade but are in fact just ordinary ones. The arguments are set to threshold = 6 and prominence = 6. Threshold = 9 is omitted because the system finds too few true positives with it.

 

 

 

Performance

 

 

With threshold and prominence set to 6 we obtain the following confusion matrix:

 

 

 

                    Real Positive  Real Negative
Predicted Positive  63             136
Predicted Negative  77             556

accuracy: 74%

precision: 32%

recall: 45%

f1: 37%

 

The confusion matrix shows how well the two algorithms agree on the easiness of a route. Because the great majority of routes are properly graded, the share of true negatives is large.
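The reported metrics follow directly from the four counts of the confusion matrix, as this short check illustrates (the variable names are ours):

```python
# Counts taken from the confusion matrix above.
tp, fp = 63, 136   # predicted positive: real positive / real negative
fn, tn = 77, 556   # predicted negative: real positive / real negative

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.0%} precision={precision:.0%} "
      f"recall={recall:.0%} f1={f1:.0%}")
```

The 556 true negatives dominate the numerator of accuracy, which is why accuracy stays high even though fewer than a third of the predicted soft grades are confirmed.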

Conclusion

With accuracy this high, why is precision so low? First of all, something we have not mentioned yet: climbers very often pick routes as their projects. That means they choose one really hard route and try it numerous times. When they finally climb it, the route shows up as a peak in the local_maximum finder and cannot be distinguished from a soft-graded one. This is especially hard to account for, because knowing about projects requires domain knowledge of climbers’ habits and is nearly impossible to deduce from the data.