Summer project as a ChronicCoders Academy Fellow, focusing on leveraging ML with League of Legends
In order to pull off this ML project, I need images of each champion in a roughly similar angle/position. However, I cannot use official splash art or icons, as those all differ in size, angle, resolution, and art style (different artists).
To gather data that I believe would be good for this project, I’ve decided to scrape 3D models of the characters from the website teemo.gg. Here’s what the model viewer looks like:
The model viewer allows you to select the desired champion and skin, rotate and zoom in, and view animations. For the purpose of this project, I will be using the forward-facing pose, non-animated, as seen in the above image. Luckily, it is also the default position the viewer displays the model in.
The scraper should be able to screenshot each skin within the viewer and remove the background (along with the website logo). The end result should look something like this:
The background here was removed using the remove-background function in Microsoft Word. The AI there seems to be able to find the outline of the champion easily, despite the similar shades of purple in this model and the background. However, it does struggle with removing the background between the champion’s body parts and weapons. Hopefully I can improve on this using Python.
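One naive idea I might try in Python is sketched below: sample the background color from a corner pixel and make everything close to it transparent. This is only a first guess at an approach, not my final pipeline, and the filenames are placeholders:

import numpy as np
from PIL import Image

def remove_flat_background(path_in, path_out, tolerance=30):
    # Treat the top-left pixel as the background color and make every pixel
    # within `tolerance` of that color fully transparent. Deliberately naive --
    # it will struggle in the same places Word did (gaps between limbs/weapons,
    # champion colors close to the background).
    img = Image.open(path_in).convert("RGBA")
    data = np.array(img)
    bg_color = data[0, 0, :3].astype(int)
    dist = np.abs(data[:, :, :3].astype(int) - bg_color).sum(axis=2)
    data[dist < tolerance, 3] = 0  # zero alpha = transparent
    Image.fromarray(data).save(path_out)

remove_flat_background("jinx-0.png", "jinx-0_nobg.png")  # placeholder filenames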
The scraper screenshots only the model viewer area (to make removing the background easier). The process looks like this:
There are a lot of scary things I will need to figure out here.
I need to figure out how to iterate through all the champion models on this website. The basic web-scraping tutorials I looked at all suggest finding a pattern in the URL that can be manipulated. So, I started fiddling around, trying to find a pattern in the URL that indicates which skin is being pulled up.
https://teemo.gg/model-viewer?skinid=jinx-0&model-type=champions
https://teemo.gg/model-viewer?skinid=jinx-1&model-type=champions
https://teemo.gg/model-viewer?skinid=jhin-0&model-type=champions
See a pattern?
skinid=jinx-0
skinid=jinx-1
skinid=jhin-0
It seems the skinid is built from the champion name and the skin number, with 0 indicating the default skin. While this pattern is easy to replicate, some difficulties come up. How does the website deal with champion names that contain punctuation, whitespace, or multiple words (examples: Kai’Sa, Nunu & Willump, Miss Fortune)? And how can I know how many skins a champion has?
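Just to make the pattern concrete, here’s a quick sketch of how a viewer URL could be assembled from a champion name and skin number. The normalization step (lowercase, strip punctuation and spaces) is only my guess at how the site forms its ids; part of the problem is confirming that guess:

import string

def guess_skin_url(champion_name, skin_number):
    # Guess at the site's id format: lowercase, no punctuation, no spaces.
    champ_id = champion_name.lower()
    champ_id = champ_id.translate(str.maketrans('', '', string.punctuation))
    champ_id = champ_id.replace(" ", "")
    return "https://teemo.gg/model-viewer?skinid={}-{}&model-type=champions".format(champ_id, skin_number)

print(guess_skin_url("Jinx", 0))    # matches the first URL above
print(guess_skin_url("Kai'Sa", 0))  # my guess: ...skinid=kaisa-0...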
Here’s where I made a mistake: I got caught up too quickly in thinking I could scrape all this information from the inputs on the model viewer.
I thought I could first go to each champion’s default skin, guaranteed to be ‘champion_name-0’, and then scrape the skin input for the number of skins and the names of the skins. This seemed like a solid plan: I could easily land on the default skin and look at the button choices for skins. So I jumped in, scraping the names of the champions from the champion input. However, I did have to manually figure out how champions with problematic names were handled, and fix those entries in my list of champions.
Instead, if I had just taken the time to examine the source code more carefully, I would have seen that the website developers coded in a dictionary with all skins and their corresponding skin ids! I could have skipped that wasted detour entirely!
Now, how to access this beautiful source code?
By using the popular HTML parsing library BeautifulSoup, I can get this information in a parseable, cleaned-up form! First, I pull the ENTIRE page source into a BeautifulSoup object.
import requests
import bs4  # the beautifulsoup4 package

# Create a BeautifulSoup object to examine the source code
res = requests.get('https://teemo.gg/model-viewer')
res.raise_for_status()
ModelSoup = bs4.BeautifulSoup(res.text, 'html.parser')
Then I pull out the <script> tag that contains this champion dictionary:
# Get List of Skins/Chromas and the corresponding URL
skins_html = ModelSoup.findAll('script')
skins_champs = skins_html[9].getText() # contains the dictionary of champions and skins
skins_champs = skins_champs.split(';')
skins_champs.pop(0) # remove the first index which does not contain champ/skin info
This is what the list looks like currently. It is a list of “dictionaries”, with each index being one champ.
So for each index, I will be able to pull out a champion and its skins. Within that index is a dictionary, with entries separated into key/value pairs of URL/skin ID and skin name. Because this is text and not an actual dictionary recognizable by Python, I will need to parse it and build my own dictionary of URLs/skin IDs and skin names. There were 2 ways I thought about doing this:
Method 1: Split the text by commas. The even indices will contain the URLs/skin IDs and the odd indices will contain the skin names. Then use a regex matcher to search for a start and end match and grab the string in between, using the approach mentioned here: https://stackoverflow.com/questions/3368969/find-string-between-two-substrings
Example Splits:
index 0 = {'id': 'aatrox-0'
index 1 = 'name': "Default Aatrox"}
Example Regex Matcher:
url_result = re.search("'id': '(.*)'", str)
name_result = re.search("'name': \"(.*)\"}", str)
Method 2: Split the text by '},{', which is how adjacent dictionaries (one per skin) are separated.
Example Splits:
index 0 = 'id': 'aatrox-0', 'name': "Default Aatrox"
index 1 = 'id': 'aatrox-1', 'name': "Justicar Aatrox "
Example Regex Matcher:
url_result = re.search("'id': '(.*)',", str)
name_result = re.search("'name': \"(.*)\"", str)
I originally went the 1st way, because I didn’t think of the 2nd way until I ran into a bug. My code would break at Gragas, unable to find a regex match.
Do you see where splitting by comma would make my code break, but splitting by dictionary would work better?
“Gragas, Esq.”
That’s right: there was a skin with a comma IN the name, so splitting on commas chopped the name in half. I couldn’t split the string by comma, and instead went with the second method to avoid edge cases like this.
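To see the breakage concretely, here’s a tiny reproduction using the entry format shown above (the skin number in the id is my guess, not taken from the site):

entry = "'id': 'gragas-1', 'name': \"Gragas, Esq.\""  # skin number is a guess

pieces = entry.split(",")
print(len(pieces))  # 3 -- the comma inside the skin name creates an extra piece
print(pieces[1])    # prints:  'name': "Gragas   (the name is cut off mid-string, so the name regex can never match)
print(entry.split("},{"))  # no "},{" inside a single entry, so Method 2 leaves it intact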
Another thing I handled after finding this edge case was stripping punctuation and whitespace from the skin names. Characters like ":" and "/" are not valid in filenames, and since I want to name each image after its specific skin, I needed to make this adjustment. I also replaced spaces with underscores, just as a naming convention.
Let’s take a peek at the final code and output:
import re
import string

import bs4
import requests

# Create a BeautifulSoup object to examine the source code
res = requests.get('https://teemo.gg/model-viewer')
res.raise_for_status()
ModelSoup = bs4.BeautifulSoup(res.text, 'html.parser')

def scrape_model_url(text):
    # Pull the skin id (which doubles as the URL parameter) out of one entry
    url_result = re.search("'id': '(.*)',", text)
    if url_result:
        return url_result.group(1)
    else:
        return "No Match"

def scrape_model_name(text):
    # Pull the skin name out of one entry and sanitize it for use as a filename
    name_result = re.search("'name': \"(.*)\"", text)
    if name_result:
        name = name_result.group(1)
        name = name.translate(str.maketrans('', '', string.punctuation))  # drop punctuation
        name = name.replace(" ", "_")  # underscores instead of spaces
        return name
    else:
        return "No Match"

# Get list of skins/chromas and the corresponding URL
skins_html = ModelSoup.findAll('script')
skins_champs = skins_html[9].getText()  # contains the dictionary of champions and skins
skins_champs = skins_champs.split(';')
skins_champs.pop(0)  # remove the first index, which does not contain champ/skin info

# Create dictionary of URL and skin name
skin_dict = {}
for skins_champ in skins_champs:
    skins = skins_champ.split("},{")  # would split by ",", but some names have a comma
    for skin in skins:
        url = scrape_model_url(skin)
        name = scrape_model_name(skin)
        if url == "No Match" or name == "No Match":
            break
        else:
            skin_dict[url] = name
            prev_name = name
            prev_url = url
    if url == "No Match" or name == "No Match":
        print("Done Scraping Champion Skin Information, Last Skin is " + prev_name + " (" + prev_url + ")")
        break
Now when I visit URLs, I’ll know exactly which skin I’m screenshotting!
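To close the loop, here’s a rough sketch of what the next step could look like: plugging each scraped skin id into the URL pattern from earlier and grabbing a screenshot of the viewer with Selenium. I haven’t committed to a tool yet, so Selenium, the canvas selector, and the fixed sleep are all assumptions/placeholders rather than verified details:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

BASE_URL = "https://teemo.gg/model-viewer?skinid={}&model-type=champions"

driver = webdriver.Chrome()
for skin_id, skin_name in skin_dict.items():
    driver.get(BASE_URL.format(skin_id))
    time.sleep(10)  # crude wait for the 3D model to load; not tuned
    # 'canvas' is a guess at the element that holds the model viewer
    viewer = driver.find_element(By.TAG_NAME, "canvas")
    viewer.screenshot(skin_name + ".png")  # e.g. Default_Aatrox.png
driver.quit()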