Wordlist Generation

INTRODUCTION

Sometimes it can be a little daunting to create custom wordlists. Generally, you either need specialized tools or need to be well-versed in bash scripting. Another issue you’ll encounter is that wordlists tend to be large - very large. As such, when you’re working with them, it’s easy to…

  • accidentally exceed your machine’s memory limits,
  • use up too much disk space, or
  • waste a lot of CPU time.

I’ll try to help you overcome all these issues. In this guide, we’ll use a mixture of specialized tools and bash scripting.

Since wordlist generation should be highly tailored to your specific target, this guide will revolve around a fictional scenario:

We’re attacking a Canadian enterprise that roasts and sells coffee.

They have a website comprising a simple landing page that links to an About Us section.

We also know that their password policy suggests changing passwords either monthly or seasonally. They have no dedicated IT staff, and the employees tend to have an outdated concept of password security.

PLANNING

I’ll apply the following wordlist strategy:

[Flowchart: wordlist strategy - scrape the target site into a target-specific wordlist, concatenate with a list of common passwords, apply a leetspeak rule, append months/seasons, years, and numbers (with an optional separator), then apply capitalization and append-“!” hashcat rules]

PREPARE PRELIMINARY INFO

(1) Make a target-specific set of words

Scrape the website

Since our fictional target has a website, we’ll use CeWL to scrape it and generate an initial list of “meaningful” words specific to this business:

cewl --depth 1 --lowercase --min_word_length 5 --write prairieroastery.txt https://prairieroastery.coffee

☝️ The “important” words should probably be within 1 link of the landing page, and we’ll make everything lowercase.

▶️ This generated a list of 356 words. It’s a good start, but a lot of the words in there aren’t very meaningful - so they probably won’t be parts of passwords.

Make a list of most common words

We want a list of words to filter our previous wordlist by. To do this, we’ll get the N most common English words of length M or longer. I’ll start with one of the lists from this GitHub repo: https://github.com/first20hours/google-10000-english

Let’s only use the top 33% most frequent words of length 5 or greater:

grep -iE '[a-z]{5,}' google-10000-english-usa.txt > common_words_5ormore_letters.lst
wc -l common_words_5ormore_letters.lst  # There are 7781 words... 33% is approx 2500
head -n 2500 common_words_5ormore_letters.lst | tee common_words_thinned.lst

Remove words that are present in the common-words list

Since our list of “common” words is pretty short, we can feed it into grep as a regular-expression file and use it as a filter:

grep -v -f common_words_thinned.lst prairieroastery.txt | tee target_specific.lst

▶️ This reduces the list to 112 words. Much better!
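
☝️ grep treats each line of the -f file as a regex that can match anywhere in a word, so a common word like “roast” would also knock out “roastery”. If you’d rather drop only exact matches, you could swap in fixed-string, whole-line matching (standard grep flags) - note the stricter filter will leave you with a somewhat longer list:

grep -vxFf common_words_thinned.lst prairieroastery.txt | tee target_specific.lst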

(2) Concatenate a list of bad passwords onto the custom list

We’ll now find a list of really bad passwords, sourced from something reputable like Seclists. Then, we’ll concatenate this list onto our customized list of words:

cp /usr/share/seclists/Passwords/xato-net-10-million-passwords-10.txt really_bad_passwords.lst
# add in anything that seems missing from the "really bad" list:
echo "admin" >> really_bad_passwords.lst

Concatenating the wordlists

You could do this simply using cat and sort -u:

# don't do it this way:
cat target_specific.lst really_bad_passwords.lst | sort -u | tee base_wordlist.lst

… but sort has to hold and order the entire input, so it has bad time/space complexity - for large wordlists it’s a bad idea!

It’s better to apply some awk magic:

awk '!seen[$0]++' target_specific.lst really_bad_passwords.lst | tee base_wordlist.lst
Nerd moment: comparing ways to find unique entries 🤓

The awk method uses an associative array as a hashset of lines it has already seen, so the time complexity reduces to O(N).

Method          | Time Complexity | Space Complexity
----------------|-----------------|-----------------
cat and sort -u | O(N log N)      | O(N)
awk and seen    | O(N)            | O(U)

where U is the number of unique entries in the merged wordlists (therefore U <= N)

We can see that the awk and seen method is the clear winner for getting unique entries in a combination of files!
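
If the one-liner looks cryptic: !seen[$0]++ increments a counter keyed on the entire line, and the expression is only true the first time a line appears (when the pre-increment value is 0). Written out long-hand, it’s equivalent to:

awk '{ if (seen[$0] == 0) print $0; seen[$0]++ }' target_specific.lst really_bad_passwords.lst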

(3) Applying a hashcat rule

Now that we’ve made base_wordlist.lst, let’s l33tify it by applying leetspeak.rule:

hashcat --stdout base_wordlist.lst -r /usr/share/hashcat/rules/leetspeak.rule \
| awk '!seen[$0]++' \
| tee leet_wordlist.lst

The result is a wordlist with entries like this:

[Screenshot: sample entries produced by leetspeak.rule]
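
To give a rough idea - the exact substitutions depend on your copy of leetspeak.rule, so these are hypothetical examples:

coffee   ->  c0ff33
espresso ->  3spr3ss0
roastery ->  r04st3ry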

Produce the intermediate wordlists

Next I’ll make the intermediate wordlists:

vim months_and_seasons.lst  # Manually make a list of all month names & their abbreviations, and seasons
seq 0 13 > numbers.lst; echo "69" >> numbers.lst; echo "99" >> numbers.lst
seq 17 20 > years.lst; seq 2017 2020 >> years.lst
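
For reference, the manually created months_and_seasons.lst might look something like this (truncated - the remaining months follow the same pattern):

january
jan
february
feb
march
mar
...
december
dec
winter
spring
summer
fall
autumn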

The plan included appending the “month and year”, so let’s make a wordlist for that, too:

cracken generate -w months_and_seasons.lst -w years.lst "?w1?w2" -o months_and_years.lst
# This makes entries such as `january2019` and `winter20`.
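
Since the mask ?w1?w2 takes the cross product of the two wordlists, the output size is easy to predict - handy for keeping an eye on disk usage:

# expected entries = |months_and_seasons| * |years|
echo $(( $(wc -l < months_and_seasons.lst) * $(wc -l < years.lst) ))
wc -l < months_and_years.lst  # should match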

Now let’s make the next level of intermediate wordlists, by appending each intermediate wordlist with leet_wordlist.lst.

I’ll add in an optional “separator” character: either a period, underscore, or hyphen.

base=leet_wordlist.lst;
touch merged_wordlist.lst;
for intermediate in months_and_seasons.lst years.lst months_and_years.lst numbers.lst; do
	cracken generate -w $base -w $intermediate '?w1?w2' -o no_separator.lst;
	cracken generate -w $base -w $intermediate -c '._-' '?w1?1?w2' -o with_separator.lst;
	cat no_separator.lst with_separator.lst >> merged_wordlist.lst;
	rm no_separator.lst with_separator.lst;
done
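
Before applying the final rules, it’s worth a quick sanity check on the merged list (your counts and samples will vary):

wc -l merged_wordlist.lst
shuf -n 5 merged_wordlist.lst  # peek at a few random candidates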

Apply the other hashcat rules

Since we want to keep this wordlist as small as possible, let’s write some custom rules: one for capitalization, and the other to optionally append an exclamation point (!).

Let’s keep the capitalization rules simple. Here’s what I wrote for easy_capitalize.rule:

# do nothing
:
# toggle the case of the first character (capitalizes lowercase input)
T0
# lowercase the word, then capitalize the first letter and each letter after a hyphen, underscore, or period
e-
e_
e.
# as above, but T0 then toggles the first letter back to lowercase
e-T0
e.T0
e_T0

And here’s append_exclamation.rule:

:
$!
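
You can dry-run a rule file on a single candidate before committing to the full list, e.g. by feeding one word over stdin:

echo 'prairie-roast' | hashcat --stdout -r easy_capitalize.rule

This should print one variant per rule (Prairie-roast, Prairie-Roast, prairie-Roast, and so on), including duplicates where rules collide - hence the awk dedup in the final pipeline.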

For more info on how to write these rules, check out the guide on hashcat’s wiki.

hashcat --stdout merged_wordlist.lst \
-r easy_capitalize.rule \
-r append_exclamation.rule \
| awk '!seen[$0]++' \
| tee final_wordlist.lst
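
☝️ Stacking two -r flags makes hashcat apply every combination of one rule from each file - here, 8 × 2 = 16 transforms per candidate - so it’s worth checking the final size against your disk-space budget:

wc -l final_wordlist.lst
du -h final_wordlist.lst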

Thanks for reading

🤝🤝🤝🤝
@4wayhandshake