
Optimizing Wordlists with Masks

passwords hashcat hashcracking
Jake Wnuk

Last Updated: 8-10-2023.

Note: Password data mentioned in this article was obtained through public resources to improve overall password security posture. Information shared in public breaches helps improve security recommendations.

Introduction #

To crack hashes, practitioners use large wordlists containing likely password candidates. They then combine these wordlists with different attack types, such as rule-based attacks, to try to recover the plaintext values.

Not surprisingly, the best wordlists come from real passwords. The human element in setting passwords creates predictable patterns, and those patterns are exactly what practitioners target. For example, people are far less likely to add numbers to the beginning of a password than to the end. Passwords that start with digits still exist, but they are statistically less common than passwords that end with them.

In this post, I will introduce my methodology for creating new password-cracking wordlists and benchmark them against other popular ones.

Extract, Transform, and Load #

To get started, I dumped the plaintexts of all the cracked hashes on my password archive server, which gives us ~958m (million) passwords for this test. To get the best results possible, I first filtered out low-quality patterns:

# dump passwords
$ wc -l plain-passwords.lst
958104442 plain-passwords.lst

# remove low quality items
$ cat plain-passwords.lst | grep -vE 'http://|https://|@\.com|@\.ru|@\.cn|@\.org|@.*\.net|<tr>|<div>|<a href|<p>|<img src|\$HEX\[|fbobh_|@mail|@msn|@aol|@yahoo|@gmail|@hotmail' | grep -v '[^[:print:]]' > prepped-passwords.lst

$ wc -l prepped-passwords.lst
924073743 prepped-passwords.lst

This dropped the total count to ~924m, removing about 34m lines. That is a lot of data to discard, but because we are making wordlists, any quality filtering goes a long way.
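If you would rather script this step, here is a minimal Python sketch of a similar quality filter. It is an illustration, not the exact pipeline used above. The patterns approximate the grep expression, with http:// and https:// as separate alternatives. `str.isprintable()` stands in for grep's `[^[:print:]]` check. The function names are hypothetical.

```python
import re

# Approximation of the grep -vE patterns: URLs, email fragments, HTML
# snippets, hashcat $HEX[...] encoded entries, and a known junk prefix.
JUNK = re.compile(
    r"http://|https://|@\.com|@\.ru|@\.cn|@\.org|@.*\.net"
    r"|<tr>|<div>|<a href|<p>|<img src|\$HEX\[|fbobh_"
    r"|@mail|@msn|@aol|@yahoo|@gmail|@hotmail"
)


def keep_candidate(line: str) -> bool:
    """Return True if a plaintext passes the quality filter."""
    # isprintable() rejects control characters, similar to grep's [^[:print:]]
    return line.isprintable() and JUNK.search(line) is None


def filter_wordlist(src_path: str, dst_path: str) -> int:
    """Stream src_path into dst_path, keeping good lines; return the kept count."""
    kept = 0
    # surrogateescape turns undecodable bytes into lone surrogates,
    # which isprintable() rejects, so those lines are dropped as well
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.rstrip("\n")
            if keep_candidate(line):
                dst.write(line + "\n")
                kept += 1
    return kept
```

For example, `keep_candidate("Summer2023!")` keeps the line, while `keep_candidate("john@gmail.com")` drops it.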

What’s Behind The Mask? #

One of my favorite strategies for creating wordlists is to use common password masks to filter wordlists for candidates. This way we can avoid seemingly “random” passwords and keep the best quality candidates together.

To do this, I used a tool called Maskcat, which turns plaintext passwords into hashcat masks with additional metadata such as the length and complexity. Additionally, masks that occurred fewer than three times were removed.
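To make the mask format concrete, here is a minimal Python sketch of how a plaintext can become a `mask:length:complexity` line. This is not Maskcat's code. Two details are assumptions inferred from the sample output below. First, complexity is taken to be the number of distinct character sets in the mask. Second, any byte outside hashcat's `?l`, `?u`, `?d`, and `?s` sets maps to `?b`, with one token per UTF-8 byte.

```python
import string

# hashcat's built-in ?s set is the space character plus ASCII punctuation
SPECIAL = set(" " + string.punctuation)


def byte_token(b: int) -> str:
    """Map one byte to a hashcat built-in charset token."""
    ch = chr(b)
    if ch in string.ascii_lowercase:
        return "?l"
    if ch in string.ascii_uppercase:
        return "?u"
    if ch in string.digits:
        return "?d"
    if ch in SPECIAL:
        return "?s"
    return "?b"  # assumption: everything else (e.g. UTF-8 bytes) falls back to ?b


def make_mask(word: str) -> str:
    """Return 'mask:length:complexity' for a plaintext, e.g. '?l?l?d:3:2'."""
    tokens = [byte_token(b) for b in word.encode("utf-8")]
    complexity = len(set(tokens))  # assumption: number of distinct charsets
    return f"{''.join(tokens)}:{len(tokens)}:{complexity}"


print(make_mask("password"))     # ?l?l?l?l?l?l?l?l:8:1
print(make_mask("Summer2023!"))  # ?u?l?l?l?l?l?d?d?d?d?s:11:4
```

The first result has the same shape as the most common mask in the aggregated output.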

# make password masks
$ cat prepped-passwords.lst | maskcat mask > mask-passwords.lst

$ wc -l mask-passwords.lst
920747753 mask-passwords.lst

# aggregate masks
$ cat mask-passwords.lst | sort -T ./ | uniq -c | sort -T ./ -rn > sorted-all-masks.txt

$ head -n 5 sorted-all-masks.txt
83399314 ?l?l?l?l?l?l?l?l:8:1
15711147 ?l?l?l?l?l?l?l?l?l?l:10:1
14774094 ?d?d?d?d?d?d?d?d:8:1
14189820 ?d?d?d?d?d?d?d?d?d?d:10:1
13469846 ?l?l?l?l?l?l?l?l?l:9:1

Doing some math on the results: if we take just the top 5,000 masks, we cover around 86.8% of the plaintext passwords. This is exciting because trimming the wordlist down to the most likely candidates gains us speed and performance.
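The coverage figure is the share of all masked passwords whose mask is among the N most common masks. Below is a minimal Python sketch of that calculation, together with the cutoff for masks seen fewer than three times. The function names are hypothetical, and the data is toy data. At this scale, the counting itself is better left to `sort | uniq -c`.

```python
from collections import Counter


def frequent_masks(masks, min_count=3):
    """Masks sorted by count (descending), dropping those seen < min_count times."""
    counts = Counter(masks)
    return [m for m, c in counts.most_common() if c >= min_count]


def top_n_coverage(masks, n):
    """Fraction of all passwords whose mask is among the n most common masks."""
    counts = Counter(masks)
    total = sum(counts.values())
    covered = sum(c for _, c in counts.most_common(n))
    return covered / total if total else 0.0


sample = ["?l?l?l?l"] * 5 + ["?d?d?d?d"] * 3 + ["?u?l?d?d"] * 2
print(frequent_masks(sample))     # ['?l?l?l?l', '?d?d?d?d']
print(top_n_coverage(sample, 2))  # 0.8
```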

Next up, let's collect all of the dumped passwords and any previously made wordlists to ensure we have complete coverage:

# get all the word sources
$ wc -l wordlist1.lst
363572796 wordlist1.lst

$ wc -l wordlist2.lst
445966458 wordlist2.lst

$ wc -l dumped-passwords.lst
924073743 dumped-passwords.lst

# file sizes
$ ll | grep lst
3.8G -rwxrwxrwx 1 jw jw 3.8G Jul 11 21:01 wordlist1.lst
5.1G -rwxrwxrwx 1 jw jw 5.1G Jul 11 21:11 wordlist2.lst
9.7G -rwxrwxrwx 1 jw jw 9.7G Jul  5 21:46 dumped-passwords.lst

We need to get the top 5,000 password masks from sorted-all-masks.txt, which cover around 86.8% of the database's passwords. We also have metadata from maskcat that we can use to make even more specific wordlists.

After removing the occurrence counts from sorted-all-masks.txt, we can use a regex to keep only the masks with a length of at least eight (8) characters and a complexity of three (3) or four (4).

# top 5k masks overall
$ head top-5k-masks.txt
?l?l?l?l?l?l?l?l
?l?l?l?l?l?l?l?l?l?l
?d?d?d?d?d?d?d?d
?d?d?d?d?d?d?d?d?d?d
?l?l?l?l?l?l?l?l?l
?l?l?l?l?l?l?d?d
?l?l?l?l?l?l?l?l?d?d
?l?l?l?l?l?l?l
?l?l?l?l?l?l?d?d?d?d
?l?l?l?l?d?d?d?d

# masks from top 5k that meet above requirements (drop lengths 3-7; lengths 1-2 cannot reach complexity 3)
$ cat clean-sorted-3to4-complexity-mask-passwords.lst | grep -vE ':[3-7]:[34]$' > clean-sorted-3to4-complexity-ge8-len-mask-passwords.lst

$ head top-5k-3to4ge8-masks.txt
?u?l?l?l?l?l?d?d
?u?l?l?l?l?l?d?d?d?d
?u?l?l?l?l?d?d?d?d
?u?l?l?l?d?d?d?d
?u?l?l?l?l?l?l?d?d
?u?l?l?l?l?l?l?l?d?d
?s?l?l?l?d?d?d?d?d?d
?u?l?l?l?l?l?d?d?d
?u?l?l?l?l?d?d?d
?u?l?l?l?l?l?l?d
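Both mask lists above can be produced from the aggregated output in one pass. The Python sketch below is a hypothetical stand-in for the strip-the-counts and regex steps. It assumes each input line has the `uniq -c` shape shown earlier (`count mask:length:complexity`) and is already sorted by count, most common first. The sample lines other than the first are toy data.

```python
from itertools import islice


def parse_sorted_line(line: str):
    """Split '  42 ?u?l?d?d:4:3' into (count, mask, length, complexity)."""
    count_str, meta = line.split(maxsplit=1)
    mask, length, complexity = meta.strip().rsplit(":", 2)
    return int(count_str), mask, int(length), int(complexity)


def select_masks(lines, top_n=5000, min_len=None, complexities=None):
    """Take the top_n masks, optionally filtering by length and complexity."""
    selected = []
    for line in islice(lines, top_n):
        _, mask, length, complexity = parse_sorted_line(line)
        if min_len is not None and length < min_len:
            continue
        if complexities is not None and complexity not in complexities:
            continue
        selected.append(mask)
    return selected


sample = [
    "  83399314 ?l?l?l?l?l?l?l?l:8:1",
    "  500 ?u?l?l?l?l?l?d?d:8:3",
    "  400 ?u?l?d?d:4:3",
]
print(select_masks(sample))  # ['?l?l?l?l?l?l?l?l', '?u?l?l?l?l?l?d?d', '?u?l?d?d']
print(select_masks(sample, min_len=8, complexities={3, 4}))  # ['?u?l?l?l?l?l?d?d']
```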

Now we push the wordlists through maskcat to keep only the entries that match the most popular masks. This slims the wordlists down to the most probable entries.
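Conceptually, matching is a set lookup: compute each word's mask and keep the word if that mask is in the list. Here is a minimal Python sketch under that assumption. It is not Maskcat's implementation. It assumes hashcat's built-in `?l`, `?u`, `?d`, and `?s` sets, maps every other byte to `?b`, and uses hypothetical function names.

```python
import string

SPECIAL = set(" " + string.punctuation)  # hashcat's ?s set


def word_mask(word: str) -> str:
    """Hashcat-style mask of a word, one token per UTF-8 byte."""
    tokens = []
    for b in word.encode("utf-8"):
        ch = chr(b)
        if ch in string.ascii_lowercase:
            tokens.append("?l")
        elif ch in string.ascii_uppercase:
            tokens.append("?u")
        elif ch in string.digits:
            tokens.append("?d")
        elif ch in SPECIAL:
            tokens.append("?s")
        else:
            tokens.append("?b")  # assumption: non-ASCII bytes map to ?b
    return "".join(tokens)


def match(words, masks):
    """Yield only the words whose mask appears in masks."""
    wanted = set(masks)
    for word in words:
        if word_mask(word) in wanted:
            yield word


top_masks = ["?u?l?l?l?l?l?l?l?d?d", "?s?d?d?l?l?l?l?l"]
print(list(match(["Password12", "password", "$01april"], top_masks)))
# ['Password12', '$01april']
```

Hashing the computed mask keeps the cost per word proportional to the word's length, no matter how many of the 5,000 masks are loaded.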

# making a wordlist
$ cat wordlist1.lst | maskcat match top-5k-masks.txt > top5kmaskswords.lst
$ cat wordlist2.lst | maskcat match top-5k-masks.txt > top5kmaskswords-2.lst
$ cat dumped-passwords.lst | maskcat match top-5k-masks.txt > top5kmaskswords-3.lst

# making a complex wordlist
$ cat wordlist1.lst | maskcat match top-5k-3to4ge8-masks.txt > top5kmaskswords-3to4ge8.lst
$ cat wordlist2.lst | maskcat match top-5k-3to4ge8-masks.txt > top5kmaskswords-3to4ge8-2.lst
$ cat dumped-passwords.lst | maskcat match top-5k-3to4ge8-masks.txt > top5kmaskswords-3to4ge8-3.lst

# sample of a list
$ head top5kmaskswords-3to4ge8-3.lst
$01april
$01august
$01August
$01autumn
$01Autumn
$01december
$01february
$01january
$01march
$01november

After getting all the results, we combined the files and removed duplicate values. In the end, we went from ~16GB (all sources combined and deduplicated) down to ~12GB, which is around 75% of the original size.
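Combining and deduplicating amounts to keeping the first occurrence of each line across all files. Below is a small in-memory Python sketch of that idea. The function name is hypothetical. At the ~16GB scale above you would reach for disk-backed tools such as `sort -u` instead.

```python
def merge_unique(*wordlists):
    """Concatenate wordlists, keeping only the first occurrence of each entry."""
    seen = set()
    merged = []
    for wordlist in wordlists:
        for word in wordlist:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


print(merge_unique(["$01april", "Summer2023"], ["Summer2023", "letmein"]))
# ['$01april', 'Summer2023', 'letmein']
```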

# final sizes
$ wc -l top-5k-masks-3to4ge8.lst
155719008 top-5k-masks-3to4ge8.lst

$ wc -l top-5k-masks.lst
1187881713 top-5k-masks.lst

$ ll
1.7G -rwxrwxrwx 1 jw jw 1.7G Jul 13 22:06 top-5k-masks-3to4ge8.lst
12G -rwxrwxrwx 1 jw jw  12G Jul 13 23:19 top-5k-masks.lst

Now there is a choice to make. The options are:

  • Leave wordlists as they are, with duplicates between each other
  • De-duplicate wordlists between each other

Both have advantages. Leaving the wordlists as they are guarantees that each list has full coverage on its own, at the cost of testing some candidates more than once. Removing duplicates eliminates that repeated work, but the reduced list will miss coverage unless you run both lists.

For this test, we will remove duplicates between the two lists with rli.bin from hashcat-utils. We will also move all the unmatched items into a third wordlist to preserve the data. Specifically, we reduce the top 5k masks list by the top 5k complexity list, so the smaller complexity list keeps its full size and only the larger list shrinks.

# syntax
$ rli.bin -h
usage: rli.bin infile outfile removefiles...

# if the files are too large, try splitting them (-n l/2 makes two chunks without breaking lines)
$ split -n l/2 file.lst

# start by removing entries from the top 5k masks passwords
$ rli.bin top-5k-masks.lst reduced-top-5k-masks.lst top-5k-masks-3to4ge8.lst

# then remove matched entries from the unmatched entries
$ rli.bin remainder.lst reduced-remainder.lst reduced-top-5k-masks.lst top-5k-masks-3to4ge8.lst
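At its core, each rli.bin call above is a set difference: keep the lines of the input file that appear in none of the remove files. The Python sketch below shows only that idea, with hypothetical names and toy data. It is not a reimplementation of rli.bin, and it holds everything in memory.

```python
def remove_lines(lines, *remove_lists):
    """Keep lines (in order) that do not appear in any of remove_lists."""
    to_remove = set()
    for remove_list in remove_lists:
        to_remove.update(remove_list)
    return [line for line in lines if line not in to_remove]


# Toy stand-ins for the three lists in the article
top5k = ["password1", "Summer2023", "letmein"]
complex_list = ["Summer2023"]
remainder = ["letmein", "x9!kq"]

# Step 1: shrink the big list by the complexity list
reduced_top5k = remove_lines(top5k, complex_list)
# Step 2: shrink the unmatched list by both mask lists
reduced_remainder = remove_lines(remainder, reduced_top5k, complex_list)

print(reduced_top5k)      # ['password1', 'letmein']
print(reduced_remainder)  # ['x9!kq']
```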

Let's check out the sizes of the final wordlists:

$ ll
3.5G -rwxrwxrwx 1 jw jw 3.5G Jul 14 14:28 final-remainder.lst
1.7G -rwxrwxrwx 1 jw jw 1.7G Jul 13 22:06 final-top-5k-mask-3to4ge8-passwords.lst
 11G -rwxrwxrwx 1 jw jw  11G Jul 14 13:19 final-top-5k-mask-passwords.lst

$ wc -l final*
  218014370 final-remainder.lst
  155719008 final-top-5k-mask-3to4ge8-passwords.lst
 1057925799 final-top-5k-mask-passwords.lst
 1431659177 total

We moved ~3.5GB of unlikely candidates into their own wordlist and shaved off ~2.4GB of duplicate values, leaving three newly optimized wordlists.

The Results #

To best measure effectiveness, we will split the wordlists into a few different sizes using the same methods above:

Wordlist              Size    Line Count
top5kmasks.lst        11GB    1,030,877,000
top15masks.lst        2.9GB   307,605,705
top5masks.lst         1.6GB   175,981,061
top5kmasks-c8.lst     1.3GB   123,148,699
top15masks-c8.lst     229MB   23,745,089
top5masks-c8.lst      120MB   12,849,424
top5kmasks-c8l.lst    1.3GB   123,148,699
top15masks-c8l.lst    229MB   23,745,089
top5masks-c8l.lst     120MB   12,849,424
top22masks-nd.lst     3.4GB   366,420,774
leftovers.lst         3.4GB   210,183,312

To break down the differences:

  • no-ending: no special filtering, just the top x masks
  • c8: filtered for complexity and for a length greater than or equal to eight (8)
  • c8l: same as c8 but everything in lowercase
  • nd: same as no-ending but skipping masks that are 100% digits
  • leftovers: list containing all of the non-matched items
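Two of these variants are simple transformations. The sketch below shows one plausible way to express them in Python. It assumes masks are made only of two-character built-in tokens such as `?d`, with no literal characters, and the function names are hypothetical. Lowercasing here keeps one output line per input line, which is consistent with c8 and c8l having identical line counts in the table above.

```python
def is_all_digits(mask: str) -> bool:
    """True if a mask like '?d?d?d?d' consists only of ?d tokens."""
    tokens = [mask[i:i + 2] for i in range(0, len(mask), 2)]
    return bool(tokens) and all(token == "?d" for token in tokens)


def no_digits_only(masks):
    """nd variant: drop masks that are 100% digits."""
    return [mask for mask in masks if not is_all_digits(mask)]


def lowercase_variant(words):
    """c8l variant: lowercase every candidate, one output line per input line."""
    return [word.lower() for word in words]


print(no_digits_only(["?d?d?d?d?d?d?d?d", "?l?l?l?l?l?l?d?d"]))  # ['?l?l?l?l?l?l?d?d']
print(lowercase_variant(["Summer2023!", "$01April"]))  # ['summer2023!', '$01april']
```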

For testing, we will use the same document from the Rules Article created by PenguinKeeper and others, so that the process and benchmark are standardized. A table with complete information is provided at the bottom of the article.

One thing to note: because we are using a massive password dump, it is likely that the larger lists directly contain passwords from the test set. Take these results with a grain of salt.

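The following summarizes the results: top5kmasks.lst cracked 3,731,088 hashes (71.765%), the most of any wordlist tested. The runner-up, rockyou2021 (news ref), cracked 40.799% despite being roughly nine times larger (98,378MB versus 10,835MB).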

Application #

Overall, the results show that this strategy is an effective way to create new wordlists from cracked passwords. I would use it in any process where you want to build wordlists that target specific password policy requirements, create optimized wordlists, or build wordlists for hash lists with no known plaintexts.

Appendix: Full Results Table #

Wordlist                                    Cracked   Cracked %  Size (MB)  Keyspace
top5kmasks.lst                              3731088   71.765     10835      1030877000
rockyou2021 (news ref)                      2121189   40.799     98378      8459060239
weakpass_2a                                 2023304   38.917     91742      7884602871
hashesorg2019 (weakpass) (Old)              1989619   38.269     13733      1279729109
hashes.org-2012-2019 (Old)                  1985107   38.182     13639      1270725606
DicAssv1                                    1818202   34.972     216730     16141112024
weakpass_2                                  1787871   34.388     30542      2649982129
kaonashi                                    1715915   33.004     9753       866508697
ALM(PasswdOnly)(freq_sorted)                1692704   32.558     7732       640591900
foordeluxes                                 1669987   32.121     9792       891071188
hibpv6                                      1597316   30.723     10364      892631604
hibpv5                                      1583437   30.456     10171      875298829
hibpv4                                      1577954   30.351     10131      871534311
hibpv3                                      1565746   30.116     9320       837438728
weakpass_1                                  1553174   29.874     37008      3130162774
hibpv2                                      1552743   29.866     9157       821876827
hashes.org-2019                             1548285   29.780     5513       522172105
WHYPHY2 (Not public)                        1542764   29.674     2544       241084970
top22masks-nd.lst                           1545697   29.730     3560       366420774
cyclone_hk                                  1517933   29.196     2624       257823994
foordeluxestuff                             1445497   27.803     5373       482278969
Top2Billion-probable-v2                     1431415   27.532     21745      1973218843
breachcompilation                           1420955   27.331     9641       1012022949
b0n3z-sorted-wordlist                       1416938   27.254     74512      7867573012
b0n3z                                       1395994   26.851     34640      3113289498
hibpv1                                      1362149   26.200     3544       320294199
hashes.org-2018                             1357062   26.102     6430       475531709
HashesOrg (weakpass)                        1275418   24.532     4457       446426190
DCHTPassv1.0                                1274182   24.508     24524      3072260790
Md5decrypt-awesome-wordlist                 1207631   23.228     21083      1844826117
top15masks.lst                              1181174   22.719     3020       307605705
Nummer_DB                                   1176964   22.638     2416       202783735
only_latin                                  1175667   22.613     2318       198098375
antipublic                                  1148475   22.090     1919       189640017
unique_usernames                            1144898   22.021     16874      1246520259
Top353Million-probable-v2                   1099481   21.148     3788       353330260
CoinWordlist                                1087705   20.921     1239       107661196
hashes.org-2015                             1073228   20.643     3253       343103178
passw_from_logs                             1047760   20.153     3035       222339592
EvilGhost                                   976476    18.782     100932     10579628569
elackops                                    949442    18.262     1270       102548616
passcape_comp                               932728    17.940     8204       616095654
InsideProFull                               904516    17.398     1612       154045162
ASLM(freq_sorted)                           899640    17.304     503        41591035
ASLM(freq_sorted) cleaned                   897858    17.270     397        39096069
uniq                                        896504    17.244     2662       243779397
Top109Million-probable-v2                   890325    17.125     1142       109438614
passwords_collection                        889102    17.101     2639       241584732
HyperionOnHackForumsNetRELEASE              889102    17.101     2639       241584732
crackstation                                888963    17.099     15696      1212336035
top5kmasks-c8l.lst                          888222    17.084     1376       123148699
wordlist_by_Kakoluk                         883330    16.990     5069       445871442
MIX_logins-email-2016                       867838    16.692     8432       623974701
hashes.org-2017                             845231    16.257     3546       324025149
hashes.org-2016                             781397    15.030     1177       102117059
18_in_1                                     773850    14.884     39099      5343785797
clem9669_wordlist_large                     772415    14.857     14082      1113453393
kac                                         758212    14.584     1810       170422706
top5kmasks-c8.lst                           746127    14.351     1376       123148699
Super_mega_dic                              732605    14.091     2891       212443106
MegaCracker                                 714010    13.733     1710       148615152
kaonashi14M                                 690846    13.288     138        14344391
ignis-10M                                   652217    12.545     94         10000000
Top29Million-probable-v2                    617523    11.878     299        29040646
hk_hlm_founds                               604118    11.620     408        38647791
rp4                                         577447    11.107     509        47688304
Wordlist_82_million                         571517    10.993     553        62619507
lolwtfhax                                   571517    10.993     553        62619507
clem9669_wordlist_medium                    554971    10.674     3133       193661571
SmolDick                                    528213    10.160     626        40163196
clem9669_wordlist_small                     527078    10.138     511        45054002
realhuman                                   508497    9.781      716        63941069
MECA_Passlist                               508497    9.781      716        63941069
hashkiller-dict                             508287    9.777      253        23685601
eNtr0pY_ALL_sort_uniq                       508228    9.775      914        83653572
hashash.in                                  489853    9.422      221        22777141
top15masks-c8l.lst                          463822    8.921      240        23745089
mathway                                     428797    8.248      167        16498019
14-million-pass - Screetsec                 420958    8.097      140        14344384
rockyou                                     420944    8.097      140        14344359
hashkiller-dict                             407130    7.831      224        18439169
the_best                                    398508    7.665      186        17532884
random_social_usernamesupd                  361512    6.953      1908       154463897
M3G_THI_CTH_WORDLIST_CLEANED                358453    6.895      177        15738781
passwords                                   355047    6.829      194        15851426
top5masks.lst                               338345    6.508      1691       175981061
clem9669_wordlist_large                     331212    6.371      8375       765991502
clem9669_wordlist_medium                    327430    6.298      1175       82065889
dna                                         318152    6.119      168        18216183
livejournal (new ref)                       312965    6.020      216        20266972
clem9669_wordlist_small                     306172    5.889      147        13953734
000webhost                                  295661    5.687      132        10620225
ignis-1M                                    288167    5.543      9          1000000
leftovers.lst                               279328    5.373      3607       210183312
top5masks-c8l.lst                           277234    5.332      126        12849424
SkullSecurityComp                           275411    5.297      72         6693327
mega_slovar                                 260894    5.018      336        31630758
Hashkiller.com-Nilix_Collection             235919    4.538      235        22738835
collect_from_logs                           212141    4.080      192        12560275
top15masks-c8.lst                           201255    3.871      240        23745089
dazzlepod                                   151906    2.922      20         2151235
xsplit                                      147851    2.844      9          939014
xato-net-10-million-passwords-1000000       138406    2.662      9          1000000
10_million_password_list_top_1000000        137453    2.644      9          1000000
xato-net-10-million-usernames               127314    2.449      85         8295455
Hashkiller.com-Wordlists_compilation        113430    2.182      107        10241373
top5masks-c8.lst                            109147    2.099      126        12849424
opencrack_plains_2009                       94288     1.814      31         3046096
vb_passwords                                88290     1.698      7          750449
Argon_Wordlist_v2                           87312     1.679      2011       227784242
under1000k                                  83506     1.606      11         902748
Hashkiller.com-Silver_small_Wordlist        82705     1.591      30         3256289
Top304Thousand-probable-v2                  80643     1.551      3          303872
vkontakte                                   78740     1.515      7          697404
Hashkiller.com-Common_passes                75835     1.459      15         1507155
ignis-100K                                  66511     1.279      1          100000
HugeWordList                                65295     1.256      36         3468996
openwall-all                                58908     1.133      57         5014958
hashkiller.com_irclogs.ubuntu.com_wordlist  57701     1.110      20         2225967
lulzsec                                     55130     1.060      3          366087
bfield                                      53147     1.022      5          541016
passwords.txt                               52543     1.011      6          606659
xato-net-10-million-passwords-100000        51983     1.000      1          100000
InsidePro-(Mini).dic                        51818     0.997      2          242298
openwall.net-all                            46369     0.892      41         3721224
Ashley_Madison                              45864     0.882      4          376120
yahoo                                       42193     0.812      5          453492
rockyou-75                                  35706     0.687      0          59187
darkc0de                                    34061     0.655      15         1471056
Hashkiller.com-Dictionary_SifreList         33480     0.644      6          583783
kippo                                       33178     0.638      1          115226
rockyou-70                                  27834     0.535      0          42661
freerainbowtables                           26269     0.505      3          276259
Ragnarok_Online_2003                        24351     0.468      2          228103
rockyou-65                                  21285     0.409      0          30290
pastebin                                    20406     0.392      2          184172
bad_5                                       19659     0.378      1          96395
hackforums                                  19618     0.377      1          113890
delicious_takoyaki                          18118     0.348      1          72478
supercommon                                 16880     0.325      1          66778
under100k                                   16120     0.310      1          79744
cain                                        15758     0.303      3          306706
CainandAbel                                 15758     0.303      3          306558
rockyou-60                                  15746     0.303      0          21041
mil-dic.txt                                 15498     0.298      1          84245
dico                                        15498     0.298      1          84245
super_pwl                                   15041     0.289      0          58707
richelieu-french-top20000                   14742     0.284      0          20000
milw0rm                                     14699     0.283      1          77950
Most-Popular-Letter-Passes                  12899     0.248      0          47603
InsidePro-(Micro).dic                       12539     0.241      0          21623
Gamers                                      11612     0.223      0          52061
rootkit                                     10752     0.207      1          70095
Top12Thousand-probable-v2                   9990      0.192      0          12645
probable-v2-top12000                        9990      0.192      0          12645
blackhat                                    9415      0.181      0          40443
ignis-10K                                   8820      0.170      0          10000
MAJYPWL                                     8699      0.167      0          30304
darkweb2017-top10000                        8495      0.163      0          9999
rockyou-50                                  7953      0.153      0          9438
xato-net-10-million-passwords-10000         7928      0.152      0          10000
10_million_password_list_top_10000          7925      0.152      0          10000
randompastebin1                             7726      0.149      1          76022
whitefox                                    7038      0.135      0          44166
30k                                         6788      0.131      0          43383
29e5152a                                    6456      0.124      0          49081
nmap                                        4551      0.088      0          5084
mayhem                                      4236      0.081      1          76598
richelieu-french-top5000                    4232      0.081      0          5000
0vhfirstpass                                4043      0.078      1          61576
BruteX                                      2917      0.056      0          3546
john                                        2449      0.047      0          3107
probable-v2-top1575                         1563      0.030      0          1575
Top1575-probable-v2                         1563      0.030      0          1575
xato-net-10-million-passwords-1000          974       0.019      0          1000
ignis-1K                                    963       0.019      0          1000
darkweb2017-top1000                         936       0.018      0          999
honeynet                                    928       0.018      3          226928
500-worst-passwords.txt                     474       0.009      0          500
us-cities                                   390       0.008      0          20580
twitter-banned.txt                          353       0.007      0          370
twitter-banned                              353       0.007      0          370
cewl_dvwa_password                          302       0.006      0          1831
probable-v2-top207                          207       0.004      0          207
Top207-probable-v2                          207       0.004      0          207
malenames-usa-top1000                       102       0.002      0          1000
xato-net-10-million-passwords-100           100       0.002      0          100
darkweb2017-top100                          91        0.002      0          99
femalenames-usa-top1000                     87        0.002      0          1000
familynames-usa-top1000                     50        0.001      0          1000
online_brute                                50        0.001      0          51
clarkson-university-82                      48        0.001      0          82
common_pass                                 34        0.001      0          52
short                                       19        0.000      0          27
policy                                      16        0.000      0          17
telnet_cisco_default_pass                   16        0.000      0          23
best15                                      15        0.000      0          15
darkweb2017-top10                           10        0.000      0          10
xato-net-10-million-passwords-10            10        0.000      0          10
splashdata_2015                             6         0.000      0          25