Smashing Hashes with Token Swapping Attacks

Tags: Hashcracking Passwords


There are many ways to crack passwords, and offensive tooling for password security has provided a lot of new and exciting attack methods to reveal plaintext. In my previous post, I stated the following:

Not surprisingly, the best wordlists come from actual passwords as the human element in setting passwords tends to permeate through to create predictable patterns that are often targeted.

Some of my favorite attack techniques feed existing found material back into the cracking process. There are a few ways to do this, but today I want to introduce my personal favorite: Token Swapping.

Token Swapping Concept

To best understand the attack, I first want to introduce two principles that password crackers abuse to recover plaintext.

First, human passwords often share patterns. When people develop a new password, they tend to follow patterns to make it easier to remember. However, they don’t realize that other people also follow similar patterns.

For example, if I said I needed a password that included the following:

  • Upper Case Character
  • Lower Case Character
  • Digits
  • Special Character

You might have come up with something like Word + Digits + Special, either because that is the order I provided them in or because most English speakers read from left to right, so it reads more naturally.

Second, secret material is often shared or reused, especially among a shared user pool. For example, people use similar words when setting passwords, such as years, first names, pet names, religious items, places, and phrases. Other items, like the product or service name, will likely appear when a series of hashes comes from the same source.

These principles show that seemingly unrelated passwords may have more in common than initially seen.

If the best word lists come from actual passwords and passwords from the same source share many patterns, then the most logical step is to incorporate other passwords from the same or similar source into new word lists.

The Token Swapping Attack

The logic we want to replicate is:

  1. Split plaintext into segments
  2. Convert each token into a hashcat mask
  3. Find other plaintexts that contain the same mask
  4. Swap the token into those plaintexts using the mask
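
The steps above can be sketched in a few lines of Python. This is a simplified illustration of the logic, not the actual maskcat/PTT implementation; the helper names and the fixed token length are my own choices:

```python
import re

def to_mask(text):
    # Convert each character to its hashcat mask token.
    return "".join(
        "?l" if c.islower() else
        "?u" if c.isupper() else
        "?d" if c.isdigit() else "?s"
        for c in text
    )

def tokenize(text):
    # Split plaintext into runs of letters, digits, and specials.
    return re.findall(r"[a-zA-Z]+|[0-9]+|[^a-zA-Z0-9]+", text)

def token_swap(plaintexts, token_len=4):
    # Step 1-2: harvest tokens of the requested length from all founds.
    tokens = {t for p in plaintexts for t in tokenize(p) if len(t) == token_len}
    out = set()
    for p in plaintexts:
        for t in tokens:
            # Steps 3-4: swap the token into every position whose mask matches.
            for i in range(len(p) - token_len + 1):
                if to_mask(p[i:i + token_len]) == to_mask(t):
                    out.add(p[:i] + t + p[i + token_len:])
    # Drop candidates we already have in plaintext.
    return out - set(plaintexts)
```

For example, given the founds love1234 and hate9999, the swap produces the new candidates hate1234 and love9999, since the word and digit tokens share masks across both plaintexts.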

This provides two benefits:

  • Plaintext material is preserved
  • Plaintext masks and structure are preserved

This means we can feed the existing founds back into the process exactly as they were seen and place them directly into known patterns without guessing the password structure.

Flow of the token swap attack
An example flow of the token swap attack logic

This process is also highly generative: the number of new candidates grows quickly with the token size and the number of available plaintexts to swap.

Testing On Data

To test this, we will use a few publicly available breaches where many of the quick-win items have already been picked off. No names or hash types will be shared; breaches will be referred to by their ID number. We will only be working with the left hashes, so all the low-hanging fruit should already be gone.

Test 1

To kick it off, returning to 1865: I had found a pattern in the remaining hashes but could not enumerate it with rule, hybrid, or mask attacks. Returning to this list with this new technique makes sense, as many of the previous founds appear to have been generated into seemingly random patterns with the same character set.

rrRRRR3R
UUU4UUUk
fffKK1KK
kkkkk7kf
TT3TTTTE
eeee5eSS
mNNNN2NN
yyyyyRR3
SS3SSJJJ
hNNNN2NN

If we look at the mask structure, we can see some patterns but nothing that would be super easy to enumerate:

$ cat freq-sorted-maskcat-masks.tmp | head
    107 ?u?u?d?u?u?u?u?u:8:2:192
    103 ?u?u?u?d?u?u?u?u:8:2:192
    103 ?l?l?l?l?l?d?l?l:8:2:192
    103 ?d?u?u?u?u?u?u?u:8:2:192
    100 ?u?d?u?u?u?u?u?u:8:2:192
     98 ?l?l?l?l?l?l?d?l:8:2:192
     73 ?l?l?l?l?l?l?l?d:8:2:192
     72 ?l?l?l?l?d?l?l?l:8:2:192
     42 ?l?l?l?l?l?l?d?u:8:3:192
     32 ?u?u?d?l?l?l?l?l:8:3:192
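
A frequency-sorted list in this format can be reproduced from plaintexts with a short script. The trailing fields are the mask's length, its number of distinct charsets, and the sum of the charset sizes (?l/?u = 26, ?d = 10, ?s = hashcat's 33-character special set), which is consistent with the numbers above; this is a minimal approximation for illustration, not maskcat itself:

```python
from collections import Counter

# Charset sizes for the trailing "size" field.
SIZES = {"?l": 26, "?u": 26, "?d": 10, "?s": 33}

def mask_of(word):
    return ["?l" if c.islower() else
            "?u" if c.isupper() else
            "?d" if c.isdigit() else "?s" for c in word]

def mask_line(word):
    m = mask_of(word)
    # mask : length : distinct charsets : sum of charset sizes
    return "%s:%d:%d:%d" % ("".join(m), len(m), len(set(m)),
                            sum(SIZES[t] for t in m))

def freq_sorted(words):
    # Count identical mask lines and print most common first.
    counts = Counter(mask_line(w) for w in words)
    return ["%7d %s" % (n, m) for m, n in counts.most_common()]
```

Running mask_line on the sample found rrRRRR3R yields ?l?l?u?u?u?u?d?u:8:3:192, matching the structure of the list above.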

When I first found the pattern, I could enumerate 618 new founds, and when I returned, 994 were left. To get started, we can repeat the token swapping attack to make a new word list, then run it with 5,000 rules to see how it works.

Session..........: tokenswap
Status...........: Exhausted
Hash.Mode........: *
Hash.Target......: 1865.left
Time.Started.....: Wed Dec 21 04:05:03 2022 (0 secs)
Time.Estimated...: Wed Dec 21 04:05:03 2022 (0 secs)
Kernel.Feature...: Optimized Kernel
Guess.Base.......: File (/root/.local/share/hashcat/sessions/2711.induct/hashcat.loopback.1671595495_1860)
Guess.Mod........: Rules (/files/5k.rule)
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:   345.6 MH/s (0.61ms) @ Accel:256 Loops:256 Thr:128 Vec:1
Recovered........: 991/994 (99.70%) Digests (total), 0/994 (0.00%) Digests (new), 991/994 (99.70%) Salts
Progress.........: 4925270000/4925270000 (100.00%)
Rejected.........: 0/4925270000 (0.00%)
Restore.Point....: 991/991 (100.00%)
Restore.Sub.#1...: Salt:993 Amplifier:4864-5000 Iteration:0-256
Candidate.Engine.: Device Generator
Candidates.#1....: Wzzzzzz06 -> rrrGGG1G7
Hardware.Mon.#1..: Temp: 54c Fan:  0% Util: 64% Core:2055MHz Mem:9251MHz Bus:16

…and 99.70% of the remaining hashes were recovered with this method.

Test 2

Okay, so what if that was a fluke? Let us test again on 465, but this time we will directly compare looping over the existing founds with rules against token swapping. This way, we can get an idea of its effectiveness.

To start, we can take the existing founds and run an attack with the same 5,000 rules from before. For this one, we will include --loopback; note that the final runtime was ~2 hours.

Session..........: ruleloopback
Status...........: Exhausted
Hash.Mode........: *
Hash.Target......: 465.left
Time.Started.....: Tue Dec 20 21:58:17 2022 (22 mins, 29 secs)
Time.Estimated...: Tue Dec 20 22:20:46 2022 (0 secs)
Kernel.Feature...: Optimized Kernel
Guess.Base.......: File (/root/.local/share/hashcat/sessions/10.induct/hashcat.loopback.1671566861_7605)                                                    
Guess.Mod........: Rules (/files/5k.rule)
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........: 61521.6 kH/s (0.58ms) @ Accel:128 Loops:256 Thr:512 Vec:1
Recovered........: 161/111762 (0.14%) Digests (total), 161/111762 (0.14%) Digests (new), 142/107691 (0.13%) Salts                                           
Remaining........: 111601 (99.86%) Digests, 107549 (99.87%) Salts
Recovered/Time...: CUR:1,N/A,N/A AVG:0.31,N/A,N/A (Min,Hour,Day)
Progress.........: 82922070000/82922070000 (100.00%)
Rejected.........: 0/82922070000 (0.00%)
Restore.Point....: 154/154 (100.00%)
Restore.Sub.#1...: Salt:107690 Amplifier:4864-5000 Iteration:0-256
Candidate.Engine.: Device Generator
Candidates.#1....: * -> *
Hardware.Mon.#1..: Temp: 62c Fan: 49% Util: 62% Core:1965MHz Mem:9251MHz Bus:16

Then we can compare it to a token swap attack with token sizes 4-7 with no rules and ~6.5 hours of runtime.

Session..........: tokenswap
Status...........: Exhausted
Hash.Mode........: *
Hash.Target......: 465.left
Time.Started.....: Tue Dec 20 22:32:45 2022 (6 hours, 26 mins)
Time.Estimated...: Wed Dec 21 04:59:16 2022 (0 secs)
Kernel.Feature...: Optimized Kernel
Guess.Base.......: File (mutate.lst)
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:  4400.3 MH/s (0.06ms) @ Accel:512 Loops:1 Thr:128 Vec:1
Recovered........: 2386/111762 (2.13%) Digests (total), 2225/111762 (1.99%) Digests (new), 2181/107691 (2.03%) Salts
Remaining........: 109376 (97.87%) Digests, 105510 (97.97%) Salts
Recovered/Time...: CUR:0,289,N/A AVG:5.76,345.40,N/A (Min,Hour,Day)
Progress.........: 196338334616547/196338334616547 (100.00%)
Rejected.........: 131969935950/196338334616547 (0.07%)
Restore.Point....: 1823163817/1823163817 (100.00%)
Restore.Sub.#1...: Salt:107690 Amplifier:0-1 Iteration:0-1
Candidate.Engine.: Device Generator
Candidates.#1....: zzz298756zzz -> саша9999
Hardware.Mon.#1..: Temp: 70c Fan: 65% Util: 64% Core:1950MHz Mem:9251MHz Bus:16

While we can’t quite compare these one to one, the token swap attack recovered 2,386 total cracks (2,225 new), while the loopback recovered just 161 (roughly 14x more new founds). Additionally, if we compare the time investment, loopback yielded around 80 founds/hr, while the token swap yielded about 367 founds/hr.

If we look at the submission, we see that many of these were also unique founds:

| Hash List ID | Found Hashes | New Plains |
| --- | --- | --- |
| 465 | +2,386 | +1,374 |

Test 3

For the last test, we will test this attack’s sustainability. Theoretically, the attack should keep producing new material across rounds because the swaps are generated from whatever founds currently exist. Let’s put that to the test by repeating it again and again on the same target, 2221.

Instead of using the 5K rule again, we will make a tiny rule file (`:` passes the candidate through unchanged, and `]` deletes its last character) to see if the candidates are doing the heavy lifting or if the rules are.

$ cat rule.tmp
:
]

The target hashes are very slow to calculate, so this attack will be repeated three (3) times with mutated word lists of a few different sizes to see how list size impacts the results.

First, we can make a word list using all the existing founds. The final size of the first word list was 35 GB.

Recovered........: 54016/8900340 (0.61%) Digests (total), 0/8900340 (0.00%) Digests (new), 17/1048387 (0.00%) Salts

For the second round, we can make a word list using only the new founds from the previous round. The final size of the second word list was 5.3 GB.

Recovered........: 65472/8900340 (0.74%) Digests (total), 0/8900340 (0.00%) Digests (new), 19/1048387 (0.00%) Salts

For the final round, we can make another word list using only the new founds, this time larger but with the token size limited to six (6), five (5), and four (4). The final word list was 14 GB, but the attack was stopped about 13.4% of the way through.

Recovered........: 68038/8900340 (0.76%) Digests (total), 0/8900340 (0.00%) Digests (new), 20/1048387 (0.00%) Salts

Finally, let us look into the masks of the new founds to see the diversity of patterns we cover with this method.

   3325 ?d?d?d?d?d?d?d?d?d?d:10:1:100
   3039 ?u?l?l?l?l?s?d?d?d?d:10:4:203
   2712 ?u?l?l?l?s?d?d?d?d:9:4:177
   2242 ?l?l?l?l?d?d?d?d:8:2:144
   2161 ?l?l?l?l?l?d?d?d?d:9:2:170
   2067 ?u?l?l?l?l?l?s?d?d?d?d:11:4:229
   1845 ?l?l?l?l?l?l?d?d?d?d:10:2:196
   1771 ?l?l?l?l?l?l?l?l:8:1:208
   1738 ?l?l?l?l?l?l?l?l?l:9:1:234
   1460 ?u?l?l?l?l?s?d?d?d:9:4:193
   1272 ?l?l?l?l?l?l?l?l?l?l:10:1:260
   1239 ?u?l?l?l?l?l?s?d?d?d:10:4:219
   1104 ?l?l?l?l?s?d?d?d?d:9:3:177
   1091 ?l?l?l?l?l?l?d?d:8:2:176
   1046 ?l?l?l?l?l?l?d?d?d:9:2:186
   1016 ?l?l?l?l?l?s?d?d?d?d:10:3:203
    919 ?l?l?l?l?l?l?l?d?d:9:2:202
    917 ?l?l?l?l?l?l?l?l?l?l?l:11:1:286
    879 ?l?l?l?l?l?d?d?d:8:2:160
    821 ?u?l?l?l?l?l?s?d?d:9:4:209

| Hash List ID | Found Hashes | New Plains |
| --- | --- | --- |
| 2221 | +65,796 | +60,664 |

Improving The Token Swapping Technique

The above examples are from early 2023. As I write this update, it has been around a year and a half since the first post, and I wanted to provide some updates to the technique and its overall usage. The examples above follow the flow diagrammed earlier using a tool called maskcat, which has since been incorporated into the tool PTT. However, they contain one issue: if the user-supplied input is not of high quality, the swaps being performed can quickly drift away from the intended source.

Because the attack is designed to create a large amount of material that replicates the source, input quality is vital to its core function. If the user supplies poor input, the output will be suboptimal. This is why isolating patterns with matching masks before swapping is generally recommended: the user then knows how the swaps will likely take place.

Then, in maskcat, the user Shooter3k suggested the concept of a retain mode, which would create partial masks by preserving key tokens. This was suggested completely independently of the swapping concept, but I realized it solved one of its key issues: the ability to control how metadata moves around and to preserve key structure before the swaps.

This was implemented into the logic of a new method, which is the current implementation in PTT. With it, a user can control exactly where a swap occurs and which components should never be swapped. It was a small change that added a pre-processing step: partially converting the input into partial masks. The goal is to increase the likelihood of the user reaching the intended outcome, at the cost of some additional complexity along the way.
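
The retain pre-processing step can be sketched as follows: any substring that appears in the retain list is kept literal, and every other character is converted to a mask token. This is a simplified illustration of the idea, not PTT's actual code:

```python
def mask_retain(word, retain):
    # Keep retained tokens literal; mask every other character.
    out, i = [], 0
    while i < len(word):
        for tok in retain:
            if word.startswith(tok, i):
                out.append(tok)   # preserve the key token as-is
                i += len(tok)
                break
        else:
            c = word[i]
            out.append("?l" if c.islower() else
                       "?u" if c.isupper() else
                       "?d" if c.isdigit() else "?s")
            i += 1
    return "".join(out)
```

With love in the retain list, love@123 becomes love?s?d?d?d and @123love becomes ?s?d?d?dlove: the structure is preserved while the swap surface is reduced to the masked run.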

Flow of the second form of token swap attack
An example flow of the new token swap attack logic

One key use of token swapping is simulating the target in keyspaces that are very difficult to attack with other methods. A simple example is a gigantic repetitive keyspace like digits. A keyspace of ?l?l?l?l?l?d?d?d?d?d?d?d?d?d?d?d?d is challenging to enumerate unless the target material is likely to share metadata. In this manner, we go from a huge keyspace to staying within the “cluster”, making the processing much more efficient.

When simulating non-repetitive material, partial masks let you identify common metadata and then use retain to ensure it is always present. The results are higher quality and simulate the target more closely than if that metadata could be swapped out at random.

Swapping excels at creating a word list that closely emulates the targets, onto which rules can be applied and from which tried items can easily be removed. As a primary method, it is well suited to finding potentially missed patterns, emulating target material, and covering difficult-to-target attack surfaces.

Application

This technique is great because it combines multiple password-cracking aspects into a repeatable methodology. The following is a simple example of how to implement this technique with PTT:

$ cat pass.lst
love@123
@123love

$ cat retain.txt
love

Create retain masks:

$ ptt -f pass.lst -tf retain.txt -t mask-retain | tee retained.mask
?s?d?d?dlove
love?s?d?d?d

Then swap on them with matching values:

$ cat swap.lst
$333
#888
#123

$ ptt -f retained.mask -tf swap.lst -t mask-swap
#123love
$333love
love$333
love#888
love#123
#888love
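
The swap step on those partial masks can be sketched similarly: any swap token whose mask matches a masked run is substituted in, while the retained text stays fixed. Again, this is a simplified illustration of the logic rather than PTT's implementation:

```python
def to_mask(text):
    # Convert each character to its hashcat mask token.
    return "".join("?l" if c.islower() else
                   "?u" if c.isupper() else
                   "?d" if c.isdigit() else "?s" for c in text)

def mask_swap(partial_masks, swaps):
    out = set()
    for pm in partial_masks:
        for s in swaps:
            m = to_mask(s)
            # Substitute the swap token where its mask appears
            # in the partial mask; retained text is untouched.
            if m in pm:
                out.add(pm.replace(m, s, 1))
    return sorted(out)
```

Running it on the two retained masks and three swap tokens above reproduces the same six candidates PTT emits, since each swap token shares the mask ?s?d?d?d.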

Overall, I would recommend it for the following:

  • Hash lists where other attacks are exhausted. This attack can quickly generate a massive number of new candidates from very little source material.
  • Fast algorithms where you can exhaust your word lists quickly and need to generate new ones.
  • Hash lists where options like loopback, mask attacks, and hybrid attacks cannot enumerate the pattern.
  • “Random” or generated candidates where the character set is consistent.
  • Large repeated keyspaces where metadata is likely to be shared among the user pool.

I would not recommend it for the following:

  • Slow hashes where the wordlist size becomes a significant factor in completion time.
  • Hash lists with little shared material between plaintexts, or random collections of hashes.

Thanks for reading.

Reference

The following are aliases referenced above:

# unique sort file

usort() {
	if [[ $# -ne 1 ]]; then
		echo 'unique sort file in place'
		echo 'EXAMPLE: usort <FILE>'
	else
		# Quote the filename so paths with spaces work.
		LC_ALL=C sort -u "$1" -T ./ -o "$1"
	fi
}