volatile write

Build your own SQLite -- Codecrafters

I recently went through the "Build your own SQLite" challenge on Codecrafters -- an education platform with tracks "designed for experienced engineers to build complex software".

I learned of the platform from Jon Gjengsets video about it. The full access to the challenges is paid content, only the first two tasks per challenge are free and I wanted to evaluate to see whether I would sign up. I then chose SQLite because I felt it was best suited for me to evaluate the platform within the first two stages because:

  1. I work for a database company and while I am not working on the actual database directly, I am at least in theory familiar with the concepts. I've never looked into the details and how SQLite really works, so it has a good amount of overlap with what I do day-to-day that I was fairly confident I would be able to complete all stages in terms of understanding and interest. (It's not only challenge for which that is the case, though)
  2. The SQLite track was the only one with a stage that is classified as Hard within the first two stages and I wanted to get a feeling for what it means for a stage to be Hard. Also, SQLite seems to be mostly Hard stages, so I could also get a feeling for what looks to be somewhat of an upper bound difficulty wise. And since I has some rough familiarity with the topic, I felt I would be better able to judge difficulty. I would say that this was probably my main deciding factor.
  3. Jon did cover BitTorrent, not SQLIte didn't already cover that topic, so there was nothing to cheat from :)

After the first two tasks, I got offered a 40% discount and with a long weekend ahead of me, I paid for a year long membership, and went on to finish the rest of the challenge.

Here's the repo with my code: https://github.com/knutwalker/codecrafters-sqlite-rust

Overall, I enjoyed the SQLite challenge a lot. The second task comes with quite a steep increase in difficulty, going from "uncomment this line" to "read a btree from the db file, here's a link to the docs, glhf". I didn't really knew anything beforehand about how the challenges are structured, so this set some expectations that solving the tasks requires me to do some research on my own and it's not just a "paint-by-numbers" style tutorial. I think this is quite important to actually get to build my own SQLite. I was a bit overzealous in how I solved the second task, but that also set me up to breeze through the next tasks, which also added to the feeling the second task was quite a steep increase, since I basically did main work for the next three or so tasks already.

I think the selection of tasks for the challenge are good and make you have to understand most of the details of the btree layout without complicating it, like I would be doing an MVP style sqlite. That is, there were no overflow pages and not really any edge cases to consider. I also spent a while on refactoring my code after I completed all the tasks, even making sure that the output matches what the sqlite3 command would produce.

One thing that I would say is missing is doing more than reading the database, i.e. updating, deleting and creating records (or even tables), how to do transactions, rollback, etc.

I ran a few comparisons against the original sqlite3 binary. I took the companies.db example table, which was about 8 MiB in size, and ran a few INSERT INTO FROM SELECT * like statements on it to blow it up to about 1 GiB (which is supposed to be size of the db that is being used in the final task).

First, I verified that we produce the same output, using the example query from the task.

sqlite3 companies-big.db "SELECT id,name from companies where country = 'eritrea'" > sqlite-output.txt

./target/release/sqlite-starter-rust companies-big.db "SELECT id,name from companies where country = 'eritrea'" > mysqlite-output.txt

diff {my,}sqlite-output.txt  #no output -- no difference

Then, I ran a few benchmarks using hyperfine and spent some time to improve the performance of, ehem, mysqlite 🙈.

hyperfine --warmup=10 -N -L engine 'sqlite3,./target/release/sqlite-starter-rust' "{engine} companies-big.db \"SELECT id,name from companies where country = 'eritrea'\""
Benchmark 1: sqlite3 companies-big.db "SELECT id,name from companies where country = 'eritrea'"
  Time (mean ± σ):       5.1 ms ±   0.3 ms    [User: 1.2 ms, System: 1.9 ms]
  Range (min  max):     4.6 ms    6.6 ms    542 runs

Benchmark 2: ./target/release/sqlite-starter-rust companies-big.db "SELECT id,name from companies where country = 'eritrea'"
  Time (mean ± σ):       4.4 ms ±   0.2 ms    [User: 1.4 ms, System: 2.3 ms]
  Range (min  max):     4.0 ms    5.5 ms    559 runs

Summary
  ./target/release/sqlite-starter-rust companies-big.db "SELECT id,name from companies where country = 'eritrea'" ran
    1.18 ± 0.10 times faster than sqlite3 companies-big.db "SELECT id,name from companies where country = 'eritrea'"

This shows both implementations to be in the same ballpark, but this is also not a fair comparison, since sqlite3 does so much more, being a full SQLite implementation and all that.

Overall, it had a good time to spend a long weekend on it.

Thoughts? Leave a comment