Introduction
I started with the Reddit post here.
From there, I installed ollama and downloaded the recommended "qwen3-vl:8b" model.
But there was no easy way to interface with the model, so I found sn2md.
Using these two tools together, I was able to set up a really nice workflow for converting my .note files to .md.
Hardware
2x Xeon E5-2699, 125.8 GB RAM, RTX 3060 (12 GB VRAM).
Software
Linux Mint 22.2, sn2md, ollama
Models Used
mirage335/Qwen-3-VL-8B-Instruct-virtuoso:latest (9.8 GB)
adelnazmy2002/Qwen3-VL-4B-Instruct:Q4_K_M (3.3 GB)
adelnazmy2002/Qwen3-VL-8B-Instruct:latest (6.1 GB)
qwen3-vl:8b (6.1 GB)
glm-ocr:latest (2.2 GB, occupies 4+ GB of VRAM)
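Each model can be pulled ahead of time so a benchmark run does not include download time. The tags here are copied from the list above; namespaced models pull the same way as library models:

```
ollama pull qwen3-vl:8b
ollama pull adelnazmy2002/Qwen3-VL-8B-Instruct:latest
ollama pull glm-ocr:latest
```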
Configuration
OLLAMA_KEEP_ALIVE=1 OLLAMA_NUM_PARALLEL=1 OLLAMA_GPU=1 ollama run <MODEL>
~/.config/sn2md.toml:

prompt="""
You are an OCR engine, not a writing assistant.

Task:
- Read the handwritten note in the image.
- Output the exact transcription of the text as plain markdown.

Critical constraints:
- Do NOT explain what you are doing.
- Do NOT think step-by-step.
- Do NOT describe, analyze, or comment on the note.
- Do NOT use phrases like "let's", "wait", "first line", "next line", "line X", "Got it", or "step by step".
- Do NOT mention spellings or say how words are written.
- Do NOT repeat any single word more than twice in a row.
- If you notice yourself repeating a word or phrase, immediately stop and output your best single transcription of the whole note.
- Your entire response must be ONLY the final transcription text, nothing else.
"""
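Before committing to a multi-hour run, a single exported page can be sent straight to the local ollama API with the same kind of prompt to sanity-check a model. This is only a sketch: page_001.png and the shortened prompt are placeholders, and it assumes ollama is listening on its default port 11434.

```
# one-off test of a model on a single page image via the ollama HTTP API
IMG=$(base64 -w0 page_001.png)
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"qwen3-vl:8b\",
  \"prompt\": \"Transcribe the handwritten note in this image as plain markdown. Output only the transcription.\",
  \"images\": [\"$IMG\"],
  \"stream\": false
}" | jq -r '.response'
```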
Benchmarks
For processing and testing, I used a densely populated .note file (filled with text from top to bottom on 99.9% of pages) that is 261 pages long.
My handwriting is horrible. In spite of this, the majority of the document came out as it should.
mirage335/Qwen-3-VL-8B-Instruct-virtuoso:latest: 1 hour 19 minutes 5 seconds
adelnazmy2002/Qwen3-VL-4B-Instruct:Q4_K_M: 26 minutes 21 seconds
adelnazmy2002/Qwen3-VL-8B-Instruct:latest: 43 minutes 13 seconds
qwen3-vl:8b: 3 hours 48 minutes 41 seconds
glm-ocr:latest: 22 minutes 21 seconds
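At 261 pages, the fastest run (glm-ocr at 22 minutes 21 seconds) works out to roughly 5 seconds per page, while the slowest (qwen3-vl:8b at 3 hours 48 minutes 41 seconds) is about 53 seconds per page. Timings like these can be captured by wrapping the conversion in the shell's time builtin; the exact sn2md invocation below is an assumption (and MyNovel.note a placeholder), so confirm it against sn2md --help:

```
# rough end-to-end timing of a full conversion run
time sn2md file MyNovel.note
```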
Notes
Overall, I found Qwen3-VL-8B-Instruct to be the most useful. However, I did find it inserting "I will adhere to the Precision & Conciseness Protocol." in random places, so I will need to add that to my prompt (or strip it in a cleanup pass, as sketched below). As I analyze the output in more detail (so far I have only done a once-over of all of the outputs compared in Meld), I will add more notes.
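Until the prompt change lands, a post-processing pass can strip the stray phrase from the generated markdown. A minimal sketch, assuming the converted files end up in an output/ directory (that path is a placeholder):

```
# remove the stray protocol phrase from every generated markdown file
sed -i 's/I will adhere to the Precision & Conciseness Protocol\.//g' output/*.md
```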
I created a model that can be used as-is; it is available on ollama as rumplestilzken/supernote1.
This model is based on the Qwen3-VL-8B-Instruct model with the newest prompt.
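Packaging a prompt into a reusable ollama model only takes a Modelfile and ollama create. The sketch below is an assumption of the general shape, not the published supernote1 Modelfile, and the system prompt is shortened; pushing additionally requires an ollama.com account.

```
# assumed shape of the Modelfile; the real one may differ
cat > Modelfile <<'EOF'
FROM adelnazmy2002/Qwen3-VL-8B-Instruct:latest
SYSTEM """You are an OCR engine, not a writing assistant. Output only the exact transcription of the handwritten note in the image as plain markdown."""
EOF
ollama create rumplestilzken/supernote1 -f Modelfile
ollama push rumplestilzken/supernote1
```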
2/4/26 - glm-ocr takes half the time of the next best model. It is about evenly matched for accuracy, perhaps slightly better than Qwen3-VL-8B-Instruct. However, the Qwen3 model does better at creating white space around dialog. glm-ocr is without a doubt a feat in the OCR world; I just need more from it as a fiction writer.
Disclaimer:
I did attempt to use the 30B A3B variants of some of these models, but a run was going to take something like 10 hours, and if I wanted to wait that long I would just use my Manta to convert the novel-sized documents.
