<<TableOfContents()>>

= Introduction =
I started with [[https://www.reddit.com/r/Supernote/comments/1pqs7zy/local_vlms_for_handwriting_recognition_way_better/|this Reddit post]].

From there, I installed Ollama and downloaded the recommended model, "qwen3-vl:8b".

But there was no easy way to interface with the model, so I found [[https://github.com/dsummersl/sn2md|sn2md]].

Using these two tools together, I was able to set up a really nice workflow for converting my .note files to .md.

= Hardware =
2x Xeon E5-2699, 125.8 GB RAM, RTX 3060 (12 GB of VRAM).

= Software =
Linux Mint 22.2, sn2md, ollama

= Models Used =
{{{
mirage335/Qwen-3-VL-8B-Instruct-virtuoso:latest    9.8 GB
adelnazmy2002/Qwen3-VL-4B-Instruct:Q4_K_M          3.3 GB
adelnazmy2002/Qwen3-VL-8B-Instruct:latest          6.1 GB
qwen3-vl:8b                                        6.1 GB
}}}

= Configuration =
{{{
OLLAMA_KEEP_ALIVE=1 OLLAMA_NUM_PARALLEL=1 OLLAMA_GPU=1 ollama run <MODEL>
}}}

{{{
# ~/.config/sn2md.toml

prompt="""
You are an OCR engine, not a writing assistant.

Task:
- Read the handwritten note in the image.
- Output the exact transcription of the text as plain markdown.

Critical constraints:
- Do NOT explain what you are doing.
- Do NOT think step-by-step.
- Do NOT describe, analyze, or comment on the note.
- Do NOT use phrases like "let's", "wait", "first line", "next line", "line X", "Got it", or "step by step".
- Do NOT mention spellings or say how words are written.
- Do NOT repeat any single word more than twice in a row.
- If you notice yourself repeating a word or phrase, immediately stop and output your best single transcription of the whole note.
- Your entire response must be ONLY the final transcription text, nothing else.
"""

}}}
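To make it clearer what happens per page, here is a minimal Python sketch of sending one page image and the prompt above to a local Ollama server through its `/api/generate` endpoint. This is an illustration only, not how sn2md is implemented. The helper names `build_payload` and `transcribe` are hypothetical, and the sketch assumes Ollama is listening on its default port (11434) and that each page has already been exported as an image.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate request body with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def transcribe(model: str, prompt: str, image_path: str) -> str:
    """Send one page image to the local Ollama server and return the model's text."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, a call such as `transcribe("qwen3-vl:8b", prompt, "page_001.png")` would return the transcription for a single page; `page_001.png` is a placeholder file name.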

= Benchmarks =

For processing and testing, I used a densely populated .note file of 261 pages, with 99.9% of the pages filled with text from top to bottom.

My handwriting is horrible. In spite of this, the majority of the document came out as it should.

{{{
mirage335/Qwen-3-VL-8B-Instruct-virtuoso:latest    1 Hour 19 Minutes 5 Seconds
adelnazmy2002/Qwen3-VL-4B-Instruct:Q4_K_M          26 Minutes 21 Seconds
adelnazmy2002/Qwen3-VL-8B-Instruct:latest          43 Minutes 13 Seconds
qwen3-vl:8b                                        3 Hours 48 Minutes 41 Seconds
}}}
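The totals above are easier to compare as time per page. The short Python sketch below divides each run time by the 261 pages of the test file; the function name `seconds_per_page` is just for illustration.

```python
PAGES = 261  # length of the test .note file

# Run times from the table above, as (hours, minutes, seconds).
timings = {
    "mirage335/Qwen-3-VL-8B-Instruct-virtuoso:latest": (1, 19, 5),
    "adelnazmy2002/Qwen3-VL-4B-Instruct:Q4_K_M": (0, 26, 21),
    "adelnazmy2002/Qwen3-VL-8B-Instruct:latest": (0, 43, 13),
    "qwen3-vl:8b": (3, 48, 41),
}


def seconds_per_page(hours: int, minutes: int, seconds: int, pages: int = PAGES) -> float:
    """Convert a run time to average seconds spent per page."""
    return (hours * 3600 + minutes * 60 + seconds) / pages


for model, hms in timings.items():
    print(f"{model:<50} {seconds_per_page(*hms):6.2f} s/page")
```

This works out to roughly 18.2, 6.1, 9.9, and 52.6 seconds per page, in the order listed.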


= Notes =

Overall, I found the Qwen3-VL-8B-Instruct to be the most useful. However, I did find it repeating "I will adhere to the Precision & Conciseness Protocol." in random places, so I will need to add that sentence to the prohibited phrases in my prompt. So far I have only done a once-over of all the outputs, compared side by side in Meld; I will add more notes as I analyze the output in more detail.
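Until the prompt is updated, the stray sentence can also be removed after the fact. The following is a minimal Python sketch, assuming the sentence always appears verbatim in the generated markdown; `strip_stray_sentence` is a hypothetical helper, not part of sn2md.

```python
import re

# The sentence the model inserted in random places.
STRAY = "I will adhere to the Precision & Conciseness Protocol."


def strip_stray_sentence(text: str, stray: str = STRAY) -> str:
    """Remove every verbatim occurrence of `stray` and tidy leftover whitespace."""
    cleaned = text.replace(stray, "")
    # Collapse runs of spaces left mid-line; leading indentation is untouched.
    cleaned = re.sub(r"(?<=\S)[ \t]{2,}", " ", cleaned)
    # Drop spaces left dangling at the end of a line.
    cleaned = re.sub(r"[ \t]+\n", "\n", cleaned)
    return cleaned
```

Running this over each converted .md file leaves the rest of the transcription as it was, apart from the whitespace cleanup around the removed sentence.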

= Disclaimer =

I did attempt to use the 30B A3B variants of some of these models, but a run was going to take something like 10 hours. If I wanted to wait that long, I would use my Manta to convert novel-sized documents.


Supernote/OCRConversion (last edited 2026-02-04 17:28:09 by rumplestilzken)