
higher quality and less bandwidth by conversion to vector formats

Posted: Wed May 17, 2017 3:31 pm
by kuribas
Hi,

I want to propose a way to get better-quality output and use less bandwidth by converting scanned pages to vector output. Bitmap images take a lot of space and contain much redundancy. By converting the outlines to a vector image and grouping similar elements together, output quality can be improved and bandwidth reduced by a large factor. I am not suggesting OCR, because OCR performs extra semantic analysis and is fragile. Semantic analysis would only be an optional extra; by default the system would work without actually understanding the content. I am willing to implement such a system, and I wonder whether funding would be available for something like this.
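To make the "grouping similar elements" idea concrete, here is a minimal sketch, assuming the page has already been split into small glyph bitmaps with known positions. It stores each distinct shape once and records the page as a list of placements. The names `build_symbol_table`, `Glyph`, and `Placement` are hypothetical, and the exact-match comparison is a simplification: real scans would need a tolerance-based match, since two prints of the same symbol are rarely pixel-identical.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]            # rows of '#' (ink) and '.' (paper)
Placement = Tuple[int, int, int]   # (shape_id, x, y)


def build_symbol_table(
    glyphs: List[Tuple[Glyph, int, int]]
) -> Tuple[List[Glyph], List[Placement]]:
    """Store every distinct shape once; describe the page by reference."""
    shapes: List[Glyph] = []
    index: Dict[Glyph, int] = {}
    placements: List[Placement] = []
    for shape, x, y in glyphs:
        if shape not in index:          # first time this shape is seen
            index[shape] = len(shapes)
            shapes.append(shape)
        placements.append((index[shape], x, y))
    return shapes, placements


# Hypothetical input: three noteheads and two stems cut out of a page.
notehead: Glyph = (".##.", "####", ".##.")
stem: Glyph = ("#", "#", "#", "#")
page = [
    (notehead, 10, 40), (stem, 13, 20),
    (notehead, 30, 44), (stem, 33, 24),
    (notehead, 50, 40),
]

shapes, placements = build_symbol_table(page)
print(len(shapes), "distinct shapes for", len(placements), "placements")
```

The saving comes from the fact that a score repeats a small set of symbols many times, so the shape table stays short while the placement list only holds three integers per symbol.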

Kristof Bastiaensen

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Sat May 20, 2017 3:29 am
by Sallen112
This sounds interesting. I'll let our leader, Feldmahler, know.

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Sat May 20, 2017 5:15 am
by imslp
Hi Kristof,

This sounds quite interesting and yes, funding is available, but first I'll need to know a bit more about your background and how the conversion works on a more technical level. My e-mail is eguo@imslp.org, and it may also be helpful to have a Skype call at some point. Let me know.

Thanks,
Edward

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Sat May 20, 2017 8:37 am
by coulonnus
Even without moving to a vector format, information theory tells us that bitmap images are much smaller when they represent clean figures than when they represent dirty images. A typical typeset PDF page is about 15 kB and a decent scanned page is about 100 kB. When I convert a typeset PDF to TIFF, change the page layout and reconvert the images to PDF, the result is not much bigger than the original PDF. And a Henle scan is smaller than a ca. 1800 scan. :-)
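The clean-versus-dirty effect is easy to reproduce with a rough experiment. The sketch below is only an illustration: it uses DEFLATE (Python's `zlib`) as a stand-in for whatever codec a scanned PDF actually uses, and the image size, staff-line positions, and speck count are made-up values.

```python
import random
import zlib

WIDTH, HEIGHT = 800, 200  # hypothetical strip of a page, in pixels


def staff_bitmap(width: int, height: int) -> bytearray:
    """One byte per pixel (0 = paper, 1 = ink) with five staff lines."""
    rows = []
    for y in range(height):
        on_line = y in (40, 70, 100, 130, 160)
        rows.append(bytes([1 if on_line else 0]) * width)
    return bytearray(b"".join(rows))


def add_specks(bitmap: bytearray, count: int, seed: int = 0) -> bytearray:
    """Flip `count` randomly chosen pixels to imitate stains and dropouts."""
    rng = random.Random(seed)
    noisy = bytearray(bitmap)
    for _ in range(count):
        noisy[rng.randrange(len(noisy))] ^= 1
    return noisy


clean = staff_bitmap(WIDTH, HEIGHT)
dirty = add_specks(clean, count=2000)

clean_size = len(zlib.compress(bytes(clean), 9))
dirty_size = len(zlib.compress(bytes(dirty), 9))
print("clean:", clean_size, "bytes; dirty:", dirty_size, "bytes")
```

Every random speck is information the compressor has to encode, while the regular staff lines are almost free, so the dirty version comes out larger.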

So all PDFs made from scans would be much smaller if we had an application that recognizes a staff line and replaces it with a clean staff line. The same goes for note stems, beams, etc. Other symbols and text indications could come later (OCR).
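As a rough sketch of the staff-line step only, assuming the scan is already binarized and the staff lines are perfectly horizontal (real scans are usually skewed and would need deskewing first): any row that is mostly ink is treated as a staff line and redrawn solid. The function name `clean_staff_rows` and the 0.8 threshold are hypothetical choices.

```python
from typing import List

Bitmap = List[List[int]]  # 0 = paper, 1 = ink


def clean_staff_rows(bitmap: Bitmap, threshold: float = 0.8) -> Bitmap:
    """Redraw every row whose ink fraction reaches `threshold` as a solid line."""
    cleaned: Bitmap = []
    for row in bitmap:
        if sum(row) >= threshold * len(row):
            cleaned.append([1] * len(row))   # broken staff line: fill the gaps
        else:
            cleaned.append(list(row))        # anything else is left untouched
    return cleaned


scan = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # staff line with a one-pixel dropout
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # part of a note stem
]
for row in clean_staff_rows(scan):
    print("".join("#" if px else "." for px in row))
```

This only repairs gaps in a line. It does not smooth ragged line edges or handle stems and beams, which would need their own detection.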

I have already done .001% of the job :-) with an application that deletes stains smaller than the dot of a lowercase i in an indication like vivace. But don't expect a size reduction bigger than about 5% so far. See an example here: http://imslp.org/wiki/Piano_Sonata_in_F ... rel_Anton)
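For readers curious how such a stain filter can work, here is a minimal sketch. It is not the application mentioned above, just one common approach: find each 4-connected blob of ink and erase it if it has fewer pixels than a threshold. The name `remove_small_stains` and the parameter `min_pixels` (standing in for "the size of the dot on an i") are hypothetical.

```python
from collections import deque
from typing import List

Bitmap = List[List[int]]  # 0 = paper, 1 = ink


def remove_small_stains(bitmap: Bitmap, min_pixels: int) -> Bitmap:
    """Erase every 4-connected ink blob with fewer than `min_pixels` pixels."""
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [list(row) for row in bitmap]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected blob of ink.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_pixels:
                for cy, cx in blob:
                    cleaned[cy][cx] = 0
    return cleaned


page = [
    [1, 0, 0, 0],   # isolated one-pixel stain
    [0, 0, 1, 1],
    [0, 0, 1, 1],   # 2x2 blob, e.g. a real dot
]
print(remove_small_stains(page, min_pixels=2))
```

The threshold has to stay below the smallest meaningful mark on the page (staccato dots, dots on an i), otherwise real notation gets erased along with the dirt.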

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Mon May 22, 2017 6:17 am
by coulonnus
Also see SmartScore (https://en.wikipedia.org/wiki/SmartScore), which converts images to MIDI and to MusicXML. I think the best bandwidth advice is: retypeset it! :-)

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Tue Jun 06, 2017 2:17 pm
by daphnis
I'm unclear what OP is proposing here. An implementation of an existing process, methodology, and format; a new one; or both?

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Wed Jun 07, 2017 10:12 pm
by Choralia
imslp wrote:
Hi Kristof,

This sounds quite interesting and yes, funding is available, but first I'll need to know a bit more about your background and how the conversion works on a more technical level. My e-mail is eguo@imslp.org, and it may also be helpful to have a Skype call at some point. Let me know.

Thanks,
Edward
I hope this idea is progressing behind the scenes. In addition to better quality and reduced bandwidth (as well as reduced storage space), conversion to a vector format could serve as a pre-processing layer for optical music recognition programs, making it easier to turn scanned scores into files compatible with music editing software. Quite interesting, IMO.

Max

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Thu Jun 08, 2017 12:20 am
by imslp
Yep, this is progressing. I'll announce it when the time comes.

Re: higher quality and less bandwidth by conversion to vector formats

Posted: Thu Jun 08, 2017 5:29 am
by coulonnus