BNF higher resolution downloads ?

Posted: Thu Jan 24, 2013 6:10 pm
by Grisou
Hi !

I'm using the BNF database as a source of scans to upload to IMSLP, but Carolus told me that the resolution isn't good enough. Is there a way to download the files in higher quality? Or is the problem that I'm not using PDFArchitect properly?

Thanks! (Sorry for any grammar mistakes.)

Re: BNF higher resolution downloads ?

Posted: Thu Jan 24, 2013 11:51 pm
by cypressdome
Hi Grisou,

Kalliwoda covered a method for doing this in this thread: Acquiring scans from Gallica/BNF.

To summarize, the PDF files from BNF give you images that are at best 90 dpi. Looking at the example of Gutmann's Conte de soir, Op.50 that you uploaded, the images within that PDF are approximately 1025 by 1247 pixels. The score's original size is given as 35cm (around 13 3/4 inches). So 1247/13.75 is roughly 90 dpi.
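
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_dpi` is made up for this example, and it assumes the quoted 35cm refers to the page height.

```python
CM_PER_INCH = 2.54


def estimate_dpi(pixels, size_cm):
    """Return pixels per inch for an image side whose physical size is known."""
    inches = size_cm / CM_PER_INCH
    return pixels / inches


# Figures quoted above: the PDF image is 1247 pixels tall,
# and the original score is given as 35 cm tall.
print("approx. %.1f dpi" % estimate_dpi(1247, 35.0))
```

The same function can be applied to any other pixel height to compare download methods.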

With BNF's zoom viewer you can get the url for each page of the score and make some modifications which result in this:

http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236

btv1b52000301j = identifies this as Gutmann's Op.50

f1 = identifies this as page one, so changing this to f2 gives you page 2, etc.

l=5 is the current zoom level; 6 is the highest.

r=0,0,2236,2236 = this tells what part of the image to display. 0,0 is the upper left corner and 2236,2236 tells it to display 2236 pixels to the right and 2236 pixels down from the upper left corner. 2236 is the largest number of pixels that BNF will allow to be displayed.
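
As a minimal sketch, the parameters described above can be assembled into a URL programmatically. The helper name `gallica_tile_url` is hypothetical, and the URL layout is an assumption taken from the complete addresses quoted later in this thread (the addresses in this post are truncated), so check it against a real address from the zoom viewer before relying on it.

```python
def gallica_tile_url(ark, page, level, region):
    """Assemble a proxy URL from the pieces described in the post.

    ark    -- document identifier, e.g. "btv1b52000301j"
    page   -- page number (the "f1", "f2", ... part)
    level  -- zoom level (the post gives 6 as the highest)
    region -- the four numbers of the r= parameter, passed through unchanged
    """
    r = ",".join(str(value) for value in region)
    # Assumed layout, copied from the full URLs quoted later in the thread.
    return ("http://gallica.bnf.fr/proxy?method=R&ark=%s.f%d&l=%d&r=%s"
            % (ark, page, level, r))


# Page 1 of the Op.50 score at zoom level 5, using the values listed above.
print(gallica_tile_url("btv1b52000301j", 1, 5, (0, 0, 2236, 2236)))
```

Changing only the `page` argument walks through the score, as described for f1, f2, and so on.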

At zoom level 5 for this particular score you get the entire image which is 1713 by 2164 pixels. This is around 150 dpi. Some scores at level 5 will get cropped.

At zoom level 6, this http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236 will give you a cropped image, but you can download 4 images (2 columns, 2 rows) to cover the entire page, like this:

http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236

Then you can create a blank image in Gimp or Photoshop and copy/paste these images into it, butting them up against each other. The result is an image of the page that is 3422 by 4322 pixels, which is around 300 dpi. Some of the scores that I have been working with would require 6 images (2 columns, 3 rows) per page. Actually, the method I use to get BNF's level 6 images requires downloading 63 images per page; while the downloading and stitching is more automated, it isn't without its pitfalls and also requires a great deal of preparation work. For short works like Gutmann's 6-page Op.50, the manual 4-images-per-page download and stitching method isn't too burdensome. And of course the 150 dpi images are easier to get than the 300 dpi ones and are still a significant improvement over the 90 dpi images.
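
To illustrate the layout arithmetic (this is a sketch, not the procedure anyone in this thread actually uses), the function below splits a page into the fewest regions whose sides stay within the 2236-pixel limit. The name `tile_regions` and the (x, y, width, height) ordering are assumptions made for this example; how those numbers map onto the r= parameter should be checked against a real URL.

```python
MAX_SIDE = 2236  # largest region side BNF will serve, according to the post


def tile_regions(page_width, page_height, max_side=MAX_SIDE):
    """Split a page into regions no larger than max_side on either side.

    Returns (x, y, width, height) tuples, row by row, left to right.
    The (x, y) of each region is also where it would be pasted into
    the blank full-page image when stitching.
    """
    regions = []
    for y in range(0, page_height, max_side):
        for x in range(0, page_width, max_side):
            regions.append((x, y,
                            min(max_side, page_width - x),
                            min(max_side, page_height - y)))
    return regions


# The 3422 by 4322 pixel page mentioned above needs 2 columns and 2 rows.
for region in tile_regions(3422, 4322):
    print(region)
```

A page taller than twice the limit would simply produce a third row from the same function.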

Hope this proves helpful,
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Tue Feb 05, 2013 10:24 am
by coulonnus
The French site http://www.actualitte.com/ has many articles about BNF and digitization. To summarize them, don't expect any new BNF high resolution scans in the next 10 years!

Re: BNF higher resolution downloads ?

Posted: Tue Feb 05, 2013 10:12 pm
by cypressdome
Coulonnus,

As someone whose ability to read French is based upon Google Translate and my two years of study at a third-rate high school over twenty years ago could you perhaps elaborate on what is taking place at BNF to cause this?

Thanks,
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Wed Feb 06, 2013 7:13 am
by coulonnus
The main article http://www.actualitte.com/bibliotheques ... -40048.htm says that some firms will continue the digitization job. You will be able to see these scans for free if you visit the BNF (I don't know what the printing conditions will be). Otherwise access will not be free, even for other major libraries around the world.

This article also contains a few comparisons in English with other libraries in the world.

http://www.actualitte.com/tribunes/bnf- ... n-1916.htm says that 95% of those scans won't be online for 10 years. The remaining 5% will join Gallica, the online section of BNF.

Bruno Racine, director of BNF, wrote an article in « Le Monde »: http://www.lemonde.fr/idees/article/201 ... _3232.html His argument, in summary: there are so many documents to scan that we need private help; we are not turning PD material into something private; these scans will be something new; the profits will be used to digitize more documents.

I can't summarize all the other articles. Searching for "BNF" on this site gives many results!

Re: BNF higher resolution downloads ?

Posted: Wed Feb 27, 2013 4:58 pm
by nachus001
Hey:

I've created a script (very crude indeed) to download high quality scans from BNF by assembling them from multiple tiles. It's a Python script. I used Python because I find it easy to follow (I'm a C programmer) and because, if you are a LilyPond user on a Windows box (I suppose many of you are LilyPond users), Python is already installed along with LilyPond. If you're running a Linux box, you'll probably have it installed along with the other necessary tools anyway. In any case, please check.

What the script does is download a bunch of JPG tiles (the page division is user-defined) and then assemble them into individual high quality pages. The tile size is whatever results from dividing the page into 4x4 tiles, which is the absolute minimum and the default (you can increase the page division if you wish).

To run it you need Python, cURL, and ImageMagick installed, each with its path correctly set.
See:
http://www.python.org/
http://curl.haxx.se/
http://www.imagemagick.org
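
One way to check that the three tools are reachable before running anything is sketched below. It is not part of the script discussed here, and it needs Python 3.3 or later for `shutil.which`. The executable names are assumptions: ImageMagick 7 installs `magick` rather than `convert`, and on Windows a system utility also named `convert` exists, so a hit for that name does not by itself prove ImageMagick is the one being found.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    for name, path in find_tools(["python", "curl", "convert", "magick"]).items():
        print("%s: %s" % (name, path or "NOT FOUND on PATH"))
```

Any tool reported as not found needs its directory added to the PATH environment variable.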

To make it work, first create a directory for your piece. Common sense makes this mandatory, since there will be heavy file activity and the script deletes all the downloaded tiles once it has assembled the pages (it issues the command del PAGE*, so be warned).

Then open the console and cd to your piece's directory. In a web browser, go to the Gallica site and open the document of your choice. Select the maximum possible zoom and point your mouse at the far bottom right of the image (you may need to drag the scan to bring that part into view). Once there, right-click the image/tile and choose "image properties" (or similar, depending on your browser). In Firefox a popup will appear with a link like this:

http://gallica.bnf.fr/proxy?method=R&ar ... 08,256,256

Briefly, this request says "Hey gallica, put yerself in zoom 6 and gimme' a tile of 256X256 pixels from the Y=6144 and X=4608 coord' ".
For some time now we have not been able to do the trick of requesting a single 6144X4608 tile from the coordinates 0,0 (as cypressdome says, requests were limited to a smaller area), so we need a script that automates this task for us and gets at the high definition zoom.
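
For illustration, here is one way such an address could be taken apart in Python. This is a sketch and not the code of the script itself; the function name `parse_tile_url` is hypothetical. The example uses the full address from the getbnf command given in this post, and it assumes the r= values are ordered Y, X, then the two tile dimensions (the last two cannot be told apart here, since both are 256).

```python
from urllib.parse import urlsplit, parse_qs


def parse_tile_url(url):
    """Split a Gallica proxy tile address into its named parts."""
    query = parse_qs(urlsplit(url).query)
    # "btv1b9009896r.f1" -> document id "btv1b9009896r" and page number 1
    ark, _, page = query["ark"][0].rpartition(".f")
    # Assumed ordering: Y, X, then the two tile dimensions.
    y, x, size_a, size_b = (int(v) for v in query["r"][0].split(","))
    return {"ark": ark, "page": int(page), "level": int(query["l"][0]),
            "y": y, "x": x, "tile": (size_a, size_b)}


info = parse_tile_url("http://gallica.bnf.fr/proxy?method=R"
                      "&ark=btv1b9009896r.f1&l=6&r=6144,4608,256,256")
print(info)
```

The Y and X of the bottom-right tile, plus the tile size, give an upper bound on the full page dimensions at that zoom level.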

Now we copy this address, note the number of pages in the document (26 in this case), and type at the command line (the double quotes are mandatory here):

getbnf -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b9009896r.f1&l=6&r=6144,4608,256,256" -p26 -o "cambiniduos2va2bk_"
That's it! For this particular document, it took about 30 minutes to download and assemble everything (more than 400 tiles), leaving a directory with 26 JPGs of about 4 MB each. My internet connection is rather slow; with better bandwidth the download should go faster.
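
The tile count can be sanity-checked with a short sketch. This is an assumption about how a 4x4 page division might be computed, not the script's actual code: the page extent is inferred from the bottom-right viewer tile (its Y and X origin plus the 256-pixel tile side), which is only an upper bound if that edge tile is partly empty. The name `plan_tiles` and the (y, x, height, width) ordering are made up for this example.

```python
def plan_tiles(bottom_y, bottom_x, tile_side=256, divisions=4):
    """Divide a page into divisions x divisions download regions.

    Returns (y, x, height, width) tuples, row by row.
    """
    page_height = bottom_y + tile_side
    page_width = bottom_x + tile_side
    # Integer boundaries that always cover the page exactly.
    ys = [page_height * i // divisions for i in range(divisions + 1)]
    xs = [page_width * i // divisions for i in range(divisions + 1)]
    return [(ys[r], xs[c], ys[r + 1] - ys[r], xs[c + 1] - xs[c])
            for r in range(divisions) for c in range(divisions)]


# Values from the command above: r=6144,4608,256,256 and 26 pages.
regions = plan_tiles(6144, 4608)
pages = 26
print(len(regions), "regions per page,", len(regions) * pages, "requests in total")
```

With the default 4x4 division every region stays well under the 2236-pixel limit mentioned earlier in the thread, and the total is consistent with "more than 400 tiles" for 26 pages.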

Here is the script. Test it with a few pages first, and if you need more options type getbnf -h (or ask me here!).
I hope it's of some use to the community.

Cheers
Nachus

Re: BNF higher resolution downloads ?

Posted: Wed Feb 27, 2013 9:55 pm
by cypressdome
Hi nachus001!

If I can get this to work I will be unbelievably happy! I can only guess that I have some type of path/environment variable issue, as when I run the script I get this error message: "python: can't open file 'getbnf.py': [Errno 2] No such file or directory". I'm running Windows 7 and have run Python scripts in the past to grab images from Hathi Trust and images displayed using Zoomify, but in both cases the script was run from Python's own directory. I've got Python, Imagemagick, and curl in my %PATH% environment variable, have PYTHONPATH set, and have the appropriate registry entries under the subkey HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\3.2\PythonPath. Any ideas?

Thanks,
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Thu Feb 28, 2013 12:45 am
by nachus001
Hi cypressdome:

I have a Win XP box, and what I did was make a directory "getbnf" and add it to the path: ' %PATH%;D:\getbnf '
Once I did this, I could just type "getbnf" anywhere and the script runs. Maybe it's better to copy the script to the Python executables directory or to any other directory already listed in your path. On the other hand, as far as I know Windows 7 manages the path completely differently from Windows XP.

regards
Nachus

Re: BNF higher resolution downloads ?

Posted: Thu Feb 28, 2013 1:15 am
by nachus001
cypressdome:

I found this for win7.
http://geekswithblogs.net/renso/archive ... ows-7.aspx

They don't do anything with the registry.

regards
Nachus

Re: BNF higher resolution downloads ?

Posted: Thu Feb 28, 2013 2:00 am
by cypressdome
Thanks Nachus!

That is where I've got the path variable set. I've now downloaded and installed Python 3.3 (had been using 3.2) and have c:\python33 now listed in my path. Here's the message running the script gives me now:

G:\bnf>python getbnf.py -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b525007218.f1&l=6&r=5632,4352,256,256" -p2 -o "septmel"
  File "getbnf.py", line 70
    print 'write getbnf -h for further information!'
                                                   ^
SyntaxError: invalid syntax
At least it seems to have progressed beyond not being able to find the script. I also tried it on an old XP machine I had that still had Python 3.2 on it and it was giving me the same "no such file/directory message." That's when I went back to Win7 and installed Python 3.3

Thanks,
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Thu Feb 28, 2013 12:59 pm
by nachus001
Cypressdome:

You don't need to run python with the script as an argument.
The script will run on its own, as its first line is #!/usr/bin/python

Write this at the console prompt:

getbnf_p3 -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b525007218.f1&l=6&r=5632,4352,256,256" -p2 -o "septmel"
As for the error: I have Python 2.4.5 (the version that is installed with LilyPond) and it works for me. Python 3.X uses the print() function instead of the print ' ' statement, so the print ' ' statements in the code no longer work. I have therefore modified the script for Python 3.X and up to use the print function. It also works for me (py 2.4.5): I downloaded two pages of 4 MB each from the address in your prompt.
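
The difference can be shown with the line from the traceback earlier in the thread. The snippet below is an illustration rather than an excerpt from the script; the helper `usage_hint` is hypothetical. It relies on the fact that print with a single parenthesised argument is accepted by both Python 2 and Python 3.

```python
import sys


def usage_hint():
    """Return the help message that the failing line tried to print."""
    return 'write getbnf -h for further information!'


# Python 2 only (a SyntaxError under Python 3):
#     print 'write getbnf -h for further information!'
#
# Valid in both: Python 2 reads this as the print statement applied to a
# parenthesised expression, Python 3 as a call to the print() function.
print(usage_hint())
print("Running under Python %d.%d" % sys.version_info[:2])
```

Note that this only holds for a single argument; `print("a", "b")` prints a tuple under Python 2.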

Tell me if there is any problem
cheers

Nachus

Re: BNF higher resolution downloads ?

Posted: Thu Feb 28, 2013 9:01 pm
by cypressdome
Nachus,

You are my new hero! That updated script worked like a charm. The only change I had to make was to add the "py" extension to "getbnf_p3" in the command line (probably an issue with my Windows and/or various Python installations). Over a 12 Mbps connection about 1/3 of the way around the world from France, it took about 14 minutes to download the images and stitch together 29 pages.

Many, many thanks!
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Fri Mar 01, 2013 12:14 am
by cypressdome
Nachus,

Would there be some way to modify the script so that the output file names have the page numbers include some leading zeros? I've started to convert to black and white a 121 page score and when the system sorts the file names alphabetically the result is Filename_1, 10, 11, 12, 100, 101, etc. Even with shorter scores you have to deal with 1, 10, 11, 12,..., 19, 2, 20, 21, etc. If not, it certainly isn't a terrible burden to live with on my end.

Thanks again,
Cypressdome

Re: BNF higher resolution downloads ?

Posted: Fri Mar 01, 2013 8:38 pm
by nachus001
Here it is:

I was bothered too by the odd ordering that appears in IrfanView (because of the lack of leading zeroes). I have now corrected the output page numbering to use five leading zeroes, in order to cover any document size.
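
A minimal sketch of zero-padded numbering follows. It is not the script's actual code: the function name `page_filename`, the "septmel_" prefix, the ".jpg" extension, and the padding width of five digits are all assumptions chosen for the example.

```python
def page_filename(prefix, page, width=5, extension=".jpg"):
    """Build an output name whose page number is zero-padded to `width` digits."""
    return prefix + str(page).zfill(width) + extension


pages = (1, 2, 10, 100)
padded = [page_filename("septmel_", n) for n in pages]
unpadded = ["septmel_%d.jpg" % n for n in pages]

# Alphabetical order matches page order only for the padded names.
print(sorted(padded) == padded)
print(sorted(unpadded) == unpadded)
```

With fixed-width numbers, any file manager or image viewer that sorts names alphabetically shows the pages in reading order.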

Re: BNF higher resolution downloads ?

Posted: Sat Mar 02, 2013 5:26 am
by cypressdome
Thanks Nachus, that worked perfectly! Now to start transferring the many Chopin first editions they have!

Cypressdome