Plate harvesting script

Posted: Thu Aug 13, 2009 2:54 am
by daphnis
Any of the programmers out there familiar with the wiki: would there be any possible way of creating a harvesting script or function that reads publisher, plate, and date info and then dumps it into the appropriate publisher page? For example, because of the number of Durand publications we have, we probably have the largest collection of Durand plate and publisher information aside from the publisher itself. We haven't been good about adding these to the Durand publisher page, and doing so now would create an incredible resource, especially for dating publications that are otherwise not represented in any composer's catalogue d'œuvres.

I'd still like to request a change to the wiki upload pages: separate fields for publisher, date and plate, whose contents could then, upon submission, be entered into the respective publisher page. This may be a tall order, but it sure would be nice to have, as it would further strengthen our already vast and comprehensive publisher pages.

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 3:27 am
by Carolus
That's an interesting idea. I wonder if including the city, publisher name, date, and plate number as separate fields would make the automatic generation of such lists more possible. As it stands, I think we have the best plate number lists on the net. It's interesting how some publishers' plate numbers are a fairly reliable indicator of date, while others can be downright whimsical.
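
With separate fields, generating a publisher's plate list becomes mostly bookkeeping. Below is a minimal Python sketch of that idea, assuming a hypothetical `Edition` record holding the four fields mentioned above (city, publisher, year, plate) and emitting one MediaWiki table per publisher. The record layout, the function names and the two-column table are illustrative assumptions, not how IMSLP actually stores or displays this data.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edition:
    """Hypothetical record: publication data kept as separate fields."""
    city: str
    publisher: str
    year: Optional[int]   # None for undated (n.d.) editions
    plate: Optional[str]  # None when the edition carries no plate number


def plate_number(plate: str) -> int:
    """Numeric part of a plate such as 'D. & F. 1234' (first run of digits, else 0)."""
    m = re.search(r"\d+", plate)
    return int(m.group()) if m else 0


def plate_tables(editions):
    """Build one MediaWiki table per publisher, rows sorted by plate number."""
    by_publisher = defaultdict(list)
    for ed in editions:
        if ed.plate:  # unplated editions cannot appear on a plate list
            by_publisher[ed.publisher].append(ed)
    tables = {}
    for publisher, eds in by_publisher.items():
        lines = ['{| class="wikitable"', "! Plate !! Year"]
        for ed in sorted(eds, key=lambda e: plate_number(e.plate)):
            year = ed.year if ed.year is not None else "n.d."
            lines += ["|-", f"| {ed.plate} || {year}"]
        lines.append("|}")
        tables[publisher] = "\n".join(lines)
    return tables
```

`plate_tables(editions)["Durand"]` would then return wikitext that could be compared against, or merged into, the existing Durand page by hand.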

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 3:38 am
by vinteuil
Yes indeed. I now carry around lists for plate numbers of those precise publishers (Richault, Ricordi) to date things (come as they do ;).

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 3:44 am
by daphnis
Carolus wrote:I wonder if including the city, publisher name, date, and plate number as separate fields would make the automatic generation of such lists more possible.
Yes, that's the general idea, and I think doing so would absolutely be possible.
Carolus wrote:It's interesting how some publishers' plate numbers are a fairly reliable indicator of date, while others can be downright whimsical.
Agreed. However, with the Durand example, and because I myself have submitted so many without taking the time to add them to the Durand page, this would be hugely helpful in dating publications, since Durand's plate numbers are fairly straightforward in their assignment.

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 3:49 am
by vinteuil
Great ideas! You should talk to Leonard and Feldmahler about it (it's always safer to ask on their talk pages, as they often don't see these forums).

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 2:10 pm
by Mazin
Ask for an IMSLP API while you're at it. :D
perlnerd666 wrote:Yes indeed. I now carry around lists for plate numbers of those precise publishers (Richault, Ricordi) to date things (come as they do ;).
Think you could share these lists?

Re: Plate harvesting script

Posted: Thu Aug 13, 2009 6:46 pm
by vinteuil
They're on the publisher pages :)

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 3:29 pm
by Leonard Vertighel
daphnis wrote:... a harvesting script or function that reads publisher, plate, and date info...
Feasible, if the info is available in a sufficiently standardized format. Entries that do not adhere to a known format would have to be skipped. I don't think that I'll have time to work on it in the near future though.
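
A minimal sketch of the "known format" check, in Python. It assumes entries follow the pattern `Place: Publisher, Date. Plate No. ##` with a four-digit year; `CANONICAL` and `parse_entry` are hypothetical names. Anything that does not match returns `None` and is skipped.

```python
import re
from typing import Optional

# Assumed canonical pattern: "Place: Publisher, Date. Plate No. ##", with Date a four-digit year.
CANONICAL = re.compile(
    r"^(?P<place>[^:]+):\s*(?P<publisher>.+),\s*(?P<year>\d{4})\.\s*Plate No\.\s*(?P<plate>.+?)\.?$"
)


def parse_entry(text: str) -> Optional[dict]:
    """Return the fields of a conforming entry, or None so the caller can skip it."""
    m = CANONICAL.match(text.strip())
    if m is None:
        return None
    return {
        "place": m["place"].strip(),
        "publisher": m["publisher"].strip(),
        "year": int(m["year"]),
        "plate": m["plate"].strip(),
    }
```

Being strict keeps false positives out of the publisher tables, at the cost of skipping every entry written in a variant form.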

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 3:31 pm
by vinteuil
I've been working on standardizing. There are two main issues:
1. The format: Place: Publisher, Date. Plate No. ## is not adhered to. It is extremely common to see: Place: Publisher, Date, plate No. ###. This is frankly rather low on my list of things to correct, so a huge number of pages are like that.
2. A lot of the time, there's no plate number. Also, the n.d. (DATE) format might trip it up.
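
A more tolerant Python sketch that also accepts the ", plate No." variant, a missing plate number, and "n.d." dates optionally followed by a parenthesized estimate. The exact shape of the "n.d. (DATE)" entries is an assumption here, as are the names `ENTRY` and `parse_entry`; a year recovered from the parentheses is flagged as estimated rather than printed.

```python
import re
from typing import Optional

# Hypothetical pattern covering the variants listed above.
ENTRY = re.compile(
    r"""
    ^(?P<place>[^:]+):\s*
    (?P<publisher>.+?),\s*
    (?:(?P<year>\d{4})                      # printed four-digit year
      |n\.d\.\s*(?:\((?P<est>[^)]*)\))?)    # or "n.d.", optionally "(estimate)"
    \.?
    (?:[.,]?\s*[Pp]late\s+No\.?\s*(?P<plate>.+?))?   # optional plate clause
    \.?\s*$
    """,
    re.VERBOSE,
)


def parse_entry(text: str) -> Optional[dict]:
    """Parse one entry; return None if it matches none of the variants above."""
    m = ENTRY.match(text.strip())
    if m is None:
        return None
    year = int(m["year"]) if m["year"] else None
    if year is None and m["est"]:
        found = re.search(r"\d{4}", m["est"])  # e.g. "ca.1905" -> 1905
        year = int(found.group()) if found else None
    return {
        "place": m["place"].strip(),
        "publisher": m["publisher"].strip(),
        "year": year,
        "estimated": m["year"] is None,  # True whenever no year is printed
        "plate": m["plate"].strip() if m["plate"] else None,
    }
```

Keeping the `estimated` flag lets a later step decide whether guessed dates belong on a publisher page at all.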

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 4:53 pm
by Leonard Vertighel
Actually it seems to me that the harvesting step should be reasonably straightforward. (Certainly several entries would be skipped, but we can list those separately for review.) The problem is the insertion of the data into the existing tables. For example, variations in the publisher names are likely to cause difficulties. I'm not sure if we can do much more than creating a big list of harvested data, sorted by publisher name and then by year, and then manually add the missing data to the existing tables. Thoughts?
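
The workflow described above (harvest, set aside skipped entries for review, normalize publisher names, sort by publisher and then year) could look roughly like this Python sketch. The alias table is hypothetical and would have to be maintained by hand; `parse` stands for any function that returns a dict with `publisher` and `year` keys, or `None` for an entry it cannot read.

```python
from typing import Callable, Iterable, Optional

# Hypothetical alias table: spelling variants seen on work pages -> name used on the publisher page.
ALIASES = {
    "durand & cie": "Durand",
    "durand et cie": "Durand",
    "a. durand & fils": "Durand",
}


def normalize_publisher(name: str) -> str:
    """Map a publisher spelling to its canonical name (case- and whitespace-insensitive)."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, name.strip())


def harvest(entries: Iterable[str], parse: Callable[[str], Optional[dict]]):
    """Return (records sorted by publisher then year, raw entries that could not be parsed)."""
    records, skipped = [], []
    for text in entries:
        rec = parse(text)
        if rec is None:
            skipped.append(text)  # listed separately for manual review
            continue
        records.append(dict(rec, publisher=normalize_publisher(rec["publisher"])))
    # Undated records (year None) sort after dated ones within each publisher.
    records.sort(key=lambda r: (r["publisher"], r["year"] is None, r["year"] or 0))
    return records, skipped
```

The output is exactly the "big list" for manual merging: nothing is written to the existing tables automatically, so a wrong alias or a bad parse can be caught before it reaches a publisher page.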

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 5:55 pm
by reinhold
perlnerd666 wrote:I've been working on standardizing. There are two main issues:
1. The format: Place: Publisher, Date. Plate No. ## is not adhered to. It is extremely common to see: Place: Publisher, Date, plate No. ###.
Actually, in musicology (at least according to http://www.hfm-weimar.de/v1/musikwissen ... hieren.pdf), the standard format is:
Place: Publisher Year, PN ###.

Cheers,
Reinhold
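
If the harvested fields are stored separately, rendering them in the format cited above is straightforward. A minimal Python sketch; the function name is hypothetical, and using "n.d." for a missing year is an assumption not taken from the cited guideline.

```python
from typing import Optional


def musicology_citation(place: str, publisher: str,
                        year: Optional[int], plate: Optional[str]) -> str:
    """Render 'Place: Publisher Year, PN ###.' from separately stored fields."""
    text = f"{place}: {publisher} {year if year is not None else 'n.d.'}"  # 'n.d.' is an assumption
    if plate:
        text += f", PN {plate}"
    return text if text.endswith(".") else text + "."  # avoid a doubled period after 'n.d.'
```

Storing the parts and generating the display string means the house style could be switched later without re-editing every work page.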

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 7:25 pm
by homerdundas
I'd like to second the request for an API. My main complaint is that the wikimedia Category functions are clumsy - we need a better underlying database. We shouldn't be 'manually' devising lists of genres - all the info pertaining to a work should appear on the work page - and indexing by that info should be automatic - including plate number listings.

Re: Plate harvesting script

Posted: Fri Aug 14, 2009 9:24 pm
by Leonard Vertighel
homerdundas wrote:I'd like to second the request for an API.
Could you give an example of what you would use it for?
My main complaint is that the wikimedia Category functions are clumsy - we need a better underlying database. We shouldn't be 'manually' devising lists of genres - all the info pertaining to a work should appear on the work page - and indexing by that info should be automatic - including plate number listings.
This has been discussed before. The main problem is that it is very hard to devise a system that is flexible enough to include all possible cases and is still manageable. If anyone has a concrete proposal, feel free to open a new thread. I think here we should focus on the harvesting script, which seems more feasible in the short term.