Site scraping

Moderator: kcleung

jeroen
Posts: 1
Joined: Wed Mar 16, 2011 4:09 pm

Post by jeroen »

Hello,

For a school project, we're creating a website that makes searching for music scores a lot easier for people who teach music. Its main advantage is that you can search by instrumentation and duration, so a teacher can find a score that lets his 2 students who play the flute practice together with his 2 students who play piano for, let's say, 45 minutes. There are several groups with the same assignment. One project will probably go live after completion.

Now we're looking for a starting point for our database. We saw your immense collection of scores. We were thinking of indexing all the available scores on IMSLP, extracting the instrumentation and length, combining these with a link to IMSLP, and putting the result in our database. This way our visitors could find scores based on a given instrumentation and/or length and would be redirected to IMSLP.
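To make the idea concrete, here is a minimal sketch of one index record and the search over it. It is only an illustration under stated assumptions: the names `ScoreEntry`, `make_instrumentation` and `search` are hypothetical, the sample entries are made up, and the example.org links are placeholders rather than real score pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreEntry:
    """One index record (hypothetical layout): the fields named in this post."""
    title: str
    composer: str
    instrumentation: tuple  # sorted (instrument, count) pairs
    length_minutes: Optional[int]  # None when the length is unknown
    link: str  # placeholder URL pointing back to the score page


def make_instrumentation(**counts):
    """Normalise keyword counts, e.g. flute=2, into a comparable tuple."""
    return tuple(sorted(counts.items()))


def search(entries, instrumentation=None, max_minutes=None):
    """Return entries matching the given instrumentation and/or length."""
    results = []
    for entry in entries:
        if instrumentation is not None and entry.instrumentation != instrumentation:
            continue
        if max_minutes is not None:
            # An entry without a known length cannot satisfy a length filter.
            if entry.length_minutes is None or entry.length_minutes > max_minutes:
                continue
        results.append(entry)
    return results


# Made-up sample data, for illustration only.
SAMPLE = [
    ScoreEntry("Example Quartet", "A. Composer",
               make_instrumentation(flute=2, piano=2), 40,
               "https://example.org/score/1"),
    ScoreEntry("Example Duo", "B. Composer",
               make_instrumentation(flute=2), 10,
               "https://example.org/score/2"),
    ScoreEntry("Untimed Quartet", "C. Composer",
               make_instrumentation(flute=2, piano=2), None,
               "https://example.org/score/3"),
]
```

Calling `search(SAMPLE, instrumentation=make_instrumentation(flute=2, piano=2), max_minutes=45)` would keep only the timed quartet. In this sketch, an entry with no recorded length is skipped whenever a length filter is given.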

Users would also be able to add a link to a score that isn't yet in our database.

When trying to get all the information from your wiki, we got banned for 'site ripping'. This isn't very surprising, because we are indeed ripping part of the site's content. Now my question is whether it is permitted to rip only the fields 'title', 'composer', 'instrumentation', 'length' and 'link'. Sorry for trying before asking for permission.


Thanks a lot,

Jeroen
student @ University of Ghent, Belgium
KGill
Copyright Reviewer
Posts: 1295
Joined: Thu Apr 09, 2009 10:16 pm

Re: Site scraping

Post by KGill »

I am unable to answer the question about the site ripping script (not being the server admin), but I feel I should point out that it is already possible to sort works by instrumentation on IMSLP, so I'm afraid that part of your project might constitute something of a duplication of effort. Using the entrance page for our Category Walker, one can browse most of the works on the site by not only instrumentation, but also work type, language, etc. (There are many more instrumentations than the ones shown in the relevant section on that page; only some are listed, to keep the page from getting too long. The page is sort of the 'tip of the iceberg' of the categorization system.) Indexing by duration would also be a problem, as there are many, many works that do not have that field filled in. I would venture to say that it would probably be a good idea for that to be implemented in some form on IMSLP itself, if possible. I hate to sound like I'm discouraging you, but it sounds like what you're thinking of amounts to an improvement on IMSLP, and it would be great to improve it directly :)
steltz
active poster
Posts: 1861
Joined: Sat Dec 13, 2008 2:30 pm

Re: Site scraping

Post by steltz »

Dear Jeroen:

Following on from what Kenny said, it sounds like your search is in part redundant, since it is already available via our "category walker", which will give you pieces by instrumentation. On the left side of the main page, click on "Work genre" (alternatively, in the centre of the page there is "Genre, Instrumentation, or Language"). This takes you to a page with lots of names like "airs", "folksongs", etc., but if you scroll further down you will find things like "For 4 players", which is where you would find things for 2 flutes and 2 pianos.

If a genre has more than one subgenre, you will see a "subgenres" button next to the genre name. Click on this to find out, for example, that we don't have anything for two flutes and two pianos. But you could find other works for 4 players.

You can also (from the main genre page) find everything that involves the flute, though if you click on this, you will notice that there are 1,155 items, so scrolling through them all might be a bit time-consuming. Searching by the number of players, and using the subgenres, is generally fairly quick.

As to timings, very few people actually fill in this information. I think a lot of people who are scanning old music haven't actually played it, and don't know how long the works are. So this won't be a useful search tool for most items on IMSLP.

I hope this answers your questions -- using the existing search capability means that you don't need to do any site ripping.

(By the way, I met the clarinet professor from Ghent (or one of them) two days ago at a masterclass. Very great player!!)
bsteltz