Nationality Field

Any posts related to the categorization and standardization of IMSLP

Moderators: vinteuil, Davydov

gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Nationality Field

Post by gardano »

I'm parsing all nationalities in order to allow filtering by nationality, and it seems to me that this field could use some clean-up (unless I'm not understanding the current uses of this field). Here's what I get:

As you can see, in many instances a single nationality appears in up to three forms:
* Nationality
* Nationality + 'people'
* Nationality [pluralized]

It certainly would make my life easier if each nationality were presented as either the proper name of the country/region ("Germany") or the name as adjective ("German").

Comments?

'Albanian'
'Algerians'
'American people'
'American'
'Americans'
'Argentinian'
'Argentinians'
'Armenians'
'Australian people'
'Australian'
'Australians'
'Austrian'
'Austrians'
'Basques'
'Belgian'
'Belgians'
'Bosnian'
'Brazilian'
'Brazilians'
'British'
'Britishs'
'Bulgarians'
'Canadian'
'Canadians'
'Catalan'
'Catalans'
'Chilean'
'Chileans'
'Chineses'
'Colombians'
'Croatian'
'Croatians'
'Cubans'
'Czech people'
'Czech'
'Czechs'
'Danish people'
'Danish'
'Danishs'
'Dutch people'
'Dutch'
'Dutchs'
'Dutchs'
'Ecuadorian'
'Egyptians'
'English people'
'English'
'Englishs '
'Englishs'
'Estonians'
'Filipinos'
'Finnish'
'Finnishs'
'French People'
'French people'
'French'
'Frenchs'
'Georgians'
'German people'
'German'
'Germans'
'Germans'
'Greeks'
'Guatemalan'
'Haitians'
'Hawaiians'
'Hungarian people'
'Hungarian'
'Hungarians'
'Icelandics'
'Indian'
'Indians'
'Iranian'
'Irish'
'Irishs'
'Israelis'
'Italian people'
'Italian'
'Italians'
'Japanese'
'Japaneses'
'Latvian'
'Latvians'
'Liechtensteinian'
'Lithuanian'
'Lithuanians'
'Malteses'
'Mexican'
'Mexicans'
'Monegasques'
'Norwegian'
'Norwegians'
'Paraguayans'
'Peruvians'
'Polish people'
'Polish'
'Polishs'
'Portuguese'
'Portugueses'
'Puerto Ricans'
'Romanian'
'Romanians'
'Russian'
'Russians'
'Scottish'
'Scottishs'
'Serbian'
'Serbians'
'Slovakians'
'Slovenian'
'Slovenians'
'Soviet'
'Soviets'
'Spanish'
'Spanishs'
'Swedish'
'Swedishs'
'Swiss people'
'Swiss'
'Swisss'
'Turkishs'
'Ukrainian'
'Ukrainians'
'Uruguayans'
'Venezuelan'
'Venezuelans'
'Welsh'
'Welshs'
KGill
Copyright Reviewer
Posts: 1295
Joined: Thu Apr 09, 2009 10:16 pm

Re: Nationality Field

Post by KGill »

That's very strange - I've never seen anything like many of the examples you gave. There should be one of three forms given: the old one ('Austrian composers'), the new correct one ('Austrian'), and an alternate form that appears on only a few pages ('Austrian people'). There shouldn't be any pages whatsoever with (to continue the example) 'Austrians' or 'Austria'. Could you give a few specific examples of composer pages with one of those forms?
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

KGill wrote: That's very strange - I've never seen anything like many of the examples you gave. There should be one of three forms given: the old one ('Austrian composers'), the new correct one ('Austrian'), and an alternate form that appears on only a few pages ('Austrian people'). There shouldn't be any pages whatsoever with (to continue the example) 'Austrians' or 'Austria'. Could you give a few specific examples of composer pages with one of those forms?
I'll look. But I'm just looking at my parser's output -- what it's gotten when importing the data. As I do searches on the data, I'm not getting hits back for, say, "Frenchs", or "Germans". As you can imagine, doing a full import is a heavy and expensive operation. Next time I do so, I'll see where those items are coming from. Seems they are appearing in a nationality field somewhere, but I have yet to discover where.

I'll let you know.
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

I see what the problem is. I had a bug in my parsing code.

The list actually looks like this:

'Albanian'
'Algerian composers'
'American composers'
'American people'
'American'
'Argentinian composers'
'Argentinian'
'Armenian composers'
'Australian composers'
'Australian people'
'Australian'
'Austrian composers'
'Austrian'
'Basque composers'
'Belgian composers'
'Belgian'
'Bosnian'
'Brazilian composers'
'Brazilian'
'British composers'
'British'
'Bulgarian composers'
'Canadian composers'
'Canadian'
'Catalan composers'
'Catalan'
'Chilean composers'
'Chilean'
'Chinese composers'
'Colombian composers'
'Croatian composers'
'Croatian'
'Cuban composers'
'Czech composers'
'Czech people'
'Czech'
'Danish composers'
'Danish people'
'Danish'
'Dutch composers'
'Dutch people'
'Dutch'
'Ecuadorian'
'Egyptian composers'
'English composers'
'English people'
'English'
'Estonian composers'
'Filipino composers'
'Finnish composers'
'Finnish'
'French composer'
'French composers'
'French People'
'French people'
'French'
'Georgian composers'
'German composers'
'German people'
'German'
'Greek composers'
'Guatemalan'
'Haitian composers'
'Hawaiian composers'
'Hungarian composers'
'Hungarian people'
'Hungarian'
'Icelandic composers'
'Indian composers'
'Indian'
'Iranian'
'Irish composers'
'Irish'
'Israeli composers'
'Italian composers'
'Italian people'
'Italian'
'Japanese composers'
'Japanese'
'Latvian composers'
'Latvian'
'Liechtensteinian'
'Lithuanian composers'
'Lithuanian'
'Maltese composers'
'Mexican composers'
'Mexican'
'Monegasque composers'
'Norwegian composers'
'Norwegian'
'Paraguayan composers'
'Peruvian composers'
'Polish composers'
'Polish people'
'Polish'
'Portuguese composers'
'Portuguese'
'Puerto Rican composers'
'Romanian composers'
'Romanian'
'Russian composers'
'Russian'
'Scottish composers'
'Scottish'
'Serbian composers'
'Serbian'
'Slovakian composers'
'Slovenian composers'
'Slovenian'
'Soviet composers'
'Soviet'
'Spanish composers'
'Spanish'
'Swedish composers'
'Swedish'
'Swiss composers'
'Swiss people'
'Swiss'
'Turkish composers'
'Ukrainian composers'
'Ukrainian'
'Uruguayan composers'
'Venezuelan composers'
'Venezuelan'
'Welsh composers'
'Welsh'
pml
Copyright Reviewer
Posts: 1219
Joined: Fri Mar 16, 2007 3:42 am
Location: Melbourne, Australia

Re: Nationality Field

Post by pml »

Hi Gardano,

the Nationality field has changed over time as the IMSLP categories gradually expanded from covering only composers to their current state, which includes performers, writers, arrangers, etc. With over 5,500 such pages, overhauling them for consistency takes some effort. The currently preferred form contains only the adjective, e.g.

|Nationality=Italian

but because of the way the fte template is implemented, any subsequent words such as “composers” are stripped away when the nationality is added to the appropriate category, e.g. Category:Italian people.
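
For anyone parsing these pages from the outside, pulling the raw field value out of the template text might look roughly like the sketch below. The regular expression, the function name, and the sample page text are all assumptions made for illustration; the real template layout on IMSLP may differ.

```python
import re

# Hypothetical pattern for a template parameter line such as "|Nationality=Italian".
# [ \t]* is used instead of \s* so that an empty value never swallows the next line.
NATIONALITY_RE = re.compile(r"^\|[ \t]*Nationality[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def raw_nationality(wikitext):
    """Return the raw Nationality value from template text, or None if the field is absent."""
    match = NATIONALITY_RE.search(wikitext)
    return match.group(1) if match else None


# Made-up page text; only the |Nationality= line reflects the thread.
sample = "|Name=Example\n|Nationality=Italian composers\n|Born=1700\n"
print(raw_nationality(sample))
```

This only extracts the value as written; any trailing word such as "composers" still has to be handled separately.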

Cheers, PML
--
PML (talk)
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

pml wrote: but because of the way the fte template is implemented, any subsequent words such as “composers” are stripped away when the nationality is added to the appropriate category, e.g. Category:Italian people.

Cheers, PML
OK, thanks for the answer. Just to be sure I understand you: when someone now enters "Italian people" in the field, is ' people' stripped away as well, or does the implementation only look for "Composer[s]"?

Thanks,
Gardano
pml
Copyright Reviewer
Posts: 1219
Joined: Fri Mar 16, 2007 3:42 am
Location: Melbourne, Australia

Re: Nationality Field

Post by pml »

Hi Gardano,

essentially, the fte template could be fed any of the following lines, and in each case the page would be added to Category:Italian people:

|Nationality=Italian composers
|Nationality=Italian people
|Nationality=Italian aardvarks
|Nationality=Italian
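
The behaviour described here amounts to keeping only the first word of the field. A minimal sketch of that rule, written in Python rather than in template code and with a made-up function name, could be:

```python
def category_from_first_word(value):
    """Sketch of the rule described above: keep only the first word of the field."""
    words = value.split()
    if not words:
        return None  # empty field: no category in this sketch
    return "Category:" + words[0] + " people"


# The four field values from the post all land in the same category.
for value in ["Italian composers", "Italian people", "Italian aardvarks", "Italian"]:
    print(value, "->", category_from_first_word(value))
```

This is an illustration of the described outcome only, not the actual template implementation.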

Cheers, Philip
--
PML (talk)
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

Thanks for the answer.

One last question. So is it safe to say that the first word after the "|Nationality=" becomes the nationality field value? I'm asking because of nations like "New Zealand", which wouldn't fit into that assumption...
KGill
Copyright Reviewer
Posts: 1295
Joined: Thu Apr 09, 2009 10:16 pm

Re: Nationality Field

Post by KGill »

Wow, it's a good thing you brought that up :o I only now noticed that two-word nationalities (e.g., Puerto Rican) are parsed incorrectly, i.e. they are put into a category based on just the first word. (In this example, there are two victims of this.) I had previously assumed that the code simply removed the last word, as I recall it working correctly almost two months ago when the changes to the system were made. Feldmahler, would it be possible to fix this?
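
To make the difference concrete, here is a small comparison of the two rules mentioned above: keeping the first word versus removing the last word. Both functions are hypothetical stand-ins written for this comparison, not the site's code.

```python
def keep_first_word(value):
    """Keep only the first word (the behaviour that mangles two-word nationalities)."""
    return value.split()[0]


def drop_last_word(value):
    """Remove the last word, whatever it is."""
    return " ".join(value.split()[:-1])


for value in ["Puerto Rican composers", "Puerto Rican", "Italian composers", "Italian"]:
    print(repr(value), "first:", repr(keep_first_word(value)), "drop-last:", repr(drop_last_word(value)))
```

Removing the last word handles 'Puerto Rican composers', but it truncates the bare forms: 'Puerto Rican' becomes 'Puerto' and 'Italian' becomes an empty string. Neither rule alone covers both the old and the preferred format.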
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

Oh, I like that solution best of all -- just strip out the last word, rather than any other involved logic. And I'm glad our discussion aired issues that need fixing. Fixing is a good thing! :¬)
KGill
Copyright Reviewer
Posts: 1295
Joined: Thu Apr 09, 2009 10:16 pm

Re: Nationality Field

Post by KGill »

OK, happily, Feldmahler has come up with a fix for this - all the nationality categories should now work correctly. The caveat is that one can no longer put anything other than 'composers' after the nationality (preferably, of course, there shouldn't be anything after it), or else it will break. I've removed every 'people' that appeared in the field across the site, so as of right now everything should be fixed.
pml
Copyright Reviewer
Posts: 1219
Joined: Fri Mar 16, 2007 3:42 am
Location: Melbourne, Australia

Re: Nationality Field

Post by pml »

So ignore my second posting in this thread – it is now out-of-date after Feldmahler's change – but my original reply stands! PML
--
PML (talk)
gardano
forum adept
Posts: 62
Joined: Wed Mar 02, 2011 8:08 pm

Re: Nationality Field

Post by gardano »

Awesome!

It'd be much easier to just look for the string " composers" rather than composers+people, etc, or to do error-prone (on my part) string splitting.
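
Something along these lines is what I have in mind on the parsing side. It is only a sketch: the function name is made up, and accepting the singular " composer" as well is my own assumption, based on the stray 'French composer' entry in the list above.

```python
import re

# Assumption: the only suffix left to remove is a trailing " composer" or " composers".
SUFFIX_RE = re.compile(r"\s+composers?\s*$")


def normalize_nationality(value):
    """Strip a trailing ' composer(s)' and surrounding whitespace; leave anything else alone."""
    return SUFFIX_RE.sub("", value.strip())


for value in ["Puerto Rican composers", "French composer", "Italian", "Swiss "]:
    print(repr(value), "->", repr(normalize_nationality(value)))
```

Because nothing is split on spaces, two-word nationalities pass through intact, and a value in the preferred bare form is returned unchanged.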

Thanks folks for your really fine quick work!