Shy gypsy, slyly spryly tryst by my crypt IV

Jul. 23rd, 2014 | 10:50 pm

More fun with wordlists[1] and vowels. My friend – the same one who'd previously given me ideas – wondered about words with a high (absolute) number of occurrences of the same vowel, even in the presence of other vowels; words like "indivisibility", he said. Well, let's look into that:

#!/usr/bin/perl

use strict;
use warnings;
use feature qw/say/;
use List::Util qw/max sum/;

my %words = ();

while(<>) {
    chomp;
    my %vcount = ();

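    # Count each vowel ("y" included). The "=()=" idiom evaluates the match in
    # list context and assigns the number of matches to the scalar on the left.
    foreach my $vowel ("a", "e", "i", "o", "u", "y") {
        $vcount{$vowel} =()= m/($vowel)/g;
    }

    # Highest single-vowel count, total vowel count, and the vowel(s) that tie
    # for the highest count.
    my $bestvowel_count  = max map { $vcount{$_} } keys %vcount;
    my $totalvowel_count = sum map { $vcount{$_} } keys %vcount;
    my @bestvowels = grep { $vcount{$_} == $bestvowel_count } keys %vcount;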

    $words{$_} = {
        'bestvowel_count' => $bestvowel_count,
        'bestvowels_count' => $bestvowel_count * scalar @bestvowels,
        'totalvowel_count' => $totalvowel_count,
        'bestvowels' => \@bestvowels,
    }
}

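# For each "extra vowel" count (total vowels minus the best vowel's count),
# collect the words with the highest single-vowel count.
foreach my $difference (0..8) {
    my @candidates =
        sort { $words{$b}->{'bestvowel_count'} <=> $words{$a}->{'bestvowel_count'} }
        grep { ($words{$_}->{'totalvowel_count'} - $words{$_}->{'bestvowel_count'}) == $difference }
        keys %words;
    next unless @candidates;    # no word has exactly this many extra vowels

    my $bestvowel_count = $words{$candidates[0]}->{'bestvowel_count'};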
    my @hits = grep { $words{$_}->{'bestvowel_count'} == $bestvowel_count } @candidates;

    say "$difference: best vowel count=", $bestvowel_count;
    say "$difference: (", scalar @hits, ") ",
        (@hits < 15)
        ? join ", ", @hits
        : ""
    ;

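    # Too many hits to list: raise a minimum length until no hit qualifies;
    # the last non-empty set (@prevhits) holds the longest words.
    if(@hits >= 15) {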
        my $length = 0;
        my @prevhits;
        while(@hits != 0) {
            $length++;
            @prevhits = @hits;
            @hits = grep { length >= $length } @hits;
        }

        say "$difference (min=", ($length - 1), "): (", scalar @prevhits, ") ",
            (@prevhits < 15)
            ? join ", ", @prevhits
            : ""
        ;
    }
}
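The `if(@hits >= 15)` branch finds the longest words among the hits by raising a minimum length until nothing survives, then reports `$length - 1` as their length. Below is a minimal sketch of an equivalent formulation using `List::Util`'s `max`. It assumes `@hits` is a plain list of words. The subroutine `longest_words` and the sample list are hypothetical; the sample words come from the results below.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use feature qw/say/;
use List::Util qw/max/;

# Hypothetical helper: returns the maximum word length, followed by every
# word of exactly that length (in input order).
sub longest_words {
    my @words = @_;
    return (0) unless @words;
    my $maxlen = max map { length($_) } @words;
    return ($maxlen, grep { length($_) == $maxlen } @words);
}

my ($maxlen, @longest) = longest_words(qw/indivisibility photocomposition indiscrimination/);
say "min=$maxlen: (", scalar @longest, ") ", join ", ", @longest;
```

This version makes two passes over the list instead of one per candidate length. It also names the maximum length directly rather than recovering it as `$length - 1`.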

This script took a bit more effort than the previous ones, but the results were enlightening. Here they are (linebreaks were added manually for readability):

0: best vowel count=5
0: (11) beekeepers, reemergence, freewheelers, defenselessness, libidinizing, 
        effervescence, enfeeblements, beekeeper, enfeeblement, abracadabra, representee
1: best vowel count=6
1: (2) indivisibility, divisibilities
2: best vowel count=5
2: (21)
2 (min=16): (4) photocomposition, antimilitaristic, distensibilities, indiscrimination
3: best vowel count=5
3: (22)
3 (min=20): (1) overapprehensiveness
4: best vowel count=5
4: (8) individualization, antiinstitutionalist, inconceivabilities, antiparliamentarians, 
       antiinstitutionalists, antiparliamentarian, radiosensitivities, anitinstitutionalism
5: best vowel count=5
5: (3) institutionalization, microminiaturization, microminiaturizations
6: best vowel count=5
6: (1) antidisestablishmentarianism
7: best vowel count=3
7: (3) counterrevolutionary, antirevolutionaries, neurophysiologically
8: best vowel count=3
8: (1) counterrevolutionaries

A difference (i.e. the "extra vowel" count) of zero yields the same words that were already found in the first of these posts: words containing only one distinct vowel, such as "effervescence" or "libidinizing". The rest are new, though there are several we've already encountered in other contexts, often due to their length (and keep in mind that this script is biased towards longer words).

The overall best word is "indivisibility" (as predicted), but intriguingly, "divisibilities" ties with it. For higher differences, the best vowel count goes down, as there aren't many words with at least that many vowels overall (a smaller pool to choose from). At the top end (or bottom end, depending on how you look at it), "counterrevolutionaries" has many vowels that are very evenly spread.
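The "difference" measure is the total number of vowels minus the count of the most frequent vowel. The minimal sketch below makes it concrete. It counts "y" as a vowel, just as the script does, and applies the measure to three words from the results. The subroutine name `vowel_profile` is hypothetical. It also prints each word's per-vowel counts, which shows how evenly "counterrevolutionaries" spreads its vowels.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use feature qw/say/;
use List::Util qw/max sum/;

# Hypothetical helper: returns (best single-vowel count, extra vowels,
# hashref of per-vowel counts). "y" is counted as a vowel.
sub vowel_profile {
    my ($word) = @_;
    my %vcount;
    for my $v (qw/a e i o u y/) {
        # "=()=" counts the matches, as in the script above
        $vcount{$v} = () = $word =~ m/$v/g;
    }
    my $best  = max values %vcount;
    my $total = sum values %vcount;
    return ($best, $total - $best, \%vcount);
}

for my $word (qw/abracadabra indivisibility counterrevolutionaries/) {
    my ($best, $difference, $vcount) = vowel_profile($word);
    say "$word: best=$best difference=$difference (",
        join(" ", map { "$_=$vcount->{$_}" } sort keys %$vcount), ")";
}
```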

Said friend also wondered about one more thing: what is the longest string of consecutive vowels in any word on this list? We'll whip up a script to find out:

#!/usr/bin/perl

use strict;
use warnings;
use feature qw/say/;
use List::Util qw/max/;

my %words = ();

while(<>) {
    chomp;

    # All runs of consecutive vowels ("y" included), longest first.
    my @vowelclusters = sort { length($b) <=> length($a) } m/([aeiouy]+)/g;
    my $longest = @vowelclusters ? length $vowelclusters[0] : 0;
    my @longestclusters = grep { length($_) == $longest } @vowelclusters;

    $words{$_} = {
        'length' => $longest,
        'clusters' => \@longestclusters,
    }
}

my $longest = max map { $words{$_}->{'length'} } keys %words;
foreach (sort { $a cmp $b } grep { $words{$_}->{'length'} == $longest } keys %words) {
    say "$_: ", join ", ", @{ $words{$_}->{'clusters'} };
}

It turns out that the best you can do is five vowels in a row; six words on the list manage that feat. Want to guess? Here are the solutions: "cooeeing", "cooeyed", "cooeying", "miaoued", "miaouing" and "queueing". The last one is perhaps debatable; I'm not sure if the first "u" should be counted as a vowel.

And if you're wondering what half of these mean... "cooee" is an Australian thing, apparently: a cooee is "[a] long, loud call used to attract attention when at a distance, mainly done in the Australian bush", and "to cooee" is the act of producing such a call. "to miaou", meanwhile, is what British cats do in lieu of meowing.

Two more words I could think of that would also qualify are "phooeying" and "phooeyed", if they existed. Since "tsktsking" is on the list, I'd argue that these are perfectly cromulent as well.
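You can check these results, including the two hypothetical "phooey" forms, by applying the script's `[aeiouy]+` pattern to individual words. The minimal sketch below does that; the subroutine name `longest_vowel_run` is hypothetical. It also reruns the check with "y" excluded. That rerun shows "cooeyed", "cooeying" and both "phooey" forms only reach five because "y" counts as a vowel.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use feature qw/say/;
use List::Util qw/max/;

# Hypothetical helper: length of the longest run of consecutive characters
# from the given vowel class (a qr// character class), or 0 if there is none.
sub longest_vowel_run {
    my ($word, $class) = @_;
    my @runs = $word =~ m/($class+)/g;
    return @runs ? max(map { length($_) } @runs) : 0;
}

my $with_y    = qr/[aeiouy]/;
my $without_y = qr/[aeiou]/;

for my $word (qw/cooeeing cooeyed cooeying miaoued miaouing queueing phooeying phooeyed/) {
    say "$word: ", longest_vowel_run($word, $with_y),
        " (without y: ", longest_vowel_run($word, $without_y), ")";
}
```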

Finally, if you want to play around with this stuff yourself, I've put up all the code and data from these posts for your convenience. If you'd like me to do more "research", let me know; otherwise this will likely have been the last of these posts for now.

Thank you, and good night!

  1. As usual, I found some more typos on the list as well: "mechanotheraputically" and "livlihood".
