Tutorial 034 "Virgil's Vocab" - Sorting Words using
Quicksort
In Tutorial 032 I gave you a program to count the words in a text file.
The following program orders words (alphabetically) that it finds in its own
DATA statements. (In a subsequent tutorial I will show you how to automate
the production of the required data statements, by stripping out the words
from any text file, which could be a whole "booksworth" downloaded from the
net).
For example, the following is part of the output produced by processing
the 20,957 words stripped out from a text file of the same translation of
VIRGIL'S GEORGICS mentioned in Tutorial 032 . Here's the first "screensworth"
:
Sorting Words using *Quicksort*...
That took 1.06secs to sort 20,957 words
Press <SHIFT> to scroll down to see the rest
About Above Abydos Acanthus According Acerrae Achelous
Acheron Achilles Acorns
Acte Aegypt Aegyptus Aesculus Aether Aethiop Afric Again Against Ah Alack
Alburnus Alcinous Alders Alive All Allays Allotted Allure Alone Along
Alpheus
Alps Alternately Amazement Amerian Amid Amidst Aminaean Among Amphrysian
Amyclae Amyclaean Amythaon An And Anio Aonian Apollo Appear Apples Aquarius
Arabs Arcady Arched Arctos Arcturus Are Arethusa Arethuse Argitis Aristaeus
Armed Arms Around As Ascanian Ascra Asia Asian Asilus Assaracus Assyrian
At Ate
Athos Athwart Atlas Attend Aurora Ausonian Auster Autumn Avenged Avernian
Ay
Aye Bacchanalian Bacchus Bactria Baffled Baked Bakes Balearic Bare Barred
Barren Bathed Batter Be Bears Beasts Bees Before Begins Begrudge Behind
Behold
Belated Belgian Benacus Bend Beneath Beroe Beside Bespake Besport Best
Besteads
Betwixt Bide Binding Bisaltic Black Blesses Blest Bloom Blush Bones Boon
Bootes
Boreas Borne Both Brass Brave Breathed Breed Breeds Bridge Brief Bright
Brim
Bring Bristle Britain Broad Brooks Brother Brought Bruised Brushing Build
Bumastus Burn Burst Bursts Busiris But Butcher By Caerulean Caesar Caicus
Calabrian Came Camilli Can Canopus Capua Caressing Carpathian Carry Cast
Castalia Cattle Caucasus Cayster Cecropian Cecrops Cedar Celeus Centaurs
Centaury Ceos Ceraunian Cerberus Ceres Challenge Chalybs Change Chaonian
Chaos
Charmed Chestnut Chestnuts Chides Chiefly Chiron Choose Churned Ciconian
Cinyps
Cithaeron Clamour Clanian Clasped Claws Clear Clear-orbed Clings Clio
Clitumnus
Close-pent Closes Clutch Clutching Clymene Coal-black Cocytus Coeus Come
Comes
Community Confide Congeals Conspicuous Convulsed Corn-heap Corroded Corycus
Could Counted Country Crams Cretan Cries Cross-wise Crouch Crustumian
Curb
Curetes Cyclopes Cyclops Cydippe Cyllarus Cyllenian Cynthus Cyrene Cytisus
Cytorus Dacian Dance Danube Dark Dark-eddying Dark-threatening Darken
Dash
Daughter Dauntless Dawn Dawned Dead Deal Death-cold Decii Deemed Deep
.....
and the last (21st!) screen comes up as:
walls walnut walnut-fruit
wan wand wander wandered wandereth wandering wane
wanton war war-god war-horse war-tried war-trump ward wardship ware warlike
warm warmth warn warned warneth warning warnings warp warrior warriors
wars was
wash washes waste wastes watch watches water water-courses water-side
water-snake water-trough water-worms waters watery wave wave-top wavers
waves
waving wax wax-flower waxed waxen waxes way way-wildered wayfarer we weakling
wealth weapons wearers weary weave webs wed wedges weed weeds weedy weeping
weeps weevil weigh-up weight weighty-wise welcome welcomed welfare well
well-beloved well-drilled well-known well-nigh well-springs wend went
wept were
west westward wet what whate whatso wheat-ear wheaten wheel wheel-spokes
wheels
whelps when whenas whence where whereby wherefore wherefrom whereof whereso
whereto wherewith wherries whether whets whetting whey which while whilst
whirl
whirled whirling whirls whirlwind whirr whit white white-glistening whiten
whitened whiteness whither whittled who whole wholesome wholly whom whose
why
wicker-ware wide wide-spread wield wielder wife wild wiles will willed
willing
willow willow-bands willow-branches willow-leaves willow-scythe willows
wills
wilt wilt- win wind winding windless window-slits winds windy wine wine-cup
wine-god wine-infuriate wine-juice wine-press winepress wines winged winging
wings winnow winnowing winnowing-time wins winter winter-suns winters
wintry
wise wit with withal withholds withies within without witless woebegone
wolf
wolf-kin wolves womb won wonder wonderment wondrous wondrous-pale wont
wonted
wood wooded woodland woods wooed wool wool-clad woolly word words work
world
worm worse worser worships worst-hued worsted worthy wot would wouldst
wound
wounds wove woven wrap wrath wreak wreathe wreathed wreathes wreaths wretched
wretches wrong wrongs wrought yawn yawns ye yea yeaning year yearly yearning
years yellow yellow-haired yellowing yet yew-tree yews yield yielding
yields
yoke yon yore you young younglings your youth youthful zeal zoned zones
5738 different words
counted but -- <You> and <you> each count
Press<SPACE> to go again...
You can see its done a fantastic job of producing an index or vocab list,
call it what you will, and it indictaes what makes Virgil Virgil (not
forgetting his translators!). You will also see that capital letters
have higher priority than lower case ones, and thus all proper nouns and
first words in the verse lines appear before the more "normal" ( lowercase-only)
words.
The program avoids printing the same word more than once, when it occurs
repeatedly in the text. For instance you expect a lot of "the"s etc.
At present there may be a capitalised version and a non-capitalised
version of some words. I leave it as an exercise, for you, to solve that problem!
The following program listing gives a version of
the above but with only 44 words in DATA, to illustrate the action;
the sort is so fast it takes less than 0.01sec to run, and there's no need
here for the scrolling facility. Its also small enough to run on the free
(trial) version of BB4W...
Output :
Sorting Words using *Quicksort*...
That took 0secs to sort 44 words
Press <SHIFT> to scroll down to see the rest
Maecenas Of Or Such What are bees beneath cattle-keeping cornfield elm for
how
is it makes marry meet my or pains patient proof serves smile sod star steer
tend the themes thrifty to trial turn vine what with
38 different
words counted but e.g. <You> and <you> each count
Press<SPACE> to go again...
In the following you can just read down the words
in the DATA statements to see the story starting ...
Listing :
REM: Quicksort words
REM : Virgil's Vocab
REM: Richard Weston after "Simon"
REM: 26th June 2003
MODE8:OFF
t=0
newword=0
VDU14
COLOUR9
PRINTTAB(15)"Sorting Words using *Quicksort*..."'
COLOUR7
REPEAT
t+=1
READ word$
UNTIL word$="*"
DIM word$(t),rank(t)
RESTORE
FOR i=1 TO t-1
READ word$(i)
NEXT i
TIME=0
:
PROCsort
:
T=TIME/100
COLOUR11
PRINT"That took ";T;"secs to sort ";t;" words"
PRINT'"Press <SHIFT> to scroll down
to see the rest"'
COLOUR7
FOR i=t-1 TO 1 STEP -1
pos=POS:L=LEN(word$(rank(i)))
:
PROCcheck_already_met
:
IF alreadymet=FALSE THEN
newword+=1
IF L < (80-pos)
THEN
PRINTword$(rank(i))+"
";
ELSE
PRINT'word$(rank(i))+"
";
ENDIF
ENDIF
NEXT i
PRINT'
COLOUR9
PRINT newword;" different words counted but
e.g. <You> and <you> each count"
COLOUR2
PRINT'" Press<SPACE> to go again..."
G=GET
RUN
END
:
DEF PROCcheck_already_met
alreadymet=FALSE
IF word$(rank(i))=word$(rank(i+1)) THEN
alreadymet=TRUE
ENDIF
ENDPROC
:
DEF PROCsort:LOCAL I
FOR I = 1 TO t
rank(I)=I
NEXT
PROCquicksort(1,t)
ENDPROC
:
DEF PROCquicksort(low,high)
LOCAL left,right,it,dummy
left=low:right=high
it$=word$(rank((low+high)DIV 2))
REPEAT
IF word$(rank(left))>it$ THEN
REPEAT left=left+1
UNTIL word$(rank(left))<=it$
ENDIF
IF word$(rank(right))<it$
THEN
REPEAT right=right-1
UNTIL word$(rank(right))>=it$
ENDIF
IF left<=right THEN
dummy=rank(left)
rank(left)=rank(right)
rank(right)=dummy
left=left+1
right=right-1
ENDIF
UNTIL left>right
IF right>low THEN PROCquicksort(low,right)
IF left<high THEN PROCquicksort(left,high)
ENDPROC
:
DATA What
DATA makes
DATA the
DATA cornfield
DATA smile
DATA beneath
DATA what
DATA star
DATA Maecenas
DATA it
DATA is
DATA meet
DATA to
DATA turn
DATA the
DATA sod
DATA Or
DATA marry
DATA elm
DATA with
DATA vine
DATA how
DATA tend
DATA the
DATA steer
DATA What
DATA pains
DATA for
DATA cattle-keeping
DATA or
DATA what
DATA proof
DATA Of
DATA patient
DATA trial
DATA serves
DATA for
DATA thrifty
DATA bees
DATA Such
DATA are
DATA my
DATA themes
DATA "*"
Annotated Listing :
REM: Quicksort words
REM : Virgil's Vocab
REM: Richard Weston after "Simon"
REM: 26th June 2003
MODE8:OFF
t=0
newword=0
VDU14 *** Paged Mode on
COLOUR9
PRINTTAB(15)"Sorting Words using *Quicksort*..."'
COLOUR7
REPEAT
t+=1 *** totals up the words
found
READ word$
UNTIL word$="*" *** end marker
DIM word$(t),rank(t) *** store the data in
these arrays
RESTORE *** sends DATA pointer back to the
start of the DATA
FOR i=1 TO t-1 *** (t-1) avoids
the asterisk being counted as a word
READ word$(i)
NEXT i
TIME=0 *** sets the time to zero at this
point in the program
:
PROCsort
:
T=TIME/100 ***TIME gives the time in one
hundredths of a second
COLOUR11
PRINT"That took ";T;"secs to sort ";t;" words"
PRINT'"Press <SHIFT> to scroll down
to see the rest"'
COLOUR7
FOR i=t-1 TO 1 STEP -1 *** read backwards
or z comes first
pos=POS : L=LEN(word$(rank(i)))
*** needed to see if there's room on the line to print another word
:
PROCcheck_already_met ***
checks to see if word already met and printed
:
IF alreadymet=FALSE THEN
newword+=1 *** count
the new words
IF L < (80-pos)
THEN *** 80 characters per line in MODE 8
PRINTword$(rank(i))+"
"; ***add a space to separate the words printed"
ELSE
PRINT
' word$(rank(i))+" "; *** the ' prints a new line, because the next word won't
fit on the present line
ENDIF
ENDIF
NEXT i
PRINT'
COLOUR9
PRINT newword;" different words counted but
e.g. <You> and <you> each count"
COLOUR2
PRINT'" Press<SPACE> to go again..."
G=GET
RUN
END
:
DEF PROCcheck_already_met
alreadymet=FALSE
IF word$(rank(i))=word$(rank(i+1)) THEN ***
see if the previous word in the sorted list is the same
alreadymet=TRUE
ENDIF
ENDPROC
:
DEF PROCsort:LOCAL I
FOR I = 1 TO t
rank(I)=I
NEXT
PROCquicksort(1,t)
ENDPROC
:
DEF PROCquicksort(low,high) *** This PROC
has been adapted to handle strings instead of numbers....
LOCAL left,right,it,dummy
left=low:right=high
it$=word$(rank((low+high)DIV 2)) *** eg it
becomes it$ ; value becomes word$
REPEAT
IF word$(rank(left))>it$ THEN
REPEAT left=left+1
UNTIL word$(rank(left))<=it$
ENDIF
IF word$(rank(right))<it$
THEN
REPEAT right=right-1
UNTIL word$(rank(right))>=it$
ENDIF
IF left<=right THEN
dummy=rank(left)
rank(left)=rank(right)
rank(right)=dummy
left=left+1
right=right-1
ENDIF
UNTIL left>right
IF right>low THEN PROCquicksort(low,right)
IF left<high THEN PROCquicksort(left,high)
ENDPROC
:
*** These DATA were produced automatically as a separate
text file (using a program which I give in a later tutorial)
*** They were then pasted onto the end of the modified,
amalgamated programs given in Tutorials 032 and 033...
DATA What
DATA makes
DATA the
DATA cornfield
DATA smile
DATA beneath
DATA what
DATA star
DATA Maecenas
DATA it
DATA is
DATA meet
DATA to
DATA turn
DATA the
DATA sod
DATA Or
DATA marry
DATA elm
DATA with
DATA vine
DATA how
DATA tend
DATA the
DATA steer
DATA What
DATA pains
DATA for
DATA cattle-keeping
DATA or
DATA what
DATA proof
DATA Of
DATA patient
DATA trial
DATA serves
DATA for
DATA thrifty
DATA bees
DATA Such
DATA are
DATA my
DATA themes
DATA "*"
Richard Weston's Homepage