Tutorial 039 Funny Names - Extracting Data neatly from Web-Pages


It's nice to be able to extract a table of data from a web-page and process it with a BB4W (BBC BASIC for Windows) program without lots of tedious retyping. (Programming is more fun than typing.)

If you type "funny names" into Google you will find lots of sites listing names such as "Adam Zapple"
(e.g. try http://website.lineone.net/~gardenworks/names-a-d.htm).

Once you have found such a web-page you can save it as a text file from the browser menu:
  1. Click File.
  2. Click Save as...
  3. Choose a suitable directory.
  4. Open the "Save as type" drop-down menu (click the sign that looks like a v).
  5. Choose the text file option.
  6. Edit the file name if you wish.
  7. Click Save.
In this way you might have a text file that looks like this:


    We love funny names

          A-DE-MN-ZFront Page


    Click On Headings For Explanation


          Adam Zapple

          Al Beback

          Al Lejance

          Alf Abett

          Ali Barster

          Amanda Sol de Werk

          Amos Skittow

          Amy Stake

          Andy Tover

          Andy Wineriss

          Angus Macoatup

          Ann Chovie

          Ann Jyna

          Ann Tenor

    etc



Obviously the first three lines of text above (down to "Click On Headings For Explanation") are superfluous; they can be edited out of the text file by hand and the file resaved.

You will see that there are leading spaces before the names and blank lines between them. If the list were long it would be very tedious to edit these out by hand. The following program demonstrates how to clean out these unwanted spaces and blank lines automatically.

Here is the output resulting from processing the above file:



 Press <SPACE> to choose a text file whose lines you wish to clean up :

 Full Pathname of this text file :

C:\............\Names\ funnynames.txt

Adam Zapple
Al Beback
Al Lejance
Alf Abett
Ali Barster
Amanda Sol de Werk
Amos Skittow
Amy Stake
Andy Tover
Andy Wineriss
Angus Macoatup
Ann Chovie
Ann Jyna
Ann Tenor
Anna Dapter
Anna Kronism
Anna Reksic
Anna Notherthing
Anne Dryer
Anne Kersaway
Anne Tellope
Anne Yewelevent
Annette Kurtain


etc

This technique will be taken further in subsequent tutorials, when preparing randomised lists and DATA lines for programs, and in a novel phonebook program.

Listing :

      REM : Removes leading spaces and blank lines
      REM : from a text file
      REM : Richard Weston, 9th July 2003
      MODE 8
      VDU14
      COLOUR1
      PRINT'" Press <SPACE> to choose a text file whose lines you wish to clean up"
      G=GET
      OFF
      :
      DIM of% 75, ff% 18, fn% 255
      !of% = 76
      of%!4 = @hwnd%
      of%!12 = ff%
      of%!28 = fn%
      of%!32 = 256
      of%!52 = 6
      $ff% = "Text Files"+CHR$0+"*.txt"+CHR$0+CHR$0
      :
      SYS "GetOpenFileName", of% TO result%
      IF result% filename$ = FNnulterm$(fn%)
      COLOUR7
      PRINT'" Full Pathname of this text file :"
      COLOUR2
      PRINT'filename$
      PRINT'"Press SHIFT to scroll down through the results"
      :
      fnum=OPENIN filename$
      IF fnum=0 THEN PRINT "No ";filename$;" data": END
      :
      COLOUR7
      REPEAT
        line$=""
        REPEAT
          temp=BGET#fnum :REM Read byte
          line$+=CHR$(temp)
        UNTIL temp=10 OR temp=13
        PROCcheckline
        IF printworthy THEN
          PRINT line$
        ENDIF
      UNTIL  EOF#fnum
      CLOSE#fnum
      :
      PRINT'" Press<SPACE> to go again..."
      G=GET
      RUN
      END
      :
      DEF FNnulterm$(P%)
      LOCAL A$
      WHILE ?P% <> 0
        A$ += CHR$?P%
        P% += 1
      ENDWHILE
      = A$
      :
      DEF PROCcheckline
      LOCAL i,L,char$,asc
      :
      WHILE LEFT$(line$,1)=" "
        line$=MID$(line$,2) : REM Remove leading spaces
      ENDWHILE
      :
      L=LEN(line$)
      printworthy=FALSE
      FOR i=1 TO L
        char$=MID$(line$,i,1)
        asc=ASC(char$)
        IF asc>32 AND asc<127 THEN printworthy=TRUE
      NEXT i
      ENDPROC
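
The byte-by-byte REPEAT loop in the listing can probably be shortened with GET$#, which reads a whole line at a time. The fragment below is only a sketch of that idea, not part of the original program: it assumes your version of BB4W supports GET$#, and it reuses filename$, PROCcheckline and printworthy exactly as they are defined in the listing above.

      REM Sketch: alternative main loop using GET$# (assumed available)
      REM Relies on filename$, PROCcheckline and printworthy from the listing
      fnum=OPENIN filename$
      IF fnum=0 THEN PRINT "No ";filename$;" data": END
      WHILE NOT EOF#fnum
        line$=GET$#fnum : REM Reads up to, and discards, the line terminator
        PROCcheckline
        IF printworthy THEN PRINT line$
      ENDWHILE
      CLOSE#fnum

Because GET$# throws the terminator away, line$ no longer ends in character 10 or 13. If a CR LF pair happens to be returned as an extra empty string, the printworthy test rejects it like any other blank line.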

     


Annotated Listing :

      REM : Removes leading spaces and blank lines
      REM : from a text file
      REM : Richard Weston, 9th July 2003
      MODE 8
      VDU14 *** paged mode on ***
      COLOUR1
      PRINT'" Press <SPACE> to choose a text file whose lines you wish to clean up"
      G=GET
      OFF
      :
      DIM of% 75, ff% 18, fn% 255  *** usual routine for choosing a file with the Windows file-open dialogue ***
      !of% = 76
      of%!4 = @hwnd%
      of%!12 = ff%
      of%!28 = fn%
      of%!32 = 256
      of%!52 = 6
      $ff% = "Text Files"+CHR$0+"*.txt"+CHR$0+CHR$0
      :
      SYS "GetOpenFileName", of% TO result%
      IF result% filename$ = FNnulterm$(fn%) *** end of routine ***
      COLOUR7
      PRINT'" Full Pathname of this text file :"
      COLOUR2
      PRINT'filename$
      PRINT'"Press SHIFT to scroll down through the results"
      :
      fnum=OPENIN filename$  *** opens the file for reading ***
      IF fnum=0 THEN PRINT "No ";filename$;" data": END
      :
      COLOUR7
      REPEAT *** reads characters from the file until an end of line is detected at XXXX ***
        line$=""
        REPEAT
          temp=BGET#fnum :REM Read byte
          line$+=CHR$(temp) *** add new character ***
        UNTIL temp=10 OR temp=13 *** here's XXXX ***
        *** 10 (line feed) specifies "move cursor down one line" ***
        *** 13 (carriage return) specifies "move cursor to start of the current line" ***
        PROCcheckline *** removes leading spaces and decides whether the line contains printable characters ***
        IF printworthy THEN
          PRINT line$
        ENDIF
      UNTIL  EOF#fnum *** repeats until the end of the file is reached ***
      CLOSE#fnum *** close up the file ***
      :
      PRINT'" Press<SPACE> to go again..."
      G=GET
      RUN
      END
      :
      DEF FNnulterm$(P%) *** needed for open file routine: copies the NUL-terminated filename at P% into a BASIC string ***
      LOCAL A$
      WHILE ?P% <> 0
        A$ += CHR$?P%
        P% += 1
      ENDWHILE
      = A$
      :
      DEF PROCcheckline
      LOCAL i,L,char$,asc *** ensures these variables are not available to the rest of the program ***
      :
      WHILE LEFT$(line$,1)=" "
        line$=MID$(line$,2) : REM Remove leading spaces
      ENDWHILE
      :
      L=LEN(line$)
      printworthy=FALSE
      FOR i=1 TO L *** examines each character in the line ***
        char$=MID$(line$,i,1)
        asc=ASC(char$)
        IF asc>32 AND asc<127 THEN printworthy=TRUE *** These ASCII values are for the "visible" characters ***
      NEXT i
      ENDPROC
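
The two jobs done by PROCcheckline (stripping leading spaces and deciding whether anything visible is left) can also be tried out on their own, without a file. The following is a minimal sketch under that assumption: FNltrim$ and FNprintworthy are hypothetical names invented for this illustration, and the DATA lines simply imitate a few lines of the saved text file.

      REM Sketch: the two checks from PROCcheckline as stand-alone functions
      REM FNltrim$ and FNprintworthy are hypothetical names, not in the listing
      READ n
      FOR k=1 TO n
        READ raw$
        clean$=FNltrim$(raw$)
        IF FNprintworthy(clean$) THEN PRINT "[";clean$;"]"
      NEXT k
      END
      :
      REM Sample lines imitating the saved web-page text
      DATA 4
      DATA "          Adam Zapple", "", "     ", "   Al Beback"
      :
      DEF FNltrim$(a$)
      REM Returns a$ with any leading spaces removed
      WHILE LEFT$(a$,1)=" "
        a$=MID$(a$,2)
      ENDWHILE
      = a$
      :
      DEF FNprintworthy(a$)
      REM Returns TRUE if a$ contains at least one visible character
      LOCAL i,c,found
      found=FALSE
      FOR i=1 TO LEN(a$)
        c=ASC(MID$(a$,i,1))
        IF c>32 AND c<127 THEN found=TRUE
      NEXT i
      = found

Run as it stands, this should print [Adam Zapple] and [Al Beback] on separate lines; the empty string and the line of spaces are skipped. The square brackets are there only to make it obvious that the leading spaces have gone.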

     
  