- •Introduction
- •Introduction - What, Why, Who etc.
- •Why am I writing this?
- •What will I cover
- •Who should read it?
- •Why Python?
- •Other resources
- •Concepts
- •What do I need?
- •Generally
- •Python
- •QBASIC
- •What is Programming?
- •Back to BASICs
- •Let me say that again
- •A little history
- •The common features of all programs
- •Let's clear up some terminology
- •The structure of a program
- •Batch programs
- •Event driven programs
- •Getting Started
- •A word about error messages
- •The Basics
- •Simple Sequences
- •>>> print 'Hello there!'
- •>>>print 6 + 5
- •>>>print 'The total is: ', 23+45
- •>>>import sys
- •>>>sys.exit()
- •Using Tcl
- •And BASIC too...
- •The Raw Materials
- •Introduction
- •Data
- •Variables
- •Primitive Data Types
- •Character Strings
- •String Operators
- •String operators
- •BASIC String Variables
- •Tcl Strings
- •Integers
- •Arithmetic Operators
- •Arithmetic and Bitwise Operators
- •BASIC Integers
- •Tcl Numbers
- •Real Numbers
- •Complex or Imaginary Numbers
- •Boolean Values - True and False
- •Boolean (or Logical) Operators
- •Collections
- •Python Collections
- •List
- •List operations
- •Tcl Lists
- •Tuple
- •Dictionary or Hash
- •Other Collection Types
- •Array or Vector
- •Stack
- •Queue
- •Files
- •Dates and Times
- •Complex/User Defined
- •Accessing Complex Types
- •User Defined Operators
- •Python Specific Operators
- •More information on the Address example
- •More Sequences and Other Things
- •The joy of being IDLE
- •A quick comment
- •Sequences using variables
- •Order matters
- •A Multiplication Table
- •Looping - Or the art of repeating oneself!
- •FOR Loops
- •Here's the same loop in BASIC:
- •WHILE Loops
- •More Flexible Loops
- •Looping the loop
- •Other loops
- •Coding Style
- •Comments
- •Version history information
- •Commenting out redundant code
- •Documentation strings
- •Indentation
- •Variable Names
- •Modular Programming
- •Conversing with the user
- •>>> print raw_input("Type something: ")
- •BASIC INPUT
- •Reading input in Tcl
- •A word about stdin and stdout
- •Command Line Parameters
- •Tcl's Command line
- •And BASIC
- •Decisions, Decisions
- •The if statement
- •Boolean Expressions
- •Tcl branches
- •Case statements
- •Modular Programming
- •What's a Module?
- •Using Functions
- •BASIC: MID$(str$,n,m)
- •BASIC: ENVIRON$(str$)
- •Tcl: llength L
- •Python: pow(x,y)
- •Python: dir(m)
- •Using Modules
- •Other modules and what they contain
- •Tcl Functions
- •A Word of Caution
- •Creating our own modules
- •Python Modules
- •Modules in BASIC and Tcl
- •Handling Files and Text
- •Files - Input and Output
- •Counting Words
- •BASIC and Tcl
- •BASIC Version
- •Tcl Version
- •Handling Errors
- •The Traditional Way
- •The Exceptional Way
- •Generating Errors
- •Tcl's Error Mechanism
- •BASIC Error Handling
- •Advanced Topics
- •Recursion
- •Note: This is a fairly advanced topic and for most applications you don't need to know anything about it. Occasionally, it is so useful that it is invaluable, so I present it here for your study. Just don't panic if it doesn't make sense stright away.
- •What is it?
- •Recursing over lists
- •Object Oriented Programming
- •What is it?
- •Data and Function - together
- •Defining Classes
- •Using Classes
- •Same thing, Different thing
- •Inheritance
- •The BankAccount class
- •The InterestAccount class
- •The ChargingAccount class
- •Testing our system
- •Namespaces
- •Introduction
- •Python's approach
- •And BASIC too
- •Event Driven Programming
- •Simulating an Event Loop
- •A GUI program
- •GUI Programming with Tkinter
- •GUI principles
- •A Tour of Some Common Widgets
- •>>> F = Frame(top)
- •>>>F.pack()
- •>>>lHello = Label(F, text="Hello world")
- •>>>lHello.pack()
- •>>> lHello.configure(text="Goodbye")
- •>>> lHello['text'] = "Hello again"
- •>>> F.master.title("Hello")
- •>>> bQuit = Button(F, text="Quit", command=F.quit)
- •>>>bQuit.pack()
- •>>>top.mainloop()
- •Exploring Layout
- •Controlling Appearance using Frames and the Packer
- •Adding more widgets
- •Binding events - from widgets to code
- •A Short Message
- •The Tcl view
- •Wrapping Applications as Objects
- •An alternative - wxPython
- •Functional Programming
- •What is Functional Programming?
- •How does Python do it?
- •map(aFunction, aSequence)
- •filter(aFunction, aSequence)
- •reduce(aFunction, aSequence)
- •lambda
- •Other constructs
- •Short Circuit evaluation
- •Conclusions
- •Other resources
- •Conclusions
- •A Case Study
- •Counting lines, words and characters
- •Counting sentences instead of lines
- •Turning it into a module
- •getCharGroups()
- •getPunctuation()
- •The final grammar module
- •Classes and objects
- •Text Document
- •HTML Document
- •Adding a GUI
- •Refactoring the Document Class
- •Designing a GUI
- •References
- •Books to read
- •Python
- •BASIC
- •General Programming
- •Object Oriented Programming
- •Other books worth reading are:
- •Web sites to visit
- •Languages
- •Python
- •BASIC
- •Other languages of interest
- •Programming in General
- •Object Oriented Programming
- •Projects to try
- •Topics for further study
Conclusions
A Case Study
For this case study we are going to expand on the word counting program we developed earlier. We are going to create a program which mimics the Unix wc program in that it outputs the number of lines, words and characters in a file. We will go further than that however and also output the number of sentences, clauses, words, letters and punctuation characters in a text file. We will follow the development of this program stage by stage gradually increasing its capability then moving it into a module to make it reusable and finally turning it into an OO implementation for maximum extendability.
It will be a Python implementation but at least the initial stages could be written in BASIC or Tcl instead. As we move to the more complex parts we will make increasing use of Python's built in data structures and therefore the difficulty in using BASIC will increase, although Tcl will still be an option. Finally the OO aspects will only apply to Python.
Additional features that could be implemented but will be left as excercises for the reader are:
•calculate the FOG index of the text,
•calculate the number of unique words used and their frequency,
•create a new version which analyses RTF files
Counting lines, words and characters
Let's revisit the previous word counter: import string
def numwords(s):
list = string.split(s) return len(list)
inp = open("menu.txt","r") total = 0
# accumulate totals for each line for line in inp.readlines():
total = total + numwords(line) print "File had %d words" % total
inp.close()
We need to add a line and character count. The line count is easy since we loop over each line we just need a variable to increment on each iteration of the loop. The character count is only marginally harder since we can iterate over the list of words adding their lengths in yet another variable.
We also need to make the program more general purpose by reading the name of the file from the command line or if not provided, prompting the user for the name. (An alternative strategy would be to read from standard input, which is what the real wc does.)
100
So the final wc looks like:
import sys, string
# Get the file name either from the commandline or the user if len(sys.argv) != 2:
name = raw_input("Enter the file name: ") else:
name = sys.argv[1]
inp = open(name,"r")
# initialise counters to zero; which also creates variables words = 0
lines = 0 chars = 0
for line in inp.readlines(): lines = lines + 1
# Break into a list of words and count them list = string.split(line)
words = words + len(list)
chars = len(line)# Use the original line length which includes spaces etc.
print "%s has %d lines, %d words and %d characters" % (name, lines, words, chars)
inp.close()
If you are familiar with the Unix wc command you know that you can pass it a wild-carded filename to get stats for all matching files as well as a grand total. This program only caters for straight filenames. If you want to extend it to cater for wild cards take a look at the glob module and build a list of names then simply iterate over the file list. You'll need temporary counters for each file then cumulative counters for the grand totals. Or you could use a dictionary instead...
Counting sentences instead of lines
When I started to think about how we could extend this to count sentences and words rather than 'character groups' as above, my initial idea was to first loop through the file extracting the lines into a list then loop through each line extracting the words into another list. Finally to process each 'word' to remove extraneous characters.
Thinking about it a little further it becomes evident that if we simply collect the words and punctuation characters we can analyse the latter to count sentences, clauses etc. (by defining what we consider a sentence/clause in terms of punctuation items). This means we only need to interate over the file once and then iterate over the punctuation - a much smaller list. Let's try sketching that in pseudo-code:
foreach line in file: increment line count if line empty:
increment paragraph count split line into character groups
foreach character group: increment group count
extract punctuation chars into a dictionary - {char:count} if no chars left:
delete group
else: increment word count
101
