Token Machines

Jed Rembold

March 20, 2024

Announcements

Midterm #2 is Friday!
- 1 hour to write and submit a functioning graphics program
- Orchestrated through Canvas
- Make sure you are setup and ready to go!
- Small Sections today and tomorrow focusing on example questions!
- I will also be around for parts of tomorrow if you have questions that can not be answered by your section leaders or the QUAD tutors
You should have gotten feedback about Wordle by now. Please let me know if you have not.
- Leaders are currently working on Breakout
Polling: rembold-class.ddns.net

Review Question!

What would be the output of the last print statement in the code to the right?

True
False
Error: Index out of range
Error: Python will not know how to compare the new Demo objects

class Demo:
    def __init__(self):
        self.x = []

    def add(self, v):
        self.x.append(v)

    def get_x(self):
        return self.x

A, B = Demo(), Demo()
C = B.get_x()
A.add(3)
B.add(3)
C.append(A)
print(A.get_x() == B.get_x())

Normal Data Types

Defining new data types through classes is commonly used to represent objects to be manipulated by the program
Examples:
- A letter cell in Wordle: stores the graphical portrayal of the cell, as well as tools to interact with the cell (changing color or letter)
- A ball in Breakout: stores the ball position and graphics, as well as tools to handle movement and bouncing
We define these classes to make it easier to write the later code governing the overall program

Abstract Data Types

An alternative use of classes though is to represent a little machine that allows us to more easily perform certain actions
- The created objects themselves are not important to the program. Only the result of their computations.
Types that are defined largely by their behavior this way are called abstract data types or ADT s

Why ADTs?

Simplicity–The internal representation is hidden from the client when used.

“I don’t need to know how a car works to drive from my home to work.”
Flexibility–If the internal representation needs to be changed by the programmer, they can do so without breaking outside compatibility.

“My mechanic could install new brakes, and I’d still drive the car in the same way.”
Security–Keeping the internal representation away from clients prevents clients from directly altering values that may cause the type to behave unexpectedly.

“My engine is not exposed to the outside world, where anyone could muck with it.”

Creating ADTs

As ADTs accomplish or facilitate a task, we need to understand what task we are hoping to accomplish
- The most useful ADTs are often not correlated to a specific program. They could work in any program.
With task in mind, what would be useful to a person trying to accomplish that task?
- These are commonly the methods that we need to define
What internal values will we need to track to accomplish these tasks or interactions?
- These will likely be the attributes that we need to define

Chop Chop

Hearken back to our Pig Latin translation program from earlier this semester
- word_to_pig_latin took a single word and translated into Pig Latin
- To translate an entire sentence, we would need code to break the sentence up into individual words, which we could then pass into word_to_pig_latin
The latter is an example of something that comes up often in computer science: breaking a larger thing into particular smaller chunks
- These “chunks” can really be anything, so the more general term computer scientists use is a token
- Breaking up a larger thing into smaller tokens would be a useful task to accomplish!

A Token Scanner

A class that plucked out individual tokens might be called a token scanner or token extractor
What would a client want from a token scanner?
- A way to pass in the necessary input
- A way to retrieve the next individual token
- A way to know when there are no more tokens
- Maybe a way to tailor or fine-tune what tokens are desired
These requirements help inform what methods should be incorporated into a token scanner class!
- Still need to determine what internal attributes might be needed

Token Scanner Methods

Exports 4 main methods:
1. |||scanner|||.set_input(|||str|||)
  - Sets the input of the token scanner to the specified string or input stream
2. |||scanner|||.next_token()
  - Returns the next token from the scanner text, or "" at the end
3. |||scanner|||.has_more_tokens()
  - Returns True if more tokens exist, False otherwise
4. |||scanner|||.ignore_whitespace()
  - Customization option which tells the scanner to ignore whitespace characters

Token Scanner Attributes

What will we need to track to implement these methods?
- What the main text is
- Where we are currently at within the text
- A boolean flag to indicate if we should be ignoring whitespace
These will generally be defined within our constructor function (or in a function called from the constructor)

Live Coding Time!

Token Scanner: One Implementation

# File: tokenscanner.py

"""
This file implements a simple version of a token scanner class.
"""

# A token scanner is an abstract data type that divides a string into
# individual tokens, which are strings of consecutive characters that
# form logical units.  This simplified version recognizes two token types:
#
#   1. A string of consecutive letters and digits
#   2. A single character string
#
# To use this class, you must first create a TokenScanner instance by
# calling its constructor:
#
#     scanner = TokenScanner()
#
# The next step is to call the set_input method to specify the string
# from which tokens are read, as follows:
#
#     scanner.set_input(s)
#
# Once you have initialized the scanner, you can retrieve the next token
# by calling
#
#    token = scanner.next_token()
#
# To determine whether any tokens remain to be read, you can either
# call the predicate method scanner.has_more_tokens() or check to see
# whether next_token returns the empty string.
#
# The following code fragment serves as a pattern for processing each
# token in the string stored in the variable source:
#
#     scanner = TokenScanner(source)
#     while scanner.has_more_tokens():
#         token = scanner.next_token()
#         . . . code to process the token . . .
#
# By default, the TokenScanner class treats whitespace characters
# as operators and returns them as single-character tokens.  You
# can set the token scanner to ignore whitespace characters by
# making the following call:
#
#     scanner.ignore_whitespace()

class TokenScanner:

    """This class implements a simple token scanner."""

# Constructor

    def __init__(self, source=""):
        """
        Creates a new TokenScanner object that scans the specified string.
        """
        self.set_input(source)
        self._ignore_whitespace_flag = False

# Public methods

    def set_input(self, source):
        """
        Resets the input so that it comes from source.
        """
        self._source = source
        self._nch = len(source)
        self._cp = 0

    def next_token(self):
        """
        Returns the next token from this scanner.  If called when no
        tokens are available, next_token returns the empty string.
        """
        if self._ignore_whitespace_flag:
            self._skip_whitespace()
        if self._cp == self._nch:
            return ""
        token = self._source[self._cp]
        self._cp += 1
        if token.isalnum():
            while self._cp < self._nch and self._source[self._cp].isalnum():
                token += self._source[self._cp]
                self._cp += 1
        return token

    def has_more_tokens(self):
        """
        Returns True if there are more tokens for this scanner to read.
        """
        if self._ignore_whitespace_flag:
            self._skip_whitespace()
        return self._cp < self._nch

    def ignore_whitespace(self):
        """
        Tells the scanner to ignore whitespace characters.
        """
        self._ignore_whitespace_flag = True

# Private methods

    def _skip_whitespace(self):
        """
        Skips over any whitespace characters before the next token.
        """
        while self._cp < self._nch and self._source[self._cp].isspace():
            self._cp += 1

Using `TokenScanner`

Need to initialize the token scanner object
- You need to create the machine before you can use it
Feed the machine the text you want to grab tokens from
Generally, keep looping as long as there are still tokens
- Each iteration, get the latest token and then do something with it

Using `TokenScanner` in `PigLatin`

from tokenscanner import TokenScanner

def to_pig_latin(text):
    translation = ""
    scanner = TokenScanner()
    scanner.set_input(text)
    while scanner.has_more_tokens():
        token = scanner.next_token()
        if token.isalpha():
            token = word_to_pig_latin(token)
        translation += token
    return translation

Token Machines

Announcements

Review Question!

Normal Data Types

Abstract Data Types

Why ADTs?

Creating ADTs

Chop Chop

A Token Scanner

Token Scanner Methods

Token Scanner Attributes

Live Coding Time!

Token Scanner: One Implementation

Using TokenScanner

Using TokenScanner in PigLatin

Using `TokenScanner`

Using `TokenScanner` in `PigLatin`