Lab-V (15/09/2016):

EXECUTING PYTHON CODE FROM FILES


In this lab, we will see how to write Python code in a file and then run it.

Preliminaries

import string

f = open('words-dataset.txt', 'r')
filecont = f.read()
wordlist = filecont.split()
fivewords = list()              # Create an empty list to store 5-letter words
for word in wordlist :          # Go through the words one by one
    if len(word) == 5 :         # Check if the word length is 5
        fivewords.append(word)  # Add the word to the list of 5-letter words
f.close()                       # Close the file now that its contents are read
print("The list of 5-letter words in file 'words-dataset.txt' is")
print(fivewords)
Save the file as prelim-01.py.

Then type
python prelim-01.py
in the terminal. If you typed everything exactly as given, you should get the following output:

The list of 5-letter words in file 'words-dataset.txt' is
['band,', '1960.', 'Ringo', 'beat,', '1950s', 'roll,', 'later', 'music', 'rock,', 'often', 'ways.', 'early', 'their', 'first', 'music', 'built', 'their', 'clubs', '1960,', 'Best,', 'Starr', 'them.', 'Brian', 'their', 'their', 'after', 'their', 'first', '"Love', '1962.', 'Four"', 'year,', 'early', 'known', 'White', '1968)', 'Abbey', 'After', 'their', '1970,', '1980,', '2001.', 'RIAA,', 'music', 'other', '2008,', 'group', '2016,', 'chart', 'Award', 'Score', 'sales', 'group', '1988,', '2015.']

Notice that the list contains entries such as "band," which are really 4-letter words: their length comes out as 5 because the trailing punctuation is counted. There are also numbers such as "1988," and "2015." Can you remove such wrong entries?

It is easy: check the Python documentation on string methods to find out what the method str.isalpha() does. Change the if statement in the file prelim-01.py to

if len(word) == 5 and word.isalpha() :

and save the file. Now run the code again by typing
python prelim-01.py

If everything is done correctly, you should see the following output.

The list of 5-letter words in file 'words-dataset.txt' is
['Ringo', 'later', 'music', 'often', 'early', 'their', 'first', 'music', 'built', 'their', 'clubs', 'Starr', 'Brian', 'their', 'their', 'after', 'their', 'first', 'early', 'known', 'White', 'Abbey', 'After', 'their', 'music', 'other', 'group', 'chart', 'Award', 'Score', 'sales', 'group']

You see now that all the words are correct!
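
To see why the extra test works, here is a small sketch applied to a few entries copied from the first output. str.isalpha() returns True only when every character in the string is a letter, so a trailing comma, a quote mark or a digit makes it fail. The list samples is only an illustration, not the contents of words-dataset.txt, and the single-argument print(...) form is used so that the sketch should run under both Python 2 and Python 3.

```python
# A few entries copied from the first output above (illustrative sample only).
samples = ['band,', '1960.', 'Ringo', 'music', '"Love']

for word in samples:
    # len() counts every character, including punctuation and digits.
    print("%s  length=%d  isalpha=%s" % (word, len(word), word.isalpha()))

# Keep only the entries that are 5 characters long and purely alphabetic.
clean = [w for w in samples if len(w) == 5 and w.isalpha()]
print(clean)
```

Every entry in samples has length 5, but only 'Ringo' and 'music' pass the isalpha() test.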


Lab Problems

  1. Write and save the Python code in a file called Lab-05-01.py.

    Read the text in the file 'Lab5-2.txt' and count the number of times 'e' occurs in the text.

    These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.

    • Import the numpy and string packages.
    • Open the file for reading.
    • Read the text from the file into a string variable, S.
    • Read the Python documentation on string methods and find a method to count the number of times 'e' occurs.
    • Print the count.
    • Close the file that you opened.
  2. When you are done, run the code by typing

    python Lab-05-01.py
    in the terminal.
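
The counting step can be sketched as follows. This is a minimal illustration that uses a short made-up string in place of the contents of Lab5-2.txt, so it runs without the file. str.count() is one string method that fits the task; note that it is case-sensitive, so 'E' and 'e' are counted separately unless the text is lowercased first.

```python
# Stand-in for the text read from the file; in the lab, S would come from f.read().
S = "Every eel sees the tree."

# str.count() returns the number of non-overlapping occurrences of a substring.
count_e = S.count('e')
print(count_e)

# Lowercasing first also counts the capital 'E' in "Every".
count_e_any_case = S.lower().count('e')
print(count_e_any_case)
```

Whether capital letters should be included is not stated in the problem, so both variants are shown.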


  3. Write and save the Python code in a file called Lab-05-02.py.

    Extend Problem 1 to count the number of times every character occurs in the text. Plot the counts as a bar graph. Finally, print the 5 highest counts.

    Look up the documentation for pyplot.bar method (look for matplotlib.pyplot.bar in the page).

    These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.

    • Import the numpy and string packages.
    • Import the pyplot module from the matplotlib package.
    • Open the file for reading.
    • Read the text from the file into a string variable, S.
    • Create an empty list called counts to store the number of times each character occurs in the string S.
    • Write a for loop to go through the characters one after the other.
    • Inside the for loop, count how many times the current character occurs in S.
    • Append each count to the counts list.
    • After the for loop, plot the values in the counts list as a bar graph.
    • Sort the counts list using its sort() method.
    • Print the 5 highest counts.
    • Close the file that you opened.
    • Once everything is working, save the file Lab-05-02.py.
  4. When you are done, run the code by typing

    python Lab-05-02.py
    in the terminal.
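
One possible shape for the counting and plotting steps is sketched below, again with a made-up string instead of the file contents. Looping over the distinct characters of S is an assumption (the lab may intend a fixed alphabet instead), and plot_counts is a hypothetical helper name. The matplotlib import is kept inside the helper so that the counting part runs even where matplotlib is not installed.

```python
# Stand-in for the text read from the file (hypothetical sample).
S = "abracadabra"

# Characters to count, in a fixed order. Using the distinct characters of S
# is an assumption; a fixed alphabet would work the same way.
chars = sorted(set(S))

counts = []
for ch in chars:
    counts.append(S.count(ch))   # occurrences of this character in S

def plot_counts(chars, counts):
    """Draw the counts as a bar graph, one bar per character (needs matplotlib)."""
    from matplotlib import pyplot
    positions = list(range(len(chars)))
    pyplot.bar(positions, counts)
    pyplot.xticks(positions, chars)   # label each bar with its character
    pyplot.show()

# sorted() returns a new list, so counts stays aligned with chars for plotting.
top5 = sorted(counts, reverse=True)[:5]
print(top5)
```

Calling plot_counts(chars, counts) draws the graph. Using counts.sort() as the steps suggest sorts the list in place, which is why the plot is drawn before sorting.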


  5. Write and save the Python code in a file called Lab-05-03.py.

    This is a variation on the previous problem (Lab-05-02.py). The file marks-dataset.csv gives the marks of 100 students in a class. Grades are assigned as follows: a score from 40 to 54 is a 'D', from 55 to 69 a 'C', from 70 to 84 a 'B', and 85 or above an 'A'. Plot the grade distribution, i.e., how many students received each of the grades 'A', 'B', 'C' and 'D'. Then list the Roll Numbers and Marks of the top 10 rankers.

    Look up the documentation for the numpy.genfromtxt() function (look for the description of the names parameter to find out what

    M = np.genfromtxt('marks-dataset.csv', delimiter=',', dtype='int32', names=['RollNo', 'Marks'])

    does). Also look up the documentation for the numpy.sort() function; look at the order= parameter to find out how to sort according to Marks. It is also a good idea to read the documentation for matplotlib.pyplot.hist and see what the range parameter does.
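
As a rough illustration of what names= and order= do, the sketch below reads five made-up records from an in-memory buffer instead of marks-dataset.csv. The roll numbers and marks are invented for the example, and the layout of one "RollNo,Marks" pair per line is an assumption about the real file.

```python
import io
import numpy as np

# Hypothetical stand-in for marks-dataset.csv: one "RollNo,Marks" pair per line.
sample = io.StringIO(u"101,72\n102,91\n103,48\n104,85\n105,66\n")

# names= gives each column a field name, so M becomes a structured array
# whose columns can be accessed as M['RollNo'] and M['Marks'].
M = np.genfromtxt(sample, delimiter=',', dtype='int32', names=['RollNo', 'Marks'])
print(M['Marks'])

# order= tells numpy.sort() which field to sort by; the result is ascending,
# so it is reversed here to put the highest marks first.
ranked = np.sort(M, order='Marks')[::-1]
for row in ranked[:3]:
    print("%d %d" % (row['RollNo'], row['Marks']))
```

Because whole records are sorted, each roll number stays attached to its marks.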

    These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.

    • Import the numpy and string packages.
    • Import the pyplot module from the matplotlib package.
    • Read the data from the file into a numpy array, M, using the numpy.genfromtxt() function.
    • Plot the marks as a histogram to show the grade distribution.
    • Sort the array M by Marks using the numpy.sort() function.
    • Print the 10 highest marks along with the Roll Numbers.
    • Once everything is working, save the file Lab-05-03.py.
  6. When you are done, run the code by typing

    python Lab-05-03.py
    in the terminal.
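
The effect of the range parameter on the grade bins can be checked with a small sketch. It uses numpy.histogram, which takes bins and range arguments with the same meaning as matplotlib.pyplot.hist, so no plot window is needed. The marks are invented, all marks are assumed to lie between 40 and 100, and a score of exactly 85 is treated as an 'A'.

```python
import numpy as np

# Hypothetical marks, not taken from marks-dataset.csv.
marks = np.array([42, 54, 55, 69, 70, 84, 85, 99, 100])

# Four equal-width bins over [40, 100] have edges 40, 55, 70, 85, 100,
# which line up with the D, C, B and A boundaries.
counts, edges = np.histogram(marks, bins=4, range=(40, 100))
print(edges)

for grade, n in zip("DCBA", counts):
    print("%s: %d" % (grade, n))
```

Values outside the given range are ignored, and the last bin includes its right edge, so a mark of 100 is counted as an 'A'.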