Running Python code from a file
Let us write a short piece of code that reads text from a file and lists all the 5-letter words in it. Type the following code into a file called prelim-01.py:
All text following '#' on a line is a comment. It is ignored by the Python interpreter. You do not need to type the comments; they do not affect the output.
# Read the whole file into one string; 'with' closes the file automatically
with open('words-dataset.txt', 'r') as f:
    filecont = f.read()
wordlist = filecont.split()  # Split the text into words on whitespace
fivewords = list()  # Create an empty list to store 5-letter words
for word in wordlist:  # Go through the words one by one
    if len(word) == 5:  # Check if the word length is 5
        fivewords.append(word)  # Add the word to the list of 5-letter words
print("The list of 5-letter words in file 'words-dataset.txt' is")
print(fivewords)
Then run it by typing
python prelim-01.py
in the terminal. If you typed everything exactly as given, you should get the following output:
Notice that the list contains entries such as "band,", which is really a 4-letter word: it appears because the trailing comma is counted as part of its length. The list also contains numbers such as "1988," and "2015.". Can you remove such incorrect entries?
This is easy: check the Python documentation for the built-in string method str.isalpha() to see what it does. Then change the if statement in prelim-01.py to:
    if len(word) == 5 and word.isalpha():  # Keep only 5-character words made up entirely of letters
If everything is done correctly, you should see the following output.
You can now see that all the words are correct!
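One caveat of the isalpha() check: a genuine 5-letter word followed by punctuation, such as "world.", has length 6 and is still left out. A possible variation, sketched below under the assumption that only punctuation at the ends of a word should be ignored, strips it first using string.punctuation from Python's standard string module. The function name five_letter_words is hypothetical.

```python
import string

def five_letter_words(text):
    """Return the 5-letter words in text, ignoring punctuation at either end of each word."""
    fivewords = []
    for word in text.split():
        # Remove leading/trailing punctuation: "world." -> "world", "band," -> "band"
        word = word.strip(string.punctuation)
        # Keep the word only if it is exactly 5 characters long and all letters
        if len(word) == 5 and word.isalpha():
            fivewords.append(word)
    return fivewords
```

For example, five_letter_words("The band, played 1988, music. Hello world") keeps "music", "Hello" and "world", while "band" and "1988" are rejected because they are only 4 characters long once the comma is removed.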
Problem 1: Read the text in the file 'Lab5-2.txt' and count the number of times the letter 'e' occurs in it.
These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.
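As one possible sketch of these steps (the function names count_char and count_char_in_file are hypothetical), the code below reads the whole file into a string and checks each character in turn. It counts only lowercase 'e'; calling count_char(text.lower()) would also count 'E'.

```python
def count_char(text, ch='e'):
    """Count occurrences of ch by checking every character of text in turn."""
    count = 0
    for c in text:  # Go through the text one character at a time
        if c == ch:
            count += 1
    return count

def count_char_in_file(path='Lab5-2.txt', ch='e'):
    """Read the whole file as one string and count ch in it."""
    with open(path) as f:
        return count_char(f.read(), ch)
```

The built-in method text.count('e') gives the same result in one call; the explicit loop is shown because it carries over directly to Problem 2.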
When you are done, run the code by typing
Problem 2: Extend Problem 1 to count the number of times every character occurs in the text. Plot the counts as a bar graph. Finally, print the 5 highest counts.
Look up the documentation for the pyplot.bar function (look for matplotlib.pyplot.bar on the page).
These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.
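A minimal sketch of these steps follows. It swaps the hand-written loop for collections.Counter from the standard library, which counts every character (including spaces and newlines) and can return the highest counts directly. The helper names are hypothetical, and matplotlib is imported inside plot_counts only so that the counting functions work even where matplotlib is not installed.

```python
from collections import Counter

def char_counts(text):
    """Count how many times every character (letters, digits, spaces, newlines, ...) occurs."""
    return Counter(text)

def top_counts(counts, n=5):
    """Return the n (character, count) pairs with the highest counts, highest first."""
    return counts.most_common(n)

def plot_counts(counts):
    """Draw one bar per character, in sorted character order."""
    import matplotlib.pyplot as plt  # imported here so counting works without matplotlib
    chars = sorted(counts)
    heights = [counts[c] for c in chars]
    # repr() makes whitespace characters such as a space or newline visible as tick labels
    plt.bar(range(len(chars)), heights, tick_label=[repr(c) for c in chars])
    plt.xlabel('Character')
    plt.ylabel('Count')
    plt.show()
```

To use it, read Lab5-2.txt into a string with open() as in Problem 1, pass it to char_counts, then call plot_counts on the result and print top_counts(counts).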
When you are done, run the code by typing
This is a variation on Problem 2. The file marks-dataset.csv gives the marks of 100 students in a class. Grades are assigned as follows: a score from 40 to 54 gets a 'D'; 55 to 69, a 'C'; 70 to 84, a 'B'; and 85 or above, an 'A'. Plot the grade distribution, i.e., how many students received each of the grades 'A', 'B', 'C' and 'D'. Then list the Roll Numbers and Marks of the top 10 students.
Look up the documentation for the numpy.genfromtxt() function (read the description of the names parameter to find out what it does). Also, look up the documentation for the numpy.sort() function and read about its order parameter to find out how to sort by Marks. It is also a good idea to look at the pyplot documentation: find matplotlib.pyplot.hist on the page and see what its range parameter does.
These are step-by-step instructions for solving the problem. You can try to convert each step into Python code.
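The sketch below is one way to carry out these steps, under several assumptions not stated in the problem: the CSV's first row is a header, the marks column is named Marks (a hypothetical name; genfromtxt derives field names from the header), and marks below 40 receive no grade. It also draws the distribution with pyplot.bar on precomputed counts rather than pyplot.hist; hist with suitable bins is an equally valid route. All function names are hypothetical.

```python
import numpy as np

def grade(mark):
    """Letter grade for one mark, using the bands from the problem statement.
    Marks below 40 are not covered by the problem; this sketch returns None for them."""
    if mark >= 85:
        return 'A'
    if mark >= 70:
        return 'B'
    if mark >= 55:
        return 'C'
    if mark >= 40:
        return 'D'
    return None

def grade_counts(marks):
    """Count how many marks fall into each grade; ungraded marks are skipped."""
    counts = {g: 0 for g in 'ABCD'}
    for mark in marks:
        g = grade(mark)
        if g is not None:
            counts[g] += 1
    return counts

def load_marks(path='marks-dataset.csv'):
    """Read the CSV into a structured array. names=True turns the header row into
    field names; dtype=None lets genfromtxt infer a type for each column."""
    return np.genfromtxt(path, delimiter=',', names=True, dtype=None, encoding='utf-8')

def top_n(data, n=10, field='Marks'):
    """Return the n rows with the highest value in `field` (assumed column name)."""
    # np.sort with order= sorts ascending by that field; [::-1] makes it descending
    return np.sort(data, order=field)[::-1][:n]

def plot_grades(counts):
    """Bar chart of how many students received each grade."""
    import matplotlib.pyplot as plt  # imported here so the functions above work without matplotlib
    plt.bar(list(counts), list(counts.values()))
    plt.xlabel('Grade')
    plt.ylabel('Number of students')
    plt.show()
```

For example, data = load_marks() followed by plot_grades(grade_counts(data['Marks'])) draws the distribution, and printing each row of top_n(data) lists the Roll Numbers and Marks of the top 10 students.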
When you are done, run the code by typing