# Learning From Errors in Python

No one can learn a new spoken language without making all sorts of mistakes, ranging from the almost unnoticeable to comical or mortifying misunderstandings. This is just part of the process: one tries having a conversation, and gets lots of feedback when things aren't quite right. Error messages in coding serve a similar purpose to gentle feedback from a friend when learning a spoken language.

Although it can be frustrating to get error messages, it is important to recognize that they provide valuable information. To understand them, it is useful to know a little about the common kinds of errors in Python and what each is telling you. I strongly recommend *intentionally* triggering each kind of Python error you can, and taking some notes about what the error message looks like. Then when you *unintentionally* get an error message, you'll already have a pretty good idea of what kind of issue might have produced it.
Eventually, the types of bugs that become most scary are those that *don't* produce an error message. We will discuss later on in the text ways to use formal test code and unit testing to help guard against this type of bug.
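
As a small preview of why these silent bugs are scary, here is a minimal sketch. The function names `buggy_mean` and `fixed_mean` are made up for this illustration, and the comparison at the end is only a stand-in for the more formal tests discussed later in the text.

```python
def buggy_mean(masses):
    # Bug: the divisor is hard-coded, so this is only right for lists of 2 items
    return sum(masses) / 2


def fixed_mean(masses):
    # Divide by however many items the list actually holds
    return sum(masses) / len(masses)


tumor_masses = [1, 13, 24]
print(buggy_mean(tumor_masses))  # runs without any error message, but is wrong
print(fixed_mean(tumor_masses))

# Comparing against an answer worked out by hand exposes the silent bug
hand_calculated = (1 + 13 + 24) / 3
buggy_is_correct = buggy_mean(tumor_masses) == hand_calculated
fixed_is_correct = fixed_mean(tumor_masses) == hand_calculated
print(buggy_is_correct, fixed_is_correct)
```

Neither function produces an error message, so only the check against a known answer tells them apart.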

>**There are no penalties for getting lots of errors while coding** - every person who codes gets error messages. You just have to learn what they mean and how to fix them.

## Common Error Types in Python

Below I list many common types of errors in python, and examples of how one might fix them. Feel free to either read through this top-to-bottom, or to skip around and use it as a reference when you encounter a particular type of error.
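
As a quick preview, the sketch below deliberately triggers one error of each kind covered in this document and prints the name and message of each. It leans on two things the rest of the chapter does not rely on, so treat them as assumptions of this sketch: `exec`, which runs a piece of text as Python code, and `try`/`except`, which catches an error so the loop can keep going. The exact wording of the messages may differ a little between Python versions.

```python
# Each snippet is a deliberately broken line of code, stored as text
snippets = {
    "NameError": 'print(saquence)',
    "SyntaxError": 'print("Hello"',
    "TypeError": '"Homo sapiens" / 2',
    "ValueError": 'int("one hundred grams")',
    "IndexError": '"H7"[10000]',
    "AttributeError": '"aucg".tongue',
    "ZeroDivisionError": '1 / 0',
}

caught = {}
for expected, snippet in snippets.items():
    try:
        exec(snippet)  # run the broken snippet
    except Exception as error:
        # Record which kind of error Python actually raised
        caught[expected] = type(error).__name__
        print(f"{snippet!r} raised {type(error).__name__}: {error}")
```

Running something like this once, and keeping the printout next to your notes, is one way to get familiar with the messages before you meet them by accident.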

The types of errors covered in the document are not all possible types of errors that can occur in python, but represent core types of errors that come up very often when coding. You can jump to specific errors or the Exercises using the links below:
  

[NameError](#NameError)  
[SyntaxError](#SyntaxError)  
[TypeError](#TypeError)  
[ValueError](#ValueError)  
[IndexError](#IndexError)  
[AttributeError](#AttributeError)   
[ZeroDivisionError](#ZeroDivisionError)  
[Exercises](#Exercises)  
[Further Reading](#Further-Reading)  


### **`NameError`**

`NameError`s are typically triggered when a variable that is referred to in your code doesn't exist. 

#### Misspelled variable names can produce `NameError`s
One common reason you might get a `NameError` is due to misspelling the name of a variable:

In [6]:
sequence = "ATGCAT"
print(saquence)

NameError: name 'saquence' is not defined

The error message specifies that the name 'saquence' is not defined. This means that the code refers to a variable (`saquence`) that doesn't exist. In this case, fixing the spelling should resolve the problem, because the variable `sequence` that we probably meant to refer to does exist:

In [7]:
sequence = "ATGCAT"
print(sequence)

ATGCAT


#### Incorrectly capitalized variable names can produce `NameErrors`

Python treats capitalized and uncapitalized versions of the same variable as totally different. So a variable called `dog` is totally different from `Dog` or `DOG`. If you defined a variable as lowercase, but accidentally refer to it using capital letters then it will produce a NameError:

In [46]:
input_dir = "./data"
print(f"Processing data in directory {Input_Dir}")

NameError: name 'Input_Dir' is not defined

We can fix this by making sure our capitalization is consistent.

In [47]:
input_dir = "./data"
print(f"Processing data in directory {input_dir}")

Processing data in directory ./data


In python, the standard is that all 'normal' variables and function names are lowercase, like our `input_dir` variable. 

The main exceptions to this rule are special global variables, which are typically all caps (`VERBOSE`), and custom classes, which are written with each word capitalized (`CodonUsage`). These conventions make it easier to tell what each variable is. They also make it easy to avoid `NameError`s due to incorrect capitalization, because you don't have to guess!
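
Here is a small sketch that puts the three conventions side by side. `VERBOSE` and `CodonUsage` are the names used above, but what they do here (and the `count_codons` function) is invented purely for illustration.

```python
VERBOSE = True  # special global variable: all caps


class CodonUsage:
    """Hypothetical custom class: each word capitalized."""

    def __init__(self, codon_counts):
        self.codon_counts = codon_counts


def count_codons(sequence):
    """Hypothetical 'normal' function: lowercase words joined by underscores."""
    codon_counts = {}
    # Step through the sequence three letters at a time
    for start in range(0, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        codon_counts[codon] = codon_counts.get(codon, 0) + 1
    return codon_counts


codon_usage = CodonUsage(count_codons("ATGCATATG"))
if VERBOSE:
    print(codon_usage.codon_counts)
```

Because each kind of name has its own style, you can usually tell at a glance whether `codon_usage` or `CodonUsage` is the one you meant.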

#### Using an external function without importing it can produce `NameErrors`

Another common cause of a `NameError` is trying to use a function from a library (such as `pandas`, `numpy`, or the standard library's `random`) without first importing it.

In [13]:
coin = ["heads","tails"]
flip = choice(coin)
print("The coin flip is:", flip)

NameError: name 'choice' is not defined


The `NameError` says `choice` is not defined because this function isn't one of the built-in functions that are always available in python. Instead it is part of the `random` library for working with pseudorandom numbers, and has to be imported before it can be used.

In [14]:
from random import choice
coin = ["heads","tails"]
flip = choice(coin)
print("The coin flip is:", flip)

The coin flip is: heads
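
An alternative style, sketched below, is to import the whole `random` library and then refer to the function as `random.choice`. The call to `random.seed` is optional and is my addition here: it fixes the sequence of pseudorandom numbers so that rerunning the cell gives the same flip.

```python
import random

random.seed(42)  # optional: makes the pseudorandom result repeatable
coin = ["heads", "tails"]
flip = random.choice(coin)  # the library name makes it clear where choice comes from
print("The coin flip is:", flip)
```

With this style, forgetting the import still produces a `NameError`, but this time about the name `random` rather than `choice`.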


### **`SyntaxError`**

`SyntaxError`s occur when the syntax or grammar of what you are trying to do can't be interpreted by python.  

#### **Mismatched parentheses** can cause `SyntaxError`s
One common syntax error is to forget to close a parenthesis:

In [19]:
print("Hello"

SyntaxError: unexpected EOF while parsing (<ipython-input-19-73b2b6fc1c24>, line 1)

The text of the `SyntaxError` might seem strange: it says `unexpected EOF while parsing`. EOF is short for 'end of file'. The end of the file (or more generally, the end of the Python code) was 'unexpected' because Python had found an open parenthesis `(` but reached the end of the code before finding its mate `)`. (Python 3.10 and later report this same mistake more directly, as `SyntaxError: '(' was never closed`.)

We can correct this error by adding the missing parenthesis:

In [157]:
print("Hello")

Hello


One sneaky thing about `SyntaxErrors` is that sometimes a mistake on one line will cause a `SyntaxError` on the *next* line. This is because the mistake on the prior line changes how python tries to interpret the later (perfectly good) line. Here's an example:

In [156]:
print("Hello"
print("No Bugs Here")

SyntaxError: invalid syntax (<ipython-input-156-8a028dc4a732>, line 2)

In order to fix this error, we can't focus only on the line that generated the error. Line 2 is actually fine; a mistake on line 1 is causing it to be misinterpreted. We fix this error in exactly the same way as before, by adding the missing parenthesis on line 1:

In [155]:
print("Hello")
print("No Bugs Here")

Hello
No Bugs Here

#### **Missing commas** in lists and other data structures can cause `SyntaxError`s

If you forget a comma when defining a list or tuple or dict of numbers, you will get a `SyntaxError` like the one below:

In [30]:
tumor_masses = [1,13 24]
print(tumor_masses)

SyntaxError: invalid syntax (<ipython-input-30-9f29d502336a>, line 1)

You can fix this by finding the spot where the comma is missing and adding it:

In [31]:
tumor_masses = [1,13,24]

#### Missing colons can cause `SyntaxError`s

Similar SyntaxErrors can arise if one omits the `:` in an `if` statement, `for` loop or function `def`inition.

In [38]:
for i,mass in enumerate(tumor_masses)
    print(f"The mass of tumor {i} is {mass}")

SyntaxError: invalid syntax (<ipython-input-38-dcab63ffe56b>, line 1)

Such errors can be corrected by adding the necessary colon (`:`):

In [39]:
for i,mass in enumerate(tumor_masses):
    print(f"The mass of tumor {i} is {mass}")

The mass of tumor 0 is 1
The mass of tumor 1 is 13
The mass of tumor 2 is 24


#### Including a space in a variable name can trigger a SyntaxError

You're probably getting the idea of what general categories of mistakes cause `SyntaxError`s. I'll highlight one last common source of such errors. Python variable names are only allowed to be one 'word' long: they cannot include spaces. In longer variable names, the words are separated by underscores (hold `Shift` and hit `-` to make `_`). Let's imagine that we accidentally forgot the underscore. We would get a `SyntaxError`:

In [41]:
generation time = 15.0

SyntaxError: invalid syntax (<ipython-input-41-797d50ac175b>, line 1)

We could fix this error by ensuring each variable is a single word:

In [43]:
generation_time = 15.0

### **`IndexError`**

Indices in python are used to select items in order from strings or lists. `IndexError`s indicate that the index you gave isn't present. Here's an example: let's say we have a string of text that is only 2 characters long. If we try to select character 10,000, python gives us an IndexError: 

In [85]:
id = "H7"
id[10000]

IndexError: string index out of range

The `IndexError` says our string index is out of range. To correct this, we need to use an index that is within the length of the string.

In [86]:
id = "H7"
id[0]

'H'

The example above was intentionally rather obvious, but this type of error most often happens because we were off by one. Since Python indices start at 0, the biggest index we can select is 1 smaller than the length of a string. So for instance, if a string is 2 characters long, the largest index we can select is index 1. If we try to figure out the highest index using the full length of a string, we will be off by one and get an `IndexError`:

In [88]:
sequence = "ATCGTGAGCGGCGGC"
seq_length = len(sequence)
last_nt = sequence[seq_length]

IndexError: string index out of range

We can fix this error by recognizing that the maximum index is the length of the sequence minus 1:

In [94]:
sequence = "ATCGTGAGCGGCGGC"
seq_length = len(sequence)
last_nt = sequence[seq_length - 1] 

As we discussed when we first learned about indexing into strings, it can help to imagine the indices as numbers that start from 0 and are written below and to the left of each character. So `common_name = "dog"` could be imagined as <sub>0</sub>`d` <sub>1</sub>`o` <sub>2</sub>`g`. Imagined this way, you can think of each index as selecting the character to its right. Negative indices work similarly, counting backwards from the right-hand side of the string or list. However, because indices select what is to the right of them, the first negative index is -1: by definition there is no item to the right of the last item in a string or list (where index -0 would otherwise go). Marking up our `"dog"` example with negative indices, we would get <sub>-3</sub>`d` <sub>-2</sub>`o` <sub>-1</sub>`g`. This is slightly tricky, because it means that the furthest you can count backwards is the full length of the sequence (`-3` for `"dog"`), whereas the largest positive index is the length of the sequence minus 1 (`2` for `"dog"`).

In [96]:
common_name = "dog"
common_name[-2] 

'o'

This doesn't raise an error, but if we were after the first letter, it gets us the wrong character. Using `common_name[-3]` correctly gets us the first letter. In most contexts we would access this using `common_name[0]` (forward indexing), but every now and again (for example, inside a `for` loop) it is important to be able to index in reverse order.

In [97]:
common_name = "dog"
common_name[-3] 

'd'

I find this hard to remember if I think about it mathematically, but a bit easier if I visualize or draw out the indices as shown above.
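
If drawing the indices by hand gets tedious, a few lines of code can print them for you. This is just a sketch; the variable names `length`, `negative_index` and `rows` are my own choices for the illustration.

```python
common_name = "dog"
length = len(common_name)

rows = []
for index in range(length):
    # The matching negative index is the positive index minus the length
    negative_index = index - length  # 0 -> -3, 1 -> -2, 2 -> -1
    rows.append((index, negative_index, common_name[index]))
    # Both indices select the same character
    print(index, negative_index, common_name[index], common_name[negative_index])
```

Every valid index for `"dog"` appears in this printout: 0 through 2 going forwards, and -3 through -1 going backwards. Anything outside those ranges raises an `IndexError`.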

### **`TypeError`**

`TypeError`s indicate that the type of variable you are using can never be used for the operation you were trying to perform. As a real world analogy, no matter what type of construction tool you have in your hand, it's probably a bad idea to eat it - construction tools are just not the type of thing you can eat. 

#### Trying to loop over a non-iterable object will produce a TypeError
Here's an example: in Python you cannot run a `for` loop over an integer. Attempting to do so will generate a `TypeError`, because an integer is not an *iterable* object, and therefore cannot be used in this way.

For instance, imagine we wanted to print x and y data to the screen before using them in a plot. If we tried to iterate over an integer, we'd get a `TypeError`:

In [57]:
print("x and y data:")
for x in 3:
    y = x**2
    print(x,y)

x and y data:


TypeError: 'int' object is not iterable

We can fix this by replacing our number with a list of numbers, or by using a function like `range` to generate all the numbers from 0 up to (but not including) our integer:

In [60]:
#One possible solution is to loop over a list of numbers
print("x and y data:")
for x in [0,1,2]:
    y = x**2
    print(x,y)

#A more elegant solution is to use range() to generate numbers to iterate over
print("x and y data:")   
for x in range(3):
    y = x**2
    print(x,y)
    


x and y data:
0 0
1 1
2 4
x and y data:
0 0
1 1
2 4


Both the above solutions avoid the TypeError. 

#### Trying to do unsupported math operations on strings produces `TypeError`s

In python, you cannot divide a string of text by some number - that doesn't really make any sense, and no matter what the text is and what the number is, that type of operation will just never work. 

This is obvious in some cases, like the one below:

In [64]:
scientific_name = "Homo sapiens"
genus = scientific_name / 2 #???
print(genus)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

There's no direct solution to this kind of `TypeError`, other than to change your approach. In the example above, it sort of looks like the coder was trying to split the binomial scientific name up into its genus ('Homo') and species ('sapiens') components. Since a division operator won't do that, we could use a more appropriate method, such as the `split` method of strings:

In [65]:
scientific_name = "Homo sapiens"
genus,species = scientific_name.split()
print(genus)

Homo


#### Numbers represented as strings can produce `TypeError`s when used in math operations

A more subtle variation of this type of error comes up when dealing with strings that *represent* numbers. In Python, the string `"10"` is not the same as the integer `10`. If you try to use the string as a number without first converting it to an `int` (integer) or `float` (decimal number), you will get a `TypeError`:

In [70]:
x = "10.0"
y = "5.6"

z = (x + y)/2.0

TypeError: unsupported operand type(s) for /: 'str' and 'float'

We can fix this error by converting both x and y into numbers before trying to use them in mathematical operations.

In [73]:
x = float("10.0")
y = float("5.6")

z = (x + y)/2.0

Obviously it would be silly to do all this, since we could just define x and y as floats initially (e.g. `x = 10.0`). The place this kind of `TypeError` commonly comes up in bioinformatics is when you are processing a results file (maybe from an external program) and forget to convert the text into numbers. Imagine that the `result_line` variable holds the raw text output from one line of a results file generated by another program.

It would be easy (at least for me!) to forget to convert the x and y results into numbers before using them in the final equation. If we did, we would get a `TypeError`:

In [77]:
result_line = "10.0,5.6"
x,y = result_line.split(",")
average_mass = (x + y)/2.0
print(f"Average mass: {average_mass}")

TypeError: unsupported operand type(s) for /: 'str' and 'float'

Luckily, the solution is simple - we just call float() on our two values before using them in math, just as we did in the previous example:

In [79]:
result_line = "10.0,5.6"
x,y = result_line.split(",")
average_mass = (float(x) + float(y))/2.0
print(f"Average mass: {average_mass}")

Average mass: 7.8
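
When you suspect this kind of problem, one way to check is to ask Python what type each value has, using the built-in `type` function. The sketch below does that, and then converts every field in one pass. The list-comprehension style and the names `fields`, `field_types` and `masses` are my own choices here, not the only way to do it.

```python
result_line = "10.0,5.6"
fields = result_line.split(",")

# type() reveals that split() gives back text, not numbers
field_types = [type(field).__name__ for field in fields]
print(field_types)

# Convert every field to a float before doing any math
masses = [float(field) for field in fields]
average_mass = sum(masses) / len(masses)
print(f"Average mass: {average_mass}")
```

Dividing by `len(masses)` rather than a hard-coded `2.0` means the same lines keep working if a results line contains more than two values.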


### **`ValueError`**

`ValueError`s are similar to `TypeError`s, but indicate that the operation you are trying to do with a particular variable isn't one that can be done for the *specific values of that variable*. Another way of saying this is that you've got the wrong content in your variable for what you want to do. An example would be trying to saw with a hammer. While it is possible to have a construction tool that you could saw with (hand saw, jigsaw, etc), a hammer just isn't the right construction tool for the particular operation of sawing.

#### Trying to convert non-numerical text to numbers produces a ValueError

If you try to convert the string "Pangolin" into an integer, you will get a `ValueError`. You get a `ValueError` rather than a `TypeError` because although *some* strings can be converted into numbers (e.g. "10"), this *particular* piece of text can't be: there is no sensible number that corresponds to a Pangolin. The same goes for numbers spelled out in words, as in the example below, because `int()` only understands text written with numerals.

In [81]:
mass = "one hundred grams"
mass = int(mass)

ValueError: invalid literal for int() with base 10: 'one hundred grams'

We can fix this by only converting text made up of numerals into numbers:

In [82]:
mass = "100"
mass = int(mass)
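
With real data you often can't control what text you are handed. One possible approach, sketched below, is to attempt the conversion and set aside anything that fails. This uses `try`/`except`, which catches the `ValueError` instead of letting it stop the program. The example values and the names `raw_masses` and `skipped` are invented for this illustration.

```python
raw_masses = ["100", "one hundred grams", "42.5"]

masses = []
skipped = []
for text in raw_masses:
    try:
        masses.append(float(text))
    except ValueError:
        # float() could not interpret this text as a number
        skipped.append(text)

print("Converted:", masses)
print("Could not convert:", skipped)
```

Keeping a list of what was skipped matters: quietly dropping bad values would turn a loud error into exactly the kind of silent bug that is hardest to find.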

#### Trying to 'unpack' single items produces a ValueError

If a Python variable contains a known number of items, you can unpack it using the syntax:

    data = (14,2)
    x,y = data

However, if you try to do this with the wrong number of variables, you get a `ValueError`:

In [170]:
 
scientific_name = "Escherichia coli"
genus,species = scientific_name

ValueError: too many values to unpack (expected 2)

You can fix this by ensuring you have the right number of items. Unpacking a string directly tries to hand out its individual characters, and "Escherichia coli" has far more than 2 characters. Here it looks like we are trying to break the scientific name for *E. coli* up into a genus and species name. We could use the `split` method to do this, and then assign the items of the list produced by splitting on white space to a genus and species variable:

In [171]:
scientific_name = "Escherichia coli"
genus,species = scientific_name.split()

A related issue arises if you write code expecting your input data to look only a certain way (e.g. a binomial species name), but your actual data looks a bit different. For example, maybe some scientific names have extra modifiers to designate the strain of bacteria. These types of rare but valid data, which frequently break code, are called [corner cases](https://en.wikipedia.org/wiki/Corner_case).

In such cases you have to consider what you are trying to accomplish, and see if you can find a way to write the code so that these corner cases are handled correctly. For instance, if every scientific name had a genus and species name, but some had extra strain info, we could resolve the issue by explicitly taking only the first two fields resulting from splitting the scientific name:

In [178]:
scientific_name = "Escherichia coli O157:H7"
fields = scientific_name.split(maxsplit = 2)
genus = fields[0]
species = fields[1]
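
Another option, sketched here as an alternative rather than a recommendation, is 'star' unpacking: a variable marked with `*` collects whatever items are left over, so the same line works whether or not strain info is present. The names `strain_parts`, `strain` and `parsed` are hypothetical.

```python
scientific_names = ["Homo sapiens", "Escherichia coli O157:H7"]

parsed = []
for scientific_name in scientific_names:
    # strain_parts becomes a (possibly empty) list of any extra fields
    genus, species, *strain_parts = scientific_name.split()
    strain = " ".join(strain_parts)
    parsed.append((genus, species, strain))
    print(genus, species, repr(strain))
```

Note that this still assumes at least two fields: a name consisting of a single word would raise a `ValueError` again.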

### **`AttributeError`**

Objects in python have various attributes associated with them that can be accessed using the name of an object, a period `.` and then the name of the attribute. For example, when we want to convert a string to uppercase, we access the string's `upper` method as an attribute, and call it:

In [99]:
rna_seq = "aucgcugcuagcugggcuagcuagc"
rna_seq = rna_seq.upper()
print(rna_seq)

AUCGCUGCUAGCUGGGCUAGCUAGC


What happens if you ask for an attribute that doesn't exist? In such cases an `AttributeError` is raised.

In [100]:
rna_seq.tongue

AttributeError: 'str' object has no attribute 'tongue'

The above example is silly, but `AttributeError`s are fairly common. The two most common reasons for getting them are that you misspelled the attribute name (e.g. try `rna_seq = rna_seq.uppper()`) or that the object whose attribute you are trying to access is of a different type than you expected. Here is an example of the second case:

In [106]:
temp = "-10.0"
temp.is_integer()

AttributeError: 'str' object has no attribute 'is_integer'

You can solve such issues by either modifying the object to the type you expected, or by changing which attribute you are trying to access.

In [105]:
temp = "-10.0"
temp = float(temp)
temp.is_integer()

True
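
If you aren't sure which attributes an object has, you can ask Python directly. The sketch below uses the built-in functions `hasattr` (does this object have an attribute with this name?) and `dir` (list all attribute names). The variable names are my own, chosen for the illustration.

```python
temp_text = "-10.0"
temp_number = float(temp_text)

# The string lacks is_integer, but the float made from it has it
text_has_method = hasattr(temp_text, "is_integer")
number_has_method = hasattr(temp_number, "is_integer")
print(type(temp_text).__name__, text_has_method)
print(type(temp_number).__name__, number_has_method)

# dir() lists every attribute; names starting with "_" are filtered out here
string_methods = [name for name in dir(temp_text) if not name.startswith("_")]
print(string_methods[:5])
```

Scanning the output of `dir` is also a quick way to catch a misspelled attribute name, since the correct spelling will be in the list.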

### **`ZeroDivisionError`**

The result of dividing a number by 0 is undefined. As you might guess, a `ZeroDivisionError` arises if you try to divide a number by 0. Most often you didn't do this intentionally - instead you somewhere divided one variable by another, and in a particular set of results, the number in the denominator happened to be 0.

Here's an example:


In [115]:
sequences = ['AAATCG','CCCGTA','TTTTTAA','AAGGAGAGAGGGGA']
for seq in sequences:
    purine_count = seq.count("A") + seq.count("G")
    pyrimidine_count = seq.count("C")+seq.count("T")
    purine_pyrimidine_ratio = purine_count/pyrimidine_count
    print(f"The purine-pyrimidine ratio of sequence {seq} is {purine_pyrimidine_ratio}")

The purine-pyrimidine ratio of sequence AAATCG is 2.0
The purine-pyrimidine ratio of sequence CCCGTA is 0.5
The purine-pyrimidine ratio of sequence TTTTTAA is 0.4


ZeroDivisionError: division by zero

There are several ways to handle such zero division errors, and which is best depends on the particular application. In this case, possible solutions include raising a more specific error message for users, skipping sequences for which a purine-pyrimidine ratio can't be calculated, or adding a small 'pseudocount' (typically 1) to the count of both purines and pyrimidines. The last option, shown below, is typically only used if the sequences you are studying are quite long, so that the pseudocount doesn't strongly influence the results in a typical case.

In [117]:
sequences = ['AAATCG','CCCGTA','TTTTTAA','AAGGAGAGAGGGGA']
pseudocount = 1
for seq in sequences:
    purine_count = seq.count("A") + seq.count("G") + pseudocount
    pyrimidine_count = seq.count("C")+seq.count("T") + pseudocount
    purine_pyrimidine_ratio = purine_count/pyrimidine_count
    print(f"The purine-pyrimidine ratio of sequence {seq} is {purine_pyrimidine_ratio}")

The purine-pyrimidine ratio of sequence AAATCG is 1.6666666666666667
The purine-pyrimidine ratio of sequence CCCGTA is 0.6
The purine-pyrimidine ratio of sequence TTTTTAA is 0.5
The purine-pyrimidine ratio of sequence AAGGAGAGAGGGGA is 15.0
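
The pseudocount is only one of the options mentioned above. As a rough sketch of the 'skip' option, we can check the denominator before dividing and move on to the next sequence if it is 0. The `ratios` dictionary is my own addition, used to collect the results that could be calculated.

```python
sequences = ['AAATCG', 'CCCGTA', 'TTTTTAA', 'AAGGAGAGAGGGGA']

ratios = {}
for seq in sequences:
    purine_count = seq.count("A") + seq.count("G")
    pyrimidine_count = seq.count("C") + seq.count("T")
    if pyrimidine_count == 0:
        # The ratio is undefined, so report the problem and skip this sequence
        print(f"Skipping {seq}: it contains no pyrimidines")
        continue
    ratios[seq] = purine_count / pyrimidine_count
    print(f"The purine-pyrimidine ratio of sequence {seq} is {ratios[seq]}")
```

Unlike the pseudocount, this leaves the ratios of the other sequences unchanged, at the cost of producing no value at all for the skipped one.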


## Exercises

For each of the following questions, adjust the code so that it accomplishes its intended purpose (as noted in the comments) but no longer produces an error. You may have to use your judgement about what that purpose is, so feel free to solve the problem in a way that makes the most practical sense to you, recognizing that there is typically more than one solution.

### Exercise 1  - Debug code for calculating GC content

Debug the following code for calculating GC content:

In [119]:
# Calculate the frequency of G & C nucleotides in a sequence

seq = "ATATGCTACTACTCGGCTACG"
gc_content = seq.count(G) + seq.count(C)/len(seq)

NameError: name 'G' is not defined

### Exercise 2 - Debug a random bird generator

Debug the following code for generating random bird common names. 

In [147]:
from random import choice
n_birds_to_generate = 10

descriptors = ["Emperor","Red-breasted","Warbling","Vampire","Night",\
               "Sea","Greater","Pond","Jungle","Barn","Drab","Lesser","Spotted",\
               "Northern","Southern","Long-beaked","Crested","Fairy","Bald"]

bird_types = ["Falcon","Merganser","Owl","Eagle","Hawk","Penguin",\
             "Dodo","Gull","Warbler","Fowl","Goose","Hummingbird","Snowcock"]

random_birds = []

for i in range(n_birds_to_generate)
    descriptor = choice(descriptors)

    bird_type = choice(bird_types)

    random_bird = f"{descriptor} {bird_type}"
    random_birds.append(random_bird)
    

SyntaxError: invalid syntax (<ipython-input-147-696f7c200593>, line 13)

### Exercise 3 - Debug code for simulating Mendelian Inheritance

Debug the following code. Note that there is more than one mistake with the code that you will need to fix.

In [181]:
#This code simulates Mendelian inheritance
#Each parent has two alleles or genetic variants: A and a

#Each gamete (sperm or egg) gets one random allele from
#the parent that produced that gamete

#The offspring genotype is a combination of these

from random import choice

maternal_alleles = ["A","a"]
paternal_alleles = ["A","a"]

egg_allele = choice(maternal_alleles
sperm_allele = choice(paternal_alleles)

offspring_genotype = sorted(egg_allele + sperm_alele)
print(f"The genotype of the offspring is {offspring_genotype}")

SyntaxError: invalid syntax (<ipython-input-181-eb0efb489b6d>, line 15)

### Exercise 4 - Debug broken code for outputting genome analysis results

Debug the following code that merges the results of a genomic analysis into a tab-delimited output line:

In [146]:
#Imagine we'd calculated several parameters for the genome
header_fields = ["Genus","Species","Strain","Chromosome Type","Genome Length","Coding Regions","GC content"]
header_line = "\t".join(header_fields)+"\n"
print(header_line)


#Most commonly, this type of code would be inside a for loop
#where we were analyzing many genomes, and generating one line of 
#results per genome analyzed.
#(here I just hard-code the results for simplicity). 

gc_content = 57.0
genome_length_nt = 4195195
chromosome_type = "circular"
coding_regions = 4276

#Get the genus name and species name from the full strain id
strain_id = "Bacillus subtilis SZMC 6179J"
genus,species,strain_id_part1,strain_id_part2 = strain_id.split()

result_fields = [genus,species,strain_id,chromosome_type,genome_length_nt,coding_regions,gc_content]
result_line = "\t".join(result_fields) + "\n"
print(result_line)

#We would go on to open a results file
#and write the results to it.

Genus	Species	Strain	Chromosome Type	Genome Length	Coding Regions	GC content



TypeError: sequence item 4: expected str instance, int found

### Exercise 5 - Debug code for mapping scientific names into common taxon names

**Hint:** there are 3 mistakes with the code that you'll need to fix. The first two are straightforward. Fix them first, and you will encounter the 3rd bug, which is a bit more subtle. 


In [179]:

#Look up an informal, non-scientific common name for each species
species = ["Homo sapiens","Gallus gallus","Bacillus thuringiensis",,\
           "Bacillus subtilis SZMC 6179J","Porites asteroides","Acropora palmata"]

common_name_map = {"Homo":"Mammal","Gallus","Bird","Bacillus":"Bacterium",\
                   "Porites";"Stony coral","Acroproa":"Stony coral"}

for binomial_name in species:
    genus,species = binomial_name.split()
    common_taxon_name = common_name_map[genus]
    print(f"{binomial_name} is a {common_taxon_name}")
    

SyntaxError: invalid syntax (<ipython-input-179-7176d3ae0171>, line 3)

## Further Reading

The PyLearn project has a useful [Error Encyclopedia](https://cs.carleton.edu/cs_comps/1213/pylearn/final_results/encyclopedia/) with detailed descriptions of common errors.

[Top of Page](#Learning-From-Errors-in-Python)  
[NameError](#NameError)  
[SyntaxError](#SyntaxError)  
[TypeError](#TypeError)  
[ValueError](#ValueError)  
[IndexError](#IndexError)  
[AttributeError](#AttributeError)   
[ZeroDivisionError](#ZeroDivisionError)   
[Exercises](#Exercises)  
[Further Reading](#Further-Reading)  


## Feedback and Reading Responses

You can submit feedback and reading responses to the chapter [here](https://docs.google.com/forms/d/e/1FAIpQLSeUQPI_JbyKcX1juAFLt5z1CLzC2vTqaCYySUAYCNElNwZqqQ/viewform?usp=pp_url&entry.2118603224=Error+Messages+in+Python).