By Lenin Mishra in python, Jan 22, 2022
Handle LookupError exceptions in Python using a try-except block.
The LookupError exception in Python is the base class for all exceptions that are raised when an index or a key is not found in a sequence or dictionary, respectively.
You can use the LookupError exception class to handle both the IndexError and KeyError exception classes.
- LookupError
--> IndexError
--> KeyError
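The hierarchy above can be checked with the built-in issubclass() function. The following is a minimal sketch; lookup_or_default is a hypothetical helper invented for illustration, not part of the standard library.

```python
# Both concrete lookup exceptions derive from LookupError.
print(issubclass(IndexError, LookupError))  # True
print(issubclass(KeyError, LookupError))    # True


def lookup_or_default(container, key, default=None):
    """Hypothetical helper: return container[key], or default if it is missing."""
    try:
        return container[key]
    except LookupError:
        # Catches IndexError (sequences) and KeyError (mappings) alike.
        return default


print(lookup_or_default([1, 2, 3], 10, default=0))        # 0
print(lookup_or_default({'name': 'Pylenin'}, 'name'))     # Pylenin
```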
Example 1 — Handling IndexError exception
Code/Output
# lists
x = [1, 2, 3, 4]
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> list index out of range, <class 'IndexError'>
# strings
x = "Pylenin"
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> string index out of range, <class 'IndexError'>
# tuples
x = (1, 2, 3, 4)
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> tuple index out of range, <class 'IndexError'>
As you can see, it is possible to catch IndexError exceptions using the LookupError exception class. The e.__class__ attribute also helps you identify the specific type of LookupError. In the above examples, it is an IndexError.
Example 2 — Handling KeyError exception
Code
pylenin_info = {'name': 'Lenin Mishra',
                'age': 28,
                'language': 'Python'}
user_input = input('What do you want to learn about Pylenin==> ')
try:
    print(f'{user_input} is {pylenin_info[user_input]}')
except LookupError as e:
    print(f'{e}, {e.__class__}')
Output
What do you want to learn about Pylenin==> wife
'wife', <class 'KeyError'>
Summary: in this tutorial, you'll learn how to handle exceptions in Python the right way by using the try statement.
Introduction to exception handling in Python
To handle exceptions, you use the try statement. The try statement has the following clauses:
try:
    # code that you want to protect from exceptions
except <ExceptionType> as ex:
    # code that handles the exception
else:
    # code that executes if the try clause completes normally
    # (an except clause must be present)
finally:
    # code that always executes, whether or not an exception occurred
Let’s examine the try statement in greater detail.
try
In the try clause, you place the code that may raise one or more exceptions. It's a good practice to keep this code as short as possible. Often, you'll have a single statement in the try clause.
The try clause appears exactly one time in the try statement.
except
In the except clause, you place the code that handles a specific exception type. A try statement can have zero or more except clauses. Typically, each except clause handles different exception types in specific ways.
In an except clause, the as ex is optional. And the <ExceptionType> is also optional. However, if you omit the <ExceptionType> as ex, you’ll have a bare exception handler.
When specifying exception types in the except clauses, you place the most specific to least specific exceptions from top to bottom.
If you have the same logic that handles different exception types, you can group them in a single except clause. For example:
try:
    ...
except <ExceptionType1> as ex:
    log(ex)
except <ExceptionType2> as ex:
    log(ex)
becomes
try:
    ...
except (<ExceptionType1>, <ExceptionType2>) as ex:
    log(ex)
It’s important to note that the except order matters because Python will run the first except clause whose exception type matches the occurred exception.
finally
The finally clause may appear zero or one time in a try statement. The finally clause always executes whether an exception occurred or not.
else
The else clause also appears zero or one time. The else clause is only valid if the try statement has at least one except clause.
Typically, you place in the else clause the code that should execute only if the try clause terminates normally.
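To see which clauses run in each case, here is a minimal sketch. The parse_int function is a hypothetical helper invented for illustration; it records the clauses that execute for a valid and an invalid input.

```python
def parse_int(text):
    """Hypothetical helper: convert text to int and record which clauses ran."""
    steps = []
    try:
        value = int(text)
    except ValueError:
        steps.append('except')
        value = None
    else:
        # Runs only when the try clause raised no exception.
        steps.append('else')
    finally:
        # Runs in both cases.
        steps.append('finally')
    return value, steps


print(parse_int('42'))   # (42, ['else', 'finally'])
print(parse_int('abc'))  # (None, ['except', 'finally'])
```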
The following defines a function that returns the result of dividing one number by another:
def divide(a, b):
    return a / b
If you pass 0 to the second argument, you’ll get a ZeroDivisionError exception:
divide(10, 0)
Error:
ZeroDivisionError: division by zero
To fix it, you can handle the ZeroDivisionError exception in the divide() function as follows:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        return None
In this example, the divide() function returns None if a ZeroDivisionError occurs.
When using the divide() function, you need to check if the result is None:
result = divide(10, 0)
if result is not None:
    print('result:', result)
else:
    print('Invalid inputs')
But returning None may not be the best because others may accidentally evaluate the result in the if statement like this:
result = divide(10, 0)
if result:
    print('result:', result)
else:
    print('Invalid inputs')
In this case, it works. However, it won't work if the first argument is zero: divide(0, 10) returns 0.0, which evaluates to False, so the code reports invalid inputs even though the division succeeded. For example:
result = divide(0, 10)
if result:
    print('result:', result)
else:
    print('Invalid inputs')
A better approach is to raise an exception to the caller if the ZeroDivisionError exception occurred. For example:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        raise ValueError('The second argument (b) must not be zero')
In this example, the divide() function will raise an error if b is zero. To use the divide() function, you need to catch the ValueError exception:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        raise ValueError('The second argument (b) must not be zero')

try:
    result = divide(10, 0)
except ValueError as e:
    print(e)
else:
    print('result:', result)
Output:
The second argument (b) must not be zero
It’s a good practice to raise an exception instead of returning None in special cases.
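When translating one exception into another, Python's raise ... from syntax keeps the original error attached as __cause__, which helps with debugging. The sketch below is a variation on the divide() function above, not the tutorial's own code; the cause_type variable is introduced only for illustration.

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        # "from ex" records the original exception as __cause__.
        raise ValueError('The second argument (b) must not be zero') from ex


try:
    divide(10, 0)
except ValueError as e:
    cause_type = type(e.__cause__)
    print(e)           # The second argument (b) must not be zero
    print(cause_type)  # <class 'ZeroDivisionError'>
```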
Except order example
When you catch an exception in the except clause, you need to place the exceptions from most specific to the least specific in terms of exception hierarchy.
Consider three exception classes: Exception, LookupError, and IndexError. IndexError is a subclass of LookupError, which in turn is a subclass of Exception.
If you catch these exceptions, you need to place them in the following order: IndexError, LookupError, and Exception.
For example, the following defines a list of three strings and attempts to access the 4th element:
colors = ['red', 'green', 'blue']
try:
    print(colors[3])
except IndexError as e:
    print(type(e), 'Index error')
except LookupError as e:
    print(type(e), 'Lookup error')
It produces the following output:
<class 'IndexError'> Index error
The colors[3] access causes an IndexError exception. However, if you swap the except clauses and catch the LookupError first and the IndexError second like this:
colors = ['red', 'green', 'blue']
try:
    print(colors[3])
except LookupError as e:
    print(type(e), 'Lookup error')
except IndexError as e:
    print(type(e), 'Index error')
Output:
<class 'IndexError'> Lookup error
The exception is still an IndexError, but the printed message is misleading.
Bare exception handlers
When you want to catch any exception, you can use the bare exception handlers. A bare exception handler does not specify an exception type:
try:
    ...
except:
    ...
It's equivalent to the following:
try:
    ...
except BaseException:
    ...
A bare exception handler will catch any exception, including the SystemExit and KeyboardInterrupt exceptions.
A bare exception handler makes it harder to interrupt a program with Control-C and can disguise other problems.
If you want to catch all exceptions that signal program errors, you can use except Exception instead:
try:
    ...
except Exception:
    ...
In practice, you should avoid using bare exception handlers. If you don't know which exceptions to catch, just let the exception occur and then modify the code to handle it.
To get exception information from a bare exception handler, you use the exc_info() function from the sys module.
The sys.exc_info() function returns a tuple that consists of three values:
- type is the type of the exception that occurred. It's a subclass of BaseException.
- value is the instance of the exception type.
- traceback is an object that encapsulates the call stack at the point where the exception originally occurred.
The following example uses the sys.exc_info() function to examine the exception when a string is divided by a number:
import sys

try:
    '20' / 2
except:
    exc_info = sys.exc_info()
    print(exc_info)
Output:
(<class 'TypeError'>, TypeError("unsupported operand type(s) for /: 'str' and 'int'"), <traceback object at 0x000001F19F42E700>)
The output shows that the code in the try clause causes a TypeError exception. Therefore, you can modify the code to handle it specifically as follows:
try:
    '20' / 2
except TypeError as e:
    print(e)
Output:
unsupported operand type(s) for /: 'str' and 'int'
Summary
- Use the try statement to handle exceptions.
- Place only the minimal code that you want to protect from potential exceptions in the try clause.
- Handle exceptions from most specific to least specific in terms of exception types. The order of except clauses is important.
- The finally clause always executes whether an exception occurred or not.
- The else clause only executes when the try clause terminates normally. The else clause is valid only if the try statement has at least one except clause.
- Avoid using bare exception handlers.
The following exceptions are the ones that are usually raised while a program is running.
Contents:
- StopIteration exception
- StopAsyncIteration exception
- ArithmeticError exception
- AssertionError exception
- AttributeError exception
- BufferError exception
- EOFError exception
- ImportError exception
- ModuleNotFoundError exception
- LookupError exception
- IndexError exception
- KeyError exception
- MemoryError exception
- NameError exception
- UnboundLocalError exception
- OSError exception
- ReferenceError exception
- RuntimeError exception
- NotImplementedError exception
- RecursionError exception
- SyntaxError exception
- IndentationError exception
- TabError exception
- SystemError exception
- TypeError exception
- ValueError exception
- UnicodeError exception
- EnvironmentError exception
- IOError exception
- WindowsError exception
StopIteration:
The StopIteration exception is raised by the built-in function next() and by an iterator's __next__() method to signal that the iterator produces no further items.
The exception object has a single attribute, value, which is given as an argument when constructing the exception and defaults to None.
When a generator or coroutine function returns, a new StopIteration instance is raised, and the value returned by the function is used as the value parameter of the exception's constructor.
If generator code directly or indirectly raises StopIteration, it is converted into a RuntimeError that retains the StopIteration as the new exception's cause.
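Both behaviors described above can be observed with two small generators. This is a minimal sketch assuming Python 3.7 or later, where the conversion to RuntimeError is always enabled; broken_gen and gen_with_return are hypothetical names invented for illustration.

```python
def broken_gen():
    yield 1
    # Raising StopIteration inside a generator body is converted
    # into RuntimeError by the interpreter.
    raise StopIteration


gen = broken_gen()
print(next(gen))  # 1
try:
    next(gen)
except RuntimeError as err:
    cause = err.__cause__
    print(type(cause).__name__)  # StopIteration


def gen_with_return():
    yield 1
    return 'done'


g = gen_with_return()
next(g)
try:
    next(g)
except StopIteration as stop:
    # The generator's return value travels in the "value" attribute.
    returned = stop.value
    print(returned)  # done
```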
StopAsyncIteration:
The StopAsyncIteration exception is raised by the __anext__() method of an asynchronous iterator object to stop the iteration.
ArithmeticError:
AssertionError:
The AssertionError exception is raised when an assert statement fails.
AttributeError:
The AttributeError exception is raised when an attribute reference or assignment fails. If an object does not support attribute references or attribute assignments at all, TypeError is raised.
BufferError:
The BufferError exception is raised when a buffer-related operation cannot be performed.
EOFError:
The EOFError exception is raised when the input() function hits an end-of-file condition without reading any data. Note that the io.IOBase.read() and io.IOBase.readline() methods return an empty string when they hit EOF.
ImportError:
The ImportError exception is raised when the import statement has trouble loading a module. It is also raised when the "from list" in a from ... import construct contains a name that cannot be found.
The name and path attributes can be set using keyword-only arguments to the constructor. When set, they represent the name of the module that was attempted to be imported and the path to any file that triggered the exception, respectively.
- ModuleNotFoundError: The ModuleNotFoundError exception is a subclass of ImportError that is raised by the import statement when a module cannot be located. It is also raised when None is found in sys.modules.
LookupError:
The LookupError exception is the base class for the exceptions raised when a key or index used on a mapping or sequence is invalid: IndexError and KeyError. LookupError can also be raised directly by codecs.lookup().
- IndexError: The IndexError exception is raised when a sequence index is out of range. Slice indices are silently truncated to fall within the allowed range. If an index is not an integer, TypeError is raised.
- KeyError: The KeyError exception is raised when a mapping (dictionary) key is not found in the set of existing keys.
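The cases described above can be compared side by side. The following is a small sketch; classify is a hypothetical helper invented for illustration, and "no-such-codec" is an arbitrary name chosen because no such codec is registered.

```python
import codecs

items = [1, 2, 3]

# Slice indices are clamped silently, so no exception is raised here.
print(items[1:100])  # [2, 3]


def classify(action):
    """Hypothetical helper: call action() and name the exception it raises."""
    try:
        action()
    except LookupError as e:
        return type(e).__name__
    except TypeError:
        return 'TypeError'
    return None


print(classify(lambda: items[10]))                      # IndexError
print(classify(lambda: {'a': 1}['b']))                  # KeyError
print(classify(lambda: items['1']))                     # TypeError
print(classify(lambda: codecs.lookup('no-such-codec')))  # LookupError
```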
MemoryError:
The MemoryError exception is raised when an operation runs out of memory but the situation may still be rescued by deleting some objects. The associated value is a string indicating which internal operation ran out of memory. Note that because of the underlying memory management architecture, the interpreter may not always be able to recover completely from this situation. It nevertheless raises an exception so that a stack traceback can be printed.
NameError:
The NameError exception is raised when a local or global name is not found. The associated value is an error message that includes the name that could not be found.
- UnboundLocalError: The UnboundLocalError exception is raised when a reference is made to a local variable in a function or method, but no value has been bound to that variable. It is a subclass of NameError.
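A common way to trigger UnboundLocalError is to assign to a name inside a function after reading it. The sketch below uses hypothetical names (counter, increment) and catches the error through its parent class, NameError.

```python
counter = 0


def increment():
    # The assignment makes "counter" a local name for the whole function,
    # so reading it on the right-hand side fails before any value is bound.
    counter = counter + 1
    return counter


try:
    increment()
except NameError as e:
    error_type = type(e)
    print(error_type.__name__)  # UnboundLocalError
```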
OSError:
ReferenceError:
The ReferenceError exception is raised when a weak reference proxy, created by the weakref.proxy() function, is used to access an attribute of the referent after it has been garbage collected.
RuntimeError:
The RuntimeError exception is raised when an error is detected that doesn't fall into any of the other categories. The associated value is a string indicating what exactly went wrong.
- NotImplementedError: The NotImplementedError exception is derived from RuntimeError. In user-defined base classes, abstract methods should raise this exception when they require derived classes to override the method, or while the class is being developed to indicate that the real implementation still needs to be added. Notes:
  - It should not be used to indicate that an operator or method is not meant to be supported at all. In that case, either leave the operator/method undefined or set it to None.
  - NotImplementedError and NotImplemented are not interchangeable, even though they have similar names and purposes. See NotImplemented for details on when to use it.
- RecursionError: The RecursionError exception is derived from RuntimeError. It is raised when the interpreter detects that the maximum recursion depth (see sys.getrecursionlimit()) is exceeded.
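The difference between NotImplementedError and NotImplemented mentioned in the notes above can be sketched as follows. Shape and Square are hypothetical classes invented for illustration: the exception marks a method that subclasses must override, while the NotImplemented constant is returned from a comparison method to say "I don't know how to compare with this type".

```python
class Shape:
    def area(self):
        # Raised (an exception): subclasses are expected to override area().
        raise NotImplementedError('subclasses must implement area()')


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

    def __eq__(self, other):
        if not isinstance(other, Square):
            # Returned (a constant, not an exception): lets Python try
            # the other operand's comparison instead.
            return NotImplemented
        return self.side == other.side


print(Square(3).area())          # 9
print(Square(3) == Square(3))    # True
print(Square(3) == 'square')     # False

try:
    Shape().area()
except NotImplementedError as e:
    print(e)  # subclasses must implement area()
```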
SyntaxError:
The SyntaxError exception is raised when the parser encounters a syntax error. This can happen in an import statement, in a call to the built-in functions exec() or eval(), or when reading the initial script or standard input (including in interactive mode).
Instances of this class have the attributes filename, lineno, offset, and text for easier access to the details. Calling str() on the exception instance returns only the message.
- IndentationError: The IndentationError exception is the base class for syntax errors related to incorrect indentation. It is a subclass of SyntaxError.
- TabError: The TabError exception is raised when indentation contains an inconsistent use of tabs and spaces. It is a subclass of IndentationError.
SystemError:
The SystemError exception is raised when the interpreter finds an internal error, but the situation does not look so serious as to cause it to abandon all hope. The associated value is a string indicating what went wrong (in low-level terms).
TypeError:
The TypeError exception is raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch.
The TypeError exception may be raised by user code to indicate that an attempted operation on an object is not supported and is not meant to be. If an object is meant to support a given operation but has not yet provided an implementation, raise NotImplementedError instead.
Passing arguments of the wrong type, for example passing a list when an integer is expected, should result in a TypeError, but passing arguments with the wrong value, for example a number outside the expected bounds, should result in a ValueError.
ValueError:
The ValueError exception is raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
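The TypeError versus ValueError guideline above can be sketched with a small validator. The set_percentage function and check helper are hypothetical, invented for illustration.

```python
def set_percentage(value):
    """Hypothetical validator: accept a number between 0 and 100."""
    if not isinstance(value, (int, float)):
        # Wrong type of argument.
        raise TypeError(f'expected a number, got {type(value).__name__}')
    if not 0 <= value <= 100:
        # Right type, but the value is out of bounds.
        raise ValueError(f'percentage must be between 0 and 100, got {value}')
    return value


def check(value):
    """Hypothetical helper: report which exception, if any, is raised."""
    try:
        set_percentage(value)
    except (TypeError, ValueError) as e:
        return type(e).__name__
    return 'ok'


print(check(50))     # ok
print(check([50]))   # TypeError
print(check(150))    # ValueError
```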
UnicodeError:
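EnvironmentError:
An alias of OSError, kept for compatibility with earlier versions.
IOError:
An alias of OSError, kept for compatibility with earlier versions.
WindowsError:
Available only on Windows, where it is also an alias of OSError.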
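In Python 3.3 and later, the older names EnvironmentError and IOError refer to the same class object as OSError, so a handler written for any of them catches the same errors. A minimal sketch, using a deliberately nonexistent relative path chosen for illustration:

```python
print(IOError is OSError)           # True
print(EnvironmentError is OSError)  # True

try:
    # Hypothetical path that is assumed not to exist.
    open('no-such-directory/no-such-file.txt')
except IOError as e:
    caught = type(e)
    print(caught.__name__)              # FileNotFoundError
    print(issubclass(caught, OSError))  # True
```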
- Exceptions are error scenarios that alter the normal execution flow of the program.
- The process of taking care of the possible exceptions is called exception handling.
- If exceptions are not handled properly, the program may terminate prematurely. It can cause data corruption or unwanted results.
- Python exception handling is achieved by three keyword blocks – try, except, and finally.
- The try block contains the code that may raise exceptions or errors.
- The except block is used to catch the exceptions and handle them.
- The except block code is executed only when the corresponding exception is raised.
- There can be multiple except blocks. We can also catch multiple exceptions in a single except block.
- The finally block code is always executed, whether the program executed properly or it raised an exception.
- We can also create an “else” block with try-except block. The code inside the else block is executed if there are no exceptions raised.
How to Handle Exceptions in Python?
Let’s look at an example where we need exception handling.
def divide(x, y):
    print(f'{x}/{y} is {x / y}')

divide(10, 2)
divide(10, 0)
divide(10, 4)
If we run the above program, we get the following output.
10/2 is 5.0
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/PythonTutorialPro/hello-world/exception_handling.py", line 6, in <module>
    divide(10, 0)
  File "/Users/pankaj/Documents/PycharmProjects/PythonTutorialPro/hello-world/exception_handling.py", line 2, in divide
    print(f'{x}/{y} is {x / y}')
ZeroDivisionError: division by zero
The second call to the divide() function raised ZeroDivisionError exception and the program terminated.
We never got the output of the third call to the divide() function because we didn't do any exception handling in our code.
Let's rewrite the divide() function with proper exception handling. If someone tries to divide by 0, we will catch the exception and print an error message. This way the program will not terminate prematurely and the output will make more sense.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)

divide(10, 2)
divide(10, 0)
divide(10, 4)
Output:
10/2 is 5.0
division by zero
10/4 is 2.5
What is BaseException Class?
The BaseException class is the base class of all the exceptions. It has four direct subclasses (Python 3.11 adds a fifth, BaseExceptionGroup).
- Exception – this is the base class for all non-exit exceptions.
- GeneratorExit – Request that a generator exit.
- KeyboardInterrupt – Program interrupted by the user.
- SystemExit – Request to exit from the interpreter.
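One practical consequence of this hierarchy is that an except Exception clause does not catch SystemExit, KeyboardInterrupt, or GeneratorExit. The sketch below demonstrates this with sys.exit(); call_exit is a hypothetical function invented for illustration.

```python
import sys


def call_exit():
    """Hypothetical function: show which handler catches SystemExit."""
    try:
        sys.exit(1)
    except Exception:
        return 'caught by except Exception'
    except BaseException as e:
        return f'caught by except BaseException: {type(e).__name__}'


print(call_exit())  # caught by except BaseException: SystemExit
print(issubclass(SystemExit, Exception))      # False
print(issubclass(SystemExit, BaseException))  # True
```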
Some Built-In Exception Classes
Some of the built-in exception classes in Python are:
- ArithmeticError – this is the base class for arithmetic errors.
- AssertionError – raised when an assertion fails.
- AttributeError – when the attribute is not found.
- BufferError
- EOFError – reading after end of file
- ImportError – when the imported module is not found.
- LookupError – base exception for lookup errors.
- MemoryError – when out of memory occurs
- NameError – when a name is not found globally.
- OSError – base class for I/O errors
- ReferenceError
- RuntimeError
- StopIteration, StopAsyncIteration
- SyntaxError – invalid syntax
- SystemError – internal error in the Python Interpreter.
- TypeError – invalid argument type
- ValueError – invalid argument value
Some Built-In Warning Classes
The Warning class is the base class for all the warnings. It has the following sub-classes.
- BytesWarning – bytes, and buffer related warnings, mostly related to string conversion and comparison.
- DeprecationWarning – warning about deprecated features
- FutureWarning – base class for warning about constructs that will change semantically in the future.
- ImportWarning – warning about mistakes in module imports
- PendingDeprecationWarning – warning about features that will be deprecated in future.
- ResourceWarning – resource usage warnings
- RuntimeWarning – warnings about dubious runtime behavior.
- SyntaxWarning – warning about dubious syntax
- UnicodeWarning – Unicode conversion-related warnings
- UserWarning – warnings generated by the user code
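Unlike exceptions, warnings do not interrupt the program by default. The following sketch issues a DeprecationWarning with the standard warnings module and records it for inspection; old_api is a hypothetical function invented for illustration.

```python
import warnings


def old_api():
    """Hypothetical deprecated function."""
    warnings.warn('old_api() is deprecated', DeprecationWarning, stacklevel=2)
    return 42


# Record warnings instead of printing them, so they can be examined.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_api()

print(result)                        # 42
print(caught[0].category.__name__)   # DeprecationWarning
print(caught[0].message)             # old_api() is deprecated
```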
Handling Multiple Exceptions in a Single Except Block
A try block can have multiple except blocks. We can catch specific exceptions in each of the except blocks.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    except TypeError as e:
        print(e)
    except ValueError as e:
        print(e)
The code in every except block is the same. In this scenario, we can handle multiple exceptions in a single except block. We can pass a tuple of exception classes to an except block to catch multiple exceptions.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except (ZeroDivisionError, TypeError, ValueError) as e:
        print(e)
Catch-All Exceptions in a Single Except Block
If we don’t specify any exception class in the except block, it will catch all the exceptions raised by the try block. It’s beneficial to have this when we don’t know about the exceptions that the try block can raise.
The empty except clause must be the last one in the exception handling chain.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    except:
        print("unknown error occurred")
Using else Block with try-except
The else block code is optional. It’s executed when there are no exceptions raised by the try block.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    else:
        print("divide() function worked fine.")

divide(10, 2)
divide(10, 0)
divide(10, 4)
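Output:
10/2 is 5.0
divide() function worked fine.
division by zero
10/4 is 2.5
divide() function worked fine.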
The else block code executed twice when the divide() function try block worked without any exception.
Using finally Block with try-except
The finally block code is executed in all the cases, whether there is an exception or not. The finally block is used to close resources and perform clean-up activities.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    else:
        print("divide() function worked fine.")
    finally:
        print("close all the resources here")

divide(10, 2)
divide(10, 0)
divide(10, 4)
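Output:
10/2 is 5.0
divide() function worked fine.
close all the resources here
division by zero
close all the resources here
10/4 is 2.5
divide() function worked fine.
close all the resources here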
Now that we have seen everything related to exception handling in Python, the final syntax is:
try -> except 1...n -> else -> finally
We can have many except blocks for a try block. But, we can have only one else and finally block.
Creating Custom Exception Class
We can create a custom exception class by extending Exception class. The best practice is to create a base exception and then derive other exception classes. Here are some examples of creating user-defined exception classes.
class EmployeeModuleError(Exception):
    """Base Exception Class for our Employee module"""
    pass


class EmployeeNotFoundError(EmployeeModuleError):
    """Error raised when employee is not found in the database"""

    def __init__(self, emp_id, msg):
        self.employee_id = emp_id
        self.error_message = msg


class EmployeeUpdateError(EmployeeModuleError):
    """Error raised when employee update fails"""

    def __init__(self, emp_id, sql_error_code, sql_error_msg):
        self.employee_id = emp_id
        self.error_message = sql_error_msg
        self.error_code = sql_error_code
The naming convention is to suffix the name of exception class with “Error”.
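A benefit of deriving from a common base class is that callers can catch every error from the module with one except clause. The sketch below repeats two of the classes above in a self-contained form, with one change that is an assumption rather than the article's code: it calls super().__init__(msg) so that str(e) returns the message. The find_employee function and its dictionary "database" are hypothetical.

```python
class EmployeeModuleError(Exception):
    """Base exception class for the employee module."""


class EmployeeNotFoundError(EmployeeModuleError):
    """Raised when an employee is not found."""

    def __init__(self, emp_id, msg):
        super().__init__(msg)  # assumption: lets str(e) return the message
        self.employee_id = emp_id
        self.error_message = msg


def find_employee(emp_id, database):
    """Hypothetical lookup against a dict standing in for a database."""
    try:
        return database[emp_id]
    except KeyError as e:
        raise EmployeeNotFoundError(emp_id, f'no employee with id {emp_id}') from e


try:
    find_employee(7, {1: 'Ada'})
except EmployeeModuleError as e:
    # The base class catches any module-specific error.
    caught_name = type(e).__name__
    message = str(e)
    print(caught_name, '-', message)  # EmployeeNotFoundError - no employee with id 7
```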
Raising Exceptions
We can use raise keyword to throw an exception from our code. Some of the possible scenarios are:
- Function input parameters validation fails
- Catching an exception and then throwing a custom exception
class ValidationError(Exception):
    pass


def divide(x, y):
    try:
        if type(x) is not int:
            raise TypeError("Unsupported type")
        if type(y) is not int:
            raise TypeError("Unsupported type")
    except TypeError as e:
        print(e)
        raise ValidationError("Invalid type of arguments")
    # Compare by value; "y is 0" tests identity and triggers a SyntaxWarning.
    if y == 0:
        raise ValidationError("We can't divide by 0.")


try:
    divide(10, 0)
except ValidationError as ve:
    print(ve)

try:
    divide(10, "5")
except ValidationError as ve:
    print(ve)
Output:
We can't divide by 0.
Unsupported type
Invalid type of arguments
Nested try-except Blocks Example
We can have nested try-except blocks in Python. In this case, if an exception is raised in the nested try block, the nested except block is used to handle it. In case the nested except is not able to handle it, the outer except blocks are used to handle the exception.
x = 10
y = 0
try:
    print("outer try block")
    try:
        print("nested try block")
        print(x / y)
    except TypeError as te:
        print("nested except block")
        print(te)
except ZeroDivisionError as ze:
    print("outer except block")
    print(ze)
Output:
outer try block
nested try block
outer except block
division by zero
Python Exception Handling Best Practices
- Always try to handle the exception in the code to avoid abnormal termination of the program.
- When creating a custom exception class, suffix its name with “Error”.
- If the except clauses have the same code, try to catch multiple exceptions in a single except block.
- Use finally block to close heavy resources and remove heavy objects.
- Use else block to log successful execution of the code, send notifications, etc.
- Avoid bare except clauses as much as possible. Use one only when you don't know which exceptions can be raised.
- Create module-specific exception classes for specific scenarios.
- You can catch exceptions in an except block and then raise another exception that is more meaningful.
- Always raise exceptions with meaningful messages.
- Avoid nested try-except blocks because it reduces the readability of the code.
def __init__(self, raw_hdfs_file, fs, mode, encoding=None, errors=None):
    self.mode = mode
    self.base_mode, is_text = common.parse_mode(self.mode)
    self.buff_size = raw_hdfs_file.buff_size
    if self.buff_size <= 0:
        self.buff_size = common.BUFSIZE
    if is_text:
        self.__encoding = encoding or self.__class__.ENCODING
        self.__errors = errors or self.__class__.ERRORS
        try:
            codecs.lookup(self.__encoding)
            codecs.lookup_error(self.__errors)
        except LookupError as e:
            raise ValueError(e)
    else:
        if encoding:
            raise ValueError(
                "binary mode doesn't take an encoding argument")
        if errors:
            raise ValueError("binary mode doesn't take an errors argument")
        self.__encoding = self.__errors = None
    cls = io.BufferedReader if self.base_mode == "r" else io.BufferedWriter
    self.f = cls(raw_hdfs_file, buffer_size=self.buff_size)
    self.__fs = fs
    info = fs.get_path_info(self.f.raw.name)
    self.__name = info["name"]
    self.__size = info["size"]
    self.closed = False

def validate_encoding_error_handler(setting, value, option_parser, config_parser=None, config_section=None):
    try:
        codecs.lookup_error(value)
    except AttributeError:  # prior to Python 2.3
        if value not in ("strict", "ignore", "replace", "xmlcharrefreplace"):
            raise (
                LookupError(
                    'unknown encoding error handler: "%s" (choices: '
                    '"strict", "ignore", "replace", or "xmlcharrefreplace")' % value
                ),
                None,
                sys.exc_info()[2],
            )
    except LookupError:
        raise (
            LookupError(
                'unknown encoding error handler: "%s" (choices: '
                '"strict", "ignore", "replace", "backslashreplace", '
                '"xmlcharrefreplace", and possibly others; see documentation for '
                "the Python ``codecs`` module)" % value
            ),
            None,
            sys.exc_info()[2],
        )
    return value

def test_lookup_error(self):
    #sanity
    self.assertRaises(LookupError, codecs.lookup_error, "blah garbage xyz")

    def garbage_error1(someError): pass
    codecs.register_error("blah garbage xyz", garbage_error1)
    self.assertEqual(codecs.lookup_error("blah garbage xyz"), garbage_error1)

    def garbage_error2(someError): pass
    codecs.register_error("some other", garbage_error2)
    self.assertEqual(codecs.lookup_error("some other"), garbage_error2)

def change_encoding(file, encoding=None, errors=ERRORS):
    encoding = encoding or file.encoding
    errors = errors or file.errors
    codecs.lookup_error(errors)
    newfile = io.TextIOWrapper(file.buffer, encoding, errors,
                               line_buffering=file.line_buffering)
    newfile.mode = file.mode
    newfile._changed_encoding = True
    return newfile

def register_surrogateescape():
    """
    Registers the surrogateescape error handler on Python 2 (only)
    """
    if utils.PY3:
        return
    try:
        codecs.lookup_error(FS_ERRORS)
    except LookupError:
        codecs.register_error(FS_ERRORS, surrogateescape_handler)

def validate_encoding_error_handler(setting, value, option_parser,
                                    config_parser=None, config_section=None):
    try:
        codecs.lookup_error(value)
    except LookupError:
        raise LookupError(
            'unknown encoding error handler: "%s" (choices: '
            '"strict", "ignore", "replace", "backslashreplace", '
            '"xmlcharrefreplace", and possibly others; see documentation for '
            'the Python ``codecs`` module)' % value)
    return value

def test_lookup(self):
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(codecs.ignore_errors, codecs.lookup_error("ignore"))
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(
        codecs.xmlcharrefreplace_errors,
        codecs.lookup_error("xmlcharrefreplace")
    )
    self.assertEquals(
        codecs.backslashreplace_errors,
        codecs.lookup_error("backslashreplace")
    )

def test_lookup(self):
    if test_support.due_to_ironpython_bug("http://tkbgitvstfat01:8080/WorkItemTracking/WorkItem.aspx?artifactMoniker=148421"):
        return
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(codecs.ignore_errors, codecs.lookup_error("ignore"))
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(
        codecs.xmlcharrefreplace_errors,
        codecs.lookup_error("xmlcharrefreplace")
    )
    self.assertEquals(
        codecs.backslashreplace_errors,
        codecs.lookup_error("backslashreplace")
    )

def latscii_error( uerr ):
    key = ord(uerr.object[uerr.start:uerr.end])
    try:
        return unichr(decoding_map[key]), uerr.end
    except KeyError:
        handler = codecs.lookup_error('replace')
        return handler(uerr)

def open_file_read_unicode(fname, which_error_handler="replace-if-possible"):
    """Open and read the file named 'fname', returning a Unicode string.

    It will also try to gloss over any Unicode-decoding errors that may occur,
    such as:
      UnicodeDecodeError: 'utf8' codec can't decode byte 0x97 in position 867373: invalid start byte

    It will return the string read (as a Unicode string object), plus a boolean
    value of whether the string contains non-ASCII Unicode. It will also return
    a list of objects describing any Unicode-decoding errors that occurred.
    (So IN SUMMARY, it returns a tuple of THREE ITEMS. I HOPE THIS IS CLEAR.)
    """
    error_handler = codecs.lookup_error(which_error_handler)
    error_handler.reset()
    # Note that we open the file with the encoding "utf-8-sig", since this
    # encoding will remove the BOM (byte-order mark) if present.
    # See http://docs.python.org/library/codecs.html ; search for "-sig".
    f = codecs.open(fname, encoding="utf-8-sig", errors=which_error_handler)
    # 's' will be a Unicode string, which may or may not contain non-ASCII.
    s = f.read()
    return (s, contains_non_ascii_unicode(s), error_handler.errors)
def validate_encoding_error_handler(name, value):
    try:
        codecs.lookup_error(value)
    except AttributeError:              # prior to Python 2.3
        if value not in ('strict', 'ignore', 'replace'):
            raise (LookupError(
                'unknown encoding error handler: "%s" (choices: '
                '"strict", "ignore", or "replace")' % value),
                   None, sys.exc_info()[2])
    except LookupError:
        raise (LookupError(
            'unknown encoding error handler: "%s" (choices: '
            '"strict", "ignore", "replace", "backslashreplace", '
            '"xmlcharrefreplace", and possibly others; see documentation for '
            'the Python ``codecs`` module)' % value),
               None, sys.exc_info()[2])
    return value
def quote_ident(self, str):
    # The original referenced an undefined name `errors`; this variant
    # always encodes strictly, so the same handler name is used throughout.
    errors = "strict"
    encodable = str.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable, nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return '"' + encodable.replace('"', '""') + '"'
def encode(self, input, errors='strict'):
    error = codecs.lookup_error(errors)
    def repl(match):
        start, end = match.span()
        return encoding_map.get(match.group()) or \
            error(UnicodeEncodeError(encoding, input, start, end,
                                     "undefined conversion emoji"))[0]
    output = google_emoji_re.sub(repl, input)
    return (base_codec.encode(output, errors)[0], len(input))
def quote_identifier(s, errors="strict"):
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable, nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
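
To see how the chosen error handler changes the result of quote_identifier, the function is restated below (with the escapes written out so the block runs on its own in Python 3) and called with a few made-up identifiers. This is only an illustrative sketch of the behaviour, not a vetted SQL sanitizer:

```python
import codecs


def quote_identifier(s, errors="strict"):
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        # Let the selected handler decide what replaces the NUL character.
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        replacement, _ = codecs.lookup_error(errors)(error)
        encodable = encodable.replace("\x00", replacement)
    return '"' + encodable.replace('"', '""') + '"'


print(quote_identifier('we"ird'))              # "we""ird"
print(quote_identifier("a\x00b", "replace"))   # "a?b"
print(quote_identifier("a\x00b", "ignore"))    # "ab"

try:
    quote_identifier("a\x00b")  # the default 'strict' handler re-raises
except UnicodeEncodeError as exc:
    strict_message = exc.reason
    print(strict_message)                      # NUL not allowed
```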
def convert(conv, data, final, errors='strict'):
    try:
        res = conv.convert(data, finished=final,
                           options=Option.DontUseReplacementChar)
        return (res, len(res))
    except UnicodeEncodeError as uerr:
        rep, rp = codecs.lookup_error(errors)(uerr)
        try:
            prefix = conv.convert(uerr.object[:uerr.start] + rep, finished=final,
                                  options=Option.DontUseReplacementChar)
        except UnicodeEncodeError:
            raise UnicodeEncodeError(*(uerr.args[:4] + ('cannot convert replacement %r to target encoding' % rep,)))
        suffix = Codec.convert(conv, data[rp:], final, errors)
        return (prefix + suffix[0], rp + suffix[1])
    except UnicodeDecodeError as uerr:
        rep, rp = codecs.lookup_error(errors)(uerr)
        prefix = conv.convert(uerr.object[:uerr.start], finished=final,
                              options=Option.DontUseReplacementChar)
        suffix = Codec.convert(conv, data[rp:], final, errors)
        return (prefix + rep + suffix[0], rp + suffix[1])
def imap_utf7_decode(input, errors='strict'):
    error = codecs.lookup_error(errors)
    output = []
    shifted = 0
    b64 = False
    i = 0
    while i < len(input):
        b = input[i]
        if b64:
            if b == 0x2d:  # '-'
                if shifted == i:
                    output.append('&')
                else:
                    dec = bytes(input[shifted:i]) + b'=' * ((4 - (i - shifted)) % 4)
                    try:
                        utf16 = base64.b64decode(dec, altchars=b'+,', validate=True)
                        output.append(utf16.decode('utf-16-be'))
                    except (binascii.Error, UnicodeDecodeError) as e:
                        if isinstance(e, binascii.Error):
                            reason = 'invalid Base64'
                        else:
                            reason = 'invalid UTF-16BE'
                        exc = UnicodeDecodeError('imap-utf-7', input, shifted - 1, i + 1,
                                                 reason)
                        replace, i = error(exc)
                        shifted = i
                        output.append(replace)
                        b64 = False
                        continue
                shifted = i + 1
                b64 = False
        else:
            if b == 0x26:  # '&'
                output.append(codecs.decode(input[shifted:i], 'ascii'))
                shifted = i + 1
                b64 = True
            if b < 0x20 or b > 0x7e:
                output.append(codecs.decode(input[shifted:i], 'ascii'))
                exc = UnicodeDecodeError('imap-utf-7', input, i, i + 1,
                                         'character must be Base64 encoded')
                replace, i = error(exc)
                shifted = i
                output.append(replace)
                continue
        i += 1
    if b64:
        exc = UnicodeDecodeError('imap-utf-7', input, len(input), len(input),
                                 'input does not end in US-ASCII')
        replace, cont = error(exc)
        output.append(replace)
    else:
        output.append(codecs.decode(input[shifted:], 'ascii'))
    return ''.join(output), len(input)
def _quote_identifier(self, s, errors="ignore"):
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("utf-8", encodable, nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return u'"' + encodable.replace('"', '""') + u'"'
def _fscodec():
    encoding = sys.getfilesystemencoding()
    if encoding == 'mbcs':
        errors = 'strict'
    else:
        try:
            from codecs import lookup_error
            lookup_error('surrogateescape')
        except LookupError:
            errors = 'strict'
        else:
            errors = 'surrogateescape'
    def fsencode(filename):
        """
        Encode filename to the filesystem encoding with 'surrogateescape' error
        handler, return bytes unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, six.binary_type):
            return filename
        elif isinstance(filename, six.text_type):
            return filename.encode(encoding, errors)
        else:
            raise TypeError("expect bytes or str, not %s" % type(filename).__name__)
    def fsdecode(filename):
        """
        Decode filename from the filesystem encoding with 'surrogateescape' error
        handler, return str unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, six.text_type):
            return filename
        elif isinstance(filename, six.binary_type):
            return filename.decode(encoding, errors)
        else:
            raise TypeError("expect bytes or str, not %s" % type(filename).__name__)
    return fsencode, fsdecode
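
The reason _fscodec prefers 'surrogateescape' when it is available is that the handler lets undecodable bytes survive a decode/encode round trip. A short Python 3 sketch of the selection logic and the round trip follows; pick_fs_errors and the sample file name are hypothetical:

```python
import codecs


def pick_fs_errors(encoding):
    """Choose an error handler the same way _fscodec does (illustrative)."""
    if encoding == "mbcs":
        return "strict"
    try:
        codecs.lookup_error("surrogateescape")
    except LookupError:
        return "strict"
    return "surrogateescape"


errors = pick_fs_errors("utf-8")

raw = b"caf\xe9.txt"                  # Latin-1 bytes, not valid UTF-8
name = raw.decode("utf-8", errors)    # the bad byte becomes a lone surrogate
round_tripped = name.encode("utf-8", errors)

print(errors)                # surrogateescape
print(round_tripped == raw)  # True
```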
def test_fake_error_class(self):
    handlers = [
        codecs.strict_errors,
        codecs.ignore_errors,
        codecs.replace_errors,
        codecs.backslashreplace_errors,
        codecs.xmlcharrefreplace_errors,
        codecs.lookup_error('surrogateescape'),
        codecs.lookup_error('surrogatepass'),
    ]
    for cls in UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError:
        class FakeUnicodeError(str):
            __class__ = cls
        for handler in handlers:
            with self.subTest(handler=handler, error_class=cls):
                self.assertRaises(TypeError, handler, FakeUnicodeError())
        class FakeUnicodeError(Exception):
            __class__ = cls
        for handler in handlers:
            with self.subTest(handler=handler, error_class=cls):
                with self.assertRaises((TypeError, FakeUnicodeError)):
                    handler(FakeUnicodeError())
# Requires: from contextlib import contextmanager
@contextmanager
def ignore_unicode_errors(errors='ignore'):
    """Overwrite the ``strict`` codecs error handler temporarily.

    This is useful e.g. if the engine truncates a string, which results in a
    string that contains a split multi-byte character at the end of the
    string.

    :param str errors:
        Error handler that will be looked up via :func:`codecs.lookup_error`.
    :raise LookupError:
        Raised if the error handler was not found.

    Example:

    .. code:: python

        import memory

        # Allocate four bytes to create an erroneous string
        ptr = memory.alloc(4)

        # Write data to the memory that will usually result in a
        # UnicodeDecodeError
        ptr.set_uchar(ord('a'), 0)
        ptr.set_uchar(ord('b'), 1)
        ptr.set_uchar(226, 2) # Add the invalid byte
        ptr.set_uchar(0, 3) # Indicate the end of the string

        with ignore_unicode_errors():
            # Read the data as a string. Now, it will only print 'ab', because
            # the invalid byte has been removed/ignored.
            print(ptr.get_string_array())
    """
    old_handler = codecs.lookup_error('strict')
    codecs.register_error('strict', codecs.lookup_error(errors))
    try:
        yield
    finally:
        codecs.register_error('strict', old_handler)
def latscii_error(uerr):
    text = uerr.object[uerr.start:uerr.end]
    ret = ''
    for c in text:
        key = ord(c)
        try:
            ret += unichr(decoding_map[key])
        except KeyError:
            handler = codecs.lookup_error('replace')
            return handler(uerr)
    return ret, uerr.end
def _quote(s, errors='strict'):
    encodable = s.encode('utf-8', errors).decode('utf-8')
    nul_index = encodable.find('\x00')
    if nul_index >= 0:
        error = UnicodeEncodeError('NUL-terminated utf-8', encodable,
                                   nul_index, nul_index + 1, 'NUL not allowed')
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace('\x00', replacement)
    return '"' + encodable.replace('"', '""') + '"'
def _quote_id(self, s, errors=u"strict"):
    encodable = s.encode("utf-8", errors).decode(u"utf-8")
    nul_index = encodable.find(u"\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError(u"NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, u"NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace(u"\x00", replacement)
    return u"\"" + encodable.replace(u"\"", u"\"\"") + u"\""
def __init__(self, transmogrifier, name, options, previous):
    self.previous = previous
    if options.get('from'):
        from_ = options['from'].strip().lower()
        if from_ != 'unicode':
            if from_ == 'default':
                from_ = _get_default_encoding(transmogrifier.context)
            # Test if the decoder is available
            codecs.getdecoder(from_)
            self.from_ = from_
    self.from_error_handler = options.get(
        'from-error-handler', self.from_error_handler).strip().lower()
    # Test if the error handler is available
    codecs.lookup_error(self.from_error_handler)
    if options.get('to'):
        to = options['to'].strip().lower()
        if to != 'unicode':
            if to == 'default':
                to = _get_default_encoding(transmogrifier.context)
            # Test if the encoder is available
            codecs.getencoder(to)
            self.to = to
    self.to_error_handler = options.get(
        'to-error-handler', self.to_error_handler).strip().lower()
    # Test if the error handler is available
    codecs.lookup_error(self.to_error_handler)
    self.matcher = Matcher(*options['keys'].splitlines())
    self.condition = Condition(options.get('condition', 'python:True'),
                               transmogrifier, name, options)
def quote_identifier(s, errors="strict"):
    # Quotes a SQLite identifier. Source: http://stackoverflow.com/a/6701665
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
def _fscodec():
    encoding = sys.getfilesystemencoding()
    errors = "strict"
    if encoding != "mbcs":
        try:
            codecs.lookup_error("surrogateescape")
        except LookupError:
            pass
        else:
            errors = "surrogateescape"
    def fsencode(filename):
        """
        Encode filename to the filesystem encoding with 'surrogateescape' error
        handler, return bytes unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, bytes):
            return filename
        else:
            return filename.encode(encoding, errors)
    return fsencode
def test_longstrings(self):
    # test long strings to check for memory overflow problems
    errors = [ "strict", "ignore", "replace", "xmlcharrefreplace", "backslashreplace"]
    # register the handlers under different names,
    # to prevent the codec from recognizing the name
    for err in errors:
        codecs.register_error("test." + err, codecs.lookup_error(err))
    l = 1000
    errors += [ "test." + err for err in errors ]
    for uni in [ s*l for s in (u"x", u"\u3042", u"a\xe4") ]:
        for enc in ("ascii", "latin-1", "iso-8859-1", "iso-8859-15", "utf-8", "utf-7", "utf-16"):
            for err in errors:
                try:
                    uni.encode(enc, err)
                except UnicodeError:
                    pass
def quote_identifier(s, errors="replace"):
    '''
    SQLite does not provide an identifier sanitizer so we use this method
    '''
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
def decode(self, input, errors='strict', final=True):
    error_function = codecs.lookup_error(errors)
    input_buffer = ByteBuffer.wrap(array('b', input))
    decoder = Charset.forName(self.encoding).newDecoder()
    output_buffer = CharBuffer.allocate(min(max(int(len(input) / 2), 256), 1024))
    builder = StringBuilder(int(decoder.averageCharsPerByte() * len(input)))
    while True:
        result = decoder.decode(input_buffer, output_buffer, False)
        pos = output_buffer.position()
        output_buffer.rewind()
        builder.append(output_buffer.subSequence(0, pos))
        if result.isUnderflow():
            if final:
                _process_incomplete_decode(self.encoding, input, error_function, input_buffer, builder)
            break
        _process_decode_errors(self.encoding, input, result, error_function, input_buffer, builder)
    return builder.toString(), input_buffer.position()
def test_badandgoodsurrogateescapeexceptions(self):
    surrogateescape_errors = codecs.lookup_error('surrogateescape')
    # "surrogateescape" complains about a non-exception passed in
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        42
    )
    # "surrogateescape" complains about the wrong exception types
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        UnicodeError("ouch")
    )
    # "surrogateescape" can not be used for translating
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        UnicodeTranslateError("\udc80", 0, 1, "ouch")
    )
    # Use the correct exception
    for s in ("a", "\udc7f", "\udd00"):
        with self.subTest(str=s):
            self.assertRaises(
                UnicodeEncodeError,
                surrogateescape_errors,
                UnicodeEncodeError("ascii", s, 0, 1, "ouch")
            )
    self.assertEqual(
        surrogateescape_errors(
            UnicodeEncodeError("ascii", "a\udc80b", 1, 2, "ouch")),
        (b"\x80", 2)
    )
    self.assertRaises(
        UnicodeDecodeError,
        surrogateescape_errors,
        UnicodeDecodeError("ascii", bytearray(b"a"), 0, 1, "ouch")
    )
    self.assertEqual(
        surrogateescape_errors(
            UnicodeDecodeError("ascii", bytearray(b"a\x80b"), 1, 2, "ouch")),
        ("\udc80", 2)
    )


