Python Tricks: A Shocking Truth About String Formatting

Python Tricks: A Shocking Truth About String Formatting,第1张

Python Tricks: A Shocking Truth About String Formatting

Remember the Zen of Python and how there should be “one obvious way to do something ?” You might scratch your head when you find out that there are four major ways to do string formatting in Python.

Today I’ll demonstrate how these four string formatting approaches work and what their respective strengths and weaknesses are. I’ll also give you my simple “rule of thumb”(经验法则) for how I pick the best general-purpose string formatting approach.

Let’s jump right in, as we’ve got a lot to cover. In order to have a simple toy example for experimentation, let’s assume we’ve got the following variables(or constants, really) to work with:

>>> errno = 50159747054
>>> name = 'Bob'

And based on these variables we’d like to generate an output string with the following error message:

'Hey Bob, there is  a 0xbadc0ffee error!'

Now, that error could really spoil a dev’s Monday morning! But we’re here to discuss string formatting today. So let’s get to work.

#1 - “Old Style” String Formatting

Strings in Python have a unique built-in operation that can be accessed with the %-operator. It’s a shortcut that lets you do simple positional formatting very easily. If you’ve ever worked with a printf-style function in C, you’ll instantly recognize how this works. Here’s a simple example:

>>> 'Hello, %s' % name
'Hello, Bob'

I’m using the %s format specifier here to tell Python where to subsitute the value of name, represented as a string. This is called “old style” string formatting.

In old style string formatting there are also other format specifiers available that let you control the output string. For example, it’s possible to convert numbers to hexadecimal notation or to add whitespace padding to generate nicely formatted tables and reports.

Here, I’m using the %x format specifier to convert an int value to a string and to represent it as a hexadecimal number:

>>> '%x' % errno
'badc0ffee'

The “old style” string formatting syntax changes slightly if you want to make multiple substitutions in a single string. Because the %-operator only takes one argument, you need to wrap the right-hand side in a tuple, like so:

>>> 'Hey %s, there is a 0x%x error!' %(name, errno)
'Hey Bob, there is a 0xbadc0ffee error!'

It’s also possible to refer to variable subsitutions by name in your format string, if you pass a mapping to the %-operator:

>>> 'Hey %(name)s, there is a 0x%(errno)x error!' % {"name":name, "errno":errno}'Hey Bob, there is a 0xbadc0ffee error!'

This makes your format strings easier to maintain and easier to modify in the future. You don’t have to worry about making sure the order you’re passing in the values matches up with the order the values are referenced in the format string.

I’m sure you’ve been wondering why this printf-style formatting is called “old style” string formatting. Well, let me tell you. It was technically superseded by “new style” formatting, which we’re going to talk about in a minute. But while “old style” formatting has been deemphasized, it hasn’t been deprecated. It is still supported in the latest version of Python.

#2 - “New Style” String Formatting

Python3 introduced a new way to do string formatting that was also later back-ported to Python2.7. This “new style” string formatting gets rid of the %-operator specifier syntax and makes the syntax for string formatting more regular. Formatting is now handled by calling a format() function on a string object.

You can use the format() function to do simple positional formatting, just like you could with “old style” formatting:

>>> 'Hello, {}'.format(name)
'Hello, Bob'

Or , you can refer to your variable substitutions by name and use them in any order you want. This is quite a powerful feature as it allows for re-arranging the order of display without changing the arguments passed to the format function:

>>> 'Hey {name}, there is a 0x{errno:x} error!'.format(name=name, errno=errno)
'Hey Bob, there is a 0xbadc0ffee error!'

This also shows that the syntax to format an int variable as a hexadecimal string has changed. Now we need to pass a format spec by adding a “:x” suffix after the variable name.

Overall, the format string syntax has become more powerful without complicating the simpler use cases. It pays off to read up on this string formatting mini-language in the Python documentation.

In Python3, this “new style” string formatting is preferred over %-style formatting. However, starting with Python3.6 there’s an even better way to format your strings. I’ll tell you all about it in the next section.

#3 - Literal String Interpolation(Python3.6+)

Python3.6 adds yet another way to format strings, called Formatted String Literals. This new way of formatting strings lets you use embedded Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:

>>> f'Hello, {name}'
'Hello, Bob'

This new formatting syntax is powerful. Because you can embed arbitrary Python expressions, you can even do inline arithmetic with it, like this:

>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'
'Five plus ten is 15 and not 30.'

Behind the scenes, formatted string literals are a Python parser feature that converts f-strings into a series of string constants and expressions. They then get joined up to build the final string.

Imagine we had the following greet() function that contains an f-string:

>>> def greet(name, question):
...     return f"Hello, {name}! How's it {question}?"
... 
>>> greet('Bob', 'going')
"Hello, Bob! How's it going?"

When we disassemble the function and inspect what’s going on behind the scenes, we can see that the f-string in the function gets transformed into something similar to the following:

>>> def greet(name, question):
...     return("Hello, " + name + "! How's it " + question +"?")

The real implementation is slightly faster than that because it uses the BUILD_STRING opcode as an optimization. But functionally they’re the same:

>>> import dis
>>> dis.dis(greet)
  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 BINARY_ADD
              6 LOAD_CONST               2 ("! How's it ")
              8 BINARY_ADD
             10 LOAD_FAST                1 (question)
             12 BINARY_ADD
             14 LOAD_CONST               3 ('?')
             16 BINARY_ADD
             18 RETURN_VALUE

String literals also support the existing format string syntax of the str.format() method. That allows you to solve the same formatting problems we’ve discussed in the previous two sections:

>>> f"Hey {name}, there's a {errno:#x} error!" 
"Hey Bob, there's a 0xbadc0ffee error!"

Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015. I think they’re quite a nice addition to the language, and I’ve already started using them in my day-to-day Python3 work. You can learn more about Formatted String Literals in the offical Python documentation.

#4 - Template Strings

One more technique for string formatting in Python is Template Strings. It’s a simpler and less powerful mechanism, but in some cases this might be exactly what you’re looking for.

Let’s take a look at a simple greeting example:

>>> from string import Template
>>> t = Template('Hey, $name!')
>>> t.substitute(name=name)
'Hey, Bob!'

You see here that we need to import the Template class from Python’s built-in string module. Template strings are not a core language feature but they’re supplied by a module in the standard library.

Another difference is that template strings don’t allow format specifiers. So in order to get our error string example to work, we need to transform our int error number into a hex-string ourselves:

>>> templ_string = 'Hey $name, there is a $error error!'
>>> Template(templ_string).substitute(name=name, error=hex(errno))
'Hey Bob, there is a 0xbadc0ffee error!'

That worked great but you’re probably wondering when you use template strings in your Python programs. In my opinion, the best use case for template strings is when you’re handling format strings generated by users of your program. Due to their reduced complexity, template strings are a safer choice.

The more complex formatting mini-languages of other string formatting techniques might introduce security vulnerabilities to your programs. For example, it’s possible for format strings to access arbitrary variables in your program.

That means, if a malicious user can supply a format string they can also potentially leak secret keys and other sensitive information!Here’s a simple proof of concept of how this attack might be used:

>>> SECRET = 'this-is-a-secret'
>>> class Error:
...     def __init__(self):
...             pass

>>> err = Error()
>>> user_input = '{error.__init__.__globals__[SECRET]}'
# Uh-oh...
>>> user_input.format(error=err)
'this-is-a-secret'

See how the hypothetical attacker was able to extract our secret string by accessing the __globals__ dictionary from the format string? Scary, huh! Template Strings close this attack vector, and this makes them a safer choice if you’re handling format strings generated from user input:

>>> user_input = '${error.__init__.__globals__[SECRET]}'
>>> Template(user_input).substitute(error=err)
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 121, in substitute
    return self.pattern.sub(convert, self.template)
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 118, in convert
    self._invalid(mo)
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 101, in _invalid
    raise ValueError('Invalid placeholder in string: line %d, col %d' %
ValueError: Invalid placeholder in string: line 1, col 1
Which String Formatting Method Should I Use?

I totally get that having so much choice for how to format your strings in Python can feel very confusing. This would be a good time to bust out some flowchart infographic…

But I’m not going to do that. Instead, I’ll try to boil it down to the simple rule of thumb that I apply when I’m writing Python.

Here we go—you can use this rule of thum any time you’re having difficulty deciding which string formatting method to use, depending on the circumstances:

Dan’s Python String Formatting Rule of Thumb:

if your format strings are user-supplied, use Template Strings to avoid security issues. Otherwise, use Literal String Interpolation if you’re on Python3.6+, and “New Style” String Formatting if you’re not.

欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/langs/915788.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-05-16
下一篇 2022-05-16

发表评论

登录后才能评论

评论列表(0条)

保存