715

I recently migrated to Python 3.5. This code worked properly in Python 2.7:

with open(fname, 'rb') as f:
    lines = [x.strip() for x in f.readlines()]

for line in lines:
    tmp = line.strip().lower()
    if 'some-pattern' in tmp: continue
    # ... code

After upgrading to 3.5, I get this error:

TypeError: a bytes-like object is required, not 'str'

It is raised on the last line (the pattern search code).

I've tried using the .decode() method on either side of the statement, and also tried:

if tmp.find('some-pattern') != -1: continue

but to no avail.

I was able to resolve almost all of the 2-to-3 migration issues quickly, but this little statement is bugging me.

Martijn Pieters
masroore
  • 13
    Why are you opening the file in binary mode but treat it as text? – Martijn Pieters Oct 10 '15 at 13:28
  • 5
    @MartijnPieters thanks for spotting the file open mode! Changing it to text-mode solved the issue... the code had worked reliably in Py2k for many years though... – masroore Oct 10 '15 at 13:30
  • 4
    @masroore see: https://www.python.org/dev/peps/pep-0404/#strings-and-bytes – Roberto Oct 10 '15 at 13:56
  • 11
    I am encountering this too where I have a requests `result = requests.get` and I attempt to `x = result.content.split("\n")`. I am a little confused by the error message because it seems to imply that `result.content` is a string and `.split()` is requiring a bytes-like object..?? ( "a bytes-like object is required, not 'str"').. –  Feb 25 '17 at 18:04

9 Answers

658

You opened the file in binary mode:

with open(fname, 'rb') as f:

This means that all data read from the file is returned as bytes objects, not str. You cannot then use a string in a containment test:

if 'some-pattern' in tmp: continue

You'd have to use a bytes object to test against tmp instead:

if b'some-pattern' in tmp: continue

or open the file as a text file instead by replacing the 'rb' mode with 'r'.
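
A minimal sketch of the two options side by side, assuming a throwaway file created only for the demonstration (the file contents and variable names below are made up, not taken from the question):

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('Some-Pattern here\nanother line\n')
    fname = tmp.name

# Binary mode: every line is a bytes object, so test with a bytes literal.
with open(fname, 'rb') as f:
    binary_lines = [line.strip().lower() for line in f]
binary_hits = [line for line in binary_lines if b'some-pattern' in line]

# Text mode: every line is a str, so a plain string literal works.
with open(fname, 'r', encoding='utf-8') as f:
    text_lines = [line.strip().lower() for line in f]
text_hits = [line for line in text_lines if 'some-pattern' in line]

os.remove(fname)
```

Either approach works as long as both sides of the `in` test have the same type.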

Martijn Pieters
  • 16
    If you peek at the various documents that ppl have linked to, you'll see that everything "worked" in Py2 because default strings were bytes whereas in Py3, default strings are Unicode, meaning that any time you're doing I/O, esp. networking, byte strings are the standard, so you must learn to move b/w Unicode & bytes strings (en/decode). For files, we now have "r" vs. "rb" (and for 'w' & 'a') to help differentiate. – wescpy Mar 06 '17 at 06:24
  • 4
    @wescpy: Python 2 has `'r'` vs `'rb'` *too*, switching between binary and text file behaviours (like translating newlines and on certain platforms, how the EOF marker is treated). That the `io` library (providing the default I/O functionality in Python 3 but also available in Python 2) now *also decodes* text files by default is the real change. – Martijn Pieters Mar 06 '17 at 07:44
  • 2
    @MartijnPieters: Yes, agreed. In 2.x, I only used the `'b'` flag when having to work with binary files on DOS/Windows (as binary is the POSIX default). It's good that there is a dual purpose when using `io` in 3.x for file access. – wescpy Mar 07 '17 at 02:14
  • `r` does not work with `zipfile` 's `.open()`. **Example:** `def get_aoi1(zip): z = zipfile.ZipFile(zip) for f in z.namelist(): with z.open(f, 'r') as rptf: for l in rptf.readlines(): if l.find("$$") != -1: return l.split('=') else: return print(l) test = get_aoi1('testZip.zip')` – ericOnline Jan 06 '21 at 23:33
  • 2
    @ericOnline `ZipFile.open()` docs [explicitly state that only binary mode is supported](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open) (*Access a member of the archive as a **binary** file-like object*). You can wrap the file object in [`io.TextIOWrapper()`](https://docs.python.org/3/library/io.html#io.TextIOWrapper) to achieve the same effect. – Martijn Pieters Jan 07 '21 at 22:01
  • 1
    @ericOnline also, don’t use `.readlines()` when you can iterate over the file object directly. Especially when you only need info from a single line. Why read everything into memory when that info could be found in the first buffered block? – Martijn Pieters Jan 07 '21 at 22:12
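
The `io.TextIOWrapper()` suggestion from the comments above could look roughly like this. It is only a sketch: the archive is built in memory, and the member name `report.txt` and its contents are invented for the example.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('report.txt', 'header\nkey$$=value\n')

matches = []
with zipfile.ZipFile(buffer) as archive:
    for name in archive.namelist():
        # ZipFile.open() always yields bytes; the wrapper decodes them to str.
        with archive.open(name) as raw, \
                io.TextIOWrapper(raw, encoding='utf-8') as text:
            for line in text:  # iterate lazily instead of calling readlines()
                if '$$' in line:
                    matches.append(line.strip().split('='))
```

The encoding is an assumption here; a real archive member may need a different one.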
269

You can encode your string by using .encode()

Example:

'Hello World'.encode()
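
As a small illustration of what that call does, here is a round trip between the two types. With no argument, both methods use UTF-8.

```python
text = 'Hello World'

encoded = text.encode()     # str -> bytes, UTF-8 by default
decoded = encoded.decode()  # bytes -> str, UTF-8 by default

print(type(encoded).__name__, encoded)
print(type(decoded).__name__, decoded)
```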
Yahya
theofpa
61

As already mentioned, you are reading the file in binary mode, so the list you build contains bytes objects. The for loop then compares a str with bytes, and that is where the code fails.

Decoding the bytes while building the list should work. The changed code looks as follows:

with open(fname, 'rb') as f:
    lines = [x.decode('utf8').strip() for x in f.readlines()]

In Python 2, bytes is just an alias for str, which is why your code worked there. Python 3 made bytes a separate type from str. In Python 2:

>>> s=bytes('hello')
>>> type(s)
<type 'str'>
Suresh
  • 1
    Python 2 does indeed have a type for bytes, it's just confusingly called `str` while the type for text strings is called `unicode`. In Python 3 they changed the meaning of `str` so that it was the same as the old `unicode` type, and renamed the old `str` to `bytes`. They also removed a bunch of cases where it would automatically try to convert from one to the other. – Mark Ransom Feb 11 '21 at 18:38
29

You have to change the file mode from 'wb' to 'w':

def __init__(self):
    self.myCsv = csv.writer(open('Item.csv', 'wb')) 
    self.myCsv.writerow(['title', 'link'])

to

def __init__(self):
    self.myCsv = csv.writer(open('Item.csv', 'w'))
    self.myCsv.writerow(['title', 'link'])

After changing this, the error disappears, but you can't write to the file (in my case). So after all, I don't have an answer?

Source: How to remove ^M

Changing to 'rb' gives me a different error: io.UnsupportedOperation: write
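
One possible way out, sketched under the assumption that the goal is simply to write rows with Python 3's `csv` module: the module's documentation recommends opening the file in text mode with `newline=''`. The temporary directory and the second row are made up for the example.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'Item.csv')

# Python 3: text mode plus newline='' lets the csv module control line endings.
with open(path, 'w', newline='', encoding='utf-8') as handle:
    writer = csv.writer(handle)
    writer.writerow(['title', 'link'])
    writer.writerow(['Example', 'http://example.com'])

# Read the file back to confirm the rows were written.
with open(path, newline='', encoding='utf-8') as handle:
    rows = list(csv.reader(handle))

os.remove(path)
```

Closing the file (here through the `with` block) flushes buffered rows to disk, which may be relevant if the output file appears to stay empty.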

Community
meck373
16

For this small example:

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data)

mysock.close()

adding the b prefix before 'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n' solved my problem.

Thirumal
starter
14

Call the encode() method on the hardcoded string literal so that it becomes a bytes object.

Example:

file.write(answers[i] + '\n'.encode())

OR

line.split(' +++$+++ '.encode())
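
A short sketch of the `split` case, using a made-up sample line, that shows both the failure and the fix:

```python
line = b'L1 +++$+++ u0 +++$+++ hello'

# A str separator on a bytes object raises the TypeError from the question.
try:
    line.split(' +++$+++ ')
except TypeError as exc:
    error_message = str(exc)

# Encoding the separator (or writing it as a bytes literal) makes the types match.
fields = line.split(' +++$+++ '.encode())
```

Note that the error message describes the argument that was passed in (a str), not the object the method was called on (bytes).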
Shiv Buyya
12

You opened the file in binary mode, so each line is a bytes object.

The following code will raise TypeError: a bytes-like object is required, not 'str':

for line in lines:
    print(type(line))  # <class 'bytes'>
    if 'substring' in line:
        print('success')

The following code will work, because it calls the decode() method first:

for line in lines:
    line = line.decode()
    print(type(line))  # <class 'str'>
    if 'substring' in line:
        print('success')
Matan Hugi
6

Why not try opening your file as text?

with open(fname, 'rt') as f:
    lines = [x.strip() for x in f.readlines()]

Additionally, here is the official Python 3 documentation for the io module: https://docs.python.org/3/library/io.html and for the open function: https://docs.python.org/3/library/functions.html#open

If you really need to handle the file as binary, then consider encoding your string.

2

I got this error when I was trying to convert a char (or string) to bytes. The code was something like this with Python 2.7:

# -*- coding: utf-8 -*-
print( bytes('ò') )

This is how Python 2.7 deals with unicode chars.

This won't work with Python 3.6, since bytes requires an extra argument for the encoding. This can be a little tricky, since different encodings may produce different results:

print( bytes('ò', 'iso_8859_1') ) # prints: b'\xf2'
print( bytes('ò', 'utf-8') ) # prints: b'\xc3\xb2'

In my case I had to use iso_8859_1 when encoding bytes in order to solve the issue.

Hope this helps someone.

Ibrahim.H
  • 1
    Note that the `coding` comment at the top of the file doesn't affect the way `bytes` or `encode` works, it only changes the way characters in your Python source are interpreted. – Mark Ransom Feb 11 '21 at 18:43