re.sub fails with "expected string or bytes-like object"

Question

I have read several posts about this error, but I still can't resolve it. It happens when I try to loop my function over a column:

def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search    

    words = letters_only.lower().split()     
    stops = set(stopwords.words("english"))      
    meaningful_words = [w for w in words if not w in stops]      
    return (" ".join(meaningful_words))    

col_Plan = fix_Plan(train["Plan"][0])    
num_responses = train["Plan"].size    
clean_Plan_responses = []

for i in range(0,num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))

Here is the error:

Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location)  # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

smci

Question edited 29 Dec 2018 at 8:38

regex python pandas nltk

1 May 2017 at 10:47

Bilal Chandio

I think using re.match() would be a good approach. Here is an example for reference:

import re
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
sentences = word_tokenize("I love to learn NLP \n 'a :(")
# Keep only tokens that start with an ASCII letter, lowercased
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]
sentences

msaif

The simplest solution is to apply Python's str function to the column you are trying to loop over.

If you are using pandas, this can be done as follows:

dataframe['column_name'] = dataframe['column_name'].apply(str)
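
One caveat with converting the whole column: a missing cell is stored as the float NaN, and str() turns it into the text "nan", which then survives the letters-only filter as if it were a word. A minimal sketch of one way around that, assuming an empty string is an acceptable stand-in for missing values (the sample frame below is hypothetical, not the asker's data):

```python
import re
import pandas as pd

# Hypothetical frame standing in for `train`; the NaN mimics an empty cell.
train = pd.DataFrame({"Plan": ["Gold Plan", float("nan"), "Basic-2"]})

# Replace missing cells with "" first, then force every value to str.
plans = train["Plan"].fillna("").apply(str)

# Every value is now a str, so re.sub no longer raises TypeError.
cleaned = [re.sub("[^a-zA-Z]", " ", plan).lower().split() for plan in plans]
```

Whether to blank out missing rows or drop them entirely depends on what the cleaned text is used for later.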

abccd · Accepted Answer · 2017-05-01T23:08:27+00:00

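As noted in the comments, some of the values appear to be floats rather than strings. They need to be converted to strings before being passed to re.sub. The simplest way is to change location to str(location) in the re.sub call. Even if the value is already a str, the conversion does no harm.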

letters_only = re.sub("[^a-zA-Z]",    # Search for all non-letters
                      " ",            # Replace all non-letters with spaces
                      str(location))  # Convert to str before searching
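
To see why the conversion matters, here is a small self-contained sketch that reproduces the error with a float and then applies the str() fix. The sample values are hypothetical stand-ins for train["Plan"], and the stopword filtering from the question is left out so the snippet runs without NLTK:

```python
import re

def fix_plan(location):
    # str() turns non-string values (such as the float NaN that pandas
    # uses for missing cells) into text, which re.sub accepts.
    letters_only = re.sub("[^a-zA-Z]", " ", str(location))
    return " ".join(letters_only.lower().split())

# Hypothetical sample values standing in for train["Plan"].
samples = ["Gold Plan 2017", float("nan"), 42]

# Without str(), a non-string value raises the TypeError from the question.
try:
    re.sub("[^a-zA-Z]", " ", samples[1])
except TypeError as exc:
    error_message = str(exc)

# With str(), every value is processed without error.
cleaned = [fix_plan(value) for value in samples]
```

Note that the NaN value comes through as the text "nan", so missing rows may still need separate handling.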