pythonでファイルからスペース以外の特殊文字を削除するには？

Question

さらに

ソース非AMP版編集

pythonでファイルからスペース以外の特殊文字を削除するには？

膨大な量のテキスト（一行ずつ）があり、特殊文字を除去したいが、文字列のスペースや構造は維持したい。

hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by@ an#other %million^ %%like $this.

は、次のようになります。

hello there A Z R T world welcome to python
this should be the next line followed by another million like this

pythonlearn

編集された質問 12日 4月 2017 в 2:01

プログラミング

file regex python string string-formatting

12日 4月 2017 в 1:44

6 ビュー

Eliethesaiyan

ソース非AMP版編集

しかし、私は単純な正規表現を追加して、単語のない文字をすべて取り除きます。

print  re.sub(r'\W+', ' ', string)
>>> hello there A Z R_T world welcome to python

4

0

解説 (0)

wwii

ソース非AMP版編集

特殊文字をNoneにマッピングする辞書の作成

d = {c:None for c in special_characters}

辞書を使って翻訳テーブルを作る。テキスト全体を変数に読み込んで、テキスト全体にstr.translateを使用します。

wwii

編集した答え 12日 4月 2017 в 6:23

0

解説 (0)

Chiheb Nexus · Accepted Answer · 2017-04-12T01:57:25+00:00

このパターンは、regexでも使用できます。

import re
a = '''hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by@ an#other %million^ %%like $this.'''

for k in a.split("\n"):
    print(re.sub(r"[^a-zA-Z0-9]+", ' ', k))
    # Or:
    # final = " ".join(re.findall(r"[a-zA-Z0-9]+", k))
    # print(final)

出力します。

hello there A Z R T world welcome to python 
this should the next line followed by an other million like this

編集します。

そうでなければ、最終行を list に格納することができます。

final = [re.sub(r"[^a-zA-Z0-9]+", ' ', k) for k in a.split("\n")]
print(final)

出力します。

['hello there A Z R T world welcome to python ', 'this should the next line followed by an other million like this ']