Best way to strip punctuation from a string

Question

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

Georgy

Question edited 26 June 2019 at 1:36

python string punctuation

5 November 2008 at 5:30

Eratosthenes

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
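One detail worth knowing about this pattern: `\w` also matches the underscore, so `_` survives even though it is part of `string.punctuation`. A minimal sketch, assuming Python 3 (the helper name `strip_punct_re` and the sample string are made up for illustration):

```python
import re

def strip_punct_re(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r'[^\w\s]', '', text)

sample = "snake_case, kebab-case. Done!"
cleaned = strip_punct_re(sample)
print(cleaned)  # snake_case kebabcase Done
```

If underscores should go as well, extending the pattern to `r'[^\w\s]|_'` is one option.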

Eratosthenes

Answer edited 16 September 2019 at 5:36

SparkAndShine

For convenience, I sum up the notes on stripping punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for detailed descriptions.

Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation
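When the same cleanup runs many times, the Python 3 table can be built once and reused. A small sketch along those lines; the names `_PUNCT_TABLE` and `strip_punctuation` are illustrative, not from the answer:

```python
import string

# Build the translation table once; every ASCII punctuation code point maps to None.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(text):
    """Return text with all ASCII punctuation removed."""
    return text.translate(_PUNCT_TABLE)

print(strip_punctuation("string. With. Punctuation?"))  # string With Punctuation
```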

SparkAndShine

Answer edited 9 October 2019 at 12:54

pyrou


myString.translate(None, string.punctuation)


S.Lott

I usually use something like this:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
...
>>> s
'string With Punctuation'


Björn Lindqvist

`string.punctuation` is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can generalize and strip other types of characters as well:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$, which may or may not be "punctuation" depending on one's point of view.
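For Python 3, the same idea can be wrapped in a small helper. This is a sketch rather than the answer's own code; the function name `strip_by_category` and its `prefixes` parameter are assumptions introduced here:

```python
from unicodedata import category

def strip_by_category(text, prefixes=('P',)):
    """Drop characters whose Unicode general category starts with one of prefixes."""
    return ''.join(ch for ch in text if not category(ch).startswith(prefixes))

# Punctuation only (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)
print(strip_by_category("«Hola», ¿qué tal?"))  # Hola qué tal

# Punctuation and symbols (P* and S*)
print(strip_by_category("cost: $5 + tax!", prefixes=('P', 'S')))  # cost 5  tax
```

Passing a tuple to `str.startswith` keeps the category check in a single call, so adding or removing a category is a one-character change.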

Björn Lindqvist

Answer edited 7 October 2019 at 5:46

Vinko Vrsalovic

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)


Martijn Pieters

For Python 3 `str` or Python 2 `unicode` values, [`str.translate()`](http://docs.python.org/3/library/stdtypes.html#str.translate) only takes a dictionary; code points (integers) are looked up in that mapping, and anything mapped to `None` is removed.

To remove (some?) punctuation then, use:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The [`dict.fromkeys()` class method](http://docs.python.org/3/library/stdtypes.html#dict.fromkeys) makes it trivial to create the mapping, setting all values to `None` based on the sequence of keys.

To remove *all* punctuation, not just ASCII punctuation, your table needs to be a little bigger; see [J. F. Sebastian's answer](https://stackoverflow.com/questions/11066400/remove-punctuation-from-unicode-formatted-strings/11066687#11066687) (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))
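To show how such a table would be applied, here is a self-contained sketch for Python 3. The sample text is made up, and the range is extended by one so that the last code point is covered too:

```python
import sys
import unicodedata

# Map every code point in a punctuation category (P*) to None.
# Building this scans the full Unicode range, so do it once and reuse it.
remove_punct_map = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P')
)

text = "„Quoted“ text… done!"
print(text.translate(remove_punct_map))  # Quoted text done
```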


Zach

`string.punctuation` misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

import regex
s = u"string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

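Personally, I believe this is the best way to remove punctuation from a string in Python because:

It removes all Unicode punctuation.
It's easily modifiable, e.g. you can remove the `\p{S}` if you want to remove punctuation but keep symbols like `$`.
You can get really specific about what you want to keep and what you want to remove, for example `\p{Pd}` will only remove dashes.
This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.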
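The `regex` module used above is a third-party package. Where only the standard library is available, a similar effect can be approximated with `unicodedata`. This is a stand-in technique, not the answer's regex, and the names `DROP_PREFIXES` and `normalize` are invented for the sketch:

```python
import unicodedata

# General-category prefixes to replace: other/control (C), marks (M),
# punctuation (P), symbols (S) and separators (Z).
DROP_PREFIXES = ('C', 'M', 'P', 'S', 'Z')

def normalize(text, drop=DROP_PREFIXES):
    """Replace unwanted characters with spaces, then collapse runs of whitespace."""
    spaced = ''.join(
        ' ' if unicodedata.category(ch).startswith(drop) else ch
        for ch in text
    )
    return ' '.join(spaced.split())

s = "string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
print(normalize(s))  # string With Some Really Weird Non ASCII Punctuation
```

Dropping a prefix from the tuple, for example `'S'`, keeps symbols such as `$` while still removing punctuation.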

Peter Mortensen

Answer edited 15 July 2018 at 8:17

Blairg23

I haven't seen this answer yet. Just use a regex; it removes all characters besides word characters (`\w`), number characters (`\d`), and space characters (`\s`):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(ur'[^\w\d\s]+', '', s)

Peter Mortensen

Answer edited 15 July 2018 at 8:15

Tim P

Here's a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))


Dr.Tautology

Here is a function I wrote. It's not very efficient, but it is simple, and you can add or remove any punctuation that you desire:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        for word in wordList:
            wordList=[word.replace(punc,'') for word in wordList]
    return wordList


David Vuong

This might not be the best solution, but this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])


krinker

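As an update, I rewrote the @Brian example in Python 3 and changed it to move the regex compile step inside the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing and can't share a regex object between your workers, so you need the `re.compile` step at each worker. I was also curious to time two different implementations of maketrans for Python 3: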

table = str.maketrans({key: None for key in string.punctuation})

vs

table = str.maketrans('', '', string.punctuation)

Plus, I added another method that uses a set, where I take advantage of the intersection function to reduce the number of iterations.

This is the complete code:

import re, string, timeit

s = "string. With. Punctuation"

def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)

def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())

def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)

def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)

def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return(s.translate(table))

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

These are my results:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565
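A quick check suggests that the two `str.maketrans` variants timed above produce identical tables, so the gap between `translate` and `translate2` presumably comes from how the table is built on each call (a Python-level dict comprehension versus a single built-in call), not from `str.translate` itself. The variable names below are illustrative:

```python
import string

table_from_dict = str.maketrans({key: None for key in string.punctuation})
table_from_args = str.maketrans('', '', string.punctuation)

# Both are plain dicts mapping code points (ints) to None.
print(table_from_dict == table_from_args)  # True
```

Building the table once outside the function, as in the accepted answer's timing code, removes that construction cost from the measurement.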


Pablo Rodriguez Bertorello

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)

['string', 'With', 'Punctuation']


ngub05

Here's a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

output>> where and or then

Replaces the punctuation with spaces
Replaces multiple spaces between words with a single space
Removes the trailing spaces, if any, with strip()
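The snippet above is Python 2 (`string.maketrans` and the `print` statement). A possible Python 3 port of the same three steps, assuming `str.maketrans` as the replacement for `string.maketrans`:

```python
import string

input_text = "!where??and!!or$$then:)"

# Map each punctuation character to a space (the two strings must be equal length).
punctuation_replacer = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

# split() with no arguments drops leading/trailing whitespace and collapses runs,
# so an extra strip() is not needed here.
result = ' '.join(input_text.translate(punctuation_replacer).split())
print(result)  # where and or then
```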


Haythem HADHAB


import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)
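Note that an explicit `a-zA-Z0-9` class treats every non-ASCII letter as something to remove, which may or may not be what is wanted. A short comparison with the `\w`-based pattern from an earlier answer, assuming Python 3 (the sample string is invented, with accented letters written as escapes so they stay precomposed):

```python
import re

# "Café déjà vu, naïve?" with precomposed accented letters
s = "Caf\u00e9 d\u00e9j\u00e0 vu, na\u00efve?"

ascii_only = re.sub(r'[^a-zA-Z0-9\s]', '', s)
unicode_aware = re.sub(r'[^\w\s]', '', s)

print(ascii_only)     # Caf dj vu nave
print(unicode_aware)  # Café déjà vu naïve
```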


Dom Grey

A one-liner might be helpful in not very strict cases:

''.join([c for c in s if c.isalnum() or c.isspace()])


Animeartistfromhell7


#FIRST METHOD
#Storing all punctuations in a variable    
punctuation='!?,.:;"\')(_-'
newstring='' #Creating empty string
word=raw_input("Enter string: ")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is",newstring

#SECOND METHOD
word=raw_input("Enter string: ")
punctuation='!?,.:;"\')(_-'
newstring=word.translate(None,punctuation)
print "The string without punctuation is",newstring

#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is hello welcome topythonprogramminglanguage


Isayas Wakgari Kelbessa

with open('one.txt', 'r') as myFile:
    str1 = myFile.read()

print(str1)

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Replace each punctuation mark with a space
for i in punctuation:
    str1 = str1.replace(i, " ")

# Split into words once, after all replacements are done
myList = str1.split(" ")
print(str1)
for i in myList:
    print(i)
    print("____________")


Brian · Accepted Answer · 2008-11-05T18:36:11+00:00

From an efficiency perspective, you're not going to beat

s.translate(None, string.punctuation)

For later versions of Python (Python 3), use the following code:

s.translate(str.maketrans('', '', string.punctuation))

It's performing raw string operations in C with a lookup table; there's not much that will beat that short of writing your own C code.

If speed isn't a worry, another option is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

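This is faster than `s.replace` with each char, but won't perform as well as non-pure-Python approaches such as regexes or `string.translate`, as you can see from the timings below. For this type of problem, doing it at as low a level as possible pays off.

Timing code: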

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802
