ldnoobw / list-of-dirty-naughty-obscene-and-otherwise-bad-words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
License: Creative Commons Attribution 4.0 International
Hi,
I want to use these bad words in our filters, but how do I attribute them?
Can we just use the bad words without attribution?
Thank you.
падла
нафиг
жлоб
мразь
нахуй
гад
хуйлуша
ебаный
идиот
козел
негодяй
дегенерат
трахнуть
мразота
нахуя
ебанутые
дурак
кретин
хуипи
падла
онанист
дифичент
мерзавец
говно собачье
хуя
хуй пинать
гнидр
блять
ебаный
впизду
мать твою
засранец вонючий
пидрила
ублюдок
мудак
чёртов
дибилы
Just wanted to let you know that I've used your list in a separate project. Hope you're ok with it.
https://github.com/ChaseFlorell/jQuery.ProfanityFilter/tree/master/localResources
I think the first version of this repo should include the most obvious bad words, like "idiot", but this word is not part of the list. Why is that?
Thank you.
For example, the English list contains 6 scientific terms, such as "zoophilia".
Sod
Bugger
Bolloks
pornhub.com
In German, the first letter of a noun is generally capitalized. The German bad word list contains mostly nouns, and they are written in lowercase. I think this might lead to many situations where bad words are missed, because people who are not aware of this aspect of German don't check a given text for the capitalized version of the bad word.
Maybe all nouns in the German bad word lists should begin with a capital letter?
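Instead of storing both spellings in the list, a filter could sidestep the capitalization problem by comparing case-insensitively. The following is a minimal Python sketch under that assumption; `find_bad_words` is a hypothetical helper, not part of this repository, and its tokenizer (`\w+`) is deliberately naive.

```python
import re


def find_bad_words(text: str, bad_words: set) -> list:
    """Return the tokens in `text` that appear in `bad_words`, ignoring case.

    Hypothetical helper: casefold() is applied to both the list entries and
    the tokens, so a lowercase entry like "kackwurst" also matches the
    capitalized German noun "Kackwurst".
    """
    folded = {word.casefold() for word in bad_words}
    # Naive tokenizer: in Python 3, \w also matches non-ASCII letters such as umlauts.
    tokens = re.findall(r"\w+", text)
    return [token for token in tokens if token.casefold() in folded]


print(find_bad_words("Du bist eine Kackwurst!", {"kackwurst"}))  # ['Kackwurst']
```

`casefold()` also maps "ß" to "ss", so the same comparison treats those two spellings as equal.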
Well, I am Dutch, and after finding a place where a friend's name was blocked, I noticed that your list is kind of... bad.
There are lots of words that are clearly not offensive except in an already really explicit use:
asbak
- ashtray -> never ever used
aso
- short for "asocial person" and only ever used like it would be in english
balen
- I guess this is a Google Translate rendering of a version of "sucks". It really just means "to be fed up with / annoyed by something" and is actually considered the proper formal way to say it. The other translation is a "bundle" in the farming sense (a bundle of wheat).
bedonderen
- literally "to cheat", but hardly ever (never?) used in an obscene way.
belazeren
- same as above, but even more formal. Used mostly by the media, e.g. "die politicus belazerde de boel" (that politician lied/cheated about things he said/did).
besodemieteren / besodemieterd zijn
- to "bugger" or "screw" somebody over, but again NEVER used in an obscene sense.
beurt
- OK, this can be used in an obscene sense. But it is also one of the 1000 most commonly used words in a non-obscene sense: "jij bent aan de beurt" / "het is jouw beurt" (it is your turn), "omstebeurt" (one after another).
de hond uitlaten
- this is a joke, right? It's a common version of "my dad went to the store and never came back". But it still has the normal meaning of "walking the dog", and there is no other way to say that.
dombo
- lit. "dumb person", used in a joking, non-obscene manner (like one would say to a toddler after he did something stupid).
droogkloot
- someone who makes silly jokes
gras maaien
- mowing the lawn...???
hol
- "cave", honestly used in the same sense as the English version.
hufter
- asocial person, often used by politicians.
klootjesvolk
- everyone who isn't a white collar worker.
nicht
- niece
op z'n sodemieter geven
- archaic to "blame someone and punish him for it" often as done by a parent/grandparent to a child
opzouten
- to go (away) ???
ouwehoeren
- to chat ?????????
publiciteitsgeil
- yearning for public attention
teef
- bitch (but only used for dogs)
vergallen
- to spoil the atmosphere; not used in an obscene sense
verkloten
- to mess something up; not used in an obscene sense
voor jan lul
- saying that you went (or are) somewhere for no purpose
voor jan-met-de-korte-achternaam
- same as above
And then the worst:
anita
- That's a girl name, quite offensive that you consider it obscene.
Those are definitely used in an informal context without negative connotation (e.g. emmerder, which means to bother / to get bored, etc.)
There are many open PRs here. Is there any reason why the owner and collaborators are not responding to them?
@jacobemerick, apparently you can give people collaborator access. Could you please give me this access?
The reason I'd like collaborator access is so I can add new language files. I've made lists for Welsh and Afrikaans, but because I don't have collaborator access, they become pull requests.
Thanks.
(Edited due to some unicode weirdness)
Several files have duplicated words in their list.
$ uniq -d da
kussekryller
$ uniq -d zh
口交
性交
性爱
阴茎
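`uniq -d` only reports repeated lines that are adjacent, so an unsorted file needs to be sorted first (`sort da | uniq -d`). An order-independent alternative is sketched below in Python. `find_duplicates` is a hypothetical helper, and the NFC normalization step is an added assumption, meant to catch entries that look identical but are encoded with different code points (the kind of Unicode weirdness mentioned above).

```python
import unicodedata
from collections import Counter


def find_duplicates(lines):
    """Return entries that occur more than once, in sorted order.

    Hypothetical helper: each line is stripped and NFC-normalized first, so a
    precomposed "é" and an "e" followed by a combining accent count as the
    same word even though their code points differ.
    """
    normalized = (unicodedata.normalize("NFC", line.strip()) for line in lines)
    counts = Counter(word for word in normalized if word)  # skip blank lines
    return sorted(word for word, n in counts.items() if n > 1)
```

For example, `find_duplicates(open("zh", encoding="utf-8"))` would list the repeated entries of the `zh` file regardless of where they appear in it.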
dršťka
means tripe (https://en.wikipedia.org/wiki/Tripe) and is not used as an insult. It is almost exclusively used in connection with tripe soup.
žrát
is the word for when animals eat. It might be used in a pejorative context, but it is not used as an insult.
… chocolatine !
Joking aside, "sucer" (to suck) is not necessarily a bad word: «pastille à *****» (a lozenge, i.e. a sweet to suck) or «***** la moelle» (to suck the marrow) would be weird.
Hi,
I found some German words which seem rather normal to me. As I don't know which direction this list is going in, I thought I'd provide some context for them:
geil: colloquially used to describe something good/awesome. Also used to describe the state of sexual arousal
Hupen: literally is just multiple horns. Is also used as a metaphor for female breasts.
Knackwurst: Is just a type of sausage. If it would be used as an insult, that insult would be very weak. Maybe you mean Kackwurst?
Latte: a wooden slat (lath). Colloquially, it also describes an erect penis.
Milchtüten: Is just the packaging of milk. I could imagine this could refer to female breasts, but I never heard someone call them this way.
Picheln: Is colloquial for drinking alcohol.
Maybe you should consider sorting the list into strong bad words and weak bad words. There are many figures of speech which describe the act of masturbation or are used to describe genitalia and female breasts. If you banned them all, you would cripple the German language.
If you could clarify the goals of this list, one would be able to better complete this list.
I see even OpenAI's ChatGPT uses this word:
니애미
reference: https://hinative.com/questions/18188689
Bisexual, homosexual, or similar words that describe sexual orientation should not be considered dirty, naughty, or obscene in any manner.
But maybe I misunderstood the purpose of this project.
Hi there, thanks for providing this dirty word list, and in several languages no less. Would it be possible to add an official LICENSE file to this repository? I'd be happy to send a pull request containing the right license, just let me know which license you prefer.
If you do not want to add the LICENSE file, would someone be able to describe the license in this issue? Thank you.
I made a Kriol list, and I want to point out that its ISO language code is rop. Weird, but that's what it is.
boccaciccio
meretrice
sise
soffocotto
uallera
zinne
Inculare (or inculato, inculata)
The slang verb for sodomizing, also commonly used to express being scammed or otherwise damaged such as "I bought this and the day after the price halved. Che inculata!"
That's ... a book about open relationships and polyamory ... I ... wtf is that doing on there?
(note that I'm absolutely not intending to get into an argument about the reason, I just can't work out what the heck the reason is)
Take a look. Include in Documentation, or integrate if desired.
This is actually a common word describing beef stomach as an ingredient in meals; see "dršťková polévka". A similar form is sometimes used as a swear word, but in that case the form "držka" is used most of the time, because people who swear do not waste time writing it properly. Including the full form in this list will falsely filter out a lot of texts about cooking, eating, or traditional Czech meals.
My pull request on Welsh swear words (ISO cy) has now got over 81 words! This is more than the lists for Arabic, Czech, Danish, Esperanto, Filipino, Canadian French, Kabyle, Klingon, Korean, Norwegian, Persian, Polish, Portuguese, Spanish and Thai.
It's hard to use this in web applications under the Creative Commons Attribution (CC BY 4.0) license. How do I attribute it when someone downloads a JS bundle that includes these bad words?
Just a heads up. Suggesting a language in the comments is helpful, but you don’t have to. I first made an Australian Kriol list.
There are some lists of bad words at these links; it would be great if they were merged into this repo:
http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/
https://github.com/web-mech/badwords-list/blob/master/lib/array.js
http://fffff.at/googles-official-list-of-bad-words/
http://www.hyperhero.com/en/insults.htm
http://www.frontgatemedia.com/a-list-of-723-bad-words-to-blacklist-and-how-to-use-facebooks-moderation-tool/
http://www.youswear.com/index.asp?language=English
You should really think about splitting "pt" into pt_PT and pt_BR.
As a Portuguese person who speaks and reads pt_PT, I don't know half of the words on this "pt" list. The other half are almost all not offensive at all!
"aborto" means "miscarriage"
"amador" means "amateur"
"aranha" means "spider"
"burro" means "donkey"
"cerveja" means "beer"
"comer" means "to eat"
"frango assado" means "roasted chicken"
"heterosexual" well.. you know what that means
"inferno" I believe you know what that means too
"torneira" means "faucet"
... and so on.
If I were to put this list in a Portuguese discussion board's blacklist, people wouldn't be able to write a complete sentence.
For a Portuguese, this list is useless.
I think the list format is too simple and is missing some features:
Alternative spellings: in German, ä, ö and ü can be expressed as ae, oe and ue, the letter ß can be written ss, or letters can be left out. Maybe this should be left to the filter implementation, but that would be difficult if you don't know the language.
Categories: crime, violence, pornography, illegal drugs, insults, etc. And it would be even nicer if words could be in multiple categories (because insults can be used in "dirty talk", or violence in crime…).
I'd suggest a rather simple format like CSV (comma-separated values), with individual files for the groups and the word lists, e.g.:
The groups file with unique group IDs:
#0 should be reserved for uncategorized
crime;1
violence;2
insult;3
# …
And the word list, with regular expressions (you can check them with RegExr or similar tools), each optionally followed by the group IDs (left blank or set to 0 for uncategorized; multiple groups separated by ,) and a score from 1 to 10 (0 or empty for unrated):
cock(?!pit);;7 # This is a nice one: matches 'cock' but not 'cockpit' (uncategorized)
idiots?;3;7 # matches 'idiot' and 'idiots'
motherfucker;3;10
rap(e|ist|ing);1,2;6 # matches 'rape', 'rapist' and 'raping' but NOT 'rap'
# …
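A minimal Python sketch of how a filter might load this proposed format and apply it. Everything here is an assumption layered on the proposal: the names `Rule`, `parse_rules` and `match_rules` are hypothetical, a trailing comment is assumed to start with " #", patterns are assumed not to contain ";", and each pattern is wrapped in `\b` word boundaries and matched case-insensitively so that, for instance, `rap(e|ist|ing)` does not fire inside "therapist".

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Rule:
    regex: re.Pattern
    groups: list[int]  # 0 means uncategorized
    score: int         # 1-10, 0 means unrated


def parse_rules(text: str) -> list[Rule]:
    rules = []
    for raw in text.splitlines():
        # Assumption: " #" starts a trailing comment; "#" at column 0 is a comment line.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        # Fields: pattern;groups;score. Missing trailing fields default to "".
        pattern, groups, score = (line.split(";") + ["", ""])[:3]
        group_ids = [int(g) for g in groups.split(",") if g.strip()] or [0]
        rules.append(Rule(
            # Assumption: whole-word, case-insensitive matching.
            regex=re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE),
            groups=group_ids,
            score=int(score) if score.strip() else 0,
        ))
    return rules


def match_rules(text: str, rules: list[Rule]) -> list[tuple[str, list[int], int]]:
    """Return (matched text, group IDs, score) for every match in `text`."""
    return [
        (m.group(0), rule.groups, rule.score)
        for rule in rules
        for m in rule.regex.finditer(text)
    ]


SAMPLE = """\
cock(?!pit);;7 # matches 'cock' but not 'cockpit' (uncategorized)
idiots?;3;7 # matches 'idiot' and 'idiots'
rap(e|ist|ing);1,2;6
"""

rules = parse_rules(SAMPLE)
print(match_rules("The therapist in the cockpit called them idiots", rules))
# [('idiots', [3], 7)]
```

The word-boundary wrapping is a design choice of this sketch, not part of the proposal; without it, a plain `re.search` for `rap(e|ist|ing)` would flag "therapist".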
A small issue with this format is that all matches of a pattern are weighted the same. Maybe sub-pattern matching could be used to rate each one, but I don't know if this is needed (e.g. the pattern ((ass)(hole)?) results in three groups: ass, asshole and hole, and multiple comma-separated ratings would apply to each group in order: ((ass)(hole)?);3;4,7).
Some of the ideas (weighting and groups) were taken from this list: http://contentfilter.futuragts.com/phraselists/
What do you think?
P.S.: Somehow I feel guilty for contributing to a filter/censorship list, but I think it can be useful, to some extent, for keeping trolls and unconstructive discussions away. I hope these lists will be used responsibly…
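The alternative-spelling point from this proposal (ä, ö and ü written as ae, oe and ue; ß written as ss) could be handled in the filter rather than in the list, by folding both the list entries and the input text to one canonical spelling before comparing them. A minimal Python sketch under that assumption; `GERMAN_FOLDS`, `canonical` and `same_word` are hypothetical names, and dropped letters are not handled.

```python
# Hypothetical mapping: German umlauts and ß folded to their two-letter spellings.
GERMAN_FOLDS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def canonical(word: str) -> str:
    """Lowercase `word`, then replace umlauts and ß with their two-letter spellings."""
    return word.lower().translate(GERMAN_FOLDS)


def same_word(a: str, b: str) -> bool:
    """True if `a` and `b` are spelling variants of each other under GERMAN_FOLDS."""
    return canonical(a) == canonical(b)


print(canonical("Milchtüten"))                  # milchtueten
print(same_word("Milchtueten", "Milchtüten"))   # True
```

Lowercasing before translating means uppercase umlauts (e.g. "Ä") are folded too, because `lower()` turns them into "ä" first.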
A Grunt task and JavaScript regex using your bad words.
I think you should include "1 guy 1 jar", since you have included "2 girls 1 cup". Just in case, you know... from the girls' perspective.
The three words below don't seem offensive to me.
アジアのかわいい女の子 ("cute Asian girl")
いたずら ("mischief", "prank")
卍 (manji)
It's a holy symbol in Hinduism and Buddhism.
Hi,
I've read the French list, and some words are strangely part of it:
Allumé means alight
Bosser is a common colloquial word for "to work"
Veuve means widow
teuf is a slang word for party but is not pejorative
folle means crazy
bourré can mean stuffed