mention's Introduction

mention

mention parses Twitter-like mentions and hashtags, such as @gernest and #Tanzania, from text input.

Installation

go get github.com/gernest/mention

Usage

mention is flexible: it is not limited to @ and # tags. You can choose whatever prefix character you like and mention will take it from there.

Twitter-like mentions

Suppose you have the following message:

hello @gernest I would like to follow you on twitter

And you want to know who was mentioned in the text.

package main

import (
	"fmt"

	"github.com/gernest/mention"
)

func main() {
	message := "hello @gernest I would like to follow you on twitter"

	tags := mention.GetTags('@', message)
	tagStrings := mention.GetTagsAsUniqueStrings('@', message)

	fmt.Println(tags)
	fmt.Println(tagStrings)
}

If you run the above example, it will print [gernest] to stdout.

Twitter-like hashtags

Suppose you have the following message:

how does it feel to be rejected? #loner

And you want to know the hashtags.

package main

import (
	"fmt"

	"github.com/gernest/mention"
)

func main() {
	message := "how does it feel to be rejected? #loner"

	tags := mention.GetTags('#', message)

	fmt.Println(tags)
}

If you run the above example, it will print [loner] to stdout.

The API

The main entry point is GetTags(char rune, src string) []string. The first example above also calls GetTagsAsUniqueStrings with the same arguments.

The first argument, char, is the prefix for your tag. This can be @ or # or whatever Unicode character you prefer. Don't be put off by its type, rune: it is just an ordinary character written in single quotes. See the examples for more information.

The second argument, src, is the input text to scan for tags.
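
As a small illustration of the rune type mentioned above, the following sketch uses only the standard library. The helper name describe is made up for this example and is not part of mention.

```go
package main

import "fmt"

// describe is a hypothetical helper for this example only.
// It formats a rune as its character followed by its Unicode code point.
func describe(p rune) string {
	return fmt.Sprintf("%c %U", p, p)
}

func main() {
	// Rune literals use single quotes; string literals use double quotes.
	for _, p := range []rune{'@', '#', '🔥'} {
		fmt.Println(describe(p))
	}
}
```

Any single Unicode character, including a multibyte one, can be written as a rune literal this way.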

Contributing

Start by clicking the star button to make the author and his neighbors happy. Then fork the repository and submit a pull request for whatever change you want added to this project.

If you have any questions, just open an issue.

Author

Geofrey Ernest, Twitter: @gernesti

Chad Barraford, GitHub: @cbarraford

Licence

This project is released under the MIT licence. See LICENCE for more details.

mention's People

Contributors

arp242, cbarraford, gernest, jcbwlkr

mention's Issues

Text with a @ such as email addresses is matched

This code:

func main() {
	fmt.Printf("%#v\n",
		mention.GetTags('@', strings.NewReader("[email protected]")))
	fmt.Printf("%#v\n",
		mention.GetTags('@', strings.NewReader("[email protected]"), '.'))
}

Produces the following output:

[]string{"example.com"}
[]string{"example"}

Which is unexpected – at least it is in our use case. If there is a user with the handle @example and someone writes send an email to [email protected]! then this user will be matched if . is in the terminator list. I think most people have . in the terminator list, since otherwise a @handle at the end of a sentence produces the wrong result.

I wanted to write a patch to fix this, but I'm not sure what the best way to handle this would be, as ,@handle or /@handle should be a match. Maybe ignoring [any-unicode-letter]@handle would be a good solution?
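
The rule proposed above (ignore a prefix that directly follows a Unicode letter) could be sketched roughly as follows. This is only an illustration under that assumption: startsTag is a hypothetical helper, not part of mention, and the sample strings are invented.

```go
package main

import (
	"fmt"
	"unicode"
)

// startsTag is a hypothetical helper, not part of the mention package.
// It reports whether the rune at position i is the prefix and is NOT
// directly preceded by a Unicode letter (as it would be in an email address).
func startsTag(runes []rune, i int, prefix rune) bool {
	if i < 0 || i >= len(runes) || runes[i] != prefix {
		return false
	}
	return i == 0 || !unicode.IsLetter(runes[i-1])
}

func main() {
	for _, s := range []string{"@handle", ",@handle", "/@handle", "someone@handle"} {
		runes := []rune(s)
		for i := range runes {
			if runes[i] == '@' {
				fmt.Printf("%q: prefix at %d starts a tag: %v\n", s, i, startsTag(runes, i, '@'))
			}
		}
	}
}
```

With this check, ,@handle and /@handle still count as tags, while a prefix glued to a preceding letter does not. An address whose local part ends in a digit would still match, so a real fix might need a wider rule.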

prime symbol offsets the index count by 2

Maybe related to #16.
Sometimes a single character counts as three bytes, e.g. the prime symbol (′), which looks like a single quote. This causes the Index value of a tag to be offset by 2. That works fine within Go, but when you send this data to another stack (say a web UI), it may not count the symbol the same way.
So the problem is that in Go the index value of a tag may be 15 (for example), but from JavaScript's perspective the correct answer is 13. The two languages count indexes differently: Go string indexes are byte offsets, while JavaScript string indexes count UTF-16 code units.

https://play.golang.org/p/_YhJTv3s6vb
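
One possible way to bridge the two counting schemes is to convert a Go byte offset into a UTF-16 code unit count before handing it to JavaScript. The sketch below assumes the offset falls on a rune boundary; utf16Index and the sample string are made up for illustration and are not part of mention.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf16Index is a hypothetical helper, not part of the mention package.
// It converts a byte offset into s to the index JavaScript would use,
// counted in UTF-16 code units. byteIdx must fall on a rune boundary.
func utf16Index(s string, byteIdx int) int {
	return len(utf16.Encode([]rune(s[:byteIdx])))
}

func main() {
	s := "it′s @bob" // ′ is U+2032 PRIME, three bytes in UTF-8
	b := strings.IndexRune(s, '@')
	fmt.Println(b, utf16Index(s, b)) // prints: 7 5
}
```

The difference of 2 comes from the prime symbol taking three bytes in UTF-8 but one code unit in UTF-16.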

tag with space should be ignored

"how does it feel to be rejected? #loner tt ggg sjdsj dj #linker"
[loner linker]

"how does it feel to be rejected? # it is #loner tt ggg sjdsj dj #linker"
Expect: [loner linker]
Got: [ it is #loner tt ggg sjdsj dj #linker]

Mention v2

@gernest I've refactored the code base again to give us more information about the tags than just what they are. We now return (in the v2 branch) a struct rather than strings.

Here's what I'm thinking: we create a new branch called v1 from master and make it the default branch for the repo, but also keep v2 (which has breaking changes). People can import v2 via go get github.com/gernest/mention.v2, get the original version via go get github.com/gernest/mention.v1, or not include a version at all. Thoughts on versioning this repo?

Thoughts on the changes in v2?

Create official twitter mention/hashtag

Twitter has an official list of characters that are allowed and disallowed in hashtags and mentions. For example, hashtags can't start with a number and cannot contain special characters, although some are allowed, such as _ (underscore).

Create a function that gets mentions and hashtags according to these standards set by Twitter. It may make sense to do this in a subdirectory, giving us an organizational location to support other social media sites in the future, such as Facebook and Instagram.
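
A minimal sketch of such a check, covering only the two rules named in this issue (no leading digit; letters, digits and underscore only), might look like this. It is not Twitter's full rule set, and validHashtag is a hypothetical name, not part of mention.

```go
package main

import (
	"fmt"
	"unicode"
)

// validHashtag is a hypothetical check, not part of the mention package.
// It covers only the two rules named in the issue: the tag must not start
// with a digit, and it may contain only letters, digits, or underscore.
func validHashtag(tag string) bool {
	if tag == "" {
		return false
	}
	for i, r := range tag {
		if i == 0 && unicode.IsDigit(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
			return false
		}
	}
	return true
}

func main() {
	for _, t := range []string{"Tanzania", "go_lang", "2fast", "c++"} {
		fmt.Println(t, validHashtag(t))
	}
}
```

A per-site subdirectory could hold one such validator for each network, with the real rules taken from that network's documentation.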

Multibyte start characters don't work

Since termIndexes is a []int and some "raw" byte indexes are accessed (e.g. str[t+1] == byte(prefix)), multibyte prefixes won't work. This:

mention.GetTags('🔥', "I'm on 🔥fire!")

Should match fire, but doesn't match anything.

Failing test case:

diff --git i/mention_test.go w/mention_test.go
index 423ed9b..904af66 100644
--- i/mention_test.go
+++ w/mention_test.go
@@ -139,6 +139,9 @@ func (s *MentionSuite) TestGetTags(c *C) {

        // use default terminators
        c.Assert(GetTags('@', "hello @test"), DeepEquals, []Tag{{'@', "test", 6}})
+
+       // Test multibyte start characters.
+       c.Assert(GetTags('🔥', "I'm on 🔥fire!"), DeepEquals, []Tag{{'🔥', "fire", 6}})
 }

 func BenchmarkGetTags(b *testing.B) {
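
To illustrate why a comparison of the form str[i] == byte(prefix) cannot find a multibyte prefix, here is a standalone sketch. It does not reproduce the library's code; byteCompare and runeCompare are hypothetical names. Converting a rune variable to byte keeps only its low 8 bits, so '🔥' (U+1F525) becomes 0x25, which is '%'.

```go
package main

import "fmt"

// byteCompare is a hypothetical helper that mimics str[i] == byte(prefix).
// byte(prefix) truncates the rune to its low 8 bits.
func byteCompare(prefix rune, s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] == byte(prefix) {
			return true
		}
	}
	return false
}

// runeCompare is a hypothetical helper that decodes s rune by rune
// (range over a string yields runes) and compares whole code points.
func runeCompare(prefix rune, s string) bool {
	for _, r := range s {
		if r == prefix {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(byteCompare('🔥', "I'm on 🔥fire!")) // false: prefix is missed
	fmt.Println(byteCompare('🔥', "100%"))          // true: false positive on '%'
	fmt.Println(runeCompare('🔥', "I'm on 🔥fire!")) // true
}
```

Comparing decoded runes rather than raw bytes avoids both the miss and the false positive.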

Using apostrophe

The text after an apostrophe shouldn't be part of a hashtag.
In the case of @bob's cookies, only @bob is the tag.

terminator doesn't work on first character

If I call GetTags on "### #. # the #foo #bar #baz #baz." and supply all special characters as terminator runes (i.e. var specials = []rune(".=+-~,/ <>?[]{})(*&^%$#@!;:\"'|\\")), the result I get back is

achievements_test.go:27:
    c.Check(tags, DeepEquals, []string{"bar", "baz", "foo"})
... obtained []string = []string{"#", ".", "bar", "baz", "foo"}
... expected []string = []string{"bar", "baz", "foo"}

I only want my tags to be alphanumeric, and I can't seem to get that to work out of the box.
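
As a workaround sketch, alphanumeric-only tags can be collected without a terminator list by accepting only letters and digits after the prefix. The function below is a hypothetical standalone helper, not part of mention; it returns unique tags in sorted order.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// alnumTags is a hypothetical helper, not part of the mention package.
// It returns the unique, sorted tags made of letters and digits that
// directly follow prefix; a prefix with no such body yields no tag.
func alnumTags(prefix rune, s string) []string {
	seen := map[string]bool{}
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		if runes[i] != prefix {
			continue
		}
		// Extend j over the run of letters and digits after the prefix.
		j := i + 1
		for j < len(runes) && (unicode.IsLetter(runes[j]) || unicode.IsDigit(runes[j])) {
			j++
		}
		if j > i+1 {
			seen[string(runes[i+1:j])] = true
		}
		i = j - 1 // resume scanning at the rune that ended the tag
	}
	tags := make([]string, 0, len(seen))
	for t := range seen {
		tags = append(tags, t)
	}
	sort.Strings(tags)
	return tags
}

func main() {
	fmt.Println(alnumTags('#', "### #. # the #foo #bar #baz #baz.")) // [bar baz foo]
}
```

Because any non-alphanumeric rune ends the tag, the same loop would also stop at the apostrophe in @bob's and skip a bare prefix followed by a space.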
