social-media-profiles-regexs's Introduction

Regular Expressions to Match Social Media Profiles

This repository lists regular expressions to match and extract information from URLs of social media profiles. So if you find a hyperlink to this repo somewhere on the web, e.g. https://github.com/lorey/social-media-profiles-regexs/, the regular expressions in this repo allow you to find out that it's a GitHub link pointing to a repo, as well as to extract the username lorey and the repo name social-media-profiles-regexs from this URL.

Features:

  • detect the platform a url points to (all major platforms supported)
  • extract the information contained within the url (without opening the url, of course)
  • extract emails and phone numbers from hyperlinks
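As a minimal sketch of the idea, the following applies the github repo expression from the list below to the URL mentioned above, using Python's built-in `re` module. The variable names are only illustrative; this is not how the socials library is implemented.

```python
import re

# The "github / repo" expression from this list, copied verbatim.
GITHUB_REPO = re.compile(
    r"(?:https?:)?\/\/(?:www\.)?github\.com\/(?P<login>[A-z0-9_-]+)\/(?P<repo>[A-z0-9_-]+)\/?"
)

url = "https://github.com/lorey/social-media-profiles-regexs/"
match = GITHUB_REPO.match(url)

# groupdict() returns the named subgroups as a dictionary.
info = match.groupdict() if match else None
print(info)  # {'login': 'lorey', 'repo': 'social-media-profiles-regexs'}
```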

Please note: If you want to extract social media links, depending on your case, there are possibly easier ways:

  • I've created a Python library called socials that uses these expressions to automate url detection and data extraction. You input the urls, it extracts the type of platform as well as the contained information, e.g. the linked social media profiles.
  • There's also a Socials API which makes the socials python package available via REST and JSON. You can use it for free at socials.karllorey.com or deploy it yourself. You simply input any URL you want to extract profiles from. It will then fetch and return all social media links from the given website. Try it here.

If you're missing a particular platform, please feel free to add it. Also feel free to add a test case that currently fails. An explanation of how this repo works can be found in CONTRIBUTING.md. You can also open an issue, of course. I'm happy to help!

angellist

company

(?:https?:)?\/\/angel\.co\/company\/(?P<company>[A-z0-9_-]+)(?:\/(?P<company_subpage>[A-z0-9-]+))?

Examples:

job

(?:https?:)?\/\/angel\.co\/company\/(?P<company>[A-z0-9_-]+)\/jobs\/(?P<job_permalink>(?P<job_id>[0-9]+)-(?P<job_slug>[A-z0-9-]+))

Examples:

user

(?:https?:)?\/\/angel\.co\/(?P<type>u|p)\/(?P<user>[A-z0-9_-]+)

There are root-level direct links to users, e.g. angel.co/karllorey, that now get redirected to these new user links. Sometimes the path is /p/, sometimes it's /u/; I haven't figured out why that is.

Examples:

crunchbase

company

(?:https?:)?\/\/crunchbase\.com\/organization\/(?P<organization>[A-z0-9_-]+)

Examples:

person

(?:https?:)?\/\/crunchbase\.com\/person\/(?P<person>[A-z0-9_-]+)

Examples:

email

mailto

(?:mailto:)?(?P<email>[A-z0-9_.+-]+@[A-z0-9_.-]+\.[A-z]+)

This matches plain emails and mailto hyperlinks. This regex is intended for scraping, not for validation. See why: "Your email validation logic is wrong".
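For scraping, the expression can be run over a whole chunk of markup. The snippet below is a small sketch; the HTML string and the addresses in it are made up for illustration.

```python
import re

# The "email / mailto" expression from above, copied verbatim.
EMAIL = re.compile(r"(?:mailto:)?(?P<email>[A-z0-9_.+-]+@[A-z0-9_.-]+\.[A-z]+)")

# Hypothetical input: one mailto hyperlink and one plain address.
html = '<a href="mailto:jane.doe@example.com">Mail</a> or write to info@example.org'

# finditer() scans the whole text; the named group drops the mailto: prefix.
emails = [m.group("email") for m in EMAIL.finditer(html)]
print(emails)  # ['jane.doe@example.com', 'info@example.org']
```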

Examples:

facebook

profile

(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?P<profile>(?![A-z]+\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[A-z0-9_\-\.]+)\/?

A profile can be a page, a user profile, or something else. Since Facebook redirects these URLs to all kinds of objects (user, pages, events, and so on), you have to verify that it's actually a user. See https://developers.facebook.com/docs/graph-api/reference/profile

Examples:

profile by id

(?:https?:)?\/\/(?:www\.)facebook.com\/(?:profile.php\?id=)?(?P<id>[0-9]+)

Examples:

github

repo

(?:https?:)?\/\/(?:www\.)?github\.com\/(?P<login>[A-z0-9_-]+)\/(?P<repo>[A-z0-9_-]+)\/?

Exclude subdomains other than www. as these sometimes redirect to GitHub Pages.

Examples:

user

(?:https?:)?\/\/(?:www\.)?github\.com\/(?P<login>[A-z0-9_-]+)\/?

Exclude subdomains other than www. as these redirect to github pages sometimes.

Examples:

google plus

user id

(?:https?:)?\/\/plus\.google\.com\/(?P<id>[0-9]{21})

Matches profile numbers with exactly 21 digits.

Examples:

username

(?:https?:)?\/\/plus\.google\.com\/\+(?P<username>[A-z0-9+]+)

Matches username.

Examples:

hackernews

item

(?:https?:)?\/\/news\.ycombinator\.com\/item\?id=(?P<item>[0-9]+)

An item can be a post or a direct link to a comment.

Examples:

user

(?:https?:)?\/\/news\.ycombinator\.com\/user\?id=(?P<user>[A-z0-9_-]+)

Examples:

instagram

profile

(?:https?:)?\/\/(?:www\.)?(?:instagram\.com|instagr\.am)\/(?P<username>[A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)

The rules:

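  • Matches usernames containing a single period (disco.dude), but stops before two consecutive periods (disco..dude)
  • A trailing period is not matched (discodude.)
  • Underscores are matched (_disco__dude)
  • At most 30 characters are matched (1234567890123456789012345678901234567890 is cut off after the first 30)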
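The rules above can be checked directly against the expression. This is a small sketch using Python's `re` module; the helper name `username` is only illustrative.

```python
import re

# The "instagram / profile" expression from above, copied verbatim.
INSTAGRAM = re.compile(
    r"(?:https?:)?\/\/(?:www\.)?(?:instagram\.com|instagr\.am)\/"
    r"(?P<username>[A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)"
)

def username(url):
    """Return the extracted username, or None if the URL does not match."""
    m = INSTAGRAM.match(url)
    return m.group("username") if m else None

base = "https://instagram.com/"
print(username(base + "disco.dude"))    # disco.dude
print(username(base + "disco..dude"))   # disco (stops before the double period)
print(username(base + "discodude."))    # discodude (trailing period dropped)
print(username(base + "_disco__dude"))  # _disco__dude
print(len(username(base + "1234567890" * 4)))  # 30
```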

Examples:

linkedin

company

(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/(?P<company_type>(company)|(school))\/(?P<company_permalink>[A-z0-9-À-ÿ\.]+)\/?

This matches companies and schools. Permalink is an integer id or a slug. The id permalinks redirect to the slug permalinks as soon as one is set. Permalinks can contain special characters. Recently, company links that are actually schools get redirected to newly introduced /school/ permalinks, see the university example below.

Examples:

post

(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/feed\/update\/urn:li:activity:(?P<activity_id>[0-9]+)\/?

Direct link to a LinkedIn post; it only contains a post id.

Examples:

profile

(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/in\/(?P<permalink>[\w\-\_À-ÿ%]+)\/?

These are the currently used, most-common urls ending in /in/

Examples:

profile_pub

(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/pub\/(?P<permalink_pub>[A-z0-9_-]+)(?:\/[A-z0-9]+){3}\/?

These are old public urls not used anymore, more info at quora

Examples:

medium

post

(?:https?:)?\/\/medium\.com\/(?:(?:@(?P<username>[A-z0-9]+))|(?P<publication>[a-z-]+))\/(?P<slug>[a-z0-9\-]+)-(?P<post_id>[A-z0-9]+)(?:\?.*)?

Examples:

post of subdomain publication

(?:https?:)?\/\/(?P<publication>(?!www)[a-z-]+)\.medium\.com\/(?P<slug>[a-z0-9\-]+)-(?P<post_id>[A-z0-9]+)(?:\?.*)?

These can't be matched with the regular post regex, as redefining a subgroup name is not allowed in Python's re module.
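The limitation is easy to reproduce. The patterns below are toy stand-ins for the two Medium expressions, and the group names `pub_path` and `pub_subdomain` are made up for this sketch; they only show the error and one possible workaround.

```python
import re

# Two alternatives that reuse the same group name, as a combined Medium
# post regex would have to do with "publication".
try:
    re.compile(r"(?P<publication>a)\/x|(?P<publication>b)\.y")
    error = None
except re.error as exc:
    error = str(exc)

print(error)  # message reports a redefinition of the group name

# Distinct names compile fine; the results can be merged afterwards.
merged = re.compile(r"(?P<pub_path>a)\/x|(?P<pub_subdomain>b)\.y")
groups = merged.match("b.y").groupdict()
publication = groups["pub_path"] or groups["pub_subdomain"]
```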

Examples:

user

(?:https?:)?\/\/medium\.com\/@(?P<username>[A-z0-9]+)(?:\?.*)?

Examples:

user by id

(?:https?:)?\/\/medium\.com\/u\/(?P<user_id>[A-z0-9]+)(?:\?.*)

Now redirects to the new user profiles. Follow the redirect with a HEAD or GET request.

Examples:

phone

phone number

(?:tel|phone|mobile):(?P<number>\+?[0-9. -]+)

Should be cleaned afterwards to strip dots, spaces, etc.

Examples:

  • tel:+49 900 123456
  • tel:+49900123456
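A possible clean-up step, sketched with Python's `re` module. The helper name `clean_number` is only illustrative, and the snippet assumes that dots, spaces, and dashes are the only separators to strip.

```python
import re

# The "phone / phone number" expression from above, copied verbatim.
PHONE = re.compile(r"(?:tel|phone|mobile):(?P<number>\+?[0-9. -]+)")

def clean_number(href):
    """Extract the number from a tel: style link and strip separators."""
    m = PHONE.match(href)
    if m is None:
        return None
    # Remove dots, spaces, and dashes; a leading + is kept.
    return re.sub(r"[. -]", "", m.group("number"))

print(clean_number("tel:+49 900 123456"))  # +49900123456
print(clean_number("tel:+49900123456"))    # +49900123456
```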

reddit

user

(?:https?:)?\/\/(?:[a-z]+\.)?reddit\.com\/(?:u(?:ser)?)\/(?P<username>[A-z0-9\-\_]*)\/?

Examples:

skype

profile

(?:(?:callto|skype):)(?P<username>[a-z][a-z0-9\.,\-_]{5,31})(?:\?(?:add|call|chat|sendfile|userinfo))?

Matches Skype's URLs to add contact, call, chat. More info at Skype SDK's docs.

Examples:

  • skype:echo123
  • skype:echo123?call

snapchat

profile

(?:https?:)?\/\/(?:www\.)?snapchat\.com\/add\/(?P<username>[A-z0-9\.\_\-]+)\/?

Examples:

stackexchange

user

(?:https?:)?\/\/(?:www\.)?stackexchange\.com\/users\/(?P<id>[0-9]+)\/(?P<username>[A-z0-9-_.]+)\/?

This is the meta-platform above stackoverflow, etc. Username can be changed at any time, user_id is persistent.

Examples:

stackexchange network

user

(?:https?:)?\/\/(?:(?P<community>[a-z]+(?!www))\.)?stackexchange\.com\/users\/(?P<id>[0-9]+)\/(?P<username>[A-z0-9-_.]+)\/?

While there are some "named" communities in the stackexchange network like stackoverflow, many only exist as subdomains, e.g. gaming.stackexchange.com. Again, username can be changed at any time, user_id is persistent.

Examples:

stackoverflow

question

(?:https?:)?\/\/(?:www\.)?stackoverflow\.com\/questions\/(?P<id>[0-9]+)\/(?P<title>[A-z0-9-_.]+)\/?

Examples:

user

(?:https?:)?\/\/(?:www\.)?stackoverflow\.com\/users\/(?P<id>[0-9]+)\/(?P<username>[A-z0-9-_.]+)\/?

Username can be changed at any time, user_id is persistent.

Examples:

telegram

profile

(?:https?:)?\/\/(?:t(?:elegram)?\.me|telegram\.org)\/(?P<username>[a-z0-9\_]{5,32})\/?

Matches for t.me, telegram.me and telegram.org.

Examples:

twitter

status

(?:https?:)?\/\/(?:[A-z]+\.)?twitter\.com\/@?(?P<username>[A-z0-9_]+)\/status\/(?P<tweet_id>[0-9]+)\/?

Examples:

user

(?:https?:)?\/\/(?:[A-z]+\.)?twitter\.com\/@?(?!home|share|privacy|tos)(?P<username>[A-z0-9_]+)\/?

Usernames may contain alphanumeric characters and underscores.

Examples:

vimeo

user

(?:https?:)?\/\/vimeo\.com\/user(?P<id>[0-9]+)

Examples:

video

(?:https?:)?\/\/(?:(?:www)?vimeo\.com|player.vimeo.com\/video)\/(?P<id>[0-9]+)

Examples:

xing

profile

(?:https?:)?\/\/(?:www\.)?xing.com\/profile\/(?P<slug>[A-z0-9-\_]+)

Default slugs are Firstname_Lastname. If several people with the same name exist, a number is appended.

Examples:

youtube

channel

(?:https?:)?\/\/(?:[A-z]+\.)?youtube.com\/channel\/(?P<id>[A-z0-9-\_]+)\/?

Examples:

user

(?:https?:)?\/\/(?:[A-z]+\.)?youtube.com\/user\/(?P<username>[A-z0-9]+)\/?

Examples:

video

(?:https?:)?\/\/(?:(?:www\.)?youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)(?P<id>[A-z0-9\-\_]+)

Matches youtube video links like https://www.youtube.com/watch?v=dQw4w9WgXcQ and shortlinks like https://youtu.be/dQw4w9WgXcQ
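Both URL forms yield the same video id. A short sketch with Python's `re` module:

```python
import re

# The "youtube / video" expression from above, copied verbatim.
YOUTUBE_VIDEO = re.compile(
    r"(?:https?:)?\/\/(?:(?:www\.)?youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)"
    r"(?P<id>[A-z0-9\-\_]+)"
)

urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
]
ids = [YOUTUBE_VIDEO.match(u).group("id") for u in urls]
print(ids)  # ['dQw4w9WgXcQ', 'dQw4w9WgXcQ']
```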

Examples:

Monster Regex

If you want to match and extract the data from all urls with one regex, use this monster. It will return the data for all the platforms above. The regex subgroups are prefixed with the platform, e.g. angellist__company instead of just company in the angellist company regex, as some regex implementations don't support defining subgroups more than once which would introduce errors if the same subgroup name is used in two or more platforms.

(?P<angellist__company>(?:https?:)?\/\/angel\.co\/company\/(?P<angellist__company__company>[A-z0-9_-]+)(?:\/(?P<angellist__company__company_subpage>[A-z0-9-]+))?)|(?P<angellist__job>(?:https?:)?\/\/angel\.co\/company\/(?P<angellist__job__company>[A-z0-9_-]+)\/jobs\/(?P<angellist__job__job_permalink>(?P<angellist__job__job_id>[0-9]+)-(?P<angellist__job__job_slug>[A-z0-9-]+)))|(?P<angellist__user>(?:https?:)?\/\/angel\.co\/(?P<angellist__user__type>u|p)\/(?P<angellist__user__user>[A-z0-9_-]+))|(?P<crunchbase__company>(?:https?:)?\/\/crunchbase\.com\/organization\/(?P<crunchbase__company__organization>[A-z0-9_-]+))|(?P<crunchbase__person>(?:https?:)?\/\/crunchbase\.com\/person\/(?P<crunchbase__person__person>[A-z0-9_-]+))|(?P<email__mailto>(?:mailto:)?(?P<email__mailto__email>[A-z0-9_.+-]+@[A-z0-9_.-]+\.[A-z]+))|(?P<facebook__profile>(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?P<facebook__profile__profile>(?![A-z]+\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[A-z0-9_\-\.]+)\/?)|(?P<facebook__profile_by_id>(?:https?:)?\/\/(?:www\.)facebook.com/(?:profile.php\?id=)?(?P<facebook__profile_by_id__id>[0-9]+))|(?P<github__repo>(?:https?:)?\/\/(?:www\.)?github\.com\/(?P<github__repo__login>[A-z0-9_-]+)\/(?P<github__repo__repo>[A-z0-9_-]+)\/?)|(?P<github__user>(?:https?:)?\/\/(?:www\.)?github\.com\/(?P<github__user__login>[A-z0-9_-]+)\/?)|(?P<google_plus__user_id>(?:https?:)?\/\/plus\.google\.com\/(?P<google_plus__user_id__id>[0-9]{21}))|(?P<google_plus__username>(?:https?:)?\/\/plus\.google\.com\/\+(?P<google_plus__username__username>[A-z0-9+]+))|(?P<hackernews__item>(?:https?:)?\/\/news\.ycombinator\.com\/item\?id=(?P<hackernews__item__item>[0-9]+))|(?P<hackernews__user>(?:https?:)?\/\/news\.ycombinator\.com\/user\?id=(?P<hackernews__user__user>[A-z0-9_-]+))|(?P<instagram__profile>(?:https?:)?\/\/(?:www\.)?(?:instagram\.com|instagr\.am)\/(?P<instagram__profile__username>[A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?))|(?P<linkedin__company>(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/(?P<linkedin__company__company_type>(company)|(school))\/(?P<linkedin__company__company_permalink>[A-z0-9-À-ÿ\.]+)\/?)|(?P<linkedin__post>(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/feed\/update\/urn:li:activity:(?P<linkedin__post__activity_id>[0-9]+)\/?)|(?P<linkedin__profile>(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/in\/(?P<linkedin__profile__permalink>[\w\-\_À-ÿ%]+)\/?)|(?P<linkedin__profile_pub>(?:https?:)?\/\/(?:[\w]+\.)?linkedin\.com\/pub\/(?P<linkedin__profile_pub__permalink_pub>[A-z0-9_-]+)(?:\/[A-z0-9]+){3}\/?)|(?P<medium__post>(?:https?:)?\/\/medium\.com\/(?:(?:@(?P<medium__post__username>[A-z0-9]+))|(?P<medium__post__publication>[a-z-]+))\/(?P<medium__post__slug>[a-z0-9\-]+)-(?P<medium__post__post_id>[A-z0-9]+)(?:\?.*)?)|(?P<medium__post_of_subdomain_publication>(?:https?:)?\/\/(?P<medium__post_of_subdomain_publication__publication>(?!www)[a-z-]+)\.medium\.com\/(?P<medium__post_of_subdomain_publication__slug>[a-z0-9\-]+)-(?P<medium__post_of_subdomain_publication__post_id>[A-z0-9]+)(?:\?.*)?)|(?P<medium__user>(?:https?:)?\/\/medium\.com\/@(?P<medium__user__username>[A-z0-9]+)(?:\?.*)?)|(?P<medium__user_by_id>(?:https?:)?\/\/medium\.com\/u\/(?P<medium__user_by_id__user_id>[A-z0-9]+)(?:\?.*))|(?P<phone__phone_number>(?:tel|phone|mobile):(?P<phone__phone_number__number>\+?[0-9. -]+))|(?P<reddit__user>(?:https?:)?\/\/(?:[a-z]+\.)?reddit\.com\/(?:u(?:ser)?)\/(?P<reddit__user__username>[A-z0-9\-\_]*)\/?)|(?P<skype__profile>(?:(?:callto|skype):)(?P<skype__profile__username>[a-z][a-z0-9\.,\-_]{5,31})(?:\?(?:add|call|chat|sendfile|userinfo))?)|(?P<snapchat__profile>(?:https?:)?\/\/(?:www\.)?snapchat\.com\/add\/(?P<snapchat__profile__username>[A-z0-9\.\_\-]+)\/?)|(?P<stackexchange__user>(?:https?:)?\/\/(?:www\.)?stackexchange\.com\/users\/(?P<stackexchange__user__id>[0-9]+)\/(?P<stackexchange__user__username>[A-z0-9-_.]+)\/?)|(?P<stackexchange_network__user>(?:https?:)?\/\/(?:(?P<stackexchange_network__user__community>[a-z]+(?!www))\.)?stackexchange\.com\/users\/(?P<stackexchange_network__user__id>[0-9]+)\/(?P<stackexchange_network__user__username>[A-z0-9-_.]+)\/?)|(?P<stackoverflow__question>(?:https?:)?\/\/(?:www\.)?stackoverflow\.com\/questions\/(?P<stackoverflow__question__id>[0-9]+)\/(?P<stackoverflow__question__title>[A-z0-9-_.]+)\/?)|(?P<stackoverflow__user>(?:https?:)?\/\/(?:www\.)?stackoverflow\.com\/users\/(?P<stackoverflow__user__id>[0-9]+)\/(?P<stackoverflow__user__username>[A-z0-9-_.]+)\/?)|(?P<telegram__profile>(?:https?:)?\/\/(?:t(?:elegram)?\.me|telegram\.org)\/(?P<telegram__profile__username>[a-z0-9\_]{5,32})\/?)|(?P<twitter__status>(?:https?:)?\/\/(?:[A-z]+\.)?twitter\.com\/@?(?P<twitter__status__username>[A-z0-9_]+)\/status\/(?P<twitter__status__tweet_id>[0-9]+)\/?)|(?P<twitter__user>(?:https?:)?\/\/(?:[A-z]+\.)?twitter\.com\/@?(?!home|share|privacy|tos)(?P<twitter__user__username>[A-z0-9_]+)\/?)|(?P<vimeo__user>(?:https?:)?\/\/vimeo\.com\/user(?P<vimeo__user__id>[0-9]+))|(?P<vimeo__video>(?:https?:)?\/\/(?:(?:www)?vimeo\.com|player.vimeo.com\/video)\/(?P<vimeo__video__id>[0-9]+))|(?P<xing__profile>(?:https?:)?\/\/(?:www\.)?xing.com\/profile\/(?P<xing__profile__slug>[A-z0-9-\_]+))|(?P<youtube__channel>(?:https?:)?\/\/(?:[A-z]+\.)?youtube.com\/channel\/(?P<youtube__channel__id>[A-z0-9-\_]+)\/?)|(?P<youtube__user>(?:https?:)?\/\/(?:[A-z]+\.)?youtube.com\/user\/(?P<youtube__user__username>[A-z0-9]+)\/?)|(?P<youtube__video>(?:https?:)?\/\/(?:(?:www\.)?youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)(?P<youtube__video__id>[A-z0-9\-\_]+))
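The prefixing scheme can be illustrated on a smaller scale. The sketch below combines just two of the expressions above, which both use the subgroup name id. The helper names `prefixed` and `extract` are made up for this example, and this is not necessarily how the repo generates the monster regex.

```python
import re

# Two expressions from the list above that both define a subgroup named "id".
PATTERNS = {
    "vimeo__user": r"(?:https?:)?\/\/vimeo\.com\/user(?P<id>[0-9]+)",
    "youtube__video": r"(?:https?:)?\/\/(?:(?:www\.)?youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)(?P<id>[A-z0-9\-\_]+)",
}

def prefixed(name, pattern):
    """Rename each (?P<group> to (?P<name__group> and wrap it in (?P<name>...)."""
    renamed = re.sub(
        r"\(\?P<(\w+)>",
        lambda m: f"(?P<{name}__{m.group(1)}>",
        pattern,
    )
    return f"(?P<{name}>{renamed})"

COMBINED = re.compile("|".join(prefixed(n, p) for n, p in PATTERNS.items()))

def extract(url):
    """Return only the subgroups that took part in the match."""
    m = COMBINED.match(url)
    if m is None:
        return {}
    return {k: v for k, v in m.groupdict().items() if v is not None}

print(extract("https://vimeo.com/user12345"))
# {'vimeo__user': 'https://vimeo.com/user12345', 'vimeo__user__id': '12345'}
```

The outer group name tells you which platform and link type matched; the prefixed inner groups carry the extracted data.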

social-media-profiles-regexs's People

Contributors

2019342a, dependabot[bot], gaelreinaudi, lorey, phordijk, yeganemehr

social-media-profiles-regexs's Issues

More social media profiles

I see that there haven't been any additions within the last 2 years. Here are the ones that you've identified that match some of the initial regex filters I've written. (NOTE: These also work extremely well when using the jQuery validation and adding the regexes below to the data-rule-pattern attributes.)

Instagram
^http(?:s)?:\/\/(?:www\.)?instagram\.com\/([a-zA-Z0-9_]+)$

Google Plus
^http(?:s)?:\/\/plus\.google\.com\/(.+)$

Pinterest
^http(?:s)?:\/\/(?:www\.)?pinterest\.com\/([a-zA-Z0-9_]+)$

Vimeo Video
^http(?:s)?:\/\/(?:www\.)?vimeo\.com\/(?:channels\/[0-9]+\/)?([0-9]+)$

Vine
^http(?:s)?:\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9_\/]+)$

Vimeo Channel
^http(?:s)?:\/\/(?:www\.)?vimeo\.com\/channels\/([0-9]+)$

Skype
^skype:([a-zA-Z0-9_]+)\?call$

WordPress
^http(?:s)?:\/\/(?:[a-zA-Z0-9-]+)?.wordpress\.com\/(?!feed)(.+)?$

YouTube Channel
^http(?:s)?:\/\/(?:www\.)?youtube\.com\/channel\/(?:\w+\/)?([a-zA-Z0-9_-]+)$

YouTube Embed
^http(?:s)?:\/\/(?:www\.)?youtube.com\/embed\/([a-zA-Z0-9_-]+)$

YouTube User
^http(?:s)?:\/\/(?:www\.)?youtube\.com\/user\/(?:\w+\/)?([a-zA-Z0-9_-]+)$

YouTube Video
^http(?:s)?:\/\/(?:www\.)?youtube\.com\/(?!user)(?!channel)(?:\w+\/)?([a-zA-Z0-9_-]+)(?:\/)?$

YouTube Video
^http(?:s)?:\/\/youtu.be/([a-zA-Z0-9_-]+)$

StackOverflow
^http(?:s)?:\/\/(?:www\.)?stackoverflow\.com\/(.+)$

[QUESTION] Use or not use https as default

Most social networks use https or redirect requests on port 80 to 443. I don't know of one that does not use https 😨

In the (obvious) cases where the service uses https but the user supplies http, the link may still be correct, because it points to a valid endpoint once the application server redirects it. However, in my opinion (not based on any standard), the regex should reject it.

The GitHub regex, for example, is incorrect in my view, because the site does not serve data over anything but https and the www subdomain is not used.

// how it is now
http(s)?:\/\/(www\.)?github\.com/[A-z 0-9 _ -]+\/?

// perhaps a more correct approach
https:\/\/github\.com/[a-zA-Z0-9_-]+\/?

Add test case for #23

Let's add a test case for #23 to ensure people see examples of accented slugs in the wild.

Support Recruiter profiles for LinkedIn

LinkedIn Recruiter URLs have recruiter in the path instead of in or pub.

Example : http(s)?:\/\/([\w]+\.)?linkedin\.com\/(in|pub|recruiter)\/[A-z0-9_-]+\/?

Remove "www" and "s" from capture group?

I've been developing a library for server-side and client-side link identification and validation. I've been using the non-capturing groups (?:s)? and (?:www\.)?. The regex engine still uses them to match the text but leaves them out of the returned results, which can make it easier to identify the account/user/media ID.

Example: ^http(?:s)?:\/\/(?:www\.)?facebook\.com\/([a-zA-Z0-9_]+)$

RegEx ASCII Special Characters

As I was testing some patterns, I noticed that you used 'A-z' for alphabetic characters. When you do that, it matches any character between ASCII 65 and 122. That includes uppercase and lowercase letters, but also six special characters: [, \, ], ^, _ and `. To fix that and only include the characters a-z (97-122) and A-Z (65-90), you need to change 'A-z' to 'a-zA-Z'.
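The difference can be verified with a few lines of Python. This is only a quick check of the ASCII range, not code from the repo.

```python
import re

# Every non-letter character in the ASCII range 65-122 that [A-z] covers.
extra = [chr(code) for code in range(65, 123) if not chr(code).isalpha()]
print(extra)  # ['[', '\\', ']', '^', '_', '`']

# [A-z] accepts all of them, [a-zA-Z] accepts none.
matched_by_A_z = [ch for ch in extra if re.fullmatch(r"[A-z]", ch)]
matched_by_strict = [ch for ch in extra if re.fullmatch(r"[a-zA-Z]", ch)]
```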

Why are the two forward slashes not optional in URLs?

For example https://www.facebook.com and //www.facebook.com would work, whereas www.facebook.com wouldn't work.

Is this an error?

It would be beneficial to have the regexes not require the https:// on the off chance that it's not specified in some document being scraped.

How can I get this in RE2 syntax?

(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?P<profile>(?![A-z]+\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[A-z0-9_\-\.]+)\/?

My string looks like this:

[
   {
      "node":{
         "id":"100084753152635",
         "url":"https:\/\/www.facebook.com\/profile.php?id=100084753152635",
         "name":"Hannah"
      }
   },
   {
      "node":{
         "id":"100049247496610",
         "__isProfile":"User",
         "url":"https:\/\/www.facebook.com\/sayar.tole.31",
         "name":"\u1010\u102d\u102f\u1038 \u101d\u1031 \u101c\u1004\u103a\u1038"
      }
   }
]
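RE2 does not support lookaheads such as (?!...), so the two negative lookaheads in the Facebook profile expression have to be handled outside the pattern. One possible approach is sketched below in Python for illustration: the pattern itself only uses constructs without lookaheads, and the exclusions are applied in code afterwards. The names `facebook_profile`, `RESERVED`, and `PHP_PAGE` are made up for this sketch, and the JSON is a shortened version of the input above.

```python
import json
import re

# Lookahead-free variant: the two (?!...) groups are dropped from the pattern.
PROFILE = re.compile(
    r"(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?P<profile>[A-z0-9_\-\.]+)\/?"
)

RESERVED = ("marketplace", "gaming", "watch", "me", "messages", "help", "search", "groups")
PHP_PAGE = re.compile(r"[A-z]+\.php")

def facebook_profile(url):
    """Return the profile slug, or None if the URL is excluded or unmatched."""
    m = PROFILE.match(url)
    if m is None:
        return None
    profile = m.group("profile")
    # Emulate (?![A-z]+\.php) and (?!marketplace|...): both are prefix checks.
    if PHP_PAGE.match(profile) or profile.startswith(RESERVED):
        return None
    return profile

raw = r'''[
  {"node": {"url": "https:\/\/www.facebook.com\/profile.php?id=100084753152635"}},
  {"node": {"url": "https:\/\/www.facebook.com\/sayar.tole.31"}}
]'''

# json.loads() turns the escaped \/ sequences back into plain slashes.
urls = [item["node"]["url"] for item in json.loads(raw)]
profiles = [facebook_profile(u) for u in urls]
print(profiles)  # [None, 'sayar.tole.31']
```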

Youtube regex

https?://[w.]{4}?(youtube.com/user/[A-z_0-9-.]{2,100})

Regexes that filter username part from the regex

I noticed that the regexes used do not provide username back as part of the regex. An example on how to do this is:

(?:(?:http|https)://)?(?:www.|m.)?facebook.com/(?!home.php)(?:(?:\w)#!/)?(?:pages/)?(?:[?\w-]/)?(?:profile.php?id=(?=\d.*))?([\w.-]+)$

This allows you to filter out all the things you do not need and only return username as a result.
Wouldn't it be a good idea to take this as the standard for this repo?

Example for instagram that only returns username:
https://regex101.com/r/DBVLCq/1

(Might also be a good idea to always include a regex101.com address so people can add tests and improve regexes.)

Fix linkedin recruiter and add /talent url regex

There is a bug in the LinkedIn recruiter URL regex: it lacks the "profile" part.

This is how the link should be
https://www.linkedin.com/recruiter/profile/476162262,HHNH,name

Also, the URL regex for the "talent" endpoint is missing
https://www.linkedin.com/talent/profile/AEEAABxhqNYBJa6QzJWImzsC_q6ugSZg2H6s7pA

Please let me know if you'd like a pull request or you'll fix them by yourself :)

Edit: /sales is also missing.
I'll create a regex and a PR later :)
