
josephlimtech / linkedin-profile-scraper-api

482 stars · 8 watchers · 138 forks · 11.01 MB

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

License: MIT License

TypeScript 100.00%
puppeteer nodejs scraper scraping scraping-websites website-scraper json linkedin-profile-scraper linkedin scrapers crawler crawling spider expressjs linkedin-scraper linkedin-scraping linkedin-bot linkedin-crawler profile-data linkedin-profile

linkedin-profile-scraper-api's Introduction

LinkedIn Profile Scraper

A LinkedIn profile scraper built on the Puppeteer headless browser, so you can run it on a server. Returns structured data in JSON format.

This scraper will extract publicly available data:

🧑‍🎨 Profile: name, title, location, picture, description and url

👨‍💼 Experiences: title, company name, location, duration, start date, end date and the description

👨‍🎓 Education: school name, degree name, start date and end date

🦸 Volunteer experiences: title, company, description, start date and end date

🏋️‍♂️ Skills: name and endorsement count

All dates are formatted to a generic format.
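For illustration, here is a sketch of what that normalization might look like (`monthYearToISO` is hypothetical, not the package's actual implementation; it assumes LinkedIn's "Oct 2018"-style month labels and pins dates to the first of the month, while the example response below shows local offsets such as +02:00):

```typescript
// Illustrative only: convert an "Oct 2018"-style label into an ISO 8601
// timestamp pinned to the first day of that month (UTC offset used here).
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

function monthYearToISO(label: string): string {
  const [monthName, year] = label.split(' ');
  const month = MONTHS.indexOf(monthName) + 1;
  if (month === 0 || !year) throw new Error(`Unrecognized date label: ${label}`);
  return `${year}-${String(month).padStart(2, '0')}-01T00:00:00+00:00`;
}
```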

Commercial Alternative: Proxycurl LinkedIn APIs

Proxycurl APIs enrich people and company profiles with structured data

Scrape public LinkedIn profile data at scale with Proxycurl APIs.

  • Scraping public profiles was battle-tested in court in the hiQ vs. LinkedIn case.
  • GDPR, CCPA, SOC2 compliant
  • High rate limit - 300 requests/minute
  • Fast - APIs respond in ~2s
  • Fresh data - 88% of data is scraped in real time; the remaining 12% is no older than 29 days
  • High accuracy
  • Tons of data points returned per profile

Built for developers, by developers.

Getting started

To scrape LinkedIn profiles, you need to make sure the scraper is logged in to LinkedIn. For that, you need your account's session cookie. I suggest creating a new LinkedIn account and enabling all the privacy options, so people don't see you visiting their profiles when you use the scraper.

  1. Create a new account on LinkedIn, or use one you already have
  2. Login to that account using your browser
  3. Open your browser's Dev Tools to find the cookie with the name li_at. Use that value for sessionCookieValue when setting up the scraper.
  4. Install: npm install linkedin-profile-scraper
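Rather than hard-coding the cookie, you could load it from an environment variable; a minimal sketch (the LI_AT_COOKIE variable name and the getSessionCookie helper are assumptions of this example, not part of the package):

```typescript
// Hypothetical helper (not part of this package): read the li_at session
// cookie from an environment variable so the secret stays out of source
// control. The LI_AT_COOKIE variable name is an assumption of this example.
function getSessionCookie(env: Record<string, string | undefined>): string {
  const cookie = env.LI_AT_COOKIE;
  if (!cookie) {
    throw new Error(
      'Missing LI_AT_COOKIE. Copy the li_at cookie value from your browser Dev Tools.'
    );
  }
  return cookie;
}

// Usage sketch:
// const scraper = new LinkedInProfileScraper({
//   sessionCookieValue: getSessionCookie(process.env)
// });
```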

Usage

// TypeScript
import { LinkedInProfileScraper } from 'linkedin-profile-scraper';

// Plain Javascript
// const { LinkedInProfileScraper } = require('linkedin-profile-scraper')

(async () => {
  const scraper = new LinkedInProfileScraper({
    sessionCookieValue: 'LI_AT_COOKIE_VALUE',
    keepAlive: false
  });

  // Prepare the scraper by loading it into memory
  await scraper.setup();

  const result = await scraper.run('https://www.linkedin.com/in/someone/');

  console.log(result);
})();

See the Example response section below for a sample response.

Faster recurring scrapes

Set keepAlive to true to keep Puppeteer running in the background for faster recurring scrapes. Note that this keeps memory usage elevated, since Puppeteer sits idle in the background.

By default the scraper closes after a successful scrape, freeing up memory.

Detect when session is expired

LinkedIn sessions can expire after some time, usually when you have not used LinkedIn for a while. The scraper can notify you specifically about this, so you can act on it.

When this error shows, obtain a new li_at cookie value from the LinkedIn.com website and update sessionCookieValue with that new value. To do so, follow the Getting started steps above in this readme.

(async() => {
  try {
    const scraper = new LinkedInProfileScraper({
      sessionCookieValue: 'LI_AT_COOKIE_VALUE'
    });

    await scraper.setup()

    const result = await scraper.run('https://www.linkedin.com/in/someone/')
  
    console.log(result)
  } catch (err) {
    if (err.name === 'SessionExpired') {
      // Do something when the scraper notifies you it's not logged-in anymore
    }
  }
})()

Example response

{
  "userProfile": {
    "fullName": "Nat Friedman",
    "title": "CEO at GitHub",
    "location": {
      "city": "San Francisco",
      "province": "California",
      "country": null
    },
    "photo": "https://media-exp1.licdn.com/dms/image/C5603AQE4l0oJ3jxk9A/profile-displayphoto-shrink_200_200/0?e=1595462400&v=beta&t=Q3Ai4vRCFwUXK_M68mCFaQNn8kZF6QGJnuupJV0lFwo",
    "description": null,
    "url": "https://www.linkedin.com/in/natfriedman/"
  },
  "experiences": [
    {
      "title": "CEO",
      "company": "GitHub",
      "employmentType": null,
      "location": {
        "city": null,
        "province": "San Francisco Bay",
        "country": null
      },
      "startDate": "2018-10-01T00:00:00+02:00",
      "endDate": "2020-05-20T17:23:33+02:00",
      "endDateIsPresent": true,
      "description": null,
      "durationInDays": 598
    },
    {
      "title": "Cofounder",
      "company": "AI Grant",
      "employmentType": null,
      "location": null,
      "startDate": "2017-04-01T00:00:00+02:00",
      "endDate": "2020-05-20T17:23:33+02:00",
      "endDateIsPresent": true,
      "description": "AI Grant is a non-profit distributed research lab. We fund brilliant minds across the world to work on cutting-edge artificial intelligence projects. Almost all of the work we sponsor is open source, and we evaluate applications on the strength of the applicants and their ideas, with no credentials required. To date we have provided over $500,000 in money and resources to more than fifty teams around the world. … see more",
      "durationInDays": 1146
    },
    {
      "title": "Cofounder, Chairman",
      "company": "California YIMBY",
      "employmentType": null,
      "location": {
        "city": "Sacramento",
        "province": "California",
        "country": null
      },
      "startDate": "2017-04-01T00:00:00+02:00",
      "endDate": "2020-05-20T17:23:33+02:00",
      "endDateIsPresent": true,
      "description": "California YIMBY is a statewide housing policy organization I cofounded in 2017. Our goal is to systematically alter state policy to increase the rate of home building and address the statewide housing shortage in California. Denser housing reduces poverty, increases intergenerational socieconomic mobility, reduces carbon emissions, accelerates economic growth, reduces displacement, and improves well-being. … see more",
      "durationInDays": 1146
    },
    {
      "title": "Corporate Vice President, Developer Services",
      "company": "Microsoft",
      "employmentType": null,
      "location": {
        "city": "San Francisco",
        "province": null,
        "country": null
      },
      "startDate": "2016-03-01T00:00:00+01:00",
      "endDate": "2020-05-20T17:23:33+02:00",
      "endDateIsPresent": true,
      "description": "At Microsoft I am responsible for Visual Studio Team Services and App Center, application lifecycle and devops services with more than a million users. I am also responsible for the internal company-wide engineering systems.",
      "durationInDays": 1542
    },
    {
      "title": "Advisor",
      "company": "Stripe",
      "employmentType": null,
      "location": null,
      "startDate": "2011-05-01T00:00:00+02:00",
      "endDate": "2020-05-20T17:23:33+02:00",
      "endDateIsPresent": true,
      "description": "We were one of Stripe's first customers before they launched, and since then I've enjoyed advising them on matters large and small. Patrick and John are amazing and I'm certain I've learned more than I've helped!",
      "durationInDays": 3308
    },
    {
      "title": "CEO and Cofounder",
      "company": "Xamarin",
      "employmentType": null,
      "location": {
        "city": "San Francisco",
        "province": null,
        "country": null
      },
      "startDate": "2011-05-01T00:00:00+02:00",
      "endDate": "2016-03-01T00:00:00+01:00",
      "endDateIsPresent": false,
      "description": "At Xamarin our mission was to make mobile development fast, easy, and fun. Xamarin allows developers to write native apps for iOS and Android using C#. We also built a number of other mobile development tools, including app monitoring and testing services, and a developer training program. In 4.5 years, the company grew to $50M ARR and thousands of paying customers. Xamarin is where I fell in love with go-to-market and sales. We built two successful sales models: a high-velocity, low-touch inside sales model, and a more traditional enterprise model. Both models benefited from strong inbound marketing and thousands of daily signups. The company was 350 people when we were acquired by Microsoft. … see more",
      "durationInDays": 1767
    },
    {
      "title": "Chief Technology Officer, Open Source",
      "company": "Novell",
      "employmentType": null,
      "location": null,
      "startDate": "2003-01-01T00:00:00+01:00",
      "endDate": "2009-01-01T00:00:00+01:00",
      "endDateIsPresent": false,
      "description": "After Novell bought my company, I led all the Linux client efforts and served as CTO for open source. I also ran GroupWise, a $150M collaboration product.",
      "durationInDays": 2193
    },
    {
      "title": "Cofounder, Chairman",
      "company": "GNOME Foundation",
      "employmentType": null,
      "location": null,
      "startDate": "2000-01-01T00:00:00+01:00",
      "endDate": "2003-01-01T00:00:00+01:00",
      "endDateIsPresent": false,
      "description": "The GNOME Foundation works to further the goal of the GNOME project: to create a computing platform for use by the general public that is composed entirely of free software. Many companies were involved in GNOME, so I co-created the GNOME Foundation to provide independent governance for the project. It was one of the first such open source foundations.",
      "durationInDays": 1097
    },
    {
      "title": "Cofounder, CEO",
      "company": "Ximian",
      "employmentType": null,
      "location": null,
      "startDate": "1999-01-01T00:00:00+01:00",
      "endDate": "2003-01-01T00:00:00+01:00",
      "endDateIsPresent": false,
      "description": null,
      "durationInDays": 1462
    }
  ],
  "education": [
    {
      "schoolName": "Massachusetts Institute of Technology",
      "degreeName": "Bachelor of Science",
      "fieldOfStudy": "Mathematics",
      "startDate": "1995-01-01T00:00:00+01:00",
      "endDate": "1999-01-01T00:00:00+01:00",
      "durationInDays": 1462
    },
    {
      "schoolName": "Massachusetts Institute of Technology",
      "degreeName": "Bachelor of Science",
      "fieldOfStudy": "Computer Science",
      "startDate": "1995-01-01T00:00:00+01:00",
      "endDate": "1999-01-01T00:00:00+01:00",
      "durationInDays": 1462
    }
  ],
  "volunteerExperiences": [],
  "skills": [
    {
      "skillName": "Mobile Applications",
      "endorsementCount": 99
    },
    {
      "skillName": "Software Development",
      "endorsementCount": 99
    },
    {
      "skillName": "Cloud Computing",
      "endorsementCount": 96
    },
    {
      "skillName": "Software Engineering",
      "endorsementCount": 57
    },
    {
      "skillName": "Start-ups",
      "endorsementCount": 40
    },
    {
      "skillName": "Agile Methodologies",
      "endorsementCount": 25
    },
    {
      "skillName": "Product Management",
      "endorsementCount": 24
    },
    {
      "skillName": "Web Applications",
      "endorsementCount": 20
    },
    {
      "skillName": "Web Services",
      "endorsementCount": 13
    },
    {
      "skillName": "Entrepreneurship",
      "endorsementCount": 12
    },
    {
      "skillName": "User Experience",
      "endorsementCount": 9
    },
    {
      "skillName": "Startups",
      "endorsementCount": 5
    },
    {
      "skillName": "Strategy",
      "endorsementCount": 1
    },
    {
      "skillName": "SaaS",
      "endorsementCount": 63
    },
    {
      "skillName": "Open Source",
      "endorsementCount": 46
    },
    {
      "skillName": "Linux",
      "endorsementCount": 30
    },
    {
      "skillName": "C#",
      "endorsementCount": 24
    },
    {
      "skillName": "Git",
      "endorsementCount": 5
    },
    {
      "skillName": "AJAX",
      "endorsementCount": 3
    },
    {
      "skillName": "jQuery",
      "endorsementCount": 2
    },
    {
      "skillName": "Ruby",
      "endorsementCount": 1
    },
    {
      "skillName": "Strategic Partnerships",
      "endorsementCount": 20
    },
    {
      "skillName": "Leadership",
      "endorsementCount": 0
    },
    {
      "skillName": "Management",
      "endorsementCount": 0
    }
  ]
}

About using the session cookie

This module uses the session cookie of a successful LinkedIn login to get you logged in, instead of an e-mail address and password. LinkedIn has security measures that block login requests from unknown locations or require you to fill in a captcha upon login. So if you run this from a server and try to log in with an e-mail address and password, your login could be blocked. By reusing a known session we prevent that from happening, which allows you to use this scraper on any server in any location.

Using a session cookie is the most reliable method I currently know of.

You will probably need to repeat the setup steps whenever the scraper's logs show it is no longer logged in.

About the performance

  • Upon start, the module opens a headless browser session using Chromium. That session can be kept alive using the keepAlive option. Chromium uses about 75 MB of memory when idle.
  • Scraping usually takes a few seconds, because the script needs to scroll through the page and expand several sections in order for all the data to appear.

Usage limits

LinkedIn has some usage limits in place. Please respect those and use their options to increase limits. More info: LinkedIn Commercial Use Limit

linkedin-profile-scraper-api's People

Contributors

josephlimtech


linkedin-profile-scraper-api's Issues

Make own LinkedIn Bot With Google Sheets and Google Apps Script

Hello,
I am trying to create a LinkedIn bot with Google Sheets and Google Apps Script to post automatically on my LinkedIn company page, like I managed to do with my Twitter account following this tutorial: https://javascript.plainenglish.io/build-your-own-twitter-bot-with-google-sheets-d9a8ef955fa1

The only tutorials I found for LinkedIn use Postman to make the LinkedIn API work, but there is no way to schedule posts from a prepared list.

Thank you for your help.

Add support to get the cookie by login credentials

Hey, I would love to add a login with email/password feature as an alternative to grabbing the cookie value by hand. I've already made those additions in my cloned version and I would love to send a PR for it.

it supports the following:

  • login using predefined credentials as ENV variables (working in both headless and non-headless mode)
  • manual login prompt (the browser opens on the LinkedIn login screen, allowing the user to enter their email and password; the cookie is then grabbed and appended)

I think it removes the clutter and the overwhelming part of trying to find your cookie value yourself, makes it much easier to use, and in addition lets you invalidate sessions and build a better retry mechanism.

Thanks

How to get languages other than the primary one? How to avoid the "see more..." as in the code sample in the README.md

Hi @jvandenaardweg and thanks for sharing this.

I am currently trying to build my resume using the OAuth2 API, the unofficial API (see eilonmore/linkedin-private-api#31) and this plugin.

I have found a few issues:

  • I can't get the English profile, only the primary and French one (unless I use the )
  • I can't expand the "see more..." on an experience description (as in the code sample in the README.md), which keeps this plugin from retrieving all the data.

Do you have any idea on how I can bypass this?

Empty user profile data in some instances

Hello,

I've been trying to set up this script, but I've noticed that in some instances it returns empty user profile data for no apparent reason.

Here are a few profiles that return as empty despite having actual content:

https://www.linkedin.com/in/matthew-lazarowitz/
https://www.linkedin.com/in/vamsee-krishna-081a3a17/
https://www.linkedin.com/in/lohith-rangappa-16325644/

All these profiles essentially return:

Got user profile data: {"fullName":null,"title":null,"location":null,"photo":null,"description":null,"url":"https://www.linkedin.com/feed/"}

Is this a known issue? Is there any way I can resolve this?

Thank you

Hi @jvandenaardweg ,

Thank you for building and sharing this awesome resource! I've learned a lot just by reading through your code. For example, the way you bypass reCaptcha using the session cookie was so simple, effective, and helpful - compared to using 3rd party api to solve reCaptcha.

Anyways, thanks again! =]

David

Extension

How do I use this as a Chrome extension, if possible?

This is not an issue!

If you provide some extra LinkedIn profile links from 'people you may know', it'll be helpful for scraping data in a loop from other profiles.

I've made some changes in the code and I'm able to scrape data in a loop. If you permit me to push the changes, please let me know. Or if you do it yourself, it'll help a lot.

Example:

{
  ...
  volunteerExperiences: [],
  skills: [
    { skillName: 'C', endorsementCount: 31 },
    { skillName: 'Entrepreneurship', endorsementCount: 22 },
    { skillName: 'Semiconductors', endorsementCount: 17 }
  ],
  people: [
    { link: 'https://www.linkedin.com/in/kajal-nadar-419256102/' },
    { link: 'https://www.linkedin.com/in/aashishdua/' },
    { link: 'https://www.linkedin.com/in/yilin-ding/' },
    { link: 'https://www.linkedin.com/in/ashokcherian/' },
    { link: 'https://www.linkedin.com/in/lopa-detroja/' }
  ]
}

Please bring the 'people'. It'll help a lot.

What's the best way to update session cookies on a scraper running on server after expiry?

For me, new free accounts seem to get blocked after 50 profiles/day, and sessions seem to expire at random.

When the session cookie expires, as per the readme, we're supposed to use a new one. But we can't really keep doing this manual step every time when scraping at scale (think millions of accounts), so I'm wondering if there's a way to automate this.

Any ideas?
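One pragmatic option (a hedged sketch, not a supported feature of this package): keep a pool of pre-harvested li_at values and rotate to the next one whenever a run fails with the SessionExpired error this scraper raises. The `runWithCookie` callback below stands in for re-creating the scraper with a new sessionCookieValue:

```typescript
// Sketch: rotate through a pool of session cookies. `runWithCookie` is a
// caller-supplied function that is expected to throw an error whose `name`
// is 'SessionExpired' when a cookie has gone stale (mirroring the error
// name this scraper uses).
async function runWithCookiePool<T>(
  cookies: string[],
  runWithCookie: (cookie: string) => Promise<T>
): Promise<T> {
  for (const cookie of cookies) {
    try {
      return await runWithCookie(cookie);
    } catch (err: any) {
      if (err?.name !== 'SessionExpired') throw err; // unrelated failure
      // otherwise: fall through and try the next cookie in the pool
    }
  }
  throw new Error('All session cookies in the pool have expired');
}
```

This doesn't remove the manual harvesting step, but it lets a long-running job degrade gracefully instead of stopping at the first expired session.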

Save output in DB

Has anyone developed saving the output to MongoDB and could share an example? Thanks

Experience link to button

They have changed the experience "see more" button selector from:
#experience-section .pv-profile-section__see-more-inline.link
to
#experience-section button.pv-profile-section__see-more-inline

(src/index.ts line 539)
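Selector drift like this is common; one defensive pattern (a sketch, not part of this package) is to try several candidate selectors in order. Here `queryOne` stands in for a lookup such as Puppeteer's `page.$`:

```typescript
// Sketch: try candidate selectors in order and return the first match, so
// minor LinkedIn markup changes (link -> button) don't break the scrape.
// `queryOne` is a caller-supplied lookup, e.g. (s) => page.$(s).
async function firstMatching<T>(
  selectors: string[],
  queryOne: (selector: string) => Promise<T | null>
): Promise<T | null> {
  for (const selector of selectors) {
    const handle = await queryOne(selector);
    if (handle) return handle; // stop at the first selector that matches
  }
  return null;
}
```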

UnhandledPromiseRejectionWarning: TypeError: linkedin_profile_scraper_1.default is not a constructor

This first showed up when executing my tests with ts-jest, but it looks like it's a general issue.

node: v14.2.0
typescript: v3.9.5
ts-node: v8.10.1

» npx ts-node src/profile/linkedin/LinkedInCrawler.ts
(node:58376) UnhandledPromiseRejectionWarning: TypeError: linkedin_profile_scraper_1.default is not a constructor
    at new LinkedInCrawler (/workspace/linkedin/LinkedInCrawler.ts:11:20)
    at /workspace/linkedin/LinkedInCrawler.ts:33:27
    at step (/workspace/linkedin/LinkedInCrawler.ts:33:23)
    at Object.next (/workspace/linkedin/LinkedInCrawler.ts:14:53)
    at /workspace/linkedin/LinkedInCrawler.ts:8:71
    at new Promise (<anonymous>)
    at __awaiter (/workspace/linkedin/LinkedInCrawler.ts:4:12)
    at test (/workspace/linkedin/LinkedInCrawler.ts:32:14)
    at Object.<anonymous> (/workspace/linkedin/LinkedInCrawler.ts:40:1)
    at Module._compile (internal/modules/cjs/loader.js:1176:30)
import LinkedInProfileScraper from "linkedin-profile-scraper";

class LinkedInCrawler {
  private cookie: string;
  private scraper;
  constructor(cookie) {
    this.cookie = cookie;
    this.scraper = new (LinkedInProfileScraper as any)({
      sessionCookieValue: this.cookie,
      keepAlive: true,
    });
  }

  public async loadProfileData(username: string) {
    // Prepare the scraper
    // Loading it in memory
    await this.scraper.setup();

    const result = await this.scraper.run(
      `https://www.linkedin.com/in/${username}/`
    );
    await this.scraper.close();
    return result;
  }
}

export { LinkedInCrawler };

const test = async () => {
  const linkedInCrawler = new LinkedInCrawler(
    "xxxxxxx"
  );
  const profile = await linkedInCrawler.loadProfileData("my-user");
  console.log(profile);
};

test();

Version 2: NPM package, Javascript module, TypeScript

I'll change the package a little bit so you can install it through npm, which allows you to use it the way you like.

Progress can be followed here: https://github.com/jvandenaardweg/linkedin-profile-scraper/tree/next

Changes coming up:

  • Rewrite package to use TypeScript
  • Remove the server part and make it a javascript module which you can import in your own project
  • Add an example to use it on a server
  • Update readme
  • Package it up through NPM with an automatic Github Action and automatic versioning

all null data

Hi, I successfully installed the module. However, when I ran the scraper, I got all null results. I tried several profiles; all null.
Is this npm module still working, or did I do something wrong?
null_data

Add proxy server for puppeteer usage

Hey,
I know Puppeteer has an option to add a proxy server to the flow; I wonder why you didn't expose this option as well.
I added this option to your code. Please give me permission to open a PR if you like this addition.
For my needs, it really helps!

Thanks,
Elior.

Error: Page crashed!

Hello,

I'm using your project, and since yesterday I'm getting this error:

UnhandledPromiseRejectionWarning: Error: Page crashed!
at Page._onTargetCrashed (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\puppeteer\lib\Page.js:213:24)
at CDPSession. (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\puppeteer\lib\Page.js:122:56)
at CDPSession.emit (events.js:311:20)
at CDPSession._onMessage (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\puppeteer\lib\Connection.js:200:12)
at Connection._onMessage (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\puppeteer\lib\Connection.js:112:17)
at WebSocket. (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\puppeteer\lib\WebSocketTransport.js:44:24)
at WebSocket.onMessage (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\ws\lib\event-target.js:120:16)
at WebSocket.emit (events.js:311:20)
at Receiver.receiverOnMessage (C:\Users\User01\Documents\linkedin-profile-scraper\node_modules\ws\lib\websocket.js:789:20)
at Receiver.emit (events.js:311:20)
(node:2656) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:2656) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Are you planning to fix this?

No target with given id found

Hello,

I am facing this weird issue where after fetching details for a few profiles, the script essentially fails. Here's what the error looks like.

Fetching data for https://www.linkedin.com/in/amiel-hussain-7a18159/
Scraper (close): Closing browser...
Scraper (close): Closed browser!
Scraper (close): Killing browser process pid: 614045...
Scraper (setup page): An error occurred during page setup.
Scraper (setup page): Protocol error (Target.attachToTarget): No target with given id found
Scraper (close): Closing browser...
Scraper (close): Closed browser!
Scraper (close): Killing browser process pid: 614045...
Scraper (run): An error occurred during a run.
Error: Protocol error (Target.attachToTarget): No target with given id found
    at /home/karan/projects/linkedin-profile-scraper-js/node_modules/puppeteer/lib/Connection.js:57:63
    at new Promise (<anonymous>)
    at Connection.send (/home/karan/projects/linkedin-profile-scraper-js/node_modules/puppeteer/lib/Connection.js:56:16)
    at Connection.createSession (/home/karan/projects/linkedin-profile-scraper-js/node_modules/puppeteer/lib/Connection.js:127:42)
    at Target._sessionFactory (/home/karan/projects/linkedin-profile-scraper-js/node_modules/puppeteer/lib/Browser.js:76:88)
    at Target.createCDPSession (/home/karan/projects/linkedin-profile-scraper-js/node_modules/puppeteer/lib/Target.js:54:21)
    at LinkedInProfileScraper.<anonymous> (/home/karan/projects/linkedin-profile-scraper-js/node_modules/linkedin-profile-scraper/dist/index.js:115:53)
    at Generator.next (<anonymous>)
    at fulfilled (/home/karan/projects/linkedin-profile-scraper-js/node_modules/tslib/tslib.js:111:62)
    at runMicrotasks (<anonymous>) {
  message: 'Protocol error (Target.attachToTarget): No target with given id found'
}
Error in setting data TypeError: Cannot read property 'userProfile' of undefined
    at generateEmail (/home/karan/projects/linkedin-profile-scraper-js/index.js:33:26)
    at /home/karan/projects/linkedin-profile-scraper-js/index.js:84:24

Any ideas on why this is happening?

How to fix timeout error

(screenshot)

After running my app the following error shows:
"UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 10000 ms exceeded"

I have also tried installing puppeteer and overwriting the default "timeout: 10000" this way:
"page.setDefaultNavigationTimeout(0);"

but nothing happens; the error still persists. How can I fix this?
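Elsewhere on this page, the scraper's startup log shows a timeout option among its constructor defaults ("timeout":10000), so passing a larger value to the LinkedInProfileScraper constructor may be worth trying. Independent of the package, you can also impose your own deadline around any promise; a generic sketch:

```typescript
// Sketch: race a promise against a deadline. Note this does not cancel the
// underlying work; it only stops you from waiting on it forever.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```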

Node is either not visible or not an HTMLElement

I think you could wrap it in a try-catch for when the button can't be found. What do you think?
/dist/index.js:264

Scraper (run): An error occurred during a run.
(node:13284) UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement
    at ElementHandle._clickablePoint (/Users/christiancannata/projects/linkedin_spider/node_modules/puppeteer/lib/JSHandle.js:170:19)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at ElementHandle.<anonymous> (/Users/christiancannata/projects/linkedin_spider/node_modules/puppeteer/lib/helper.js:94:19)
    at LinkedInProfileScraper.<anonymous> (/Users/christiancannata/projects/linkedin_spider/node_modules/linkedin-profile-scraper/dist/index.js:264:42)
    at Generator.next (<anonymous>)
    at fulfilled (/Users/christiancannata/projects/linkedin_spider/node_modules/tslib/tslib.js:112:62)
    at process._tickCallback (internal/process/next_tick.js:68:7)

Error: Failed to launch the browser process!

I'm getting this error on Ubuntu and MacOS. Tried installing chromium. Can I redirect puppeteer to this installation? Any suggestions?

Scraper (setup): Launching puppeteer in the background...
Scraper (setup): An error occurred during setup.
/home/norman/dev/scrape-playground/scrape-playground/node_modules/linkedin-profile-scraper/node_modules/puppeteer/lib/launcher/BrowserRunner.js:189 
            reject(new Error([
                   ^

Error: Failed to launch the browser process!
/home/norman/dev/scrape-playground/scrape-playground/node_modules/linkedin-profile-scraper/node_modules/puppeteer/.local-chromium/linux-756035/chrome-linux/chrome: error while loading shared libraries: libXss.so.1: cannot open shared object file: No such file or directory


TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

    at onClose (/home/norman/dev/scrape-playground/scrape-playground/node_modules/linkedin-profile-scraper/node_modules/puppeteer/lib/launcher/BrowserRunner.js:189:20)
    at Interface.<anonymous> (/home/norman/dev/scrape-playground/scrape-playground/node_modules/linkedin-profile-scraper/node_modules/puppeteer/lib/launcher/BrowserRunner.js:179:65)
    at Interface.emit (node:events:525:35)
    at Interface.close (node:readline:590:8)
    at Socket.onend (node:readline:280:10)
    at Socket.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1358:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

Vercel Error: Could not find browser revision [number]

Hi!

I'm testing this amazing library in a proof of concept and I had a problem with a deploy on Vercel. My suspicion is that Vercel needs to handle the browser binary in another way.

The complete error log is:

Scraper (constructing): Using options: {"sessionCookieValue":"<cookieValue>","keepAlive":true,"timeout":10000,"userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36","headless":true}
Scraper (setup): Launching puppeteer in the background...
Scraper (setup): An error occurred during setup.
2021-01-24T13:38:28.641Z	01798364-a27e-4032-b9e4-b7bf834a6870	ERROR	Unhandled Promise Rejection 	{"errorType":"Runtime.UnhandledPromiseRejection","errorMessage":"Error: Could not find browser revision 756035. Run \"npm install\" or \"yarn install\" to download a browser binary.","reason":{"errorType":"Error","errorMessage":"Could not find browser revision 756035. Run \"npm install\" or \"yarn install\" to download a browser binary.","stack":["Error: Could not find browser revision 756035. Run \"npm install\" or \"yarn install\" to download a browser binary.","    at ChromeLauncher.launch (/var/task/node_modules/puppeteer/lib/Launcher.js:81:23)"]},"promise":{},"stack":["Runtime.UnhandledPromiseRejection: Error: Could not find browser revision 756035. Run \"npm install\" or \"yarn install\" to download a browser binary.","    at process.<anonymous> (/var/runtime/index.js:35:15)","    at process.emit (events.js:326:22)","    at processPromiseRejections (internal/process/promises.js:209:33)","    at processTicksAndRejections (internal/process/task_queues.js:98:32)"]}
Unknown application error occurred

An example using aws-lambda

fullName in userProfile does not return the name but the total connections

Hi @jvandenaardweg, so I found a bug here. Let me reproduce the bug first.

  1. When I run the code based on the example here, the code works properly, but the fullName returned is not what I expected, as you can see below. fullName returns the total connections of the LinkedIn profile

Screen Shot 2021-10-28 at 19 56 54

  2. When I jump into the code, I see that on this line the selector does not match the fullName element on LinkedIn (picture below). I think we need to change the selector to h1 to match the fullName element on LinkedIn

Screen Shot 2021-10-28 at 19 57 33

cannot install

I am using Node.js; running in the terminal: npm install linkedin-profile-scraper

(screenshot)

Scraper closes on its own

Hey, I set the keepAlive attribute but the scraper session keeps closing on its own.
Currently the only way I can use this is by running the sessions serially, each with its own setup. If I put them in promise.call, the navigation seems to cause the process to die, and if I run them serially without the setup, it just closes itself in the middle of the run.

ERR_TOO_MANY_REDIRECTS

Hi,

I'm seeing a lot of ERR_TOO_MANY_REDIRECTS responses from LinkedIn while trying to scrape profiles. Is there something we can do to avoid it?
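A redirect loop may indicate LinkedIn is rejecting the session, so re-checking the li_at cookie value is a good first step. If the failures are only transient, spacing out attempts with exponential backoff can help; a generic sketch, independent of this package:

```typescript
// Sketch: retry an async operation with exponential backoff between attempts.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // wait baseDelayMs, 2x, 4x, ... before the next attempt
        await new Promise((res) => setTimeout(res, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```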

Null Response

Trying out the profile in the given example and I get this:

{
  userProfile: {
    fullName: '500+ connections',
    title: null,
    location: null,
    photo: null,
    description: null,
    url: 'https://www.linkedin.com/in/natfriedman/'
  },
  experiences: [],
  education: [],
  volunteerExperiences: [],
  skills: []
}

Anyone having the same issue? I tried a few other profiles and I get a similar result

Log:

Screenshot 2022-09-18 184055
