pubtrends's People

Contributors

annav1asova, ctrltz, olegs

Forkers

mrauha, amirh-ra

pubtrends's Issues

Article published in 2013 is cited by an article from 2004

Result from PostgreSQL (the year shown is that of the citing article):

 pmid_citing | pmid_cited | year
-------------+------------+------
    15316650 |   23453633 | 2004

The XML record for article 15316650 lists 23453633, along with several other articles published since 2004, in its ReferenceList section, so this is not the parser's fault: https://www.ncbi.nlm.nih.gov/pubmed/?term=15316650&report=xml&format=text

Article 15316650 was revised in 2018, but it is unclear why its reference list would have changed: the full text of the article contains only valid references.
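
Since such records come from PubMed itself, one option is to detect them after import: flag every citation whose cited article is newer than the citing one. A minimal sketch, assuming the citation pairs and publication years are already loaded into memory; `Citation` and `anachronisticCitations` are hypothetical names, not part of pubtrends:

```kotlin
// Hypothetical in-memory view of one row of the Citations table.
data class Citation(val pmidCiting: Int, val pmidCited: Int)

/**
 * Returns the citations whose cited article was published after the citing one.
 * Pairs with an unknown year on either side are skipped rather than flagged.
 */
fun anachronisticCitations(
    citations: List<Citation>,
    yearByPmid: Map<Int, Int>
): List<Citation> = citations.filter { c ->
    val citingYear = yearByPmid[c.pmidCiting]
    val citedYear = yearByPmid[c.pmidCited]
    citingYear != null && citedYear != null && citedYear > citingYear
}
```

With the row above, `Citation(15316650, 23453633)` would be flagged as long as 23453633 is recorded with the year 2013 and 15316650 with 2004.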

NumberFormatException

oleg-laptop:pubtrends oleg$ java -jar crawler/build/libs/crawler-dev.jar
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
11:25:12.590 [main] INFO  Created temporary directory: /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp9114403431170330541.tmp
11:25:27.656 [main] INFO  Deleting directory: /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp9114403431170330541.tmp
Exception in thread "main" java.lang.NumberFormatException: For input string: "pubmed19n0001.xml.gz"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.jetbrains.bio.pubtrends.crawler.PubmedFTPHandler.getNewXMLsList(PubmedFTPHandler.kt:98)
	at org.jetbrains.bio.pubtrends.crawler.PubmedFTPHandler.fetch(PubmedFTPHandler.kt:21)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.update(PubmedCrawler.kt:50)
	at org.jetbrains.bio.pubtrends.MainKt.main(Main.kt:7)
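
The trace shows `PubmedFTPHandler.getNewXMLsList` (PubmedFTPHandler.kt:98) passing a whole file name, `pubmed19n0001.xml.gz`, to `Integer.parseInt`. If the intent is to compare file indices, the numeric part has to be extracted first. A minimal sketch; the regex assumes the `pubmedYYnNNNN.xml.gz` naming seen in the logs, and `pubmedFileIndex` and `newFiles` are hypothetical helpers, not the crawler's code:

```kotlin
// Matches names like "pubmed19n0001.xml.gz": two-digit year, "n", then the file index.
val PUBMED_XML_GZ = Regex("""pubmed\d{2}n(\d+)\.xml\.gz""")

/** Returns the numeric file index, or null if [name] is not a PubMed XML archive. */
fun pubmedFileIndex(name: String): Int? =
    PUBMED_XML_GZ.matchEntire(name)?.groupValues?.get(1)?.toIntOrNull()

// Keeps only archives whose index is greater than the last processed one;
// names that do not match the pattern are ignored instead of throwing.
fun newFiles(names: List<String>, lastProcessed: Int): List<String> =
    names.filter { name -> pubmedFileIndex(name)?.let { it > lastProcessed } ?: false }
```

Returning `null` for non-matching names, rather than throwing, lets other files in the FTP listing be skipped without aborting the whole update.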

Number of citations is 0 with the latest PubMed baseline (Dec 14, 2018)

oleg-laptop:pubtrends oleg$ java -jar crawler/build/libs/crawler-dev.jar 2>&1 | tee log.txt
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
14:51:22.721 [main] INFO  Created temporary directory: /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp
14:51:41.676 [main] INFO  Found 976 new file(s)
14:51:41.677 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0001.xml.gz: Downloading...
14:52:14.088 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0001.xml.gz: Unpacking...
14:52:15.301 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0001.xml: Parsing...
14:52:30.269 [main] INFO  Articles: 30000, keywords: 1325, citations: 0
14:52:30.270 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0001.xml: Storing...
14:52:34.869 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0001.xml: SUCCESS
14:52:34.869 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0002.xml.gz: Downloading...
14:53:01.561 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0002.xml.gz: Unpacking...
14:53:02.894 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0002.xml: Parsing...
14:53:12.243 [main] INFO  Articles: 30000, keywords: 1572, citations: 0
14:53:12.243 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0002.xml: Storing...
14:53:15.991 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0002.xml: SUCCESS
14:53:15.992 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0003.xml.gz: Downloading...
14:53:41.260 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0003.xml.gz: Unpacking...
14:53:42.509 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0003.xml: Parsing...
14:53:52.145 [main] INFO  Articles: 30000, keywords: 1831, citations: 0
14:53:52.145 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0003.xml: Storing...
14:53:54.862 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp3309464674076092605.tmp/pubmed19n0003.xml: SUCCESS

Example of a <PubmedArticle> in the new file format (pubmed19n0011.xml):

  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">304751</PMID>
      <DateCompleted>
        <Year>1978</Year>
        <Month>04</Month>
        <Day>17</Day>
      </DateCompleted>
      <DateRevised>
        <Year>2018</Year>
        <Month>11</Month>
        <Day>13</Day>
      </DateRevised>
      <Article PubModel="Print">
        <Journal>
          <ISSN IssnType="Print">0007-1447</ISSN>
          <JournalIssue CitedMedium="Print">
            <Volume>1</Volume>
            <Issue>6110</Issue>
            <PubDate>
              <Year>1978</Year>
              <Month>Feb</Month>
              <Day>18</Day>
            </PubDate>
          </JournalIssue>
          <Title>British medical journal</Title>
          <ISOAbbreviation>Br Med J</ISOAbbreviation>
        </Journal>
        <ArticleTitle>Unilateral short thumb associated with bleeding duodenal reduplication.</ArticleTitle>
        <Pagination>
          <MedlinePgn>412</MedlinePgn>
        </Pagination>
        <AuthorList CompleteYN="Y">
          <Author ValidYN="Y">
            <LastName>Modlin</LastName>
            <ForeName>I M</ForeName>
            <Initials>IM</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Spencer</LastName>
            <ForeName>J</ForeName>
            <Initials>J</Initials>
          </Author>
        </AuthorList>
        <Language>eng</Language>
        <PublicationTypeList>
          <PublicationType UI="D002363">Case Reports</PublicationType>
          <PublicationType UI="D016428">Journal Article</PublicationType>
        </PublicationTypeList>
      </Article>
      <MedlineJournalInfo>
        <Country>England</Country>
        <MedlineTA>Br Med J</MedlineTA>
        <NlmUniqueID>0372673</NlmUniqueID>
        <ISSNLinking>0007-1447</ISSNLinking>
      </MedlineJournalInfo>
      <CitationSubset>AIM</CitationSubset>
      <CitationSubset>IM</CitationSubset>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D000015" MajorTopicYN="Y">Abnormalities, Multiple</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D000328" MajorTopicYN="N">Adult</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D014670" MajorTopicYN="N">Ampulla of Vater</DescriptorName>
          <QualifierName UI="Q000002" MajorTopicYN="N">abnormalities</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D004380" MajorTopicYN="N">Duodenal Obstruction</DescriptorName>
          <QualifierName UI="Q000209" MajorTopicYN="N">etiology</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D004386" MajorTopicYN="N">Duodenum</DescriptorName>
          <QualifierName UI="Q000002" MajorTopicYN="Y">abnormalities</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D005260" MajorTopicYN="N">Female</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D006471" MajorTopicYN="N">Gastrointestinal Hemorrhage</DescriptorName>
          <QualifierName UI="Q000209" MajorTopicYN="Y">etiology</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D013933" MajorTopicYN="N">Thumb</DescriptorName>
          <QualifierName UI="Q000002" MajorTopicYN="Y">abnormalities</QualifierName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
    <PubmedData>
      <History>
        <PubMedPubDate PubStatus="pubmed">
          <Year>1978</Year>
          <Month>2</Month>
          <Day>18</Day>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
          <Year>1978</Year>
          <Month>2</Month>
          <Day>18</Day>
          <Hour>0</Hour>
          <Minute>1</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="entrez">
          <Year>1978</Year>
          <Month>2</Month>
          <Day>18</Day>
          <Hour>0</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
      </History>
      <PublicationStatus>ppublish</PublicationStatus>
      <ArticleIdList>
        <ArticleId IdType="pubmed">304751</ArticleId>
        <ArticleId IdType="pmc">PMC1602955</ArticleId>
      </ArticleIdList>
      <ReferenceList>
        <Reference>
          <Citation>Br J Surg. 1960 Mar;47:477-84</Citation>
          <ArticleIdList>
            <ArticleId IdType="pubmed">13797465</ArticleId>
          </ArticleIdList>
        </Reference>
        <Reference>
          <Citation>Am J Dig Dis. 1974 Jul;19(7):673-7</Citation>
          <ArticleIdList>
            <ArticleId IdType="pubmed">4209729</ArticleId>
          </ArticleIdList>
        </Reference>
        <Reference>
          <Citation>Br J Surg. 1972 Apr;59(4):324-6</Citation>
          <ArticleIdList>
            <ArticleId IdType="pubmed">4623190</ArticleId>
          </ArticleIdList>
        </Reference>
        <Reference>
          <Citation>Am J Surg. 1971 Sep;122(3):418-20</Citation>
          <ArticleIdList>
            <ArticleId IdType="pubmed">5570620</ArticleId>
          </ArticleIdList>
        </Reference>
        <Reference>
          <Citation>Arch Surg. 1967 Feb;94(2):301-6</Citation>
          <ArticleIdList>
            <ArticleId IdType="pubmed">6016282</ArticleId>
          </ArticleIdList>
        </Reference>
      </ReferenceList>
    </PubmedData>
  </PubmedArticle>

Related to #12
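
In the sample above, the cited articles appear under PubmedData/ReferenceList, each as an ArticleId with IdType="pubmed", while the article's own PMID (304751) also appears as an ArticleId, but in PubmedData/ArticleIdList. A parser that does not read ReferenceList could explain the `citations: 0` in the log. A minimal StAX sketch that collects only the reference PMIDs; `citedPmids` is a hypothetical function, not the crawler's parser:

```kotlin
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

/**
 * Collects the PMIDs cited by one <PubmedArticle>, i.e. the text of every
 * ArticleId with IdType="pubmed" that sits somewhere inside a <ReferenceList>.
 */
fun citedPmids(xml: String): List<Int> {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(StringReader(xml))
    val path = ArrayDeque<String>()   // names of the currently open elements
    val result = mutableListOf<Int>()
    val text = StringBuilder()
    var capture = false               // true while inside a reference ArticleId
    while (reader.hasNext()) {
        when (reader.next()) {
            XMLStreamConstants.START_ELEMENT -> {
                path.addLast(reader.localName)
                capture = reader.localName == "ArticleId" &&
                    "ReferenceList" in path &&
                    reader.getAttributeValue(null, "IdType") == "pubmed"
                text.setLength(0)
            }
            // Character data may arrive in several chunks, so accumulate it.
            XMLStreamConstants.CHARACTERS -> if (capture) text.append(reader.text)
            XMLStreamConstants.END_ELEMENT -> {
                if (capture) {
                    text.toString().trim().toIntOrNull()?.let { result.add(it) }
                    capture = false
                }
                path.removeLast()
            }
        }
    }
    reader.close()
    return result
}
```

Applied to the record above, this would return the five reference PMIDs and skip 304751, because that ArticleId is not nested in ReferenceList.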

ParserTest should not use the real database

Currently, after following all the instructions in README.md, I get the following error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Dec 14, 2018 10:59:21 AM org.postgresql.core.v3.ConnectionFactoryImpl log
WARNING: SQLException occurred while connecting to localhost:5432
org.postgresql.util.PSQLException: FATAL: role "biolabs" is not permitted to log in
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
	at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2566)
	at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:131)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:210)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
	at org.postgresql.Driver.makeConnection(Driver.java:452)
	at org.postgresql.Driver.connect(Driver.java:254)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:112)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:91)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:25)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:22)
	at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction.getConnection(ThreadLocalTransactionManager.kt:31)
	at org.jetbrains.exposed.sql.Transaction.getConnection(Transaction.kt)
	at org.jetbrains.exposed.sql.Database.getMetadata$exposed(Database.kt:17)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:26)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getUrl(Database.kt)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:29)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getDialect$exposed(Database.kt)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialect(Default.kt:341)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialectIfAvailable(Default.kt:345)
	at org.jetbrains.exposed.sql.Column.getOnUpdate$exposed(Column.kt:14)
	at org.jetbrains.exposed.sql.Table.nullable(Table.kt:399)
	at org.jetbrains.bio.pubtrends.crawler.Publications.<clinit>(DatabaseModel.kt:7)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:31)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:9)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler.<init>(DatabaseHandler.kt:24)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.<init>(PubmedCrawler.kt:14)
	at org.jetbrains.bio.pubtrends.crawler.ParserTest.<clinit>(ParserTest.kt:9)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:250)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:260)
	at org.junit.runners.BlockJUnit4ClassRunner$2.runReflectiveCall(BlockJUnit4ClassRunner.java:309)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

Dec 14, 2018 10:59:21 AM org.postgresql.Driver connect
SEVERE: Connection error: 
org.postgresql.util.PSQLException: FATAL: role "biolabs" is not permitted to log in
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
	at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2566)
	at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:131)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:210)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
	at org.postgresql.Driver.makeConnection(Driver.java:452)
	at org.postgresql.Driver.connect(Driver.java:254)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:112)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:91)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:25)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:22)
	at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction.getConnection(ThreadLocalTransactionManager.kt:31)
	at org.jetbrains.exposed.sql.Transaction.getConnection(Transaction.kt)
	at org.jetbrains.exposed.sql.Database.getMetadata$exposed(Database.kt:17)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:26)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getUrl(Database.kt)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:29)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getDialect$exposed(Database.kt)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialect(Default.kt:341)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialectIfAvailable(Default.kt:345)
	at org.jetbrains.exposed.sql.Column.getOnUpdate$exposed(Column.kt:14)
	at org.jetbrains.exposed.sql.Table.nullable(Table.kt:399)
	at org.jetbrains.bio.pubtrends.crawler.Publications.<clinit>(DatabaseModel.kt:7)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:31)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:9)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler.<init>(DatabaseHandler.kt:24)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.<init>(PubmedCrawler.kt:14)
	at org.jetbrains.bio.pubtrends.crawler.ParserTest.<clinit>(ParserTest.kt:9)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:250)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:260)
	at org.junit.runners.BlockJUnit4ClassRunner$2.runReflectiveCall(BlockJUnit4ClassRunner.java:309)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)


java.lang.ExceptionInInitializerError
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:31)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$1.invoke(DatabaseHandler.kt:9)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler.<init>(DatabaseHandler.kt:24)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.<init>(PubmedCrawler.kt:14)
	at org.jetbrains.bio.pubtrends.crawler.ParserTest.<clinit>(ParserTest.kt:9)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:250)
	at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:260)
	at org.junit.runners.BlockJUnit4ClassRunner$2.runReflectiveCall(BlockJUnit4ClassRunner.java:309)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.postgresql.util.PSQLException: FATAL: role "biolabs" is not permitted to log in
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
	at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2566)
	at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:131)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:210)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
	at org.postgresql.Driver.makeConnection(Driver.java:452)
	at org.postgresql.Driver.connect(Driver.java:254)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:112)
	at org.jetbrains.exposed.sql.Database$Companion$connect$7.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:91)
	at org.jetbrains.exposed.sql.Database$Companion$doConnect$3.invoke(Database.kt:71)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:25)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction$connectionLazy$1.invoke(ThreadLocalTransactionManager.kt:22)
	at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManager$ThreadLocalTransaction.getConnection(ThreadLocalTransactionManager.kt:31)
	at org.jetbrains.exposed.sql.Transaction.getConnection(Transaction.kt)
	at org.jetbrains.exposed.sql.Database.getMetadata$exposed(Database.kt:17)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:26)
	at org.jetbrains.exposed.sql.Database$url$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getUrl(Database.kt)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:29)
	at org.jetbrains.exposed.sql.Database$dialect$2.invoke(Database.kt:15)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.exposed.sql.Database.getDialect$exposed(Database.kt)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialect(Default.kt:341)
	at org.jetbrains.exposed.sql.vendors.DefaultKt.getCurrentDialectIfAvailable(Default.kt:345)
	at org.jetbrains.exposed.sql.Column.getOnUpdate$exposed(Column.kt:14)
	at org.jetbrains.exposed.sql.Table.nullable(Table.kt:399)
	at org.jetbrains.bio.pubtrends.crawler.Publications.<clinit>(DatabaseModel.kt:7)
	... 42 more
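
The trace shows why the test reaches PostgreSQL: ParserTest's static initializer (ParserTest.kt:9) builds a PubmedCrawler, whose constructor (PubmedCrawler.kt:14) creates a DatabaseHandler that opens a connection. One common remedy is to inject the storage layer so tests can pass an in-memory fake. A minimal sketch of that pattern; `ArticleStore`, `InMemoryArticleStore` and `CrawlerSketch` are hypothetical names, not existing pubtrends classes:

```kotlin
// Hypothetical storage abstraction: production code would back it with PostgreSQL,
// tests with an in-memory list, so constructing the crawler opens no connection.
interface ArticleStore {
    fun storeCitations(pairs: List<Pair<Int, Int>>)
}

class InMemoryArticleStore : ArticleStore {
    val stored = mutableListOf<Pair<Int, Int>>()

    override fun storeCitations(pairs: List<Pair<Int, Int>>) {
        stored += pairs
    }
}

// Hypothetical crawler that receives its store instead of creating a DatabaseHandler itself.
class CrawlerSketch(private val store: ArticleStore) {
    fun storeReferences(citingPmid: Int, citedPmids: List<Int>) =
        store.storeCitations(citedPmids.map { citingPmid to it })
}
```

In a test, `CrawlerSketch(InMemoryArticleStore())` never touches PostgreSQL; the production entry point would pass a database-backed implementation instead.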

Citations table index optimization

Query:

EXPLAIN ANALYSE SELECT C1.pmid_citing, C1.pmid_cited, C2.pmid_cited, P.year
        FROM Citations C1
        JOIN (VALUES (10660603), (15319361), (15356349), (16424025), (16627569), (16714284), (16847060), (16955484), (17081107), (17275731), (17534140), (18414039), (18443001), (18514625), (18555664), (18769112), (18787087), (18926585), (18971624), (19200882), (19245654), (19279323), (19380253), (19448702), (19478560), (19539012), (19602051), (19740975), (19805415), (19923900), (20047144), (20096035), (20139716), (20154608), (20157541), (20169165), (20363920), (20388102), (20437201), (20445122), (20519118), (20606252), (20676050), (20729871), (20739737), (20818934), (20886754), (20965424), (21115526), (21150328), (21157483), (21159787), (21179166), (21191146), (21212465), (21415462), (21428920), (21483039), (21483870), (21501117), (21520297), (21541762), (21555915), (21562229), (21572994), (21798089), (21840335), (21858089), (21917559), (21931802), (22115588), (22125056), (22246147), (22327552), (22354768), (22363791), (22388478), (22394614), (22408430), (22410287), (22468953), (22500797), (22546364), (22580468), (22672902), (22683661), (22817723), (22958933), (22960547), (22987149), (23006971), (23061800), (23239011), (23246968), (23255104), (23276696), (23325216), (23341224), (23363784), (23374718), (23399685), (23454756), (23454868), (23470275), (23517348), (23525940), (23555298), (23606170), (23625314), (23648089), (23686362), (23688930), (23702245), (23702336), (23734707), (23817674), (23850396), (23851366), (23884442), (23936371), (23982787), (24024901), (24178346), (24236459), (24296616), (24308993), (24324270), (24336084), (24350927), (24489988), (24496328), (24508508), (24518659), (24562770), (24589862), (24607448), (24677687), (24744983), (24774073), (24799956), (24821673), (24862022), (24866016), (24899720), (24915467), (24918639), (24981831), (25038772), (25040542), (25062253), (25088526), (25110610), (25239873), (25249372), (25258312), (25341517), (25348018), (25388238), (25449851), (25470422), (25476900), (25483712), (25491300), (25540326), 
(25553480), (25568097), (25587030), (25596147), (25655936), (25661995), (25686248), (25758051), (25776557), (25796566), (25807975), (25827254), (25902704), (25907074), (25926513), (26017155), (26051878), (26053964), (26059377), (26158292), (26178971), (26212055), (26298231), (26359950), (26378060), (26399781), (26404510), (26431550), (26463117), (26507311), (26566676), (26598823), (26639036), (26655726), (26670233), (26679354), (26750735), (26764052), (26780446), (26879375), (26890602), (26952863), (27012089), (27036037), (27048303), (27048648), (27059126), (27071307), (27091134), (27097372), (27168224), (27179948), (27211557), (27235806), (27304501), (27330287), (27392857), (27440779), (27486771), (27501743), (27591812), (27617277), (27619662), (27694325), (27698205), (27733247), (27757122), (27789294), (27812983), (27825071), (27875990), (27897112), (27902456), (27922821), (27934653), (27959964), (27974395), (27980219), (28005429), (28012437), (28115977), (28122334), (28244876), (28254759), (28257663), (28260296), (28264931), (28301572), (28315697), (28322571), (28329151), (28371119), (28455969), (28540646), (28554316), (28603284), (28626026), (28639903), (28675698), (28694093), (28721811), (28732480), (28807816), (28831286), (28874954), (28911171), (28918902), (28929674), (28944926), (28953887), (28971552), (29027899), (29048631), (29074705), (29101804), (29157832), (29163135), (29165314), (29183728), (29316844), (29369521), (29388072), (29407795), (29408453), (29441009), (29461635), (29467291), (29473507), (29502958), (29515755), (29530582), (29570707), (29574227), (29579543), (29611102), (29726032), (29749694), (29752839), (29753771), (29804557), (29897294), (29921885), (29991711), (30036188), (30050560), (30057669), (30140974), (30153655), (30190613), (30197681), (30263780), (30359321), (30389500), (30393593), (30443855), (30510618), (30542441), (30619240), (30853664), (30902093), (31032688)) AS C1T(pmid_cited) ON (C1.pmid_cited = C1T.pmid_cited)
        JOIN Citations C2
        JOIN (VALUES (10660603), (15319361), (15356349), (16424025), (16627569), (16714284), (16847060), (16955484), (17081107), (17275731), (17534140), (18414039), (18443001), (18514625), (18555664), (18769112), (18787087), (18926585), (18971624), (19200882), (19245654), (19279323), (19380253), (19448702), (19478560), (19539012), (19602051), (19740975), (19805415), (19923900), (20047144), (20096035), (20139716), (20154608), (20157541), (20169165), (20363920), (20388102), (20437201), (20445122), (20519118), (20606252), (20676050), (20729871), (20739737), (20818934), (20886754), (20965424), (21115526), (21150328), (21157483), (21159787), (21179166), (21191146), (21212465), (21415462), (21428920), (21483039), (21483870), (21501117), (21520297), (21541762), (21555915), (21562229), (21572994), (21798089), (21840335), (21858089), (21917559), (21931802), (22115588), (22125056), (22246147), (22327552), (22354768), (22363791), (22388478), (22394614), (22408430), (22410287), (22468953), (22500797), (22546364), (22580468), (22672902), (22683661), (22817723), (22958933), (22960547), (22987149), (23006971), (23061800), (23239011), (23246968), (23255104), (23276696), (23325216), (23341224), (23363784), (23374718), (23399685), (23454756), (23454868), (23470275), (23517348), (23525940), (23555298), (23606170), (23625314), (23648089), (23686362), (23688930), (23702245), (23702336), (23734707), (23817674), (23850396), (23851366), (23884442), (23936371), (23982787), (24024901), (24178346), (24236459), (24296616), (24308993), (24324270), (24336084), (24350927), (24489988), (24496328), (24508508), (24518659), (24562770), (24589862), (24607448), (24677687), (24744983), (24774073), (24799956), (24821673), (24862022), (24866016), (24899720), (24915467), (24918639), (24981831), (25038772), (25040542), (25062253), (25088526), (25110610), (25239873), (25249372), (25258312), (25341517), (25348018), (25388238), (25449851), (25470422), (25476900), (25483712), (25491300), (25540326), 
(25553480), (25568097), (25587030), (25596147), (25655936), (25661995), (25686248), (25758051), (25776557), (25796566), (25807975), (25827254), (25902704), (25907074), (25926513), (26017155), (26051878), (26053964), (26059377), (26158292), (26178971), (26212055), (26298231), (26359950), (26378060), (26399781), (26404510), (26431550), (26463117), (26507311), (26566676), (26598823), (26639036), (26655726), (26670233), (26679354), (26750735), (26764052), (26780446), (26879375), (26890602), (26952863), (27012089), (27036037), (27048303), (27048648), (27059126), (27071307), (27091134), (27097372), (27168224), (27179948), (27211557), (27235806), (27304501), (27330287), (27392857), (27440779), (27486771), (27501743), (27591812), (27617277), (27619662), (27694325), (27698205), (27733247), (27757122), (27789294), (27812983), (27825071), (27875990), (27897112), (27902456), (27922821), (27934653), (27959964), (27974395), (27980219), (28005429), (28012437), (28115977), (28122334), (28244876), (28254759), (28257663), (28260296), (28264931), (28301572), (28315697), (28322571), (28329151), (28371119), (28455969), (28540646), (28554316), (28603284), (28626026), (28639903), (28675698), (28694093), (28721811), (28732480), (28807816), (28831286), (28874954), (28911171), (28918902), (28929674), (28944926), (28953887), (28971552), (29027899), (29048631), (29074705), (29101804), (29157832), (29163135), (29165314), (29183728), (29316844), (29369521), (29388072), (29407795), (29408453), (29441009), (29461635), (29467291), (29473507), (29502958), (29515755), (29530582), (29570707), (29574227), (29579543), (29611102), (29726032), (29749694), (29752839), (29753771), (29804557), (29897294), (29921885), (29991711), (30036188), (30050560), (30057669), (30140974), (30153655), (30190613), (30197681), (30263780), (30359321), (30389500), (30393593), (30443855), (30510618), (30542441), (30619240), (30853664), (30902093), (31032688)) AS C2T(pmid_cited) ON (C2.pmid_cited = C2T.pmid_cited)
        ON C1.pmid_citing = C2.pmid_citing AND C1.pmid_cited < C2.pmid_cited
        JOIN Publications P
        ON C1.pmid_citing = P.pmid
        LIMIT 100000;
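The query's intent can be shown with a minimal Python sketch of the same co-citation logic on in-memory data. The SELECT list is outside this excerpt, so we assume the query yields one row per citing paper and unordered pair of cited papers. `cocitation_pairs` and its parameters are hypothetical names, not part of pubtrends.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_pairs(citations, target_pmids, known_pmids, limit=100_000):
    """Return (citing, cited_a, cited_b) rows with cited_a < cited_b.

    citations: iterable of (pmid_citing, pmid_cited) edges
    target_pmids: the PMIDs listed in the VALUES clauses
    known_pmids: PMIDs present in Publications (the final JOIN)
    """
    # Group cited targets by citing paper (the C1/C2 self-join on pmid_citing).
    cited_by_citing = defaultdict(set)
    for citing, cited in citations:
        if cited in target_pmids:
            cited_by_citing[citing].add(cited)

    rows = []
    for citing, cited in cited_by_citing.items():
        if citing not in known_pmids:
            continue
        # Sorted combinations emit each unordered pair once with a < b,
        # mirroring C1.pmid_cited < C2.pmid_cited.
        for a, b in combinations(sorted(cited), 2):
            rows.append((citing, a, b))
            if len(rows) >= limit:  # LIMIT 100000
                return rows
    return rows


edges = [(1, 10), (1, 20), (1, 30), (2, 10), (2, 99), (3, 10), (3, 20)]
print(cocitation_pairs(edges, {10, 20, 30}, {1, 2}))
# [(1, 10, 20), (1, 10, 30), (1, 20, 30)]
```

Paper 2 cites only one target, so it yields no pair. Paper 3 is missing from `known_pmids`, so the final join drops it.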

For a relatively small number of papers (325 PMIDs in the VALUES lists), EXPLAIN ANALYZE gives the following report:

Limit  (cost=1716128.06..3443131.71 rows=3450 width=16) (actual time=79649.128..149931.732 rows=7930 loops=1)
  ->  Gather  (cost=1716128.06..3443131.71 rows=3450 width=16) (actual time=79649.127..149938.334 rows=7930 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=1715128.06..3441786.71 rows=1438 width=16) (actual time=82233.777..149838.343 rows=2643 loops=3)
              ->  Parallel Hash Join  (cost=1715127.49..3430160.88 rows=1438 width=16) (actual time=82232.856..149783.769 rows=2643 loops=3)
                    Hash Cond: (c1.pmid_citing = c2.pmid_citing)
                    Join Filter: (c1.pmid_cited < c2.pmid_cited)
                    Rows Removed by Join Filter: 6919
                    ->  Hash Join  (cost=8.12..1714918.72 rows=16052 width=8) (actual time=203.564..72203.375 rows=4275 loops=3)
                          Hash Cond: (c1.pmid_cited = "*VALUES*".column1)
                          ->  Parallel Seq Scan on citations c1  (cost=0.00..1450882.60 rows=70364660 width=8) (actual time=0.034..58069.652 rows=56291727 loops=3)
                          ->  Hash  (cost=4.06..4.06 rows=325 width=4) (actual time=0.330..0.330 rows=325 loops=3)
                                Buckets: 1024  Batches: 1  Memory Usage: 20kB
                                ->  Values Scan on "*VALUES*"  (cost=0.00..4.06 rows=325 width=4) (actual time=0.014..0.211 rows=325 loops=3)
                    ->  Parallel Hash  (cost=1714918.72..1714918.72 rows=16052 width=8) (actual time=77565.158..77565.158 rows=4275 loops=3)
                          Buckets: 65536  Batches: 1  Memory Usage: 1056kB
                          ->  Hash Join  (cost=8.12..1714918.72 rows=16052 width=8) (actual time=668.721..77555.931 rows=4275 loops=3)
                                Hash Cond: (c2.pmid_cited = "*VALUES*_1".column1)
                                ->  Parallel Seq Scan on citations c2  (cost=0.00..1450882.60 rows=70364660 width=8) (actual time=0.337..63168.461 rows=56291727 loops=3)
                                ->  Hash  (cost=4.06..4.06 rows=325 width=4) (actual time=2.525..2.525 rows=325 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 20kB
                                      ->  Values Scan on "*VALUES*_1"  (cost=0.00..4.06 rows=325 width=4) (actual time=0.004..0.248 rows=325 loops=3)
              ->  Index Scan using publications_pkey on publications p  (cost=0.56..8.08 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=7930)
                    Index Cond: (pmid = c1.pmid_citing)
Planning Time: 3.030 ms
Execution Time: 149941.944 ms

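The plan is dominated by slow parallel sequential scans over citations (c1 and c2, roughly a minute each). The cause is the column order in the index on the citations table: it starts with pmid_citing, so it cannot be used efficiently to filter by pmid_cited.
The existing indexes can be listed with the following query: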

SELECT
     tablename,
     indexname,
     indexdef
 FROM
     pg_indexes
 WHERE
     schemaname = 'public'
 ORDER BY
     tablename,
     indexname;

The index is the following:

citations            | citations_pmid_citing_pmid_cited_unique     | CREATE UNIQUE INDEX citations_pmid_citing_pmid_cited_unique ON public.citations USING btree (pmid_citing, pmid_cited)

Swapping the column order in the index gives a huge performance boost for small numbers of papers: in this example, execution time drops from about 150 s to about 0.5 s.
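A toy model of why the column order matters: a B-tree keeps entries sorted by the full key, so only an equality filter on the leading column maps to one contiguous range that binary search can find. This Python sketch stands in for the real PostgreSQL index. PostgreSQL can use a multicolumn B-tree with conditions on non-leading columns too, but it is most efficient when the leading column is constrained. The function names are illustrative.

```python
from bisect import bisect_left


def build_index(rows, columns):
    """Toy composite B-tree index: key tuples sorted lexicographically."""
    return sorted(tuple(row[c] for c in columns) for row in rows)


def lookup_leading(index, value):
    """Equality on the leading column: one contiguous range, found by binary search.

    Assumes integer keys, so (value + 1,) bounds the range from above.
    """
    start = bisect_left(index, (value,))
    end = bisect_left(index, (value + 1,))
    return index[start:end]


def lookup_second(index, value):
    """Equality on the second column: matches are scattered, so scan every entry."""
    matches = [entry for entry in index if entry[1] == value]
    return matches, len(index)  # (matches, entries examined)


rows = [{"pmid_citing": c, "pmid_cited": d}
        for c, d in [(1, 10), (2, 10), (2, 20), (3, 30), (4, 10)]]
old = build_index(rows, ("pmid_citing", "pmid_cited"))  # like the existing unique index
new = build_index(rows, ("pmid_cited", "pmid_citing"))  # like the swapped index
print(lookup_second(old, 10))   # ([(1, 10), (2, 10), (4, 10)], 5)
print(lookup_leading(new, 10))  # [(10, 1), (10, 2), (10, 4)]
```

On 5 rows the full scan costs nothing. On the real citations table it corresponds to the Parallel Seq Scan in the first plan, while the swapped index gives the Index Only Scan in the second.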

After adding the index with the command:

CREATE UNIQUE INDEX citations_pmid_cited_citing_unique ON public.citations USING btree (pmid_cited, pmid_citing);

We get the following results:

Limit  (cost=122146.00..272311.38 rows=3450 width=16) (actual time=410.287..504.142 rows=7930 loops=1)
  ->  Nested Loop  (cost=122146.00..272311.38 rows=3450 width=16) (actual time=410.286..502.944 rows=7930 loops=1)
        ->  Hash Join  (cost=122145.43..244419.09 rows=3450 width=16) (actual time=410.196..432.040 rows=7930 loops=1)
              Hash Cond: (c1.pmid_citing = c2.pmid_citing)
              Join Filter: (c1.pmid_cited < c2.pmid_cited)
              Rows Removed by Join Filter: 20756
              ->  Nested Loop  (cost=0.57..121663.29 rows=38526 width=8) (actual time=6.531..20.647 rows=12826 loops=1)
                    ->  Values Scan on "*VALUES*"  (cost=0.00..4.06 rows=325 width=4) (actual time=0.002..0.218 rows=325 loops=1)
                    ->  Index Only Scan using citations_pmid_cited_citing_unique on citations c1  (cost=0.57..373.15 rows=119 width=8) (actual time=0.025..0.056 rows=39 loops=325)
                          Index Cond: (pmid_cited = "*VALUES*".column1)
                          Heap Fetches: 12826
              ->  Hash  (cost=121663.29..121663.29 rows=38526 width=8) (actual time=402.974..402.975 rows=12826 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 1014kB
                    ->  Nested Loop  (cost=0.57..121663.29 rows=38526 width=8) (actual time=0.046..396.899 rows=12826 loops=1)
                          ->  Values Scan on "*VALUES*_1"  (cost=0.00..4.06 rows=325 width=4) (actual time=0.005..0.492 rows=325 loops=1)
                          ->  Index Only Scan using citations_pmid_cited_citing_unique on citations c2  (cost=0.57..373.15 rows=119 width=8) (actual time=0.776..1.208 rows=39 loops=325)
                                Index Cond: (pmid_cited = "*VALUES*_1".column1)
                                Heap Fetches: 12826
        ->  Index Scan using publications_pkey on publications p  (cost=0.56..8.08 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=7930)
              Index Cond: (pmid = c1.pmid_citing)
Planning Time: 318.455 ms
Execution Time: 526.311 ms

The database creation code needs to be changed accordingly to create this index.

Error during processing

22:51:09.043 [main] PubmedXMLParser INFO  /var/folders/td/g2ws4hwj5tj48_j_tsfz8_tc0000gp/T/tmp6040019257560436124.tmp/pubmed19n0506.xml: Parsing...
22:51:14.054 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:14.526 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:14.792 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:15.166 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:15.424 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:15.814 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:16.202 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:16.599 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:16.965 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:17.327 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:17.645 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:18.020 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:18.440 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:18.855 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:19.259 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:19.681 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:20.033 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:20.436 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:20.760 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:21.109 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:21.447 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:21.715 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:21.925 [main] PostgresqlDatabaseHandler INFO  Storing 1000 articles...
22:51:21.927 [main] PubmedCrawler INFO  Deleting directory: /var/folders/td/g2ws4hwj5tj48_j_tsfz8_tc0000gp/T/tmp6040019257560436124.tmp
22:51:21.953 [main] PubmedCrawler INFO  Writing stats to /Users/oleg/.pubtrends/stats.tsv
Exception in thread "main" java.lang.IllegalStateException: Value 'Final report of the amended safety assessment of Glyceryl Laurate, Glyceryl Laurate SE, Glyceryl Laurate/Oleate, Glyceryl Adipate, Glyceryl Alginate, Glyceryl Arachidate, Glyceryl Arachidonate, Glyceryl Behenate, Glyceryl Caprate, Glyceryl Caprylate, Glyceryl Caprylate/Caprate, Glyceryl Citrate/Lactate/Linoleate/Oleate, Glyceryl Cocoate, Glyceryl Collagenate, Glyceryl Erucate, Glyceryl Hydrogenated Rosinate, Glyceryl Hydrogenated Soyate, Glyceryl Hydroxystearate, Glyceryl Isopalmitate, Glyceryl Isostearate, Glyceryl Isostearate/Myristate, Glyceryl Isostearates, Glyceryl Lanolate, Glyceryl Linoleate, Glyceryl Linolenate, Glyceryl Montanate, Glyceryl Myristate, Glyceryl Isotridecanoate/Stearate/Adipate, Glyceryl Oleate SE, Glyceryl Oleate/Elaidate, Glyceryl Palmitate, Glyceryl Palmitate/Stearate, Glyceryl Palmitoleate, Glyceryl Pentadecanoate, Glyceryl Polyacrylate, Glyceryl Rosinate, Glyceryl Sesquioleate, Glyceryl/Sorbitol Oleate/Hydroxystearate, Glyceryl Stearate/Acetate, Glyceryl Stearate/Maleate, Glyceryl Tallowate, Glyceryl Thiopropionate, and Glyceryl Undecylenate.' can't be stored to database column because exceeds length org.jetbrains.bio.pubtrends.crawler.Publications.title.columnType.colLength
        at org.jetbrains.exposed.sql.statements.UpdateBuilder.set(UpdateBuilder.kt:24)
        at org.jetbrains.exposed.sql.statements.BatchInsertStatement.set(BatchInsertStatement.kt:28)
        at org.jetbrains.bio.pubtrends.crawler.PostgresqlDatabaseHandler$store$1$1.invoke(DatabaseHandler.kt:59)
        at org.jetbrains.bio.pubtrends.crawler.PostgresqlDatabaseHandler$store$1$1.invoke(DatabaseHandler.kt:11)
        at org.jetbrains.bio.pubtrends.crawler.DatabaseHandlerKt.batchInsertOnDuplicateKeyUpdate(DatabaseHandler.kt:117)
        at org.jetbrains.bio.pubtrends.crawler.PostgresqlDatabaseHandler$store$1.invoke(DatabaseHandler.kt:55)
        at org.jetbrains.bio.pubtrends.crawler.PostgresqlDatabaseHandler$store$1.invoke(DatabaseHandler.kt:11)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.bio.pubtrends.crawler.PostgresqlDatabaseHandler.store(DatabaseHandler.kt:52)
        at org.jetbrains.bio.pubtrends.crawler.PubmedXMLHandler.endElement(PubmedXMLHandler.kt:152)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
        at com.sun.org.apache.xerces.internal.impl.dtd.XMLNSDTDValidator.endNamespaceScope(XMLNSDTDValidator.java:266)
        at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(XMLDTDValidator.java:2005)
        at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(XMLDTDValidator.java:879)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2967)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
        at org.jetbrains.bio.pubtrends.crawler.PubmedXMLParser.parse(PubmedXMLParser.kt:33)
        at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.downloadFiles(PubmedCrawler.kt:130)
        at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.update(PubmedCrawler.kt:67)
        at org.jetbrains.bio.pubtrends.MainKt.main(Main.kt:96)
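One possible workaround, sketched in Python under assumptions: shorten over-long titles before they reach the batch insert. `TITLE_MAX_LENGTH` is a placeholder for the actual length of `Publications.title`, which this log does not show. Switching the column to an unbounded `text` type would be the alternative.

```python
TITLE_MAX_LENGTH = 1023  # hypothetical: replace with the real Publications.title length


def fit_to_column(value: str, max_length: int = TITLE_MAX_LENGTH, suffix: str = "...") -> str:
    """Shorten value so that len(result) <= max_length, marking the cut with suffix."""
    if len(value) <= max_length:
        return value
    if max_length <= len(suffix):
        # No room for the marker: hard cut.
        return value[:max_length]
    return value[: max_length - len(suffix)] + suffix


title = "Final report of the amended safety assessment of Glyceryl Laurate, " * 20
stored = fit_to_column(title)
print(len(title), len(stored))
```

Truncation loses part of the title, so widening the column is preferable if full titles matter for search.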

Error during processing pubmed19n0973.xml.gz

To reproduce:
Process all the files up to 0972, then edit the config.properties file and run from the command line:
./gradlew clean crawler:shadowJar && java -jar crawler/build/libs/crawler-dev.jar

19:04:51.864 [main] INFO  Last downloaded file: pubmed19n0972.xml.gz
19:04:51.865 [main] INFO  Created temporary directory: /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp
19:04:56.614 [main] INFO  Found 6 new file(s)
19:04:56.614 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp/pubmed19n0973.xml.gz: Downloading...
19:05:04.082 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp/pubmed19n0973.xml.gz: Unpacking...
19:05:07.037 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp/pubmed19n0973.xml: Parsing...
19:05:29.347 [main] INFO  Articles: 30000, keywords: 51392, citations: 355188
19:05:29.347 [main] INFO  /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp/pubmed19n0973.xml: Storing...
19:05:37.716 [main] INFO  Deleting directory: /var/folders/yx/rkbldym139jdbtx4dsr_wb0c0000gp/T/tmp1843002966264895509.tmp
RETURNING * was aborted: ERROR: duplicate key value violates unique constraint "publications_pkey"
  Detail: Key (pmid)=(1766) already exists.  Call getNextException to see other errors in the batch.
	at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)
	at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:128)
	at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:122)
	at org.jetbrains.exposed.sql.statements.Statement.execute(Statement.kt:29)
	at org.jetbrains.exposed.sql.QueriesKt.batchInsert(Queries.kt:90)
	at org.jetbrains.exposed.sql.QueriesKt.batchInsert$default(Queries.kt:60)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$store$1.invoke(DatabaseHandler.kt:43)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler$store$1.invoke(DatabaseHandler.kt:9)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
	at org.jetbrains.bio.pubtrends.crawler.DatabaseHandler.store(DatabaseHandler.kt:40)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.downloadFiles(PubmedCrawler.kt:113)
	at org.jetbrains.bio.pubtrends.crawler.PubmedCrawler.update(PubmedCrawler.kt:54)
	at org.jetbrains.bio.pubtrends.MainKt.main(Main.kt:7)
RETURNING * was aborted: ERROR: duplicate key value violates unique constraint "publications_pkey"
  Detail: Key (pmid)=(1766) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2179)
	at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1297)
	at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1322)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:465)
	at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:835)
	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1556)
	at org.jetbrains.exposed.sql.statements.InsertStatement.execInsertFunction(InsertStatement.kt:86)
	at org.jetbrains.exposed.sql.statements.InsertStatement.executeInternal(InsertStatement.kt:95)
	at org.jetbrains.exposed.sql.statements.InsertStatement.executeInternal(InsertStatement.kt:12)
	at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:59)
	... 15 more
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "publications_pkey"
  Detail: Key (pmid)=(1766) already exists.
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178)
	... 24 more
oleg-laptop:pubtrends oleg$
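The failing key belongs to an old article (PMID 1766). This suggests that pubmed19n0973.xml.gz re-delivers revised versions of already stored records, so a plain batch insert collides with `publications_pkey`. Below is a minimal Python model of the upsert semantics that would avoid this (in PostgreSQL, `INSERT ... ON CONFLICT (pmid) DO UPDATE`). The function names are hypothetical. Deduplicating the batch first matters because a single multi-row `ON CONFLICT DO UPDATE` statement may not affect the same row twice.

```python
def dedupe_by_pmid(batch: list[dict]) -> list[dict]:
    """Keep one record per pmid; assumes later records in a file supersede earlier ones."""
    latest: dict[int, dict] = {}
    for record in batch:
        latest[record["pmid"]] = record
    return list(latest.values())


def upsert_publications(table: dict[int, dict], batch: list[dict]) -> tuple[int, int]:
    """Model of INSERT ... ON CONFLICT (pmid) DO UPDATE: insert new, overwrite existing.

    Returns (inserted, updated) counts.
    """
    inserted = updated = 0
    for record in dedupe_by_pmid(batch):
        if record["pmid"] in table:
            updated += 1
        else:
            inserted += 1
        table[record["pmid"]] = record
    return inserted, updated


table = {1766: {"pmid": 1766, "title": "baseline version"}}
batch = [{"pmid": 1766, "title": "revised version"}, {"pmid": 31032688, "title": "new"}]
print(upsert_publications(table, batch))  # (1, 1)
```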

Filter out top cited papers min(1000, 80%)

For some topics, e.g. human aging, there is a tremendous number of papers. We would like to omit the least cited ones to keep processing fast and the results visually interpretable.
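A minimal sketch of the proposed filter, reading `min(1000, 80%)` as "keep the most cited papers, but no more than 1000 and no more than 80% of the total". This reading is our interpretation of the title. Citation counts are assumed to be precomputed, and ties are broken by PMID so the result is deterministic.

```python
def top_cited(citation_counts: dict[int, int], max_papers: int = 1000, percent: int = 80) -> list[int]:
    """Return the most cited PMIDs, keeping min(max_papers, percent% of all papers)."""
    # Integer ceiling of N * percent / 100 avoids floating-point rounding surprises.
    n_keep = min(max_papers, -(-len(citation_counts) * percent // 100))
    # Most cited first; ties broken by PMID for a deterministic result.
    ranked = sorted(citation_counts, key=lambda pmid: (-citation_counts[pmid], pmid))
    return ranked[:n_keep]


counts = {1: 5, 2: 50, 3: 0, 4: 20, 5: 7}
print(top_cited(counts))                # [2, 4, 5, 1]
print(top_cited(counts, max_papers=2))  # [2, 4]
```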

Pivotal points autodetect

Pivotal points are defined as follows:
[Screenshot 2019-05-03 at 20:29:40 with the definition of pivotal points (image not available)]

Given the co-citation graph, we can slice it by year or into 5-year chunks and try to detect merging events between clusters found by Louvain community detection.
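One way to make "merging event" concrete, offered as an assumption rather than the definition from the screenshot: compute a Louvain partition for each time slice (e.g. with networkx's `louvain_communities`, converted to a paper-to-cluster dict). Then flag a cluster of the later slice that absorbs the majority of two or more clusters of the earlier slice. `detect_merges` and `min_share` are hypothetical.

```python
from collections import defaultdict


def detect_merges(prev: dict[int, int], curr: dict[int, int], min_share: float = 0.5) -> dict[int, list[int]]:
    """Map each cluster of curr that absorbed two or more clusters of prev to those clusters.

    prev, curr: paper id -> cluster id for two consecutive time slices
    (e.g. Louvain partitions of the co-citation graph).
    A previous cluster counts as absorbed by a current cluster when at least
    min_share of its papers that still appear in curr land there.
    """
    prev_sizes = defaultdict(int)  # previous cluster -> its papers also present in curr
    overlap = defaultdict(int)     # (previous, current) cluster pair -> shared papers
    for paper, p_cluster in prev.items():
        if paper in curr:
            prev_sizes[p_cluster] += 1
            overlap[(p_cluster, curr[paper])] += 1

    absorbed = defaultdict(set)
    for (p_cluster, c_cluster), shared in overlap.items():
        if shared / prev_sizes[p_cluster] >= min_share:
            absorbed[c_cluster].add(p_cluster)
    return {c: sorted(ps) for c, ps in absorbed.items() if len(ps) >= 2}


# Slice t has clusters 0, 1, 2; in slice t+1 clusters 0 and 1 end up together in cluster 7.
prev = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
curr = {1: 7, 2: 7, 3: 7, 4: 7, 5: 8, 6: 8, 9: 8}
print(detect_merges(prev, curr))  # {7: [0, 1]}
```

Cluster ids from independent Louvain runs are not comparable across slices, which is why the sketch matches clusters through shared papers rather than by id.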
