It would be great if Percolator PSM-level tab-delimited output files could be supporte

Human. It's from here: <a href="http://www.ncbi.nlm.nih.gov/pubmed?t

Support for Percolator output files,about smith-chem-wisc/flashlfq

Comments (77)

wsnoble commented on June 30, 2024 1

Yes, that's right.

"The computed mass of the spectrum precursor at the given charge state. This is equal to the precursor m/z minus the mass of a proton (1.00727646677 Da), all multiplied by the charge."

"The mass of the peptide sequence, computed as the sum of the amino acid masses plus the mass of water (18.010564684 Da or 18.0153 Da, depending on whether we are using monoisotopic or average mass)."

http://crux.ms/file-formats/txt-format.html

from flashlfq.

wsnoble commented on June 30, 2024 1

Human. It's from here:

http://www.ncbi.nlm.nih.gov/pubmed?term=26951766

ftp://massive.ucsd.edu/MSV000079437/raw/

from flashlfq.

wsnoble commented on June 30, 2024 1

I am dependent on our sysadmins to install the latest released version, so I'll wait till this gets reviewed.

from flashlfq.

trishorts commented on June 30, 2024

Thanks for the request Bill. Students are evaporating inverse to the proximity of the holiday. But we'll engage on this and see how we can accommodate you.

from flashlfq.

wsnoble commented on June 30, 2024

Ah, I thought "Scan" and "Retention Time" were two different column IDs, but it turns out it's a single column called "Scan Retention Time." Hence, my suggestion above does not make sense. Is there any way you could have an option to provide scan numbers instead of RTs?

from flashlfq.

trishorts commented on June 30, 2024

Sorry for the slow response. Just back from holiday.

I'd love to say that we'd accommodate you but I don't see an easy way to do it. All of the peak finding, etc. is based on time and time windows. With scan number, there is no straightforward way to convert. The time between MS1 scans is variable based on the number of MSn scans between, and that changes throughout the run. So, there is not a good way to look w/in a scan range. And the time for MS1s and MSns also varies, so no simply calculation can be performed. We'd need a time for each kind of scan. The easiest fix for us would be for you to add an additional column of retention time in minutes that percolator otherwise ignores. I'm happy to entertain other options if you can suggest some.

from flashlfq.

wsnoble commented on June 30, 2024

I think my suggestion perhaps was not clear. I totally agree that you need to get the RT associated with each scan. My suggestion was to enable a mode in which the input file specifies scan numbers and then the RT is read from the given mzML file.

For example, if you are using pyteomics to parse an mzml file, you can get the RT for each scan as follows:

from pyteomics.mzml import MzML

def get_rts(mzml_filename):
    "Extract retention times from an mzml file."

    rts = {} # Key = scan number, value = retention time
    mzml = MzML(mzml_filename)
    for scan in mzml:
        rts[scan["index"]] = scan["scanList"]["scan"][0]["scan start time"]
    sys.stderr.write(f"Read {len(rts)} RTs from {mzml_filename}.\n")

    return(rts)

Requiring Percolator to report the RT is problematic because Percolator does not typically have access to the original mzML file. So we would have to require that every search engine report RT in its output file format, and that Percolator parse and report these RTs in its output. It seems much easier, since FlashLFQ receives the mzML file as input, for it to parse the RTs directly out of the given mzML files. Does this make sense?

from flashlfq.

trishorts commented on June 30, 2024

Yes. I will make an attempt.
What are your plans for including filenames with file path in the percolator output? currently, there is only an integer. would you require users to make the substitution?

from flashlfq.

wsnoble commented on June 30, 2024

I am hoping we can add this functionality to Percolator directly. I'm waiting to hear from Lukas about this:

percolator/percolator#322

from flashlfq.

trishorts commented on June 30, 2024

I saw that issue cuz I am following the percolator issues. Glad you raised it. I'm already working on the coding updates to flashlfq that will allow retention time recovery from scan number.

can you confirm that "spectrum neutral mass" is the experimental neutral mass
and "peptide mass" is the theoretical neutral mass?

I think I have that right but two minutes on google didn't eliminate my doubts.

from flashlfq.

trishorts commented on June 30, 2024

can you provide me with one raw/mzml file and its companion output so that I can test my code? I have one remaining concern about whats in the sequence column. we need to be able to parse that and compute peptide mass. The trick will be how ptms are annotated.

from flashlfq.

wsnoble commented on June 30, 2024

Here you go! Lemme know if you need more info.
percolator.target.psms.txt

(The raw file is too big to be attached -- will try to get it to you some other way.)

from flashlfq.

trishorts commented on June 30, 2024

thanks. i can make a box folder or you can. please invite [email protected]

from flashlfq.

wsnoble commented on June 30, 2024

Here is a link: https://drive.google.com/file/d/1OTLCgkqcsA2ij8H8edY08-DZG1KPOwxb/view?usp=sharing

from flashlfq.

trishorts commented on June 30, 2024

ptms annotated by mass shift sub-optimal.......[15.99] [79.97] we'll see what i can do

from flashlfq.

trishorts commented on June 30, 2024

species?

from flashlfq.

wsnoble commented on June 30, 2024

By "suboptimal" do you mean that it should be four digits of precision?

from flashlfq.

trishorts commented on June 30, 2024

no. like a name of a specific mod where we could have a specific chemical formula.

the two listed I can easily guess. but many mods from search engines are much more obscure and everyone likes to do things a little differently. The "field" lacks a convention, so it can be hard on downstream processing.

In MetaMorpheus, we specify the mod by full name and residue after the amino acid involved and we have a library that is used to look up the masses. not important for flashlfq, but we also include diagnostic ions and so forth in the library that are used for annotation. And, we have access to all the typical mod databases (unimod, uniprot,...).

Point is that with a nominal mass it is impossible to produce a chemical formula and without a chemical formula you can really make a great isotope envelope to match to the various MS1 and MS2 spectra. Sorta left w/ an averagine. I gotta see how I can deal w/ this.

from flashlfq.

wsnoble commented on June 30, 2024

OK, I finally got around to trying this out. I hacked a Percolator output file to swap out filenames for file indices based on the log file. But I'm having trouble getting it to work, presumably because of some issue around absolute versus relative pathnames. Also, I'm getting an error message saying that it can't read the first line, but I don't know what's wrong with it. Can you help figure this out?

Here is the command and output:

+ flashlfq --idt percolator.target.psms.fixed.txt --rep /net/noble/vol2/user/kiahales/projects/plasmo-rdeep/data/ms2
Opening PSM file percolator.target.psms.fixed.txt
Problem reading line 1 of the identification file; Could not interpret PSM header labels from file: percolator.target.psms.fixed.txt
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files

percolator.target.psms.fixed.txt.gz

from flashlfq.

trishorts commented on June 30, 2024

i'll look at it.

from flashlfq.

trishorts commented on June 30, 2024

can you get me one or two of the mzmls that i can use for trouble shooting?

from flashlfq.

wsnoble commented on June 30, 2024

It looks like the mzMLs were actually gzipped. I thought that might be the source of the problem, so I unzipped them but alas, still no luck (i.e., same error messages).

Here is a sample mzML:

https://drive.google.com/file/d/1hcndRYhhcWXctnjxVV4_tskatrYUicLz/view?usp=sharing

from flashlfq.

trishorts commented on June 30, 2024

Thanks

from flashlfq.

trishorts commented on June 30, 2024

not sure if this is the big problem but the sample file you gave me has an empty scan that is throwing an error:

it is scan 7192

from flashlfq.

trishorts commented on June 30, 2024

The second problem I see is that one column header is fileidx instead of file_idx. Maybe the the column header changed in the output? I can make a pull request to take either if that is something that needs to be done. Please let me know on that.

from flashlfq.

trishorts commented on June 30, 2024

The third problem i see is in the protein id column. We need to be able to assign a protein to each psm. The conventional format that you showed me previously is for example "sp|P23588|IF4B_HUMAN" where "|" is the separator and there is an accession followed by a gene name. We need both pieces of info. In you example, the column has some other value that is not parsable. Your output is pasted below. Could probably deal w/o the gene name but we can't go back and forth between formats. The protein is necessary for grouping the intensities. So that has to be clean.

from flashlfq.

trishorts commented on June 30, 2024

The one open question I have is still if we can read the paths that you have. I generally don't put the full path in the file_idx column. But then again I don't run on the commandline. i'll see if I can look into this.

from flashlfq.

trishorts commented on June 30, 2024

Manually deleting empty scans is problematic b/c of missing scan numbers. We do see dead scans on fast instruments. We like to throw errors where there is a problem w/ the input so that users know there is a problem w/ the input. But maybe it makes sense to just report the dropped scans and try to ignore them. I'll talk to the other devs about this. Tricky problme

from flashlfq.

wsnoble commented on June 30, 2024

Regarding the missing scan problem, I agree that it would be nice if the software could issue a warning in that case. If you don't like that, an alternative would be to introduce a command line option that controls this behavior (e.g., --missing-scans warn|fail). The default could be "fail," but the error message could advise the end user to switch to "--missing-scans warn" if they want to.

Regarding the column name, that's my bad. I inadvertently renamed the column when I was swapping in the filenames for the indices.

Regarding the protein IDs, unfortunately we have no control over what naming scheme our users employ. All we do is take the IDs from the FASTA file. If there are multiple protein IDs associated with a single peptide, these are separated with commas. I am not sure why you need both the accession and the gene name. Are you assuming that the proteins the user is interested will always be in a public database somewhere? I don't think we can assume that in general (e.g., a database generated from an RNA-seq sample).

from flashlfq.

trishorts commented on June 30, 2024

Regarding the protein IDs, unfortunately we have no control over what naming scheme our users employ. All we do is take the IDs from the FASTA file. If there are multiple protein IDs associated with a single peptide, these are separated with commas. I am not sure why you need both the accession and the gene name. Are you assuming that the proteins the user is interested will always be in a public database somewhere? I don't think we can assume that in general (e.g., a database generated from an RNA-seq sample).

We can take every unique string as "protein" name. That will mean that you would get proteins name "sp|P23588|IF4B_Human". I had made a parser based on your previous example data where accessions were separated by genes using the vertical bar.

If they don't mind post-processing everything, I guess it doesn't matter on our end. The gene, as you say, is unnecessary i believe. Can I assume all those PF3D7s should be treated is separate proteins?

from flashlfq.

trishorts commented on June 30, 2024

Regarding the missing scan problem, I agree that it would be nice if the software could issue a warning in that case. If you don't like that, an alternative would be to introduce a command line option that controls this behavior (e.g., --missing-scans warn|fail). The default could be "fail," but the error message could advise the end user to switch to "--missing-scans warn" if they want to.

I am updating our mzML reader to ignore empty scans and provide their scan numbers. see smith-chem-wisc/mzLib#628

from flashlfq.

wsnoble commented on June 30, 2024

Can I assume all those PF3D7s should be treated is separate proteins? Yes, any unique string delimited by commas is supposed to be a unique protein ID. I think those PF3D7s are different isoforms of Plasmodium proteins.

from flashlfq.

wsnoble commented on June 30, 2024

FWIW, when I update my header line to use file_idx instead of fileidx, the first error message goes away, but I get a new one:

$ flashlfq --idt percolator.target.psms.fixed.txt --rep /net/noble/vol2/user/kiahales/projects/plasmo-rdeep/data/ms2
Opening PSM file percolator.target.psms.fixed.txt
Problem reading line 1 of the identification file; One or more errors occurred. (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue)
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files

from flashlfq.

trishorts commented on June 30, 2024

Regarding the protein IDs, unfortunately we have no control over what naming scheme our users employ. All we do is take the IDs from the FASTA file. If there are multiple protein IDs associated with a single peptide, these are separated with commas. I am not sure why you need both the accession and the gene name. Are you assuming that the proteins the user is interested will always be in a public database somewhere? I don't think we can assume that in general (e.g., a database generated from an RNA-seq sample).

We can take every unique string as "protein" name. That will mean that you would get proteins name "sp|P23588|IF4B_Human". I had made a parser based on your previous example data where accessions were separated by genes using the vertical bar.

If they don't mind post-processing everything, I guess it doesn't matter on our end. The gene, as you say, is unnecessary i believe. Can I assume all those PF3D7s should be treated is separate proteins?

I just made a pull request to allow any string to be used as a protein group name. see #111

from flashlfq.

trishorts commented on June 30, 2024

FWIW, when I update my header line to use file_idx instead of fileidx, the first error message goes away, but I get a new one:

$ flashlfq --idt percolator.target.psms.fixed.txt --rep /net/noble/vol2/user/kiahales/projects/plasmo-rdeep/data/ms2
Opening PSM file percolator.target.psms.fixed.txt
Problem reading line 1 of the identification file; One or more errors occurred. (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue)
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files

FWIW, when I update my header line to use file_idx instead of fileidx, the first error message goes away, but I get a new one:

$ flashlfq --idt percolator.target.psms.fixed.txt --rep /net/noble/vol2/user/kiahales/projects/plasmo-rdeep/data/ms2
Opening PSM file percolator.target.psms.fixed.txt
Problem reading line 1 of the identification file; One or more errors occurred. (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue) (!msOrder.HasValue || !isCentroid.HasValue)
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files

if you look above, I mentioned that i see this error when the file has an empty scan. I just made a pull request that will skip those scans during file reading. you can look at it here: #112
one these pull requests are merged I'd like you to try it again. I will try to post here when that has happened

from flashlfq.

trishorts commented on June 30, 2024

@wsnoble your latest issues have all been dealt w/ but we are waiting on a new release.

This percolator PR to add tab as protein separator came up on my radar. should i be concerned that your output will change again? https://github.com/percolator/percolator/pull/327/files

from flashlfq.

wsnoble commented on June 30, 2024

No, I don't think so. I think this is just a new option in the stand-alone version of Percolator. The Crux version already uses commas as the delimiter between protein IDs.

from flashlfq.

trishorts commented on June 30, 2024

there has been a new release of flashlfq that should address your latest concerns, https://github.com/smith-chem-wisc/FlashLFQ/releases/tag/1.2.2

from flashlfq.

wsnoble commented on June 30, 2024

Thanks, I tried this out, but it is still giving an error about not finding the mzML file. Here is a rundown:

bash-4.2$ flashlfq --idt percolator.target.psms.fixed.txt --rep /net/noble/vol2/user/kiahales/projects/plasmo-rdeep/data/ms2
Opening PSM file percolator.target.psms.fixed.txt
Problem reading line 1 of the identification file; There is an error in XML document (16744, 18).
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files
bash-4.2$ flashlfq --version | head -2
FlashLFQ 1.2.2.289
Copyright ©  2017
bash-4.2$ awk 'NR < 3 {print $1}' percolator.target.psms.fixed.txt
file_idx
mzml/Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_108.mzML
bash-4.2$ ls mzml/Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_108.mzML
mzml/Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_108.mzML
bash-4.2$

I am not sure how to interpret this message: "There is an error in XML document (16744, 18)." Does this mean that flashlfq did find an mzML file and failed to parse it? How do I interpret (16744, 18)? Is there any way to find out which file it's having trouble with?

from flashlfq.

trishorts commented on June 30, 2024

i wonder if it's simply a path problem. i don't ever work in linux system and i'm not sure how this works. maybe rob can comment? and you paste the first data line here?

from flashlfq.

wsnoble commented on June 30, 2024

Sorry, it was user error on my part. On the command line I used the absolute path but in the file I used a relative pathname. When I switched both to relative pathnames, it worked. At least, it's running now. It's been going about 10 minutes and using 3.2G of memory. Is there any option for it to report status updates while it's running (like % complete)?

from flashlfq.

trishorts commented on June 30, 2024

No there is not. Could try with a single file to test speed

from flashlfq.

trishorts commented on June 30, 2024

let us know if your run finished up.

from flashlfq.

wsnoble commented on June 30, 2024

No, it ran for about 8 hours and then dumped core. It looks like there is indeed a problem with how paths are handled on linux.

+ flashlfq --idt percolator.target.psms.fixed.txt --rep mzml
Opening PSM file percolator.target.psms.fixed.txt
Done reading PSMs; found 184806
Setup is OK; read in 184806 identifications; starting FlashLFQ engine
Unhandled exception. System.ArgumentException: Path cannot be the empty string or all whitespace. (Parameter 'path')
   at System.IO.Directory.CreateDirectory(String path)
   at CMD.FlashLfqExecutable.Run(FlashLfqSettings settings) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 202
   at CMD.FlashLfqExecutable.<>c.<Main>b__1_1(FlashLfqSettings options) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 23
   at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
   at CMD.FlashLfqExecutable.Main(String[] args) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 22
/net/gs/vol3/software/modules-sw/flashlfq/1.2.2/Linux/CentOS7/x86_64/flashlfq: line 4: 21766 Aborted                 (core dumped) ${INSTALL}/dotnet ${INSTALL}/CMD.dll "$@"

from flashlfq.

trishorts commented on June 30, 2024

maybe you could try your run (or a subset) on a windows machine to confirm that its linux specific. in the mean time I can look into the problem.

from flashlfq.

wsnoble commented on June 30, 2024

I wasn't able to try it on Windows, but I did try moving the Percolator output file into the same directory with the mzMLs and then editing the file itself and the command line to get rid of the directory name. That didn't help though: I got the same error as above.

from flashlfq.

wsnoble commented on June 30, 2024

I made a tiny (six-row) example file that does not have any directories in the mzml filename field (since I thought the slash versus backslash directory separator might be problematic). This relies on a single mzml file, which I will be happy to share (it's too big to attach here). This small example gives the same error:

flashlfq --idt percolator.target.psms.small.txt --rep .
Opening PSM file percolator.target.psms.small.txt
Done reading PSMs; found 5
Setup is OK; read in 5 identifications; starting FlashLFQ engine
Unhandled exception. System.ArgumentException: Path cannot be the empty string or all whitespace. (Parameter 'path')
   at System.IO.Directory.CreateDirectory(String path)
   at CMD.FlashLfqExecutable.Run(FlashLfqSettings settings) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 202
   at CMD.FlashLfqExecutable.<>c.<Main>b__1_1(FlashLfqSettings options) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 23
   at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
   at CMD.FlashLfqExecutable.Main(String[] args) in C:\projects\flashlfq\CMD\FlashLFQExecutable.cs:line 22

percolator.target.psms.small.txt

from flashlfq.

trishorts commented on June 30, 2024

Thanks Prof. Noble. That will help, I'm finding a way to debug in WSL that might be helpful.

from flashlfq.

trishorts commented on June 30, 2024

please share the mzml. i worked out how to test this.

from flashlfq.

wsnoble commented on June 30, 2024

https://drive.google.com/file/d/14YoGqBFs-bfXEtF6Ym18KqWYEnIgNkpK/view?usp=drive_web

from flashlfq.

wsnoble commented on June 30, 2024

Newsflash! Using full pathnames seems to fix the problem, at least partially. :)

$ flashlfq --idt /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-02flashlfq-small/percolator.target.psms.small.txt --rep /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-02flashlfq-small/
Opening PSM file /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-02flashlfq-small/percolator.target.psms.small.txt
Done reading PSMs; found 5
Setup is OK; read in 5 identifications; starting FlashLFQ engine
Reading spectra file
Indexing MS1 peaks
Quantifying peptides for Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_109
Checking errors
Finished Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_109
Done quantifying
Analysis time: 0h 0m 16s
Writing output...
Finished writing output

The output files are attached. The NaNs in the protein file are somewhat concerning, but I'm guessing that is due to the tiny file I provided.
QuantifiedPeaks.txt
QuantifiedPeptides.txt
QuantifiedProteins.txt

from flashlfq.

wsnoble commented on June 30, 2024

Update on this: I am trying to run FlashLFQ on a real example. This is a Percolator file with 184k lines, drawn from a set of 11 mzml files. One thing that I noticed is that on linux the --thr option has no effect: the process runs on a single thread. I get the following output:

$ flashlfq --idt /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-21flashlfq/rep1/C1/percolator.target.psms.txt --rep /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/data/rep1 --out /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-21flashlfq/rep1/C1 --chg
Opening PSM file /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-21flashlfq/rep1/C1/percolator.target.psms.txt

But then it just hangs seemingly forever. Would one expect this analysis to take multiple days to complete? Is there any way to get it to give an indication of progress or to make the program run faster?

from flashlfq.

trishorts commented on June 30, 2024

rob thinks the problem may be in the first step. if this percolator file has no retention time output, then we have to look up the time for each scan. that step was not done in parallel. so maybe that takes a long time? can you confirm? maybe I can split the lookup by file. i'm pretty booked for the next ten days. i'll see what I can do,

from flashlfq.

wsnoble commented on June 30, 2024

I made a version on the Percolator file that contains only the header plus 9 PSMs drawn from the same mzML file. That is running now -- it's been going about 2 hours so far.

percolator.target.psms.txt

from flashlfq.

trishorts commented on June 30, 2024

can you tell me about these dots in the path? not sure how that works.

from flashlfq.

wsnoble commented on June 30, 2024

Those are relative pathnames. That works the same in DOS/Windows and Unix. The resolved pathname is /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/data/rep1/Pf_... Bill

…

On Thu, Jun 30, 2022 at 7:41 AM trishorts ***@***.***> wrote: can you tell me about these dots in the path? not sure how that works. [image: image] <https://user-images.githubusercontent.com/16841846/176705829-c5a98880-dad7-43ba-90eb-bd4febdb6349.png> — Reply to this email directly, view it on GitHub <#103 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFL4KJS3EOS2GG5WOBGSRC3VRWWXVANCNFSM5KTSFUPA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

from flashlfq.

wsnoble commented on June 30, 2024

What is the status of this issue? I have not been able to get it to work on my end.

from flashlfq.

trishorts commented on June 30, 2024

i worked on an update that would allow no extensions, extensions, full windows file paths and full linux filepaths in the identification file. all seem to work in my tests. I also made input file processing parallel. maybe that will speed things up. I will have to wait for rob to review, suggest edits, and merge. But, you can test the temporary version by downloading the appropriate file here: https://ci.appveyor.com/project/smith-chem-wisc/flashlfq/builds/44274888/artifacts

from flashlfq.

wsnoble commented on June 30, 2024

Any progress on releasing this version?

from flashlfq.

trishorts commented on June 30, 2024

my pr has been reviewed and merged. I added flexibility for paths in the identification file.

First, can you tell me how you access flashlfq. Do you use the standard commandline, the docker or bioconda? Once I know that, I know what to focus on for getting a release. bioconda takes me the longest b/c it confuses me. the others are quite fast.

Second, do you think it would be worthwhile for me to do more testing on your data to make sure that my fixes help you? I have time. Just need links and so forth.

from flashlfq.

wsnoble commented on June 30, 2024

Thanks. Let me try this new version, and if it doesn't work I will send more sample files. I believe our sysadmins are installing via the standard command line method.

from flashlfq.

trishorts commented on June 30, 2024

rob will make the release today. i will notify you when that happens

from flashlfq.

trishorts commented on June 30, 2024

new release available here: https://github.com/smith-chem-wisc/FlashLFQ/releases
it will be a little until i can get the bioconda thing to work

from flashlfq.

trishorts commented on June 30, 2024

bioconda/bioconda-recipes#36488

from flashlfq.

trishorts commented on June 30, 2024

bioconda/anaconda build is out https://anaconda.org/bioconda/flashlfq

from flashlfq.

wsnoble commented on June 30, 2024

FYI, I got this error message again:

Unhandled exception. System.UnauthorizedAccessException: Access to the path '/net/gs/vol3/software/modules-sw/flashlfq/1.2.3/Linux/CentOS7/x86_64/LicenceAgreements.toml' is denied.

I think we know how to fix this, since you previously sent the toml file that we need to put next to the executable. But you might want to make a note about this in your installation guide for people trying to do a linux installation. We're installing from the command line.

from flashlfq.

wsnoble commented on June 30, 2024

I have a question for you regarding file formats. Currently, the version of Percolator inside Crux provides a different (more extensive) set of tab-delimited output columns than the standalone version of Percolator. We are moving away from this setup, since it's hard to maintain. The result will be that the version of Percolator in the new version of Crux will report fewer columns and will use different column names than in the old version of Crux. Presumably, it's easy to change the column names when FlashLFQ tries to parse such a file. But I wonder whether there is critical information that you need that is missing from the Percolator output. Here are the columns that Percolator produces:

PSMId	score	q-value	posterior_error_prob	peptide	proteinIds

from flashlfq.

wsnoble commented on June 30, 2024

FYI, after fixing the unhandled exception mentioned above, I can now report that I can run flashlfq successfully on the example files that you sent a while back. Yay!

Next I will try it on some of our locally generated files.

from flashlfq.

wsnoble commented on June 30, 2024

Can you help me interpret this error message?

$ flashlfq --idt /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-29flashlfq/rep1/C1/percolator.target.psms.txt --rep /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/data/rep1 --out /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-29flashlfq/rep1/C1 --chg
Opening PSM file /net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-29flashlfq/rep1/C1/percolator.target.psms.txt
Done reading PSMs; found 0
No peptide IDs for the specified spectra files were found! Check to make sure the spectra file names match between the ID file and the spectra files

I am not sure what a "peptide ID" is. The input file is attached. I'm guessing I've got the column names wrong perhaps?

percolator.target.psms.txt

I did double check that the spectrum file listed in the PSM file (/net/noble/vol1/home/noble/proj/2020_kiahales_rdeep/results/wnoble/2022-06-03pipeline/../../../data/rep1/Pf_C1-593_MixES-Sol_SG-1_rKCTi_VO2_108.mzML) exists on disk.

from flashlfq.

trishorts commented on June 30, 2024

this appears to be reading fine on my end. will you send me the mzml. if you zip it, i think you can just drop it in the comment box.

from flashlfq.

trishorts commented on June 30, 2024

i tried to download it at the link you posted above but the file is in your trash:
https://drive.google.com/file/d/14YoGqBFs-bfXEtF6Ym18KqWYEnIgNkpK/view?usp=drive_web

from flashlfq.

wsnoble commented on June 30, 2024

It's too big for git (188 MB). Here is a link:

https://drive.google.com/file/d/1bezP5celq165oXY-mBivQE96LKToouNN/view?usp=sharing

from flashlfq.

trishorts commented on June 30, 2024

FlashLFQ_2022-08-23-12-25-38.zip

Ran on my machine in the GUI w/ no problem. Results attached. Will run on cmd line and report.

from flashlfq.

trishorts commented on June 30, 2024

i think cmd produced the error you see:

from flashlfq.

trishorts commented on June 30, 2024

when i run from commandline on the visual studio version I get no error.

from flashlfq.

wsnoble commented on June 30, 2024

I don't really know what "cmd" is, but I take it's something you use to try to run C# programs under Linux. Anyway, the upshot seems to be that this works under Windows but not Linux. Is that right?

from flashlfq.

trishorts commented on June 30, 2024

close. the version that I download from github doesn't work on the commandline in windows. But the version I upload to GitHub does. The catch is that I can debug the version I upload, but not the version I download. Rob is going to help on this one.

from flashlfq.

Support for Percolator output files about flashlfq HOT 77 OPEN

Comments (77)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent