ivandokov / phockup
Media sorting tool to organize photos and videos from your camera in folders by year, month and day.
License: MIT License
Add FileModifyDate as a fallback EXIF field when CreateDate is 0000..., as in this example: https://paste.ubuntu.com/26308843/
I experienced the following error twice while organizing a ~100 GB photo library.
Unfortunately, I deleted the images so I cannot provide you with test samples.
It looks like creating the parsed_date object failed for some reason.
pictures/IMG_2226.JPG
Traceback (most recent call last):
  File "/usr/local/bin/phockup", line 88, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/phockup", line 82, in main
    timestamp=timestamp
  File "/usr/local/Cellar/phockup/1.5.6/src/phockup.py", line 36, in __init__
    self.walk_directory()
  File "/usr/local/Cellar/phockup/1.5.6/src/phockup.py", line 67, in walk_directory
    self.process_file(file)
  File "/usr/local/Cellar/phockup/1.5.6/src/phockup.py", line 144, in process_file
    output, target_file_name, target_file_path = self.get_file_name_and_path(file)
  File "/usr/local/Cellar/phockup/1.5.6/src/phockup.py", line 184, in get_file_name_and_path
    date = Date(file).from_exif(exif_data, self.timestamp, self.date_regex)
  File "/usr/local/Cellar/phockup/1.5.6/src/date.py", line 48, in from_exif
    if parsed_date.get("date") is not None:
AttributeError: 'NoneType' object has no attribute 'get'
Apart from that the tool worked like a charm, so thanks for sharing!
It appears that the file path was having issues and the program was throwing errors such as file not found, etc.
I had to fall back to 1.5.7.
Creating a cross-platform GUI would be the best thing that could happen to this software, and I think it could be the major feature for the v2 milestone.
Since I have zero experience with coding GUIs for desktop apps, and especially with Python, any help will be appreciated.
I did some research a while ago into which library to use for a fully cross-platform GUI, but I haven't found an easy-to-use solution. Any suggestions are welcome!
It would be nice to have a command-line argument for choosing which EXIF date field is used as the final image date. In the example below, phockup uses the CreateDate field, but the year in this field is 2002. In this case DateTimeOriginal is the correct field.
exiftool -time:all -mimetype -j IMG_20140513_190138258.jpg
[{
"SourceFile": "IMG_20140513_190138258.jpg",
"FileModifyDate": "2014:05:13 19:01:38-03:00",
"FileAccessDate": "2018:12:21 15:02:32-02:00",
"FileInodeChangeDate": "2018:05:27 20:36:06-03:00",
"ModifyDate": "2014:05:13 19:01:38",
"DateTimeOriginal": "2014:05:13 19:01:38",
"CreateDate": "2002:12:08 12:00:00",
"MIMEType": "image/jpeg"
}]
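A minimal sketch of how a configurable field order with fallback could work. The names `DEFAULT_FIELDS` and `pick_date` are hypothetical and not part of phockup; the sketch assumes the EXIF values arrive as strings in the format shown in the exiftool output above.

```python
from datetime import datetime

# Hypothetical default order; a command-line option could override it.
DEFAULT_FIELDS = ["DateTimeOriginal", "CreateDate", "FileModifyDate"]


def pick_date(exif, fields=DEFAULT_FIELDS):
    """Return (field, datetime) for the first field with a usable date, else None."""
    for field in fields:
        value = exif.get(field)
        if not value or value.startswith("0000"):
            continue  # missing or zeroed-out date, try the next field
        try:
            # Keep only "YYYY:MM:DD HH:MM:SS"; drop any timezone suffix.
            return field, datetime.strptime(value[:19], "%Y:%m:%d %H:%M:%S")
        except ValueError:
            continue  # unparseable value, try the next field
    return None


exif = {
    "FileModifyDate": "2014:05:13 19:01:38-03:00",
    "DateTimeOriginal": "2014:05:13 19:01:38",
    "CreateDate": "2002:12:08 12:00:00",
}
print(pick_date(exif))
print(pick_date(exif, fields=["CreateDate"]))
```

With the default order the 2014 DateTimeOriginal value wins; passing a different `fields` list is where a user-facing option would plug in.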
Issue: phockup fails to process files with names like: Photo "2".jpg
Illegal characters, like double quotes, are not escaped in the exiftool call, so files with such filenames fail to get EXIF information and go to the unknown folder, at least on Linux.
Example:
~> phockup input output
/bin/sh: 1: Syntax error: EOF in backquote substitution
input/!#$%&'"*+-.^_`|~:.jpg => output/unknown/!#$%&'"*+-.^_`|~:.jpg
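One possible way around the quoting problem is to pass the file name to exiftool as a separate list element, so no shell ever parses it. This is a sketch, not phockup's actual code: `exif_command` and `read_exif_json` are hypothetical helpers, and the exiftool flags are the ones used in the example invocation earlier on this page.

```python
import shlex
import subprocess


def exif_command(path):
    """Build the exiftool call as an argument list, so no shell parses the name."""
    return ["exiftool", "-time:all", "-mimetype", "-j", path]


def read_exif_json(path):
    """Run exiftool without a shell and return its raw JSON output as text."""
    return subprocess.check_output(exif_command(path)).decode("utf-8")


name = "input/!#$%&'\"*+-.^_`|~:.jpg"
print(exif_command(name))

# If a shell command string is unavoidable, quote the name first.
shell_line = "exiftool -j " + shlex.quote(name)
print(shell_line)
```

The list form is usually the simpler choice; `shlex.quote` is only needed when the command really has to go through a POSIX shell.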
Hello.
Can we have an option to choose between a hardlink and a symlink? Right now -l makes a hardlink, which is fine, but it would be better if we could have, for example, -ls when we want symlinks.
P.S.: This is not an issue, but I could not make a new pull request :(, so that's why this is here ;)
Thanks
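A rough sketch of what the two link modes could look like, using only `os.link` and `os.symlink` from the standard library. The function name `link_file` and its `symbolic` parameter are made up for illustration and do not reflect phockup's options.

```python
import os
import tempfile


def link_file(source, target, symbolic=False):
    """Create a symbolic link when requested, otherwise a hard link."""
    if symbolic:
        # Use an absolute source path so the link resolves from any target folder.
        os.symlink(os.path.abspath(source), target)
    else:
        os.link(source, target)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "photo.jpg")
    with open(src, "wb") as f:
        f.write(b"data")
    link_file(src, os.path.join(tmp, "hard.jpg"))
    link_file(src, os.path.join(tmp, "soft.jpg"), symbolic=True)
    print(os.path.islink(os.path.join(tmp, "soft.jpg")))
    print(os.path.samefile(src, os.path.join(tmp, "hard.jpg")))
```

Hard links only work within one file system, while symlinks can cross file systems but break if the source is moved; on Windows, creating symlinks may require extra privileges.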
Currently the entire process is copying files from one location to another.
Adding a flag to move files instead of copying them would be useful when working with a big collection of files and there is not enough free space to hold a second copy.
Line 186 of src/phockup.py contains:
target_file_name = self.get_file_name(file, date).lower()
When using the -o flag to preserve original filenames, the .lower() function should not be run on the file name.
Filenames should be fully untouched when using the -o flag.
Tested and confirmed that removing the .lower() call from this line preserves the original uppercase and lowercase formatting of the filename. However, this should probably only occur when the -o flag is passed, not in all cases, as removing .lower() outright would do.
Hello!
I'm having some issues using the option --regex. I'm pretty new to regex and I'm probably the problem here, but I would really appreciate some help.
The regular expression I'm using is:
"img[_-]?(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})[_-]?"
And some of the file names stored in the folder are:
img-20161026-wa0011.jpg
img-20161026-wa0012.jpg
img-20161026-wa0013.jpg
img-20161101-wa0001.jpg
I also tried to use a regex tester like this one and it seems to confirm that my expression is correct.
The command used to run phockup is:
./phockup.py ~/Escritorio/notscanned/ ~/Escritorio/phockup-test/ -m -d YYYY-MM-DD -r="img[_-]?(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})[_-]?"
And with all this phockup only moves the pictures to the unknown folder:
/home/alejandro/Escritorio/notscanned/img-20161101-wa0001.jpg => /home/alejandro/Escritorio/phockup-test/unknown/img-20161101-wa0001.jpg
What am I doing wrong? Any help would be appreciated.
Thanks!
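A quick check outside phockup can help narrow this down. The sketch below first confirms that the pattern itself matches one of the file names, then shows one possible culprit: with Python's `getopt`, a short option written as `-r=...` keeps the `=` as part of the value. Whether phockup parses its options this way is an assumption here, not something this thread confirms.

```python
import getopt
import re

pattern = r"img[_-]?(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})[_-]?"
name = "img-20161101-wa0001.jpg"

# 1. The expression on its own matches the file name.
match = re.search(pattern, name)
print(match.groupdict())

# 2. Assumption: short options are parsed getopt-style.
#    Then "-r=<regex>" yields a value that starts with "=".
opts, _ = getopt.getopt(["-r=" + pattern], "r:")
value = opts[0][1]
print(value.startswith("="))
print(re.search(value, name))
```

If that assumption holds, passing the expression as `-r "img..."` (a space instead of `=`) or with the long form `--regex="img..."` would avoid the stray character.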
There are Python libraries that can deal with Exif data, like ExifRead.
This could be used instead of the manual invocation of exiftool and associated process handling, and would get rid of the external dependency. Win-win.
Hi Ivan,
I get some strange errors related to the input and output folders:
$ phockup . /outputdir
Input directory "." does not exist
Of course it exists, "." is always there. Perhaps the problem is that the current directory name has a space in it? Trying again with the full pathname to the current directory:
$ phockup ~/Dropbox/Camera\ Uploads /outputdir
Input directory "/home/jos/Dropbox/Camera Uploads" does not exist
Another attempt:
$ phockup . /mnt/tower/Media/Foto\'s/
Output directory "/mnt/tower/Media/Foto's/" does not exist, creating now
Traceback (most recent call last):
  File "/snap/phockup/27/lib/phockup/phockup.py", line 263, in <module>
    main(sys.argv[1:])
  File "/snap/phockup/27/lib/phockup/phockup.py", line 28, in main
    os.makedirs(outputdir)
  File "/snap/phockup/27/usr/lib/python3.5/os.py", line 231, in makedirs
    makedirs(head, mode, exist_ok)
  File "/snap/phockup/27/usr/lib/python3.5/os.py", line 231, in makedirs
    makedirs(head, mode, exist_ok)
  File "/snap/phockup/27/usr/lib/python3.5/os.py", line 241, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/mnt/tower'
The output directory is already there, so it shouldn't have to be created. Also, /mnt/tower is not a read-only file system; it is writable, as I can confirm right after this command.
What can I do?
Jos
Newbie question: when I run phockup ./dropbox ./photos, all the files in dropbox are copied, for example:
ls -l photos/unknown
-rw-r--r-- 1 root root 944914432 Oct 11 23:49 '2015 h2.zip'
-rw-rw-r-- 1 root root 519722441 Aug 1 17:13 'All Mail'
-rw-rw-r-- 1 root root 1859857537 Jan 14 2018 d-g.zip
-rwxr-xr-x 1 root root 112 Jan 12 2018 flat.sh
-rw-rw-r-- 1 root root 931395873 Jan 15 2018 h-i.zip
-rw-rw-r-- 1 root root 60614 Aug 1 17:10 inbox
-rw-rw-r-- 1 root root 271360 Aug 1 17:21 'Personal Folders.pst'
How do I make this program process only images and videos?
Hello,
I'm unsure if this issue is related to your program, but when running the snap version of phockup I get this cryptic traceback:
$user@thinkpad ~ $ phockup ~/Dropbox/Kamera-Uploads/ $backupDrive/Pictures/ --date YYYY/MM_M
~/Dropbox/Kamera-Uploads/2017-06-04 11.03.04.jpg
Traceback (most recent call last):
  File "/snap/phockup/67/lib/phockup/phockup.py", line 343, in <module>
    main(sys.argv[1:])
  File "/snap/phockup/67/lib/phockup/phockup.py", line 63, in main
    handle_file(os.path.join(root, filename), outputdir, dir_format, move_files)
  File "/snap/phockup/67/lib/phockup/phockup.py", line 235, in handle_file
    if sha256_checksum(source_file) == sha256_checksum(target_file):
  File "/snap/phockup/67/lib/phockup/phockup.py", line 281, in sha256_checksum
    with open(filename, 'rb') as f:
PermissionError: [Errno 13] Permission denied: '/media/julius/Backup/Pictures/2017/06_June/20170604-110304687693.jpg'
Do you have an idea how to solve this issue?
Best
Using the option --original-filenames, my files still get renamed from "IMG_2018..." to "img_2018..."
To fix it, I had to edit the file src/phockup.py and replace the line 186 from:
target_file_name = self.get_file_name(file, date).lower()
to:
if self.original_filenames:
    target_file_name = self.get_file_name(file, date)
else:
    target_file_name = self.get_file_name(file, date).lower()
When the process strategy is changed to move (-m|--move), the script should check the source directory for any remaining files and, if there are none, delete the directory.
This could be done for each subdirectory after each file move process (could be expensive operation) or at the end of the whole process.
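The end-of-run variant could be sketched as a bottom-up walk that removes every subdirectory left empty. `remove_empty_dirs` is a hypothetical helper, and keeping the input directory itself is a choice made for this sketch only.

```python
import os
import tempfile


def remove_empty_dirs(root):
    """Delete empty subdirectories of root, deepest first; root itself is kept."""
    removed = []
    for current, _dirs, _files in os.walk(root, topdown=False):
        if current == root:
            continue
        # Re-check on disk: child directories may have been removed a moment ago.
        if not os.listdir(current):
            os.rmdir(current)
            removed.append(current)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "DCIM", "100CANON"))   # empty after a move
    os.makedirs(os.path.join(tmp, "misc"))
    with open(os.path.join(tmp, "misc", "notes.txt"), "w") as f:
        f.write("left behind")
    remove_empty_dirs(tmp)
    print(sorted(os.listdir(tmp)))
```

Running it once after all files are processed keeps the cost to a single extra directory walk, instead of a check after every moved file.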
I had a directory with sorted images and added some others on top of it, so I ran phockup to get the new ones sorted. While re-sorting the existing ones, the tool moved some of them to unknown instead of looking at the filename or other attributes.
/home/wilmar/Pictures/sorted/2017/04/20170415-173104.jpg => sorted_images/unknown/20170415-173104.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173242.jpg => sorted_images/unknown/20170415-173242.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173521906057.jpg => sorted_images/2017/04/15/20170415-173521906057.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173525697619.jpg => sorted_images/2017/04/15/20170415-173525697619.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173527947603.jpg => sorted_images/2017/04/15/20170415-173527947603.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173535280797.jpg => sorted_images/2017/04/15/20170415-173535280797.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173538364085.jpg => sorted_images/2017/04/15/20170415-173538364085.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173542739040.jpg => sorted_images/2017/04/15/20170415-173542739040.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173554072463.jpg => sorted_images/2017/04/15/20170415-173554072463.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173556030465.jpg => sorted_images/2017/04/15/20170415-173556030465.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-173849.jpg => sorted_images/unknown/20170415-173849.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-174019666761-2.jpg => sorted_images/2017/04/15/20170415-174019666761.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-174019666761.jpg => sorted_images/2017/04/15/20170415-174019666761-2.jpg
/home/wilmar/Pictures/sorted/2017/04/20170415-203447.mp4 => sorted_images/2017/04/15/20170415-203447.mp4
It could be useful to package this for PyPI. For resources on how to do this, see the distutils documentation.
Maybe a good one for Hacktoberfest @ivandokov?
There are cases when some cameras do not include the correct EXIF data, but they use filenames with date (and time). There is a guessing code for such filenames (IMG_20160915_123456.jpg / IMG-20160915-123456.jpg) but it is for a single more generic filename and any other different filenames are ignored. Adding an option to pass regex for date and time guessing will be good.
Currently the code does not have any kind of tests but it should.
I've created a separate repository for these tests because they will have some large dummy files and we do not want to include them in the final software.
Would it be possible to add support for xmp files (generated, for example, by darktable)?
Thanks !
Write STDOUT to a log file.
This would be very useful if a human mistake happens during a file copy or move operation.
Is it possible to skip files that have no date data and would land in the unknown folder?
I have many files that would no longer be identifiable once they land in the unknown directory, and the context of the original path would help me decide which occasion and date to use for a fix.
If the target file exists, it should not be overwritten right away.
The sha256 checksum should be compared and, if it matches, the file should be skipped. Otherwise a new file name should be selected with a suffix. The xmp file name should also be changed if one exists.
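The rule described here could be sketched as follows. `resolve_target` is a hypothetical helper; the numeric suffix starting at `-2` is modeled on the file names that appear in logs elsewhere on this page, and xmp handling is left out.

```python
import hashlib
import os
import tempfile


def sha256_checksum(path, block_size=65536):
    """Hash a file in blocks so large media files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def resolve_target(source, target):
    """Return None if an identical file already exists, else a free target path."""
    source_sum = sha256_checksum(source)
    base, ext = os.path.splitext(target)
    candidate, suffix = target, 1
    while os.path.exists(candidate):
        if sha256_checksum(candidate) == source_sum:
            return None  # same content already present: skip this file
        suffix += 1
        candidate = "{}-{}{}".format(base, suffix, ext)
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    new, old = os.path.join(tmp, "new.jpg"), os.path.join(tmp, "20170415.jpg")
    for path, data in ((new, b"new"), (old, b"old")):
        with open(path, "wb") as f:
            f.write(data)
    print(os.path.basename(resolve_target(new, old)))
```

A caller would copy or move to the returned path, and treat `None` as "already sorted, nothing to do".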
It would be nice if we could choose to classify the pictures by year/month only.
For me, having all the photos of the same month in one directory is sufficient.
It seems not to work when INPUTDIR is a mounted directory, and there is no error message.
The snap version leaves phockup.sh without execute bit set, so you cannot run it.
The link from the readme, "https://github.com/ivandokov/phockup/archive/v1.2.2.tar.gz", no longer exists.
Hello.
It appears that the application can't seem to work on NFS mounted shares. I've tried to take a look into the code and replicating the issue just running within python itself (the path check) but it works there fine, which is odd.
I've also tried for kicks to change the permissions to the deadly 777. This also didn't seem to affect how it works.
Here are some pictures of what's been tried, and the original command.
The mount is as follows:
server.domain.com:/mnt/JoNas_Vol_1/ on /JoNAS type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=ip.ad.dr.ess,mountver=3,mountpor=990,mountproto=udp,locl_locl=none,addr=ip.ad.dre.ss)
I did also try using . as the initial path, and it seemed to work which was also odd, but I wasn't able to get the secondary path to work.
If you require any more information, let me know.
Thanks!
In the readme.md document, under Installation, Windows, there is the line:
Download exiftool from the official website and extract the archive
However, version 10.56 is obsolete (2017-06-06). The correct link should be https://exiftool.org/ which has current version 11.87 (2020-02-13). You can validate by going to http://www.sno.phy.queensu.ca/~phil/exiftool/ which redirects to https://exiftool.org/.
Pictures/DCIM/136___05/IMG_2395.JPG~RF8ea1f.TMP
Traceback (most recent call last):
  File "/usr/local/bin/phockup", line 243, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/phockup", line 51, in main
    handle_file(file, outputdir)
  File "/usr/local/bin/phockup", line 178, in handle_file
    exif_data = exif(file)
  File "/usr/local/bin/phockup", line 66, in exif
    data = check_output(['exiftool', file]).decode('UTF-8').strip().split("\\n")[0].split("\n")
  File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['exiftool', 'Pictures/DCIM/136___05/IMG_2395.JPG~RF8ea1f.TMP']' returned non-zero exit status 1
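One way to keep a single unreadable file from aborting the whole run is to catch the non-zero exit status and treat the file as having no EXIF data. The helper name `run_or_none` is hypothetical; the demo uses the Python interpreter as a stand-in command so it runs without exiftool installed.

```python
import subprocess
import sys


def run_or_none(command):
    """Return the command's decoded output, or None if it fails or is missing."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None  # caller can route the file to the unknown folder
    return output.decode("utf-8")


# Stand-ins for a failing and a succeeding exiftool call.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
working = [sys.executable, "-c", "print('ok')"]
print(run_or_none(failing))
print(run_or_none(working).strip())
```

In the real tool the command would be the exiftool invocation, for example `['exiftool', file]` as in the traceback above.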
It will be awesome to have a progress bar at the bottom of the process.
With the current way of printing the process data it could be a tricky one.
Create a snap application - https://snapcraft.io/
The -r|--regex option is missing from the docs.
It would be nice to have an option
iglob does not support recursive=True with Python version 3.4 or lower.
Would it be possible to add a flag for this?
Some .THM files (JPEGs) with proper EXIF ended up in unknown. The file command recognized them as JPEG. Possibly https://github.com/ahupp/python-magic may help.
$ ./phockup.py ~/data.photos/PhotoDVD23 ~/data.photos/PhotoDVD1_new -m -d YYYY.MM
/home/superuser/data.photos/PhotoDVD23/DSC_5469.JPG => /home/superuser/data.photos/PhotoDVD1_new/2011.07/20110709-15204290.jpg
/home/superuser/data.photos/PhotoDVD23/DSC_5470.JPG => /home/superuser/data.photos/PhotoDVD1_new/2011.07/20110709-15210110.jpg
/home/superuser/data.photos/PhotoDVD23/DSC_5471.NDF
Traceback (most recent call last):
  File "./phockup.py", line 75, in <module>
    main(sys.argv[1:])
  File "./phockup.py", line 69, in main
    date_regex=date_regex
  File "/home/superuser/data.git/phockup.git/src/phockup.py", line 34, in __init__
    self.walk_directory()
  File "/home/superuser/data.git/phockup.git/src/phockup.py", line 65, in walk_directory
    self.process_file(file)
  File "/home/superuser/data.git/phockup.git/src/phockup.py", line 137, in process_file
    output, target_file_name, target_file_path = self.get_file_name_and_path(file)
  File "/home/superuser/data.git/phockup.git/src/phockup.py", line 176, in get_file_name_and_path
    if exif_data and self.is_image_or_video(exif_data['MIMEType']):
KeyError: 'MIMEType'
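A defensive lookup would avoid the KeyError when exiftool returns data without a MIMEType entry. In this sketch `is_image_or_video` is a simplified stand-in for the method named in the traceback, and `is_media` is a hypothetical wrapper.

```python
import re


def is_image_or_video(mimetype):
    """Simplified stand-in: accept any image/* or video/* MIME type."""
    return re.match(r"(image|video)/", mimetype) is not None


def is_media(exif_data):
    # .get() returns None instead of raising KeyError when MIMEType is absent.
    mimetype = (exif_data or {}).get("MIMEType")
    return bool(mimetype) and is_image_or_video(mimetype)


print(is_media({"MIMEType": "image/jpeg"}))
print(is_media({"SourceFile": "DSC_5471.NDF"}))
```

Files that fail this check could then follow the same path as other unrecognized files instead of stopping the run.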
I installed PhockUp in Ubuntu 16.04 using sudo snap install phockup. Then I ran it like this:
$ phockup ownCloud/InstantUpload/ InstantUploadSorted/
/snap/phockup/25/command-phockup.wrapper: 6: exec: /snap/phockup/25/phockup.sh: Permission denied
Is it intentional? Do I have to run it with root permission? (I haven't tested if it works with root permission for obvious reasons.) I find it a bit surprising if arranging and copying files in my home directory requires root access. Could you please give a hint, or is this a bug?
The code uses plain functions, and a few of them take a lot of arguments only to pass global settings down to the function that actually needs them. One such example is
def handle_file(source_file, outputdir, dir_format, move_files, date_regex=None):
By refactoring to a class, those global arguments could become class properties set by the constructor.
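A rough outline of what such a refactoring might look like. The class name `Phockup` and the placeholder method body are assumptions for illustration; only the parameter names come from the function signature quoted above.

```python
class Phockup:
    """Holds the global settings once, so methods no longer pass them around."""

    def __init__(self, outputdir, dir_format, move_files=False, date_regex=None):
        self.outputdir = outputdir
        self.dir_format = dir_format
        self.move_files = move_files
        self.date_regex = date_regex

    def handle_file(self, source_file):
        # Placeholder body: the real method would read EXIF data and
        # copy or move the file; here it only reports what it would do.
        action = "move" if self.move_files else "copy"
        return "{} {} -> {}".format(action, source_file, self.outputdir)


sorter = Phockup("/outputdir", "YYYY/MM/DD", move_files=True)
print(sorter.handle_file("IMG_0001.JPG"))
```

The per-file method now takes a single argument, and adding a new setting means touching only the constructor.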
By default phockup renames the files completely using a date-based pattern. We can also keep the original name, but nothing in between. It might be useful to keep the original part of the file name and add the date as a prefix. It might also be useful to change the date format, e.g. to avoid the long "timestamp".
A bit of an extension of #55
If we have a regex that accepts hour information but makes it optional, phockup will crash.
Assuming the following custom regex:
(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})[_-]?((?P<hour>\d{2})\.(?P<minute>\d{2})\.(?P<second>\d{2}))?
and the following filename: IMG_27.01.2015.jpg
In date.py,
match_dir = matches.groupdict()
will still create keys for the hour information, but their values will be None.
match_dir = dict([a, int(x)] for a, x in match_dir.items())
will consequently crash when casting NoneType to int.
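A possible fix is to skip groups that did not take part in the match before converting to int. The sketch below uses the regex and file name from this report; `date_from_name` is a hypothetical function, not the code in date.py.

```python
import re
from datetime import datetime

pattern = re.compile(
    r"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})"
    r"[_-]?((?P<hour>\d{2})\.(?P<minute>\d{2})\.(?P<second>\d{2}))?"
)


def date_from_name(name):
    """Build a datetime from the named groups, ignoring unmatched optional ones."""
    matches = pattern.search(name)
    if matches is None:
        return None
    # Optional groups that did not match have the value None: leave them out.
    parts = {key: int(value)
             for key, value in matches.groupdict().items()
             if value is not None}
    return datetime(**parts)


print(date_from_name("IMG_27.01.2015.jpg"))
print(date_from_name("IMG_27.01.2015_10.11.12.jpg"))
```

Because `datetime` defaults hour, minute and second to zero, a file name without time information simply yields midnight on that day.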
Add option to change the format of the year and month when sorting the files.
I have a feature request, but I'd be willing to contribute an implementation if you think this is a good idea: a --dry-run option, which would just print out the log output but not actually move any files or change any content on disk.
How would you feel about this?
EDITED for clarity: not move any files
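One simple shape for this is to route every copy or move through a single function that always logs and only touches the disk when the dry-run switch is off. `transfer` and its parameters are hypothetical names for this sketch.

```python
import os
import shutil
import tempfile


def transfer(source, target, move=False, dry_run=False):
    """Log the operation; change anything on disk only when dry_run is False."""
    print("{} => {}".format(source, target))
    if dry_run:
        return
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if move:
        shutil.move(source, target)
    else:
        shutil.copy2(source, target)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.jpg")
    with open(src, "wb") as f:
        f.write(b"data")
    dst = os.path.join(tmp, "2017", "04", "a.jpg")
    transfer(src, dst, move=True, dry_run=True)
    print(os.path.exists(dst))
```

Keeping the log line ahead of the `dry_run` check means the dry-run output is identical to what a real run would print.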
Allow the user to append filenames to the ignore_files list (line 13 in 4bac6db).
This should allow the use of a flag, such as -i, that accepts both inline file names:
-i filename1.txt file2.docx
or -i filename1.txt -i file2.docx
and the path to an ignore file:
-i /path/to/phockupIgnoreList.txt
with the contents of phockupIgnoreList.txt being:
filename1.txt
file2.docx
(The flag may need to differ for these two, such as using -i for inline filename exclusions and -n for passing an ignore file.)
This ignore feature should allow extension-level exclusions, such as *.txt, as well as folder-level exclusions (particularly for hidden folders), such as ./.hidden/
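A sketch of how such ignore rules could be matched with the standard `fnmatch` module. The helper names, and the convention that a pattern ending in `/` is a folder rule, are assumptions for this example only.

```python
import fnmatch
import os


def load_patterns(inline=(), ignore_file=None):
    """Combine inline patterns with the non-empty lines of an optional ignore file."""
    patterns = list(inline)
    if ignore_file:
        with open(ignore_file) as f:
            patterns += [line.strip() for line in f if line.strip()]
    return patterns


def is_ignored(path, patterns):
    name = os.path.basename(path)
    folders = os.path.normpath(path).split(os.sep)[:-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Folder rule: ignore anything below a directory with this name.
            if os.path.basename(pattern.rstrip("/")) in folders:
                return True
        elif fnmatch.fnmatch(name, pattern):
            return True  # exact name or wildcard such as *.txt
    return False


rules = load_patterns(inline=["*.txt", "file2.docx", "./.hidden/"])
for candidate in ("in/notes.txt", "in/.hidden/a.jpg", "in/IMG_0001.JPG"):
    print(candidate, is_ignored(candidate, rules))
```

Exact file names work as patterns too, so the same check covers filename1.txt style entries and wildcard entries.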
I just started using phockup. I tried to process a folder of 25k images and it took about 1h to complete, without much processing power being used. So I wondered how much better it could be with multithreading support.
After some coding, I managed to do that. Here is the time output with a test folder of 36 images and 4 threads:
Original:
~> time phockup test test2 --move
6.43user 0.52system 0:06.77elapsed 102%CPU (0avgtext+0avgdata 17544maxresident)k
Multithread:
~> time phockup test test2 --move --threads 4
9.28user 0.69system 0:03.23elapsed 309%CPU (0avgtext+0avgdata 17596maxresident)k
Results: elapsed time dropped from 7s to 3s, with about 3x the CPU usage.
I had never played with threads in Python before this, so I'm sure my code is not the best way to do it. I'm just making the point that multithreading can improve the performance of phockup, since it relies heavily on exiftool, which ends up being the performance bottleneck.
What I did: in the walk_directory function I split the file list into subsets, and each subset goes into a thread. The --threads (or -x) parameter defines how many threads are used. I also adjusted the print commands to prevent a race condition on stdout. The patches with my code changes are attached.
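For comparison, the same idea can be sketched with the standard library's `ThreadPoolExecutor` instead of manually splitting the file list. This is a different technique from the attached patches, and `process_all` and `fake_process` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(files, process_file, threads=4):
    """Run process_file over files in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(process_file, files))


def fake_process(path):
    # Stand-in for the real per-file work (exiftool call plus copy or move).
    return "{} => sorted/{}".format(path, path.lower())


for line in process_all(["A.JPG", "B.JPG", "C.JPG"], fake_process, threads=2):
    # Printing happens in the main thread only, so lines cannot interleave.
    print(line)
```

Threads fit here because the time goes into waiting on the external exiftool process rather than into Python code, so the interpreter lock is not the limiting factor.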
phockup.py
phockup.py-multithread-patch.txt
src/help.py
src-help.py-multithread-patch.txt
src/phockup.py
src-phockup.py-multithread-patch.txt
Thank you for creating and sharing phockup.
At the moment the code is looking for NEF and JPG files.
It can try to read all files' exif data and act accordingly.
The matched photos will be sorted, the matched videos will be also sorted, unmatched files will be copied to unknown directory.
exiftool returns error for unknown type:
Error: Unknown file type
The name of the software comes from the combination of the words "photos" and "backup", as this was the main idea behind Phockup, but when you pronounce it you get quite a mixed feeling about what it does. Initially the software was made for my own usage and I thought a funny name wouldn't do any harm, but since we got an article in OMG Ubuntu I think we need a new name. Something that is not age restricted :)
Please give your suggestions for a new name for the software.
PS: I am planning to create a GUI for easier usage, and I think the rename will be done when the GUI is completed (version 2).
I installed the package on an Arch system via the yaourt command.
I tried to launch the command phockup in a bash terminal and got:
ModuleNotFoundError: No module named 'src'
It seems that the python module does not exist:
ls -al /usr/share/phockup/
total 24
drwxr-xr-x 2 root root 4096 30 déc 11:11 .
drwxr-xr-x 262 root root 12288 30 déc 10:59 ..
-rwxr-xr-x 1 root root 2241 30 déc 11:10 phockup.py
How do I install the repo correctly on Arch?
Thanks for your help !