Linyx

TMDb Scraper v. 1.3.3


What is TMDb Scraper?

It is a scraper for TMDb.org and MediaInfo; it gathers theatrical data and audio/video information for most movie formats. That said, it has been designed to work specifically with MKVs containing H.264 video (encoded with x264) and AAC audio (encoded with NeroAacEnc), although many improvements have recently been made for AVI and MP4 containers as well as various other audio/video formats, and more are on the way.

What else can it do?

It can generate a generic NFO or very customized NFOs based on user-provided templates.
It can generate a generic BBCode table for use with IPB boards (which can, with a little work, serve as a nice frontend to your library) or very customized BBCode tables based on user-provided templates.
It can download the poster for the selected movie.
It can generate screenshots for most common movie formats (MKV, MP4, AVI, etc.).

Screenshots: Title Searching, Main, Poster, Screenshots.

TMDb Scraper is still in alpha, expect bugs!

License is GPLv2.
Programming language is Visual Basic .NET (with Visual Studio 2010).

Requirements:
.NET Framework version 3.5 or greater.
Pentium 4 or newer CPU.

Download the latest source: Version 1.3.3 Source Code
or the latest build: Version 1.3.3 Full Package


Please create a new thread with any questions or comments. Replies to this thread will be deleted.

This thread will contain fairly detailed explanations of most of the functions in the TMDb Scraper function library (Library.vb), with the hope of helping beginner programmers better understand Visual Basic .NET and fundamental programming concepts.

Prerequisites:
Visual Studio 2010 (if you are planning to compile and use the code).
A very basic understanding of programming concepts (what arrays, strings, etc. are).
A basic understanding of Visual Basic .NET syntax.

A large portion of the functions, subs, etc. called in the following code come from the TMDb Scraper library itself. They may make their way into this thread, but they obviously will not be documented on MSDN.

CleanName

This function attempts to clean the filename used for searching for the title.

    1    Public Function CleanName(ByVal FileName As String)
    2        FileName = Path.GetFileName(FileName).Replace(Path.GetExtension(FileName), "")
    3        Return RemoveTags(RemoveTags(FileName, "(", ")"), "[", "]").ToString.Trim
    4    End Function

Timing: 0.0135 milliseconds per call.

Before we begin, let's look at the declaration. "Public" allows the function to be called from anywhere within the project, directly as "CleanName()" rather than as "Library.CleanName". The next part, "Function", indicates that it returns a value (whereas a Sub does not). "CleanName" is pretty self-explanatory: it is the name of the function, and the choice of name has no effect on how it behaves. The last piece, "ByVal FileName As String", indicates that it takes one required argument, the FileName, which must be a string. ByVal is essential in this code, because ByRef (the "other" way of passing an argument) allows the function to change the caller's variable. If ByRef were used, line 2 would overwrite the global VideoFile variable and break many functions. Remember this!

On to the code. To examine this function we will use the filename "C:\Gladiator (870p) [Extended Edition].mkv".

In line 2 we use the System.IO.Path functions to get "Gladiator (870p) [Extended Edition]" out of the entire path. It first gets the full filename, "Gladiator (870p) [Extended Edition].mkv", which then becomes "Gladiator (870p) [Extended Edition]" after the extension ".mkv" is replaced with nothing. Because Replace removes every occurrence, this could lead to problems if a filename contained the extension in more than one place. AFAIK that does not happen with ordinary English titles; the alternative fix would be to use .Substring to remove the last x characters based on the Length of Path.GetExtension.

In line 3 we nest a whole bunch of calls, which basically do the following. First, anything in parentheses is removed (RemoveTags is obviously a "custom" function, soon to be posted), giving us "Gladiator [Extended Edition]". Next, anything in brackets is removed, giving "Gladiator ". That is then trimmed via .Trim to remove all leading and trailing spaces, giving us a return value of "Gladiator".

About RemoveTags:

The "RemoveTags" function (which takes the Title As String, Opening As Char, and Closing As Char arguments) returns the Title minus the text in parentheses or brackets. It removes the Opening and Closing characters and everything in between them. The result is still a string, which is why we can safely Return that value back to the calling code in CleanName.
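The Substring alternative mentioned for line 2 could look roughly like the sketch below. StripExtension is a hypothetical helper written for this illustration, not something from Library.vb, and the sample filename is made up to show a name that contains the extension twice.

```vbnet
Imports System
Imports System.IO

Module CleanNameSketch
    ' Hypothetical helper (not part of Library.vb): removes only the trailing
    ' extension, so the same text earlier in the name is left alone.
    Public Function StripExtension(ByVal FileName As String) As String
        Dim BaseName As String = Path.GetFileName(FileName)
        Dim Extension As String = Path.GetExtension(BaseName)
        ' Keep everything except the last Extension.Length characters.
        Return BaseName.Substring(0, BaseName.Length - Extension.Length)
    End Function

    Sub Main()
        ' Made-up filename that contains ".mkv" in two places.
        Dim Sample As String = "Show.mkv.sample.mkv"

        ' Replace removes every ".mkv": prints Show.sample
        Console.WriteLine(Sample.Replace(Path.GetExtension(Sample), ""))

        ' Substring removes only the final ".mkv": prints Show.mkv.sample
        Console.WriteLine(StripExtension(Sample))

        ' The framework's own call gives the same result: prints Show.mkv.sample
        Console.WriteLine(Path.GetFileNameWithoutExtension(Sample))
    End Sub
End Module
```

Path.GetFileNameWithoutExtension is a standard System.IO call, so it may be the simplest replacement if the one-liner in CleanName is ever revisited.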


CheckSubs

This function takes the FileName and the SubEncoding (as reported by MediaInfo for the VideoFile, aka FileName) and checks:
1) Whether there are subtitles muxed in with the video.
2) If not, whether there is an external subtitle file and, if so, what encoding it contains.

    Public Function CheckSubs(ByRef SubEncoding As String, ByVal VideoFile As String) As String
        Dim Subs As String = "No"
        If SubEncoding <> Nothing Then
            ' MediaInfo reported a subtitle stream muxed into the video.
            Subs = "Yes"
        Else
            ' Look for an external subtitle file named after the video.
            For Each File As String In My.Computer.FileSystem.GetFiles(Path.GetDirectoryName(VideoFile))
                If File.Contains(Path.GetFileName(VideoFile).Replace(Path.GetExtension(VideoFile), "")) Then
                    If File.Contains(".srt") OrElse File.Contains(".idx") OrElse _
                       File.Contains(".sup") OrElse File.Contains(".ssa") Then
                        Dim MI As New MediaInfo
                        MI.Open(File)
                        ' ByRef: this assignment updates the caller's variable.
                        SubEncoding = MI.Get_(StreamKind.Text, 0, "Format")
                        Subs = "Yes (as a separate file)"
                        Exit For
                    End If
                End If
            Next
        End If
        Return Subs
    End Function

Timing: 0.0003232 milliseconds per call with muxed subtitles; 7.7 milliseconds per call with external subtitles (3 files in directory).

The first thing you should notice is the declaration: we used ByRef! (More on this later.)

First off, we declare a string called "Subs" and set it to "No" by default; this is our return value. Next, if the SubEncoding variable (the subtitle encoding in the VideoFile) contains anything, then obviously we do have subtitles: set Subs to "Yes" and return it, leaving SubEncoding unchanged.

However, if the video does not contain subtitles (SubEncoding is Nothing), then we get the directory of VideoFile and loop through every file in that directory to see if any has the same name as VideoFile but a subtitle extension. If one of those files is a subtitle, we scan it with MediaInfo to get its encoding and set SubEncoding to whatever the external subtitle's encoding is (remember, ByRef allows the called code to change the caller's variable). At this point we set Subs to "Yes (as a separate file)" and exit the loop with an "Exit For", because there is no reason to continue scanning files if we have already found the right one. We then step through the End Ifs and return Subs.
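To make the ByVal/ByRef difference concrete, here is a minimal sketch. SetByVal and SetByRef are hypothetical Subs invented for this illustration, and "SubRip" is just a placeholder value standing in for whatever MediaInfo would report.

```vbnet
Imports System

Module ByRefSketch
    ' ByVal: the Sub works on its own copy, so reassigning the parameter
    ' does not affect the caller's variable.
    Sub SetByVal(ByVal Value As String)
        Value = "SubRip"
    End Sub

    ' ByRef: the Sub works on the caller's variable itself, so the
    ' assignment is still visible after the call returns.
    Sub SetByRef(ByRef Value As String)
        Value = "SubRip"
    End Sub

    Sub Main()
        Dim SubEncoding As String = Nothing

        SetByVal(SubEncoding)
        Console.WriteLine(SubEncoding Is Nothing)   ' prints True

        SetByRef(SubEncoding)
        Console.WriteLine(SubEncoding)              ' prints SubRip
    End Sub
End Module
```

This is the same mechanism CheckSubs relies on: the function's return value carries the "Yes"/"No" answer, while the ByRef parameter hands the detected encoding back to the caller.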


RemoveTags

This function removes those pesky "tags" people put on their movies. Sure, it is nice to know if it is an [Extended Edition] or not, but TMDb (or IMDb, or anything else for that matter) doesn't care what version you have for lookup purposes. All that is to say, we need a clean title, and this function assists us in our quest to get it.

    1    Private Function RemoveTags(ByVal Title As String, ByVal Opening As Char, ByVal Closing As Char)
    2        If Title.Contains(Opening) = True And Title.Contains(Closing) = True Then
    3            Title = Title.Replace(Title.Substring(Title.IndexOf(Opening), _
                 Title.IndexOf(Closing) - Title.IndexOf(Opening) + 1), "")
    4        End If
    5        Return Title
    6    End Function

Timing: 0.00098 milliseconds per call.

Let's get started. The declarations are all pretty obvious, except that we take two arguments as Char (single character only), although it wouldn't take much to take strings instead. Title shall be "V For Vendetta (720p)", Opening is "(", Closing is ")".

In line 2, we simply check that both characters are actually in the string before removing anything. An exception could still happen if a title contained something like "V For Vendetta )720p(", because the closing character comes before the opening one and the computed Substring length is negative. This does need to be, and will be, fixed.

In line 3, all the magic happens. The tag is "(720p)". We get it via Substring, using the index of the Opening character as the start index and the distance between the Opening and Closing characters (plus one) as the length. We replace that with nothing ("") and return "V For Vendetta " (remember that trimming is handled by the caller). Fairly simple and quite fast.
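One possible way to guard against the reversed-order case is sketched below. RemoveTagsSafe is a hypothetical name and this is not the fix that ended up in Library.vb; it simply looks for the closing character only to the right of the opening one.

```vbnet
Imports System

Module RemoveTagsSketch
    ' Hypothetical variant of RemoveTags: a tag is removed only when the
    ' closing character appears after the opening character.
    Public Function RemoveTagsSafe(ByVal Title As String, ByVal Opening As Char, ByVal Closing As Char) As String
        Dim OpenIndex As Integer = Title.IndexOf(Opening)
        If OpenIndex < 0 Then Return Title

        ' Start the search just past the opening character.
        Dim CloseIndex As Integer = Title.IndexOf(Closing, OpenIndex + 1)
        If CloseIndex < 0 Then Return Title

        ' Remove the opening character, the closing character, and everything between.
        Return Title.Remove(OpenIndex, CloseIndex - OpenIndex + 1)
    End Function

    Sub Main()
        ' Normal tag: prints [V For Vendetta ]
        Console.WriteLine("[" & RemoveTagsSafe("V For Vendetta (720p)", "("c, ")"c) & "]")

        ' Reversed characters: no exception, title unchanged: prints [V For Vendetta )720p(]
        Console.WriteLine("[" & RemoveTagsSafe("V For Vendetta )720p(", "("c, ")"c) & "]")
    End Sub
End Module
```

One behavioral difference to be aware of: Remove deletes only the first tag it finds, whereas the original Replace call deletes every occurrence of that exact tag text in the title.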

