cancel
Showing results for 
Search instead for 
Did you mean: 

Identifying file extensions

f_buirey
Level 3
Partner Accredited

Because many of my customers have unknown datas, I have wrote a DQL Query to identify file extensions which are not listed in filegroup. 

The query is below and is working but I never try this request on a filer greater than1TB. The "WITH" Statement in SQL request used to find the last "." before extension may require a lots of CPU. What do you think of this report ?

FROM path
GET basename, size
IF extension=''
AND type='FILE'
AND substr(basename,'.')=1
SORTBY basename

/* ADVANCED SQL QUERIES

CREATE table unknownExtension (path_rowid INTEGER, extension TEXT, size_MB INTEGER,filenum INTEGER);
WITH recursive reduce(path_rowid,extension,size ) as (
  SELECT path_rowid,substr(basename,instr(basename,".")+1),size FROM path
  UNION ALL
  SELECT path_rowid,SUBSTR(extension,instr(extension,".")+1),size FROM reduce
  WHERE INSTR(extension,".")>0
  )
  INSERT INTO unknownExtension 
  SELECT path_rowid, extension, SUM(size) AS size_MB,COUNT(extension) 
  FROM reduce 
  WHERE INSTR(extension,".")=0
  group by extension
  order by extension

 

 

Frederic Buirey
0 REPLIES 0